Utils

If you have the original .pth models, you can use the convert_and_quantize.sh script, which will (see the sketch after the list):

  • Clone llama.cpp and compile it
  • Convert the model to FP16 and quantize it to 4 bits
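The snippet below is a minimal sketch of the equivalent commands, not the script itself, and it assumes a Makefile-era llama.cpp checkout with a model directory such as ./models/7B; newer llama.cpp releases have renamed the converter and quantizer (convert_hf_to_gguf.py, llama-quantize), so adjust the names to match your checkout.

```bash
#!/usr/bin/env bash
# Sketch of what convert_and_quantize.sh is described as doing.
# Paths, script names, and flags are assumptions and vary across llama.cpp versions.
set -euo pipefail

MODEL_DIR=${1:-./models/7B}   # directory holding the original .pth checkpoint

# 1. Clone llama.cpp and compile it
git clone https://github.com/ggerganov/llama.cpp
make -C llama.cpp

# 2. Convert the .pth checkpoint to FP16 (GGUF)
python3 llama.cpp/convert.py "$MODEL_DIR" --outtype f16

# 3. Quantize the FP16 file to 4 bits (q4_0)
./llama.cpp/quantize "$MODEL_DIR/ggml-model-f16.gguf" \
                     "$MODEL_DIR/ggml-model-q4_0.gguf" q4_0
```

The resulting ggml-model-q4_0.gguf can then be loaded with the llama.cpp tooling; the FP16 intermediate file can be kept if you want to try other quantization types later.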