Utils
If you have the original .pth models, you can use the convert_and_quantize.sh
script, which will:
- Clone llama.cpp and compile it
- Convert the model to FP16 and quantize it to 4 bits
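
As a rough illustration of these steps, here is a minimal sketch; it is not the actual convert_and_quantize.sh. The MODEL_DIR variable, the DRY_RUN switch, and the run helper are hypothetical names introduced for this example. The converter and quantize invocations follow early llama.cpp revisions and are an assumption; newer revisions renamed these tools and changed their arguments, so check the llama.cpp README for the version you clone.

```shell
#!/bin/sh
# Hypothetical sketch of the steps listed above; NOT the real convert_and_quantize.sh.
set -e

MODEL_DIR="${MODEL_DIR:-models/7B}"   # assumption: directory holding the original .pth files
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the commands, 0 = actually run them

run() {
    # Print the command, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: clone llama.cpp and compile it.
run git clone https://github.com/ggerganov/llama.cpp
run make -C llama.cpp

# Step 2: convert the .pth model to FP16, then quantize it to 4 bits.
# Assumption: script name, binary name, and numeric type codes as in early llama.cpp revisions.
run python3 llama.cpp/convert-pth-to-ggml.py "$MODEL_DIR" 1
run llama.cpp/quantize "$MODEL_DIR/ggml-model-f16.bin" "$MODEL_DIR/ggml-model-q4_0.bin" 2
```

By default the sketch only prints the commands it would run, so you can compare them against your llama.cpp checkout before executing anything; set DRY_RUN=0 to run them.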