LLM Engineer Interview Questions: Inference Optimization, KV Cache, Speculative Decoding, Quantization
What is GGUF and llama.cpp?
GGUF is the binary model file format used by llama.cpp, a high-performance C/C++ inference engine for LLMs. A GGUF file packages the model's weights (typically quantized, e.g. to 4 or 5 bits) together with the tokenizer and architecture metadata in a single file designed for fast, memory-mapped loading. llama.cpp is the most popular way to run open-source LLMs locally on Mac, Windows, and Linux: it runs on plain CPUs with no GPU required, and can optionally offload layers to a GPU backend (Metal, CUDA, Vulkan).
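To make the "weights plus metadata in one file" point concrete, here is a minimal sketch that parses the fixed GGUF header fields (magic, version, tensor count, metadata key-value count). The field layout follows the published GGUF specification; the version and count values in the synthetic buffer are made up for illustration.

```python
import struct

def parse_gguf_header(data: bytes) -> dict:
    # A GGUF file begins with the magic bytes b"GGUF", then a uint32 format
    # version, a uint64 tensor count, and a uint64 count of metadata
    # key-value pairs (all little-endian, per the GGUF spec). The metadata
    # entries and tensor descriptors follow this fixed header.
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}

# Synthetic header for demonstration (values are illustrative, not from a real model):
header = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(parse_gguf_header(header))
```

In practice you rarely parse GGUF by hand: you download a `.gguf` file and point a llama.cpp binary such as `llama-cli` at it with `-m model.gguf`, and the runtime reads this header to configure inference.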