LLM Engineer Interview Questions: Inference Optimization, KV Cache, Speculative Decoding, Quantization

What is GGUF and llama.cpp?

GGUF is a binary model file format used by llama.cpp, a high-performance C/C++ inference engine for LLMs. A GGUF file packages the model's weights (typically quantized, e.g. to 4-bit or 8-bit precision) together with tokenizer data and metadata in a single file designed for fast, memory-mapped loading. llama.cpp is the most popular way to run open-source LLMs locally on Mac, Windows, and Linux: it runs entirely on CPU with no mandatory GPU dependency, and can optionally offload layers to a GPU via Metal, CUDA, or Vulkan.
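A minimal sketch of the typical workflow with llama.cpp's own command-line tools (`llama-quantize` and `llama-cli` are real binaries shipped with the project; the model file names and paths here are placeholders, and the exact flags may vary by version):

```shell
# Quantize a full-precision GGUF model down to 4-bit (Q4_K_M is a common
# quality/size trade-off); input/output file names are hypothetical examples.
llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# Run inference on the quantized model:
#   -m   path to the GGUF file
#   -p   prompt text
#   -n   number of tokens to generate
#   -ngl number of layers to offload to the GPU (0 = pure CPU)
llama-cli -m model-q4_k_m.gguf -p "Explain KV cache in one sentence." -n 128 -ngl 0
```

Raising `-ngl` moves transformer layers onto the GPU for speed while the rest stays on CPU, which is how llama.cpp runs models larger than available VRAM.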