Smarter Local LLMs, Lower VRAM Costs – All Without Sacrificing Quality, Thanks to Google’s New QAT Optimization
What makes QAT (quantization-aware training) particularly impressive is its ability to maintain model quality despite the dramatic reduction in precision. According to Google, QAT reduces the perplexity drop by 54% (measured with llama.cpp's perplexity evaluation) when quantizing down to Q4_0; in other words, roughly half of the quality degradation that normally comes from quantizing a checkpoint to 4-bit is recovered.
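To make the idea concrete, here is a minimal, generic sketch of what quantization-aware training does during the forward pass: weights are "fake-quantized" to a 4-bit grid so the model learns parameters that survive real quantization later, while gradients flow through unchanged via a straight-through estimator. This illustrates the general technique only, not Google's actual Gemma training code; the `fake_quantize_int4` and `QATLinear` names are invented for this example, and real schemes use per-block rather than per-tensor scales.

```python
import torch


def fake_quantize_int4(w: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric 4-bit quantization (levels -8..7) of a weight tensor."""
    qmin, qmax = -8, 7
    scale = w.abs().max() / qmax                 # per-tensor scale for simplicity
    q = torch.clamp(torch.round(w / scale), qmin, qmax)
    w_q = q * scale                              # dequantize back to float for the forward pass
    # Straight-through estimator: forward uses the quantized weights,
    # backward treats the rounding as an identity so gradients reach w.
    return w + (w_q - w).detach()


class QATLinear(torch.nn.Linear):
    """Linear layer whose forward pass runs on fake-quantized weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, fake_quantize_int4(self.weight), self.bias)


if __name__ == "__main__":
    layer = QATLinear(16, 8)
    out = layer(torch.randn(4, 16))
    out.sum().backward()                         # gradients still update the full-precision weights
    print(out.shape, layer.weight.grad is not None)
```

Because training sees the quantization error at every step, the final weights end up in regions that round cleanly to the 4-bit grid, which is why the perplexity gap versus the full-precision model shrinks compared with quantizing an unprepared checkpoint after the fact.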