Tagged "vram-optimization"
- Google's Gemma 4: The Most Practical Local LLM Despite Not Being The Smartest
- MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware
- Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark
- Quansloth Using Google's TurboQuant Breaks the VRAM Wall for Local LLMs
- Context Window Optimization: Extending Gemma 4 Context Length Through Efficient Projection Quantization
- Gemma 4 KV Cache Memory Issues Fixed in llama.cpp
- VRAM Optimization Technique Cuts Gemma 4 Memory Usage by 3x
- Google Gemma 4 Released with GGUF Quantizations
- Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance
- TurboQuant: Understanding the Quantization Breakthrough