Tagged "memory-efficiency"
- Gemma 4 GGUF Models Updated with Critical Quantization Fixes
- TurboQuant in Llama.cpp Achieves 6X Smaller KV Cache
- OpenUMA – Apple-Style Unified Memory for x86 AI Inference
- PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
- Google's TurboQuant Shows Memory Constraints Remain Critical for Local LLM Inference
- Mixed KV Cache Quantization: Performance Risks and Pitfalls
- OPPO and MediaTek Highlight On-Device AI Innovations at MWC 2026
- Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
- Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System
- GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision