Tagged "memory-efficiency"

Tether AI Upgrades QVAC SDK With TurboQuant for Data Center-Sized Memory on Everyday Devices · 2 June 2026
New 8B Local LLM Design Marks Biggest Shift Since DeepSeek R1 · 23 May 2026
SynapseKit: A New Production Framework for Deploying LLMs · 16 May 2026
Running a Serious AI Model on a Consumer GPU Just Got Easier and That Matters More Than the Benchmark · 3 May 2026
Xmemory: Benchmarking Structured AI Memory Against RAG and Hybrid RAG · 1 May 2026
Gemma 4 GGUF Models Updated with Critical Quantization Fixes · 9 April 2026
TurboQuant in Llama.cpp Achieves 6X Smaller KV Cache · 6 April 2026
OpenUMA – Apple-Style Unified Memory for x86 AI Inference · 3 April 2026
PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs · 1 April 2026
Google's TurboQuant Shows Memory Constraints Remain Critical for Local LLM Inference · 29 March 2026
Mixed KV Cache Quantization: Performance Risks and Pitfalls · 29 March 2026
OPPO and MediaTek Highlight On-Device AI Innovations at MWC 2026 · 6 March 2026
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels · 21 February 2026
Qwen3 Coder Next 8FP Demonstrates Exceptional Long-Context Performance on 128GB System · 20 February 2026
GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision · 14 February 2026