Tagged "model-quantization"
- Show HN: Phonetic Formatter – Offline English Text to IPA on iPhone and iPad
- Run a Local LLM Server on Raspberry Pi with Remote Access Capabilities
- Google's Gemma 4 Brings Powerful On-Device AI to Phones and Laptops
- Netherlands Reaches Deal to Cut Reliance on U.S. Cloud Tech
- I Replaced My Local LLM With a Model Half Its Size and Got Better Results
- Llama 4 Scout on MLX: The Complete Apple Silicon Guide (2026)
- Externalization in LLM Agents: Unified Review of Memory and Harness Engineering
- 10GB VRAM Local LLM: The Complete Setup Guide (2026)
- The Open-Source AI Ecosystem Keeps Treating llama.cpp Like a Second-Class Citizen
- Minisforum Launches N5 Max AI NAS with OpenClaw
- Laimark – 8B LLM That Self-Improves on Consumer GPUs
- 115 TOPS in 0.67L: CHUWI AuBox X Packs On-Device AI Power Into a Palm-Sized Mini PC
- Building a Voice AI Wearable in a Casio F91W with Whisper and BLE
- Bonsai 1.7B in the Browser: A 290MB 1-bit LLM on WebGPU
- MiniMax M2.7 GGUF Investigation Reveals NaN Issues Affecting 21-38% of Hugging Face Conversions
- Running Gemma 4 on an iPhone 13 Pro
- Sovereign AI: Why the Next GPT Will Be Born in Our Living Rooms
- MiniMax M2.7 Achieves SOTA Performance Under 64GB on Mac with TQ Quantization
- Speculative Decoding Achieves 29% Speed Boost for Gemma-4 31B
- Qwen3 Audio and Vision Support Now Available in llama.cpp
- MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware
- Unsloth Completes Comprehensive MiniMax M2.7 GGUF Quantization Suite
- Universal Knowledge Store and Grounding Layer for AI Reasoning Engines
- MiniMax M2.7 Released: New Model Available for Local Deployment
- The Best Local AI Model for Home Assistant Isn't Always the Biggest One
- Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B
- Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark
- LLM Wiki v2: Extended Knowledge Base for LLM Practitioners
- Running a 1.7B-Parameter LLM on an Apple Watch
- Gemma 4 Support Stabilized in Llama.cpp
- Gemma 4 GGUF Models Updated with Critical Quantization Fixes
- EXAONE 4.5 33B Model Released with Multiple Quantization Formats
- Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool
- TurboQuant-Optimized llama.cpp Fork Delivers GFX906 GPU Acceleration
- Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops
- Context Window Optimization: Extending Gemma 4 Context Length Through Efficient Projection Quantization
- Google AI Edge Gallery Tops App Store Charts with On-Device Gemma 4
- Gemma 4 31B Achieves Exceptional Performance on Local Hardware
- Qwen 3.6 Free Model Available via OpenRouter
- Qualcomm Snapdragon Innovations Enable Advanced On-Device AI for Wearables
- DGX Spark Hardware Limitations: Missing NVFP4 Support Undermines Local AI Value Proposition
- GMKtec NucBox K17 Launches with 97 TOPS AI Performance for Local Inference
- Gemma 4 26B MoE Emerges as Optimal All-Around Local Model for Consumer Hardware
- Nex Life Logger: Local Activity Tracker with AI Agent Integration
- Google Gemma 4 Released with GGUF Quantizations
- Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon
- Gemma 4 2B Successfully Runs on Raspberry Pi 5
- Gemma 4 on Arm: Optimized On-Device AI for Mobile and Edge Deployment
- Qwen 3.6-Plus Released
- Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance
- Satcove – Query 5 AI Models Simultaneously and Get Structured Verdicts
- Llama.cpp Merging TurboQuant Lite (attn-rot) with Major Performance Gains
- PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
- Ollama Launches Pi: The Minimal Coding Agent That Powers OpenClaw Is Now Yours to Customize
- Select the Right Hardware for Your Local LLM Deployment with This Online Guide
- TurboQuant: Understanding the Quantization Breakthrough
- Google's TurboQuant Shows Memory Constraints Remain Critical for Local LLM Inference
- ESP32-S31: 320MHz 2-Core Microcontroller with 512KB SRAM and Networking
- TurboQuant KV Cache Compression Achieves 22.8% Faster Decoding at 32K Context
- Qwen3 512k Context via TurboQuant on Mac mini
- TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice
- RotorQuant: 10-19x Faster Quantization Alternative Using Clifford Algebra
- Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization
- Quantization Reveals Outliers Impacting LLM Accuracy
- Apple Gets Full Gemini Access and Uses Distillation to Build Lightweight On-Device AI
- Intel Launches Arc Pro B70/B65 with 32GB VRAM for Local AI Inference
- Google's TurboQuant: The Unsexy AI Breakthrough Worth Watching
- Apple Plans Slimmed-Down Gemini Models for Local iPhone AI Features
- Google TurboQuant: Extreme Compression for Local LLM Deployment
- Running an Open-Weight LLM Locally on an Apple Watch
- OmniCoder v2 Released: Improved Code Generation for Local Deployment
- Researcher Successfully Runs Local LLMs on Legacy "Dead" GPU With Surprising Results