Tagged "model-quantization"
- Tether AI Upgrades QVAC SDK With TurboQuant for Data Center-Sized Memory on Everyday Devices
- Qualcomm Reveals Snapdragon C with Advanced On-Device AI Engine
- Fine-tuning an LLM to Write Docs Like It's 1995
- Snapdragon C Specs Revealed: 6nm Process, On-Device AI Engine for Budget Laptops
- MediaTek Dimensity 7500 Brings On-Device AI and Enhanced Power Efficiency to Mid-Range Phones
- Privacy-Focused Raspberry Pi Zero 2W DIY Security Camera with On-Device AI and End-to-End Encryption
- Alibaba Cloud Joins PyTorch Foundation as Platinum Member
- Local LLM Setup: How to Use RAG and an Embedding Model to Stop Wasting Context
- Samsung's Exynos 2800 Brings HBM Memory to Mobile AI, Enabling Faster Local Model Inference
- Dell Launches 14 Plus Laptop with Intel Core Ultra 9 and 32GB RAM at $1,499.99, Enabling Local Model Inference
- Anker Soundcore Liberty 5 Pro Earbuds Feature Dedicated On-Device AI Chip with Touch Screen
- Users Report Superior Performance Switching from LM Studio to llama.cpp
- How to Self-Host LibreChat with Docker
- Self-Hosting LLMs Reveals Local AI Has a Friction Problem, Not a Quality Problem
- 110 Tokens/Second on RTX 4070 Super with Qwen 3.6 35B
- Show HN: Interactive and Stylized AI Chat Chrome Extension
- The Brain vs. Deep Learning Part I: Computational Complexity Analysis
- A/B Tested Gemini 3.1 Pro vs. Claude Opus 4.6 – Usage Quota and Quality Comparison
- Benchmarking a Portable AI Workstation: Lenovo ThinkPad P16 Gen 3, Part 2
- Meta Plans Agentic AI on Smartphones and Wearables by 2026
- Running Large Language Models on Single-Board Computer Clusters: Creative Edge Deployment
- Local LLMs Enable Intelligent Smart Camera Control Without Cloud Dependency
- Towards Local Plug-and-Play AI
- Local LLM Takes Control of Video Doorbell—The Future of Smart Cameras
- A Cheap Fix That Saves the AI $400M a Year and Brings 4B People Online
- Offline Voice-to-Text and AI Keyboard App for Local Processing
- Running Local AI LLMs on Mini PCs Without NVIDIA GPUs
- Chrome Automatically Downloads 4GB AI Model for Local Processing
- I Stopped Paying for ChatGPT and Switched to a Local LLM That Runs on My Laptop
- Running a Local LLM on a 12-Year-Old Raspberry Pi
- Mainline Linux 6.12 on Annapurna Labs Alpine V2 (Ubiquiti UNVR, UDM-Pro)
- How I Used a Local LLM to Organize the Store on My NAS
- BT Explainer: Google's Gemma 4 Could Put Powerful AI on Your Phone and Laptop
- Running a Local LLM on a 12-Year-Old Raspberry Pi: Practical Edge Inference
- I Think I Figured Out What an AI IDE Looks Like
- One LM Studio Setting Makes Local LLMs Competitive With Cloud Models
- How to Run LLMs Locally on Your Laptop for Free: A Beginner's Guide
- Chrome's On-Device AI Features Consuming 4GB of Storage for Gemini Nano
- Lemonade Gives AMD Startups a Wider Path to Local Inference
- Local LLM Rewrites Resume Better Than ChatGPT, and It's Not Even Close
- Nota AI Partners with Mobilint to Accelerate On-Device AI on Domestic NPU Infrastructure
- Improving Code Quality with Local Claude and Codex Models
- 5 Things I Wish Someone Had Told Me Before I Tried Self-Hosting a Local LLM
- NIST's CAISI Evaluation of DeepSeek V4 Pro Finds It On Par with GPT-5
- Building a Raspberry Pi-Based Local LLM Server for Remote Access
- New Open-Source Tool Automatically Matches Local LLMs to Your PC Hardware
- Running Capable Local LLMs Without Expensive GPU Hardware
- How Much "Brain Damage" Can an LLM Tolerate?
- Google's Gemma 4 Brings Powerful AI Capabilities to Phones and Laptops
- Estimating Black-Box LLM Parameter Counts via Factual Capacity
- Building a Remote-Accessible Local LLM Server on Raspberry Pi
- Stop Guessing: Open-Source Tool Predicts Which Local LLMs Run on Your PC
- Local AI Isn't Just Ollama—Here's the Ecosystem That Actually Makes It Useful
- Google's Gemma 4 Could Put Powerful AI on Your Phone and Laptop
- Show HN: Phonetic Formatter – Offline English Text to IPA on iPhone and iPad
- Run a Local LLM Server on Raspberry Pi with Remote Access Capabilities
- Google's Gemma 4 Brings Powerful On-Device AI to Phones and Laptops
- Netherlands Reaches Deal to Cut Reliance on U.S. Cloud Tech
- I Replaced My Local LLM With a Model Half Its Size and Got Better Results
- Llama 4 Scout on MLX: The Complete Apple Silicon Guide (2026)
- Externalization in LLM Agents: Unified Review of Memory and Harness Engineering
- 10GB VRAM Local LLM: The Complete Setup Guide (2026)
- The Open-Source AI Ecosystem Keeps Treating llama.cpp Like a Second-Class Citizen
- Minisforum Launches N5 Max AI NAS with OpenClaw
- Laimark – 8B LLM That Self-Improves on Consumer GPUs
- 115 TOPS in 0.67L: CHUWI AuBox X Packs On-Device AI Power Into a Palm-Sized Mini PC
- Building a Voice AI Wearable in a Casio F91W with Whisper and BLE
- Bonsai 1.7B in the Browser: A 290MB 1-bit LLM on WebGPU
- MiniMax M2.7 GGUF Investigation Reveals NaN Issues Affecting 21-38% of Hugging Face Conversions
- Running Gemma 4 on an iPhone 13 Pro
- Sovereign AI: Why the Next GPT Will Be Born in Our Living Rooms
- MiniMax M2.7 Achieves SOTA Performance Under 64GB on Mac with TQ Quantization
- Speculative Decoding Achieves 29% Speed Boost for Gemma-4 31B
- Qwen3 Audio and Vision Support Now Available in llama.cpp
- MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware
- Unsloth Completes Comprehensive MiniMax M2.7 GGUF Quantization Suite
- Universal Knowledge Store and Grounding Layer for AI Reasoning Engines
- MiniMax M2.7 Released: New Model Available for Local Deployment
- The Best Local AI Model for Home Assistant Isn't Always the Biggest One
- Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B
- Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark
- LLM Wiki v2: Extended Knowledge Base for LLM Practitioners
- Running a 1.7B-Parameter LLM on an Apple Watch
- Gemma 4 Support Stabilized in Llama.cpp
- Gemma 4 GGUF Models Updated with Critical Quantization Fixes
- EXAONE 4.5 33B Model Released with Multiple Quantization Formats
- Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool
- TurboQuant-Optimized llama.cpp Fork Delivers GFX906 GPU Acceleration
- Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops
- Context Window Optimization: Extending Gemma 4 Context Length Through Efficient Projection Quantization
- Google AI Edge Gallery Tops App Store Charts with On-Device Gemma 4
- Gemma 4 31B Achieves Exceptional Performance on Local Hardware
- Qwen 3.6 Free Model Available via OpenRouter
- Qualcomm Snapdragon Innovations Enable Advanced On-Device AI for Wearables
- DGX Spark Hardware Limitations: Missing NVFP4 Support Undermines Local AI Value Proposition
- GMKtec NucBox K17 Launches with 97 TOPS AI Performance for Local Inference
- Gemma 4 26B MoE Emerges as Optimal All-Around Local Model for Consumer Hardware
- Nex Life Logger: Local Activity Tracker with AI Agent Integration
- Google Gemma 4 Released with GGUF Quantizations
- Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon
- Gemma 4 2B Successfully Runs on Raspberry Pi 5
- Gemma 4 on Arm: Optimized On-Device AI for Mobile and Edge Deployment
- Qwen 3.6-Plus Released
- Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance
- Satcove – Query 5 AI Models Simultaneously and Get Structured Verdicts
- Llama.cpp Merging TurboQuant Lite (attn-rot) with Major Performance Gains
- PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
- Ollama Launches Pi: The Minimal Coding Agent That Powers OpenClaw Is Now Yours to Customize
- Select the Right Hardware for Your Local LLM Deployment with This Online Guide
- TurboQuant: Understanding the Quantization Breakthrough
- Google's TurboQuant Shows Memory Constraints Remain Critical for Local LLM Inference
- ESP32-S31: 320MHz 2-Core Microcontroller with 512KB SRAM and Networking
- TurboQuant KV Cache Compression Achieves 22.8% Faster Decoding at 32K Context
- Qwen3 512k Context via TurboQuant on Mac mini
- TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice
- RotorQuant: 10-19x Faster Quantisation Alternative Using Clifford Algebra
- Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization
- Quantization Reveals Outliers Impacting LLM Accuracy
- Apple Gets Full Gemini Access and Uses Distillation to Build Lightweight On-Device AI
- Intel Launches Arc Pro B70/B65 with 32GB VRAM for Local AI Inference
- Google's TurboQuant: The Unsexy AI Breakthrough Worth Watching
- Apple Plans Slimmed-Down Gemini Models for Local iPhone AI Features
- Google TurboQuant: Extreme Compression for Local LLM Deployment
- Running an Open-Weight LLM Locally on an Apple Watch
- OmniCoder v2 Released: Improved Code Generation for Local Deployment
- Researcher Successfully Runs Local LLMs on Legacy "Dead" GPU With Surprising Results