Tagged "cpu-only"
- Intel OpenVINO 2026.1 Integrates llama.cpp with Wildcat Lake and Arc Pro B70
- The Open-Source AI Ecosystem Keeps Treating llama.cpp Like a Second-Class Citizen
- Sorting 1M u64 KV-Pairs in 20ms on an i9-13980HX with a Branchless Rust Implementation
- Dynamic Expert Cache in llama.cpp Achieves 27% Faster Inference on Large MoE Models
- Qwen 3.5 Small – On-Device Multimodal Models Released
- A Deep Dive into the Tinygrad AI Compiler
- The Best Local AI Model for Home Assistant Isn't Always the Biggest One
- Building Offline AI Companions on Severely Constrained Hardware (8GB RAM)
- 5 Open-Source Projects Running Transformers from CPUs to GPUs in Pure Java
- Speculative Decoding Made My Local LLM Actually Usable
- Run Qwen3.5 on an Old Laptop: A Lightweight Local Agentic AI Setup Guide
- Intel Releases OpenVINO 2026.1 With Backend For Llama.cpp, New Hardware Support
- Your Next Assistant is Your PC: How On-Device AI is Transforming Work, One Workflow at a Time
- Octopoda: Open Source Memory Layer for Fully Offline AI Agents
- AMD Announces Day 0 Support for Google Gemma 4 Across Processors and GPUs
- TurboQuant in Llama.cpp Achieves 6X Smaller KV Cache
- Kokoro TTS Achieves 20× Realtime Speed on CPU-Only On-Device Inference
- Gemma 4 KV Cache Memory Issues Fixed in llama.cpp
- AMD Rolls Out Gemma 4 Model Support Across Full Range of GPUs & CPUs
- OpenUMA – Apple-Style Unified Memory for x86 AI Inference
- Show HN: Extra-Platforms, Python Library to Detect OS, Arch, Shell, CI, AI
- Local AI Ecosystem Extends Far Beyond Ollama
- Claw64 – Full Agentic Loop in <4KB on Commodore 64
- PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
- Select the Right Hardware for Your Local LLM Deployment with This Online Guide
- DeepSeek V3 Complete Guide: Deploy and Optimize Local AI in 2026
- TurboQuant KV Cache Compression Achieves 22.8% Faster Decoding at 32K Context
- Samsung Galaxy Book6 Series Brings Intel Core Ultra Chips for On-Device LLM Inference
- TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice
- Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization
- HP Launches IQ On-Device AI Assistant, Advancing Enterprise AI Adoption on PCs
- .APKs Are Just .ZIPs: Semi-Legally Hacking Software for Orphaned Hardware