Tagged "cpu-only"
- Intel OpenVINO 2026.1 Integrates llama.cpp with Wildcat Lake and Arc Pro B70
- The Open-Source AI Ecosystem Keeps Treating llama.cpp Like a Second-Class Citizen
- Sorting 1M u64 KV-Pairs in 20ms on an i9-13980HX with a Branchless Rust Implementation
- Dynamic Expert Cache in llama.cpp Achieves 27% Faster Inference on Large MoE Models
- Qwen 3.5 Small – On-Device Multimodal Models Released
- A Deep Dive into the Tinygrad AI Compiler
- The Best Local AI Model for Home Assistant Isn't Always the Biggest One
- Building Offline AI Companions on Severely Constrained Hardware (8GB RAM)
- 5 Open-Source Projects Running Transformers from CPUs to GPUs in Pure Java
- Speculative Decoding Made My Local LLM Actually Usable
- Run Qwen3.5 on an Old Laptop: A Lightweight Local Agentic AI Setup Guide
- Intel Releases OpenVINO 2026.1 With Backend For Llama.cpp, New Hardware Support
- Your Next Assistant is Your PC: How On-Device AI is Transforming Work, One Workflow at a Time
- Octopoda: Open Source Memory Layer for Fully Offline AI Agents
- AMD Announces Day 0 Support for Google Gemma 4 Across Processors and GPUs
- TurboQuant in Llama.cpp Achieves 6X Smaller KV Cache
- Kokoro TTS Achieves 20× Realtime Speed on CPU-Only On-Device Inference
- Gemma 4 KV Cache Memory Issues Fixed in llama.cpp
- AMD Rolls Out Gemma 4 Model Support Across Full Range of GPUs & CPUs
- OpenUMA – Apple-Style Unified Memory for x86 AI Inference
- Show HN: Extra-Platforms, Python Library to Detect OS, Arch, Shell, CI, AI
- Local AI Ecosystem Extends Far Beyond Ollama
- Claw64 – Full Agentic Loop in <4KB on Commodore 64
- PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
- Select the Right Hardware for Your Local LLM Deployment with This Online Guide
- DeepSeek V3 Complete Guide: Deploy and Optimize Local AI in 2026
- TurboQuant KV Cache Compression Achieves 22.8% Faster Decoding at 32K Context
- Samsung Galaxy Book6 Series Brings Intel Core Ultra Chips for On-Device LLM Inference
- TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice
- Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization
- HP Launches IQ On-Device AI Assistant, Advancing Enterprise AI Adoption on PCs
- .APKs Are Just .ZIPs: Semi-Legally Hacking Software for Orphaned Hardware