Tagged "llama-cpp"
- I built Rubric, an open source Sentry for AI. Looking for beta testers
- LM Studio Releases Reworked Plugins with Fully Local Web Research
- Llama.cpp ROCm 7 vs Vulkan Performance Benchmarks on AMD Mi50
- Rust Project Perspectives on AI
- Setting Up a Private AI Brain on Windows: Complete Guide to Local LLM Deployment
- ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
- Careless Whisper – Personal Local Speech to Text
- Automating Read-It-Later Workflows with Local LLMs for Overnight Summarization
- Qualcomm and Samsung's 30-Year AI Alliance Enters a New Phase as On-Device AI Chip Race Heats Up
- What AI Augmentation Means for Technical Leaders
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- LMCache Dramatically Accelerates LLM Inference on Oracle Data Science Platform
- Kilo Is the VS Code Extension That Actually Works With Every Local LLM I Throw At It
- Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
- On-Device AI: Tether's QVAC Fabric Enables Local Training
- I Switched to a Local LLM for These 5 Tasks and the Cloud Version Hasn't Been Worth It Since
- LucidShark – Local-first, open-source quality and security gate
- You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM
- Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
- Run LLMs Locally with Llama.cpp
- I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
- Mistral Releases Small 4 Open-Source Model Under Apache 2.0
- How I Used Lima for an AI Coding Agent Sandbox
- Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
- Practical Fix for Qwen 3.5 Overthinking in llama.cpp
- This External GPU Enclosure Tries to Break Cloud Dependence for Local AI Inference
- Apple's On-Device AI Raises Privacy Alarms Across British Parliament
- AMD Declares 'AI on the PC Has Crossed an Important Line' – Agent Computers as Next Breakthrough
- OpenClaw vs Eigent vs Claude Cowork: Comparing Open-Source AI Collaboration Platforms
- Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
- AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon
- Intel OpenVINO Backend Support Now Available in llama.cpp
- Memory Should Decay: Implementing Temporal Memory Decay in Local LLM Systems
- How to Run Local LLMs in 2026: The Complete Developer's Guide
- AgentArmor: Open-Source 8-Layer Security Framework for AI Agents
- 3-Path Agent Memory: 8 KB Recurrent State vs. 156 MB KV Cache at 10K Tokens
- Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
- Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
- Llama.cpp Adds True Reasoning Budget Support
- Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia
- SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference
- Sarvam Open-Sources 30B and 105B Reasoning Models
- NVIDIA Jetson Brings Open Models to Life at the Edge
- LMF – LLM Markup Format
- Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
- Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming
- Mnemos: Persistent Memory System for Local AI Agents
- FreeBSD 14.4 Released: Implications for Local LLM Deployment
- M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
- Community Survey: AI Content Automation Stacks in 2026
- Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
- Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide
- HP Refreshes Lineup with AI-Focused Workstations
- Llama.cpp Merges Automatic Parser Generator to Mainline
- Turning Your Linux Terminal into a Local AI Assistant
- llama.cpp Merges Agentic Loop and MCP Client Support
- Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
- OpenWrt 25.12.0 – Stable Release
- AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
- ÆTHERYA Core – Deterministic Policy Engine for Governing LLM Actions
- Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
- Qwen 3.5 0.8B Successfully Deployed on 7-Year-Old Samsung S10E Using llama.cpp
- Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing
- Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
- GitDelivr: A Free CDN for Git Clones Built on Cloudflare Workers and R2
- C7: Pipe Up-to-Date Library Docs Into Any LLM From the Terminal
- Huawei's SuperPoD Portfolio Creates New Option for Global Computing at MWC Barcelona 2026
- Unsloth Dynamic 2.0 GGUFs
- 5 Useful Docker Containers for Agentic Developers
- Seco Launches Edge AI System-on-Module at Embedded World 2026
- Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
- DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
- Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
- Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
- How AI is Redefining Price and Performance in Modern Laptops
- Show HN: A Ground Up TLS 1.3 Client Written in C
- Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
- Apple Accelerates U.S. Manufacturing with Mac Mini Production
- nanollama: Open-Source Framework for Training Llama 3 from Scratch with One-Command GGUF Export
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
- Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
- I Thought I Needed a GPU to Run AI Until I Learned About These Models
- Open-Source + AI: ggml Joins Hugging Face, llama.cpp Stays Open—Local AI's Long-Term Home
- PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
- Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Self-Hosted AI: A Complete Roadmap for Beginners
- Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
- Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
- Switching From Ollama And LM Studio To llama.cpp: A Performance Comparison
- SnowBall Technique Addresses Context Window Limitations in Local LLMs
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
- MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
- GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
- Context Management Identified as Real Bottleneck in AI-Assisted Coding
- Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
- GitHub Announces Support for Open Source AI Project Maintainers
- New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams