Tagged "llama-cpp-optimization"

Developer Switches from LM Studio to llama.cpp, Reports No Performance Downgrade 26 May 2026
llama.cpp Checkpoint Fix Accelerates Local Coding Agents 22 May 2026
Dynamic Expert Cache in llama.cpp Achieves 27% Faster Inference on Large MoE Models 15 April 2026
Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp 12 April 2026
Gemini-CLI, Llama.cpp, and Qwen3.5 Running on NVIDIA Jetson TK1 9 April 2026
TurboQuant-Optimized llama.cpp Fork Delivers GFX906 GPU Acceleration 7 April 2026
SmolLM2-360M Running on Samsung Galaxy Watch 4 with 74% Memory Reduction 2 April 2026
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B 22 March 2026
Practical Fix for Qwen 3.5 Overthinking in llama.cpp 16 March 2026
Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide 8 March 2026
Switching From Ollama And LM Studio To llama.cpp: A Performance Comparison 14 February 2026
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues 14 February 2026
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues 13 February 2026
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues 12 February 2026
Developer Switches from Ollama and LM Studio to llama.cpp for Better Performance 11 February 2026