LocalFTW
Tagged "memory-bandwidth-optimization"
Dynamic Expert Cache in llama.cpp Achieves 27% Faster Inference on Large MoE Models
15 April 2026
Gemma 4 on Arm: Optimized On-Device AI for Mobile and Edge Deployment
3 April 2026
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
26 February 2026