Tagged "memory-bandwidth-optimization"

Dynamic Expert Cache in llama.cpp Achieves 27% Faster Inference on Large MoE Models 15 April 2026
Gemma 4 on Arm: Optimized On-Device AI for Mobile and Edge Deployment 3 April 2026
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference 26 February 2026