Tagged "throughput-optimization"

Prefill Is Compute-Bound, Decode Is Memory-Bound: Optimizing GPU Utilization for LLM Inference 16 April 2026
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config 27 March 2026
P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM 14 March 2026
DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference 26 February 2026