LocalFTW
Why Local
All Posts
Guides
Contribute
Clinic
Topic Graph
Bookmarks
Tagged "inference-pipeline-design"
Prefill Is Compute-Bound, Decode Is Memory-Bound: Optimizing GPU Utilization for LLM Inference
16 April 2026