Tagged "batch-processing"

ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B 22 March 2026
Automating Read-It-Later Workflows with Local LLMs for Overnight Summarization 22 March 2026
Achieving 2000 Tokens Per Second with QWEN 3.5 27B on RTX-5090 14 March 2026
Nemotron 9B Powers Large-Scale Local Inference: Patent Classification and Real-Time Applications 9 March 2026
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference 23 February 2026
Show HN: PgCortex – AI enrichment per Postgres row, zero transaction blocking 17 February 2026