Tagged "inference-optimization"
- Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
- PyTorch Foundation Announces New Members as Agentic AI Demand Grows
- Show HN: Pluckr – LLM-Powered HTML Scraper That Caches Selectors and Auto-Heals
- Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
- Show HN: 100% LLM Accuracy – No Fine-Tuning, JSON Only
- Which Web Frameworks Are Most Token-Efficient for AI Agents?
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- South Korea to Launch $687 Million Project to Develop On-Device AI Semiconductors
- Custom Portable Workstation Optimized for Local AI Inference Builds
- Show HN: Horizon – My AI-Powered Personal News Aggregator and Summarizer
- DietPi Releases New Version v10.1
- Vellium v0.3.5: Major Writing Mode Overhaul and Native KoboldCpp Support
- Taalas Etches AI Models onto Transistors to Rocket Boost Inference
- I Run Local LLMs in One of the World's Priciest Energy Markets, and I Can Barely Tell
- [Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
- GGML.AI Acquired by Hugging Face
- Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
- Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
- AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
- Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
- Cohere Releases Tiny Aya: Efficient 3.3B Multilingual Model for 70+ Languages
- GPU-Accelerated DataFrame Library for Local Inference Workloads
- First Vibecoded AI Operating System for Local Deployment
- ByteDance Releases Seedance 2.0 AI Development Platform
- Using Recursive Language Models to Handle Huge Contexts for Local LLMs
- Mistral AI Debugs Critical Memory Leak in vLLM Inference Engine
- Carmack Proposes Using Long Fiber Lines as L2 Cache for Streaming AI Data