Tagged "inference"
- Powerful AI Search Engine Built on Single GeForce RTX 5090
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- Repurpose Old GPUs as Dedicated AI Inference Accelerators
- Llamafile 0.10 Released with GPU Support and Rebuilt Core
- Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
- Mamba 3: State Space Model Architecture Optimized for Inference
- You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM
- Custom GPU Multiplexer Achieves 0.3ms Model Switching on Legacy Hardware
- This External GPU Enclosure Tries to Break Cloud Dependence for Local AI Inference
- Show HN: Buxo.ai – Calendly alternative where LLM decides which slots to show
- I made Karpathy's Autoresearch work on CPU
- Sarvam Open-Sources 30B and 105B Reasoning Models
- Reverse engineering a DOS game with no source code using Codex 5.4
- Windows 11 Notepad to Feature On-Device AI Text Generation Without Subscription
- llama-swap Emerges as Superior Alternative to Ollama and LM Studio
- HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
- AMD Expands Ryzen AI 400 Series Portfolio for Consumer and Enterprise AI PC Options
- Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
- DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
- Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
- Qwen3's Voice Embeddings Enable Local Voice Cloning and Mathematical Voice Manipulation
- Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
- Custom Portable Workstation Optimized for Local AI Inference Builds
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- GPT-OSS 20B Demonstrates Practical Agentic Capabilities Running Fully Locally
- AI-Powered Reverse-Engineering of Rosetta 2 for Linux
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization