Tagged "inference"
- Powerful AI Search Engine Built on Single GeForce RTX 5090
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- Repurpose Old GPUs as Dedicated AI Inference Accelerators
- Llamafile 0.10 Released with GPU Support and Rebuilt Core
- Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
- Mamba 3: State Space Model Architecture Optimized for Inference
- You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM
- Custom GPU Multiplexer Achieves 0.3ms Model Switching on Legacy Hardware
- This External GPU Enclosure Tries to Break Cloud Dependence for Local AI Inference
- Show HN: Buxo.ai – Calendly alternative where LLM decides which slots to show
- I made Karpathy's Autoresearch work on CPU
- Sarvam Open-Sources 30B and 105B Reasoning Models
- Reverse engineering a DOS game with no source code using Codex 5.4
- Windows 11 Notepad to Feature On-Device AI Text Generation Without Subscription
- llama-swap Emerges as Superior Alternative to Ollama and LM Studio
- HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
- AMD Expands Ryzen AI 400 Series Portfolio for Consumer and Enterprise AI PC Options
- Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
- DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
- Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
- Qwen3's Voice Embeddings Enable Local Voice Cloning and Mathematical Voice Manipulation
- Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
- Custom Portable Workstation Optimized for Local AI Inference Builds
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- GPT-OSS 20B Demonstrates Practical Agentic Capabilities Running Fully Locally
- AI-Powered Reverse-Engineering of Rosetta 2 for Linux
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization