Tagged "inference"

A Cinematic Landing-Page Hero for 80 Cents (GPT Image 2 and Veo 3.1) 2 June 2026
JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks 2 June 2026
Netflix Wiz Creates App to Slash AI Bills, Then Open Sources It 1 June 2026
Netflix Wiz Creates App to Slash AI Bills by Pruning Agent Instructions, Then Open-Sources It 31 May 2026
Mistral AI Launches Mistral Vibe 28 May 2026
The Anatomy of an LLM 28 May 2026
AI Token Streaming Isn't About SSE vs. WebSockets 21 May 2026
Samsung's Exynos 2800 Could Be the First Mobile Chip to Use HBM for Powerful On-Device AI 19 May 2026
Local LLMs Offer Unique Advantages That Cloud AI Services Cannot Match 18 May 2026
Linux 7.1-rc4 Released: Kernel Updates Relevant to Local LLM Inference 18 May 2026
The Time Bomb Went Off: AI's All-You-Can-Eat Era Just Ended in Real Time 18 May 2026
AI/ML Benchmark Tool for Local LLM Inference and XGBoost Training 16 May 2026
RelaxAI – UK sovereign LLM inference at 80% cheaper than OpenAI/Claude 15 May 2026
Kog AI – Building a Real-Time Inference Stack on AMD Instinct GPUs 15 May 2026
Local LLM Persistent Context Prevents Repetitive Mistakes 14 May 2026
One LM Studio Setting Change Makes Local LLMs Competitive With Cloud Models 11 May 2026
DFlash Speculative Decoding Delivers 8.5x Speed Improvement for LLM Inference 11 May 2026
Dikaletus: Open-Source Meeting Recording and Transcription Using Mistral AI 9 May 2026
How to make SSE token streams resumable, cancellable, and multi-device 7 May 2026
How to Make SSE Token Streams Resumable, Cancellable, and Multi-Device 1 May 2026
Why the Same LLM Gives Different Answers in Different Environments 28 April 2026
What Type of AI Usage? Deployment Patterns and Implementation Considerations 28 April 2026
Blueprint: AI Hardware Design 26 April 2026
Using a Local LLM as a Zero-Shot Classifier 24 April 2026
I Cancelled Codex Two Months Ago. Opus 4.7 Brought Me Back 23 April 2026
Llama.cpp's Auto Fit Feature Quietly Reshapes Local AI Inference on Consumer Hardware 22 April 2026
Google's Gemma 4 Finally Makes Local LLM Deployment Compelling for Practitioners 22 April 2026
go-AI: New Inference API Library for Go Released 22 April 2026
Running DeepSeek R1 Locally: Your Complete Setup Guide 20 April 2026
Gemma 4 Just Replaced My Whole Local LLM Stack 19 April 2026
oMLX Framework Implements DFlash Attention for Optimized Inference 14 April 2026
VoxCPM2: New Open-Source TTS Model with Voice Cloning and Design 9 April 2026
GitHub Copilot CLI Adds Support for BYOK and Local Model Deployment 8 April 2026
Quansloth Using Google's Turboquant Breaks the VRAM Wall for Local LLMs 7 April 2026
TurboQuant in Llama.cpp Achieves 6X Smaller KV Cache 6 April 2026
Show HN: Lightweight LLM Tracing Tool with CLI 6 April 2026
GPU Memory for LLM Inference (Part 1) 6 April 2026
Gemma 4 Shows Strong Reasoning Performance with Thinking Tokens 3 April 2026
AMD Provides Day 0 Support for Gemma 4 on Ryzen AI Processors and GPUs 3 April 2026
Satcove – Query 5 AI Models Simultaneously and Get Structured Verdicts 1 April 2026
DeepSeek V3 Complete Guide: Deploy and Optimize Local AI in 2026 30 March 2026
DeepSeek-R1 Chain-of-Thought Debugging: A Developer's Guide 30 March 2026
Intel Launches Arc Pro B70/B65 with 32GB VRAM for Local AI Inference 26 March 2026
Powerful AI Search Engine Built on Single GeForce RTX 5090 23 March 2026
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models 20 March 2026
Repurpose Old GPUs as Dedicated AI Inference Accelerators 20 March 2026
Llamafile 0.10 Released with GPU Support and Rebuilt Core 20 March 2026
Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally 18 March 2026
Mamba 3: State Space Model Architecture Optimized for Inference 18 March 2026
You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM 18 March 2026
Custom GPU Multiplexer Achieves 0.3ms Model Switching on Legacy Hardware 18 March 2026
This External GPU Enclosure Tries to Break Cloud Dependence for Local AI Inference 16 March 2026
Show HN: Buxo.ai – Calendly alternative where LLM decides which slots to show 15 March 2026
I made Karpathy's Autoresearch work on CPU 15 March 2026
Sarvam Open-Sources 30B and 105B Reasoning Models 9 March 2026
Reverse engineering a DOS game with no source code using Codex 5.4 8 March 2026
Windows 11 Notepad to Feature On-Device AI Text Generation Without Subscription 6 March 2026
llama-swap Emerges as Superior Alternative to Ollama and LM-Studio 6 March 2026
HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals 2 March 2026
AMD Expands Ryzen AI 400 Series Portfolio for Consumer and Enterprise AI PC Options 2 March 2026
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot 28 February 2026
DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference 26 February 2026
Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization 25 February 2026
Qwen3's Voice Embeddings Enable Local Voice Cloning and Mathematical Voice Manipulation 23 February 2026
Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio 23 February 2026
Custom Portable Workstation Optimized for Local AI Inference Builds 23 February 2026
Open-Source llama.cpp Finds Long-Term Home at Hugging Face 23 February 2026
GPT-OSS 20B Demonstrates Practical Agentic Capabilities Running Fully Locally 23 February 2026
AI-Powered Reverse-Engineering of Rosetta 2 for Linux 23 February 2026
Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization 22 February 2026