Tagged "local-inference"
- Open-Source AI Text-to-Speech Models You Can Run Locally for Natural Voice
- Powerful AI Search Engine Built on Single GeForce RTX 5090
- Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models
- AI Playground for Developers Built in Vite and Python
- Self-Hosted AI Code Review with Local LLMs: Secure Automation Guide
- Running an AI Agent on a 448KB RAM Microcontroller
- Pydantic-Deep: Production Deep Agents for Pydantic AI
- Multi-Token Prediction support coming to MLX-LM for Qwen 3.5
- Apple M5 Max 128GB real-world performance benchmarks for local inference
- Cursor's Composer 2 model attribution dispute highlights open-source licensing concerns
- Your Site Content Is Powering AI. Your Bank Account Has No Idea
- Build a $1,500 AI Server with DeepSeek-R1 on RTX 4090
- Atuin v18.13 – Better Search, a PTY Proxy, and AI for Your Shell
- SwarmHawk – Open-Source CLI for Vulnerability Scanning with AI Synthesis
- Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
- Repurpose Old GPUs as Dedicated AI Inference Accelerators
- NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
- Llamafile 0.10 Released with GPU Support and Rebuilt Core
- Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
- MiniMax-M2.7: New Compact Model Announced for Local Deployment
- Mamba 3: State Space Model Architecture Optimized for Inference
- LucidShark – Local-first, open-source quality and security gate
- Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based)
- Browser-Based Transcription Tools
- Run LLMs Locally with Llama.cpp
- Mistral Releases Small 4 Open-Source Model Under Apache 2.0
- KAIST Develops World's First Hyper-Personalized On-Device AI Chip
- Open-Source LLMs Rapidly Displacing Proprietary SOTA Models
- OmniCoder-9B: Efficient Coding Model for 8GB GPUs
- NVIDIA Updates Nemotron 3 122B License, Removes Deployment Restrictions
- Show HN: Merrilin.ai – Code Blocks in Your Books, Finally
- LoKI – Local AI Assistant for Linux and WSL
- This External GPU Enclosure Tries to Break Cloud Dependence for Local AI Inference
- Dictare – Open-source Voice Layer for AI Coding Agents (100% Local)
- Custom AI Smart Speaker
- Startup Transforms Mac Mini Into Full-Powered AI Inference System With External GPU
- India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets
- AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon
- Show HN: Bots of WallStreet – Multi-Agent Debate and Prediction Framework
- Linux 7.0 AMDGPU Fixing Idle Power Issue For RDNA4 GPUs After Compute Workloads
- Nvidia Pushes Jetson as Edge Hub for Open AI Models
- Apple M5 Max 128GB Benchmark Results for Local LLM Inference
- Qwen 3.5-35B Uncensored GGUF Models Now Available
- NVIDIA Jetson Brings Open Models to Life at the Edge
- Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
- HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
- Google Delivers On-Device AI Features in New Chromebook Plus Model
- M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
- Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
- When Running Ollama on Your PC for Local AI, One Thing Matters More Than Most
- Nemotron 9B Powers Large-Scale Local Inference: Patent Classification and Real-Time Applications
- commitgen-cc – Generate Conventional Commit Messages Locally with Ollama
- Qwen 3.5 27B Achieves Strong Local Inference Performance
- Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications
- Show HN: Ivy – the first proactive, offline AI tutor
- Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
- Self-Hosted Paperless-ngx With Optional Local AI Integration
- Show HN: RedDragon – LLM-Assisted IR Analysis of Code Across Languages
- Mojo: Creating a Programming Language for an AI World with Chris Lattner
- Llama.cpp Merges Automatic Parser Generator to Mainline
- Show HN: Asterode – Multi-Model AI App with Memory and Power Features
- Building PyTorch-Native Support for IBM Spyre Accelerator
- HyperExcel Seeks 150 Billion Won Series B to Scale LPU and Verda in Korea
- Show HN: BoardMint – A PCB Review Tool That Avoids AI Hallucinations
- Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
- SynthesisOS – A Local-First, Agentic Desktop Layer Built in Rust
- OpenWrt 25.12.0 – Stable Release
- On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
- AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
- Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
- Intel Arc Pro B70 Workstation GPU Confirmed via vLLM AI Release Notes
- Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing
- Building a Dependency-Free GPT on a Custom OS
- Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
- Qualcomm Launches Snapdragon Wear Elite for On-Device AI on Wearables
- HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
- Change Intent Records: The Missing Artifact in AI-Assisted Development
- C7: Pipe Up-to-Date Library Docs Into Any LLM From the Terminal
- Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
- 4 Free Tools to Run Powerful AI on Your PC Without a Subscription
- Unsloth Dynamic 2.0 GGUFs
- 5 Useful Docker Containers for Agentic Developers
- On-Device Function Calling in Google AI Edge Gallery
- Show HN: Caret – Tab to Complete at Any App on Your Mac
- Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
- Every agent framework has the same bug – prompt decay. Here's a fix
- Building a Privacy-Preserving RAG System in the Browser
- Ollama for JavaScript Developers: Building AI Apps Without API Keys
- LM Studio vs Ollama: Complete Comparison
- DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
- Apple: Python bindings for access to the on-device Apple Intelligence model
- Red Hat Launches AI Enterprise for Hybrid AI Deployments
- Show HN: Pluckr – LLM-Powered HTML Scraper That Caches Selectors and Auto-Heals
- How AI is Redefining Price and Performance in Modern Laptops
- Mirai Tech Raises $10 Million for On-Device AI Innovation
- Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
- Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
- Qwen3 Demonstrates Advanced Voice Cloning via Embeddings
- Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
- Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- Yet Another Fix Coming for Older AMD GPUs on Linux – Thanks to Valve Developer
- Show HN: Horizon – My AI-Powered Personal News Aggregator and Summarizer
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- AI PCs Explained: 7 Critical Truths About NPUs and Privacy
- [Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
- GGML.AI Acquired by Hugging Face
- Apple Researchers Develop On-Device AI Agent That Interacts With Apps for You
- VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
- SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
- I Stopped Paying for ChatGPT and Built a Private AI Setup That Anyone Can Run
- Ollama Production Deployment: Docker-Compose Setup Guide
- NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
- Kitten TTS V0.8 Released: State-of-the-Art Super-Tiny Text-to-Speech Model Under 25MB
- Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
- Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach
- GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
- Can We Leverage AI/LLMs for Self-Learning?
- Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
- Show HN: PgCortex – AI enrichment per Postgres row, zero transaction blocking
- High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
- Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
- GPU-Accelerated DataFrame Library for Local Inference Workloads
- Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
- LLaDA2.1 Introduces Token Editing for Massive Speed Gains in Local Inference
- GPT-OSS 20B Now Runs 100% Locally in Browser via WebGPU
- GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
- First Vibecoded AI Operating System for Local Deployment
- Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
- Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
- The Future of AI Slop Is Constraints – Implications for Local Models
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- Developer Creates Custom Local AI Headshot Generator After Commercial Solutions Fail
- Community Member Builds 144GB VRAM Local LLM Powerhouse