Tagged "local-inference"
- Open-Source AI Text-to-Speech Models You Can Run Locally for Natural Voice
- Powerful AI Search Engine Built on Single GeForce RTX 5090
- Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models
- AI Playground for Developers Built in Vite and Python
- Self-Hosted AI Code Review with Local LLMs: Secure Automation Guide
- Running an AI Agent on a 448KB RAM Microcontroller
- Pydantic-Deep: Production Deep Agents for Pydantic AI
- Multi-Token Prediction support coming to MLX-LM for Qwen 3.5
- Apple M5 Max 128GB real-world performance benchmarks for local inference
- Cursor's Composer 2 model attribution dispute highlights open-source licensing concerns
- Your Site Content Is Powering AI. Your Bank Account Has No Idea
- Build a $1,500 AI Server with DeepSeek-R1 on RTX 4090
- Atuin v18.13 – Better Search, a PTY Proxy, and AI for Your Shell
- SwarmHawk – Open-Source CLI for Vulnerability Scanning with AI Synthesis
- Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
- Repurpose Old GPUs as Dedicated AI Inference Accelerators
- NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
- Llamafile 0.10 Released with GPU Support and Rebuilt Core
- Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
- MiniMax-M2.7: New Compact Model Announced for Local Deployment
- Mamba 3: State Space Model Architecture Optimized for Inference
- LucidShark – Local-first, open-source quality and security gate
- Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based)
- Browser-Based Transcription Tools
- Run LLMs Locally with Llama.cpp
- Mistral Releases Small 4 Open-Source Model Under Apache 2.0
- KAIST Develops World's First Hyper-Personalized On-Device AI Chip
- Open-Source LLMs Rapidly Displacing Proprietary SOTA Models
- OmniCoder-9B: Efficient Coding Model for 8GB GPUs
- NVIDIA Updates Nemotron 3 122B License, Removes Deployment Restrictions
- Show HN: Merrilin.ai – Code Blocks in Your Books, Finally
- LoKI – Local AI Assistant for Linux and WSL
- This External GPU Enclosure Tries to Break Cloud Dependence for Local AI Inference
- Dictare – Open-source Voice Layer for AI Coding Agents (100% Local)
- Custom AI Smart Speaker
- Startup Transforms Mac Mini Into Full-Powered AI Inference System With External GPU
- India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets
- AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon
- Show HN: Bots of WallStreet – Multi-Agent Debate and Prediction Framework
- Linux 7.0 AMDGPU Fixing Idle Power Issue For RDNA4 GPUs After Compute Workloads
- Nvidia Pushes Jetson as Edge Hub for Open AI Models
- Apple M5 Max 128GB Benchmark Results for Local LLM Inference
- Qwen 3.5-35B Uncensored GGUF Models Now Available
- NVIDIA Jetson Brings Open Models to Life at the Edge
- Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
- HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
- Google Delivers On-Device AI Features in New Chromebook Plus Model
- M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
- Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
- When Running Ollama on Your PC for Local AI, One Thing Matters More Than Most
- Nemotron 9B Powers Large-Scale Local Inference: Patent Classification and Real-Time Applications
- commitgen-cc – Generate Conventional Commit Messages Locally with Ollama
- Qwen 3.5 27B Achieves Strong Local Inference Performance
- Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications
- Show HN: Ivy – the first proactive, offline AI tutor
- Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
- Self-Hosted Paperless-ngx With Optional Local AI Integration
- Show HN: RedDragon – LLM-Assisted IR Analysis of Code Across Languages
- Mojo: Creating a Programming Language for an AI World with Chris Lattner
- Llama.cpp Merges Automatic Parser Generator to Mainline
- Show HN: Asterode – Multi-Model AI App with Memory and Power Features
- Building PyTorch-Native Support for IBM Spyre Accelerator
- HyperExcel Seeks 150 Billion Won Series B to Scale LPU and Verda in Korea
- Show HN: BoardMint – A PCB Review Tool That Avoids AI Hallucinations
- Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
- SynthesisOS – A Local-First, Agentic Desktop Layer Built in Rust
- OpenWrt 25.12.0 – Stable Release
- On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
- AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
- Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
- Intel Arc Pro B70 Workstation GPU Confirmed via vLLM AI Release Notes
- Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing
- Building a Dependency-Free GPT on a Custom OS
- Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
- Qualcomm Launches Snapdragon Wear Elite for On-Device AI on Wearables
- HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
- Change Intent Records: The Missing Artifact in AI-Assisted Development
- C7: Pipe Up-to-Date Library Docs Into Any LLM From the Terminal
- Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
- 4 Free Tools to Run Powerful AI on Your PC Without a Subscription
- Unsloth Dynamic 2.0 GGUFs
- 5 Useful Docker Containers for Agentic Developers
- On-Device Function Calling in Google AI Edge Gallery
- Show HN: Caret – Tab to Complete at Any App on Your Mac
- Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
- Every agent framework has the same bug – prompt decay. Here's a fix
- Building a Privacy-Preserving RAG System in the Browser
- Ollama for JavaScript Developers: Building AI Apps Without API Keys
- LM Studio vs Ollama: Complete Comparison
- DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
- Apple: Python bindings for access to the on-device Apple Intelligence model
- Red Hat Launches AI Enterprise for Hybrid AI Deployments
- Show HN: Pluckr – LLM-Powered HTML Scraper That Caches Selectors and Auto-Heals
- How AI is Redefining Price and Performance in Modern Laptops
- Mirai Tech Raises $10 Million for On-Device AI Innovation
- Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
- Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
- Qwen3 Demonstrates Advanced Voice Cloning via Embeddings
- Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
- Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- Yet Another Fix Coming for Older AMD GPUs on Linux – Thanks to Valve Developer
- Show HN: Horizon – My AI-Powered Personal News Aggregator and Summarizer
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- AI PCs Explained: 7 Critical Truths About NPUs and Privacy
- [Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
- GGML.AI Acquired by Hugging Face
- Apple Researchers Develop On-Device AI Agent That Interacts With Apps for You
- VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
- SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
- I Stopped Paying for ChatGPT and Built a Private AI Setup That Anyone Can Run
- Ollama Production Deployment: Docker-Compose Setup Guide
- NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
- Kitten TTS V0.8 Released: State-of-the-Art Super-Tiny Text-to-Speech Model Under 25MB
- Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
- Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach
- GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
- Can We Leverage AI/LLMs for Self-Learning?
- Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
- Show HN: PgCortex – AI enrichment per Postgres row, zero transaction blocking
- High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
- Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
- GPU-Accelerated DataFrame Library for Local Inference Workloads
- Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
- LLaDA2.1 Introduces Token Editing for Massive Speed Gains in Local Inference
- GPT-OSS 20B Now Runs 100% Locally in Browser via WebGPU
- GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
- First Vibecoded AI Operating System for Local Deployment
- Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
- Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
- The Future of AI Slop Is Constraints – Implications for Local Models
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- Developer Creates Custom Local AI Headshot Generator After Commercial Solutions Fail
- Community Member Builds 144GB VRAM Local LLM Powerhouse