Tagged "production-deployment"

Rewriting CRIU in Zig using LLM 30 May 2026
Superpowers: An Agentic Skills Framework for AI Coding Workflows 28 May 2026
Samsung's Exynos 2800 Brings HBM Memory to Mobile AI, Enabling Faster Local Model Inference 26 May 2026
Self-Hosting LLMs Reveals Local AI Has a Friction Problem, Not a Quality Problem 23 May 2026
Nvidia Raises Video Encoder Limit to 12 on Consumer GPUs 21 May 2026
Local LLM with Claude Fallback: Hybrid Architecture for Reliable Local-First Setup 21 May 2026
SynapseKit: A New Production Framework for Deploying LLMs 16 May 2026
Kog AI – Building a Real-Time Inference Stack on AMD Instinct GPUs 15 May 2026
Local LLM Persistent Context Prevents Repetitive Mistakes 14 May 2026
Privatemode.ai – AI Provider with Confidential Computing 12 May 2026
Ollama Out-of-Bounds Read Vulnerability Allows Remote Process Memory Leak 11 May 2026
How to make SSE token streams resumable, cancellable, and multi-device 7 May 2026
Building a Local LLM News Brief Taught Me the Real Problem Wasn't the Sources, It Was the Apps 7 May 2026
Microsoft VibeVoice C++ Port Enables Local Voice AI on CPU and GPU Without Python 6 May 2026
Anker's Thus Chip Puts AI On-Device, Promising Faster Responses And Better Privacy 4 May 2026
The Tooling Problem in Local AI Is Finally Getting Solved and That Matters as Much as the Models 3 May 2026
AMD Posts HDMI 2.1 FRL Patches for Amdgpu Linux Driver 2 May 2026
How to Make SSE Token Streams Resumable, Cancellable, and Multi-Device 1 May 2026
Single-Command Setup Tool Automates Claude AI Workstation Configuration 1 May 2026
Self-Hosted LLMs in Production: Real-World Limits and Practical Lessons 30 April 2026
Private LLM vs. ChatGPT: When It Makes Sense for Business 30 April 2026
Building a Local AI Stack: Five Docker Containers to Replace ChatGPT Subscriptions 28 April 2026
Pocket LLM v1.5.0 Brings Multimodal AI to Android with No Cloud Required 27 April 2026
Singapore's Foreign Minister Builds an AI "Second Brain" Using NanoClaw 26 April 2026
Build Your Own Local AI Stack with 5 Docker Containers and Eliminate ChatGPT Subscriptions 25 April 2026
I Built a Local AI Stack With 5 Docker Containers, and Now I'll Never Pay for ChatGPT Again 24 April 2026
Show HN: We built an OCR server that can process 270 dense images/s on a 5090 23 April 2026
ZeusHammer: Built an AI Agent That Thinks Locally 20 April 2026
PCMind: Local AI Analysis of Docs, Audio, Video and Images 19 April 2026
We Built a Local Model Arena in 30 Minutes — Infrastructure Mattered More Than the App 18 April 2026
I Built a Local AI Stack with 5 Docker Containers, and Now I'll Never Pay for ChatGPT Again 18 April 2026
Researcher Discovers 221 Bugs in vLLM Stemming From Single Root Cause 16 April 2026
Building Practical Local Coding Assistants: A Working Stack for Editor Integration 15 April 2026
Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp 12 April 2026
I Gave My AI Shell Access and Felt Uneasy – So I Sandboxed It 12 April 2026
Parakeet Streaming ASR on Apple Silicon via CoreML 11 April 2026
Ollama's Limitations for Production Local LLM Deployments 10 April 2026
Ollama is Still the Easiest Way to Start Local LLMs, But It's the Worst Way to Keep Running Them 9 April 2026
NVIDIA Accelerates Gemma 4 for Local Agentic AI on RTX GPUs 3 April 2026
Ask HN: What do you use for local embeddings? 31 March 2026
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config 27 March 2026
See What Your AI Agents Are Doing: Multi-Agent Observability Tool 27 March 2026
Nota AI and SiMa.ai Partner on Physical AI Technology for Local Deployment 26 March 2026
Show HN: Open Agent Spec – Treat AI Agents Like Typed Functions, Not Prompt Chains 25 March 2026
I built Rubric, an open source Sentry for AI. Looking for beta testers 24 March 2026
Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration 23 March 2026
LM Studio Releases Reworked Plugins with Fully Local Web Research 23 March 2026
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide 23 March 2026
Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models 22 March 2026
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options 20 March 2026
Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet 19 March 2026
LucidShark – Local-first, open-source quality and security gate 18 March 2026
Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based) 18 March 2026
Open-Source LLMs Rapidly Displacing Proprietary SOTA Models 16 March 2026
NVIDIA Updates Nemotron 3 122B License, Removes Deployment Restrictions 16 March 2026
Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment 15 March 2026
Nvidia Pushes Jetson as Edge Hub for Open AI Models 12 March 2026
MeepaChat – Slack for AI Agents (iOS, macOS, Web / Cloud, Self-Hosted) 12 March 2026
Show HN: Detect When an LLM Silently Changes Behavior for the Same Prompt 12 March 2026
Ex-Manus Backend Lead Shares: Moving Beyond Function Calling in Agent Design 12 March 2026
Qwen 3.5-35B Uncensored GGUF Models Now Available 11 March 2026
NVIDIA Jetson Brings Open Models to Life at the Edge 11 March 2026
Gyro-Claw – Secure Execution Runtime for AI Agents 9 March 2026
OpenSpec: Spec-driven development (SDD) for AI coding assistants 8 March 2026
Continuum – CI Drift Guard for LLM Workflows 3 March 2026
AgentLens – Open-Source Observability for AI Agents 1 March 2026
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels 28 February 2026
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot 28 February 2026
Show HN: MCP Server for AI Compliance Documentation 27 February 2026
The Complete Developer's Guide to Running LLMs Locally: From Ollama to Production 26 February 2026
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search 24 February 2026
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers 24 February 2026
South Korea to Launch $687 Million Project to Develop On-Device AI Semiconductors 23 February 2026
Qwen3-Coder-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio 23 February 2026
Open-Source llama.cpp Finds Long-Term Home at Hugging Face 23 February 2026
The Complete Stack for Local Autonomous Agents: From GGML to Orchestration 23 February 2026
Ollama 0.17 Released With Improved OpenClaw Onboarding 22 February 2026
24 Simultaneous Claude Code Agents on Local Hardware 21 February 2026
Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System 20 February 2026
Ollama Production Deployment: Docker-Compose Setup Guide 20 February 2026
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support 20 February 2026
Show HN: Forked – A Local Time-Travel Debugger for OpenClaw Agents 20 February 2026
Self-Hosted AI: A Complete Roadmap for Beginners 17 February 2026
I broke into my own AI system in 10 minutes. I built it 17 February 2026
Researchers Find 175,000 Publicly Exposed Ollama AI Servers Across 130 Countries 12 February 2026