Tagged "production-deployment"
-
Building a Local AI Stack: Five Docker Containers to Replace ChatGPT Subscriptions
-
Pocket LLM v1.5.0 Brings Multimodal AI to Android with No Cloud Required
-
Singapore's Foreign Minister Builds an AI "Second Brain" Using NanoClaw
-
Build Your Own Local AI Stack with 5 Docker Containers and Eliminate ChatGPT Subscriptions
-
I Built a Local AI Stack With 5 Docker Containers, and Now I'll Never Pay for ChatGPT Again
-
Show HN: We built an OCR server that can process 270 dense images/s on a 5090
-
ZeusHammer: Built an AI Agent That Thinks Locally
-
PCMind: Local AI Analysis of Docs, Audio, Video and Images
-
We Built a Local Model Arena in 30 Minutes — Infrastructure Mattered More Than the App
-
Researcher Discovers 221 Bugs in vLLM Stemming From Single Root Cause
-
Building Practical Local Coding Assistants: A Working Stack for Editor Integration
-
Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp
-
I Gave My AI Shell Access and Felt Uneasy – So I Sandboxed It
-
Parakeet Streaming ASR on Apple Silicon via CoreML
-
Ollama's Limitations for Production Local LLM Deployments
-
Ollama is Still the Easiest Way to Start Local LLMs, But It's the Worst Way to Keep Running Them
-
NVIDIA Accelerates Gemma 4 for Local Agentic AI on RTX GPUs
-
Ask HN: What do you use for local embeddings?
-
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config
-
See What Your AI Agents Are Doing: Multi-Agent Observability Tool
-
Nota AI and SiMa.ai Partner on Physical AI Technology for Local Deployment
-
Show HN: Open Agent Spec – Treat AI Agents Like Typed Functions, Not Prompt Chains
-
I built Rubric, an open source Sentry for AI. Looking for beta testers
-
Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration
-
LM Studio Releases Reworked Plugins with Fully Local Web Research
-
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide
-
Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
-
LucidShark – Local-first, open-source quality and security gate
-
Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based)
-
Open-Source LLMs Rapidly Displacing Proprietary SOTA Models
-
NVIDIA Updates Nemotron 3 122B License, Removes Deployment Restrictions
-
Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment
-
Nvidia Pushes Jetson as Edge Hub for Open AI Models
-
MeepaChat – Slack for AI Agents (iOS, macOS, Web / Cloud, Self-Hosted)
-
Show HN: Detect When an LLM Silently Changes Behavior for the Same Prompt
-
Ex-Manus Backend Lead Shares: Moving Beyond Function Calling in Agent Design
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
NVIDIA Jetson Brings Open Models to Life at the Edge
-
Gyro-Claw – Secure Execution Runtime for AI Agents
-
OpenSpec: Spec-driven development (SDD) for AI coding assistants
-
Continuum – CI Drift Guard for LLM Workflows
-
AgentLens – Open-Source Observability for AI Agents
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Show HN: MCP Server for AI Compliance Documentation
-
The Complete Developer's Guide to Running LLMs Locally: From Ollama to Production
-
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
-
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
-
South Korea to Launch $687 Million Project to Develop On-Device AI Semiconductors
-
Qwen3-Coder-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
-
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
-
The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
-
Ollama 0.17 Released With Improved OpenClaw Onboarding
-
24 Simultaneous Claude Code Agents on Local Hardware
-
Qwen3 Coder Next 8FP Demonstrates Exceptional Long-Context Performance on 128GB System
-
Ollama Production Deployment: Docker-Compose Setup Guide
-
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
-
Show HN: Forked – A Local Time-Travel Debugger for OpenClaw Agents
-
Self-Hosted AI: A Complete Roadmap for Beginners
-
I broke into my own AI system in 10 minutes. I built it
-
Researchers Find 175,000 Publicly Exposed Ollama AI Servers Across 130 Countries