Tagged "production-ops"
-
TemplateFlow – Build AI Workflows, Not Prompts
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System
-
I Stopped Paying for ChatGPT and Built a Private AI Setup That Anyone Can Run
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Ollama Production Deployment: Docker-Compose Setup Guide
-
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
Show HN: Forked – A Local Time-Travel Debugger for OpenClaw Agents
-
Self-Hosted Local LLMs for Document Management with Paperless-ngx
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
Mihup and Qualcomm Collaborate to Advance Secure On-Device Voice AI for BFSI
-
LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
-
Aegis.rs: Open Source Rust-Based LLM Security Proxy Released
-
Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
-
Show HN: Shiro.computer Static Page, Unix/NPM Shimmed to Host Claude Code
-
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
-
Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
-
GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
-
Matmul-Free Language Model Trained on CPU in 1.2 Hours
-
Real-World Coding Benchmark Tests LLMs on 65 Production Codebase Tasks
-
Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
-
Ask HN: How Do You Debug Multi-Step AI Workflows When the Output Is Wrong?
-
AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
-
Self-Hosted AI: A Complete Roadmap for Beginners
-
Show HN: PgCortex – AI enrichment per Postgres row, zero transaction blocking
-
Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
-
I attacked my own LangGraph agent system. All 6 attacks worked
-
Show HN: Inkog – Pre-flight check for AI agents (governance, loops, injection)
-
High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
-
Chinese AI Chipmaker Axera Semiconductor Plans $379 Million Hong Kong IPO for Edge Inference Hardware
-
Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
-
I broke into my own AI system in 10 minutes. I built it
-
GPU-Accelerated DataFrame Library for Local Inference Workloads
-
Critical vLLM RCE Vulnerability Allows Remote Code Execution via Video Links
-
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
-
LLM APIs Reconceptualized as State Synchronization Challenge
-
LLaDA2.1 Introduces Token Editing for Massive Speed Gains in Local Inference
-
175,000 Publicly Exposed Ollama AI Servers Discovered Across 130 Countries
-
First Vibecoded AI Operating System for Local Deployment
-
Switching From Ollama and LM Studio to llama.cpp: Performance Benefits
-
Simile AI Raises $100M Series A for Local AI Infrastructure
-
ByteDance Releases Seedance 2.0 AI Development Platform
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
Researchers Find 175,000 Publicly Exposed Ollama AI Servers Across 130 Countries
-
Heaps Do Lie: Debugging a Memory Leak in vLLM
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
-
Analysis Reveals AI's Real Impact on Software Launches and Development
-
Mistral AI Debugs Critical Memory Leak in vLLM Inference Engine
-
175,000 Publicly Exposed Ollama Servers Create Major Security Risk
-
NAS System Achieves 18 tok/s with 80B LLM Using Only Integrated Graphics
-
Community Member Builds 144GB VRAM Local LLM Powerhouse