Tagged "edge-deployment"
-
NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model
-
Llama.cpp Runs on SGI Power Challenge from 1995 with MIPS R8000 Kernel
-
GraphOS: Visual Runtime and Debugger for AI Agents with Local-First Execution
-
Why the Same LLM Gives Different Answers in Different Environments
-
What Type of AI Usage? Deployment Patterns and Implementation Considerations
-
Google's Gemma 4: Powerful AI Models Optimized for Your Phone and Laptop
-
Pocket LLM v1.5.0 Brings Multimodal AI to Android with No Cloud Required
-
The New Linux Kernel AI Bot Uncovering Bugs Is A Local LLM On Framework Desktop + AMD Ryzen AI Max
-
Google's Gemma 4 Could Put Powerful AI on Your Phone and Laptop
-
Singapore's Foreign Minister Builds an AI "Second Brain" Using NanoClaw
-
Thinking Outside the Box: New Attack Surfaces in Sandboxed AI Agents
-
Pluggable's TBT5-AI: First Thunderbolt Dock Explicitly Targeting Local LLM Workstations
-
Show HN: Phonetic Formatter – Offline English Text to IPA on iPhone and iPad
-
Blueprint: AI Hardware Design
-
SiGit Code: Local-First Coding Agent
-
Rust Open-Source Headless Browser for AI Agents and Web Scraping
-
Run a Local LLM Server on Raspberry Pi with Remote Access Capabilities
-
LLMs Consume 5.4x Less Mobile Energy Than Ad-Supported Web Search
-
Show HN: A Karpathy-Style LLM Wiki Your Agents Maintain
-
Google's Gemma 4 Brings Powerful On-Device AI to Phones and Laptops
-
Seed3D 2.0
-
Hackers Exploit Ollama Model Uploads to Leak Server Data
-
Netherlands Reaches Deal to Cut Reliance on U.S. Cloud Tech
-
Using a Local LLM as a Zero-Shot Classifier
-
How to Make Sense of AI
-
Building Real-World On-Device AI with LiteRT and NPU
-
AI Agent Designs a RISC-V CPU Core from Scratch
-
Intel OpenVINO 2026.1 Integrates llama.cpp with Wildcat Lake and Arc Pro B70
-
Tesseron: New API Framework for AI Agents with Developer-Defined Configuration
-
Sarvam Edge: India's Offline AI Model Runs on Phones and Laptops Without Internet
-
Developer Turns Phone Into Local LLM Server with Vision, Voice, and Tool Calling Capabilities
-
Llama.cpp's Auto Fit Feature Quietly Reshapes Local AI Inference on Consumer Hardware
-
Google's Gemma 4 Finally Makes Local LLM Deployment Compelling for Practitioners
-
go-AI: New Inference API Library for Go Released
-
Cursor-Autoresearch: AI Research Automation Port for Local Workflows
-
16 Ways to Make a Small Language Model Think Bigger
-
The Open-Source AI Ecosystem Keeps Treating llama.cpp Like a Second-Class Citizen
-
Malicious GGUF Models Could Trigger Remote Code Execution on SGLang Servers
-
Gemma 4 Just Replaced My Whole Local LLM Stack
-
DeepX and Hyundai Motor Group Robotics LAB Partner to Develop Next-Generation Physical AI Compute Platform
-
ZeusHammer: Built an AI Agent That Thinks Locally
-
Complete Local Coding Assistant Stack Running Inside Your Editor
-
Intel Extends AI PC Reach With New Core Ultra Series 3 Launch
-
Bun v1.3.13
-
AI Quota Inflation Is No Token Effort. It's Baked In
-
Waterloo's Live AI-Goose Tracker: Real-Time Edge Vision
-
PCMind: Local AI Analysis of Docs, Audio, Video and Images
-
Minisforum Launches N5 Max AI NAS with OpenClaw
-
Memjar: Uncompromising Local-First Second Brain
-
I Connected My Local LLM to My Browser and It Changed How I Automated Tasks
-
Local AI Isn't Just Ollama—Here's the Ecosystem That Actually Makes It Useful
-
Kilo is the VS Code Extension That Actually Works with Every Local LLM
-
Laimark – 8B LLM That Self-Improves on Consumer GPUs
-
Exposed LLM Infrastructure: How Attackers Find and Exploit Misconfigured AI Deployments
-
115 TOPS in 0.67L: CHUWI AuBox X Packs On-Device AI Power Into a Palm-Sized Mini PC
-
Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw
-
BibCrit – LLM Grounded in ETCBC Corpus Data for Biblical Textual Criticism
-
Kilo Is the VS Code Extension That Actually Works With Every Local LLM I Throw at It
-
ChatMCP – Connect your AI browser chats to your coding agents
-
Building a Voice AI Wearable in a Casio F91W with Whisper and BLE
-
Bonsai 1.7B in the Browser: A 290MB 1-bit LLM on WebGPU
-
Xiaomi 12 Pro Converted Into 24/7 Headless AI Server With Ollama and Gemma 4
-
Self-Hosted LLMs Transform Personal Knowledge Management Systems
-
Building Practical Local Coding Assistants: A Working Stack for Editor Integration
-
Google's Gemma 4 Brings Game-Changing Performance to Local Laptop Inference
-
Running Gemma 4 on an iPhone 13 Pro
-
DotLLM – Building an LLM Inference Engine in C#
-
DGX Spark Setup Guide: Running vLLM and PyTorch for Local LLM Inference Backend
-
DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max
-
Ubiquiti UniFi G6 Turret 4K Camera Features On-Device AI Processing at $199 Price Point
-
Talking to a Local LLM in the Firefox Sidebar
-
Sovereign AI: Why the Next GPT Will Be Born in Our Living Rooms
-
Fine-Tuned Qwen3.5-0.8B for OCR Outperforms Previous 2B Release
-
Qwen 3.5 Small – On-Device Multimodal Models Released
-
oMLX Framework Implements DFlash Attention for Optimized Inference
-
Minisforum N5 MAX AI NAS Delivers 126 TOPS with 200TB Storage for Local LLM Workloads
-
MiniMax M2.7 Achieves SOTA Performance Under 64GB on Mac with TQ Quantization
-
Local LLM Connected to Home Assistant via MCP Now Enables Autonomous Smart Home Management
-
Developer Shares Golden Stack for Local Coding Assistant Integration Directly Inside Code Editors
-
Self-Hosted LLM Took Personal Knowledge Management System to the Next Level
-
Qwen3 Audio and Vision Support Now Available in llama.cpp
-
On-Device AI Inference Emerges as New Security Blind Spot for CISOs
-
Defender – Local Prompt Injection Detection for AI Agents
-
Audio Processing Support Lands in llama.cpp with Gemma-4
-
Researchers Achieve 1-Bit Quantization of OLMo-3 7B Using Distillation
-
A Deep Dive into Tinygrad AI Compiler
-
Self-Hosted LLM Elevates Personal Knowledge Management Systems to New Levels
-
On-Device AI: Achieving Powerful AI Capabilities Without Internet Connectivity
-
MiniMax M2.7 Is Now Open Source
-
Google's Gemma 4 Brings Free Agentic AI to Your Phone With Zero Data Leaving the Device
-
Google Gemma 4 Delivers Exceptional Speed and Accuracy for Local Inference
-
Google's Gemini Nano 4 Offers Faster, Smarter Local Inference Capabilities
-
ASUS ExpertBook P1 Integrates On-Device AI for Enterprise Collaboration
-
AI PC Market Projected to Reach $235B by 2032, Driven by On-Device Computing Adoption
-
Self-Installing Skill Manager for AI Agents
-
Tether Launches QVAC SDK for Cross-Platform Local AI Development
-
Samsung Integrates On-Device AI Features into Galaxy A-Series Smartphones
-
Ollama's Limitations for Production Local LLM Deployments
-
LLM Wiki v2: Extended Knowledge Base for LLM Practitioners
-
CarryAI's Serverless Vision-Language Models Enable On-Device Multimodal AI
-
On-Device Apple Intelligence Vulnerable to Prompt Injection Attacks
-
AI Scans 400k Reddit Posts to Flag Overlooked GLP-1 Side Effects
-
Energy Consumption: The Final Frontier for AI and Local Inference
-
Running a 1.7B-Parameter LLM on an Apple Watch
-
Run Qwen3.5 on an Old Laptop: A Lightweight Local Agentic AI Setup Guide
-
Mano-P: Open-Source On-Device GUI Agent, #1 on OSWorld Benchmark
-
Ask HN: Local-First Meetings Recorder and Transcriber
-
Gemini-CLI, Llama.cpp, and Qwen3.5 Running on NVIDIA Jetson TK1
-
Gemma 4 Support Stabilized in Llama.cpp
-
GitHub Copilot CLI Adds Support for BYOK and Local Model Deployment
-
Google's Gemma 4 Brings Powerful On-Device AI to Android and iOS
-
StyleSeed – Design Rules That Make AI Coding Tools Produce Professional UI
-
Running AI Natively on Windows 11 Using an eGPU
-
Quansloth Using Google's Turboquant Breaks the VRAM Wall for Local LLMs
-
Your Next Assistant is Your PC: How On-Device AI is Transforming Work, One Workflow at a Time
-
Octopoda: Open Source Memory Layer for Fully Offline AI Agents
-
MemPalace, the Highest-Scoring AI Memory System Ever Benchmarked
-
Google Launches Offline AI Dictation App for iOS with Gemma
-
CricketBrain: Neuromorphic Signal Processor in Rust (0.175us/step, 944 bytes)
-
AMD Announces Day 0 Support for Google Gemma 4 Across Processors and GPUs
-
VLA Learns How to Act. S2S Decides Whether the Motion Is Physically Trustworthy
-
Verbatim 140W GAN: One of the First Chargers With USB PD 3.2 AVS (SPR) Support
-
TurboQuant in Llama.cpp Achieves 6X Smaller KV Cache
-
METATRON: Open-Source AI Penetration Testing with Local LLMs
-
HunyuanOCR 1B: High-Quality OCR Now Viable on Budget Consumer Hardware
-
Google AI Edge Gallery Tops App Store Charts with On-Device Gemma 4
-
Real-time Multimodal AI on Apple Silicon: Gemma E2B Demo Shows Practical Edge Deployment
-
Apple Brings Enhanced On-Device AI Features to iPhone
-
Show HN: Turn Photos Into Wordle Puzzles with AI That Runs 100% in Your Browser
-
Vektor – Local-First Associative Memory for AI Agents
-
Satsgate: Monetize AI Agents and APIs with Lightning L402 Protocol
-
Microsoft Quantum Development Kit Ported to Rust: 100x Faster and Smaller
-
Google Previews Gemini Nano 4 for Android AICore with On-Device Capabilities
-
GMKtec NucBox K17 Launches with 97 TOPS AI Performance for Local Inference
-
Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models
-
Run AutoGEN with Ollama and LiteLLM in Simple Steps
-
Apple Research Shows Self-Distillation Significantly Improves Local Code Generation
-
Samsung Launches Galaxy Book6 Series with NVIDIA RTX 5070 and On-Device AI
-
NVIDIA and Google Optimize Gemma 4 AI Models for Local RTX Deployment
-
Nex Life Logger: Local Activity Tracker with AI Agent Integration
-
Netflix Open-Sources VOID Model for Video Object Deletion
-
Kokoro TTS Achieves 20× Realtime Speed on CPU-Only On-Device Inference
-
Google Launches Gemma 4 For Advanced On-Device AI
-
Free AI Video Clipper Using Scene and Speech-Based Segmentation
-
SkillCompass – Diagnose and Improve AI Agent Skills Across 6 Dimensions
-
Google Gemma 4 Released with GGUF Quantizations
-
Google Launches Gemma 4 Open Models for Local On-Device AI
-
Gemma 4 2B Successfully Runs on Raspberry Pi 5
-
Gemma 4 on Arm: Optimized On-Device AI for Mobile and Edge Deployment
-
Apfel – The Free AI Already on Your Mac
-
AMD Provides Day 0 Support for Gemma 4 on Ryzen AI Processors and GPUs
-
How to Integrate VS Code with Ollama for Local AI Assistance
-
SmolLM2-360M Running on Samsung Galaxy Watch 4 with 74% Memory Reduction
-
Qwen 3.6-Plus Released
-
Men Are Ditching TV for YouTube as AI Usage and Social Media Fatigue Grow
-
TinyGPU Adds Mac Support for External Nvidia GPU Acceleration
-
Lotte Innovate and DeepX Collaborate on Mass Production of Domestic AI Semiconductors
-
A Journey to a Reliable and Enjoyable Locally Hosted Voice Assistant
-
git11 Is an AI Workspace for GitHub Engineering Teams
-
Show HN: Extra-Platforms, Python Library to Detect OS, Arch, Shell, CI, AI
-
Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance
-
Ollama Adopts Apple's MLX Framework for Faster Local AI on Mac
-
Local AI Ecosystem Extends Far Beyond Ollama
-
Claw64 – Full Agentic Loop in <4KB on Commodore 64
-
PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
-
Samsung launches Galaxy Book6 series in India with Nvidia RTX 5070 graphics and on-device AI
-
Running AI on a Raspberry Pi, Part 2: Running AI on a Pi in Under 5 minutes
-
Orca – Executable skills and capabilities for AI agent workflows
-
Samsung Launches Galaxy Book6 Series in India with NVIDIA RTX 5070 Graphics and On-Device AI
-
Dell Technologies Unveils 10 AI PC Models for Business, from Ultralight Laptops to Ultracompact Desktops
-
DeepSeek V3 Complete Guide: Deploy and Optimize Local AI in 2026
-
TurboQuant: Understanding the Quantization Breakthrough
-
Google's TurboQuant Shows Memory Constraints Remain Critical for Local LLM Inference
-
Scion: Running Concurrent LLM Agents with Isolated Identities and Workspaces
-
Samsung Galaxy Book6 Brings Consumer-Grade On-Device AI Hardware to Market
-
OLED Emerges as the Display Standard for Energy-Efficient AI Systems
-
IBM Granite 4.0 3B Vision: Compact Enterprise-Grade Document AI
-
ESP32-S31: 320MHz 2-Core Microcontroller with 512KB SRAM and Networking
-
DaVinci-MagiHuman: Open-Source AI Model for Realistic Video Generation
-
Samsung Galaxy Book6 Series Brings Intel Core Ultra Chips for On-Device LLM Inference
-
Qwen3 512k Context via TurboQuant on Mac mini
-
Introduction to Nyreth v1.0
-
HP Launches Copilot+ PCs in India with On-Device AI Capabilities for Local Inference
-
GPU Passthrough to LXCs in Proxmox Simplifies Local LLM Deployment
-
GLM-5.1 Model Weights Launching Early April for Local Deployment
-
CERN Embeds Tiny AI Models in Silicon Chips for Real-Time LHC Data Filtering
-
Acer TravelMate AI Laptops Launch in UAE for Business On-Device Inference
-
This Wearable Runs an On-Device AI With 2-Week Battery Life
-
mlx-Code: Run Claude Code Locally with MLX-LM
-
Mistral AI Releases Voxtral: Open-Source TTS Model Beating ElevenLabs on Local Hardware
-
Hold on to Your Hardware: Implications for Local LLM Deployment
-
Apple Gets Full Gemini Access and Uses Distillation to Build Lightweight On-Device AI
-
Book on AI Agents for the Layman: Understanding Agent-Based Systems
-
See What Your AI Agents Are Doing: Multi-Agent Observability Tool
-
Samsung Galaxy A37 and A57 5G Launch with On-Device AI Capabilities in India
-
Why Responsible AI Is the Bedrock of AI-Powered Applications
-
NVIDIA Releases GPT-OSS-Puzzle-88B, a Deployment-Optimized Model
-
Meta Releases HyperAgents: Self-Improving AI
-
Liquid AI's LFM2-24B Achieves 50 Tokens/Second in Web Browser via WebGPU
-
Operating Systems. One USB. ZFS on Root. AI-Powered. Free
-
Apple Plans Slimmed-Down Gemini Models for Local iPhone AI Features
-
Google TurboQuant: Extreme Compression for Local LLM Deployment
-
Running an Open-Weight LLM Locally on an Apple Watch
-
New Open-Weight Models Released: GigaChat-3.1-Ultra and Lightning Variants
-
Lemonade 10.0.1 Improves Setup Process For Using AMD Ryzen AI NPUs On Linux
-
HP Launches IQ On-Device AI Assistant, Advancing Enterprise AI Adoption on PCs
-
.APKs Are Just .ZIPs: Semi-Legally Hacking Software for Orphaned Hardware
-
Ultra-Large 400B-Class LLM Runs on iPhone in Test
-
Open-Source AI Text-to-Speech Models You Can Run Locally for Natural Voice
-
Open-Source Tool Helps Determine Which Local LLMs Run on Your PC
-
llm-d Joins the Cloud Native Computing Foundation
-
Velr: Embedded Property-Graph Database for Local LLM Applications
-
Self-Hostable AI Agents and Internal Software Framework Released
-
Qt 6.11 Released with Enhanced Cross-Platform Deployment Capabilities
-
Korea to Deploy Domestic AI Chips in Smart Cities as NPU Trials Scale Up
-
Alibaba Commits to Continuous Open-Sourcing of Qwen and Wan Models
-
Building a Production AI Receptionist: Practical Local LLM Deployment Case Study
-
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
-
Careless Whisper – Personal Local Speech to Text
-
BrowserOS 0.44.0 Release: Advances in Local AI Integration for Web-Based Applications
-
Brezn – Decentralized Local Communication
-
A Little Gap That Will Ensure the Future of AI Agents Being Autonomous
-
Self-Hosted AI Code Review with Local LLMs: Secure Automation Guide
-
Running an AI Agent on a 448KB RAM Microcontroller
-
Qualcomm and Samsung's 30-Year AI Alliance Enters a New Phase as On-Device AI Chip Race Heats Up
-
Pydantic-Deep: Production Deep Agents for Pydantic AI
-
MacinAI Local brings functional LLM inference to classic Macintosh hardware
-
Local AI Coding Assistant: Free Cursor Alternative with VS Code, Ollama & Continue
-
DeepSeek R1 RTX 4090 vs Apple M3 Max: Benchmark & Performance Guide
-
Atuin v18.13 – Better Search, a PTY Proxy, and AI for Your Shell
-
What AI Augmentation Means for Technical Leaders
-
SwarmHawk – Open-Source CLI for Vulnerability Scanning with AI Synthesis
-
Ultra-Compact 28M Parameter Models Show Promise for Specialized Domain Tasks
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
NVIDIA Nemotron 3 Nano 4B Enables On-Device Inference Directly in Web Browsers via WebGPU
-
Cybersecurity Skills for AI Agents – agentskills.io Standard Implementation
-
Claude Code Permissions Hook – Delegate Permission Approval to LLM
-
ASUS ExpertCenter PN55 Mini PC Combines AMD AI CPU and 55 TOPS NPU
-
AI's Impact on Mathematics Analogous to Car's Impact on Cities
-
Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
-
Multiverse Computing Targets On-Device AI With Compressed Models and New API Portal
-
Dell Pro Max 16 Plus Launches With Enterprise-Grade Discrete NPU for On-Device AI
-
Tether's QVAC Introduces Cross-Platform Bitnet LoRA Framework for On-Device AI Training
-
Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
-
On-Device AI: Tether's QVAC Fabric Enables Local Training
-
Snapdragon 8 Elite Gen 5 Hands the Galaxy S26 the AI Upgrade We've Been Waiting For
-
MiniMax-M2.7: New Compact Model Announced for Local Deployment
-
LucidShark – Local-first, open-source quality and security gate
-
Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based)
-
Browser-Based Transcription Tools
-
OpenJarvis: Local-First AI Agents That Run Entirely On-Device
-
A New Magnetic Material for the AI Era
-
Mistral Small 4 119B Released with NVFP4 Quantisation Support
-
Mistral Releases Small 4 Open-Source Model Under Apache 2.0
-
How I Used Lima for an AI Coding Agent Sandbox
-
Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
-
Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
-
KAIST Develops World's First Hyper-Personalized On-Device AI Chip
-
The Moment AI Agents Stopped Being a Feature and Started Becoming a System
-
How AI Agents Should Pay for API Calls: X402 and USDC Verification on Base
-
OpenClaw Isn't the Only Raspberry Pi AI Tool—Here Are 4 Others You Can Try This Week
-
Qwen 3.5 122B Demonstrates Exceptional Reasoning for Local Deployment
-
OmniCoder-9B: Efficient Coding Model for 8GB GPUs
-
Nota Added to Three Technology and Growth ETFs in a Row – Market Recognition for AI Efficiency
-
Custom AI Smart Speaker
-
Apple's On-Device AI Raises Privacy Alarms Across British Parliament
-
AMD Declares 'AI on the PC Has Crossed an Important Line' – Agent Computers as Next Breakthrough
-
Show HN: Voice-tracked teleprompter using on-device ASR in the browser
-
Startup Transforms Mac Mini Into Full-Powered AI Inference System With External GPU
-
India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets
-
Hybrid AI Desktop Layer Combining DOM-Automation and API-Integrations
-
Cicikus v3 Prometheus 4.4B – An Experimental Franken-Merge for Edge Reasoning
-
Show HN: Buxo.ai – Calendly alternative where LLM decides which slots to show
-
Local Manga Translator: Production LLM Pipeline with YOLO, OCR, and Inpainting
-
Lemonade v10 Brings Linux NPU Support and Multi-Modal Capabilities
-
I Fed My Home Assistant Logs Into a Local LLM, and It Found Problems I'd Been Ignoring for Months
-
Best Local LLM Models 2026: Developer Comparison
-
3-Path Agent Memory: 8 KB Recurrent State vs. 156 MB KV Cache at 10K Tokens
-
Linux 7.0 AMDGPU Fixing Idle Power Issue For RDNA4 GPUs After Compute Workloads
-
Show HN: VmExit – An Experiment in AI-Native Computing
-
Sarvam Open-Sources 30B and 105B Reasoning Models
-
Qwodel – An Open-Source Unified Pipeline for LLM Quantization
-
Nvidia Pushes Jetson as Edge Hub for Open AI Models
-
MeepaChat – Slack for AI Agents (iOS, macOS, Web / Cloud, Self-Hosted)
-
Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results
-
Texas Instruments Launches NPU-Powered MCUs for Low-Power Edge AI
-
SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference
-
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance
-
NVIDIA Jetson Brings Open Models to Life at the Edge
-
Kali Linux Integrates Local Ollama and MCP for AI-Driven Penetration Testing
-
SK Hynix Develops 1c LPDDR6 DRAM to Boost On-Device AI Performance in Mobile Devices
-
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming
-
PhotoPrism AI-Powered Photos App Brings Better Ollama Integration
-
Mnemos: Persistent Memory System for Local AI Agents
-
Google Delivers On-Device AI Features in New Chromebook Plus Model
-
FreeBSD 14.4 Released: Implications for Local LLM Deployment
-
Fish Audio Open-Sources S2: Expressive Text-to-Speech with Natural Language Control and 100ms Latency
-
M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026
-
Engram – Open-Source Persistent Memory for AI Agents
-
commitgen-cc – Generate Conventional Commit Messages Locally with Ollama
-
VoiceShelf: Fully Offline Android Audiobook Reader Using Kokoro TTS
-
Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference
-
Samsung Opens Registration for Vision AI QLED and OLED Television Integration
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
Show HN: Proxly – Self-hosted tunneling on your own domain in 60 seconds
-
Student Researcher Achieves 42x Model Compression Through Novel Architecture
-
Show HN: Ivy – the first proactive, offline AI tutor
-
HP Refreshes Lineup with AI-Focused Workstations
-
Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
-
AI Agent Reliability Tracker
-
Windows 11 Notepad Gets On-Device AI Text Generation Without Subscription
-
Self-Hosted Paperless-ngx With Optional Local AI Integration
-
Building PyTorch-Native Support for IBM Spyre Accelerator
-
Open WebUI Adds Native Terminal Tool Calling with Qwen3.5 35B Support
-
Mojo: Creating a Programming Language for an AI World with Chris Lattner
-
Llama.cpp Merges Automatic Parser Generator to Mainline
-
Jse v2.0 AI Output Specification
-
IBM Granite 4.0 1B Speech Model Released for Multilingual Speech Recognition
-
Show HN: Asterode – Multi-Model AI App with Memory and Power Features
-
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support
-
Windows 11 Notepad to Feature On-Device AI Text Generation Without Subscription
-
The Emerging Role of SRAM-Centric Chips in AI Inference
-
Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
-
Real-World Qwen 3.5 9B Agent Performance on M1 Pro Validates Edge Deployment
-
OPPO and MediaTek Highlight On-Device AI Innovations at MWC 2026
-
Unity Showcases Manufacturing AI Workflow at Smart Factory Expo
-
MediaTek Advances Omni Model for Efficient Smartphone Inference
-
Kakao Launches Kanana AI for On-Device Schedule and Recommendation Management
-
Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
-
SynthesisOS – A Local-First, Agentic Desktop Layer Built in Rust
-
RunAnywhere Launches Production-Grade On-Device AI Platform for Enterprise Scale
-
Qwen 3.5-4B Generates Fully Functional OS in Single Prompt
-
Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches
-
OpenWrt 25.12.0 – Stable Release
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
Glyph – A Local-First Markdown Notes App for macOS Built With Rust
-
Apple Unveils MacBook Pro With M5 Pro and M5 Max for On-Device AI
-
Apple M5 Pro and M5 Max: 4× Faster LLM Processing
-
AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
-
ÆTHERYA Core – Deterministic Policy Engine for Governing LLM Actions
-
VibeWhisper – macOS Voice-to-Text with 100% Local Processing Option
-
Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
-
Qwen 3.5 0.8B Successfully Deployed on 7-Year-Old Samsung S10E Using llama.cpp
-
Qualcomm Snapdragon Wear Elite: 2B Parameter NPU for Personal AI Wearables
-
Intel Arc Pro B70 Workstation GPU Confirmed via vLLM AI Release Notes
-
Building a Dependency-Free GPT on a Custom OS
-
Apple M4 iPad Air Targets AI Users with Double M1 Speed Performance
-
AMD Ryzen AI 400 Series Desktop Processors Launch with Integrated 60 TOPS NPU
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
RAG vs. Skill vs. MCP vs. RLM: Comparing LLM Enhancement Patterns
-
Qualcomm Launches Snapdragon Wear Elite for On-Device AI on Wearables
-
HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
-
Change Intent Records: The Missing Artifact in AI-Assisted Development
-
Browser Use vs. Claude Computer Use: Comparing Agent Automation Frameworks
-
Apple Neural Engine Reverse-Engineered for Local Model Training on Mac Mini M4
-
AMD Expands Ryzen AI 400 Series Portfolio for Consumer and Enterprise AI PC Options
-
Alibaba's Open-Source CoPaw AI Agent Now Compatible with MCP and ClawHub Skills
-
How to Run High-Performance LLMs Locally on the Arduino UNO Q
-
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
-
ParseHive – AI-Powered Invoice Data Extraction for Windows and Mac
-
Nummi – AI Companion with Memory and Daily Guidance
-
DeepSeek V4 Multimodal Model Coming Next Week With Image and Video Generation
-
Bare-Metal LLM Inference: UEFI Application Boots Directly Into LLM Chat
-
Apple Intelligence, Galaxy AI, Gemini: Why Your AI-Powered Phone Is Worth Repairing
-
AI-Native Store Research
-
AgentLens – Open-Source Observability for AI Agents
-
Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second
-
On-Device AI in Mobile Apps: What Should Run on the Phone vs the Cloud (A 2026 Decision Guide)
-
The ML.energy Leaderboard
-
Meta Reveals AI-Packed Smartwatch In 2026 – Why Wearables Shift Now
-
Galaxy S26 Debuts AI-Powered Scam Detection in Bold Security Push
-
5 Useful Docker Containers for Agentic Developers
-
Arduino, Qualcomm Bring On-Device AI and Robotics Learning to Indian School Systems
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Snapdragon 8 Elite Gen 5 for Galaxy Official: 5 Key Improvements that Push the Boundaries
-
Seco Launches Edge AI System-on-Module at Embedded World 2026
-
Snapdragon 8 Elite Gen 5 Powers Galaxy S26 Series With Enhanced On-Device AI
-
On-Device Function Calling in Google AI Edge Gallery
-
Extracting 100K Concepts from an 8B LLM
-
Show HN: Caret – Tab to Complete at Any App on Your Mac
-
Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
-
Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
-
Android Phones Are Getting Smarter Without Internet — On-Device AI as the Next Shift
-
Running LLMs on Raspberry Pi and Edge Devices: A Practical Guide
-
Researchers Develop Persistent Memory System for Local LLMs—No RAG Required
-
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
-
Apple: Python bindings for access to the on-device Apple Intelligence model
-
New Era of On-Device AI Driven by High-Speed UFS 5.0 Storage
-
Red Hat Launches AI Enterprise for Hybrid AI Deployments
-
PyTorch Foundation Announces New Members as Agentic AI Demand Grows
-
Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
-
Show HN: MCP-Enabled File Storage for AI Agents, Auth via Ethereum Wallet
-
Show HN: 100% LLM Accuracy–No Fine-Tuning, JSON Only
-
What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
-
Show HN: A Ground Up TLS 1.3 Client Written in C
-
Mirai Tech Raises $10 Million for On-Device AI Innovation
-
No, Local LLMs Can't Replace ChatGPT or Gemini — I Tried
-
Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
-
Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
-
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
-
Show HN: Dypai – Build Backends from Your IDE Using AI and MCP
-
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
-
Apple Accelerates U.S. Manufacturing with Mac Mini Production
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Comparing Manual vs. AI Requirements Gathering: 2 Sentences vs. 127-Point Spec
-
Which Web Frameworks Are Most Token-Efficient for AI Agents?
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
South Korea to Launch $687 Million Project to Develop On-Device AI Semiconductors
-
Qwen3's Voice Embeddings Enable Local Voice Cloning and Mathematical Voice Manipulation
-
Custom Portable Workstation Optimized for Local AI Inference Builds
-
Nvidia Could Launch Its First Laptops With Its Own Processors
-
Massu: Governance Layer for AI Coding Assistants with 51 MCP Tools
-
Local GPT-OSS 20B Model Demonstrates Practical Agentic Capabilities
-
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
-
GPT-OSS 20B Demonstrates Practical Agentic Capabilities Running Fully Locally
-
Gix: Go CLI for AI-Generated Commit Messages
-
Future of Mobile AI: What On-Device Intelligence Means for App Developers
-
Yet Another Fix Coming for Older AMD GPUs on Linux – Thanks to Valve Developer
-
AI Is Stress Testing Processor Architectures and RISC-V Fits the Moment
-
Ollama 0.17 Released With Improved OpenClaw Onboarding
-
How Slow Local LLMs Are on My Framework 13 AMD Strix Point
-
At India AI Impact Summit, Intel Showcases AI PCs and Cost-Efficient Frugal AI
-
Show HN: Horizon – My AI-Powered Personal News Aggregator and Summarizer
-
Google Open-Sources NPU IP, Synaptics Implements It for Hardware Acceleration
-
GGML Joins Hugging Face: What This Means for Local Model Optimization
-
DietPi Released a New Version v10.1
-
CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
-
Asus ExpertBook B3 G2 with 50 TOPS AI Sets New Enterprise Standard
-
AI PCs Explained: 7 Critical Truths About NPUs and Privacy
-
Vellium v0.3.5: Major Writing Mode Overhaul and Native KoboldCpp Support
-
Taalas Etches AI Models onto Transistors to Rocket Boost Inference
-
I Run Local LLMs in One of the World's Priciest Energy Markets, and I Can Barely Tell
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
At India AI Impact Summit, Intel Showcases Its AI PCs and Cost-Efficient Frugal AI
-
Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia
-
Open-Source + AI: ggml Joins Hugging Face, llama.cpp Stays Open—Local AI's Long-Term Home
-
GGML.AI Acquired by Hugging Face
-
Apple Researchers Develop On-Device AI Agent That Interacts With Apps for You
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
I Stopped Paying for ChatGPT and Built a Private AI Setup That Anyone Can Run
-
PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
-
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB
-
Why AI Models Fail at Iterative Reasoning and What Could Fix It
-
Show HN: Forked – A Local Time-Travel Debugger for OpenClaw Agents
-
Self-Hosted Local LLMs for Document Management with Paperless-ngx
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Running Local LLMs and VLMs on Arduino UNO Q with yzma
-
Mihup and Qualcomm Collaborate to Advance Secure On-Device Voice AI for BFSI
-
Local-First RAG: Vector Search in SQLite with Hamming Distance
-
LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
-
Kitten TTS V0.8 Released: State-of-the-Art Super-Tiny Text-to-Speech Model Under 25MB
-
Clipthesis: Free Local App for Video Tagging and Search Across Drives
-
Why My Country's AI Scene Is Built on Sand
-
Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
-
Show HN: Shiro.computer Static Page, Unix/NPM Shimmed to Host Claude Code
-
Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach
-
Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
-
OpenClaw Refactored in Go, Runs on $10 Hardware
-
GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
-
Matmul-Free Language Model Trained on CPU in 1.2 Hours
-
Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
-
Can We Leverage AI/LLMs for Self-Learning?
-
Ask HN: How Do You Debug Multi-Step AI Workflows When the Output Is Wrong?
-
AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
-
Cohere Releases Tiny Aya: Efficient 3.3B Multilingual Model for 70+ Languages
-
Chinese AI Chipmaker Axera Semiconductor Plans $379 Million Hong Kong IPO for Edge Inference Hardware
-
ASUS Zenbook 14 Launches in India with AI-Capable Hardware, Starting at Rs 1,15,990
-
Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
-
Ask HN: What is the best bang for buck budget AI coding?
-
I broke into my own AI system in 10 minutes. I built it
-
Sourdine: Open-Source macOS App for 100% Local AI Transcription
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
GPT-OSS 20B Now Runs 100% Locally in Browser via WebGPU
-
Simile AI Raises $100M Series A for Local AI Infrastructure
-
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
-
Samsung's REAM: Alternative Model Compression Technique
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
Memio Launches AI-Powered Knowledge Hub for Android with Local Processing
-
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
-
Energy-Based Models Compared Against Frontier AI for Sudoku Solving
-
Arm SME2 Technology Expands CPU Capabilities for On-Device AI