Tagged "quantisation"
-
Wipeout Clone Runs Native on ESP32-S3, Pushing Edge Hardware to Its Limits
-
Picking Your First Local LLM Is Easier Than the Internet Makes It Sound
-
NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model
-
Intel N150 Mini PC Runs Local LLM for Home Assistant
-
Grokfeed: Terminal Feed Reader for HN, Reddit, and Lobste.rs Using Claude Code
-
Why the Same LLM Gives Different Answers in Different Environments
-
Stop Guessing: Open-Source Tool Predicts Which Local LLMs Run on Your PC
-
Building a Local AI Stack: Five Docker Containers to Replace ChatGPT Subscriptions
-
Local AI Isn't Just Ollama—Here's the Ecosystem That Actually Makes It Useful
-
Google's Gemma 4: Powerful AI Models Optimized for Your Phone and Laptop
-
Economic Implications of AI Adoption: Why Local Deployment Matters for Cost Control
-
Linux Crushes Windows on llama.cpp Inference by Double Digits
-
Google's Gemma 4 Could Put Powerful AI on Your Phone and Laptop
-
Show HN: Phonetic Formatter – Offline English Text to IPA on iPhone and iPad
-
Run a Local LLM Server on Raspberry Pi with Remote Access Capabilities
-
Google's Gemma 4 Brings Powerful On-Device AI to Phones and Laptops
-
Netherlands Reaches Deal to Cut Reliance on U.S. Cloud Tech
-
I Replaced My Local LLM With a Model Half Its Size and Got Better Results
-
How to Make Sense of AI
-
Building Real-World On-Device AI with LiteRT and NPU
-
AI Agent Designs a RISC-V CPU Core from Scratch
-
Show HN: We built an OCR server that can process 270 dense images/s on a 5090
-
Llama 4 Scout on MLX: The Complete Apple Silicon Guide (2026)
-
Externalization in LLM Agents: Unified Review of Memory and Harness Engineering
-
Anker Unveils 'Thus' Chip to Bring On-Device AI Across Product Line
-
10GB VRAM Local LLM: The Complete Setup Guide (2026)
-
Llama.cpp's Auto Fit Feature Quietly Reshapes Local AI Inference on Consumer Hardware
-
The Open-Source AI Ecosystem Keeps Treating llama.cpp Like a Second-Class Citizen
-
Malicious GGUF Models Could Trigger Remote Code Execution on SGLang Servers
-
Controlling the Secondary Fan on Minisforum AI Pro HX 370
-
Intel Extends AI PC Reach With New Core Ultra Series 3 Launch
-
Running DeepSeek R1 Locally: Your Complete Setup Guide
-
Minisforum Launches N5 Max AI NAS with OpenClaw
-
Unweight: Lossless MLP Weight Compression for LLM Inference
-
Laimark – 8B LLM That Self-Improves on Consumer GPUs
-
115 TOPS in 0.67L: CHUWI AuBox X Packs On-Device AI Power Into a Palm-Sized Mini PC
-
Building a Voice AI Wearable in a Casio F91W with Whisper and BLE
-
Project Glasswing and the ASF: Open-Source's Chance to Win the AI Era
-
Prefill Is Compute-Bound, Decode Is Memory-Bound: Optimizing GPU Utilization for LLM Inference
-
Bonsai 1.7B in the Browser: A 290MB 1-bit LLM on WebGPU
-
SigMap – Shrink AI Coding Context 97% with Auto-Scaling Token Budget
-
MiniMax M2.7 GGUF Investigation Reveals NaN Issues Affecting 21-38% of Hugging Face Conversions
-
Running Gemma 4 on an iPhone 13 Pro
-
Sovereign AI: Why the Next GPT Will Be Born in Our Living Rooms
-
Fine-Tuned Qwen3.5-0.8B for OCR Outperforms Previous 2B Release
-
Qwen 3.5 Small – On-Device Multimodal Models Released
-
Minisforum N5 MAX AI NAS Delivers 126 TOPS with 200TB Storage for Local LLM Workloads
-
MiniMax M2.7 Achieves SOTA Performance Under 64GB on Mac with TQ Quantization
-
Speculative Decoding Achieves 29% Speed Boost for Gemma-4 31B
-
Show HN: SkillCompass – Open-Source Quality Evaluator for Your AI Skills
-
Qwen3 Audio and Vision Support Now Available in llama.cpp
-
MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware
-
Learn LLM Internals
-
Researchers Achieve 1-Bit Quantization of OLMo-3 7B Using Distillation
-
Unsloth Completes Comprehensive MiniMax M2.7 GGUF Quantization Suite
-
Universal Knowledge Store and Grounding Layer for AI Reasoning Engines
-
On-Device AI: Achieving Powerful AI Capabilities Without Internet Connectivity
-
Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp
-
MiniMax M2.7 Released: New Model Available for Local Deployment
-
MiniMax M2.7 Is Now Open Source
-
Google's Gemma 4 Brings Free Agentic AI to Your Phone With Zero Data Leaving the Device
-
The Best Local AI Model for Home Assistant Isn't Always the Biggest One
-
Critical Unsloth Gemma-4 Chat Template Updates for Tool Calling
-
Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B
-
Google's Gemini Nano 4 Offers Faster, Smarter Local Inference Capabilities
-
Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark
-
AI PC Market Projected to Reach $235B by 2032, Driven by On-Device Computing Adoption
-
Building Offline AI Companions on Severely Constrained Hardware (8GB RAM)
-
LLM Wiki v2: Extended Knowledge Base for LLM Practitioners
-
5 Open-Source Projects Running Transformers on CPUs to GPUs in Pure Java
-
CarryAI's Serverless Vision-Language Models Enable On-Device Multimodal AI
-
Energy Consumption: The Final Frontier for AI and Local Inference
-
Speculative Decoding Made My Local LLM Actually Usable
-
Running a 1.7B Parameters LLM on an Apple Watch
-
Run Qwen3.5 on an Old Laptop: A Lightweight Local Agentic AI Setup Guide
-
I Replaced My Local LLM With a Model Half Its Size and Got Better Results — and It Wasn't About the Parameters
-
Gemini-CLI, Llama.cpp, and Qwen3.5 Running on NVIDIA Jetson TK1
-
Intel Releases OpenVINO 2026.1 With Backend For Llama.cpp, New Hardware Support
-
Gemma 4 Support Stabilized in Llama.cpp
-
Gemma 4 GGUF Models Updated with Critical Quantization Fixes
-
EXAONE 4.5 33B Model Released with Multiple Quantization Formats
-
Google AI Edge Gallery Showcases Offline Inference with Gemma 4
-
Quansloth Using Google's Turboquant Breaks the VRAM Wall for Local LLMs
-
Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool
-
TurboQuant-Optimized llama.cpp Fork Delivers GFX906 GPU Acceleration
-
Gemma 4 26B Achieves Impressive Local Performance With Proper Configuration
-
CricketBrain: Neuromorphic Signal Processor in Rust (0.175us/step, 944 bytes)
-
TurboQuant in Llama.cpp Achieves 6X Smaller KV Cache
-
Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops
-
Context Window Optimization: Extending Gemma 4 Context Length Through Efficient Projection Quantization
-
Show HN: Lightweight LLM Tracing Tool with CLI
-
GPU Memory for LLM Inference (Part 1)
-
Google AI Edge Gallery Tops App Store Charts with On-Device Gemma 4
-
Gemma 4 31B Achieves Exceptional Performance on Local Hardware
-
Unpaved: Audit Toolkit for AI Developer Tool Bias in Global South Contexts
-
Qwen 3.5 397B Reduced to 35% Parameters With Usable Quality on 96GB GPU
-
Qwen 3.6 Free Model Available via OpenRouter
-
Qualcomm Snapdragon Innovations Enable Advanced On-Device AI for Wearables
-
Ollama Gets Blazing Fast on Macs with Full MLX Support and 2× Speedups
-
DGX Spark Hardware Limitations: Missing NVFP4 Support Undermines Local AI Value Proposition
-
GMKtec NucBox K17 Launches with 97 TOPS AI Performance for Local Inference
-
Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models
-
Gemma 4 26B MoE Emerges as Optimal All-Around Local Model for Consumer Hardware
-
Apple Research Shows Self-Distillation Significantly Improves Local Code Generation
-
Nex Life Logger: Local Activity Tracker with AI Agent Integration
-
Mixed Precision Quantization on MLX with TurboQuant Implementation
-
Google Gemma 4 Released with GGUF Quantizations
-
Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon
-
Gemma 4 2B Successfully Runs on Raspberry Pi 5
-
Gemma 4 on Arm: Optimized On-Device AI for Mobile and Edge Deployment
-
TurboQuant Enables Qwen 3.5-27B on 16GB Consumer GPUs
-
Qwen 3.6-Plus Released
-
Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance
-
Satcove – Query 5 AI Models Simultaneously and Get Structured Verdicts
-
Local AI Ecosystem Extends Far Beyond Ollama
-
Llama.cpp Merging TurboQuant Lite (attn-rot) with Major Performance Gains
-
ByteShape Releases Qwen 3.5 9B Quantisations with Hardware-Matched Tuning Guide
-
PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
-
Running AI on a Raspberry Pi, Part 2: Running AI on a Pi in Under 5 minutes
-
Does RAG Help AI Coding Tools?
-
Ollama Launches Pi: The Minimal Coding Agent That Powers OpenClaw Is Now Yours to Customize
-
Local AI didn't replace my subscriptions, but it did take over these 6 tasks
-
Intel's $949 GPU has 32GB of VRAM for local AI, but the software is why Nvidia keeps winning
-
Select the Right Hardware for Your Local LLM Deployment with This Online Guide
-
Samsung Launches Galaxy Book6 Series in India with NVIDIA RTX 5070 Graphics and On-Device AI
-
TurboQuant: Understanding the Quantization Breakthrough
-
Google's TurboQuant Shows Memory Constraints Remain Critical for Local LLM Inference
-
OLED Emerges as the Display Standard for Energy-Efficient AI Systems
-
Mixed KV Cache Quantization: Performance Risks and Pitfalls
-
ESP32-S31: 320MHz 2-Core Microcontroller with 512KB SRAM and Networking
-
DaVinci-MagiHuman: Open-Source AI Model for Realistic Video Generation
-
TurboQuant KV Cache Compression Achieves 22.8% Faster Decoding at 32K Context
-
Samsung Galaxy Book6 Series Brings Intel Core Ultra Chips for On-Device LLM Inference
-
Qwen3 512k Context via TurboQuant on Mac mini
-
Introduction to Nyreth v1.0
-
This Wearable Runs an On-Device AI With 2-Week Battery Life
-
TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice
-
RotorQuant: 10-19x Faster Quantisation Alternative Using Clifford Algebra
-
Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization
-
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config
-
Quantization Reveals Outliers Impacting LLM Accuracy
-
Hold on to Your Hardware: Implications for Local LLM Deployment
-
Apple Gets Full Gemini Access and Uses Distillation to Build Lightweight On-Device AI
-
Samsung Galaxy A37 and A57 5G Launch with On-Device AI Capabilities in India
-
RF-DETR Nano and YOLO26 Enable On-Device Object Detection on Smartphones
-
NVIDIA Releases GPT-OSS-Puzzle-88B, a Deployment-Optimized Model
-
Show HN: Beforeyouship – Pre-Build Tool to Estimate LLM Cost
-
Intel Launches Arc Pro B70/B65 with 32GB VRAM for Local AI Inference
-
Google's TurboQuant: The Unsexy AI Breakthrough Worth Watching
-
Apple Plans Slimmed-Down Gemini Models for Local iPhone AI Features
-
Google TurboQuant: Extreme Compression for Local LLM Deployment
-
Running an Open-Weight LLM Locally on an Apple Watch
-
OmniCoder v2 Released: Improved Code Generation for Local Deployment
-
Private Brain LLM Setup on Windows PC Eliminates Need for Paid Cloud Services
-
Researcher Successfully Runs Local LLMs on Legacy "Dead" GPU With Surprising Results
-
.APKs Are Just .ZIPs: Semi-Legally Hacking Software for Orphaned Hardware
-
Ultra-Large 400B-Class LLM Runs on iPhone in Test
-
Four Raspberry Pi AI Tools You Can Try This Week Beyond OpenClaw
-
Open-Source Tool Helps Determine Which Local LLMs Run on Your PC
-
LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language
-
KV Cache Quantization Levels Benchmarked on SWE-bench: Practical Trade-offs for Local Inference
-
FOMOE: Running 397B Parameter Qwen3.5 MoE at 5-9 tok/s on $2,100 Desktop Hardware
-
FlashAttention-4 Delivers 2.7x Faster Inference with 1613 TFLOPs/s on Blackwell GPUs
-
Chinese LLM Ecosystem Landscape: ByteDance Doubao, Alibaba, and Open-Source Competition
-
Qt 6.11 Released with Enhanced Cross-Platform Deployment Capabilities
-
Korea to Deploy Domestic AI Chips in Smart Cities as NPU Trials Scale Up
-
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide
-
Powerful AI Search Engine Built on Single GeForce RTX 5090
-
Ditching Paid AI Services: Building Self-Hosted LLM Solutions as ChatGPT, Claude, and Gemini Alternatives
-
Rust Project Perspectives on AI
-
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
-
Setting Up a Private AI Brain on Windows: Complete Guide to Local LLM Deployment
-
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
-
Why You Should Use Both ChatGPT and Local LLMs: A Practical Hybrid Approach
-
BrowserOS 0.44.0 Release: Advances in Local AI Integration for Web-Based Applications
-
AI Playground for Developers Built in Vite and Python
-
Running an AI Agent on a 448KB RAM Microcontroller
-
Qwen 3.5 397B emerges as top-performing local coding model
-
Apple M5 Max 128GB real-world performance benchmarks for local inference
-
Ultra-Compact 28M Parameter Models Show Promise for Specialized Domain Tasks
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
-
Repurpose Old GPUs as Dedicated AI Inference Accelerators
-
ASUS ExpertCenter PN55 Mini PC Combines AMD AI CPU and 55 TOPS NPU
-
Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
-
Snapdragon 8 Elite Gen 5 Hands the Galaxy S26 the AI Upgrade We've Been Waiting For
-
You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM
-
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
-
Browser-Based Transcription Tools
-
Run LLMs Locally with Llama.cpp
-
I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Mistral Small 4 119B Released with NVFP4 Quantisation Support
-
Mistral Releases Small 4 Open-Source Model Under Apache 2.0
-
Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
-
OpenClaw Isn't the Only Raspberry Pi AI Tool—Here Are 4 Others You Can Try This Week
-
OmniCoder-9B: Efficient Coding Model for 8GB GPUs
-
Nota Added to Three Technology and Growth ETFs in a Row – Market Recognition for AI Efficiency
-
AMD Declares 'AI on the PC Has Crossed an Important Line' – Agent Computers as Next Breakthrough
-
Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
-
Two Local Models Prove Competitive Enough to Replace ChatGPT, Gemini, and Copilot
-
India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets
-
I made Karpathy's Autoresearch work on CPU
-
Best Local LLM Models 2026: Developer Comparison
-
Sarvam Open-Sources 30B and 105B Reasoning Models
-
Qwodel – An Open-Source Unified Pipeline for LLM Quantization
-
Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
-
Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
-
Show HN: Detect When an LLM Silently Changes Behavior for the Same Prompt
-
Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results
-
SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
NVIDIA Jetson Brings Open Models to Life at the Edge
-
Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
-
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming
-
.ispec: Runtime Specification Validation for AI System Consistency
-
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
-
FreeBSD 14.4 Released: Implications for Local LLM Deployment
-
Community Survey: AI Content Automation Stacks in 2026
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
Qwen 3.5 Derestricted Model Available for Local Deployment
-
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026
-
How to Run Your Own Local LLM — 2026 Edition
-
Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference
-
Samsung Opens Registration for Vision AI QLED and OLED Television Integration
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
HP Refreshes Lineup with AI-Focused Workstations
-
Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
-
Windows 11 Notepad Gets On-Device AI Text Generation Without Subscription
-
Mojo: Creating a Programming Language for an AI World with Chris Lattner
-
Show HN: TLDR – Free Chrome Extension for AI-Powered Article Summarization
-
Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
-
OPPO and MediaTek Highlight On-Device AI Innovations at MWC 2026
-
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support
-
Unity Showcases Manufacturing AI Workflow at Smart Factory Expo
-
MediaTek Advances Omni Model for Efficient Smartphone Inference
-
Kakao Launches Kanana AI for On-Device Schedule and Recommendation Management
-
Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches
-
OpenWrt 25.12.0 – Stable Release
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
Apple Unveils MacBook Pro With M5 Pro and M5 Max for On-Device AI
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
-
Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context
-
Qualcomm Launches Snapdragon Wear Elite for On-Device AI on Wearables
-
Local LLM Performance Improvements: A Year of Progress Since DeepSeek R1 Moment
-
HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
-
How to Run High-Performance LLMs Locally on the Arduino UNO Q
-
Apple Intelligence, Galaxy AI, Gemini: Why Your AI-Powered Phone Is Worth Repairing
-
Unsloth Dynamic 2.0 GGUFs
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second
-
Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks
-
Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Meta Reveals AI-Packed Smartwatch In 2026 – Why Wearables Shift Now
-
Galaxy S26 Debuts AI-Powered Scam Detection in Bold Security Push
-
Arduino, Qualcomm Bring On-Device AI and Robotics Learning to Indian School Systems
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Snapdragon 8 Elite Gen 5 for Galaxy Official: 5 Key Improvements that Push the Boundaries
-
On-Device AI in Mobile Apps: What Should Run on the Phone vs the Cloud (A 2026 Decision Guide)
-
5 Useful Docker Containers for Agentic Developers
-
Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
-
Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
-
Android Phones Are Getting Smarter Without Internet — On-Device AI as the Next Shift
-
Running LLMs on Raspberry Pi and Edge Devices: A Practical Guide
-
New Era of On-Device AI Driven by High-Speed UFS 5.0 Storage
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
-
PyTorch Foundation Announces New Members as Agentic AI Demand Grows
-
Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
-
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
-
How AI is Redefining Price and Performance in Modern Laptops
-
What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
-
No, Local LLMs Can't Replace ChatGPT or Gemini — I Tried
-
Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
-
Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
-
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
-
Show HN: Dypai – Build Backends from Your IDE Using AI and MCP
-
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
How Do You Know Which SKILL.md Is Good?
-
Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
-
Custom Portable Workstation Optimized for Local AI Inference Builds
-
Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
-
Nvidia Could Launch Its First Laptops With Its Own Processors
-
nanollama: Open-Source Framework for Training Llama 3 from Scratch with One-Command GGUF Export
-
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
-
Future of Mobile AI: What On-Device Intelligence Means for App Developers
-
The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
-
Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
-
At India AI Impact Summit, Intel Showcases AI PCs and Cost-Efficient Frugal AI
-
GGML Joins Hugging Face: What This Means for Local Model Optimization
-
DietPi Released a New Version v10.1
-
CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
-
AI PCs Explained: 7 Critical Truths About NPUs and Privacy
-
Taalas Etches AI Models onto Transistors to Rocket Boost Inference
-
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
I Thought I Needed a GPU to Run AI Until I Learned About These Models
-
GGML.AI Acquired by Hugging Face
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Running Local LLMs and VLMs on Arduino UNO Q with yzma
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
-
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
-
Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
-
Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Ask HN: What is the best bang for buck budget AI coding?
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
-
Ring-1T-2.5 Released with SOTA Deep Thinking Performance
-
GitHub Announces Support for Open Source AI Project Maintainers
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
-
GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
-
Community Member Builds 144GB VRAM Local LLM Powerhouse