Tagged "quantisation"
-
Four Raspberry Pi AI Tools You Can Try This Week Beyond OpenClaw
-
Open-Source Tool Helps Determine Which Local LLMs Run on Your PC
-
LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language
-
KV Cache Quantization Levels Benchmarked on SWE-bench: Practical Trade-offs for Local Inference
-
FOMOE: Running 397B Parameter Qwen3.5 MoE at 5-9 tok/s on $2,100 Desktop Hardware
-
FlashAttention-4 Delivers 2.7x Faster Inference with 1613 TFLOPs/s on Blackwell GPUs
-
Chinese LLM Ecosystem Landscape: ByteDance Doubao, Alibaba, and Open-Source Competition
-
Qt 6.11 Released with Enhanced Cross-Platform Deployment Capabilities
-
Korea to Deploy Domestic AI Chips in Smart Cities as NPU Trials Scale Up
-
How to Build a Self-Hosted AI Server with LM Studio: Step-by-Step Guide
-
Powerful AI Search Engine Built on Single GeForce RTX 5090
-
Ditching Paid AI Services: Building Self-Hosted LLM Solutions as ChatGPT, Claude, and Gemini Alternatives
-
Rust Project Perspectives on AI
-
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
-
Setting Up a Private AI Brain on Windows: Complete Guide to Local LLM Deployment
-
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
-
Why You Should Use Both ChatGPT and Local LLMs: A Practical Hybrid Approach
-
BrowserOS 0.44.0 Release: Advances in Local AI Integration for Web-Based Applications
-
AI Playground for Developers Built in Vite and Python
-
Running an AI Agent on a 448KB RAM Microcontroller
-
Qwen 3.5 397B emerges as top-performing local coding model
-
Apple M5 Max 128GB real-world performance benchmarks for local inference
-
Ultra-Compact 28M Parameter Models Show Promise for Specialized Domain Tasks
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
-
Repurpose Old GPUs as Dedicated AI Inference Accelerators
-
ASUS ExpertCenter PN55 Mini PC Combines AMD AI CPU and 55 TOPS NPU
-
Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
-
Snapdragon 8 Elite Gen 5 Hands the Galaxy S26 the AI Upgrade We've Been Waiting For
-
You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM
-
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
-
Browser-Based Transcription Tools
-
Run LLMs Locally with Llama.cpp
-
I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Mistral Small 4 119B Released with NVFP4 Quantisation Support
-
Mistral Releases Small 4 Open-Source Model Under Apache 2.0
-
Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
-
OpenClaw Isn't the Only Raspberry Pi AI Tool—Here Are 4 Others You Can Try This Week
-
OmniCoder-9B: Efficient Coding Model for 8GB GPUs
-
Nota Added to Three Technology and Growth ETFs in a Row – Market Recognition for AI Efficiency
-
AMD Declares 'AI on the PC Has Crossed an Important Line' – Agent Computers as Next Breakthrough
-
Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
-
Two Local Models Prove Competitive Enough to Replace ChatGPT, Gemini, and Copilot
-
India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets
-
I made Karpathy's Autoresearch work on CPU
-
Best Local LLM Models 2026: Developer Comparison
-
Sarvam Open-Sources 30B and 105B Reasoning Models
-
Qwodel – An Open-Source Unified Pipeline for LLM Quantization
-
Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
-
Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
-
Show HN: Detect When an LLM Silently Changes Behavior for the Same Prompt
-
Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results
-
SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
NVIDIA Jetson Brings Open Models to Life at the Edge
-
Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
-
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming
-
.ispec: Runtime Specification Validation for AI System Consistency
-
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
-
FreeBSD 14.4 Released: Implications for Local LLM Deployment
-
Community Survey: AI Content Automation Stacks in 2026
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
Qwen 3.5 Derestricted Model Available for Local Deployment
-
Nota AI to Showcase End-to-End On-Device AI Optimization at Embedded World 2026
-
How to Run Your Own Local LLM — 2026 Edition
-
Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference
-
Samsung Opens Registration for Vision AI QLED and OLED Television Integration
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
HP Refreshes Lineup with AI-Focused Workstations
-
Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
-
Windows 11 Notepad Gets On-Device AI Text Generation Without Subscription
-
Mojo: Creating a Programming Language for an AI World with Chris Lattner
-
Show HN: TLDR – Free Chrome Extension for AI-Powered Article Summarization
-
Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
-
OPPO and MediaTek Highlight On-Device AI Innovations at MWC 2026
-
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support
-
Unity Showcases Manufacturing AI Workflow at Smart Factory Expo
-
MediaTek Advances Omni Model for Efficient Smartphone Inference
-
Kakao Launches Kanana AI for On-Device Schedule and Recommendation Management
-
Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches
-
OpenWrt 25.12.0 – Stable Release
-
On-Device AI Laptop Lineups Become Standard Across Major Manufacturers
-
Apple Unveils MacBook Pro With M5 Pro and M5 Max for On-Device AI
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
-
Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context
-
Qualcomm Launches Snapdragon Wear Elite for On-Device AI on Wearables
-
Local LLM Performance Improvements: A Year of Progress Since DeepSeek R1 Moment
-
HP ZBook Ultra 14 G1a Workstation Reclaims Local AI Workflows for Professionals
-
How to Run High-Performance LLMs Locally on the Arduino UNO Q
-
Apple Intelligence, Galaxy AI, Gemini: Why Your AI-Powered Phone Is Worth Repairing
-
Unsloth Dynamic 2.0 GGUFs
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second
-
Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks
-
Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Meta Reveals AI-Packed Smartwatch In 2026 – Why Wearables Shift Now
-
Galaxy S26 Debuts AI-Powered Scam Detection in Bold Security Push
-
Arduino, Qualcomm Bring On-Device AI and Robotics Learning to Indian School Systems
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Snapdragon 8 Elite Gen 5 for Galaxy Official: 5 Key Improvements that Push the Boundaries
-
On-Device AI in Mobile Apps: What Should Run on the Phone vs the Cloud (A 2026 Decision Guide)
-
5 Useful Docker Containers for Agentic Developers
-
Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
-
Android Phones Are Getting Smarter Without Internet — Here's Why On-Device AI Is the Next Big Shift
-
Android Phones Are Getting Smarter Without Internet — On-Device AI as the Next Shift
-
Running LLMs on Raspberry Pi and Edge Devices: A Practical Guide
-
New Era of On-Device AI Driven by High-Speed UFS 5.0 Storage
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
-
PyTorch Foundation Announces New Members as Agentic AI Demand Grows
-
Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
-
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
-
How AI is Redefining Price and Performance in Modern Laptops
-
What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
-
No, Local LLMs Can't Replace ChatGPT or Gemini — I Tried
-
Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
-
Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
-
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
-
Show HN: Dypai – Build Backends from Your IDE Using AI and MCP
-
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
How Do You Know Which SKILL.md Is Good?
-
Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
-
Custom Portable Workstation Optimized for Local AI Inference Builds
-
Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
-
Nvidia Could Launch Its First Laptops With Its Own Processors
-
nanollama: Open-Source Framework for Training Llama 3 from Scratch with One-Command GGUF Export
-
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
-
Future of Mobile AI: What On-Device Intelligence Means for App Developers
-
The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
-
Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
-
At India AI Impact Summit, Intel Showcases AI PCs and Cost-Efficient Frugal AI
-
GGML Joins Hugging Face: What This Means for Local Model Optimization
-
DietPi Released a New Version v10.1
-
CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
-
AI PCs Explained: 7 Critical Truths About NPUs and Privacy
-
Taalas Etches AI Models onto Transistors to Rocket Boost Inference
-
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
I Thought I Needed a GPU to Run AI Until I Learned About These Models
-
GGML.AI Acquired by Hugging Face
-
VaultAI – 42 AI Models on a Portable SSD, Works Offline for $399
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge
-
Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
-
Running Local LLMs and VLMs on Arduino UNO Q with yzma
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
-
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
-
Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
-
Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Ask HN: What is the best bang for buck budget AI coding?
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
-
Ring-1T-2.5 Released with SOTA Deep Thinking Performance
-
GitHub Announces Support for Open Source AI Project Maintainers
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
-
GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
-
Community Member Builds 144GB VRAM Local LLM Powerhouse