Tagged "benchmarks"
-
Intel N150 Mini PC Runs Local LLM for Home Assistant
-
Hipfire: A Rust-Native AMD Inference Engine That Outperforms llama.cpp
-
Linux Crushes Windows on llama.cpp Inference by Double Digits
-
The New Linux Kernel AI Bot Uncovering Bugs Is A Local LLM On Framework Desktop + AMD Ryzen AI Max
-
LLMs Consume 5.4x Less Mobile Energy Than Ad-Supported Web Search
-
Fixing Hallucination in LLM Prediction With Only One 48GB GPU
-
I Replaced My Local LLM With a Model Half Its Size and Got Better Results
-
Show HN: We built an OCR server that can process 270 dense images/s on a 5090
-
I Cancelled Codex Two Months Ago. Opus 4.7 Brought Me Back
-
Gemma 4 Just Replaced My Whole Local LLM Stack
-
Claude vs Local LLM: Real-World Prompt Comparison Reveals Trade-offs
-
Unweight: Lossless MLP Weight Compression for LLM Inference
-
We Built a Local Model Arena in 30 Minutes — Infrastructure Mattered More Than the App
-
Laimark – 8B LLM That Self-Improves on Consumer GPUs
-
115 TOPS in 0.67L: CHUWI AuBox X Packs On-Device AI Power Into a Palm-Sized Mini PC
-
Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw
-
LLM Personalization Breaks Down in High-Stakes Finance
-
Google's Gemma 4: The Most Practical Local LLM Despite Not Being The Smartest
-
Self-Hosted LLMs Transform Personal Knowledge Management Systems
-
MiniMax M2.7 GGUF Investigation Reveals NaN Issues Affecting 21-38% of Hugging Face Conversions
-
DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max
-
OpenClaw at 250K GitHub Stars: Community Explores Practical Limitations Beyond News Digests
-
MiniMax M2.7 Achieves SOTA Performance Under 64GB on Mac with TQ Quantization
-
Speculative Decoding Achieves 29% Speed Boost for Gemma-4 31B
-
Show HN: SkillCompass – Open-Source Quality Evaluator for Your AI Skills
-
MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware
-
Running Same Prompts Through Claude and Local LLM Revealed Unexpected Results
-
Self-Hosted LLM Elevates Personal Knowledge Management Systems to New Levels
-
Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp
-
Google Gemma 4 Delivers Exceptional Speed and Accuracy for Local Inference
-
Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B
-
Google's Gemini Nano 4 Offers Faster, Smarter Local Inference Capabilities
-
GLM 5.1 Dominates Agentic Benchmarks, Outperforming Most Models at 1/3 Opus Cost
-
Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark
-
Warp Decode vs. vLLM's Triton Kernel: Performance Crossover Analysis
-
Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs
-
Local Small LLMs Match Enterprise Model Performance on Vulnerability Detection
-
I Replaced My Local LLM With a Model Half Its Size and Got Better Results — and It Wasn't About the Parameters
-
Mano-P: Open-Source On-Device GUI Agent, #1 on OSWorld Benchmark
-
Gemma 4 GGUF Models Updated with Critical Quantization Fixes
-
Show HN: Willitrun – Check if Any ML Model Runs on Any Device (Benchmark-Backed)
-
MemPalace, the Highest-Scoring AI Memory System Ever Benchmarked
-
Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool
-
Gemma 4 Achieves Top Multilingual Performance Across European Languages
-
Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops
-
Real-time Multimodal AI on Apple Silicon: Gemma E2B Demo Shows Practical Edge Deployment
-
Gemma 4 31B Achieves Exceptional Performance on Local Hardware
-
Unpaved: Audit Toolkit for AI Developer Tool Bias in Global South Contexts
-
Qwen 3.5 397B Reduced to 35% Parameters With Usable Quality on 96GB GPU
-
Qwen 3.6 Free Model Available via OpenRouter
-
Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models
-
Apple Research Shows Self-Distillation Significantly Improves Local Code Generation
-
YC-Bench: GLM-5 Matches Claude Opus 4.6 at 11× Lower Cost
-
GPUs vs. TPUs: Decoding the Powerhouses of AI
-
Gemma 4 31B Outperforms GLM 5.1 in Real-World Testing
-
Gemma 4 Shows Strong Reasoning Performance with Thinking Tokens
-
Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon
-
Gemma 4 Makes Local AI Agents Practical
-
Qwen 3.6-Plus Released
-
Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance
-
Qwen 3.5-27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3
-
Llama.cpp Merging TurboQuant Lite (attn-rot) with Major Performance Gains
-
ByteShape Releases Qwen 3.5 9B Quantisations with Hardware-Matched Tuning Guide
-
PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
-
Does RAG Help AI Coding Tools?
-
Qwen3 512k Context via TurboQuant on Mac mini
-
M5 Max Delivers 1.7x Faster Inference Than M3 Max on Qwen 3.5 Models
-
Forensic Beats Mem0 with 90.1% on LOCOMO Benchmark
-
Reverse-Engineering the Apollo 11 Code with AI
-
TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice
-
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config
-
Comparison of Two Frameworks: 40% Token Efficiency Improvement
-
Homelab Consolidation: Replacing 3 Models with Single 122B MoE Model on AMD Ryzen AI MAX+
-
Real-World Benchmark: DeepSeek-V3 Matches Claude Sonnet on Routine Coding Tasks
-
Llama.cpp Benchmark: RTX 5090 vs Enterprise Systems Compared
-
South Korea Science Ministry Seeks Five On-Device AI Pilot Projects for Public Services
-
KV Cache Quantization Levels Benchmarked on SWE-bench: Practical Trade-offs for Local Inference
-
FlashAttention-4 Delivers 2.7x Faster Inference with 1613 TFLOPs/s on Blackwell GPUs
-
MiniMax M2.7 Model to Be Released as Open Weights
-
Llama.cpp ROCm 7 vs Vulkan Performance Benchmarks on AMD Mi50
-
Llama 8B Matches 70B Performance on Multi-Hop QA Using Structured Prompting
-
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
-
Qwen 3.5 397B emerges as top-performing local coding model
-
Multi-Token Prediction support coming to MLX-LM for Qwen 3.5
-
Apple M5 Max 128GB real-world performance benchmarks for local inference
-
DeepSeek R1 RTX 4090 vs Apple M3 Max: Benchmark & Performance Guide
-
Build a $1,500 AI Server with DeepSeek-R1 on RTX 4090
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
My Dinner with AI
-
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
-
I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Two Local Models Prove Competitive Enough to Replace ChatGPT, Gemini, and Copilot
-
Achieving 2000 Tokens Per Second with QWEN 3.5 27B on RTX-5090
-
P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM
-
Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM
-
Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
-
Apple M5 Max 128GB Benchmark Results for Local LLM Inference
-
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
-
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks
-
M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
-
Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
FretBench – Testing 14 LLMs on Reading Guitar Tabs Reveals Performance Gaps
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications
-
AI Agent Reliability Tracker
-
Qwen3-Coder-Next Achieves Top Ranking on SWE-bench at Pass@5
-
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Qwen 3.5 vs Qwen 3 Benchmark Analysis: Generational Performance Improvements Visualized
-
Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
-
Local LLM Performance Improvements: A Year of Progress Since DeepSeek R1 Moment
-
Google Research Finds Longer Chain-of-Thought Correlates Negatively With Accuracy
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks
-
Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
The ML.energy Leaderboard
-
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
-
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
-
Qwen 3.5 Underperforms on Hard Coding Tasks—APEX Benchmark Analysis
-
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
Show HN: 100% LLM Accuracy–No Fine-Tuning, JSON Only
-
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
-
The Real AI Competition Is Closed-Source vs Open-Source, Not America vs China
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Which Web Frameworks Are Most Token-Efficient for AI Agents?
-
How Do You Know Which SKILL.md Is Good?
-
Qwen3-Coder-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
-
GLM-5 Becomes Top Open-Weights Model on Extended NYT Connections Benchmark
-
How Slow Local LLMs Are on My Framework 13 AMD Strix Point
-
Asus ExpertBook B3 G2 with 50 TOPS AI Sets New Enterprise Standard
-
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
-
I Run Local LLMs in One of the World's Priciest Energy Markets, and I Can Barely Tell
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System
-
The Path to Ubiquitous AI (17k tokens/sec)
-
Free ASIC-Accelerated Llama 3.1 8B Inference at 16,000 Tokens/Second
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
GPT4All Replaces Ollama On Mac After Quick Trial
-
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
-
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
-
Real-World Coding Benchmark Tests LLMs on 65 Production Codebase Tasks
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
-
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
-
Running Your Own AI Assistant for €19/Month: Complete Self-Hosting Guide
-
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
-
OpenClaw with vLLM Running for Free on AMD Developer Cloud
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
-
Use Recursive Language Models to address huge contexts for local LLM