Tagged "benchmarks"

A Cinematic Landing-Page Hero for 80 Cents (GPT Image 2 and Veo 3.1) 2 June 2026
Microsoft and Nvidia to Unveil First Windows PCs with Nvidia CPUs and AI Capabilities 31 May 2026
Liquid AI Launches Edge-Focused LFM2.5 Model to Power On-Device AI Agents 31 May 2026
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request 29 May 2026
DeepSeek's Flagship V4 Pro Model Drops to 75% Lower Pricing, Increasing Competitive Pressure on Local Inference Economics 26 May 2026
vLLM vs Ollama 2026: Performance Benchmark Reveals 9x Throughput Gap 25 May 2026
Show HN: I Built a Debugging Challenge for the AI Coding Age 25 May 2026
Qualcomm's AI-Device Strategy Reflects Growing Market Momentum in On-Device Intelligence 24 May 2026
Redditor Successfully Runs 1 Trillion Parameter LLM Using Cheap Intel Optane DIMMs 24 May 2026
New 8B Local LLM Design Marks Biggest Shift Since DeepSeek R1 23 May 2026
M5 Max MacBook Runs Local Large Language Models Efficiently 23 May 2026
110 Tokens/Second on RTX 4070 Super with Qwen 3.6 35B 22 May 2026
A/B Tested Gemini 3.1 Pro vs. Claude Opus 4.6 – Usage Quota and Quality Comparison 22 May 2026
Benchmarking a Portable AI Workstation: Lenovo ThinkPad P16 Gen 3, Part 2 21 May 2026
Hardware LLM Taalas Reaches >14,000 TPS on Llama 3.1 8B 21 May 2026
Google's Offline AI App Gets Three Major Feature Upgrades 20 May 2026
Bito's AI Architect Improves Claude Opus Task Success Rate by 35% 19 May 2026
HP's On-Device AI Needs More If It Is Going to Compete With Copilot 17 May 2026
Google Limits Gemini Intelligence to New Flagships—Hardware Requirements for Local Deployment 17 May 2026
AI/ML Benchmark Tool for Local LLM Inference and XGBoost Training 16 May 2026
Show HN: Find the best local LLM for your hardware, ranked by benchmarks 15 May 2026
ROCm 7.2.3 Delivers Performance Improvements Over 7.0.0 on AMD Radeon AI PRO 15 May 2026
LLM temporal and causal reasoning research 15 May 2026
Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training 14 May 2026
Claude Opus 4.7 System Prompt Leaks Raise Local Deployment Questions 14 May 2026
Researchers Report AI Breaking Every Benchmark for Autonomous Cyber Capability 14 May 2026
Legacy System Analysis with AI Reveals Modern Architecture Under the Hood 14 May 2026
Running a Local LLM on a 12-Year-Old Raspberry Pi 13 May 2026
BT Explainer: Google's Gemma 4 Could Put Powerful AI on Your Phone and Laptop 13 May 2026
Gemma 4 Replaces Entire Local LLM Stack for Many Practitioners 12 May 2026
$200 NVIDIA V100 Server GPU Mod Beats RTX 3060 in Local LLM Test 11 May 2026
Small On-Device AI Model Beats Claude Sonnet 4.5 and GPT-5 10 May 2026
One LM Studio Setting Makes Local LLMs Competitive With Cloud Models 10 May 2026
Anthropic Develops Tool to Detect When Claude Recognizes It's Being Tested 9 May 2026
Local LLM Rewrites Resume Better Than ChatGPT, and It's Not Even Close 8 May 2026
Building a Local LLM News Brief Taught Me the Real Problem Wasn't the Sources, It Was the Apps 7 May 2026
Microsoft VibeVoice C++ Port Enables Local Voice AI on CPU and GPU Without Python 6 May 2026
Improving Code Quality with Local Claude and Codex Models 6 May 2026
A 49-Line Physics Classifier That Beats kNN on 76% of Benchmarks 5 May 2026
Supercharging LLM Inference on Google TPUs: Achieving 3X Speedups With Diffusion-Style Speculative Decoding 5 May 2026
Eval Skills for AI Agents 4 May 2026
Control AI Risk with Pre-Built Frameworks and Ready-to-Run Evaluations 4 May 2026
Running a Serious AI Model on a Consumer GPU Just Got Easier and That Matters More Than the Benchmark 3 May 2026
NIST's CAISI Evaluation of DeepSeek V4 Pro Finds It On Par with GPT-5 3 May 2026
Local AI Just Got Easier on Windows and the Implications Go Beyond the Benchmark 3 May 2026
PFlash Claims 10x Prefill Speedup Over llama.cpp 2 May 2026
Local LLMs Work Best When You're Not Loyal to Just One 2 May 2026
Study: AI Models That Consider User Feelings Are More Likely to Make Errors 2 May 2026
AI Coding Tools Are Silently Disagreeing with Each Other 2 May 2026
Xmemory: Benchmarking Structured AI Memory Against RAG and Hybrid RAG 1 May 2026
96.8% of MCP Tool Descriptions Don't Warn the Agent About Destructive Behaviour 1 May 2026
Self-Hosted LLMs in Production: Real-World Limits and Practical Lessons 30 April 2026
Private LLM vs. ChatGPT: When It Makes Sense for Business 30 April 2026
Running Capable Local LLMs Without Expensive GPU Hardware 30 April 2026
IBM Introduces Granite 4.1 Family of Models for Local Deployment 30 April 2026
How Much "Brain Damage" Can an LLM Tolerate? 30 April 2026
Estimating Black-Box LLM Parameter Counts via Factual Capacity 30 April 2026
Intel N150 Mini PC Runs Local LLM for Home Assistant 29 April 2026
Hipfire: A Rust-Native AMD Inference Engine That Outperforms llama.cpp 28 April 2026
Linux Crushes Windows on llama.cpp Inference by Double Digits 27 April 2026
The New Linux Kernel AI Bot Uncovering Bugs Is A Local LLM On Framework Desktop + AMD Ryzen AI Max 27 April 2026
LLMs Consume 5.4x Less Mobile Energy Than Ad-Supported Web Search 25 April 2026
Fixing Hallucination in LLM Prediction With Only One 48GB GPU 25 April 2026
I Replaced My Local LLM With a Model Half Its Size and Got Better Results 24 April 2026
Show HN: We built an OCR server that can process 270 dense images/s on a 5090 23 April 2026
I Cancelled Codex Two Months Ago. Opus 4.7 Brought Me Back 23 April 2026
Gemma 4 Just Replaced My Whole Local LLM Stack 21 April 2026
Claude vs Local LLM: Real-World Prompt Comparison Reveals Trade-offs 20 April 2026
Unweight: Lossless MLP Weight Compression for LLM Inference 18 April 2026
We Built a Local Model Arena in 30 Minutes — Infrastructure Mattered More Than the App 18 April 2026
Laimark – 8B LLM That Self-Improves on Consumer GPUs 18 April 2026
115 TOPS in 0.67L: CHUWI AuBox X Packs On-Device AI Power Into a Palm-Sized Mini PC 18 April 2026
Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw 18 April 2026
LLM Personalization Breaks Down in High-Stakes Finance 16 April 2026
Google's Gemma 4: The Most Practical Local LLM Despite Not Being The Smartest 16 April 2026
Self-Hosted LLMs Transform Personal Knowledge Management Systems 15 April 2026
MiniMax M2.7 GGUF Investigation Reveals NaN Issues Affecting 21-38% of Hugging Face Conversions 15 April 2026
DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max 15 April 2026
OpenClaw at 250K GitHub Stars: Community Explores Practical Limitations Beyond News Digests 14 April 2026
MiniMax M2.7 Achieves SOTA Performance Under 64GB on Mac with TQ Quantization 14 April 2026
Speculative Decoding Achieves 29% Speed Boost for Gemma-4 31B 13 April 2026
Show HN: SkillCompass – Open-Source Quality Evaluator for Your AI Skills 13 April 2026
MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware 13 April 2026
Running Same Prompts Through Claude and Local LLM Revealed Unexpected Results 13 April 2026
Self-Hosted LLM Elevates Personal Knowledge Management Systems to New Levels 12 April 2026
Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp 12 April 2026
Google Gemma 4 Delivers Exceptional Speed and Accuracy for Local Inference 12 April 2026
Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B 11 April 2026
Google's Gemini Nano 4 Offers Faster, Smarter Local Inference Capabilities 11 April 2026
GLM 5.1 Dominates Agentic Benchmarks, Outperforming Most Models at 1/3 Opus Cost 11 April 2026
Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark 11 April 2026
Warp Decode vs. vLLM's Triton Kernel: Performance Crossover Analysis 10 April 2026
Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs 10 April 2026
Local Small LLMs Match Enterprise Model Performance on Vulnerability Detection 10 April 2026
I Replaced My Local LLM With a Model Half Its Size and Got Better Results — and It Wasn't About the Parameters 9 April 2026
Mano-P: Open-Source On-Device GUI Agent, #1 on OSWorld Benchmark 9 April 2026
Gemma 4 GGUF Models Updated with Critical Quantization Fixes 9 April 2026
Show HN: Willitrun – Check if Any ML Model Runs on Any Device (Benchmark-Backed) 7 April 2026
MemPalace, the Highest-Scoring AI Memory System Ever Benchmarked 7 April 2026
Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool 7 April 2026
Gemma 4 Achieves Top Multilingual Performance Across European Languages 7 April 2026
Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops 6 April 2026
Real-time Multimodal AI on Apple Silicon: Gemma E2B Demo Shows Practical Edge Deployment 6 April 2026
Gemma 4 31B Achieves Exceptional Performance on Local Hardware 6 April 2026
Unpaved: Audit Toolkit for AI Developer Tool Bias in Global South Contexts 5 April 2026
Qwen 3.5 397B Reduced to 35% Parameters With Usable Quality on 96GB GPU 5 April 2026
Qwen 3.6 Free Model Available via OpenRouter 5 April 2026
Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models 5 April 2026
Apple Research Shows Self-Distillation Significantly Improves Local Code Generation 5 April 2026
YC-Bench: GLM-5 Matches Claude Opus 4.6 at 11× Lower Cost 4 April 2026
GPUs vs. TPUs: Decoding the Powerhouses of AI 4 April 2026
Gemma 4 31B Outperforms GLM 5.1 in Real-World Testing 4 April 2026
Gemma 4 Shows Strong Reasoning Performance with Thinking Tokens 3 April 2026
Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon 3 April 2026
Gemma 4 Makes Local AI Agents Practical 3 April 2026
Qwen 3.6-Plus Released 2 April 2026
Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance 2 April 2026
Qwen 3.5-27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3 1 April 2026
Llama.cpp Merging TurboQuant Lite (attn-rot) with Major Performance Gains 1 April 2026
ByteShape Releases Qwen 3.5 9B Quantisations with Hardware-Matched Tuning Guide 1 April 2026
PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs 1 April 2026
Does RAG Help AI Coding Tools? 31 March 2026
Qwen3 512k Context via TurboQuant on Mac mini 28 March 2026
M5 Max Delivers 1.7x Faster Inference Than M3 Max on Qwen 3.5 Models 28 March 2026
Forensic Beats Mem0 with 90.1% on LOCOMO Benchmark 28 March 2026
Reverse-Engineering the Apollo 11 Code with AI 28 March 2026
TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice 27 March 2026
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config 27 March 2026
Comparison of Two Frameworks: 40% Token Efficiency Improvement 27 March 2026
Homelab Consolidation: Replacing 3 Models with Single 122B MoE Model on AMD Ryzen AI MAX+ 27 March 2026
Real-World Benchmark: DeepSeek-V3 Matches Claude Sonnet on Routine Coding Tasks 26 March 2026
Llama.cpp Benchmark: RTX 5090 vs Enterprise Systems Compared 25 March 2026
South Korea Science Ministry Seeks Five On-Device AI Pilot Projects for Public Services 24 March 2026
KV Cache Quantization Levels Benchmarked on SWE-bench: Practical Trade-offs for Local Inference 24 March 2026
FlashAttention-4 Delivers 2.7x Faster Inference with 1613 TFLOPs/s on Blackwell GPUs 24 March 2026
MiniMax M2.7 Model to Be Released as Open Weights 23 March 2026
Llama.cpp ROCm 7 vs Vulkan Performance Benchmarks on AMD Mi50 23 March 2026
Llama 8B Matches 70B Performance on Multi-Hop QA Using Structured Prompting 22 March 2026
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B 22 March 2026
Qwen 3.5 397B emerges as top-performing local coding model 21 March 2026
Multi-Token Prediction support coming to MLX-LM for Qwen 3.5 21 March 2026
Apple M5 Max 128GB real-world performance benchmarks for local inference 21 March 2026
DeepSeek R1 RTX 4090 vs Apple M3 Max: Benchmark & Performance Guide 21 March 2026
Build a $1,500 AI Server with DeepSeek-R1 on RTX 4090 21 March 2026
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor 20 March 2026
My Dinner with AI 18 March 2026
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection 18 March 2026
I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me 17 March 2026
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks 17 March 2026
Two Local Models Prove Competitive Enough to Replace ChatGPT, Gemini, and Copilot 15 March 2026
Achieving 2000 Tokens Per Second with QWEN 3.5 27B on RTX-5090 14 March 2026
P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM 14 March 2026
Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM 13 March 2026
Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype 12 March 2026
Apple M5 Max 128GB Benchmark Results for Local LLM Inference 12 March 2026
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance 11 March 2026
Qwen 3.5-35B Uncensored GGUF Models Now Available 11 March 2026
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026? 10 March 2026
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks 10 March 2026
M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference 10 March 2026
Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2 9 March 2026
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models 9 March 2026
FretBench – Testing 14 LLMs on Reading Guitar Tabs Reveals Performance Gaps 9 March 2026
Qwen 3.5 27B Achieves Strong Local Inference Performance 8 March 2026
Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications 8 March 2026
AI Agent Reliability Tracker 8 March 2026
Qwen3-Coder-Next Achieves Top Ranking on SWE-bench at Pass@5 7 March 2026
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard 4 March 2026
Qwen 3.5-27B Q4 Quantization Comparison and Analysis 4 March 2026
Qwen 3.5 vs Qwen 3 Benchmark Analysis: Generational Performance Improvements Visualized 3 March 2026
Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested 2 March 2026
Local LLM Performance Improvements: A Year of Progress Since DeepSeek R1 Moment 2 March 2026
Google Research Finds Longer Chain-of-Thought Correlates Negatively With Accuracy 1 March 2026
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels 28 February 2026
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal 28 February 2026
Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks 28 February 2026
Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal 28 February 2026
The ML.energy Leaderboard 28 February 2026
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot 28 February 2026
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti 26 February 2026
Qwen 3.5 Underperforms on Hard Coding Tasks—APEX Benchmark Analysis 26 February 2026
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference 26 February 2026
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers 25 February 2026
Show HN: 100% LLM Accuracy–No Fine-Tuning, JSON Only 25 February 2026
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods 25 February 2026
The Real AI Competition Is Closed-Source vs Open-Source, Not America vs China 24 February 2026
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy 24 February 2026
Which Web Frameworks Are Most Token-Efficient for AI Agents? 23 February 2026
How Do You Know Which SKILL.md Is Good? 23 February 2026
Qwen3-Coder-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio 23 February 2026
GLM-5 Becomes Top Open-Weights Model on Extended NYT Connections Benchmark 23 February 2026
How Slow Local LLMs Are on My Framework 13 AMD Strix Point 22 February 2026
Asus ExpertBook B3 G2 with 50 TOPS AI Sets New Enterprise Standard 22 February 2026
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder 21 February 2026
I Run Local LLMs in One of the World's Priciest Energy Markets, and I Can Barely Tell 21 February 2026
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels 21 February 2026
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro 20 February 2026
Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System 20 February 2026
The Path to Ubiquitous AI (17k tokens/sec) 20 February 2026
Free ASIC-Accelerated Llama 3.1 8B Inference at 16,000 Tokens/Second 20 February 2026
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs 19 February 2026
GPT4All Replaces Ollama On Mac After Quick Trial 19 February 2026
Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference 19 February 2026
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings 18 February 2026
Real-World Coding Benchmark Tests LLMs on 65 Production Codebase Tasks 18 February 2026
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation 17 February 2026
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment 14 February 2026
Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues 13 February 2026
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace 13 February 2026
Running Your Own AI Assistant for €19/Month: Complete Self-Hosting Guide 12 February 2026
Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second 12 February 2026
OpenClaw with vLLM Running for Free on AMD Developer Cloud 12 February 2026
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams 12 February 2026
Use Recursive Language Models to address huge contexts for local LLM 12 February 2026