Tagged "benchmark-report"
- LLMs Consume 5.4x Less Mobile Energy Than Ad-Supported Web Search
- Speculative Decoding Achieves 29% Speed Boost for Gemma 4 31B
- MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware
- Intel Arc Pro B70 32GB Achieves 12 Tokens/sec on Qwen 3.5 27B
- Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark
- Warp Decode vs. vLLM's Triton Kernel: Performance Crossover Analysis
- Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs
- Mano-P: Open-Source On-Device GUI Agent, #1 on OSWorld Benchmark
- Show HN: Willitrun – Check if Any ML Model Runs on Any Device (Benchmark-Backed)
- Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool
- Gemma 4 Achieves Top Multilingual Performance Across European Languages
- Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops
- Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models
- YC-Bench: GLM-5 Matches Claude Opus 4.6 at 11× Lower Cost
- Gemma 4 31B Outperforms GLM 5.1 in Real-World Testing
- April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini
- Qwen 3.5 27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3
- M5 Max Delivers 1.7x Faster Inference Than M3 Max on Qwen 3.5 Models
- Forensic Beats Mem0 with 90.1% on LOCOMO Benchmark
- TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice
- Qwen 3.5 27B Achieves 1.1M Tokens/sec on B200 GPUs with Optimized vLLM Config
- Comparison of Two Frameworks: 40% Token Efficiency Improvement
- Real-World Benchmark: DeepSeek-V3 Matches Claude Sonnet on Routine Coding Tasks
- Llama.cpp Benchmark: RTX 5090 vs Enterprise Systems Compared