Tagged "model-comparison"

Tweaking Local Language Model Settings with Ollama 29 May 2026
Show HN: I Built a Debugging Challenge for the AI Coding Age 25 May 2026
A/B Tested Gemini 3.1 Pro vs. Claude Opus 4.6 – Usage Quota and Quality Comparison 22 May 2026
Estimating Black-Box LLM Parameter Counts via Factual Capacity 30 April 2026
I Cancelled Codex Two Months Ago. Opus 4.7 Brought Me Back 23 April 2026
Google's Gemma 4: The Most Practical Local LLM Despite Not Being The Smartest 16 April 2026
Noi Enables Running ChatGPT and Claude Side-by-Side on Your Desktop 15 April 2026
Show HN: SkillCompass – Open-Source Quality Evaluator for Your AI Skills 13 April 2026
MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware 13 April 2026
Running Same Prompts Through Claude and Local LLM Revealed Unexpected Results 13 April 2026
Google Gemma 4 Delivers Exceptional Speed and Accuracy for Local Inference 12 April 2026
Google's Gemini Nano 4 Offers Faster, Smarter Local Inference Capabilities 11 April 2026
Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark 11 April 2026
I Replaced My Local LLM With a Model Half Its Size and Got Better Results — and It Wasn't About the Parameters 9 April 2026
YC-Bench: GLM-5 Matches Claude Opus 4.6 at 11× Lower Cost 4 April 2026
Gemma 4 31B Outperforms GLM 5.1 in Real-World Testing 4 April 2026
Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon 3 April 2026
Mistral AI Releases Voxtral: Open-Source TTS Model Beating ElevenLabs on Local Hardware 27 March 2026
Real-World Benchmark: DeepSeek-V3 Matches Claude Sonnet on Routine Coding Tasks 26 March 2026
MiniMax M2.7 Model to Be Released as Open Weights 23 March 2026
Building a Production AI Receptionist: Practical Local LLM Deployment Case Study 23 March 2026
Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models 22 March 2026
Llama 8B Matches 70B Performance on Multi-Hop QA Using Structured Prompting 22 March 2026
Qwen 3.5 397B Emerges as Top-Performing Local Coding Model 21 March 2026
DeepSeek R1 RTX 4090 vs Apple M3 Max: Benchmark & Performance Guide 21 March 2026
Why Self-Hosted LLMs Make Financial and Privacy Sense Over Paid Services 20 March 2026
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection 18 March 2026
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks 17 March 2026
Open-Source LLMs Rapidly Displacing Proprietary SOTA Models 16 March 2026
OpenClaw vs Eigent vs Claude Cowork: Comparing Open-Source AI Collaboration Platforms 15 March 2026
Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment 15 March 2026
Best Local LLM Models 2026: Developer Comparison 14 March 2026
Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM 13 March 2026
Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs 12 March 2026
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks 10 March 2026
Community Survey: AI Content Automation Stacks in 2026 10 March 2026
How to Run Your Own Local LLM — 2026 Edition 9 March 2026
FretBench – Testing 14 LLMs on Reading Guitar Tabs Reveals Performance Gaps 9 March 2026
llama-swap Emerges as Superior Alternative to Ollama and LM Studio 6 March 2026
Qwen 3.5-27B Q4 Quantization Comparison and Analysis 4 March 2026
Qwen 3.5 vs Qwen 3 Benchmark Analysis: Generational Performance Improvements Visualized 3 March 2026
Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing 3 March 2026
RAG vs. Skill vs. MCP vs. RLM: Comparing LLM Enhancement Patterns 2 March 2026
Browser Use vs. Claude Computer Use: Comparing Agent Automation Frameworks 2 March 2026
The ML.energy Leaderboard 28 February 2026
LLmFit: Terminal Tool for Right-Sizing LLM Models to Your Hardware 28 February 2026
LLmFit: One-Command Hardware-Aware Model Selection Across 497 Models and 133 Providers 28 February 2026
Extracting 100K Concepts from an 8B LLM 27 February 2026
Qwen 3.5 Underperforms on Hard Coding Tasks—APEX Benchmark Analysis 26 February 2026
LM Studio vs Ollama: Complete Comparison 26 February 2026
No, Local LLMs Can't Replace ChatGPT or Gemini — I Tried 24 February 2026
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder 21 February 2026
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro 20 February 2026
Qwen3 Coder Next 8FP Demonstrates Exceptional Long-Context Performance on 128GB System 20 February 2026
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs 19 February 2026
Real-World Coding Benchmark Tests LLMs on 65 Production Codebase Tasks 18 February 2026
Ask HN: How Do You Debug Multi-Step AI Workflows When the Output Is Wrong? 18 February 2026
Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter 17 February 2026
Switching From Ollama And LM Studio To llama.cpp: A Performance Comparison 14 February 2026
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities 14 February 2026
Switching From Ollama and LM Studio to llama.cpp: Performance Benefits 13 February 2026
I Tried a Claude Code Rival That's Local, Open Source, and Completely Free 12 February 2026
Developer Switches from Ollama and LM Studio to llama.cpp for Better Performance 11 February 2026
Energy-Based Models Compared Against Frontier AI for Sudoku Solving 11 February 2026
Anthropic Releases Claude Opus 4.6 Sabotage Risk Assessment 11 February 2026