Tagged "qwen"

110 Tokens/Second on RTX 4070 Super with Qwen 3.6 35B 22 May 2026
I Stopped Trying to Replace My Cloud LLMs, and Local Models Finally Made Sense 19 May 2026
llama.cpp Adds Multi-Token Prediction, Doubles Qwen 3.6B Throughput for Local Inference 19 May 2026
Qwen3-Coder-Next Local Deployment: Complete Developer Guide for 2026 10 May 2026
Continue.dev for Developers: Complete Local AI Coding Assistant Setup 10 May 2026
Using a Local LLM as a Zero-Shot Classifier 24 April 2026
DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max 15 April 2026
Fine-Tuned Qwen3.5-0.8B for OCR Outperforms Previous 2B Release 14 April 2026
Qwen 3.5 Small – On-Device Multimodal Models Released 14 April 2026
Copilot Rate-Limiting Issues Highlight Cloud AI Service Limitations 14 April 2026
Google Gemma 4 Delivers Exceptional Speed and Accuracy for Local Inference 12 April 2026
DFlash Speculative Decoding Achieves 3.3x Speedup on Apple Silicon 12 April 2026
Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B 11 April 2026
Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark 11 April 2026
Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs 10 April 2026
Run Qwen3.5 on an Old Laptop: A Lightweight Local Agentic AI Setup Guide 9 April 2026
I Replaced My Local LLM With a Model Half Its Size and Got Better Results — and It Wasn't About the Parameters 9 April 2026
Gemini-CLI, Llama.cpp, and Qwen3.5 Running on NVIDIA Jetson TK1 9 April 2026
Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops 6 April 2026
Qwen 3.5 397B Reduced to 35% Parameters With Usable Quality on 96GB GPU 5 April 2026
Qwen 3.6 Free Model Available via OpenRouter 5 April 2026
Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models 5 April 2026
Apple Research Shows Self-Distillation Significantly Improves Local Code Generation 5 April 2026
Google Gemma 4 Released with GGUF Quantizations 3 April 2026
Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon 3 April 2026
TurboQuant Enables Qwen 3.5-27B on 16GB Consumer GPUs 2 April 2026
Qwen 3.6-Plus Released 2 April 2026
Qwen 3.5-27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3 1 April 2026
Claude Code Source Leaked: Community Extracts Multi-Agent Orchestration Framework 1 April 2026
ByteShape Releases Qwen 3.5 9B Quantisations with Hardware-Matched Tuning Guide 1 April 2026
TurboQuant KV Cache Compression Achieves 22.8% Faster Decoding at 32K Context 28 March 2026
M5 Max Delivers 1.7x Faster Inference Than M3 Max on Qwen 3.5 Models 28 March 2026
GLM-5.1 Model Weights Launching Early April for Local Deployment 28 March 2026
Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization 27 March 2026
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config 27 March 2026
Homelab Consolidation: Replacing 3 Models with Single 122B MoE Model on AMD Ryzen AI MAX+ 27 March 2026
Intel Launches Arc Pro B70/B65 with 32GB VRAM for Local AI Inference 26 March 2026
Qwen3.5-27B Emerges as Sweet Spot for Single-GPU Local Deployment 24 March 2026
Chinese LLM Ecosystem Landscape: ByteDance Doubao, Alibaba, and Open-Source Competition 24 March 2026
Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration 23 March 2026
Alibaba Commits to Continuous Open-Sourcing of Qwen and Wan Models 23 March 2026
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations 22 March 2026
Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models 22 March 2026
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B 22 March 2026
Qwen 3.5 397B emerges as top-performing local coding model 21 March 2026
Multi-Token Prediction support coming to MLX-LM for Qwen 3.5 21 March 2026
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options 20 March 2026
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models 20 March 2026
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor 20 March 2026
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks 17 March 2026
Local Qwen Models Master Browser Automation Through Iterative Replanning 17 March 2026
Practical Fix for Qwen 3.5 Overthinking in llama.cpp 16 March 2026
Qwen 3.5 122B Demonstrates Exceptional Reasoning for Local Deployment 16 March 2026
Achieving 2000 Tokens Per Second with QWEN 3.5 27B on RTX-5090 14 March 2026
Fine-Tuned 14B Model Outperforms Claude Opus 4.6 on Ada Code Generation 14 March 2026
Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM 13 March 2026
Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models 13 March 2026
Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results 11 March 2026
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance 11 March 2026
Qwen 3.5-35B Uncensored GGUF Models Now Available 11 March 2026
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming 10 March 2026
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks 10 March 2026
Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2 9 March 2026
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support 9 March 2026
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models 9 March 2026
Qwen 3.5 Derestricted Model Available for Local Deployment 9 March 2026
Qwen 3.5 27B Achieves Strong Local Inference Performance 8 March 2026
Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications 8 March 2026
Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide 8 March 2026
Qwen3-Coder-Next Achieves Top Ranking on SWE-bench at Pass@5 7 March 2026
Open WebUI Adds Native Terminal Tool Calling with Qwen3.5 35B Support 7 March 2026
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support 7 March 2026
Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs 6 March 2026
Real-World Qwen 3.5 9B Agent Performance on M1 Pro Validates Edge Deployment 6 March 2026
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support 6 March 2026
Qwen 3.5-4B Generates Fully Functional OS in Single Prompt 4 March 2026
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard 4 March 2026
Qwen 3.5-27B Q4 Quantization Comparison and Analysis 4 March 2026
Quantifying Cost Savings with Local LLMs for Development 4 March 2026
Apple M5 Pro and M5 Max: 4× Faster LLM Processing 4 March 2026
Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference 3 March 2026
Qwen 3.5 0.8B Successfully Deployed on 7-Year-Old Samsung S10E Using llama.cpp 3 March 2026
Qwen 3.5 0.8B Running in Browser with WebGPU via Transformers.js 3 March 2026
Qwen 3.5 vs Qwen 3 Benchmark Analysis: Generational Performance Improvements Visualized 3 March 2026
Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing 3 March 2026
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17 3 March 2026
Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context 2 March 2026
Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference 2 March 2026
Jan Releases Code-Tuned 4B Model for Efficient Local Code Generation and Development Tasks 2 March 2026
Switch Qwen 3.5 Thinking Mode On/Off Without Model Reload Using setParamsByID 1 March 2026
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models 1 March 2026
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels 28 February 2026
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal 28 February 2026
Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second 28 February 2026
Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks 28 February 2026
Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal 28 February 2026
Qwen 3.5-27B Demonstrates Exceptional Performance with Thoughtful Prompt Engineering 28 February 2026
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti 26 February 2026
Qwen 3.5 Underperforms on Hard Coding Tasks—APEX Benchmark Analysis 26 February 2026
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup 26 February 2026
Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization 25 February 2026
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers 25 February 2026
Qwen3.5-35B-A3B Emerges as Game-Changer for Agentic Coding Tasks 25 February 2026
Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment 25 February 2026
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels 21 February 2026
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro 20 February 2026
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows 19 February 2026
AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs 18 February 2026
Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup 17 February 2026
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation 17 February 2026
Qwen Coder Next Shows Specialized Agent Performance 12 February 2026