Tagged "qwen"

Qwen3.5-27B Emerges as Sweet Spot for Single-GPU Local Deployment 24 March 2026
Chinese LLM Ecosystem Landscape: ByteDance Doubao, Alibaba, and Open-Source Competition 24 March 2026
Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration 23 March 2026
Alibaba Commits to Continuous Open-Sourcing of Qwen and Wan Models 23 March 2026
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations 22 March 2026
Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models 22 March 2026
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B 22 March 2026
Qwen 3.5 397B emerges as top-performing local coding model 21 March 2026
Multi-Token Prediction support coming to MLX-LM for Qwen 3.5 21 March 2026
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options 20 March 2026
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models 20 March 2026
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor 20 March 2026
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks 17 March 2026
Local Qwen Models Master Browser Automation Through Iterative Replanning 17 March 2026
Practical Fix for Qwen 3.5 Overthinking in llama.cpp 16 March 2026
Qwen 3.5 122B Demonstrates Exceptional Reasoning for Local Deployment 16 March 2026
Achieving 2000 Tokens Per Second with Qwen 3.5 27B on RTX 5090 14 March 2026
Fine-Tuned 14B Model Outperforms Claude Opus 4.6 on Ada Code Generation 14 March 2026
Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM 13 March 2026
Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models 13 March 2026
Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results 11 March 2026
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance 11 March 2026
Qwen 3.5-35B Uncensored GGUF Models Now Available 11 March 2026
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming 10 March 2026
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks 10 March 2026
Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2 9 March 2026
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support 9 March 2026
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models 9 March 2026
Qwen 3.5 Derestricted Model Available for Local Deployment 9 March 2026
Qwen 3.5 27B Achieves Strong Local Inference Performance 8 March 2026
Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications 8 March 2026
Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide 8 March 2026
Qwen3-Coder-Next Achieves Top Ranking on SWE-bench at Pass@5 7 March 2026
Open WebUI Adds Native Terminal Tool Calling with Qwen3.5 35B Support 7 March 2026
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support 7 March 2026
Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs 6 March 2026
Real-World Qwen 3.5 9B Agent Performance on M1 Pro Validates Edge Deployment 6 March 2026
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support 6 March 2026
Qwen 3.5-4B Generates Fully Functional OS in Single Prompt 4 March 2026
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard 4 March 2026
Qwen 3.5-27B Q4 Quantization Comparison and Analysis 4 March 2026
Quantifying Cost Savings with Local LLMs for Development 4 March 2026
Apple M5 Pro and M5 Max: 4× Faster LLM Processing 4 March 2026
Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference 3 March 2026
Qwen 3.5 0.8B Successfully Deployed on 7-Year-Old Samsung S10E Using llama.cpp 3 March 2026
Qwen 3.5 0.8B Running in Browser with WebGPU via Transformers.js 3 March 2026
Qwen 3.5 vs Qwen 3 Benchmark Analysis: Generational Performance Improvements Visualized 3 March 2026
Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing 3 March 2026
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17 3 March 2026
Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context 2 March 2026
Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference 2 March 2026
Jan Releases Code-Tuned 4B Model for Efficient Local Code Generation and Development Tasks 2 March 2026
Switch Qwen 3.5 Thinking Mode On/Off Without Model Reload Using setParamsByID 1 March 2026
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models 1 March 2026
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels 28 February 2026
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal 28 February 2026
Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second 28 February 2026
Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks 28 February 2026
Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal 28 February 2026
Qwen 3.5-27B Demonstrates Exceptional Performance with Thoughtful Prompt Engineering 28 February 2026
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti 26 February 2026
Qwen 3.5 Underperforms on Hard Coding Tasks—APEX Benchmark Analysis 26 February 2026
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup 26 February 2026
Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization 25 February 2026
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers 25 February 2026
Qwen3.5-35B-A3B Emerges as Game-Changer for Agentic Coding Tasks 25 February 2026
Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment 25 February 2026
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels 21 February 2026
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro 20 February 2026
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows 19 February 2026
AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs 18 February 2026
Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup 17 February 2026
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation 17 February 2026
Qwen Coder Next Shows Specialized Agent Performance 12 February 2026