Tagged "qwen"
- Qwen3.5-27B Emerges as Sweet Spot for Single-GPU Local Deployment
- Chinese LLM Ecosystem Landscape: ByteDance Doubao, Alibaba, and Open-Source Competition
- Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration
- Alibaba Commits to Continuous Open-Sourcing of Qwen and Wan Models
- Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
- Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models
- ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
- Qwen 3.5 397B emerges as top-performing local coding model
- Multi-Token Prediction support coming to MLX-LM for Qwen 3.5
- Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
- Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
- Local Qwen Models Master Browser Automation Through Iterative Replanning
- Practical Fix for Qwen 3.5 Overthinking in llama.cpp
- Qwen 3.5 122B Demonstrates Exceptional Reasoning for Local Deployment
- Achieving 2000 Tokens Per Second with Qwen 3.5 27B on RTX 5090
- Fine-Tuned 14B Model Outperforms Claude Opus 4.6 on Ada Code Generation
- Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM
- Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models
- Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results
- Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance
- Qwen 3.5-35B Uncensored GGUF Models Now Available
- Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming
- Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks
- Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
- Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
- Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
- Qwen 3.5 Derestricted Model Available for Local Deployment
- Qwen 3.5 27B Achieves Strong Local Inference Performance
- Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications
- Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide
- Qwen3-Coder-Next Achieves Top Ranking on SWE-bench at Pass@5
- Open WebUI Adds Native Terminal Tool Calling with Qwen3.5 35B Support
- Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support
- Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
- Real-World Qwen 3.5 9B Agent Performance on M1 Pro Validates Edge Deployment
- Qwen 3.5-4B Generates Fully Functional OS in Single Prompt
- Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard
- Qwen 3.5-27B Q4 Quantization Comparison and Analysis
- Quantifying Cost Savings with Local LLMs for Development
- Apple M5 Pro and M5 Max: 4× Faster LLM Processing
- Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
- Qwen 3.5 0.8B Successfully Deployed on 7-Year-Old Samsung S10E Using llama.cpp
- Qwen 3.5 0.8B Running in Browser with WebGPU via Transformers.js
- Qwen 3.5 vs Qwen 3 Benchmark Analysis: Generational Performance Improvements Visualized
- Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing
- Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
- Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context
- Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
- Jan Releases Code-Tuned 4B Model for Efficient Local Code Generation and Development Tasks
- Switch Qwen 3.5 Thinking Mode On/Off Without Model Reload Using setParamsByID
- Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
- Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
- Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
- Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second
- Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks
- Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal
- Qwen 3.5-27B Demonstrates Exceptional Performance with Thoughtful Prompt Engineering
- Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
- Qwen 3.5 Underperforms on Hard Coding Tasks—APEX Benchmark Analysis
- Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup
- Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
- Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
- Qwen3.5-35B-A3B Emerges as Game-Changer for Agentic Coding Tasks
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
- SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
- Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
- AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
- Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
- Qwen Coder Next Shows Specialized Agent Performance