Tagged "qwen"
-
Using a Local LLM as a Zero-Shot Classifier
-
DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max
-
Fine-Tuned Qwen3.5-0.8B for OCR Outperforms Previous 2B Release
-
Qwen 3.5 Small – On-Device Multimodal Models Released
-
Copilot Rate-Limiting Issues Highlight Cloud AI Service Limitations
-
Google Gemma 4 Delivers Exceptional Speed and Accuracy for Local Inference
-
DFlash Speculative Decoding Achieves 3.3x Speedup on Apple Silicon
-
Intel Arc Pro B70 32GB Achieves 12 Tokens/Sec on Qwen 3.5-27B
-
Gemma 4 31B vs Qwen 3.5 27B: Comprehensive Long Context Benchmark
-
Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs
-
Run Qwen3.5 on an Old Laptop: A Lightweight Local Agentic AI Setup Guide
-
I Replaced My Local LLM With a Model Half Its Size and Got Better Results — and It Wasn't About the Parameters
-
Gemini-CLI, Llama.cpp, and Qwen3.5 Running on NVIDIA Jetson TK1
-
Quantization Strategy Comparison: Balancing Quality and Speed on Consumer Laptops
-
Qwen 3.5 397B Reduced to 35% Parameters With Usable Quality on 96GB GPU
-
Qwen 3.6 Free Model Available via OpenRouter
-
Gemma 4 31B Achieves Third Place on FoodTruck Bench, Beating Larger Models
-
Apple Research Shows Self-Distillation Significantly Improves Local Code Generation
-
Google Gemma 4 Released with GGUF Quantizations
-
Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon
-
TurboQuant Enables Qwen 3.5-27B on 16GB Consumer GPUs
-
Qwen 3.6-Plus Released
-
Qwen 3.5-27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3
-
Claude Code Source Leaked: Community Extracts Multi-Agent Orchestration Framework
-
ByteShape Releases Qwen 3.5 9B Quantisations with Hardware-Matched Tuning Guide
-
TurboQuant KV Cache Compression Achieves 22.8% Faster Decoding at 32K Context
-
M5 Max Delivers 1.7x Faster Inference Than M3 Max on Qwen 3.5 Models
-
GLM-5.1 Model Weights Launching Early April for Local Deployment
-
Coding Implementation to Run Qwen3.5 Reasoning Models Distilled With Claude-Style Thinking Using GGUF and 4-Bit Quantization
-
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config
-
Homelab Consolidation: Replacing 3 Models with Single 122B MoE Model on AMD Ryzen AI MAX+
-
Intel Launches Arc Pro B70/B65 with 32GB VRAM for Local AI Inference
-
Qwen3.5-27B Emerges as Sweet Spot for Single-GPU Local Deployment
-
Chinese LLM Ecosystem Landscape: ByteDance Doubao, Alibaba, and Open-Source Competition
-
Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration
-
Alibaba Commits to Continuous Open-Sourcing of Qwen and Wan Models
-
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
-
Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models
-
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
-
Qwen 3.5 397B Emerges as Top-Performing Local Coding Model
-
Multi-Token Prediction Support Coming to MLX-LM for Qwen 3.5
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
Qwen 3.5 4B Outperforms Nvidia Nemotron 3 4B in Local Benchmarks
-
Local Qwen Models Master Browser Automation Through Iterative Replanning
-
Practical Fix for Qwen 3.5 Overthinking in llama.cpp
-
Qwen 3.5 122B Demonstrates Exceptional Reasoning for Local Deployment
-
Achieving 2000 Tokens Per Second with Qwen 3.5 27B on RTX 5090
-
Fine-Tuned 14B Model Outperforms Claude Opus 4.6 on Ada Code Generation
-
Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM
-
Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models
-
Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results
-
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming
-
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks
-
Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
-
Qwen 3.5 Derestricted Model Available for Local Deployment
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications
-
Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide
-
Qwen3-Coder-Next Achieves Top Ranking on SWE-bench at Pass@5
-
Open WebUI Adds Native Terminal Tool Calling with Qwen3.5 35B Support
-
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support
-
Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
-
Real-World Qwen 3.5 9B Agent Performance on M1 Pro Validates Edge Deployment
-
Qwen 3.5-4B Generates Fully Functional OS in Single Prompt
-
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard
-
Qwen 3.5-27B Q4 Quantization Comparison and Analysis
-
Quantifying Cost Savings with Local LLMs for Development
-
Apple M5 Pro and M5 Max: 4× Faster LLM Processing
-
Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
-
Qwen 3.5 0.8B Successfully Deployed on 7-Year-Old Samsung S10E Using llama.cpp
-
Qwen 3.5 0.8B Running in Browser with WebGPU via Transformers.js
-
Qwen 3.5 vs Qwen 3 Benchmark Analysis: Generational Performance Improvements Visualized
-
Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Qwen 3.5 27B Achieves 100+ Tokens/s Decode on Dual RTX 3090s with 170K Context
-
Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
-
Jan Releases Code-Tuned 4B Model for Efficient Local Code Generation and Development Tasks
-
Switch Qwen 3.5 Thinking Mode On/Off Without Model Reload Using setParamsByID
-
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
Qwen3.5-35B RTX 5080 Experiments Confirm KV q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second
-
Qwen 3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Quantisation Benchmarks
-
Qwen 3.5-35B RTX 5080 Benchmarks Confirm KV Q8_0 as Free Lunch, Q4_K_M Remains Optimal
-
Qwen 3.5-27B Demonstrates Exceptional Performance with Thoughtful Prompt Engineering
-
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
-
Qwen 3.5 Underperforms on Hard Coding Tasks—APEX Benchmark Analysis
-
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup
-
Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
Qwen3.5-35B-A3B Emerges as Game-Changer for Agentic Coding Tasks
-
Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
-
Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
-
AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
-
Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Qwen Coder Next Shows Specialized Agent Performance