Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference


Alibaba's Qwen team has released their latest small model family, Qwen 3.5 Small, specifically engineered for on-device applications. The lineup includes models at 0.8B, 2B, 4B, and 9B parameters, with each size offering significant improvements over previous Qwen generations. Community members report that even the smallest 0.8B variant demonstrates surprisingly capable performance for local inference scenarios.

What makes this release particularly significant for local LLM practitioners is the steady progression of improvements across model sizes. The 4B and 9B variants are drawing praise for research and reasoning tasks, while the 0.8B model is proving viable even on legacy hardware: one user reports running it at 12 tokens/second on a 7-year-old Samsung S10E through llama.cpp. Multimodal capabilities in such small form factors are a major step forward for practical edge deployment, letting developers run vision-language models on consumer hardware without enterprise-grade resources.
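As a rough illustration of the kind of local setup community members describe, a quantized GGUF build of one of these models could be run through llama.cpp's standard CLI. The model filename and prompt below are assumptions for the sketch, not official release artifacts:

```shell
# Hypothetical GGUF filename; actual quantized builds would come from
# the model's repository once converted for llama.cpp.
MODEL=qwen3.5-small-0.8b-q4_k_m.gguf

# llama-cli is llama.cpp's bundled inference binary.
#   -m    path to the GGUF model file
#   -p    prompt text
#   -n    maximum number of tokens to generate
#   -ngl  number of layers to offload to GPU (0 = CPU only,
#         typical for older phones like the S10E mentioned above)
llama-cli -m "$MODEL" -p "Summarize: on-device LLM inference" -n 64 -ngl 0
```

The same GGUF file works unchanged across llama.cpp's desktop, mobile, and server builds, which is what makes a single small model practical across the deployment targets discussed here.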

These models are already being tested across various local inference frameworks, from browser-based WebGPU implementations to mobile deployments, making them immediately relevant for developers seeking to add AI capabilities to edge applications.


Source: r/LocalLLaMA · Relevance: 10/10