Qwen3.5-35B Successfully Runs on Raspberry Pi 5 at 3+ Tokens/Second

r/LocalLLaMA

Running a 35-billion-parameter model on a Raspberry Pi 5 would have been impractical just months ago, so successful execution at 3+ tokens/second demonstrates the dramatic progress in quantisation and runtime optimisation. This achievement shows that cutting-edge model quality is no longer confined to high-end GPUs; resource-constrained edge devices can now run sophisticated models at acceptable speeds.

The practical implications are significant for deployed applications: IoT devices, embedded systems, and resource-limited environments can now leverage state-of-the-art language models locally. This eliminates API dependencies, improves privacy and latency, and enables offline operation—critical requirements for many production scenarios.

Successfully running Qwen3.5-35B on both the 16GB and 8GB Raspberry Pi 5 variants shows that the model can be squeezed into very different memory budgets, providing a reference point for developers planning deployments on similar ARM-based edge hardware.
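A back-of-envelope estimate (my own arithmetic, not from the post) shows why quantisation is the enabling factor here: weight memory scales linearly with bits per parameter, and only very low bit-widths bring a 35B model anywhere near Pi-class RAM.

```python
# Back-of-envelope weight-memory estimate for an N-parameter model.
# Ignores KV cache, activations, and runtime overhead, so real usage is
# higher; memory-mapped weights (as used by some runtimes) can also keep
# resident RAM below the raw file size.

def weight_gib(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB at a given quantisation bit-width."""
    return params * bits_per_param / 8 / 2**30

PARAMS = 35e9  # Qwen3.5-35B
for bits in (8, 4, 2):
    print(f"{bits}-bit: ~{weight_gib(PARAMS, bits):.1f} GiB")
```

At 4 bits the weights alone come to roughly 16.3 GiB, which suggests the 8GB board relies on even lower effective bit-widths or on memory-mapped rather than fully resident weights; the post does not say which configuration was used.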

Read the full article on r/LocalLLaMA.


Source: r/LocalLLaMA · Relevance: 8/10