Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs

1 min read

A practitioner has published verifiable optimization results for running Qwen 3.5 122B locally, achieving 198 tokens per second on a dual RTX PRO 6000 Blackwell setup. The published methodology includes raw JSON benchmarks, launch commands, and detailed hardware specifications, making it a reproducible reference point for anyone considering workstation-class GPU deployments.
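Raw JSON benchmarks like these are straightforward to sanity-check yourself. A minimal sketch, assuming a record with generated-token and wall-clock fields (the actual field names in the shared JSON may differ):

```python
import json

def tokens_per_second(record: dict) -> float:
    # Throughput = tokens generated / wall-clock generation time.
    return record["generated_tokens"] / record["generation_time_s"]

# Hypothetical benchmark record for illustration only; not taken
# from the practitioner's actual published JSON.
raw = '{"generated_tokens": 1980, "generation_time_s": 10.0}'
print(f"{tokens_per_second(json.loads(raw)):.0f} tokens/sec")  # 198 tokens/sec
```

Computing throughput from raw counts rather than trusting a reported average also makes it easy to compare runs across different batch sizes and prompt lengths.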

This is significant for local LLM practitioners because it demonstrates that 122B-parameter models can reach production-ready inference speeds without enterprise-level hardware investment. The workstation-class Blackwell cards offer a cost-effective alternative to data center GPUs while sustaining competitive throughput, and local inference avoids the network latency of cloud APIs.

The shared benchmark data helps practitioners make informed purchasing decisions and optimize their inference pipelines for models like Qwen that are increasingly favored in the local LLM community.


Source: r/LocalLLaMA · Relevance: 9/10