DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max


A significant performance win for Mac users: DFlash support in oMLX 0.3.5 RC1 has demonstrated a roughly 2.4x speedup for local Qwen3.5 27B inference on Apple Silicon. Initial benchmarks show generation speed improving from 9 to 22 tokens per second on an M5 Max with 128GB of unified memory, using speculative decoding with a draft model from the Hugging Face ecosystem.

This matters for on-device development workflows, particularly for developers building applications on MacBook Pros who need reasonable inference latency without cloud dependencies. DFlash (dynamic flash attention) combined with draft-model speculation is among the most effective current optimization techniques for local inference, and native MLX support brings it directly to the Apple Silicon ecosystem, where unified memory makes it especially well suited.
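To make the draft-model pattern concrete, here is a minimal sketch of greedy speculative decoding with toy stand-in models. This is not oMLX or DFlash code; the functions `draft_next`, `target_next`, and `speculative_generate` are illustrative names, and the deterministic toy "models" stand in for real LLM forward passes. The key property the sketch demonstrates is that the output is identical to decoding with the target model alone, while the expensive model only verifies cheap proposals:

```python
def target_next(ctx):
    # Toy "target model": next token is (sum of context) % 10.
    return sum(ctx) % 10

def draft_next(ctx):
    # Toy "draft model": agrees with the target except when the context
    # sum is divisible by 7, mimicking a cheap, imperfect approximation.
    s = sum(ctx)
    return (s + 1) % 10 if s % 7 == 0 else s % 10

def speculative_generate(ctx, n_tokens, k=4):
    """Greedy speculative decoding: per round, the draft proposes k
    tokens, the target keeps the longest agreeing prefix plus one
    token of its own (so progress is made even on total rejection)."""
    out = list(ctx)
    while len(out) - len(ctx) < n_tokens:
        # Draft proposes k tokens autoregressively.
        proposal, cur = [], list(out)
        for _ in range(k):
            t = draft_next(cur)
            proposal.append(t)
            cur.append(t)
        # Target verifies proposals in order, stopping at first mismatch.
        cur = list(out)
        for t in proposal:
            if target_next(cur) != t:
                break
            cur.append(t)
        # Accepted prefix, plus one token from the target itself.
        out = cur
        out.append(target_next(out))
        out = out[: len(ctx) + n_tokens]
    return out[len(ctx):]
```

Because every emitted token is checked (or produced) by the target model, the result matches pure target-model greedy decoding; the speedup comes from verifying k draft tokens per expensive step instead of generating one.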

For practitioners running Qwen models locally on Mac hardware, upgrading to oMLX 0.3.5 RC1 and leveraging the DFlash + draft model pattern could unlock nearly real-time token generation for many practical applications.


Source: r/LocalLLaMA · Relevance: 9/10