MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
1 min read

MiniMax-M2.5, a 230 billion parameter mixture-of-experts model, has been released and is now available for local deployment through GGUF quantizations. The model is showing impressive performance in early benchmarks, and community members are already producing optimized quants for a range of hardware, including M3 Max systems with 128GB of RAM.
The model's MoE architecture makes it feasible to run locally despite its large parameter count, since only a subset of parameters is active for each token during inference (the full weights must still fit in memory, but per-token compute stays low). Early testing shows strong performance on coding and reasoning tasks, and GGUF versions are already available for immediate deployment on llama.cpp, LM Studio, and other local inference frameworks.
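To see why a 230B-parameter model lands in reach of a 128GB machine, the deciding factor is the quantized weight footprint, which scales with *total* (not active) parameters. The sketch below uses illustrative round bit widths, not the exact effective sizes of specific GGUF quant formats, and ignores quantization metadata and KV-cache overhead, so real files run somewhat larger:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight footprint of a quantized model in gigabytes.

    Ignores quant-block metadata (scales, zero points) and the KV cache,
    so actual GGUF files are somewhat larger than this estimate.
    """
    return n_params * bits_per_weight / 8 / 1e9

# MiniMax-M2.5's 230B total parameters at illustrative quant widths:
total_params = 230e9
for label, bits in [("~8-bit", 8), ("~4-bit", 4), ("~3-bit", 3)]:
    print(f"{label}: ~{quantized_size_gb(total_params, bits):.0f} GB")
# ~8-bit: ~230 GB
# ~4-bit: ~115 GB
# ~3-bit: ~86 GB
```

At roughly 4 bits per weight the model comes in around 115 GB, which is why 128GB M3 Max systems show up in early community reports; the MoE routing then keeps per-token compute proportional to the much smaller active-expert subset.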
This release is a significant step for locally deployable large language models, offering performance that early reports suggest rivals proprietary models while retaining the flexibility and privacy benefits of on-device inference.
Source: r/LocalLLaMA · Relevance: 9/10