Mistral Small 4 119B Released with NVFP4 Quantisation Support

Mistral AI has released the Mistral Small 4 119B model family, marking a significant milestone for local LLM deployment. The release includes official variants in NVFP4, a low-precision format optimised for NVIDIA hardware that reduces memory footprint while maintaining competitive inference performance.
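To make the footprint reduction concrete, here is a rough back-of-the-envelope sketch of weight storage for a 119B-parameter dense model at different bit-widths. The figures below are illustrative assumptions, cover weights only, and ignore real-world overhead such as NVFP4 scaling factors, activations, and KV cache:

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

PARAMS = 119e9  # 119B parameters

# Compare a full-precision baseline against lower-precision formats.
for label, bits in [("BF16", 16), ("FP8", 8), ("NVFP4", 4)]:
    print(f"{label:>6}: {weight_memory_gib(PARAMS, bits):6.1f} GiB")
```

Under these assumptions, a 4-bit format brings the weights from roughly 220 GiB at BF16 down to around 55 GiB, which is what moves a model of this size within reach of multi-GPU consumer setups.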

The model is now available across multiple quantisation levels on HuggingFace, with official Transformers library support via GitHub PR #44760. This accessibility is crucial for practitioners running inference on edge devices and consumer-grade GPUs. The 119B parameter count positions this as a practical middle ground for those seeking advanced capabilities without enterprise-scale hardware requirements.

Community reception has been positive (523+ upvotes), with early adopters exploring the model's performance on local setups. Official quantisation support removes the usual friction of waiting for community-made quants, allowing immediate deployment.

Source: r/LocalLLaMA · Relevance: 9/10