Tagged "moe"
- FOMOE: Running 397B Parameter Qwen3.5 MoE at 5-9 tok/s on $2,100 Desktop Hardware
- FlashAttention-4 Delivers 2.7x Faster Inference with 1613 TFLOPS on Blackwell GPUs
- Chinese LLM Ecosystem Landscape: ByteDance Doubao, Alibaba, and Open-Source Competition
- Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
- Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel
- I made Karpathy's Autoresearch work on CPU
- Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
- Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
- Krasis: Hybrid CPU/GPU MoE Runtime Achieves 3,324 Tokens/Second Prefill on RTX 5080
- Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
- Qwen3.5-35B-A3B Emerges as Game-Changer for Agentic Coding Tasks
- [Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
- Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
- Qwen3-Next 80B MoE Achieves 39 Tokens/Second on RTX 5070/5060 Ti Dual-GPU Setup
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
- MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
- GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
- MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
- Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
- GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks