Tagged "large-model-inference"
- M5 Max MacBook Runs Local Large Language Models Efficiently
- AMD's New Ryzen AI Max Pro 400 with 192GB LPDDR5X Memory
- Dynamic Expert Cache in llama.cpp Achieves 27% Faster Inference on Large MoE Models
- Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs
- Ollama Gets Blazing Fast on Macs with Full MLX Support and 2× Speedups
- TinyGPU Adds Mac Support for External Nvidia GPU Acceleration
- Intel's $949 GPU has 32GB of VRAM for local AI, but the software is why Nvidia keeps winning
- Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel
- Apple M5 Max 128GB Benchmark Results for Local LLM Inference
- M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
- Community Member Builds 144GB VRAM Local LLM Powerhouse