Tagged "datacenter-gpu"
- NVIDIA Adds Day-0 DeepSeek V4 Blackwell Support
- Elastic KV Cache Memory Breakthrough Enables Efficient Bursty LLM Serving and GPU Sharing
- Show HN: We built an OCR server that can process 270 dense images/s on a 5090
- DGX Spark Setup Guide: Running vLLM and PyTorch for Local LLM Inference Backend
- MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware
- Researchers Achieve 1-Bit Quantization of OLMo-3 7B Using Distillation
- MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications
- Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs
- AMD Announces Day 0 Support for Google Gemma 4 Across Processors and GPUs
- Qwen 3.5 397B Reduced to 35% Parameters With Usable Quality on 96GB GPU
- DGX Spark Hardware Limitations: Missing NVFP4 Support Undermines Local AI Value Proposition
- NVIDIA and Google Optimize Gemma 4 AI Models for Local RTX Deployment
- GPUs vs. TPUs: Decoding the Powerhouses of AI
- Google Launches Gemma 4 For Advanced On-Device AI
- AMD Rolls Out Gemma 4 Model Support Across Full Range of GPUs & CPUs
- NVIDIA Accelerates Gemma 4 for Local Agentic AI on RTX GPUs
- Intel's $949 GPU Has 32GB of VRAM for Local AI, but Software is Why Nvidia Keeps Winning
- GPU Passthrough to LXCs in Proxmox Simplifies Local Inference Infrastructure
- Linux Significantly Outperforms Windows for Local LLM Inference
- Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config
- Hold on to Your Hardware: Implications for Local LLM Deployment
- Llama.cpp Benchmark: RTX 5090 vs Enterprise Systems Compared
- Llama.cpp ROCm 7 vs Vulkan Performance Benchmarks on AMD MI50
- Rust Project Perspectives on AI
- ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
- Custom GPU Multiplexer Achieves 0.3ms Model Switching on Legacy Hardware
- Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel
- Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment
- Sarvam Open-Sources 30B and 105B Reasoning Models
- Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
- Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia
- Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
- Intel Arc Pro B70 Workstation GPU Confirmed via vLLM AI Release Notes
- Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia
- NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
- AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
- High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference
- OpenClaw with vLLM Running for Free on AMD Developer Cloud