Tagged "datacenter-gpu"

NVIDIA Levels Up Local AI Agents Across RTX PCs and DGX Spark 1 June 2026
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request 29 May 2026
GPUs and RAM Are in Short Supply, but the Real Bottleneck for AI Is Electricians 29 May 2026
Alibaba Cloud Joins PyTorch Foundation as Platinum Member 28 May 2026
Deploying Hermes Agent for Free on AMD Developer Cloud with Open Models and vLLM 22 May 2026
ROCm 7.2.3 Delivers Performance Improvements Over 7.0.0 on AMD Radeon AI PRO 15 May 2026
Kog AI – Building a Real-Time Inference Stack on AMD Instinct GPUs 15 May 2026
AMD's vLLM-ATOM Plugin Supercharges DeepSeek-R1 and Kimi-K2 Inference on MI350/MI400 12 May 2026
$200 NVIDIA V100 Server GPU Mod Beats RTX 3060 in Local LLM Test 11 May 2026
Locked, stocked, and losing budget: AI vendor lock-in bites back 7 May 2026
NVIDIA Adds Day-0 DeepSeek V4 Blackwell Support 26 April 2026
Elastic KV Cache Memory Breakthrough Enables Efficient Bursty LLM Serving and GPU Sharing 26 April 2026
Show HN: We built an OCR server that can process 270 dense images/s on a 5090 23 April 2026
DGX Spark Setup Guide: Running vLLM and PyTorch for Local LLM Inference Backend 15 April 2026
MiniMax-M2.7 Delivers Exceptional Performance on Consumer Hardware 13 April 2026
Researchers Achieve 1-Bit Quantization of OLMo-3 7B Using Distillation 13 April 2026
MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications 12 April 2026
Qwen 3.5 122B Achieves 198 Tokens/sec on Dual RTX PRO 6000 Blackwell GPUs 10 April 2026
AMD Announces Day 0 Support for Google Gemma 4 Across Processors and GPUs 7 April 2026
Qwen 3.5 397B Reduced to 35% Parameters With Usable Quality on 96GB GPU 5 April 2026
DGX Spark Hardware Limitations: Missing NVFP4 Support Undermines Local AI Value Proposition 5 April 2026
NVIDIA and Google Optimize Gemma 4 AI Models for Local RTX Deployment 4 April 2026
GPUs vs. TPUs: Decoding the Powerhouses of AI 4 April 2026
Google Launches Gemma 4 For Advanced On-Device AI 4 April 2026
AMD Rolls Out Gemma 4 Model Support Across Full Range of GPUs & CPUs 4 April 2026
NVIDIA Accelerates Gemma 4 for Local Agentic AI on RTX GPUs 3 April 2026
Intel's $949 GPU Has 32GB of VRAM for Local AI, but Software is Why Nvidia Keeps Winning 2 April 2026
GPU Passthrough to LXCs in Proxmox Simplifies Local Inference Infrastructure 1 April 2026
Linux Significantly Outperforms Windows for Local LLM Inference 29 March 2026
Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config 27 March 2026
Hold on to Your Hardware: Implications for Local LLM Deployment 27 March 2026
Llama.cpp Benchmark: RTX 5090 vs Enterprise Systems Compared 25 March 2026
Llama.cpp ROCm 7 vs Vulkan Performance Benchmarks on AMD MI50 23 March 2026
Rust Project Perspectives on AI 22 March 2026
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B 22 March 2026
Custom GPU Multiplexer Achieves 0.3ms Model Switching on Legacy Hardware 18 March 2026
Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel 15 March 2026
Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment 15 March 2026
Sarvam Open-Sources 30B and 105B Reasoning Models 12 March 2026
Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype 12 March 2026
Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia 12 March 2026
Sarvam Open-Sources 30B and 105B Reasoning Models 11 March 2026
Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models 9 March 2026
Intel Arc Pro B70 Workstation GPU Confirmed via vLLM AI Release Notes 3 March 2026
Google Is Exploring Ways to Use Its Financial Might to Take on Nvidia 21 February 2026
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support 20 February 2026
AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs 18 February 2026
High Bandwidth Flash Memory Could Alleviate VRAM Constraints in Local LLM Inference 17 February 2026
OpenClaw with vLLM Running for Free on AMD Developer Cloud 12 February 2026