Tagged "hugging-face"
- An Update on GitHub Availability: Infrastructure Lessons for Hosted LLM Tools
- MiniMax M2.7 GGUF Investigation Reveals NaN Issues Affecting 21-38% of Hugging Face Conversions
- DGX Spark Setup Guide: Running vLLM and PyTorch for Local LLM Inference Backend
- DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max
- MiniMax M2.7 Achieves SOTA Performance Under 64GB on Mac with TQ Quantization
- On-Device AI Inference Emerges as New Security Blind Spot for CISOs
- Unsloth Completes Comprehensive MiniMax M2.7 GGUF Quantization Suite
- MiniMax M2.7 Released: New Model Available for Local Deployment
- VoxCPM2: New Open-Source TTS Model with Voice Cloning and Design
- Hugging Face Moves Safetensors Under PyTorch Foundation
- Ollama is Still the Easiest Way to Start Local LLMs, But It's the Worst Way to Keep Running Them
- Netflix Open-Sources VOID Model for Video Object Deletion
- ByteShape Releases Qwen 3.5 9B Quantisations with Hardware-Matched Tuning Guide
- PrismML Announces 1-Bit Bonsai: First Commercially Viable 1-Bit LLMs
- Mistral AI Releases Voxtral: Open-Source TTS Model Beating ElevenLabs on Local Hardware
- Liquid AI's LFM2-24B Achieves 50 Tokens/Second in Web Browser via WebGPU
- Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
- Mistral Small 4 119B Released with NVFP4 Quantisation Support
- OmniCoder-9B: Efficient Coding Model for 8GB GPUs
- Cicikus v3 Prometheus 4.4B – An Experimental Franken-Merge for Edge Reasoning
- Qwen 3.5 Derestricted Model Available for Local Deployment
- Sarvam AI Releases 30B and 105B Open-Source Models Trained from Scratch
- Local LLM Performance Improvements: A Year of Progress Since DeepSeek R1 Moment
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- O-TITANS: Orthogonal LoRA Framework for Gemma 3 with Google TITANS Memory Architecture
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- Open-Source + AI: ggml Joins Hugging Face, llama.cpp Stays Open—Local AI's Long-Term Home
- GGML.AI Acquired by Hugging Face
- Matmul-Free Language Model Trained on CPU in 1.2 Hours
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
- GPT-OSS 20B Now Runs 100% Locally in Browser via WebGPU
- MiniMax M2.5: 230B Parameter MoE Model Coming to Hugging Face