Tagged "rlocalllama"
- Xiaomi 12 Pro Converted Into 24/7 Headless AI Server With Ollama and Gemma 4
- MiniMax M2.7 GGUF Investigation Reveals NaN Issues Affecting 21-38% of Hugging Face Conversions
- OpenClaw at 250K GitHub Stars: Community Explores Practical Limitations Beyond News Digests
- MiniMax M2.7 Released: New Model Available for Local Deployment
- Critical Unsloth Gemma 4 Chat Template Updates for Tool Calling
- Building Offline AI Companions on Severely Constrained Hardware (8GB RAM)
- Gemma 4 Template Improvements Enhance Tool Use and Dialog Compliance
- Community Reverse Engineers Gemma 4 Multi-Token Prediction Capability
- Hugging Face Moves Safetensors Under PyTorch Foundation
- Gemma 4 GGUF Models Updated with Critical Quantization Fixes
- Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool
- Context Window Optimization: Extending Gemma 4 Context Length Through Efficient Projection Quantization
- Gemma 4 26B MoE Emerges as Optimal All-Around Local Model for Consumer Hardware
- Netflix Open-Sources VOID Model for Video Object Deletion
- Kokoro TTS Achieves 20× Realtime Speed on CPU-Only On-Device Inference
- VRAM Optimization Technique Cuts Gemma 4 Memory Usage by 3x
- Qwen 3.5-27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3
- TurboQuant: Understanding the Quantization Breakthrough
- Mixed KV Cache Quantization: Performance Risks and Pitfalls
- Qwen 3.5 27B Achieves 1.1M Tokens/Second on B200 GPUs with Optimized vLLM Config
- Real-World Benchmark: DeepSeek-V3 Matches Claude Sonnet on Routine Coding Tasks
- OmniCoder v2 Released: Improved Code Generation for Local Deployment
- New Open-Weight Models Released: GigaChat-3.1-Ultra and Lightning Variants
- Llama.cpp Benchmark: RTX 5090 vs Enterprise Systems Compared
- Critical: LiteLLM Supply Chain Attack Detected, Bifrost Alternative Released