Tagged "resource-optimization"

JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks 2 June 2026
Netflix Wiz Creates App to Slash AI Bills by Pruning Agent Instructions, Then Open-Sources It 31 May 2026
Liquid AI Launches Edge-Focused LFM2.5 Model to Power On-Device AI Agents 31 May 2026
Liquid AI Unveils Edge-Focused LFM2.5 Model for On-Device AI Agents 29 May 2026
User Migration from LM Studio/Ollama to llama.cpp Shows Growing Preference 22 May 2026
Arm and Google Collaborate on On-Device AI Optimization Techniques 15 May 2026
What If AI Systems Weren't Chatbots? 13 May 2026
Running a Local LLM on a 12-Year-Old Raspberry Pi 13 May 2026
Gemma 4 Replaces Entire Local LLM Stack for Many Practitioners 12 May 2026
DFlash Speculative Decoding Delivers 8.5x Speed Improvement for LLM Inference 11 May 2026
Deploying Frigate & Ollama On A Minisforum MS-A2 Server 11 May 2026
Dikaletus: Open-Source Meeting Recording and Transcription Using Mistral AI 9 May 2026
Running Espressif's OpenClaw-Inspired AI Agent on ESP32 with Self-Hosted LLM Works in Practice 8 May 2026
How to make SSE token streams resumable, cancellable, and multi-device 7 May 2026
A 49-Line Physics Classifier That Beats kNN on 76% of Benchmarks 5 May 2026
Google's Gemma 4 Could Put Powerful AI on Your Phone and Laptop 5 May 2026
Show HN: Kit – Editor, Browser, Terminal, Mail with AI Agents Sharing Context 3 May 2026
Local LLMs Work Best When You're Not Loyal to Just One 2 May 2026
How to Make SSE Token Streams Resumable, Cancellable, and Multi-Device 1 May 2026
Hipfire: A Rust-Native AMD Inference Engine That Outperforms llama.cpp 28 April 2026
I Replaced My Local LLM With a Model Half Its Size and Got Better Results 24 April 2026
OpenNebula 7.2 "Dark Horse" Released with Enhanced Infrastructure Support 14 April 2026
Ollama is Still the Easiest Way to Start Local LLMs, But It's the Worst Way to Keep Running Them 9 April 2026
Google's Gemma 4 Brings Powerful On-Device AI to Android and iOS 8 April 2026
Bonsai 1-Bit Models Deliver Exceptional Local Inference Performance 2 April 2026
Qwen 3.5-27B Demonstrates Superior Performance vs Gemini 3.1 Pro and GPT-5.3 1 April 2026
GPU Passthrough to LXCs in Proxmox Simplifies Local Inference Infrastructure 1 April 2026
Local AI Ecosystem Extends Far Beyond Ollama 29 March 2026
NVIDIA Releases GPT-OSS-Puzzle-88B, a Deployment-Optimized Model 26 March 2026
Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration 23 March 2026
LMCache Dramatically Accelerates LLM Inference on Oracle Data Science Platform 20 March 2026
Custom GPU Multiplexer Achieves 0.3ms Model Switching on Legacy Hardware 18 March 2026
Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead 17 March 2026
FreeBSD 14.4 Released: Implications for Local LLM Deployment 10 March 2026
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks 10 March 2026
Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference 8 March 2026
SynthesisOS – A Local-First, Agentic Desktop Layer Built in Rust 4 March 2026
RunAnywhere Launches Production-Grade On-Device AI Platform for Enterprise Scale 4 March 2026
Qwen 3.5-27B Q4 Quantization Comparison and Analysis 4 March 2026
The ML.energy Leaderboard 28 February 2026
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference 26 February 2026
Show HN: A Ground Up TLS 1.3 Client Written in C 24 February 2026
O-TITANS: Orthogonal LoRA Framework for Gemma 3 with Google TITANS Memory Architecture 22 February 2026
At India AI Impact Summit, Intel Showcases Its AI PCs and Cost-Efficient Frugal AI 21 February 2026
24 Simultaneous Claude Code Agents on Local Hardware 21 February 2026
TemplateFlow – Build AI Workflows, Not Prompts 20 February 2026
Mirai Secures $10M to Optimize On-Device AI Amid Cloud Cost Surge 20 February 2026
Local-First RAG: Vector Search in SQLite with Hamming Distance 19 February 2026
Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach 18 February 2026
OpenClaw Refactored in Go, Runs on $10 Hardware 18 February 2026
Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet 17 February 2026
Switching From Ollama And LM Studio To llama.cpp: A Performance Comparison 14 February 2026
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities 14 February 2026
Switching From Ollama and LM Studio to llama.cpp: Performance Benefits 13 February 2026
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts 11 February 2026
Energy-Based Models Compared Against Frontier AI for Sudoku Solving 11 February 2026