Tagged "llama"
-
I built Rubric, an open source Sentry for AI. Looking for beta testers
-
Qwen 3.5 Models: Optimal Settings and Reduced Overthinking Configuration
-
Llama.cpp ROCm 7 vs Vulkan Performance Benchmarks on AMD Mi50
-
Ditching Paid AI Services: Building Self-Hosted LLM Solutions as ChatGPT, Claude, and Gemini Alternatives
-
Rust Project Perspectives on AI
-
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
-
Setting Up a Private AI Brain on Windows: Complete Guide to Local LLM Deployment
-
Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models
-
Llama 8B Matches 70B Performance on Multi-Hop QA Using Structured Prompting
-
ik_llama.cpp Fork Delivers 26x Faster Prompt Processing on Qwen 3.5 27B
-
Careless Whisper – Personal Local Speech to Text
-
Automating Read-It-Later Workflows with Local LLMs for Overnight Summarization
-
Qwen 3.5 397B emerges as top-performing local coding model
-
Qualcomm and Samsung's 30-Year AI Alliance Enters a New Phase as On-Device AI Chip Race Heats Up
-
Apple M5 Max 128GB real-world performance benchmarks for local inference
-
Cursor's Composer 2 model attribution dispute highlights open-source licensing concerns
-
What AI Augmentation Means for Technical Leaders
-
Ultra-Compact 28M Parameter Models Show Promise for Specialized Domain Tasks
-
Qwen 3.5 Emerges as Top Performer for Local Deployment with Extensive Quantization Options
-
Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
LMCache Dramatically Accelerates LLM Inference on Oracle Data Science Platform
-
Kilo Is the VS Code Extension That Actually Works With Every Local LLM I Throw At It
-
Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
-
On-Device AI: Tether's QVAC Fabric Enables Local Training
-
MiniMax-M2.7: New Compact Model Announced for Local Deployment
-
I Switched to a Local LLM for These 5 Tasks and the Cloud Version Hasn't Been Worth It Since
-
LucidShark – Local-first, open-source quality and security gate
-
You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM
-
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
-
Run LLMs Locally with Llama.cpp
-
I Ran Local LLMs on a 'Dead' GPU, and the Results Surprised Me
-
Mistral Releases Small 4 Open-Source Model Under Apache 2.0
-
Local Qwen Models Master Browser Automation Through Iterative Replanning
-
How I Used Lima for an AI Coding Agent Sandbox
-
Researcher Discovers Universal "Danger Zone" in Transformer Model Architecture at 50% Depth
-
Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
-
Practical Fix for Qwen 3.5 Overthinking in llama.cpp
-
Qwen 3.5 122B Demonstrates Exceptional Reasoning for Local Deployment
-
Open-Source LLMs Rapidly Displacing Proprietary SOTA Models
-
OmniCoder-9B: Efficient Coding Model for 8GB GPUs
-
NVIDIA Updates Nemotron 3 122B License, Removes Deployment Restrictions
-
This External GPU Enclosure Tries to Break Cloud Dependence for Local AI Inference
-
Apple's On-Device AI Raises Privacy Alarms Across British Parliament
-
AMD Declares 'AI on the PC Has Crossed an Important Line' – Agent Computers as Next Breakthrough
-
Qwen3.5-397B Achieves 282 tok/s on 4x RTX PRO 6000 Blackwell Through Custom CUTLASS Kernel
-
OpenClaw vs Eigent vs Claude Cowork: Comparing Open-Source AI Collaboration Platforms
-
Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
-
Open-Source GreenBoost Driver Augments NVIDIA GPU VRAM With System RAM and NVMe Storage
-
AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon
-
Intel OpenVINO Backend Support Now Available in llama.cpp
-
Memory Should Decay: Implementing Temporal Memory Decay in Local LLM Systems
-
How to Run Local LLMs in 2026: The Complete Developer's Guide
-
Fine-Tuned 14B Model Outperforms Claude Opus 4.6 on Ada Code Generation
-
AgentArmor: Open-Source 8-Layer Security Framework for AI Agents
-
3-Path Agent Memory: 8 KB Recurrent State vs. 156 MB KV Cache at 10K Tokens
-
Runpod Report: Qwen Has Overtaken Meta's Llama As The Most-Deployed Self-Hosted LLM
-
Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models
-
Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
-
Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
-
Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
-
Local AI Coding Assistant: Complete VS Code + Ollama + Continue Setup
-
Llama.cpp Adds True Reasoning Budget Support
-
Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia
-
Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results
-
SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference
-
Sarvam Open-Sources 30B and 105B Reasoning Models
-
Simple Layer Duplication Technique Achieves Top Open LLM Leaderboard Performance
-
NVIDIA Jetson Brings Open Models to Life at the Edge
-
LMF – LLM Markup Format
-
Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
-
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming
-
Mnemos: Persistent Memory System for Local AI Agents
-
8 Local LLM Settings Most People Never Touch That Fixed My Worst AI Problems
-
HP OMEN MAX 16 Review: Is Local AI on a Laptop Viable in 2026?
-
FreeBSD 14.4 Released: Implications for Local LLM Deployment
-
Fine-Tuned Qwen SLMs (0.6–8B) Demonstrate Competitive Performance Against Frontier LLMs on Specialized Tasks
-
M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
-
Community Survey: AI Content Automation Stacks in 2026
-
Strix Halo (Ryzen AI Max+ 395) Achieves Strong Local Inference Performance with ROCm 7.2
-
Qwen 3.5 Derestricted Model Available for Local Deployment
-
Reverse engineering a DOS game with no source code using Codex 5.4
-
Qwen 3.5 27B Achieves Strong Local Inference Performance
-
OpenSpec: Spec-driven development (SDD) for AI coding assistants
-
Benchmark: Local Open-Source LLMs Competitive in Real-Time Trading Applications
-
Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide
-
HP Refreshes Lineup with AI-Focused Workstations
-
ETH Zurich Research Challenges Context-Length Assumptions in LLM Agents
-
Qwen3-Coder-Next Achieves Top Ranking on SWE-bench at Pass@5
-
Open WebUI Adds Native Terminal Tool Calling with Qwen3.5 35B Support
-
Llama.cpp Merges Automatic Parser Generator to Mainline
-
Turning Your Linux Terminal into a Local AI Assistant
-
llama-swap Emerges as Superior Alternative to Ollama and LM-Studio
-
llama.cpp Merges Agentic Loop and MCP Client Support
-
Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
-
Qwen 3.5-35B-A3B Achieves 37.8% on SWE-bench Verified Hard
-
OpenWrt 25.12.0 – Stable Release
-
Quantifying Cost Savings with Local LLMs for Development
-
Apple M5 Pro and M5 Max: 4× Faster LLM Processing
-
AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
-
ÆTHERYA Core – Deterministic Policy Engine for Governing LLM Actions
-
Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
-
Qwen 3.5 0.8B Successfully Deployed on 7-Year-Old Samsung S10E Using llama.cpp
-
Framework Choice Critical: llama.cpp and vLLM Outperform Ollama for Qwen 3.5 Testing
-
Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
-
GitDelivr: A Free CDN for Git Clones Built on Cloudflare Workers and R2
-
C7: Pipe Up-to-Date Library Docs Into Any LLM From the Terminal
-
Huawei's SuperPoD Portfolio Creates New Option for Global Computing at MWC Barcelona 2026
-
4 Free Tools to Run Powerful AI on Your PC Without a Subscription
-
Unsloth Dynamic 2.0 GGUFs
-
5 Useful Docker Containers for Agentic Developers
-
Seco Launches Edge AI System-on-Module at Embedded World 2026
-
Arduino and Qualcomm Bring On-Device AI Learning to Indian Schools
-
Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
-
Qwen 3.5 Underperforms on Hard Coding Tasks—APEX Benchmark Analysis
-
Qwen3.5 122B Achieves 25 tok/s on 72GB VRAM Setup
-
Researchers Develop Persistent Memory System for Local LLMs—No RAG Required
-
DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
-
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
-
Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
-
Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
-
Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
-
How AI is Redefining Price and Performance in Modern Laptops
-
Show HN: A Ground Up TLS 1.3 Client Written in C
-
Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
-
Apple Accelerates U.S. Manufacturing with Mac Mini Production
-
Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
-
Anthropic Reveals Industrial-Scale Distillation Attacks by Chinese AI Labs
-
Comparing Manual vs. AI Requirements Gathering: 2 Sentences vs. 127-Point Spec
-
Show HN: Agora – AI API Pricing Oracle with X402 Micropayments
-
nanollama: Open-Source Framework for Training Llama 3 from Scratch with One-Command GGUF Export
-
Open-Source llama.cpp Finds Long-Term Home at Hugging Face
-
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
-
Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
-
Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
-
I Thought I Needed a GPU to Run AI Until I Learned About These Models
-
Open-Source + AI: ggml Joins Hugging Face, llama.cpp Stays Open—Local AI's Long-Term Home
-
GGML.AI Acquired by Hugging Face
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
-
Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB
-
Free ASIC-Accelerated Llama 3.1 8B Inference at 16,000 Tokens/Second
-
Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
-
Self-Hosted AI: A Complete Roadmap for Beginners
-
Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
-
Ask HN: What is the best bang for buck budget AI coding?
-
Switching From Ollama And LM Studio To llama.cpp: A Performance Comparison
-
SnowBall Technique Addresses Context Window Limitations in Local LLMs
-
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
-
NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
-
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
LLaDA2.1 Introduces Token Editing for Massive Speed Gains in Local Inference
-
GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
-
GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
-
Context Management Identified as Real Bottleneck in AI-Assisted Coding
-
Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
-
GitHub Announces Support for Open Source AI Project Maintainers
-
Student Releases Dhi-5B: Multimodal Model Trained for Just $1,200
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams