Tagged "optimization"

- Wipeout Clone Runs Native on ESP32-S3, Pushing Edge Hardware to Its Limits
- Stop Guessing: Open-Source Tool Predicts Which Local LLMs Run on Your PC
- Blueprint: AI Hardware Design
- Google's Gemma 4 Brings Powerful On-Device AI to Phones and Laptops
- Using a Local LLM as a Zero-Shot Classifier
- Building Real-World On-Device AI with LiteRT and NPU
- Llama.cpp's Auto Fit Feature Quietly Reshapes Local AI Inference on Consumer Hardware
- Google's Gemma 4 Finally Makes Local LLM Deployment Compelling for Practitioners
- 16 Ways to Make a Small Language Model Think Bigger
- ZeusHammer: Built an AI Agent That Thinks Locally
- Controlling the Secondary Fan on Minisforum AI Pro HX 370
- llama.cpp Merges Speculative Checkpointing for Major Inference Speed Boost
- 115 TOPS in 0.67L: CHUWI AuBox X Packs On-Device AI Power Into a Palm-Sized Mini PC
- Sorting 1M u64 KV-Pairs in 20ms on i9-13980HX Using Branchless Rust Implementation
- Learn LLM Internals
- A Deep Dive into Tinygrad AI Compiler
- Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp
- CarryAI's Serverless Vision-Language Models Enable On-Device Multimodal AI
- PyTorch Foundation Welcomes Helion as a Foundation-Hosted Project to Standardize Open, Portable, and Accessible AI Kernel Authoring
- Microsoft Quantum Development Kit Ported to Rust: 100x Faster and Smaller
- NVIDIA and Google Optimize Gemma 4 AI Models for Local RTX Deployment
- Qwen 3.6-Plus Released
- Show HN: Extra-Platforms, Python Library to Detect OS, Arch, Shell, CI, AI
- Local AI didn't replace my subscriptions, but it did take over these 6 tasks
- Select the Right Hardware for Your Local LLM Deployment with This Online Guide
- DeepSeek V3 Complete Guide: Deploy and Optimize Local AI in 2026
- Mixed KV Cache Quantization: Performance Risks and Pitfalls
- Linux Significantly Outperforms Windows for Local LLM Inference
- TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice
- RotorQuant: 10-19x Faster Quantisation Alternative Using Clifford Algebra
- Quantization Reveals Outliers Impacting LLM Accuracy
- Apple Gets Full Gemini Access and Uses Distillation to Build Lightweight On-Device AI
- RF-DETR Nano and YOLO26 Enable On-Device Object Detection on Smartphones
- NVIDIA Releases GPT-OSS-Puzzle-88B, a Deployment-Optimized Model
- Nota AI and SiMa.ai Partner on Physical AI Technology for Local Deployment
- .APKs Are Just .ZIPs: Semi-Legally Hacking Software for Orphaned Hardware
- MacinAI Local brings functional LLM inference to classic Macintosh hardware
- AI's Impact on Mathematics Analogous to Car's Impact on Cities
- You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM
- Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based)
- India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets
- Linux 7.0 AMDGPU Fixing Idle Power Issue For RDNA4 GPUs After Compute Workloads
- Show HN: VmExit – An Experiment in AI-Native Computing
- Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs
- SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference
- Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide
- ETH Zurich Research Challenges Context-Length Assumptions in LLM Agents
- OpenWrt 25.12.0 – Stable Release
- Building a Dependency-Free GPT on a Custom OS
- Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
- How to Run High-Performance LLMs Locally on the Arduino UNO Q
- Bare-Metal LLM Inference: UEFI Application Boots Directly Into LLM Chat
- Unsloth Dynamic 2.0 GGUFs
- Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
- Snapdragon 8 Elite Gen 5 for Galaxy Official: 5 Key Improvements that Push the Boundaries
- On-Device AI in Mobile Apps: What Should Run on the Phone vs the Cloud (A 2026 Decision Guide)
- Extracting 100K Concepts from an 8B LLM
- Every agent framework has the same bug – prompt decay. Here's a fix
- DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
- Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
- Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
- Show HN: A Ground Up TLS 1.3 Client Written in C
- Which Web Frameworks Are Most Token-Efficient for AI Agents?
- Wave Field LLM Achieves O(n log n) Scaling: 825M Model Trained to 1B Parameters in 13 Hours
- Custom Portable Workstation Optimized for Local AI Inference Builds
- A Tool to Tell You What LLMs Can Run on Your Machine
- Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- Yet Another Fix Coming for Older AMD GPUs on Linux – Thanks to Valve Developer
- AI-Powered Reverse-Engineering of Rosetta 2 for Linux
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- DietPi Released a New Version v10.1