Tagged "optimization"

Tether AI Upgrades QVAC SDK With TurboQuant for Data Center-Sized Memory on Everyday Devices 2 June 2026
Phison and Intel Roll Out aiDAPTIV to Boost Local AI on Intel AI PC Platforms 2 June 2026
JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks 2 June 2026
Good LLM Development and Usage Patterns 2 June 2026
What Apple Knows About AI That Silicon Valley Won't Admit 31 May 2026
Tweaking Local Language Model Settings with Ollama 29 May 2026
The Infrastructure Behind Making Local LLM Agents Actually Useful 29 May 2026
MediaTek Dimensity 8550 Shifts Focus to Gemini Nano V3 and On-Device AI on Phones 28 May 2026
Alibaba Cloud Joins PyTorch Foundation as Platinum Member 28 May 2026
Local LLM Setup: How to Use RAG and an Embedding Model to Stop Wasting Context 27 May 2026
Why AI Hardware Is a Chip Layer Problem 24 May 2026
A Maintainability Ratchet for AI-Assisted Python 24 May 2026
Why Your Docker Container Is 1.2GB When It Should Be 80MB 24 May 2026
The Brain vs. Deep Learning Part I: Computational Complexity Analysis 22 May 2026
Intel llm-scaler-vllm 1.4 Released With Updated Components and Arc Pro B70 Support 21 May 2026
Google's Cormac Brick on Tiny LLMs for On-Device Agents 21 May 2026
Google Tensor SDK Beta with LiteRT Enables Efficient On-Device AI 20 May 2026
Google and Synaptics Partner on Coralboard for Immersive Edge AI Experiences 20 May 2026
Google's Offline AI App Gets Three Major Feature Upgrades 20 May 2026
Samsung's Exynos 2800 Could Be the First Mobile Chip to Use HBM for Powerful On-Device AI 19 May 2026
Running Large Language Models on Single-Board Computer Clusters: Creative Edge Deployment 18 May 2026
Ansede-static: Offline SAST Tool Demonstrates Value of Local AI Tools 18 May 2026
Linux 7.1-rc4 Released: Kernel Updates Relevant to Local LLM Inference 18 May 2026
AMD's Lemonade SDK Advances macOS Support for Local AI Inference with ROCm 7.13 18 May 2026
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU 17 May 2026
HP's On-Device AI Needs More If It Is Going to Compete With Copilot 17 May 2026
Google Limits Gemini Intelligence to New Flagships—Hardware Requirements for Local Deployment 17 May 2026
llama.cpp Delivers Sharp Performance Gains for AMD RDNA3 Users 15 May 2026
Kog AI – Building a Real-Time Inference Stack on AMD Instinct GPUs 15 May 2026
Arm and Google Collaborate on On-Device AI Optimization Techniques 15 May 2026
Mainline Linux 6.12 on Annapurna Labs Alpine V2 (Ubiquiti UNVR, UDM-Pro) 13 May 2026
Lython: Experimental Python Compiler Toolchain Based on LLVM 11 May 2026
One LM Studio Setting Change Makes Local LLMs Competitive With Cloud Models 11 May 2026
DFlash Speculative Decoding Delivers 8.5x Speed Improvement for LLM Inference 11 May 2026
Nota AI Partners with Mobilint to Accelerate On-Device AI on Domestic NPU Infrastructure 7 May 2026
A 49-Line Physics Classifier That Beats kNN on 76% of Benchmarks 5 May 2026
Google Explains Why AICore Storage Requirements Are Increasing on Android 4 May 2026
Local LLMs Work Best When You're Not Loyal to Just One 2 May 2026
Building a Raspberry Pi-Based Local LLM Server for Remote Access 1 May 2026
New Open-Source Tool Automatically Matches Local LLMs to Your PC Hardware 1 May 2026
Private LLM vs. ChatGPT: When It Makes Sense for Business 30 April 2026
How Much "Brain Damage" Can an LLM Tolerate? 30 April 2026
Estimating Black-Box LLM Parameter Counts via Factual Capacity 30 April 2026
Wipeout Clone Runs Native on ESP32-S3, Pushing Edge Hardware to Its Limits 29 April 2026
Stop Guessing: Open-Source Tool Predicts Which Local LLMs Run on Your PC 28 April 2026
Blueprint: AI Hardware Design 26 April 2026
Google's Gemma 4 Brings Powerful On-Device AI to Phones and Laptops 25 April 2026
Using a Local LLM as a Zero-Shot Classifier 24 April 2026
Building Real-World On-Device AI with LiteRT and NPU 24 April 2026
Llama.cpp's Auto Fit Feature Quietly Reshapes Local AI Inference on Consumer Hardware 22 April 2026
Google's Gemma 4 Finally Makes Local LLM Deployment Compelling for Practitioners 22 April 2026
16 Ways to Make a Small Language Model Think Bigger 21 April 2026
ZeusHammer: Built an AI Agent That Thinks Locally 20 April 2026
Controlling the Secondary Fan on Minisforum AI Pro HX 370 20 April 2026
llama.cpp Merges Speculative Checkpointing for Major Inference Speed Boost 20 April 2026
115 TOPS in 0.67L: CHUWI AuBox X Packs On-Device AI Power Into a Palm-Sized Mini PC 18 April 2026
Sorting 1M u64 KV-Pairs in 20ms on i9-13980HX Using Branchless Rust Implementation 18 April 2026
Learn LLM Internals 13 April 2026
A Deep Dive into Tinygrad AI Compiler 12 April 2026
Users Report Significant Performance Improvements After Migrating from Ollama to llama.cpp 12 April 2026
CarryAI's Serverless Vision-Language Models Enable On-Device Multimodal AI 10 April 2026
PyTorch Foundation Welcomes Helion as a Foundation-Hosted Project to Standardize Open, Portable, and Accessible AI Kernel Authoring 7 April 2026
Microsoft Quantum Development Kit Ported to Rust: 100x Faster and Smaller 5 April 2026
NVIDIA and Google Optimize Gemma 4 AI Models for Local RTX Deployment 4 April 2026
Qwen 3.6-Plus Released 2 April 2026
Show HN: Extra-Platforms, Python Library to Detect OS, Arch, Shell, CI, AI 2 April 2026
Local AI didn't replace my subscriptions, but it did take over these 6 tasks 31 March 2026
Select the Right Hardware for Your Local LLM Deployment with This Online Guide 30 March 2026
DeepSeek V3 Complete Guide: Deploy and Optimize Local AI in 2026 30 March 2026
Mixed KV Cache Quantization: Performance Risks and Pitfalls 29 March 2026
Linux Significantly Outperforms Windows for Local LLM Inference 29 March 2026
TurboQuant Benchmarked in Llama.cpp: Google's Extreme Compression Research Tested in Practice 27 March 2026
RotorQuant: 10-19x Faster Quantisation Alternative Using Clifford Algebra 27 March 2026
Quantization Reveals Outliers Impacting LLM Accuracy 27 March 2026
Apple Gets Full Gemini Access and Uses Distillation to Build Lightweight On-Device AI 27 March 2026
RF-DETR Nano and YOLO26 Enable On-Device Object Detection on Smartphones 26 March 2026
NVIDIA Releases GPT-OSS-Puzzle-88B, a Deployment-Optimized Model 26 March 2026
Nota AI and SiMa.ai Partner on Physical AI Technology for Local Deployment 26 March 2026
.APKs Are Just .ZIPs: Semi-Legally Hacking Software for Orphaned Hardware 25 March 2026
MacinAI Local brings functional LLM inference to classic Macintosh hardware 21 March 2026
AI's Impact on Mathematics Analogous to Car's Impact on Cities 20 March 2026
You're Using Your Local LLM Wrong If You're Prompting It Like a Cloud LLM 18 March 2026
Auto-retry Claude Code on subscription rate limits (zero deps, tmux-based) 18 March 2026
India's Mobile-First AI Strategy Could Accelerate Local Inference Adoption in Emerging Markets 15 March 2026
Linux 7.0 AMDGPU Fixing Idle Power Issue For RDNA4 GPUs After Compute Workloads 13 March 2026
Show HN: VmExit – An Experiment in AI-Native Computing 12 March 2026
Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs 12 March 2026
SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference 11 March 2026
Llama.cpp Prompt Processing Optimization: Ubatch Size Configuration Guide 8 March 2026
ETH Zurich Research Challenges Context-Length Assumptions in LLM Agents 8 March 2026
OpenWrt 25.12.0 – Stable Release 4 March 2026
Building a Dependency-Free GPT on a Custom OS 3 March 2026
Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference 2 March 2026
How to Run High-Performance LLMs Locally on the Arduino UNO Q 1 March 2026
Bare-Metal LLM Inference: UEFI Application Boots Directly Into LLM Chat 1 March 2026
Unsloth Dynamic 2.0 GGUFs 28 February 2026
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot 28 February 2026
Snapdragon 8 Elite Gen 5 for Galaxy Official: 5 Key Improvements that Push the Boundaries 27 February 2026
On-Device AI in Mobile Apps: What Should Run on the Phone vs the Cloud (A 2026 Decision Guide) 27 February 2026
Extracting 100K Concepts from an 8B LLM 27 February 2026
Every agent framework has the same bug – prompt decay. Here's a fix 26 February 2026
DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference 26 February 2026
Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices 25 February 2026
Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods 25 February 2026
Show HN: A Ground Up TLS 1.3 Client Written in C 24 February 2026
Which Web Frameworks Are Most Token-Efficient for AI Agents? 23 February 2026
Wave Field LLM Achieves O(n log n) Scaling: 825M Model Trained to 1B Parameters in 13 Hours 23 February 2026
Custom Portable Workstation Optimized for Local AI Inference Builds 23 February 2026
A Tool to Tell You What LLMs Can Run on Your Machine 23 February 2026
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search 23 February 2026
Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference 23 February 2026
Yet Another Fix Coming for Older AMD GPUs on Linux – Thanks to Valve Developer 23 February 2026
AI-Powered Reverse-Engineering of Rosetta 2 for Linux 23 February 2026
GGML Joins Hugging Face: What This Means for Local Model Optimization 22 February 2026
DietPi Released a New Version v10.1 22 February 2026