Tagged "apple-silicon"
- Show HN: Phonetic Formatter – Offline English Text to IPA on iPhone and iPad
- Llama 4 Scout on MLX: The Complete Apple Silicon Guide (2026)
- Running Gemma 4 on an iPhone 13 Pro
- DFlash Doubles Token Generation Speed of Qwen3.5 27B on Mac M5 Max
- oMLX Framework Implements DFlash Attention for Optimized Inference
- MiniMax M2.7 Achieves SOTA Performance Under 64GB on Mac with TQ Quantization
- DFlash Speculative Decoding Achieves 3.3x Speedup on Apple Silicon
- Parakeet Streaming ASR on Apple Silicon via CoreML
- AIYO Wisper: Local Voice-to-Text for macOS Using WhisperKit
- On-Device Apple Intelligence Vulnerable to Prompt Injection Attacks
- Running a 1.7B-Parameter LLM on an Apple Watch
- Comprehensive Benchmark: 37 LLMs Tested on MacBook Air M5 With Open-Source Tool
- Google Launches Offline AI Dictation App for iOS with Gemma
- Real-time Multimodal AI on Apple Silicon: Gemma E2B Demo Shows Practical Edge Deployment
- Apple Brings Enhanced On-Device AI Features to iPhone
- Ollama Gets Blazing Fast on Macs with Full MLX Support and 2× Speedups
- Gemma 4 26B MoE Emerges as Optimal All-Around Local Model for Consumer Hardware
- Samsung Launches Galaxy Book6 Series with NVIDIA RTX 5070 and On-Device AI
- Mixed Precision Quantization on MLX with TurboQuant Implementation
- Kokoro TTS Achieves 20× Realtime Speed on CPU-Only On-Device Inference
- Gemma 4 KV Cache Memory Issues Fixed in llama.cpp
- April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini
- Google Gemma 4 Released with GGUF Quantizations
- Gemma 4 26B A4B Outperforms Qwen 3.5 35B on Apple Silicon
- Gemma 4 Makes Local AI Agents Practical
- Apfel – The Free AI Already on Your Mac
- Apple Silicon Macs Run Local AI Faster with Ollama's New MLX Support
- TinyGPU Adds Mac Support for External Nvidia GPU Acceleration
- Ollama Adopts Apple's MLX Framework for Faster Local AI on Mac
- Is Anyone Working on an AI Operating System?
- Select the Right Hardware for Your Local LLM Deployment with This Online Guide
- TurboQuant KV Cache Compression Achieves 22.8% Faster Decoding at 32K Context
- Qwen3 512k Context via TurboQuant on Mac mini
- M5 Max Delivers 1.7x Faster Inference Than M3 Max on Qwen 3.5 Models
- RotorQuant: 10-19x Faster Quantisation Alternative Using Clifford Algebra
- mlx-Code: Run Claude Code Locally with MLX-LM
- Apple Gets Full Gemini Access and Uses Distillation to Build Lightweight On-Device AI
- Liquid AI's LFM2-24B Achieves 50 Tokens/Second in Web Browser via WebGPU
- Apple Plans Slimmed-Down Gemini Models for Local iPhone AI Features
- Running an Open-Weight LLM Locally on an Apple Watch
- Ultra-Large 400B-Class LLM Runs on iPhone in Test
- Ditching Paid AI Services: Building Self-Hosted LLM Solutions as ChatGPT, Claude, and Gemini Alternatives
- Multi-Token Prediction Support Coming to MLX-LM for Qwen 3.5
- Apple M5 Max 128GB Real-World Performance Benchmarks for Local Inference
- DeepSeek R1 RTX 4090 vs Apple M3 Max: Benchmark & Performance Guide
- NVIDIA Nemotron 3 Nano 4B Enables On-Device Inference Directly in Web Browsers via WebGPU
- Dictare – Open-source Voice Layer for AI Coding Agents (100% Local)
- Startup Transforms Mac Mini Into Full-Powered AI Inference System With External GPU
- Local LLMs on Apple Silicon Mac 2026: M1, M2, M3 Guide
- Apple M5 Max 128GB Benchmark Results for Local LLM Inference
- Experiment: 0.8B Model Self-Improvement on MacBook Air Yields Surprising Results
- M5 Max and M5 Ultra Chipsets Demonstrate Significant Bandwidth Improvements for Local LLM Inference
- Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
- Real-World Qwen 3.5 9B Agent Performance on M1 Pro Validates Edge Deployment
- MediaTek Advances Omni Model for Efficient Smartphone Inference
- Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
- Apple Unveils MacBook Pro With M5 Pro and M5 Max for On-Device AI
- Apple M5 Pro and M5 Max: 4× Faster LLM Processing
- AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
- VibeWhisper – macOS Voice-to-Text with 100% Local Processing Option
- Apple M4 iPad Air Targets AI Users with Double M1 Speed Performance
- Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
- Running Local AI Models on Mac Studio 128GB: 4B, 20B & 120B Tested
- Apple Neural Engine Reverse-Engineered for Local Model Training on Mac Mini M4
- Show HN: Caret – Tab to Complete at Any App on Your Mac
- Researchers Develop Persistent Memory System for Local LLMs—No RAG Required
- Apple: Python Bindings for Access to the On-Device Apple Intelligence Model
- Apple Accelerates U.S. Manufacturing with Mac Mini Production
- Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
- Nvidia Could Launch Its First Laptops With Its Own Processors
- AI-Powered Reverse-Engineering of Rosetta 2 for Linux
- Apple Researchers Develop On-Device AI Agent That Interacts With Apps for You
- PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
- Complete Offline AI System: Voice Control and Smart Home via Local LLM and Radio Without Internet
- Kitten TTS V0.8 Released: State-of-the-Art Super-Tiny Text-to-Speech Model Under 25MB
- GPT4All Replaces Ollama On Mac After Quick Trial
- Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
- Sourdine: Open-Source macOS App for 100% Local AI Transcription
- MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
- MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment