Tagged "release"
-
Velr: Embedded Property-Graph Database for Local LLM Applications
-
Self-Hostable AI Agents and Internal Software Framework Released
-
Qt 6.11 Released with Enhanced Cross-Platform Deployment Capabilities
-
MiniMax M2.7 Model to Be Released as Open Weights
-
LM Studio Releases Reworked Plugins with Fully Local Web Research
-
Qwen 3.5 122B Uncensored (Aggressive) Released with New K_P Quantisations
-
Nvidia Nemotron Cascade 2 30B Emerges as Powerful Alternative to Qwen Models
-
BrowserOS 0.44.0 Release: Advances in Local AI Integration for Web-Based Applications
-
Pydantic-Deep: Production Deep Agents for Pydantic AI
-
Atuin v18.13 – Better Search, a PTY Proxy, and AI for Your Shell
-
NVIDIA Nemotron Cascade 2 30B Delivers 120B-Class Performance in Compact Form Factor
-
NVIDIA Nemotron 3 Nano 4B Enables On-Device Inference Directly in Web Browsers via WebGPU
-
Llamafile 0.10 Released with GPU Support and Rebuilt Core
-
Cybersecurity Skills for AI Agents – agentskills.io Standard Implementation
-
ASUS ExpertCenter PN55 Mini PC Combines AMD AI CPU and 55 TOPS NPU
-
Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
-
Multiverse Computing Targets On-Device AI With Compressed Models and New API Portal
-
Dell Pro Max 16 Plus Launches With Enterprise-Grade Discrete NPU for On-Device AI
-
Tether's QVAC Introduces Cross-Platform Bitnet LoRA Framework for On-Device AI Training
-
Unsloth Studio: Open-Source Web UI for Training and Running LLMs Locally
-
On-Device AI: Tether's QVAC Fabric Enables Local Training
-
MiniMax-M2.7: New Compact Model Announced for Local Deployment
-
Mamba 3: State Space Model Architecture Optimized for Inference
-
Hugging Face Releases One-Liner for Automatic Hardware Detection and Model Selection
-
Mistral Small 4 119B Released with NVFP4 Quantisation Support
-
Mistral Releases Small 4 Open-Source Model Under Apache 2.0
-
Mistral Releases Leanstral: First Open-Source Code Agent for Lean 4 Proof Assistant
-
Kimi Introduces Attention Residuals: 1.25x Compute Performance at <2% Overhead
-
OmniCoder-9B: Efficient Coding Model for 8GB GPUs
-
NVIDIA Updates Nemotron 3 122B License, Removes Deployment Restrictions
-
StepFun Releases SFT Dataset Used to Train Step 3.5 Flash for Community Fine-Tuning
-
Nvidia's Nemotron 3 Super: Understanding the Significance for Local LLM Deployment
-
Cicikus v3 Prometheus 4.4B – An Experimental Franken-Merge for Edge Reasoning
-
AMD Launches Agent System Optimized for Local AI Inference With Ryzen and Radeon
-
Intel OpenVINO Backend Support Now Available in llama.cpp
-
Lemonade v10 Brings Linux NPU Support and Multi-Modal Capabilities
-
Intel Updates LLM-Scaler-vLLM With Support For More Qwen3/3.5 Models
-
Sarvam Open-Sources 30B and 105B Reasoning Models
-
Qwodel – An Open-Source Unified Pipeline for LLM Quantization
-
Nvidia Releases Nemotron 3 Super: 120B MoE Model for Local Deployment
-
Llama.cpp Adds True Reasoning Budget Support
-
Texas Instruments Launches NPU-Powered MCUs for Low-Power Edge AI
-
SK Hynix Completes Qualification for LPDDR6 Memory Optimized for AI Inference
-
Qwen 3.5-35B Uncensored GGUF Models Now Available
-
Kali Linux Integrates Local Ollama and MCP for AI-Driven Penetration Testing
-
SK Hynix Develops 1c LPDDR6 DRAM to Boost On-Device AI Performance in Mobile Devices
-
Gloss: Open-Source, Local-First RAG Alternative to NotebookLM Built in Rust
-
FreeBSD 14.4 Released: Implications for Local LLM Deployment
-
Fish Audio Open-Sources S2: Expressive Text-to-Speech with Natural Language Control and 100ms Latency
-
Qwen 3.5 Small Expands On-Device AI to Phones and IoT with Offline Support
-
Qwen 3.5 Derestricted Model Available for Local Deployment
-
Engram – Open-Source Persistent Memory for AI Agents
-
Snapdragon Wear Elite Unveiled at MWC 2026, Advancing Wearable AI Inference
-
HP Refreshes Lineup with AI-Focused Workstations
-
Apple Launches MacBook Neo with A18 Pro Chip for Affordable Local AI Inference
-
Sarvam AI Releases 30B and 105B Open-Source Models Trained from Scratch
-
Open WebUI Adds Native Terminal Tool Calling with Qwen3.5 35B Support
-
Llama.cpp Merges Automatic Parser Generator to Mainline
-
Jse v2.0 AI Output Specification
-
IBM Granite 4.0 1B Speech Model Released for Multilingual Speech Recognition
-
Alibaba Releases Qwen 3.5 AI Model with On-Device AI Support
-
Final Qwen3.5 Unsloth GGUF Update with Improved Size/Quality Tradeoffs
-
Building PyTorch-Native Support for IBM Spyre Accelerator
-
llama.cpp Merges Agentic Loop and MCP Client Support
-
Kakao Launches Kanana AI for On-Device Schedule and Recommendation Management
-
Apple Unveils MacBook Pro with M5 Pro and M5 Max Featuring On-Device AI
-
RunAnywhere Launches Production-Grade On-Device AI Platform for Enterprise Scale
-
Qualcomm Snapdragon Wear Elite Brings On-Device AI to Smartwatches
-
OpenWrt 25.12.0 – Stable Release
-
Apple M5 Pro and M5 Max: 4× Faster LLM Processing
-
AMD Launches Copilot+ Desktop Chips to Compete in On-Device AI Market
-
ÆTHERYA Core – Deterministic Policy Engine for Governing LLM Actions
-
Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
-
Qualcomm Snapdragon Wear Elite: 2B Parameter NPU for Personal AI Wearables
-
Apple M4 iPad Air Targets AI Users with Double M1 Speed Performance
-
AMD Ryzen AI 400 Series Desktop Processors Launch with Integrated 60 TOPS NPU
-
Alibaba's Qwen 3.5 Small Model Runs Directly on iPhone 17
-
Qualcomm Launches Snapdragon Wear Elite for On-Device AI on Wearables
-
Jan Releases Code-Tuned 4B Model for Efficient Local Code Generation and Development Tasks
-
GitDelivr: A Free CDN for Git Clones Built on Cloudflare Workers and R2
-
AMD Expands Ryzen AI 400 Series Portfolio for Consumer and Enterprise AI PC Options
-
Alibaba's Open-Source CoPaw AI Agent Now Compatible with MCP and ClawHub Skills
-
Qwen 3.5-35B-A3B Emerges as Efficient Daily Driver, Replacing 120B Models
-
ParseHive – AI-Powered Invoice Data Extraction for Windows and Mac
-
Huawei's SuperPoD Portfolio Creates New Option for Global Computing at MWC Barcelona 2026
-
DeepSeek V4 Multimodal Model Coming Next Week With Image and Video Generation
-
Unsloth Dynamic 2.0 GGUFs
-
Qwen3.5-35B Unsloth Dynamic GGUFs Achieve SOTA Across Nearly All Quantisation Levels
-
The ML.energy Leaderboard
-
LLmFit: One-Command Hardware-Aware Model Selection Across 497 Models and 133 Providers
-
Krasis: Hybrid CPU/GPU MoE Runtime Achieves 3,324 Tokens/Second Prefill on RTX 5080
-
Seco Launches Edge AI System-on-Module at Embedded World 2026
-
Snapdragon 8 Elite Gen 5 Powers Galaxy S26 Series With Enhanced On-Device AI
-
On-Device Function Calling in Google AI Edge Gallery
-
DeepSeek Releases DualPath: Addressing Storage Bandwidth Bottlenecks in Agentic Inference
-
DeepSeek Paper – DualPath: Breaking the Bandwidth Bottleneck in LLM Inference
-
Apple: Python bindings for access to the on-device Apple Intelligence model
-
Red Hat Launches AI Enterprise for Hybrid AI Deployments
-
Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
-
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
-
Qwen3.5-35B-A3B Emerges as Game-Changer for Agentic Coding Tasks
-
Meta's OpenClaw Release Raises Questions About Open-Source Model Safety and Alignment
-
Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
-
Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
-
Making Wolfram Technology Available as Foundation Tool for LLM Systems
-
Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
-
Ollama 0.17 Released With Improved OpenClaw Onboarding
-
Google Open-Sources NPU IP, Synaptics Implements It for Hardware Acceleration
-
DietPi v10.1 Released
-
Asus ExpertBook B3 G2 with 50 TOPS AI Sets New Enterprise Standard
-
Vellium v0.3.5: Major Writing Mode Overhaul and Native KoboldCpp Support
-
[Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
-
Claude Code Open – AI Coding Platform with Web IDE and Agents
-
SanityBoard Adds 27 New Model Evaluations Including Qwen 3.5 Plus, GLM 5, and Gemini 3.1 Pro
-
PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
-
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
-
Kitten TTS V0.8 Released: New State-of-the-Art Super-Tiny TTS Model Under 25 MB
-
Free ASIC-Accelerated Llama 3.1 8B Inference at 16,000 Tokens/Second
-
LayerScale Launches Inference Engine Faster Than vLLM, SGLang, and TRT-LLM
-
Aegis.rs: Open Source Rust-Based LLM Security Proxy Released
-
Tailscale Releases New Tool to Prevent Sensitive Data Leakage to Cloud AI Services
-
Sarvam AI Launches Edge Model to Challenge Major AI Players with Local-First Approach
-
Alibaba's Qwen3.5-397B Achieves #3 Position in Open Weights Model Rankings
-
OpenClaw Refactored in Go, Runs on $10 Hardware
-
GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
-
Cloudflare Releases Agents SDK v0.5.0 with Rust-Powered Infire Engine for Edge Inference
-
AMD Announces Day 0 Support for Qwen 3.5 LLM on Instinct GPUs
-
Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
-
Cohere Releases Tiny Aya: Efficient 3.3B Multilingual Model for 70+ Languages
-
ASUS Zenbook 14 Launches in India with AI-Capable Hardware, Starting at Rs 1,15,990
-
Asus ExpertBook B3 G2 Laptop Features Ryzen AI 9 HX 470 CPU in 1.41kg Ultraportable Form Factor
-
InitRunner: YAML-Based AI Agent Framework with RAG and Memory
-
GPU-Accelerated DataFrame Library for Local Inference Workloads
-
Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
-
NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
-
MiniMax Releases M2.5 Model with SOTA Coding and Agent Capabilities
-
MiniMax-M2.5 230B MoE Model Released with GGUF Support for Local Deployment
-
LLaDA2.1 Introduces Token Editing for Massive Speed Gains in Local Inference
-
GPT-OSS 20B Now Runs 100% Locally in Browser via WebGPU
-
GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
-
GNOME's AI Assistant Newelle Adds llama.cpp Support and Command Execution
-
ByteDance Releases Seed2.0 LLM with Complex Real-World Task Improvements
-
WinClaw: Windows-Native AI Assistant with Office Automation
-
Ring-1T-2.5 Released with SOTA Deep Thinking Performance
-
GitHub Announces Support for Open Source AI Project Maintainers
-
MiniMax M2.5: 230B Parameter MoE Model Coming to HuggingFace
-
Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
-
Student Releases Dhi-5B: Multimodal Model Trained for Just $1,200
-
ByteDance Releases Seedance 2.0 AI Development Platform
-
Samsung's REAM: Alternative Model Compression Technique
-
OpenClaw with vLLM Running for Free on AMD Developer Cloud
-
Microsoft MarkItDown: Document Preprocessing Tool for LLMs
-
Memio Launches AI-Powered Knowledge Hub for Android with Local Processing
-
New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
-
GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
-
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
-
Godot MCP Gives AI Assistants Full Access to Game Engine Editor
-
DeepSeek Launches Model Update with 1M Context Window
-
Arm SME2 Technology Expands CPU Capabilities for On-Device AI