Tagged "multimodal"

A Cinematic Landing-Page Hero for 80 Cents (GPT Image 2 and Veo 3.1) 2 June 2026
Nvidia Raises Video Encoder Limit to 12 on Consumer GPUs 21 May 2026
Samsung's Exynos 2800 Brings Significant On-Device AI Capabilities 18 May 2026
Local LLMs Enable Intelligent Smart Camera Control Without Cloud Dependency 18 May 2026
NordVPN Adds On-Device AI Voice Detector to Chrome Extension to Identify Synthetic Audio 4 May 2026
NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model 29 April 2026
Pocket LLM v1.5.0 Brings Multimodal AI to Android with No Cloud Required 27 April 2026
Seed3D 2.0 24 April 2026
Show HN: We built an OCR server that can process 270 dense images/s on a 5090 23 April 2026
Developer Turns Phone Into Local LLM Server with Vision, Voice, and Tool Calling Capabilities 22 April 2026
DeepX and Hyundai Motor Group Robotics LAB Partner to Develop Next-Generation Physical AI Compute Platform 21 April 2026
PCMind: Local AI Analysis of Docs, Audio, Video and Images 19 April 2026
Qwen 3.5 Small – On-Device Multimodal Models Released 14 April 2026
Qwen3 Audio and Vision Support Now Available in llama.cpp 13 April 2026
Audio Processing Support Lands in llama.cpp with Gemma-4 13 April 2026
Parakeet Streaming ASR on Apple Silicon via CoreML 11 April 2026
CarryAI's Serverless Vision-Language Models Enable On-Device Multimodal AI 10 April 2026
VoxCPM2: New Open-Source TTS Model with Voice Cloning and Design 9 April 2026
VLA Learns How to Act. S2S Decides Whether the Motion Is Physically Trustworthy 6 April 2026
Context Window Optimization: Extending Gemma 4 Context Length Through Efficient Projection Quantization 6 April 2026
HunyuanOCR 1B: High-Quality OCR Now Viable on Budget Consumer Hardware 6 April 2026
Real-time Multimodal AI on Apple Silicon: Gemma E2B Demo Shows Practical Edge Deployment 6 April 2026
Kokoro TTS Achieves 20× Realtime Speed on CPU-Only On-Device Inference 4 April 2026
Free AI Video Clipper Using Scene and Speech-Based Segmentation 4 April 2026
IBM Granite 4.0 3B Vision: Compact Enterprise-Grade Document AI 29 March 2026
DaVinci-MagiHuman: Open-Source AI Model for Realistic Video Generation 29 March 2026
A Journey to a Reliable and Enjoyable Locally Hosted Voice Assistant 24 March 2026
Careless Whisper – Personal Local Speech to Text 22 March 2026
MiniMax-M2.7: New Compact Model Announced for Local Deployment 18 March 2026
Local Manga Translator: Production LLM Pipeline with YOLO, OCR, and Inpainting 14 March 2026
Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming 10 March 2026
PhotoPrism AI-Powered Photos App Brings Better Ollama Integration 10 March 2026
VoiceShelf: Fully Offline Android Audiobook Reader Using Kokoro TTS 9 March 2026
IBM Granite 4.0 1B Speech Model Released for Multilingual Speech Recognition 7 March 2026
MediaTek Advances Omni Model for Efficient Smartphone Inference 5 March 2026
Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference 3 March 2026
Qwen 3.5 0.8B Running in Browser with WebGPU via Transformers.js 3 March 2026
DeepSeek V4 Multimodal Model Coming Next Week With Image and Video Generation 1 March 2026
Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers 25 February 2026
Qwen3's Voice Embeddings Enable Local Voice Cloning and Mathematical Voice Manipulation 23 February 2026
Qwen3 Demonstrates Advanced Voice Cloning via Embeddings 23 February 2026
PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR 20 February 2026
NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support 20 February 2026
Running Local LLMs and VLMs on Arduino UNO Q with yzma 19 February 2026
Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows 19 February 2026
Critical vLLM RCE Vulnerability Allows Remote Code Execution via Video Links 14 February 2026
ByteDance Releases Seed2.0 LLM with Complex Real-World Task Improvements 14 February 2026
Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released 13 February 2026
Student Releases Dhi-5B: Multimodal Model Trained for Just $1,200 13 February 2026