Tagged "multimodal"
- A Journey to a Reliable and Enjoyable Locally Hosted Voice Assistant
- Careless Whisper – Personal Local Speech-to-Text
- MiniMax-M2.7: New Compact Model Announced for Local Deployment
- Local Manga Translator: Production LLM Pipeline with YOLO, OCR, and Inpainting
- Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming Devices
- PhotoPrism AI-Powered Photos App Brings Better Ollama Integration
- VoiceShelf: Fully Offline Android Audiobook Reader Using Kokoro TTS
- IBM Granite 4.0 1B Speech Model Released for Multilingual Speech Recognition
- MediaTek Advances Omni Model for Efficient Smartphone Inference
- Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
- Qwen 3.5 0.8B Running in Browser with WebGPU via Transformers.js
- DeepSeek V4 Multimodal Model Coming Next Week With Image and Video Generation
- Qwen3.5 Series Released: Comprehensive Model Lineup Across All Tiers
- Qwen3's Voice Embeddings Enable Local Voice Cloning and Mathematical Voice Manipulation
- Qwen3 Demonstrates Advanced Voice Cloning via Embeddings
- PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
- NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
- Running Local LLMs and VLMs on Arduino UNO Q with yzma
- Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
- Critical vLLM RCE Vulnerability Exploitable via Video Links
- ByteDance Releases Seed2.0 LLM with Improvements on Complex Real-World Tasks
- Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
- Student Releases Dhi-5B: Multimodal Model Trained for Just $1,200