Tagged "multimodal"
- A Journey to a Reliable and Enjoyable Locally Hosted Voice Assistant
- Careless Whisper – Personal Local Speech-to-Text
- MiniMax-M2.7: New Compact Model Announced for Local Deployment
- Local Manga Translator: Production LLM Pipeline with YOLO, OCR, and Inpainting
- Qwen 3.5 Ultra-Compact Models Enable On-Device AI from Watches to Gaming Devices
- PhotoPrism AI-Powered Photos App Brings Better Ollama Integration
- VoiceShelf: Fully Offline Android Audiobook Reader Using Kokoro TTS
- IBM Granite 4.0 1B Speech Model Released for Multilingual Speech Recognition
- MediaTek Advances Omni Model for Efficient Smartphone Inference
- Qwen 3.5 Small Models Released: 0.8B to 9B Parameters Optimized for On-Device Inference
- Qwen 3.5 0.8B Running in Browser with WebGPU via Transformers.js
- DeepSeek V4 Multimodal Model Coming Next Week With Image and Video Generation
- Qwen3.5 Series Released: Comprehensive Model Lineup Across All Tiers
- Qwen3's Voice Embeddings Enable Local Voice Cloning and Mathematical Voice Manipulation
- Qwen3 Demonstrates Advanced Voice Cloning via Embeddings
- PaddleOCR-VL Now Integrated into llama.cpp for Multilingual OCR
- NVIDIA Releases Dynamo v0.9.0: Infrastructure Overhaul With FlashIndexer and Multi-Modal Support
- Running Local LLMs and VLMs on Arduino UNO Q with yzma
- Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
- Critical vLLM RCE Vulnerability Exploitable via Video Links
- ByteDance Releases Seed2.0 LLM with Improvements on Complex Real-World Tasks
- Ming-flash-omni-2.0: 100B MoE Omni-Modal Model Released
- Student Releases Dhi-5B: Multimodal Model Trained for Just $1,200