Tagged "context-management"
- Building a Production AI Receptionist: Practical Local LLM Deployment Case Study
- AI Playground for Developers Built in Vite and Python
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- AI's Impact on Mathematics Analogous to Car's Impact on Cities
- Mamba 3: State Space Model Architecture Optimized for Inference
- Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
- Memory Should Decay: Implementing Temporal Memory Decay in Local LLM Systems
- Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
- Mnemos: Persistent Memory System for Local AI Agents
- Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
- ETH Zurich Research Challenges Context-Length Assumptions in LLM Agents
- Analysis Reveals Claude Code Sends 62,600 Characters of Tool Definitions Per Turn
- Apple Unveils MacBook Pro With M5 Pro and M5 Max for On-Device AI
- Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
- C7: Pipe Up-to-Date Library Docs Into Any LLM From the Terminal
- Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
- Every Agent Framework Has the Same Bug – Prompt Decay. Here's a Fix
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- Show HN: A Human-Curated, CLI-Driven Context Layer for AI Agents
- O-TITANS: Orthogonal LoRA Framework for Gemma 3 with Google TITANS Memory Architecture
- TemplateFlow – Build AI Workflows, Not Prompts
- Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System
- The Path to Ubiquitous AI (17k tokens/sec)
- Why AI Models Fail at Iterative Reasoning and What Could Fix It
- GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
- InitRunner: YAML-Based AI Agent Framework with RAG and Memory
- SnowBall Technique Addresses Context Window Limitations in Local LLMs
- NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
- GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
- Context Management Identified as Real Bottleneck in AI-Assisted Coding
- Heaps Do Lie: Debugging a Memory Leak in vLLM
- Use Recursive Language Models to Address Huge Contexts for Local LLMs
- DeepSeek Launches Model Update with 1M Context Window