Tagged "context-management"
- Building a Production AI Receptionist: Practical Local LLM Deployment Case Study
- AI Playground for Developers Built in Vite and Python
- Community Converges on Optimal KV Cache Quantization Strategies for Qwen 3.5 Models
- AI's Impact on Mathematics Analogous to Car's Impact on Cities
- Mamba 3: State Space Model Architecture Optimized for Inference
- Running Qwen3.5-27B Across Multiple GPUs Over LAN Achieves Practical Speed for Local Inference
- Memory Should Decay: Implementing Temporal Memory Decay in Local LLM Systems
- Llama.cpp Celebrates Major Milestone: From Leak to Industry Standard
- Mnemos: Persistent Memory System for Local AI Agents
- Qwen 3.5 Family Benchmark Comparison Shows Strong Performance Across Smaller Models
- ETH Zurich Research Challenges Context-Length Assumptions in LLM Agents
- Analysis Reveals Claude Code Sends 62,600 Characters of Tool Definitions Per Turn
- Apple Unveils MacBook Pro With M5 Pro and M5 Max for On-Device AI
- Critical: Qwen 3.5 Requires BF16 KV Cache, Not FP16 for Accurate Inference
- C7: Pipe Up-to-Date Library Docs Into Any LLM From the Terminal
- Qwen 3.5 MoE Delivers 100K Context Window at 40+ TPS on RTX 5060 Ti
- Every Agent Framework Has the Same Bug – Prompt Decay. Here's a Fix
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- Show HN: A Human-Curated, CLI-Driven Context Layer for AI Agents
- O-TITANS: Orthogonal LoRA Framework for Gemma 3 with Google TITANS Memory Architecture
- TemplateFlow – Build AI Workflows, Not Prompts
- Qwen3 Coder Next FP8 Demonstrates Exceptional Long-Context Performance on 128GB System
- The Path to Ubiquitous AI (17k tokens/sec)
- Why AI Models Fail at Iterative Reasoning and What Could Fix It
- GLM-5 Technical Report: DSA Innovation Reduces Training and Inference Costs
- InitRunner: YAML-Based AI Agent Framework with RAG and Memory
- SnowBall Technique Addresses Context Window Limitations in Local LLMs
- NVIDIA's Dynamic Memory Sparsification Cuts LLM Inference Costs by 8x
- GPT-OSS 120B Uncensored Model Released in Native MXFP4 Precision
- Context Management Identified as Real Bottleneck in AI-Assisted Coding
- Heaps Do Lie: Debugging a Memory Leak in vLLM
- Use Recursive Language Models to Address Huge Contexts for Local LLMs
- DeepSeek Launches Model Update with 1M Context Window