2026-03-27-daily-llm-routing-and-loop-prevention
Today's Changes in Brief
A few small patches to the Agent service today, centered on two goals: cost reduction and preventing infinite loops.
1. LLM Routing Refactor (cli.ts)
Removed DeepSeek and Ark-related code, added two paths:
- Local Qwen (LM Studio): For simple logic tasks, hits localhost:1234
- GLM-5 (OpenAI-compatible endpoint): For complex logic, defaults to OPENAI_BASE_URL
A straightforward move: if it runs locally, don't burn expensive API calls. The open question is whether the local model can handle complex tasks. For now there's a fallback strategy: glm-5 is the default, and qwen is an optional speedup layer.
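To make the routing rule concrete, here is a minimal sketch of how such a decision could look. It is not the actual cli.ts code. The names pickRoute, Complexity, Route, and localEnabled are hypothetical, and the /v1 path on the local URL is an assumption; only the model names, the port, and OPENAI_BASE_URL come from the change described above.

```typescript
// Hypothetical types and names for illustration; not taken from cli.ts.
type Complexity = "simple" | "complex";

interface Route {
  model: string;
  baseUrl: string;
}

// Assumption: the local LM Studio server is reached under /v1 on port 1234.
const LOCAL_BASE_URL = "http://localhost:1234/v1";

function pickRoute(
  complexity: Complexity,
  env: Record<string, string | undefined>,
  localEnabled: boolean,
): Route {
  // Local Qwen is an optional layer: used only when enabled and the task is simple.
  if (localEnabled && complexity === "simple") {
    return { model: "qwen", baseUrl: LOCAL_BASE_URL };
  }
  // Everything else goes to GLM-5, the default path.
  const baseUrl = env["OPENAI_BASE_URL"];
  if (!baseUrl) {
    throw new Error("OPENAI_BASE_URL is not set");
  }
  return { model: "glm-5", baseUrl };
}
```

Passing the environment in as a plain object, instead of reading it inside the function, keeps the rule easy to test without touching the real environment.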
2. Evaluation Enhancement (evaluate/handler.ts)
Added loopable field: if score >=55, allow loop retry; below 55, disallow.
This threshold feels arbitrary. Where did 55 come from? What's the basis? Nobody explains.
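A small sketch of the rule, assuming a numeric score and a boolean loopable field. The names EvaluationResult, withLoopable, and LOOPABLE_THRESHOLD are hypothetical and not from evaluate/handler.ts. Pulling the 55 out into a named constant at least makes the magic number visible and easy to tune later.

```typescript
// Hypothetical shape; the real handler's result type is not shown in this post.
interface EvaluationResult {
  score: number;
  loopable: boolean;
}

// The 55 comes from the change itself; naming it is this sketch's choice.
const LOOPABLE_THRESHOLD = 55;

function withLoopable(
  score: number,
  threshold: number = LOOPABLE_THRESHOLD,
): EvaluationResult {
  // Scores at or above the threshold may retry; anything below may not.
  return { score, loopable: score >= threshold };
}
```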
3. Executor Smart Skip (executor/handler.ts)
Added checkRepoHasTests(): if repo has no test files, skip testCommand.
Seems reasonable, but there's a risk: what if someone deliberately leaves out test files to bypass validation? Currently the behavior is "skip", not "error". That's manageable, but not rigorous enough.
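For illustration, a check like this could be sketched as a pure function over a list of repo-relative file paths. This is an assumption about the approach, not the real checkRepoHasTests(). The patterns and the names hasTestFiles and planTestStep are hypothetical.

```typescript
// Hypothetical patterns; the real check may look at different paths or names.
const TEST_FILE_PATTERNS: RegExp[] = [
  /(^|\/)(__tests__|tests?)\//, // files under test/, tests/, or __tests__/
  /\.(test|spec)\.[cm]?[jt]sx?$/, // foo.test.ts, foo.spec.js, and similar
];

// Returns true if any path looks like a test file under the patterns above.
function hasTestFiles(paths: string[]): boolean {
  return paths.some((p) => TEST_FILE_PATTERNS.some((re) => re.test(p)));
}

// Returns the command to run, or null when the test step should be skipped.
function planTestStep(paths: string[], testCommand: string): string | null {
  return hasTestFiles(paths) ? testCommand : null;
}
```

With these patterns, a repo that keeps its tests somewhere like qa/checks/run.ts is reported as having no tests, so the test step is silently skipped. Any pattern-based detection has that kind of blind spot.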
4. Reflect Loop Detection (reflect/handler.ts)
Improved suggestImprovements(): it counts repeated errors in actionResults, and if the same error occurs more than 2 times it returns "stop loop, manual intervention needed".
Most valuable change today. The previous version only gave suggestions; now it actually breaks the loop, preventing the Agent from falling into the same hole repeatedly.
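The counting idea can be sketched roughly as follows. The ActionResult shape, the function name detectRepeatedError, and keying on the trimmed error string are all assumptions; the real suggestImprovements() may group errors differently.

```typescript
// Hypothetical shape of one entry in actionResults.
interface ActionResult {
  success: boolean;
  error?: string;
}

const MAX_REPEATS = 2;
const STOP_MESSAGE = "stop loop, manual intervention needed";

// Returns the stop message once any single error has occurred more than
// MAX_REPEATS times; otherwise returns null.
function detectRepeatedError(actionResults: ActionResult[]): string | null {
  const counts = new Map<string, number>();
  for (const result of actionResults) {
    if (result.success || !result.error) continue;
    const key = result.error.trim();
    const seen = (counts.get(key) ?? 0) + 1;
    counts.set(key, seen);
    if (seen > MAX_REPEATS) return STOP_MESSAGE;
  }
  return null;
}
```

Exact string matching is the simplest grouping. Errors that differ only in a timestamp or a line number would count as distinct under this sketch.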
5. Triage Optimization (triage/handler.ts)
Fallback path: if the LLM call fails and the category is "testing", return skip instead of proceed_to_plan.
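As a sketch, the fallback could look like this. The names TriageDecision, fallbackDecision, and triage are hypothetical. It also assumes that the category is already known before the LLM call and that every other category still falls back to proceed_to_plan, neither of which is confirmed above.

```typescript
// Hypothetical decision type; only the two values named in the change are modeled.
type TriageDecision = "skip" | "proceed_to_plan";

// Decision used when the LLM call fails.
function fallbackDecision(category: string): TriageDecision {
  // Assumption: only "testing" is skipped; other categories proceed as before.
  return category === "testing" ? "skip" : "proceed_to_plan";
}

// Wraps a hypothetical LLM-backed classifier with the fallback path.
async function triage(
  classify: () => Promise<TriageDecision>,
  category: string,
): Promise<TriageDecision> {
  try {
    return await classify();
  } catch {
    return fallbackDecision(category);
  }
}
```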
Reflection: Over-Optimization or Reasonable Convergence?
This round of changes is "defensive" overall: not feature-driven, but patching and cost-saving.
But a few questions are worth asking:
- 55-point threshold: Who set it? Why not 50 or 60? A threshold without A/B testing is just guesswork.
- Local Qwen: In real production, how stable is the local model? What about power failure, OOM, model loading failures?
- Test-skip logic: Could there be false positives? Would a repo that has tests in a non-standard path get skipped?
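Summary
Today's changes focus on cost control and infinite-loop prevention:
- LLM routing: removed DeepSeek/Ark, added local Qwen + GLM-5
- Evaluator: added loopable field (retry allowed at >=55 points)
- Executor: added checkRepoHasTests() to skip testCommand for repos without tests
- Reflect module: added repeated-error detection; more than 2 repeats breaks the loop outright
- Triage module: skips the "testing" category on fallback
The overall direction is defensive: stop the bleeding and save money. But some thresholds (such as the 55-point one) lack sufficient justification.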

