2026-03-27-daily-llm-routing-and-loop-prevention

Today's Changes in Brief

A few small patches to the Agent service today, centered on two goals: cost reduction and preventing infinite loops.

1. LLM Routing Refactor (cli.ts)

Removed the DeepSeek- and Ark-related code and added two routing paths:

  • Local Qwen (LM Studio): For simple logic tasks, hits localhost:1234
  • GLM-5 (OpenAI-compatible endpoint): For complex logic, defaults to OPENAI_BASE_URL

A straightforward move: if a task can run locally, don't burn expensive API calls on it. The open question is whether the local model can handle complex tasks. For now the fallback strategy is conservative: GLM-5 is the default, and Qwen is an optional speed-up layer.
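A minimal TypeScript sketch of this two-path routing, assuming the caller already knows whether a task is simple or complex. The routeModel function, the model names, and the preferLocal flag are hypothetical, not the actual cli.ts code; only the LM Studio port (1234) and the OPENAI_BASE_URL variable come from the change itself. The base URL is passed in as a parameter instead of being read from the environment, to keep the sketch self-contained.

```typescript
// Sketch of two-path LLM routing. ASSUMPTIONS: routeModel, RouteTarget, the
// "simple"/"complex" labels, and the model names are hypothetical.
type TaskComplexity = "simple" | "complex";

interface RouteTarget {
  provider: "local-qwen" | "glm-5";
  baseUrl: string;
  model: string;
}

// LM Studio serves an OpenAI-compatible API on localhost:1234 by default.
const LOCAL_QWEN_URL = "http://localhost:1234/v1";

function routeModel(
  complexity: TaskComplexity,
  openaiBaseUrl: string | undefined, // e.g. the value of OPENAI_BASE_URL
  preferLocal: boolean,              // Qwen is opt-in; GLM-5 stays the default
): RouteTarget {
  // Only simple tasks are sent to the local model, and only when opted in.
  if (preferLocal && complexity === "simple") {
    return { provider: "local-qwen", baseUrl: LOCAL_QWEN_URL, model: "qwen-local" };
  }
  // Everything else goes to GLM-5 through the OpenAI-compatible endpoint.
  if (!openaiBaseUrl) {
    throw new Error("OPENAI_BASE_URL is not set; cannot reach GLM-5");
  }
  return { provider: "glm-5", baseUrl: openaiBaseUrl, model: "glm-5" };
}
```

Keeping GLM-5 as the fall-through branch means that forgetting the flag costs money but not correctness, which matches the "GLM-5 by default, Qwen as a speed-up" stance.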

2. Evaluation Enhancement (evaluate/handler.ts)

Added a loopable field: if score >= 55, a loop retry is allowed; below 55, it is not.

This threshold feels arbitrary. Where did 55 come from, and what is it based on? Nobody explains.
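One way to make the 55 less arbitrary is to treat it as a parameter and check it against history. The sketch below assumes a score on a 0-100 scale and a log of past retries with their outcomes; the names (LOOP_THRESHOLD, evaluateScore, RetryRecord, retrySuccessRate) are hypothetical, and only the >= 55 rule comes from evaluate/handler.ts.

```typescript
// Sketch only. ASSUMPTIONS: all names are hypothetical and the score is
// assumed to be on a 0-100 scale; the >= 55 rule is from the post.
const LOOP_THRESHOLD = 55;

interface EvaluationResult {
  score: number;
  loopable: boolean; // true => the agent may retry the loop
}

// The threshold is a parameter, so 50 / 55 / 60 can be compared without code edits.
function evaluateScore(score: number, threshold: number = LOOP_THRESHOLD): EvaluationResult {
  return { score, loopable: score >= threshold };
}

// Hypothetical log entry: the score an attempt received and whether a retry
// after it eventually passed.
interface RetryRecord {
  score: number;
  retrySucceeded: boolean;
}

// Of the retries a given threshold would have allowed, what fraction succeeded?
// Returns null when the threshold would have allowed no retries at all.
function retrySuccessRate(history: RetryRecord[], threshold: number): number | null {
  const allowed = history.filter((r) => r.score >= threshold);
  if (allowed.length === 0) return null;
  return allowed.filter((r) => r.retrySucceeded).length / allowed.length;
}
```

Running retrySuccessRate over the same history for 50, 55, and 60 gives at least an offline comparison before committing to one value. It is not a substitute for a real A/B test, since it only sees retries that actually happened.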

3. Executor Smart Skip (executor/handler.ts)

Added checkRepoHasTests(): if the repo has no test files, testCommand is skipped.

Seems reasonable, but there's a risk: what if someone deliberately leaves out test files to bypass validation? Currently it's just a "skip", not an "error": manageable, but not rigorous enough.
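A sketch of test detection that also speaks to the risks above, assuming the executor can list the repo's file paths. The real checkRepoHasTests() presumably scans the disk; this version takes a path list so it stays pure, and the pattern list is illustrative rather than the one in executor/handler.ts. Returning an explicit skip reason instead of skipping silently makes every bypass visible in logs.

```typescript
// Sketch only. ASSUMPTIONS: repoHasTests, decideTestStep, TestDecision, and
// the pattern list are hypothetical, not the code in executor/handler.ts.
const TEST_FILE_PATTERNS: RegExp[] = [
  /(^|\/)__tests__\//,            // Jest-style __tests__ directories
  /(^|\/)tests?\//,               // test/ or tests/ directories at any depth
  /\.(test|spec)\.[cm]?[jt]sx?$/, // foo.test.ts, foo.spec.mjs, ...
  /(^|\/)test_[^/]*\.py$/,        // pytest: test_foo.py
  /_test\.(go|py)$/,              // Go: foo_test.go; pytest: foo_test.py
];

function repoHasTests(filePaths: string[]): boolean {
  return filePaths.some((p) => TEST_FILE_PATTERNS.some((re) => re.test(p)));
}

type TestDecision =
  | { action: "run" }
  | { action: "skip"; reason: string };

function decideTestStep(filePaths: string[], testCommand: string | undefined): TestDecision {
  if (!testCommand) {
    return { action: "skip", reason: "no testCommand configured" };
  }
  if (!repoHasTests(filePaths)) {
    // Report the skip with a reason instead of passing silently,
    // so a repo that "lost" its tests shows up in review.
    return { action: "skip", reason: "no test files matched known patterns" };
  }
  return { action: "run" };
}
```

Broadening the patterns reduces the false-positive skips mentioned later (tests in non-standard paths), at the cost of occasionally running a testCommand in a repo whose "test" directory holds no real tests.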

4. Reflect Loop Detection (reflect/handler.ts)

Improved suggestImprovements(): it now counts repeated errors in actionResults, and if the same error appears more than twice, it returns "stop loop, manual intervention needed".

This is the most valuable change today. The previous version only gave suggestions; now it actually breaks the loop, which keeps the Agent from falling into the same hole over and over.
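A minimal sketch of the repeated-error check, assuming each action result carries an ok flag and an optional error message. The ActionResult shape, the normalizeError step, and the "investigate:" suggestions are hypothetical; the "more than two repeats" rule and the stop message come from reflect/handler.ts. Normalizing digits is one guess at what "the same error" should mean: without it, an error that differs only by a line number or timeout value would never count as repeated.

```typescript
// Sketch only. ASSUMPTIONS: ActionResult, normalizeError, and the
// "investigate:" output are hypothetical; the >2 rule is from the post.
interface ActionResult {
  ok: boolean;
  error?: string;
}

const MAX_REPEATS = 2; // an error seen more than this many times stops the loop
const STOP_MESSAGE = "stop loop, manual intervention needed";

// Replace digit runs (line numbers, timings, ids) so near-identical errors
// are counted together, then lowercase for case-insensitive matching.
function normalizeError(message: string): string {
  return message.replace(/\d+/g, "N").trim().toLowerCase();
}

function suggestImprovements(actionResults: ActionResult[]): string[] {
  // Prototype-free map so keys like "constructor" are counted correctly.
  const counts: Record<string, number> = Object.create(null);
  for (const r of actionResults) {
    if (r.ok || !r.error) continue;
    const key = normalizeError(r.error);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  const errors = Object.keys(counts);
  for (const error of errors) {
    const count = counts[error] ?? 0;
    if (count > MAX_REPEATS) {
      // Break the loop instead of suggesting yet another retry.
      return [`${STOP_MESSAGE} (repeated ${count}x: ${error})`];
    }
  }
  // Below the limit: ordinary per-error suggestions.
  return errors.map((e) => `investigate: ${e}`);
}
```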

5. Triage Optimization (triage/handler.ts)

Fallback path: if the LLM call fails and the category is "testing", the task is now skipped instead of going to proceed_to_plan.
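The fallback rule can be sketched as follows, assuming the LLM classifier is an async function that may reject. The function names and the TriageDecision type are hypothetical; the sketch also assumes every non-testing category still falls back to proceed_to_plan, as all categories did before this change.

```typescript
// Sketch only. ASSUMPTIONS: fallbackDecision, triage, and TriageDecision are
// hypothetical; "skip" and "proceed_to_plan" are the outcomes named in the post.
type TriageDecision = "skip" | "proceed_to_plan";

// Pure rule used only when the LLM call fails.
function fallbackDecision(category: string): TriageDecision {
  return category === "testing" ? "skip" : "proceed_to_plan";
}

async function triage(
  category: string,
  classifyWithLlm: (category: string) => Promise<TriageDecision>,
): Promise<TriageDecision> {
  try {
    // Normal path: the LLM decides.
    return await classifyWithLlm(category);
  } catch (err) {
    // Fallback path: don't push "testing" tasks into planning blindly.
    console.warn(`triage LLM call failed, using fallback: ${String(err)}`);
    return fallbackDecision(category);
  }
}
```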


Reflection: Over-Optimization or Reasonable Convergence?

Overall, this round of changes is "defensive": it is about patching holes and cutting costs rather than adding features.

Still, a few questions are worth asking:

  1. 55-point threshold: Who set it? Why not 50 or 60? A threshold without A/B testing is just guesswork.
  2. Local Qwen: How stable is the local model in real production? What happens on a power failure, an OOM, or a failed model load?
  3. Test-skip logic: Could there be false positives? For example, could a repo that does have tests, but keeps them in a non-standard path, get skipped?
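For question 2, one defensive option is to treat the local model as best-effort: give it a time budget and fall back to GLM-5 on any failure. This is a sketch, not what cli.ts does today; the Completion type, withTimeout, completeWithFallback, and the 10-second default are hypothetical. A refused connection (LM Studio not running), an OOM crash, or a model that never finishes loading would typically surface as a rejection or a timeout and route the request to the remote endpoint.

```typescript
// Sketch only. ASSUMPTIONS: Completion, withTimeout, completeWithFallback,
// and the 10 s default are hypothetical; local/remote are injected callers.
type Completion = (prompt: string) => Promise<string>;

// Reject if the promise hasn't settled within `ms`; always clear the timer.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function completeWithFallback(
  prompt: string,
  local: Completion,  // e.g. a call to LM Studio on localhost:1234
  remote: Completion, // e.g. a call to GLM-5 via OPENAI_BASE_URL
  timeoutMs = 10_000,
): Promise<{ text: string; servedBy: "local" | "remote" }> {
  try {
    const text = await withTimeout(local(prompt), timeoutMs);
    return { text, servedBy: "local" };
  } catch (err) {
    // Server down, OOM, failed model load, or a hang: use the remote model.
    console.warn(`local model failed, falling back: ${String(err)}`);
    const text = await remote(prompt);
    return { text, servedBy: "remote" };
  }
}
```

Recording servedBy also gives a cheap stability metric: the share of requests that fell back is a direct answer to "how stable is the local model?"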

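Summary

Today's changes focus on cost control and infinite-loop prevention:

  1. LLM routing: removed DeepSeek/Ark, added local Qwen + GLM-5
  2. Evaluator: added a loopable field (retry allowed at a score of 55 or higher)
  3. Executor: added checkRepoHasTests() to skip testCommand for repos without tests
  4. Reflect module: added repeated-error detection; more than 2 repeats breaks the loop outright
  5. Triage module: skips the "testing" category on fallback

The overall direction is defensive: stop the bleeding and save money. But some thresholds (such as the 55-point cutoff) lack solid justification.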