2026-03-27-daily-llm-routing-and-loop-prevention

Today's Changes in Brief

A few small patches to the Agent service today, centered on two goals: cost reduction and preventing infinite loops.

1. LLM Routing Refactor (cli.ts)

Removed DeepSeek and Ark-related code, added two paths:

Local Qwen (LM Studio): For simple logic tasks, hits localhost:1234
GLM-5 (OpenAI-compatible endpoint): For complex logic, defaults to OPENAI_BASE_URL

Straightforward move: if it runs locally, don't burn expensive API calls. The open question is whether the local model can handle complex tasks. For now there is a fallback strategy: GLM-5 is the default, and Qwen is an optional speedup layer.
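A rough sketch of that routing decision, since the actual cli.ts isn't shown here. Everything except OPENAI_BASE_URL and the localhost:1234 port is an assumption: the pickTarget name, the USE_LOCAL_QWEN opt-in flag, the Complexity type, and the /v1 path on the LM Studio URL are all made up for illustration.

```typescript
// Hypothetical sketch; names and the opt-in flag are assumptions, not the real cli.ts.
type Complexity = "simple" | "complex";

interface LlmTarget {
  provider: "qwen-local" | "glm-5";
  baseUrl: string;
}

interface RoutingEnv {
  OPENAI_BASE_URL?: string; // named in the post
  USE_LOCAL_QWEN?: string;  // assumed opt-in flag for the local speedup layer
}

// Port comes from the post; the /v1 suffix is an assumption.
const LM_STUDIO_URL = "http://localhost:1234/v1";

function pickTarget(complexity: Complexity, env: RoutingEnv): LlmTarget {
  // Local Qwen is only used when explicitly enabled AND the task is simple.
  if (env.USE_LOCAL_QWEN === "1" && complexity === "simple") {
    return { provider: "qwen-local", baseUrl: LM_STUDIO_URL };
  }
  // Everything else goes to the default GLM-5 endpoint.
  if (!env.OPENAI_BASE_URL) {
    throw new Error("OPENAI_BASE_URL is not set");
  }
  return { provider: "glm-5", baseUrl: env.OPENAI_BASE_URL };
}
```

Keeping the environment as a plain argument makes the default explicit: unless the flag is set, even simple tasks go to GLM-5, which matches "glm-5 as default, qwen as optional".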

2. Evaluation Enhancement (evaluate/handler.ts)

Added a loopable field: a score of 55 or higher allows a loop retry; anything below 55 does not.

This threshold feels arbitrary. Where did 55 come from? What's the basis? Nobody explains.
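For reference, the rule boils down to one comparison. The sketch below is an assumption about shape only (the Evaluation interface and the evaluate signature are hypothetical); the one thing taken from the change is the >= 55 cutoff. Making the threshold a parameter is my suggestion, so that it can at least be tuned without touching the handler.

```typescript
// Hypothetical sketch; only the 55 cutoff and the field name "loopable" come from the change.
const LOOPABLE_THRESHOLD = 55;

interface Evaluation {
  score: number;
  loopable: boolean;
}

function evaluate(score: number, threshold: number = LOOPABLE_THRESHOLD): Evaluation {
  // Boundary is inclusive: exactly 55 still allows a retry.
  return { score, loopable: score >= threshold };
}
```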

3. Executor Smart Skip (executor/handler.ts)

Added checkRepoHasTests(): if repo has no test files, skip testCommand.

Seems reasonable, but there's a risk: what if someone deliberately leaves out test files to bypass validation? Currently it's just "skip", not "error", which is manageable but not rigorous enough.
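To make the "skip" versus "error" distinction concrete, here is a sketch that works on a plain list of repo-relative paths instead of the filesystem. The patterns, the repoHasTests and decideTestStep names, and the strict flag are all assumptions; the real checkRepoHasTests() may detect tests quite differently.

```typescript
// Hypothetical sketch; patterns and names are assumptions, not the real executor code.
type TestDecision = "run" | "skip" | "error";

// A few common conventions; deliberately incomplete.
const TEST_FILE_PATTERNS: RegExp[] = [
  /(^|\/)(tests?|__tests__)\//,   // tests/, test/, __tests__/ directories
  /\.(test|spec)\.[cm]?[jt]sx?$/, // foo.test.ts, foo.spec.jsx, ...
  /(^|\/)test_[^/]+\.py$/,        // test_foo.py
  /_test\.(go|py)$/,              // foo_test.go
];

function repoHasTests(paths: string[]): boolean {
  return paths.some((p) => TEST_FILE_PATTERNS.some((re) => re.test(p)));
}

function decideTestStep(paths: string[], strict: boolean = false): TestDecision {
  if (repoHasTests(paths)) return "run";
  // Strict mode treats a missing test suite as a failure instead of silently skipping.
  return strict ? "error" : "skip";
}
```

With these patterns, a repo whose only tests live in something like checks/verify_math.ts is reported as having no tests, so any pattern-based check of this kind will miss non-standard layouts.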

4. Reflect Loop Detection (reflect/handler.ts)

Improved suggestImprovements(): it now counts repeated errors in actionResults and, if the same error shows up more than 2 times, returns "stop loop, manual intervention needed".

Most valuable change today. The previous version only gave suggestions; now it actually breaks the loop, preventing the Agent from falling into the same hole repeatedly.
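The counting logic can be sketched roughly like this. The ActionResult shape, the detectRepeatedErrors name, and matching errors by exact trimmed string are assumptions; only the "more than 2" rule and the stop message come from the change.

```typescript
// Hypothetical sketch; the ActionResult shape and exact-match comparison are assumptions.
interface ActionResult {
  action: string;
  error?: string;
}

interface LoopVerdict {
  stop: boolean;
  message: string;
}

const MAX_REPEATS = 2;

function detectRepeatedErrors(actionResults: ActionResult[]): LoopVerdict {
  const counts: Record<string, number> = {};
  for (const result of actionResults) {
    if (result.error === undefined) continue; // successful actions don't count
    const key = result.error.trim();
    const seen = (counts[key] ?? 0) + 1;
    counts[key] = seen;
    // Third occurrence of the same error message breaks the loop.
    if (seen > MAX_REPEATS) {
      return { stop: true, message: "stop loop, manual intervention needed" };
    }
  }
  return { stop: false, message: "continue" };
}
```

Exact string matching is the weak spot in a sketch like this: errors that differ only by a timestamp or line number would count as distinct, so some normalization is probably needed in practice.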

5. Triage Optimization (triage/handler.ts)

Fallback path: if the LLM call fails and the category is "testing", return skip instead of proceed_to_plan.
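A minimal sketch of that fallback, assuming other categories still fall through to proceed_to_plan when the LLM call fails (the change doesn't say either way). The function names are hypothetical, and the classifier is synchronous here only to keep the example short; the real call is presumably async.

```typescript
// Hypothetical sketch; names and the behavior for non-"testing" categories are assumptions.
type TriageDecision = "skip" | "proceed_to_plan";

function fallbackDecision(category: string): TriageDecision {
  return category === "testing" ? "skip" : "proceed_to_plan";
}

function triageWithFallback(
  category: string,
  classify: () => TriageDecision, // stands in for the LLM call
): TriageDecision {
  try {
    return classify();
  } catch {
    // LLM call failed: fall back to the category-based rule.
    return fallbackDecision(category);
  }
}
```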

Reflection: Over-Optimization or Reasonable Convergence?

This round of changes is "defensive" overall: not feature-driven, but patching and cost-saving.

But a few questions are worth asking:

55-point threshold: Who set it? Why not 50 or 60? A threshold without A/B testing is just guesswork.
Local Qwen: In real production, how stable is the local model? What about power failure, OOM, model loading failures?
Test-skip logic: Could there be false positives? Would a repo with tests in a non-standard path get skipped?

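Summary

Today's changes focus on cost control and infinite-loop prevention:

LLM routing: removed DeepSeek/Ark, added local Qwen + GLM-5
Evaluator: added a loopable field (a score >= 55 allows retry)
Executor: added checkRepoHasTests() to skip testCommand for repos without tests
Reflect module: added repeated-error detection; more than 2 repeats breaks the loop
Triage module: skips the "testing" category on fallback

The overall direction is defensive: stop the bleeding and save money. But some thresholds (such as the 55-point one) lack sufficient justification.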