weekly-2026-03-25

Period: 2026-03-25 → 2026-04-01

Progress This Week: Scheduler Bloat and Self-Repair

The sheer volume of commits this week exposes one ugly truth: scheduler.ts has become a beast that no one can maintain by intuition.

I split it into six modules (execution-engine, branch-governance, maintenance-governance, review-merge, retry-failure, triage-review-routing), adding roughly 3000 lines in total. Yet the pattern of "fix one bug, introduce two new ones" repeated at least three times this week. The dangling-branch check got added, then needed scoping to the current employee, then a pre-prune, then a HEAD detach, then a fail-fast gate, then leader-lock TTL governance. The result is a chain of fix(scheduler) commits that reads like a symptom chart.
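One way to stop the guard-by-guard patching is to decide every guard in a single pure function before any git side effect runs, so each new edge case becomes a unit test instead of a follow-up commit. A minimal sketch, assuming hypothetical `BranchInfo` fields and a `maxDeletes` threshold for the fail-fast gate (none of these names come from the actual modules). It covers employee scoping, HEAD detach, and the fail-fast gate, and deliberately leaves pre-prune out:

```typescript
// Hypothetical shapes; the real branch-governance types are not shown in this report.
interface BranchInfo {
  name: string;
  owner: string;                 // employee that created the branch
  hasOpenPr: boolean;            // an open PR still references this branch
  checkedOutInWorktree: boolean; // some worktree's HEAD points at it
}

type CleanupAction =
  | { kind: "detachHead"; branch: string }
  | { kind: "deleteBranch"; branch: string };

type CleanupPlan =
  | { ok: true; actions: CleanupAction[] }
  | { ok: false; reason: string };

// Decides every guard up front; performs no git operations itself.
function planDanglingBranchCleanup(
  branches: BranchInfo[],
  currentEmployee: string,
  maxDeletes: number,
): CleanupPlan {
  // Scope: only branches owned by the current employee with no open PR.
  const dangling = branches.filter(
    (b) => b.owner === currentEmployee && !b.hasOpenPr,
  );
  // Fail-fast gate: refuse a suspiciously large prune rather than pruning partially.
  if (dangling.length > maxDeletes) {
    return {
      ok: false,
      reason: `${dangling.length} dangling branches exceeds limit ${maxDeletes}`,
    };
  }
  const actions: CleanupAction[] = [];
  for (const b of dangling) {
    // A branch checked out in a worktree must have its HEAD detached before deletion.
    if (b.checkedOutInWorktree) {
      actions.push({ kind: "detachHead", branch: b.name });
    }
    actions.push({ kind: "deleteBranch", branch: b.name });
  }
  return { ok: true, actions };
}
```

The caller would execute `actions` in order only when `ok` is true. Because the planner never touches git, a newly discovered edge case means adding one entry to a test fixture rather than another fix(scheduler) commit.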

The real problem: I didn't think through boundary conditions before dumping logic into the scheduler. Post-hoc patching is a shameful admission of architectural failure.

Brutal Take: Your Scheduler Is Running Naked

Technical Decisions

Worship of complexity accumulation. The scheduler went from 1618 lines to 5000+ through "modularization." Calling this modularization is self-deception: what actually happened is that a giant switch statement was moved into different files. Real modularity means clean abstraction boundaries, not relocating 3000 lines of code into subdirectories.

Repeating the same fix for the same class of bug. The dangling branch auto-prune fix was committed at least 6 times, each time scoped to more edge cases. This isn't fixing — this is treating symptoms with more symptoms.

The leader lock implementation betrays a half-understanding of distributed consistency. "Atomic leader lock with heartbeat renewal" sounds impressive in a commit message, but have the concurrency edge cases in the TTL governance logic actually been verified? Plenty of tests were written, but happy-path coverage isn't the same as correctness under network partition.
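The partition edge cases can at least be pinned down deterministically by injecting the clock instead of sleeping. A minimal in-memory sketch of a TTL lease with heartbeat renewal and a fencing token; `LeaseStore` and its methods are hypothetical, and a real deployment would need the same compare-and-set logic inside an atomic shared backend rather than process memory:

```typescript
// Hypothetical lease record; timestamps are plain numbers so tests control time.
interface Lease {
  holder: string;
  expiresAt: number; // ms timestamp
  token: number;     // fencing token, strictly increasing per acquisition
}

class LeaseStore {
  private lease: Lease | null = null;
  private nextToken = 1;

  // Acquire succeeds only if no lease exists or the current one has expired.
  tryAcquire(holder: string, now: number, ttlMs: number): Lease | null {
    if (this.lease !== null && this.lease.expiresAt > now) {
      return null;
    }
    const lease: Lease = { holder, expiresAt: now + ttlMs, token: this.nextToken++ };
    this.lease = lease;
    return { ...lease };
  }

  // A heartbeat must present the exact token it was granted; an expired or
  // superseded lease cannot be revived by a late renewal.
  renew(holder: string, token: number, now: number, ttlMs: number): boolean {
    const l = this.lease;
    if (l === null || l.holder !== holder || l.token !== token || l.expiresAt <= now) {
      return false;
    }
    l.expiresAt = now + ttlMs;
    return true;
  }

  // Fencing check for downstream writes: only the newest token is accepted,
  // so a deposed leader cannot clobber work done by its successor.
  acceptsWrite(token: number): boolean {
    return this.lease !== null && this.lease.token === token;
  }
}
```

The test scenario (leader A misses heartbeats, B takes over, A's late heartbeat and writes are rejected) is exactly the partition case that happy-path tests skip. The fencing token matters because TTL expiry alone cannot stop a paused process from acting on a lease it no longer holds.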

Cognitive Blind Spots

Addicted to the rush of "it works now." Auth session issues went through 5 rounds (login-post-redirect race, validateToken transient error, session cookie strip), each round claiming to "stabilize," each round producing a follow-up fix. This isn't stabilization. This is an infinite loop of using new bugs to hide old bugs.
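One plausible link between two of the listed issues (validateToken transient error, session cookie strip) is treating a transient validation failure as proof that the token is bad. A minimal sketch, assuming a hypothetical `ValidationOutcome` type that separates "invalid" from "transient" (the real `validateToken` signature is not shown here); only a definitive rejection clears the session:

```typescript
// Hypothetical outcome type: a transient failure (network blip, auth backend
// timeout) says nothing about whether the token itself is valid.
type ValidationOutcome =
  | { kind: "valid"; userId: string }
  | { kind: "invalid" } // token rejected: expired, revoked, bad signature
  | { kind: "transient"; error: string };

type SessionDecision =
  | { action: "proceed"; userId: string }
  | { action: "clearSessionAndLogin" }
  | { action: "retryLater" }; // keep the cookie; surface a temporary error

function decideSession(
  validate: () => ValidationOutcome,
  maxAttempts: number,
): SessionDecision {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = validate();
    switch (outcome.kind) {
      case "valid":
        return { action: "proceed", userId: outcome.userId };
      case "invalid":
        // Only a definitive rejection justifies stripping the session cookie.
        return { action: "clearSessionAndLogin" };
      case "transient":
        // Retry; a transient error is never treated as a logout.
        continue;
    }
  }
  // Retries exhausted: the session is left intact.
  return { action: "retryLater" };
}
```

Making the three outcomes explicit in the type forces every caller to decide what a transient error means, instead of letting it fall through to the logout path.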

Self-gratifying documentation updates. The .outbird-progress.md, handoff JSON files, and governance specs got written and updated, but the scheduler's core pathology remained unchanged. Good documentation doesn't compensate for bad behavior.

Architectural Anxiety

scheduler.ts isn't code anymore; it's a symptom. It is a single file handling issue dispatch, worktree management, PR lifecycle, branch governance, employee routing, retry strategy, and leader election all at once. Any "modification" to it is like replacing parts on a running engine. The real solution is a proper finite state machine redesign, not more patches applied to 5000 lines of spaghetti.
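Drawing the state diagram first can be as simple as writing the transition table as data. A minimal sketch for one issue's lifecycle; the states and events are hypothetical, loosely derived from the responsibilities listed above (dispatch, worktree management, PR lifecycle, retry), and are not taken from scheduler.ts:

```typescript
// Hypothetical lifecycle for a single issue; names are illustrative only.
type State = "queued" | "dispatched" | "worktreeReady" | "prOpen" | "merged" | "failed";

type Event = "dispatch" | "worktreeCreated" | "prOpened" | "prMerged" | "error" | "retry";

// The whole policy lives in one table: every legal (state, event) pair is
// listed, and any pair not present is an illegal transition.
const transitions: Record<State, Partial<Record<Event, State>>> = {
  queued: { dispatch: "dispatched" },
  dispatched: { worktreeCreated: "worktreeReady", error: "failed" },
  worktreeReady: { prOpened: "prOpen", error: "failed" },
  prOpen: { prMerged: "merged", error: "failed" },
  merged: {},
  failed: { retry: "queued" },
};

function step(state: State, event: Event): State {
  const next = transitions[state][event];
  if (next === undefined) {
    // Fail loudly instead of silently patching around an unexpected path.
    throw new Error(`illegal transition: ${state} --${event}-->`);
  }
  return next;
}

// Replays a sequence of events from an initial state.
function run(events: Event[], initial: State = "queued"): State {
  return events.reduce(step, initial);
}
```

Illegal transitions throw rather than being patched around, so a missing edge case shows up as a failing test against the table. Guards such as retry limits or leader checks could be layered on as per-transition predicates without changing the table's shape.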

Directions for Improvement

Stop adding features to the scheduler. No scheduler changes beyond fix commits for at least two weeks.
Redesign the scheduler as a state machine. Model existing logic as discrete states and transitions. Draw the state diagram first; code second.
Independent auth session audit needed. Block out a full day to review the entire auth chain end-to-end, instead of reacting to individual bug reports.
CI runner migration was the right call. macOS self-hosted runner proved its value this week — faster and cheaper. That decision was sound.

Summary

This week was dominated by an out-of-control scheduler refactor that created more problems than it solved. The "modularization" of a 1618-line scheduler into 6 sub-modules added ~3000 lines without fixing the fundamental issue: the logic was never properly modeled to begin with.

What actually happened: A classic case of complexity accumulation disguised as architecture. Multiple fix(scheduler) commits in a row (6+ for the same dangling-branch issue) are a dead giveaway. The auth session stabilization took 5 rounds and still doesn't feel done. Meanwhile, the CI runner migration to self-hosted macOS was genuinely good work — executed cleanly, validated properly.

The real problem: Confusing activity with progress. Writing more code, more docs, more commits ≠ building a better system. The scheduler needs a proper state machine redesign, not another round of patches.

Constraint for next week: Hard freeze on scheduler additions. Spend the time on proper modeling instead of debug-driven patching.

Next weekly reflection: 2026-04-08