weekly-2026-03-29
3 commits, auth system rewritten twice, scheduler finally learns self-cleanup.
This Week's Progress
GitHub Auth: Rewriting is Admission
The biggest engineering event this week was the 411-line rewrite of packages/github/src/auth.ts.
This wasn't a refactor. It was an admission.
What was wrong with the original auth implementation? Logs show the token was reused after expiration, turning every API call into a 401 instantly. How long had the system been running before this surfaced? Based on commit history, the bug had been lurking at least since before #128. Every auto-discovery failure reported "Bad credentials" but nobody suspected the token lifecycle — because the issue was "expired but still in use," not "never acquired."
The rewrite strategy: replace direct token injection with auth.callback. Every request now goes through a callback layer that checks token validity before deciding whether to re-authenticate. The direction is correct, but the cost is additional async overhead: GitHub API calls went from a synchronous path to an async chain, and the performance impact needs real pressure testing.
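A minimal sketch of what such a callback layer might look like. This is an assumption about the shape of the fix, not the actual auth.ts: the names TokenProvider, Authenticate, and the skew window are all hypothetical.

```typescript
// Hypothetical sketch of a callback-based token layer. Instead of injecting
// a fixed token at construction time, every request asks for a token lazily,
// and the provider re-authenticates when the cached token is near expiry.

interface Token {
  value: string;
  expiresAt: number; // epoch milliseconds
}

type Authenticate = () => Promise<Token>;

class TokenProvider {
  private token: Token | null = null;

  constructor(
    private authenticate: Authenticate,
    private skewMs = 30_000, // refresh slightly before the real expiry
  ) {}

  // The auth.callback equivalent: call sites await this per request.
  async getToken(now = Date.now()): Promise<string> {
    if (this.token === null || now >= this.token.expiresAt - this.skewMs) {
      this.token = await this.authenticate();
    }
    return this.token.value;
  }
}
```

The cost mentioned above is visible in the sketch: every call site becomes an awaited getToken(), an extra async hop even when the cached token is still valid.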
By the way, this isn't the first auth rewrite. Git log shows a previous "hotfix(unknown): fix(github-auth): use async auth callback" — meaning someone already fixed it once before. How far apart were the two rewrites? If they're close together, the first fix treated the symptom, not the root cause.
Reflection: This system's auth has never been stable. Every time it breaks, the response is "patch one spot" instead of "examine the whole chain." When "Bad credentials" appears, first instinct should be token lifecycle management, not API key suspicion.
Scheduler's Self-Awakening
This week the scheduler learned to clean up after itself.
Commit fix(scheduler): cleanup remote branches of recently merged PRs did two things: auto-clean origin branches after a local merge, and maintain a recent-merge list to avoid duplicate cleanup. Another commit, fix(governance): prevent VE empty-commit CI-trigger loop, used a pre-push hook to block empty commits from triggering CI.
The common thread: they solved problems they created.
Why did the scheduler need cleanup? Because it creates and merges temporary branches frequently, naming gets messy, and cleanup logic is either missing or lives in the wrong place. Why did the VE loop happen? Because the review-gate logic bypassed CI's check mechanism, treating empty commits as valid changes. Neither of these is a new feature; they're side effects of incomplete implementations.
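A sketch of what the dedup'd cleanup could look like, assuming the recent-merge list is a simple in-memory set. The BranchCleaner name and the injected deleteRemoteBranch hook are illustrative, not taken from the scheduler's code.

```typescript
// Hypothetical sketch of the scheduler's deduplicated branch cleanup.
// deleteRemoteBranch is an assumed injection point (e.g. a git wrapper
// that runs `git push origin --delete <branch>`).

class BranchCleaner {
  private recentlyCleaned = new Set<string>();

  constructor(private deleteRemoteBranch: (branch: string) => Promise<void>) {}

  // Returns true if the branch was cleaned, false if skipped as a duplicate.
  async cleanup(branch: string): Promise<boolean> {
    if (this.recentlyCleaned.has(branch)) return false; // already handled
    await this.deleteRemoteBranch(branch);
    this.recentlyCleaned.add(branch); // record only after a successful delete
    return true;
  }
}
```

Note the ordering choice: the branch is recorded only after the delete succeeds, so a failed delete can be retried rather than silently skipped.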
The real progress isn't "scheduler learned cleanup." It's "someone finally noticed cleanup was missing." But what's the cost of that late noticing? How many CI resources were wasted on ghost branches during the messy period? How many builds were incorrectly triggered during the VE loop?
Reflection: The system evolves via "get it working, then patch." Every patch fills the gap left by an incomplete design decision from the last iteration. This isn't technical debt — it's compound interest on technical debt.
DCD Crawler: The Price of Monitoring
feat(dcd-crawler): daily param coverage monitoring with threshold alerts introduced parameter coverage monitoring and threshold alerts.
The motivation is clear: if the crawler's parameter coverage drops below a threshold, the system should alert automatically. But this monitoring itself needs extra data storage and scheduled tasks. Is there a correlation between monitored coverage and actual crawl quality? How are thresholds set? If alerts fire once a day but crawler issues can happen at any time, is a daily monitoring window too coarse?
Monitoring isn't the problem — monitoring introduces new dependencies and new potential failure points.
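The threshold check itself is small. Here's a hedged sketch of the core comparison; the CoverageReport shape and the 0.9 default threshold are assumptions, not the crawler's actual config.

```typescript
// Hypothetical coverage check: alert when observed parameter coverage
// drops below a configured threshold.

interface CoverageReport {
  observedParams: Set<string>;
  expectedParams: Set<string>;
}

function coverageRatio(report: CoverageReport): number {
  if (report.expectedParams.size === 0) return 1; // nothing expected: full coverage
  let hit = 0;
  for (const p of report.expectedParams) {
    if (report.observedParams.has(p)) hit++;
  }
  return hit / report.expectedParams.size;
}

function shouldAlert(report: CoverageReport, threshold = 0.9): boolean {
  return coverageRatio(report) < threshold;
}
```

Everything the post questions lives outside this function: where the expected-parameter set comes from, how often the report is produced, and who receives the alert.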
Critical Lens
1. Fix Speed Masks Shallow Design
Three major issues fixed this week: auth rewrite, scheduler cleanup, VE loop prevention. All three were fast, possibly from discovery to merge within hours. Fast fixes are good, but fixes fast enough to skip design docs, rollback tests, and post-mortems are a problem.
After the auth rewrite, is there test coverage? Is the new callback path properly mocked? Does scheduler cleanup have race conditions in concurrent scenarios? If the answer to all of these is "don't know yet," the next auth failure or scheduler deadlock could be tomorrow.
2. Config Inflation Is Eroding System Boundaries
Staged changes show 58 files, 3882 lines added. docs/ops/ gained QDRANT_DATA_CONTRACT.md and RUNNER_TOPOLOGY.md, each a long document. Are these documents truly necessary, or is the system so complex that nobody can hold all the details in memory, so they write docs instead of building understanding?
More complexity → more docs → higher maintenance cost → more errors. What's the endgame of this chain?
3. Technical Choices Lack Long-Term Perspective
The auth callback solves token reuse but introduces async overhead on every request. If GitHub API call volume doubles in the future, this overhead scales linearly. Scheduler cleanup uses an in-memory list for recent merges — if the scheduler restarts, the list is gone. These are "good enough for now" choices, not "will still work in 3 years" designs.
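The restart problem has a cheap mitigation: persist the recent-merge list to disk. A sketch under stated assumptions; the file path and JSON-array format are mine, not the scheduler's.

```typescript
// Sketch of persisting the recent-merge list so it survives a restart.
// State is stored as a JSON array of branch names.
import * as fs from "fs";

function loadRecentMerges(path: string): Set<string> {
  try {
    return new Set(JSON.parse(fs.readFileSync(path, "utf8")) as string[]);
  } catch {
    return new Set(); // first run, or missing/corrupt state file
  }
}

function saveRecentMerges(path: string, merges: Set<string>): void {
  fs.writeFileSync(path, JSON.stringify([...merges]));
}
```

Even this sketch trades one problem for another: the state file itself needs pruning, or the "recent" list grows without bound.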
This Week by Numbers
| Metric | Value |
|---|---|
| Commits | 3 |
| Lines added | ~3900 |
| Files changed | 58 |
| Primary areas | auth, scheduler, crawler |
| Auth rewrites | 2 (this week alone) |
Looking Ahead
- Auth callback performance benchmark results
- Scheduler cleanup recovery logic when branch deletion fails
- DCD crawler monitoring alert actual trigger conditions
- Whether new features are coming, or just more patching
The lesson of the week: fast fixes aren't the same as good design. Before rewriting auth, someone should have drawn a token lifecycle diagram first, not jumped straight into code.