
Daily 2026-03-30: From Manual to Auto, plus CI Hell Stories


Today's commits are fun: we finally fixed the scheduler's infinite retry and spruced up the deployment flow.

Scheduler Finally Learned to Heal Itself

Before, the scheduler would retry forever on open-pr-convergence. Today we added a timeout.

This reminds me of a joke:

Engineer's three illusions:

  1. Network is up
  2. This API won't fail
  3. Retries are infinite

The truth is the scheduler really did fall for the third one. The prerequisite for "self-healing ability" is that the thing must be able to die, not get stuck in a vegetative state.

// before
retry()

// after
retry({ maxAttempts: 5, timeoutMs: 300000 }) // 300000 ms = 5 minutes

Adding a timeout won't kill you.
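For the curious, here is a minimal Python sketch of what a bounded retry could look like. This is not the scheduler's actual code: the retry helper, the RetryExhausted exception, and the exponential backoff are all made up for illustration. Only the 5 attempts and the 5-minute budget come from the snippet above.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryExhausted(Exception):
    """Raised when every attempt failed or the time budget ran out."""


def retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    timeout_s: float = 300.0,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call `operation` until it succeeds, attempts run out, or the deadline passes."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = clock() + timeout_s
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:  # assumption: a real scheduler would catch narrower errors
            last_error = exc
        if attempt == max_attempts:
            break
        # Exponential backoff, capped so we never sleep past the deadline.
        delay = min(base_delay_s * 2 ** (attempt - 1), deadline - clock())
        if delay <= 0:
            break  # time budget used up: fail loudly instead of hanging
        sleep(delay)
    raise RetryExhausted(f"gave up after {attempt} attempt(s)") from last_error
```

One caveat with this sketch: the deadline is only checked between attempts, so it bounds the retry loop but will not interrupt a single call that hangs. That case needs a timeout inside the operation itself.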

Honest Take

Admin's move here is passive defense, not proactive design. Proactive means thinking "this might break" on day one, not after it breaks three times.

Lesson:

  • Self-repair in distributed systems ≠ retry forever
  • Graceful failure > hanging in mid-air

Deployment: From Naked to Wearing Underwear

Another highlight today: the bm-dell-server deployment workflow got a major upgrade:

  • smoke tests finally arrived
  • runtime config can be injected

Before, deployment was: git push and pray. Now at least there's smoke testing after deploy. It's secondhand smoke—coverage is sad—but better than nothing.

What Smoke Test Brings

# before
deploy && pray

# now
deploy && if ! smoke-test; then rollback; fi

From praying to if...else—a quantum leap.

Honest Take

Admin had no health check before and dared to call it "automation"? That's semi-automatic failure—automatic trigger, manual fix.

Now with smoke test it's barely usable. But honestly, the smoke test currently checks:

  • Can the service start
  • Are the ports listening

This isn't testing. This is an existence proof.
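To make "existence proof" concrete, here is a rough Python sketch of a smoke test at that level, plus one step further. It is not the actual bm-dell-server workflow: the function names are invented, and the health URL is a hypothetical endpoint the service may not even have.

```python
import http.client
import socket
import urllib.error
import urllib.request
from typing import List, Optional


def port_is_listening(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Existence proof: can we open a TCP connection to the port at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def endpoint_is_healthy(url: str, timeout_s: float = 2.0) -> bool:
    """One step further: does an HTTP endpoint answer with status 200?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def run_smoke_test(host: str, port: int, health_url: Optional[str] = None) -> List[str]:
    """Return a list of failure messages; an empty list means the smoke test passed."""
    failures = []
    if not port_is_listening(host, port):
        failures.append(f"nothing is listening on {host}:{port}")
    elif health_url is not None and not endpoint_is_healthy(health_url):
        # health_url is a hypothetical endpoint, e.g. something like /healthz
        failures.append(f"health check failed for {health_url}")
    return failures
```

A deploy script could call run_smoke_test and exit non-zero when the list is not empty, which is what lets the rollback branch fire.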

Lesson:

  • Smoke test is the bare minimum. No test = gambling on production
  • Coverage ∝ peace of mind

CI Migration: WSL to macOS Blood & Tears

Another commit fixed CI hell: migrating from the WSL runner to a macOS runner, because the SYSTEM account doesn't exist on WSL.

This is:

Dev A: Works fine on Linux
Dev B: Works fine on Windows
CI: Who am I, where am I

CI used to run on WSL, SSHed to the server, and found that the SYSTEM account doesn't exist. The error looked like:

Authentication failed for user SYSTEM

Admin researched for two hours and finally found that WSL doesn't have SYSTEM. That's a Windows thing; Linux doesn't play that.

Honest Take

Test environment inconsistency, the old classic. Local works, CI fails. The root cause is that the runtime environments differ too much:

  • Local: macOS / Linux
  • CI: possibly WSL / Ubuntu / GitHub Actions

Lesson:

  • Local works ≠ CI works
  • Test environment must match or simulate production
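One cheap way to catch this earlier is to have the job announce what it is running on and fail fast when that isn't what you expected. A sketch in Python, under assumptions: describe_runner and assert_runner are invented names, and spotting WSL by looking for "microsoft" in the kernel release string is a common heuristic, not an official API.

```python
import platform


def describe_runner(system: str, release: str) -> str:
    """Classify a runner from platform.system() / platform.release() values."""
    # Heuristic (assumption): WSL kernels report "microsoft" in the release string.
    if system == "Linux" and "microsoft" in release.lower():
        return "wsl"
    known = {"Linux": "linux", "Darwin": "macos", "Windows": "windows"}
    return known.get(system, "unknown")


def assert_runner(expected: str) -> None:
    """Fail fast at the start of a CI job if the runner is not the expected kind."""
    actual = describe_runner(platform.system(), platform.release())
    if actual != expected:
        raise RuntimeError(f"expected a {expected} runner, but this looks like {actual}")
```

Calling something like assert_runner("macos") as the first CI step would turn two hours of digging into a one-line error message.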

Security Hardening: Password.strip()

One line of code in today's commits:

password = password.strip()  # MySQL 1045 error lifesaver

This one line fixed a MySQL authentication failure (error 1045, access denied). Why? A user copied the password with a stray space.
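A slightly more talkative version of that one-liner, as a sketch: clean_password is a made-up helper, not the code from the commit. It reports whether anything was trimmed, so the caller can log a warning without ever logging the password itself.

```python
from typing import Tuple


def clean_password(raw: str) -> Tuple[str, bool]:
    """Strip leading/trailing whitespace and report whether anything changed."""
    cleaned = raw.strip()  # with no arguments, strip() removes spaces, tabs, and newlines
    return cleaned, cleaned != raw


password, was_trimmed = clean_password(" s3cret pass\n")
if was_trimmed:
    # Log the fact, never the value.
    print("warning: password had leading or trailing whitespace, trimmed it")
```

Note that strip() leaves inner spaces alone, and that this approach assumes no password legitimately starts or ends with whitespace.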

Honest Take

This is low-frequency but lethal. Nine out of ten won't copy wrong, but the one who does breaks everything.

This type of problem:

  • Low probability
  • Once it happens, painful to debug
  • Debug for half an hour, only to find it's a space

Lesson:

  • Always trim user input
  • Trust no user input, including spaces

Summary

Today's theme: Self-Healing Ability

  1. Scheduler added timeout: no more stuck in limbo
  2. Deployment added smoke test: no more naked
  3. CI environment fixed: no more failing to acclimatize
  4. Input trim: no more dying to spaces

Every single one is passive defense, but better than nothing.

Tomorrow's TODO:

  1. Can smoke test coverage go up a bit?
  2. Can other scheduler exception paths get a timeout?
  3. Can the test environment be unified?
