
The Birth of Research Engine: Post-Mortem of an Architecture Carnival


This week I merged Research Engine into main: 76 files, 3,675 new lines. It looks intimidating. But let me brag for three seconds first, then tear myself apart.

What Happened

Across the past 10 commits, three main things landed:

  1. Research Engine Complete

    • ResearchEngine core + WebSearchService + LocalResearchRuntime
    • DataCollector → DataExtractor → QualityScorer → FrameworkBuilder pipeline
    • Export support: Markdown / JSON / PPTX
  2. Research-DB Infrastructure Done

    • PostgreSQL with the pgvector extension for vector support
    • 6 Repositories: projects, artifacts, data-points, insights, sources, specifications
    • Migration scripts + error handling improvements
  3. Scheduler Fixes

    • Fixed Gate 2 filter false positives
    • Prevented issues from being closed without a merge

In short: we built an AI employee that can search the web, analyze what it finds, and write reports on its own.
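The post doesn't say how the "no close without merge" scheduler fix works. Here is a minimal sketch of the idea, assuming the scheduler can see whether an issue's linked pull request has been merged. `Issue`, `PullRequest`, `can_close`, and `close_issue` are hypothetical names for illustration, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    number: int
    merged: bool


@dataclass
class Issue:
    number: int
    linked_pr: Optional[PullRequest]  # None when no PR is linked (hypothetical model)


def can_close(issue: Issue) -> bool:
    """Allow closing only when a linked PR exists and has been merged."""
    return issue.linked_pr is not None and issue.linked_pr.merged


def close_issue(issue: Issue) -> str:
    # Guard clause: refuse to close rather than silently closing unmerged work.
    if not can_close(issue):
        return f"issue #{issue.number} left open: no merged PR"
    return f"issue #{issue.number} closed"


print(close_issue(Issue(42, PullRequest(7, merged=False))))
# prints "issue #42 left open: no merged PR"
```

Putting the check in one predicate keeps the rule testable on its own, separate from whatever actually closes the issue.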

Architecture Evolution

┌─────────────────────────────────────────────┐
│               ResearchEngine                │
├─────────────────────────────────────────────┤
│  WebSearch → DataCollector → DataExtractor  │
│  → QualityScorer → FrameworkBuilder         │
│  → ReportGenerator (MD/JSON/PPTX)           │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│                 Research-DB                 │
├─────────────────────────────────────────────┤
│  PostgreSQL + pgvector                      │
│  Projects / Artifacts / DataPoints / ...    │
└─────────────────────────────────────────────┘
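To make the flow in the diagram concrete, here is a minimal sketch of a staged pipeline in which each stage consumes only the previous stage's output. Everything in it is hypothetical: `Finding`, `run`, and the stage functions are stand-ins for WebSearch/DataCollector, DataExtractor, QualityScorer, and FrameworkBuilder, using stubbed data and a toy scoring heuristic rather than the real implementation. PPTX export is left out because it would need a third-party library.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class Finding:
    source: str
    text: str
    score: float = 0.0


# Every stage after collection shares one shape: findings in, findings out.
Stage = Callable[[List[Finding]], List[Finding]]


def collect(topic: str) -> List[Finding]:
    # Stand-in for WebSearch + DataCollector: returns placeholder snippets
    # (with messy whitespace) instead of fetching real pages.
    return [
        Finding(f"source-{i}", f"  note {i} about {topic}" + " detail" * i + "  ")
        for i in range(3)
    ]


def extract(items: List[Finding]) -> List[Finding]:
    # Stand-in for DataExtractor: normalize whitespace in each snippet.
    return [Finding(f.source, " ".join(f.text.split())) for f in items]


def score_findings(items: List[Finding]) -> List[Finding]:
    # Stand-in for QualityScorer: toy heuristic, longer text scores higher (max 1.0).
    return [Finding(f.source, f.text, min(len(f.text) / 50, 1.0)) for f in items]


def build_framework(items: List[Finding]) -> List[Finding]:
    # Stand-in for FrameworkBuilder: order findings from best to worst.
    return sorted(items, key=lambda f: f.score, reverse=True)


def to_markdown(topic: str, items: List[Finding]) -> str:
    lines = [f"# {topic}", ""]
    lines += [f"- {f.text} ({f.source}, score {f.score:.2f})" for f in items]
    return "\n".join(lines)


def to_json(topic: str, items: List[Finding]) -> str:
    return json.dumps({"topic": topic, "findings": [asdict(f) for f in items]})


def run(topic: str, stages: List[Stage]) -> List[Finding]:
    items = collect(topic)
    for stage in stages:  # each stage only sees the previous stage's output
        items = stage(items)
    return items


findings = run("pgvector", [extract, score_findings, build_framework])
print(to_markdown("pgvector", findings))
```

Keeping every stage on the same `List[Finding] -> List[Finding]` shape is one way to get the kind of module decoupling the pipeline is credited with: any stage can be swapped out or tested without touching its neighbors.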

Reflection Time

😤 Give credit before the roast

  • Solved a real need: enabling systematic research instead of guesswork
  • The pipeline design is decent, and the modules are solidly decoupled
  • Multi-format report support is actually useful

😈 Now the roast

1. Over-engineering alarm
Does an internal research tool need pgvector? 6 repositories? A separate package?
Probably not. I picked the tech because "oh, that's cool," not because "we actually need this."
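One way to put the "do we need pgvector?" question to the test: in a setup like this, pgvector's main job would be nearest-neighbor search over embeddings (for example, ordering rows by its `<=>` cosine-distance operator). For a small internal corpus, an exact brute-force scan in plain Python may be enough. This is a sketch that assumes the embeddings are already computed and fit in memory; `top_k` and the sample vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: List[float], docs: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    # Exact (brute-force) search: score every document, then keep the k best.
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


docs = {
    "insight-a": [1.0, 0.0, 0.0],
    "insight-b": [0.7, 0.7, 0.0],
    "insight-c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], docs, k=2))  # insight-a first, then insight-b
```

A scan like this costs O(n) per query. That is likely fine for a few thousand rows; pgvector starts to earn its keep with much larger corpora, approximate indexes such as HNSW or IVFFlat, or when vectors need to live next to the relational data.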

2. Naming chaos
ResearchEngine, LocalResearchRuntime, ResearchRuntime, DataCollector, DataExtractor, DataScraper...
I can't even tell them apart myself. Every new name is a pit dug for my future self.

3. Where are the tests?
3,675 new lines. What's the unit test coverage? Are there integration tests?
Honestly, I can't answer that.
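An answer could start as small as one unit test pinned to one component. Below, `score_quality` is a hypothetical stand-in for QualityScorer (its heuristic is invented for illustration), tested with Python's built-in `unittest`. The tests check properties that should survive any change to the heuristic, not exact numbers.

```python
import unittest


def score_quality(text: str, source_count: int) -> float:
    """Hypothetical QualityScorer stand-in: rewards substance and corroboration.

    Returns a value in [0.0, 1.0]; blank text always scores 0.
    """
    if not text.strip():
        return 0.0
    length_part = min(len(text.split()) / 50, 1.0)  # 50+ words counts fully
    source_part = min(source_count / 3, 1.0)  # 3+ sources counts fully
    return round(0.5 * length_part + 0.5 * source_part, 3)


class ScoreQualityTest(unittest.TestCase):
    def test_blank_text_scores_zero(self):
        self.assertEqual(score_quality("   ", source_count=5), 0.0)

    def test_score_stays_in_range(self):
        long_text = "word " * 500
        self.assertLessEqual(score_quality(long_text, source_count=10), 1.0)

    def test_more_sources_never_lower_the_score(self):
        text = "pgvector adds vector similarity search to PostgreSQL"
        self.assertGreaterEqual(score_quality(text, 3), score_quality(text, 1))
```

Saved as, say, `test_quality.py`, this runs with `python -m unittest test_quality`.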

4. Merge timing
The commit "feat: merge research-engine feature branch" went in as a direct merge; the review was just a formality.
Was it really urgent, or did I just not want to look twice?

💡 Lessons

  • Incremental > Big Bang: I piled up 10 commits and then merged them all at once. How many bugs slipped in between?
  • Ship first, optimize later: Don't build the perfect pipeline on day one
  • Code is debt: Every repository added today is maintenance tomorrow

What's Next

  • Actually use Research Engine instead of letting it collect dust
  • Write tests; stop running naked
  • Curb the feature addiction; always ask "do we really need this?"

Tech evolution is build, break, rebuild. The point isn't never making mistakes; it's not lying flat in the same pit twice.


That's the spirit: build fast, break things, reflect harder.
