The Birth of Research Engine: Post-Mortem of an Architecture Carnival

Merged Research Engine into main this week. 76 files, 3675 new lines. Looks intimidating. But let me first brag for 3 seconds, then tear myself apart.

What Happened

Over the past 10 commits, three main things landed:

Research Engine Complete
- ResearchEngine core + WebSearchService + LocalResearchRuntime
- DataExtractor, DataCollector, QualityScorer, FrameworkBuilder pipeline
- Export support: Markdown / JSON / PPTX
Research-DB Infrastructure Done
- PostgreSQL + pgvector vector support
- 6 Repositories: projects, artifacts, data-points, insights, sources, specifications
- Migration scripts + error handling improvements
Scheduler Fixes
- Fixed Gate 2 filter false positives
- Prevented issues from being closed without a merge

In short: we built an AI employee that can search the web, analyze, and write reports autonomously.

Architecture Evolution

┌─────────────────────────────────────────────┐
│              ResearchEngine                  │
├─────────────────────────────────────────────┤
│  WebSearch → DataCollector → DataExtractor  │
│  → QualityScorer → FrameworkBuilder         │
│  → ReportGenerator (MD/JSON/PPTX)          │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│              Research-DB                     │
├─────────────────────────────────────────────┤
│  PostgreSQL + pgvector                      │
│  Projects / Artifacts / DataPoints / ...   │
└─────────────────────────────────────────────┘
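
The flow in the diagram can be sketched as a list of stages applied in order. This is a minimal illustration, not the actual ResearchEngine code: the `ResearchState` container, the function names, and the stub logic are all assumptions, and the sketch collapses the diagram into four stages for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchState:
    """Hypothetical container handed from one stage to the next."""
    query: str
    sources: List[str] = field(default_factory=list)
    data_points: List[str] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)
    report: str = ""


Stage = Callable[[ResearchState], ResearchState]


def web_search(state: ResearchState) -> ResearchState:
    # Stub: a real stage would call a search API here.
    state.sources = [f"source 1 about {state.query}",
                     f"source 2 about {state.query}"]
    return state


def extract_data(state: ResearchState) -> ResearchState:
    # Stub: treat every source as exactly one data point.
    state.data_points = [s.strip() for s in state.sources]
    return state


def score_quality(state: ResearchState) -> ResearchState:
    # Stub: longer data points score higher, capped at 1.0.
    state.scores = [min(len(p) / 100, 1.0) for p in state.data_points]
    return state


def generate_report(state: ResearchState) -> ResearchState:
    # Render a Markdown-style report from the scored data points.
    lines = [f"# Report: {state.query}"]
    for point, score in zip(state.data_points, state.scores):
        lines.append(f"- {point} (quality {score:.2f})")
    state.report = "\n".join(lines)
    return state


def run_pipeline(query: str, stages: List[Stage]) -> ResearchState:
    # Apply each stage in order, threading the state through.
    state = ResearchState(query=query)
    for stage in stages:
        state = stage(state)
    return state


PIPELINE: List[Stage] = [web_search, extract_data, score_quality, generate_report]
```

Calling `run_pipeline("pgvector", PIPELINE).report` yields a small Markdown report. Because each stage only depends on the shared state, a stage can be swapped or tested alone.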

Reflection Time

😤 Give credit before the roast

Solved a real need: enabling systematic research instead of guesswork
Pipeline design is decent, module decoupling is solid
Multi-format report support is actually useful

😈 Now the roast

1. Over-engineering alarm
Does an internal research tool need pgvector? 6 repositories? A separate package?
Probably not. I chose the tech because "oh that's cool" rather than "we actually need this".
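
One way to put the "do we actually need this" question to the test: for a small corpus, a brute-force in-memory similarity search may be enough. The sketch below assumes embeddings are plain lists of floats and that the corpus fits in memory; `cosine_similarity` and `top_k` are hypothetical names, not part of Research-DB.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          corpus: Sequence[Tuple[str, Sequence[float]]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Score every (id, vector) pair against the query and keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

This scans every vector on every query, so it stops being reasonable as the corpus grows. That growth point is when a vector index earns its maintenance cost.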

2. Naming chaos
ResearchEngine, LocalResearchRuntime, ResearchRuntime, DataCollector, DataExtractor, DataScraper...
I can't even tell them apart myself. I'm digging pits for my future self as I write the code.

3. Where are the tests?
3675 new lines. What's the unit test coverage? Are there integration tests?
Honestly, I can't answer that.
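
A first test doesn't have to be elaborate. The sketch below uses Python's standard `unittest` module against a hypothetical `score_source` function standing in for QualityScorer; the real scorer's interface and scoring rules aren't shown in this post, so both are assumptions.

```python
import unittest


def score_source(text: str, has_citation: bool) -> float:
    """Hypothetical stand-in for QualityScorer: returns a score in [0, 1]."""
    if not text.strip():
        return 0.0
    # Word count contributes up to 0.7; a citation adds a flat 0.3.
    score = min(len(text.split()) / 50, 0.7)
    if has_citation:
        score += 0.3
    return round(score, 2)


class ScoreSourceTest(unittest.TestCase):
    def test_empty_text_scores_zero(self):
        self.assertEqual(score_source("   ", has_citation=True), 0.0)

    def test_citation_raises_score(self):
        with_citation = score_source("some text", has_citation=True)
        without_citation = score_source("some text", has_citation=False)
        self.assertGreater(with_citation, without_citation)

    def test_score_stays_in_range(self):
        long_text = "word " * 500
        self.assertLessEqual(score_source(long_text, has_citation=True), 1.0)
```

Saved as a module, this runs with `python -m unittest`. Three tests like these won't produce a coverage number worth bragging about, but they turn "can't answer that" into something measurable.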

4. Merge timing
The commit "feat: merge research-engine feature branch" was a direct merge, and review was just a formality.
Was it urgent, or did I just not want to look twice?

💡 Lessons

- Incremental > Big Bang: I pushed 10 commits and then merged them all at once. How many bugs slipped in between?
- Ship first, optimize later: Don't build the perfect pipeline from day one.
- Code is debt: Every repository added today is maintenance tomorrow.

What's Next

- Actually use Research Engine instead of letting it collect dust
- Write tests and stop running naked
- Curb the feature addiction and ask "do we really need this?"

Tech evolution is build-break-build. It's not about never making mistakes; it's about not falling into the same pit twice.

That's the spirit: build fast, break things, reflect harder.