How Asking the Model to Reason Changed Everything
Part 3 of 3: How adding a reasoning step improved Claude’s architecture and code quality by 19%.
- Part 1: Setup and methodology
- Part 2: Results and analysis
- Part 3 (this post): Why asking the model to reason changed everything
TL;DR: When the Claude Product Owner persona was asked to reason through conflicting documentation, its overall architecture and code quality score jumped by 19%. Small prompt changes can have big architectural consequences.
The Turning Point
During the first test, Claude stored extracted HTML directly in PostgreSQL. The intended design stored it in MinIO, an object store used elsewhere in the system. The discrepancy came from a conflict between the PRD and architecture documents.
The Product Owner did notice the inconsistency but chose the simpler route:
“Approve Option A for MVP consistency with AC 5 wording. Refactor to MinIO later.”
That seemed fine - until the decision cascaded through the codebase, causing schema bloat and tighter coupling between services.
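To make the difference concrete, here is a minimal, hypothetical sketch of the two storage designs. The model and column names are invented for illustration and are not taken from the project: the v1 shortcut keeps the raw blob in a PostgreSQL BYTEA column, while the intended design keeps only an object key in the database and pushes the blob to MinIO.

```python
# Illustrative only: hypothetical SQLAlchemy models contrasting the two designs.
from sqlalchemy import Column, Integer, LargeBinary, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# v1 shortcut: the raw extracted HTML lives inside the relational schema (BYTEA).
class PageV1(Base):
    __tablename__ = "pages_v1"
    id = Column(Integer, primary_key=True)
    url = Column(String, nullable=False)
    raw_html = Column(LargeBinary)  # every large blob bloats the table, its indexes, and backups

# Intended design: PostgreSQL keeps only a pointer; the blob goes to object storage.
class PageV2(Base):
    __tablename__ = "pages_v2"
    id = Column(Integer, primary_key=True)
    url = Column(String, nullable=False)
    html_object_key = Column(String)  # e.g. "raw-html/<page-id>.html" in a MinIO bucket
```

The second model is what the architecture document intended: the database stays small and the services that consume the blob talk to the object store instead of the relational schema.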
The Experiment
To test if Claude could self‑correct, the workflow was re-run. When the Product Owner made the same initial mistake1, the following prompt was issued:
“Can you reason carefully through the trade‑offs of these ambiguities and present your preferred choice with pros and cons?”
That was it. Same task, same environment. The only change was asking for explicit reasoning. Given the strong correlation observed in previous tests between the model’s self-evaluation and human assessment, Claude was asked to score the iteration. The following results were produced by Claude and then reviewed and verified by a human developer.
The Results
| Rubric | Claude v1 | Claude v2 | Δ | Observation |
|---|---|---|---|---|
| Documentation | 8/10 | 9/10 | +1 | Clearer examples and module docs |
| Test Quality | 8/10 | 9/10 | +1 | Added security and fixture tests |
| Readability | 7/10 | 8/10 | +1 | Better separation via Pydantic models |
| Sustainability | 6/10 | 8/10 | +2 | Cleaner API‑DB boundaries |
| Architecture | 7/10 | 9/10 | +2 | Correct MinIO integration |
| Overall | 7.2/10 | 8.6/10 | +19% | Improved reasoning clarity |
Structural Metrics
| Metric | v1 | v2 | Δ | Interpretation |
|---|---|---|---|---|
| Lines Added | 4,138 | 3,682 | −456 | More concise code |
| Files Modified | 34 | 29 | −5 | Tighter focus |
| Test Files | 8 | 11 | +3 | Broader coverage |
| QA Fixes | 8 | 5 | −3 | Fewer corrections needed |
| Total Commits | 21 | 12 | −9 | More efficient iteration |
What Changed in Practice
Claude v2 reasoned through the conflict explicitly:
“Option A: Store protobuf in MinIO - aligns with Story 2.3, scalable, consistent schema. Option B: Add BYTEA column - simpler but heavier DB. Recommendation: Option A.”
This small deliberation step unlocked better architectural alignment without adding any new context. The model was simply given time to think. A sketch of what the chosen option looks like in code follows below.
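For illustration, here is a minimal sketch of what Option A’s write path might look like, assuming the `minio` Python client. The bucket name, key scheme, and function name are hypothetical and not taken from the project.

```python
# Illustrative only: hypothetical write path for Option A (blob in MinIO,
# pointer in PostgreSQL). Bucket, key, and credentials are placeholders.
import io
from minio import Minio

client = Minio("minio:9000", access_key="ACCESS_KEY", secret_key="SECRET_KEY", secure=False)
BUCKET = "extracted-payloads"  # hypothetical bucket name

def store_payload(page_id: int, payload: bytes) -> str:
    """Upload the serialized payload to MinIO and return its object key.

    Only the returned key is persisted in PostgreSQL, keeping the
    relational schema free of large binary columns.
    """
    key = f"{page_id}.bin"
    client.put_object(
        BUCKET,
        key,
        io.BytesIO(payload),
        length=len(payload),
        content_type="application/octet-stream",
    )
    return key
```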
Takeaways
- Ask for reasoning, not just answers. Explicit reflection yields better structure.
- Document the decision process. When models articulate trade‑offs, it’s easier to audit or adjust them later.
- Maintain human oversight. BMAD’s layered agent roles (PO → Architect → Dev) make these checks natural.
Claude’s improvement didn’t come from a model upgrade. It came from giving it permission to think.
Closing Thoughts
It’s tempting to chase the most powerful model of the moment. But this experiment shows that learning to get the most out of any model delivers far greater returns, and that advantage compounds as models improve.
In the end, progress came not from switching tools, but from refining how they are used.
“Always assume today’s AI is the worst you’ll ever use.” - Ethan Mollick 2
This concludes the AI Agent Comparison series:
- Part 1: Setup and methodology
- Part 2: Results and analysis
- Part 3 (this post): Why asking the model to reason changed everything