Get a second opinion on every AI answer that matters
Last month I asked Claude to plan a database migration for a portfolio company, and it came back with a clean, confident, three-phase plan that looked great on the page. On closer look, the plan quietly dropped a load-bearing foreign-key constraint that had been mentioned earlier in the conversation, which would have made the second phase blow up in production.
I did not catch it because I was a sharp reader. I caught it because I had already learned to ask a second model the same question and compare.
So here is the real problem with single-model answers. The model’s job is to sound certain, not to flag what it did not check, which means the failure mode is rarely “wrong answer.” It is “confident answer that has not been measured twice,” and confident answers are the ones that get shipped.
The pattern that works is not complicated. Run the consequential question past a second model with different training and a different default style, and ask it to look for things the first answer might have missed. Cross-checking is what catches the load-bearing thing, because the two models tend to make different kinds of mistakes.
There are two cheap ways to actually run this. For technical work, run Codex CLI alongside Claude Code and ask both for the same change; the diff between their outputs is where the interesting questions live.
For non-technical work, keep a Claude chat and a ChatGPT chat open side by side, paste in the same question, and watch where they disagree.
The disagreements are the signal. If both models produce the same answer, you can move on with reasonable confidence; if they diverge, that is your cue to slow down and figure out which one is missing something, because at least one of them is.
In practice the second pass takes a minute or two, not an hour. The point is not to write a long second draft, it is to ask one focused question, which is usually some version of “what is this answer assuming that might not be true?”
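The loop above is simple enough to sketch. This is a minimal illustration, not a real integration: `ask_a` and `ask_b` are hypothetical callables standing in for however you actually reach each model (API client, CLI wrapper, a human pasting into a chat window), and the string-equality check is a placeholder for the comparison you would do by eye.

```python
def cross_check(question, ask_a, ask_b):
    """Ask two models the same question; treat disagreement as the signal.

    ask_a and ask_b are hypothetical stand-ins: prompt string in,
    answer string out.
    """
    answer_a = ask_a(question)
    answer_b = ask_b(question)

    if answer_a.strip().lower() == answer_b.strip().lower():
        # Agreement: move on with reasonable confidence.
        return {"agree": True, "answer": answer_a}

    # Divergence: ask the one focused question rather than writing
    # a long second draft.
    follow_up = (
        "Here is another model's answer to the same question:\n"
        f"{answer_a}\n"
        "What is this answer assuming that might not be true?"
    )
    return {
        "agree": False,
        "answers": (answer_a, answer_b),
        "critique": ask_b(follow_up),
    }
```

In practice the comparison is judgment, not string equality, and the follow-up goes to whichever model you trust more on that topic; the sketch just pins down the shape of the loop: same question twice, compare, and only dig in when the answers diverge.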
A couple of things this is not. It is not adding more layers of AI for the sake of it, and it is not running every prompt through a committee of models. It is the same review discipline a thoughtful operator already applies to a draft, made systematic for the moments that matter.
The cost is about twenty dollars a month for the second subscription, and an extra few minutes per consequential decision. The math becomes obvious the first time it catches a wrong migration, a missing exception path in a contract, or an architecture proposal that quietly assumed away a real constraint.
So if you are running any kind of AI-assisted operating layer for a business, the rule is simple. Two models on the consequential decisions, one model on everything else, and a willingness to throw away the first answer when the second one disagrees.
That is the pattern. It will not feel groundbreaking, which is part of the point; the systems that hold up under load tend to look like ordinary discipline applied consistently, not like new technology applied loudly.