Never use it blind, and like I more or less said above: if you’re taking the first response, you’re using it wrong. I go at it expecting everything it says to be 80% right, find the 20% that’s off, tell it what’s wrong, and get to 96% right. If the 4% still off target is a problem, refine again…
Where it excels for me is generating long, detailed (mind-numbing) point-by-point descriptions of things - the kind of documents you can skim to see where they are right and wrong, but would fall asleep or have a case of terminal ADHD before you finished writing them on your own.
I was simply unable to convince Codex to split a patch into separate git commits in a meaningful way. There are things that just don’t work.
Still useful for lots of stuff. Just don’t use it blind.