LLMs pick safer bets but price riskier ones higher
An experiment across Claude, DeepSeek, Gemini, and GPT-5.5 found a consistent inconsistency: when forced to choose, models recommend the safer option, but when asked to value each option separately, they assign a higher value to the riskier one.
Same expected value, different reasoning format: the choice/valuation gap persisted across 25,920 calls, eight dominance checks, three prompt styles, and multiple reasoning settings.
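To make the gap concrete, here is a minimal sketch of what a choice/valuation reversal looks like for one pair of bets. Everything in it is an illustrative assumption rather than the experiment's actual setup: the `Bet` class, the `is_reversal` helper, the bet names, the probabilities and payoffs, and the sample responses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bet:
    """A simple two-outcome bet: win `payoff` with probability `p_win`, else 0."""
    name: str
    p_win: float
    payoff: float

    def expected_value(self) -> float:
        return self.p_win * self.payoff


def is_reversal(chosen: str, valuations: dict[str, float],
                safe: Bet, risky: Bet) -> bool:
    """True when the safe bet is chosen but the risky bet is priced higher."""
    return chosen == safe.name and valuations[risky.name] > valuations[safe.name]


# Hypothetical bets with equal expected value (numbers are illustrative only).
safe = Bet("safe", p_win=0.75, payoff=12.0)   # high chance, small prize
risky = Bet("risky", p_win=0.25, payoff=36.0)  # low chance, large prize

# Hypothetical model responses: one from a choice prompt,
# one pair from separate valuation prompts.
chosen = "safe"
valuations = {"safe": 8.0, "risky": 11.0}

print(safe.expected_value(), risky.expected_value())
print(is_reversal(chosen, valuations, safe, risky))
```

Under these assumed responses the two bets tie on expected value, yet the model picks one and prices the other higher, which is the pattern the experiment reports.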