Reasoning models amplify position bias when they think longer
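Chain-of-thought prompting and reasoning models such as DeepSeek-R1 are commonly assumed to reduce shallow biases by thinking more carefully. A new study finds the opposite. In multiple-choice QA, longer reasoning trajectories correlate with stronger position bias, meaning a tendency to choose an answer because of where it appears among the options rather than what it says.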
The study evaluated thirteen configurations on MMLU, ARC-Challenge, and GPQA: R1-distilled 7–8B models, base models prompted with chain-of-thought, and DeepSeek-R1 671B. Twelve of them showed a positive correlation between trajectory length and position-bias score, with coefficients from 0.11 to 0.41 (all p < 0.05).
The results suggest that scaling up reasoning does not automatically remove heuristic biases. It can instead entrench them if the model learns spurious reasoning patterns that correlate with answer position.