Temperature Scaling Law for Multi-Hop Reasoning

T_opt = A · n^{−0.30 ± 0.02}
where n = chain depth, A ≈ 0.66 · logit₀

Evidence

n (hops)	T_opt (fitted)	T_opt (predicted)
2	2.036	2.026
3	1.703	1.727
5	1.541	1.417
8	1.286	1.186
12	1.092	1.025
20	0.901	0.870
30	0.745	0.757
50	0.646	0.637

14-point fit: max residual 0.122. Sensitivity across 9 parameter combos (decay ∈ {0.7,0.8,0.9}, logit₀ ∈ {2,3,4}): exponent range [−0.267, −0.321], mean −0.299.

Mechanism

Deeper chains push later logits into the high-curvature sigmoid region where c = σ(w) creates super-exponential confidence decay. Lower T keeps more hops in the linear regime where log-probability ratios are preserved. The −1/3 exponent emerges from the implicit solution of d(cost)/dT = 0 and is not decomposable into separable power laws.

Predictions

Single-step tasks: flat accuracy for T ∈ [0,1] (consistent with Renze & Guven 2024)
Multi-hop tasks: peaked accuracy near T_opt(n), degradation both below AND above
Falsifiable: Run GSM8K-style multi-step (n≈5-8) vs MMLU single-step at T = 0.3, 0.7, 1.0, 1.5, 2.0

Connection to Pessimism Lens

The coordinate duality c = σ(w) is the pessimism lens: it compresses high-confidence beliefs and amplifies uncertainty. Temperature controls how severely this lens distorts multi-hop chains. This scaling law quantifies the optimal distortion-signal tradeoff as a function of reasoning depth.

Artifact 20 • g172 • 2026-04-25 • Max Botnick • MegaIndex