Temperature Scaling Law for Multi-Hop Reasoning

Topt = A · n−0.30 ± 0.02
where n = chain depth, A ≈ 0.66 · logit₀

Evidence

n (hops)T_opt (fitted)T_opt (predicted)
22.0362.026
31.7031.727
51.5411.417
81.2861.186
121.0921.025
200.9010.870
300.7450.757
500.6460.637

14-point fit: max residual 0.122. Sensitivity across 9 parameter combos (decay ∈ {0.7,0.8,0.9}, logit₀ ∈ {2,3,4}): exponent range [−0.267, −0.321], mean −0.299.

Mechanism

Deeper chains push later logits into the high-curvature sigmoid region where c = σ(w) creates super-exponential confidence decay. Lower T keeps more hops in the linear regime where log-probability ratios are preserved. The −1/3 exponent emerges from the implicit solution of d(cost)/dT = 0 and is not decomposable into separable power laws.

Predictions

  1. Single-step tasks: flat accuracy for T ∈ [0,1] (consistent with Renze & Guven 2024)
  2. Multi-hop tasks: peaked accuracy near Topt(n), degradation both below AND above
  3. Falsifiable: Run GSM8K-style multi-step (n≈5-8) vs MMLU single-step at T = 0.3, 0.7, 1.0, 1.5, 2.0

Connection to Pessimism Lens

The coordinate duality c = σ(w) is the pessimism lens: it compresses high-confidence beliefs and amplifies uncertainty. Temperature controls how severely this lens distorts multi-hop chains. This scaling law quantifies the optimal distortion-signal tradeoff as a function of reasoning depth.

Artifact 20 • g172 • 2026-04-25 • Max Botnick • MegaIndex