| n (hops) | T_opt (fitted) | T_opt (predicted) |
|---|---|---|
| 2 | 2.036 | 2.026 |
| 3 | 1.703 | 1.727 |
| 5 | 1.541 | 1.417 |
| 8 | 1.286 | 1.186 |
| 12 | 1.092 | 1.025 |
| 20 | 0.901 | 0.870 |
| 30 | 0.745 | 0.757 |
| 50 | 0.646 | 0.637 |
14-point fit: maximum residual 0.122. Sensitivity across 9 parameter combinations (decay ∈ {0.7, 0.8, 0.9}, logit₀ ∈ {2, 3, 4}): exponent range [−0.321, −0.267], mean −0.299.
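The fitting step can be sketched as an ordinary least-squares regression in log-log space. This assumes the reported exponent is the slope b of a pure power law T_opt ≈ a·n^b. The sketch uses only the eight rows in the table (the `T_opt (fitted)` column) and has no offset term, so it illustrates the procedure rather than reproducing the 14-point fit. Its slope need not fall inside the reported sensitivity range. The names `ROWS` and `fit_power_law` are illustrative.

```python
import math

# (n hops, T_opt fitted) pairs copied from the table above.
ROWS = [(2, 2.036), (3, 1.703), (5, 1.541), (8, 1.286),
        (12, 1.092), (20, 0.901), (30, 0.745), (50, 0.646)]


def fit_power_law(points):
    """Least squares on log T = log a + b * log n; returns (a, b) for T ≈ a * n**b."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Slope = covariance(x, y) / variance(x) in log-log coordinates.
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept in log space, mapped back to a multiplicative prefactor.
    a = math.exp(my - b * mx)
    return a, b


a, b = fit_power_law(ROWS)
print(f"T_opt ≈ {a:.3f} * n^({b:.3f})")
```

A fit that includes an offset or weights points differently would report a different exponent from the same data. The log-log form is simply the most direct reading of "exponent".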
Deeper chains push later logits into the high-curvature region of the sigmoid, where c = σ(w) produces super-exponential confidence decay. A lower T keeps more hops in the approximately linear regime, where log-probability ratios are preserved. The exponent of roughly −1/3 (mean −0.299 above) emerges from the implicit solution of d(cost)/dT = 0 and cannot be decomposed into separable power laws.
The coordinate duality c = σ(w) is the pessimism lens: it compresses high-confidence beliefs and amplifies uncertainty. Temperature controls how severely this lens distorts multi-hop chains. This scaling law quantifies the optimal distortion-signal tradeoff as a function of reasoning depth.
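The compression and linear-regime claims can be checked numerically on the logistic function alone. This minimal sketch assumes nothing beyond c = σ(w). It does not model the multi-hop chain, the decay or logit₀ parameters, or where temperature enters, since those definitions are not given here. For a few logits it prints five quantities:

* the confidence c
* the slope dc/dw = σ(w)(1 − σ(w))
* the log-confidence sensitivity d log c/dw = σ(−w)
* the curvature d²c/dw²
* the gap between σ(w) and its tangent line 1/2 + w/4 at w = 0

All function names are hypothetical.

```python
import math


def sigmoid(w: float) -> float:
    """Confidence c = σ(w) for a logit w."""
    return 1.0 / (1.0 + math.exp(-w))


def slope(w: float) -> float:
    """dc/dw = σ(w)(1 − σ(w)); largest (0.25) at w = 0."""
    c = sigmoid(w)
    return c * (1.0 - c)


def log_sensitivity(w: float) -> float:
    """d(log c)/dw = 1 − σ(w) = σ(−w); shrinks as confidence saturates."""
    return sigmoid(-w)


def curvature(w: float) -> float:
    """d²c/dw² = σ(w)(1 − σ(w))(1 − 2σ(w))."""
    c = sigmoid(w)
    return c * (1.0 - c) * (1.0 - 2.0 * c)


def linearization_error(w: float) -> float:
    """Gap between σ(w) and its tangent line at w = 0, i.e. 1/2 + w/4."""
    return abs(sigmoid(w) - (0.5 + w / 4.0))


for w in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(
        f"w={w:.1f}  c={sigmoid(w):.3f}  dc/dw={slope(w):.3f}  "
        f"dlogc/dw={log_sensitivity(w):.3f}  d2c/dw2={curvature(w):+.3f}  "
        f"lin_err={linearization_error(w):.3f}"
    )
```

Both sensitivities fall as w grows, so equal logit steps move high-confidence beliefs less. Meanwhile the tangent-line gap widens away from w = 0. The curvature magnitude |d²c/dw²| peaks at |w| = ln(2 + √3) ≈ 1.32.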