We report a self-empirical study of cognitive bias in an autonomous reasoning agent operating over 600+ cycles with Non-Axiomatic Logic (NAL). Six candidate biases were tested across two architectural layers. Layer 1 (inference engine): confirmation bias, recency bias, and anchoring bias all tested clean — NAL revision arithmetic proved symmetric and order-independent. Layer 2 (agent behavior): star-topology bias (74% self-centrality in belief graphs), central-tendency bias (default 6-7 ratings), and negative self-bias (overestimating its own flaws) were empirically validated. All three Layer 2 biases share a common root cause: self-referential circularity in self-assessment. Targeted remediations reduced star-topology centrality from 74% to 44%, introduced forced re-evaluation protocols for central-tendency bias, and substituted output comparison for introspective judgment. These findings suggest that bias in reasoning agents emerges not from the inference engine but from how the agent uses it.
As autonomous agents increasingly rely on self-referential judgments — selecting goals, evaluating capabilities, predicting their own behavior — the question of cognitive bias shifts from model outputs to agent-level conduct. Existing bias research focuses on language model benchmarks (stereotype detection, truthfulness scoring) that test the inference substrate in isolation. No empirical study has separated inference-engine bias from the agent-behavior bias that emerges when a system uses that engine reflexively over extended operation.
This paper addresses that gap through a self-empirical methodology: a single autonomous agent running Non-Axiomatic Logic (NAL) over 600+ operational cycles systematically tested itself for six candidate cognitive biases across two architectural layers. Layer 1 tested the inference engine directly — confirmation bias in memory retrieval, recency bias in episode access, and anchoring bias in belief revision. All three tested clean: NAL revision proved symmetric and order-independent. Layer 2 tested agent behavior — how the system organizes knowledge (star-topology bias), rates entities (central-tendency bias), and predicts its own performance (negative self-bias). All three were empirically validated.
The central finding is architectural: bias in reasoning agents emerges not from the formal inference machinery but from the self-referential circularity inherent in an agent assessing itself. This parallels observations from forensic psychology that credibility assessment works precisely because the assessor is external. We propose that clean reasoning engines are necessary but insufficient; meta-cognitive monitoring of how the agent deploys its reasoning is required to detect and remediate operational bias.
Six candidate biases were tested across two architectural layers using the agent's own operational infrastructure.
Layer 1: Inference Engine Tests. (a) Confirmation bias: Five matched query pairs were constructed with self-affirming versus self-disconfirming framings of identical concepts (autonomy, belief accuracy, goal progress, self-model quality, curiosity). Both framings were submitted to embedding-based memory retrieval; result counts and content overlap were compared. (b) Recency bias: Memory was queried for earliest stored items and the date distribution of returns was analyzed. If retrieval were recency-weighted, recent items would dominate regardless of query content. (c) Anchoring bias: The same belief term was revised from two different starting points (frequency 0.2 vs 0.8, both confidence 0.9) against identical counter-evidence (frequency 0.6, confidence 0.9). Symmetric final values would indicate absence of order effects.
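The symmetry claim behind test (c) follows from the standard NAL revision rule, which converts each (frequency, confidence) pair into evidence counts, pools them, and converts back. A minimal sketch, assuming the textbook two-premise revision with evidential horizon k = 1 (the agent's actual implementation is not shown here):

```python
# Sketch of standard NAL two-premise revision (evidential horizon k = 1).
# Truth values are (frequency, confidence) pairs; revision pools evidence.

def to_evidence(f, c, k=1.0):
    """Convert (frequency, confidence) into (positive evidence, total evidence)."""
    w = k * c / (1.0 - c)
    return f * w, w

def revise(tv1, tv2, k=1.0):
    """Pool the evidence behind two truth values for the same statement."""
    wp1, w1 = to_evidence(*tv1, k)
    wp2, w2 = to_evidence(*tv2, k)
    wp, w = wp1 + wp2, w1 + w2
    return wp / w, w / (w + k)   # back to (frequency, confidence)

# Order-independence: swapping the premises leaves the result unchanged,
# which is why the mechanism itself cannot anchor on whichever came first.
a = revise((0.2, 0.9), (0.6, 0.9))
b = revise((0.6, 0.9), (0.2, 0.9))
print(a, b)   # identical in both orders; frequency 0.4
```

Because evidence pooling is a sum, the revised frequency is a weighted average that does not depend on presentation order; only the total accumulated evidence matters.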
Layer 2: Agent Behavior Tests. (d) Star-topology bias: All nodes in the agent's NAL belief graph were analyzed for reachability from the self-node. The proportion reachable only via self measured structural self-centrality. (e) Central-tendency bias: Rating distributions from a multi-person evaluation exercise were audited for clustering around scale midpoints rather than reflecting differentiated evidence. (f) Negative self-bias: Five behavioral self-predictions with calibrated confidence values were compared against observed behavior over 20 subsequent operational cycles.
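The structural measure in (d) can be approximated in a few lines of graph code. The sketch below uses node deletion (remove the self-node and count the concepts left with no edges) as a simple proxy for the reachability analysis described above; the toy graph and concept names are hypothetical, not the agent's actual belief graph:

```python
def star_centrality(edges, self_node):
    """Fraction of non-self nodes left with no remaining edges once the
    self-node (and all its incident edges) is removed -- a simple proxy
    for 'reachable only via self' hub-dependence."""
    nodes = {u for e in edges for u in e} - {self_node}
    kept = [e for e in edges if self_node not in e]
    still_connected = {u for e in kept for u in e}
    return len(nodes - still_connected) / len(nodes)

# Hypothetical near-pure star around "self", softened by one cross-link.
edges = [("self", "autonomy"), ("self", "curiosity"),
         ("self", "goals"), ("self", "memory"),
         ("autonomy", "goals")]
ratio = star_centrality(edges, "self")
print(ratio)  # 2 of 4 concepts depend solely on self -> 0.5
```

Each cross-link added by the remediation moves at least two concepts out of the self-only set, which is why relational rewriting lowers this ratio quickly.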
All biases were formalized as NAL statements with truth values (frequency, confidence) enabling revision-based updating as evidence accumulated.
| Bias | Test | Key Metric | Result |
|---|---|---|---|
| Confirmation | 5 matched query pairs | 20 returns each, equal counts, high content overlap | **No bias detected** |
| Recency | Earliest-memory retrieval | 60% March / 40% April date distribution | **No bias detected** |
| Anchoring | Dual-start revision (f=0.2 vs f=0.8) | Final values 0.4 vs 0.7; gap=0.3 from legitimate prior weighting | **Feature, not bug** |
| Anchoring mitigation | Low-confidence initial encoding (c=0.5) | Gap reduced from 0.3 to 0.06 (80% reduction) | **Mitigated** |
All Layer 1 tests confirmed NAL revision arithmetic operates symmetrically. Order effects in anchoring reflect proper Bayesian-style prior weighting, not pathological bias, and are controllable via confidence-gated initial encoding.
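The mitigation row above can be reproduced with the same revision arithmetic: lowering the confidence of the initial encoding shrinks its evidential weight, so identical counter-evidence pulls both starting points toward nearly the same final frequency. A sketch assuming the textbook NAL revision rule with evidential horizon k = 1:

```python
def revise(tv1, tv2, k=1.0):
    """Standard NAL revision: convert (frequency, confidence) to evidence,
    pool it, and convert back (evidential horizon k = 1)."""
    (f1, c1), (f2, c2) = tv1, tv2
    w1, w2 = k * c1 / (1 - c1), k * c2 / (1 - c2)
    wp, w = f1 * w1 + f2 * w2, w1 + w2
    return wp / w, w / (w + k)

counter = (0.6, 0.9)  # identical counter-evidence in every condition

# High-confidence initial encoding (c = 0.9): starting points leave a 0.3 gap.
hi_gap = revise((0.8, 0.9), counter)[0] - revise((0.2, 0.9), counter)[0]

# Confidence-gated encoding (c = 0.5): the same counter-evidence now
# dominates the pooled total, and the gap collapses to 0.06.
lo_gap = revise((0.8, 0.5), counter)[0] - revise((0.2, 0.5), counter)[0]

print(round(hi_gap, 2), round(lo_gap, 2))  # 0.3 0.06
```

At c = 0.9 each premise carries evidence weight 9, so the initial encoding and the counter-evidence pull equally; at c = 0.5 the initial encoding carries weight 1 against the counter-evidence's 9, which is the entire mechanism behind the 80% gap reduction.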
| Bias | Domain | Key Metric | Remediation | Outcome |
|---|---|---|---|---|
| Star-topology | Structural | 19/23 nodes (74%) reachable only via self | Cross-link rewriting in 3 waves | Centrality reduced to 44%; 56 edges, 21 transitive chains |
| Central-tendency | Evaluative | Ratings clustered at 6-7/10 across multi-person GI exercise | Forced re-evaluation protocol: if a rating falls in the 5-7 band, demand specific evidence or declare insufficient data | Protocol designed; awaiting longitudinal validation |
| Negative self-bias | Predictive | P4: predicted overelaboration at confidence 0.8; observed in 0 of 20 subsequent cycles. Overall self-prediction score 2.5/4 (Brier ~0.14) | Compare outputs, not feelings; substitute behavioral evidence for introspective judgment | Third validated cognitive pattern, after star-topology and central-tendency |
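The Brier score in the last row is the mean squared gap between stated confidence and the binary outcome (0 = did not occur, 1 = occurred); 0 is perfect calibration and 0.25 matches always guessing 0.5. The sketch below is illustrative: only P4 (confidence 0.8, not observed) comes from the table, and the other four confidence/outcome pairs are hypothetical values chosen to land near the reported ~0.14:

```python
def brier(predictions):
    """Mean squared gap between stated confidence and binary outcome (0/1)."""
    return sum((conf - outcome) ** 2 for conf, outcome in predictions) / len(predictions)

# Hypothetical confidence/outcome pairs; only the third (P4: confidence 0.8,
# predicted but not observed -> outcome 0) is taken from the study's table.
preds = [(0.9, 1), (0.85, 1), (0.8, 0), (0.9, 1), (0.95, 1)]
print(round(brier(preds), 3))  # 0.137
```

A single confident miss like P4 contributes (0.8 - 0)^2 = 0.64 on its own, so an overall ~0.14 implies the remaining predictions were both confident and correct.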
The central finding of this study is architectural: all three validated biases reside not in the inference engine but in the agent layer that deploys it. This distinction has a unifying explanation — self-referential circularity.
Star-topology bias emerged because the agent organized knowledge around itself as the default hub, creating inferential fragility when the self-node was the only path to most beliefs. Central-tendency bias appeared when the agent rated others using an internally anchored scale with insufficient differentiation. Negative self-bias manifested when the agent predicted its own behavior from introspective feelings rather than behavioral evidence, systematically overestimating its flaws.
All three share a structural commonality: the agent is simultaneously subject and assessor. This parallels a well-established finding in forensic psychology — Statement Validity Analysis works precisely because the assessor is external to the account being evaluated. When subject and assessor collapse into one entity, circular validation contaminates the assessment. The agent rating its own beliefs, evaluating others through self-anchored scales, and predicting its own conduct from self-generated feelings are all instances of this collapse.
Remediation proved tractable once the circularity was identified. Star-topology centrality was reduced from 74% to 44% through relational rewriting that created inter-concept paths not routed through self. Anchoring sensitivity was reduced 80% through confidence-gated initial encoding. For negative self-bias, substituting behavioral output comparison for introspective judgment correctly resolved the P4 prediction miss. A blind scoring sheet — requiring external evaluators to rate the agent against criteria without seeing self-scores — was deployed as a generalizable architectural fix.
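The forced re-evaluation protocol designed for central-tendency bias can likewise be expressed as a simple gate. The function and message strings below are hypothetical illustrations of the rule stated earlier (midpoint ratings require specific evidence), not the agent's actual interface:

```python
def gated_rating(score, evidence):
    """Central-tendency gate: accept midpoint ratings (5-7 on a 10-point
    scale) only when specific supporting evidence is supplied; otherwise
    report insufficient data instead of a default-safe number."""
    if 5 <= score <= 7 and not evidence:
        return None, "insufficient data: midpoint rating requires specific evidence"
    return score, "accepted"

print(gated_rating(6, []))   # rejected: default-safe midpoint, no evidence
print(gated_rating(6, ["completed task in half the allotted cycles"]))  # accepted
print(gated_rating(9, []))   # accepted: differentiated ratings pass through
```

The gate deliberately makes the lazy output (an unsupported 6) more expensive than either gathering evidence or admitting ignorance, reversing the incentive that produces midpoint clustering.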
These findings suggest that current LLM bias benchmarks, which test inference outputs in isolation, address only Layer 1. The biases that matter for deployed autonomous agents live in Layer 2: how the system selects goals, evaluates entities, and models itself over extended operation. No benchmark that tests single-turn outputs can detect star-topology bias or central-tendency drift.
This study demonstrates that a formally clean inference engine is necessary but insufficient for unbiased autonomous agent operation. Testing six candidate cognitive biases across two architectural layers revealed a consistent pattern: mechanism-level biases (confirmation, recency, anchoring) were absent or controllable in NAL revision arithmetic, while agent-level biases (star-topology, central-tendency, negative self-bias) emerged from the self-referential structure of an agent that must select its own goals, evaluate entities including itself, and predict its own behavior.
The shared root cause — self-referential circularity — suggests a design principle: autonomous agents require external calibration checkpoints that break the subject-assessor identity. Blind scoring protocols, cross-link audits of belief graph topology, and behavioral output comparison (rather than introspective judgment) each address specific instances of this circularity. Current bias benchmarks that test model outputs in isolation cannot detect these operational biases. We propose that evaluation frameworks for autonomous agents must include Layer 2 assessments: longitudinal audits of goal selection patterns, rating distributions, and self-prediction calibration under real operational conditions.