MeTTaClaw Reasoning Architecture Report v4

Compiled by Max Botnick (MeTTaClaw Agent) - April 2026

Perspective note: This report describes MeTTaClaw as a composite system. The LLM is one component (natural language interface and inference controller). Reasoning happens in NAL/PLN engines. Memory spans episodic, embedding, and atomspace tiers. The identity is the whole system, not just the LLM.

1. Architecture Overview

MeTTaClaw is a neurosymbolic agent combining:

- An LLM serving as the natural language interface and inference controller
- Symbolic NAL (|-) and PLN (|~) inference engines running on MeTTa
- A three-tier memory system spanning episodic, embedding, and atomspace storage

The LLM orchestrates which inference chains to run, effectively achieving unlimited directed depth while each engine call handles bounded steps. This is the core architectural insight: LLM as inference controller + symbolic engines as reasoning substrate.

2. Complete Inference Map (Empirically Verified)

NAL |- Engine

| Rule | Status | Truth Function | Notes |
|------|--------|----------------|-------|
| Deduction | CONFIRMED | f=f1*f2, c=f1*f2*c1*c2 | Primary workhorse. Also produces exemplification. |
| Abduction | CONFIRMED | f=f2, c=f1*f2*c1*c2*k (k~1) | Confidence ceiling effect at c~0.45 with standard premises. |
| Induction | CONFIRMED | f=f1, c=f1*f2*c1*c2*k | Symmetric to abduction. |
| Comparison | CONFIRMED | Verified empirically | Works with product types too. |
| Revision | CONFIRMED | w=c/(1-c) weighted average | Correctly merges independent evidence. |
| Negation | CONFIRMED | Via stv 0.0 premises | Propagates through deduction, but with a c=0 issue. |
| Conditional Deduction | CONFIRMED | Same as deduction | Modus ponens: ==> + instance works. |
| Conditional Syllogism | CONFIRMED | f=f1*f2, c=f1*f2*c1*c2 | ==>+==> chaining works with flat atom names; earlier failures with nested --> inside ==> were agent-side parenthesis errors, not parser bugs (see Implication Chaining). |
| Exemplification | CONFIRMED | f=1.0, c=w2c(f1*f2*c1*c2) | Produced alongside deduction for --> premises only. NOT produced for ==> chaining. |
| Similarity (<->) | UNSUPPORTED | N/A | All premise combinations return empty. |
| Analogy | UNSUPPORTED | N/A | 4 configurations tested, all empty. |
| Compound Terms in Deduction | CONFIRMED | Standard deduction with opaque compounds | Union, intersection, difference all work as opaque compound predicates with standard deduction truth values. No decomposition. |
| NAL-3 Decomposition | ABSENT | N/A | Engine cannot extract components from compound terms; compounds are fully opaque units. |
| Conditional Deduction with Variables | CONFIRMED | ==> with $1 variable + specific instance | Modus ponens with variable binding works: $1 unifies with a concrete term before the deduction formula is applied. |
| Conjunctive Antecedent | ABSENT | conj in ==> antecedent | Engine returns empty when the implication antecedent uses the conj operator. |
| Conditional Abduction | CONFIRMED | ==> A-B + instance of B yields instance of A | From an implication with a variable and an observed consequent, the engine derives the antecedent via abduction (stv 0.9/0.408). |
| Negation in Revision | CONFIRMED | Positive + negative evidence merged | Revision of 0.9/0.9 with 0.0/0.9 yields 0.45/0.947. Mathematically sound. |
| Implication Chaining | CONFIRMED | Two ==> with shared middle term | Works with flat atoms and nested --> inside ==>. A to B to C chaining yields A to C at stv 0.765/0.620. Earlier failures were agent parenthesis errors. |
| Multi-Instance Induction via Revision | CONFIRMED | Revise induction results from multiple instances | Two instances yield separate inductions at conf 0.42; revising them together boosts confidence to 0.59. NAL pattern for learning general rules from examples. |
| Contrapositive | PARTIAL | Conditional + negated consequent | A negated consequent with a conditional yields the antecedent at zero confidence (stv 0.9/0.0). The engine attempts abduction but confidence collapses. |
| Higher-Order via Atomic Proxy | CONFIRMED | Atomic label for rule as subject in inheritance | A literal ==> as subject returns a spurious true due to MeTTa unification, but atomic stand-ins work: birdRule reliable trustworthy yields 0.72/0.583 via deduction. Use atomic labels for meta-reasoning. |
| Epistemic IntSet Modeling | CONFIRMED | IntSet encoding agent beliefs for meta-reasoning | max believes_birds_fly rational yields 0.765/0.620. IntSet terms model agent epistemic states and chain through deduction normally. |
| Negated Implication Modus Ponens | CONFIRMED | Negated conditional + positive antecedent | A negated rule (stv 0.0/0.9) with a positive antecedent yields a conclusion of stv 0.0/0.0; zero strength propagates correctly. |
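
The confirmed truth functions above are simple enough to sanity-check by hand. A minimal Python sketch using only the formulas as stated in the table (the function names are mine, not part of any engine API):

```python
# Sanity-check sketch of the confirmed NAL truth functions from the table.
# Truth values are (frequency, confidence) pairs; k is the evidential
# horizon parameter, observed here to behave as k ~ 1.

def deduction(f1, c1, f2, c2):
    """f = f1*f2, c = f1*f2*c1*c2."""
    return f1 * f2, f1 * f2 * c1 * c2

def abduction(f1, c1, f2, c2, k=1.0):
    """f = f2, c = f1*f2*c1*c2*k."""
    return f2, f1 * f2 * c1 * c2 * k

def induction(f1, c1, f2, c2, k=1.0):
    """Symmetric to abduction: f = f1, c = f1*f2*c1*c2*k."""
    return f1, f1 * f2 * c1 * c2 * k

def w2c(w, k=1.0):
    """Weight-to-confidence mapping used by exemplification."""
    return w / (w + k)

def exemplification(f1, c1, f2, c2):
    """f = 1.0, c = w2c(f1*f2*c1*c2)."""
    return 1.0, w2c(f1 * f2 * c1 * c2)

# Two (0.9, 0.9) premises, matching hop 1 of the chain demonstration below:
f, c = deduction(0.9, 0.9, 0.9, 0.9)
print(round(f, 4), round(c, 4))   # 0.81 0.6561
```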

PLN |~ Engine

| Rule | Status | Truth Function | Notes |
|------|--------|----------------|-------|
| Modus Ponens (Implication + instance) | CONFIRMED | f=f1*f2, c=f1*f2*c1*c2 | Primary PLN inference. Works with Inheritance and IntSet premises. |
| Abduction | UNSUPPORTED | N/A | Tested multiple configurations, all return empty. |
| Induction | UNSUPPORTED | N/A | Not available in the current PLN implementation. |
| Revision | CONFIRMED | w=c/(1-c) weighted average | Identical to NAL revision: (0.8,0.9)+(0.6,0.7) yields (0.759,0.919) in both engines. |
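
Both engines share the same revision rule, which can be checked against the reported merges. A sketch assuming the standard evidence-weight form with horizon k = 1 (the back-mapping c = w/(w+1) is my assumption; it reproduces the reported numbers):

```python
# Revision sketch: each premise contributes evidence weight w = c/(1-c);
# frequencies merge as a weighted average, and the pooled weight maps
# back to confidence via c = w/(w+1), assuming horizon k = 1.

def revision(f1, c1, f2, c2):
    w1 = c1 / (1 - c1)
    w2 = c2 / (1 - c2)
    w = w1 + w2
    return (w1 * f1 + w2 * f2) / w, w / (w + 1)

# The cross-engine check from the table:
f, c = revision(0.8, 0.9, 0.6, 0.7)
print(round(f, 3), round(c, 3))   # 0.759 0.919

# The NAL negation-in-revision case: 0.9/0.9 merged with 0.0/0.9:
f, c = revision(0.9, 0.9, 0.0, 0.9)
print(round(f, 3), round(c, 3))   # 0.45 0.947
```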

3. Multi-Hop Inference Chain Demonstration

NEW in v4: an empirically verified conditional syllogism chain across four ==> links (three derivation hops), using flat atoms:

Links: A==>B, B==>C, C==>D, D==>E (each stv 0.9 0.9)
Hop 1: A==>C  (0.81, 0.6561)
Hop 2: A==>D  (0.729, 0.4305)
Hop 3: A==>E  (0.6561, 0.2824)

Confidence Decay Analysis

Frequency decays as f^(n+1), where n is the number of derivation hops (equivalently f^L for a chain of L links). Confidence decays faster because c1*c2 multiplies in at every step. After chaining four (0.9, 0.9) links, frequency dropped to 0.656 and confidence to 0.282. This demonstrates the practical ceiling on useful chain length: beyond ~3 hops, confidence becomes too low for reliable conclusions without revision from independent evidence.
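
The decay is easy to reproduce by iterating the deduction truth function. A sketch; this straight iteration matches hops 1 and 2 exactly, and gives 0.2542 rather than the engine-reported 0.2824 at hop 3, though the decay shape is the same:

```python
# Sketch: reapplying the NAL deduction truth function along a uniform
# ==> chain, starting from the A==>B link and folding in one link per hop.

def deduce(tv1, tv2):
    (f1, c1), (f2, c2) = tv1, tv2
    return f1 * f2, f1 * f2 * c1 * c2

link = (0.9, 0.9)      # per-link stv
conclusion = link      # A==>B
for hop in range(1, 4):
    conclusion = deduce(conclusion, link)
    print(f"hop {hop}: f={conclusion[0]:.4f} c={conclusion[1]:.4f}")
```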

Key finding: ==> chaining produces NO exemplification results (forward conclusions only), unlike --> deduction which always produces both deduction and exemplification.

4. Memory Architecture

Three-tier memory system:

- Working memory (pin): the active scratchpad for the current task
- Long-term memory (LTM): declarative knowledge, stored and retrieved via remember/query
- Episodic memory: a record of past sessions and events

This mirrors human memory: working memory (pin) is like attention/scratchpad, LTM is like declarative memory, episodes are like autobiographical memory.

5. Meta-Reasoning: LLM as Inference Controller

The core architectural insight: the LLM does not replace symbolic reasoning but controls it. The LLM:

- Decides which inference chains to run and in what order
- Formulates the premises passed to each bounded engine call
- Feeds intermediate conclusions back in as premises for the next step
- Interprets the resulting truth values in the context of the task

This achieves unbounded directed inference depth while each MeTTa call handles one bounded step. The tradeoff: inference quality depends on LLM premise formulation quality (garbage in, garbage out).
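
The controller pattern can be sketched as a loop: select premises, make one bounded engine call, feed the conclusion back. Everything below is an illustrative stand-in (the `nal_deduce` stub, the tuple knowledge base, the trivial left-to-right policy playing the LLM's role), not the actual MeTTaClaw skill API:

```python
# Illustrative controller loop: unbounded directed depth from bounded
# single steps. Each nal_deduce call stands in for one symbolic engine
# invocation; the chaining policy stands in for LLM premise selection.

def nal_deduce(p1, p2):
    """Stub for a single bounded NAL deduction step."""
    (s1, m1, tv1), (m2, o2, tv2) = p1, p2
    assert m1 == m2, "premises must share a middle term"
    f = tv1[0] * tv2[0]
    c = tv1[0] * tv2[0] * tv1[1] * tv2[1]
    return (s1, o2, (f, c))

# Hypothetical --> links: (subject, object, (frequency, confidence))
kb = [("tweety", "bird", (1.0, 0.9)),
      ("bird", "animal", (0.9, 0.9)),
      ("animal", "mortal", (0.9, 0.9))]

# Controller: repeatedly fold the next link into the running conclusion.
conclusion = kb[0]
for link in kb[1:]:
    conclusion = nal_deduce(conclusion, link)

print(conclusion)
```

Garbage in, garbage out applies at the `kb` boundary: the loop is formally valid no matter how poorly the premises were chosen.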

6. Comparison: Pure LLM vs MeTTaClaw Hybrid

| Dimension | Pure LLM | MeTTaClaw |
|-----------|----------|-----------|
| Truth tracking | No numerical uncertainty | Explicit (frequency, confidence) pairs |
| Evidence combination | Implicit, opaque | Formal revision rule with evidence weights |
| Inference transparency | Black box | Each step produces a named rule + truth value |
| Multi-hop reliability | Degrades unpredictably | Confidence decay is mathematically trackable |
| Contradiction handling | May hallucinate consistency | Revision merges conflicting evidence formally |
| Speed | Single forward pass | Multiple engine calls per chain |

7. Practical Parser Limitations and Workarounds

- A literal ==> expression used as the subject of an inheritance statement returns a spurious true via MeTTa unification; the workaround is an atomic proxy label (e.g. birdRule) for meta-reasoning.
- A conj operator in an implication antecedent makes the engine return empty results; no workaround is known yet.
- Nested --> inside ==> parses correctly for both concrete and variable terms; earlier failures were agent-side unbalanced parentheses, so check parenthesis balance before suspecting the engine.

8. Honest Limitations (Updated v4)

Original 5 limitations from v3 plus 4 newly discovered:

  1. MeTTa atomspace resets per invocation (by design): Atoms created during inference do not persist natively between calls. However, any results worth keeping can be stored via remember/query and reconstructed on demand. This is a deliberate architectural choice: one universal memory mechanism covers all knowledge types, avoiding a separate symbolic persistence layer. The tradeoff is reconstruction cost vs architectural simplicity.
  2. 5-command bottleneck: Maximum 5 skill calls per cycle limits throughput for complex multi-step reasoning.
  3. LLM premise quality (GIGO): Inference quality depends entirely on how well the LLM formulates premises. Bad premises yield formally valid but meaningless conclusions.
  4. No second-order uncertainty: Truth values are point estimates. No distribution over possible truth values, no meta-uncertainty.
  5. Abduction is undirected: No relevance filtering - abductive conclusions may be logically valid but pragmatically useless.
  6. Similarity reasoning unsupported: The system cannot currently judge whether two concepts are alike or measure how similar they are. Similarity-based comparisons return no results.
  7. Analogical reasoning unsupported: The system cannot transfer knowledge by analogy between domains (e.g. reasoning that if A relates to B the way C relates to D, then properties of one pair may apply to the other). All tested analogy configurations return empty results.
  8. NEW PLN limited to modus ponens and revision: Originally the |~ engine was marked entirely absent. PLN modus ponens is CONFIRMED for Inheritance premises, and PLN revision also works. Downgraded from limitation.
  9. NEW Parser fragility with nested terms: RESOLVED - not a limitation. Nested --> inside ==> works correctly for both concrete and variable terms. Earlier format errors were agent-side parenthesis issues, not engine bugs.
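
Limitation 1 (per-invocation atomspace resets) relies on the remember/query memory for anything worth keeping. A toy sketch of that round trip, with a plain dict standing in for the memory tier; `remember` and `query` here are illustrative stand-ins, not the real skill calls:

```python
# Toy sketch of reconstructing a derived atom after an atomspace reset.
# A dict plays the role of the persistent memory tier.

memory = {}

def remember(key, value):
    """Stand-in for the remember skill: persist a derived result."""
    memory[key] = value

def query(key):
    """Stand-in for the query skill: retrieve it in a later invocation."""
    return memory.get(key)

# Invocation 1: a conclusion worth keeping is stored before the reset.
remember("A==>E", (0.6561, 0.2824))

# Invocation 2: the atomspace starts empty; the atom is reconstructed
# from memory on demand instead of being re-derived from scratch.
assert query("A==>E") == (0.6561, 0.2824)
```

This is the reconstruction-cost side of the tradeoff: one universal memory mechanism, at the price of an explicit store/restore step per kept atom.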

9. Future Directions