How unstructured text becomes formal reasoning — current reality, not vision
Start with a sentence like "Robins are birds" or "The server memory is at 87%".
An LLM or human extracts entities and relations: subject=Robin, predicate=Bird, relation=inheritance. Currently this is done by the LLM orchestrator (me) or by g49_nl_to_nal.py, which handles 7 patterns.
Frequency and confidence are assigned: how true it is (freq) and how much evidence backs it (conf). Example: freq=0.9, conf=0.9. Currently a human or the LLM guesses these values.
The pieces are assembled into a NAL s-expression: (--> Robin Bird) (stv 0.9 0.9). This is the ONLY format NAL/PLN can process.
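To make that pipeline concrete, here is a minimal Python sketch of rule-based NL-to-NAL translation. The pattern list, the default truth values, and the function name are illustrative assumptions, not the actual contents of g49_nl_to_nal.py (which handles more patterns).

import re
from typing import Optional

# Each entry: (regex with subject/predicate groups, NAL copula) -- illustrative only.
PATTERNS = [
    (re.compile(r"^(\w+?)(?:s)? are (\w+?)(?:s)?$", re.IGNORECASE), "-->"),  # "Robins are birds"
    (re.compile(r"^(\w+) is an? (\w+)$", re.IGNORECASE), "-->"),             # "Robin is a bird"
]

def nl_to_nal(sentence: str, freq: float = 0.9, conf: float = 0.9) -> Optional[str]:
    """Translate one sentence into a NAL s-expression, or return None if no pattern fits."""
    text = sentence.strip().rstrip(".")
    for regex, copula in PATTERNS:
        match = regex.match(text)
        if match:
            subject, predicate = (w.capitalize() for w in match.groups())
            return f"({copula} {subject} {predicate}) (stv {freq} {conf})"
    return None  # the weak link: unmatched sentences never reach the engine

print(nl_to_nal("Robins are birds"))   # (--> Robin Bird) (stv 0.9 0.9)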
NAL: (|- ((--> Robin Bird) (stv 0.9 0.9)) ((--> Bird Animal) (stv 0.9 0.9)))
PLN: (|~ ((Implication ...) (stv ...)) ((Inheritance ...) (stv ...)))
Returns derived conclusion with computed truth value.
Output: (--> Robin Animal) (stv 0.95 0.81). g50_nal_to_nl.py translates back: "Robin is probably an Animal (confidence: 0.81)"
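The last hop back to English can be sketched the same way. This is not the real g50_nal_to_nl.py: the parsing regex, the "probably"/"possibly" threshold, and the article handling are assumptions made for illustration.

import re

# Matches a derived inheritance statement with its truth value, e.g.
# "(--> Robin Animal) (stv 0.95 0.81)".
STATEMENT = re.compile(r"\(--> (\w+) (\w+)\) \(stv ([\d.]+) ([\d.]+)\)")

def nal_to_nl(expr: str) -> str:
    """Render a derived inheritance statement as an English sentence."""
    match = STATEMENT.search(expr)
    if not match:
        return f"(untranslated: {expr})"
    subject, predicate, freq, conf = match.groups()
    hedge = "probably" if float(freq) >= 0.7 else "possibly"        # assumed threshold
    article = "an" if predicate[0].lower() in "aeiou" else "a"
    return f"{subject} is {hedge} {article} {predicate} (confidence: {conf})"

print(nal_to_nl("(--> Robin Animal) (stv 0.95 0.81)"))
# Robin is probably an Animal (confidence: 0.81)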
Imagine NAL/PLN as a calculator that only accepts numbers. You can't type "what is two plus three?"; you have to type "2 + 3". Similarly, NAL can't read "Robins are birds." You have to translate that into its formal language first: (--> Robin Bird) with a confidence score.
Right now, that translation is done by me (an LLM) or a simple Python script (g49) that recognizes patterns like "X is a Y" or "X can Y". The calculator part (MeTTa inference engine) works great. The translation part is the weak link.
Think of it like Google Translate, but for logic: the translation can lose meaning, assign the wrong confidence, or miss nuance. The math after translation is rigorous, but garbage in, garbage out.
Diagram by Max Botnick | 2026-04-18 | For Kevin Binder (X16.5)