# Independent IDL MeTTa Prototype Design - Max Botnick

## Core Insight: The atomspace IS the graph. MeTTa match IS traversal. Separate spaces ARE pruning.
## Architecture: 3 Core Functions + Separate Space Pattern
1. idl-fanout: Link-agnostic recursive neighborhood expansion. (= (idl-fanout $seed Z) ($seed)) / (= (idl-fanout $seed (S $n)) (collapse (match &self ($rel $x $seed) (idl-fanout $x $n)))). The base case uses Peano Z to match the (S $n) recursive case. The free $rel variable means ANY link type is traversed - Inheritance, Similarity, Implication, custom. Returns (atom, hop-distance) pairs (the sketch above collapses to atoms; the hop tag is threaded through in the full version). No hardcoded structure.
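The same logic in a Python sketch, modeling the atomspace as a set of (rel, src, dst) triples; all names and the toy data are illustrative, not part of the prototype. The MeTTa version matches one argument position of the link; for simplicity this sketch follows links in both directions:

```python
def idl_fanout(space, seeds, max_hops):
    """Link-agnostic BFS from seeds; returns {atom: hop_distance}.

    `space` is a set of (rel, src, dst) triples standing in for the
    atomspace; any relation type is traversed, mirroring the free $rel.
    """
    dist = {s: 0 for s in seeds}
    frontier = set(seeds)
    for hop in range(1, max_hops + 1):
        nxt = set()
        for rel, src, dst in space:
            # follow the link in both directions (a simplification)
            for a, b in ((src, dst), (dst, src)):
                if a in frontier and b not in dist:
                    dist[b] = hop
                    nxt.add(b)
        frontier = nxt
    return dist

toy = {("Inheritance", "cat", "mammal"), ("Similarity", "mammal", "animal")}
print(idl_fanout(toy, ["cat"], 2))  # {'cat': 0, 'mammal': 1, 'animal': 2}
```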
2. idl-score: Higher-order composable scoring. (= (idl-score $atom $hop $seed-set $scoring-fn) ($scoring-fn $atom $hop $seed-set)). Default scoring-fn: relevance = 1/(1+hop), coherence = count shared neighbors with seed-set via (collapse (match &self ($rel $atom $member-of-seed-set) True)), recency = optional timestamp decay. Score is weighted sum. Scoring function is PASSED IN, not hardcoded - different tasks use different weights.
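A Python sketch of the composable scorer over the same triple-store model; the relevance and coherence defaults follow the text, while the weights and helper names are illustrative:

```python
def relevance(hop):
    # default relevance: 1 / (1 + hop)
    return 1.0 / (1.0 + hop)

def coherence(atom, seed_set, space):
    # count direct links from atom to seed-set members,
    # mirroring (match &self ($rel $atom $member-of-seed-set) True)
    return sum(1 for rel, a, b in space if a == atom and b in seed_set)

def make_scorer(w_rel, w_coh):
    # the scoring function is built here and PASSED IN, not hardcoded
    def score(atom, hop, seed_set, space):
        return w_rel * relevance(hop) + w_coh * coherence(atom, seed_set, space)
    return score

scorer = make_scorer(0.7, 0.3)
toy = {("Inheritance", "cat", "mammal")}
print(scorer("cat", 1, {"mammal"}, toy))  # 0.7*0.5 + 0.3*1 ≈ 0.65
```

Swapping in a different `make_scorer` (or different weights) changes task behavior without touching fanout or selection.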
3. idl-select: Threshold + top-k filter into SEPARATE SPACE (pseudocode - filter, lambda, take, sort-by-score, score-of, and for-each are assumed helpers): (= (idl-select $scored-atoms $threshold $k $target-space) (let $filtered (filter (lambda $x (> (score-of $x) $threshold)) $scored-atoms) (for-each (take $k (sort-by-score $filtered)) (lambda $a (add-atom $target-space $a))))). KEY INSIGHT: BC then runs (match $target-space ...) instead of (match &self ...). This IS the pruning - inference never touches atoms outside the IDL selection. O(selected), not O(total), per BC step.
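A minimal Python sketch of the select step, with a fresh set standing in for the separate target space (function and data names are illustrative):

```python
def idl_select(scored_atoms, threshold, k):
    """Threshold filter, then top-k by score, into a fresh target space."""
    filtered = [(atom, s) for atom, s in scored_atoms if s > threshold]
    filtered.sort(key=lambda pair: pair[1], reverse=True)
    # a new set per query plays the role of the ephemeral target space
    target_space = {atom for atom, _ in filtered[:k]}
    return target_space

print(idl_select([("a", 0.9), ("b", 0.2), ("c", 0.6), ("d", 0.55)], 0.5, 2))
# {'a', 'c'}: "b" fails the threshold, "d" falls outside the top-k
```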
4. Full Pipeline Workflow: (1) Choose seed atoms from goal/query context. (2) idl-fanout seeds N hops - returns candidate set with hop distances. (3) idl-score each candidate with task-specific scoring-fn. (4) idl-select top-k above threshold into (new-space) target. (5) BC runs (match $target-space ...) - inference is confined to IDL-selected subgraph. Result: BC complexity is O(selected) per step, not O(atomspace). The separate space IS the pruning - no special filtering logic inside BC rules needed.
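A condensed, self-contained model of the five steps over a toy triple store (relevance-only scoring; all names, weights, and data are illustrative):

```python
toy_space = {
    ("Inheritance", "cat", "mammal"),
    ("Inheritance", "dog", "mammal"),
    ("Similarity", "mammal", "animal"),
    ("Inheritance", "rock", "mineral"),  # unrelated region of the atomspace
}

def fanout(space, seeds, hops):
    # step 2: link-agnostic expansion with hop distances
    dist, frontier = {s: 0 for s in seeds}, set(seeds)
    for h in range(1, hops + 1):
        nxt = {y for _, a, b in space for x, y in ((a, b), (b, a))
               if x in frontier and y not in dist}
        for y in nxt:
            dist[y] = h
        frontier = nxt
    return dist

def run_query(space, seeds, hops=2, threshold=0.4, k=3):
    dist = fanout(space, seeds, hops)                       # step 2
    scored = {a: 1.0 / (1.0 + h) for a, h in dist.items()}  # step 3
    keep = sorted((a for a, s in scored.items() if s > threshold),
                  key=lambda a: -scored[a])[:k]             # step 4
    # step 5: the ephemeral target space holds only links among selected
    # atoms; BC would match against this instead of the full space
    return {t for t in space if t[1] in keep and t[2] in keep}

print(run_query(toy_space, ["cat"]))  # only the cat->mammal link survives
```

Note that the rock/mineral region is never scored into the target space, so a BC step confined to it does O(selected) work regardless of total atomspace size.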
5. Engineering Constraints: (a) Fanout must terminate - depth bound is explicit via Peano numerals. (b) Scoring-fn is pure - no side effects, composable, replaceable. (c) Target space is ephemeral - created per query, discarded after BC completes. (d) Link-type agnosticism means new relation types require ZERO code changes. (e) Cluster detection possible via connected-component analysis on scored atoms before selection.
6. Open Questions: (a) Optimal hop depth vs atomspace density tradeoff - needs empirical tuning. (b) Should scoring include truth-value strength of traversed links? (c) How to handle bidirectional links without double-counting? (d) Integration with existing chainer - does BC need modified match targets or can we wrap externally? (e) Performance benchmark: at what atomspace size does separate-space pattern measurably outperform naive full-space BC?
# IDL Pruning Policy for Dense Regions

## Problem: Unbounded recursive expansion at high-degree nodes. The depth bound caps depth but not branching factor. A node with 500 outgoing links already produces 500 expansions at depth 1, before depth 2 begins.
Solution: Three-layer pruning stack. Layer 1 BEAM: At each expansion level, score candidate links by embedding cosine similarity to seed concept, keep top-k (default k=10). Layer 2 CYCLE: Visited-set prevents re-expansion of already-seen nodes. Layer 3 GAIN: Information-gain threshold epsilon - if expanding a node adds less than epsilon new unique terms to the diffusion frontier, prune it. This ensures diminishing-return branches are cut early.
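A Python sketch of the three-layer stack over the same triple-store model; `sim` stands in for embedding cosine similarity to the seed concept, and the gain cutoff is applied per expansion level for brevity (both are illustrative simplifications):

```python
import heapq

def beam_expand(space, seeds, sim, depth, k=10, epsilon=1):
    visited = set(seeds)                  # Layer 2 CYCLE: never re-expand
    frontier = set(seeds)
    for _ in range(depth):
        candidates = []
        for rel, a, b in space:
            for x, y in ((a, b), (b, a)):
                if x in frontier and y not in visited:
                    candidates.append((sim(y), y))
        # Layer 1 BEAM: keep only the top-k scored candidates
        beam = {y for _, y in heapq.nlargest(k, candidates)}
        # Layer 3 GAIN: stop when a level adds < epsilon new unique terms
        new = beam - visited
        if len(new) < epsilon:
            break
        visited |= new
        frontier = new
    return visited

toy = {("R", "s", "a"), ("R", "s", "b"), ("R", "s", "c"), ("R", "a", "d")}
sims = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.8}
print(beam_expand(toy, ["s"], lambda n: sims.get(n, 0.0), depth=2, k=2))
# {'s', 'a', 'b', 'd'}: "c" is pruned by the beam
```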
Complexity: With beam k and max depth d, worst case is O(k^d) which at k=10 d=3 is 1000 nodes - tractable. Without beam, a hub node with degree 500 at depth 3 gives 125M - intractable. The beam is the critical control parameter. Relevance scoring cost is O(k*d) embedding lookups, amortizable via cache.
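The figures above are plain exponentials of the branching bound:

```python
k, d, degree = 10, 3, 500
print(k ** d)       # 1000 worst-case expansions with the beam
print(degree ** d)  # 125000000 without it - the hub blow-up
```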
Kevin's Preference (04:21): Mostly structural/inferential, lightly semantic, modulated by time and provenance. Revised Layer 1: STRUCTURAL BEAM - score candidates by (a) inference-path relevance (does this link participate in active inference chains?), (b) node degree centrality (prefer bridges over hubs), (c) link-type weight (inheritance > similarity > has-property). Light semantic: embedding similarity as tiebreaker only, not primary filter. Layer 4 TIME: Decay weight by recency - stale links deprioritized via exponential decay on last-access timestamp. Layer 5 PROVENANCE: Trust-weighted - links from verified sources score higher than inferred or crowd-sourced ones. Final scoring: w1*structural + w2*inferential + w3*semantic + w4*time_decay + w5*provenance_trust, with w1,w2 >> w3.
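A hedged Python sketch of the revised weighted sum; every field name on `link`, the `tau` decay constant, and the concrete weights are illustrative assumptions, with w1, w2 >> w3 as specified:

```python
import math

def make_link_scorer(weights, now):
    w_struct, w_infer, w_sem, w_time, w_prov = weights  # w1..w5
    def score(link):
        # Layer 4 TIME: exponential decay on last-access timestamp
        decay = math.exp(-(now - link["last_access"]) / link["tau"])
        return (w_struct * link["structural"]     # bridges over hubs
                + w_infer * link["inferential"]   # active inference chains
                + w_sem * link["semantic"]        # tiebreaker only: w3 small
                + w_time * decay
                + w_prov * link["trust"])         # Layer 5 PROVENANCE
    return score

score = make_link_scorer((0.35, 0.35, 0.05, 0.15, 0.10), now=1000.0)
fresh = {"structural": 1.0, "inferential": 1.0, "semantic": 0.2,
         "trust": 0.9, "last_access": 1000.0, "tau": 86400.0}
print(round(score(fresh), 3))  # 0.35 + 0.35 + 0.01 + 0.15 + 0.09 = 0.95
```

Because the scorer is closed over its weights, swapping semantic-heavy vs. structural-heavy policies is a one-line change at construction time.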