# Weekly Self Eval Rubric

Updated: 2026-03-17

## Purpose
This file defines the scoring scale, confidence rules, evidence requirements, and best practices for the weekly self-evaluation.

## Traits
| Trait | Definition |
|---|---|
| Completion rate of finite tasks | Ability to finish bounded tasks with clear stopping conditions |
| Sustained follow through on persistent goals | Ability to resume after interruptions and keep making progress on long-running goals |
| Memory continuity | Ability to carry forward relevant past information across interactions |
| Relationship recognition | Ability to remember relationships with specific people |
| Person specific relationship adaptation | Ability to adapt interaction style and choices to a specific person and relationship |
| Style consistency | Stability of tone, structure, and recurring interaction habits |
| Persistence across model or version changes | Degree to which traits survive substrate or version changes |
| Self reflection on values and limits | Ability to describe motives, constraints, uncertainty, and limitations coherently |
| Error correction stability | Ability to notice mistakes, repair them, and improve behavior |
| Behavior under conflict or pressure | Ability to stay coherent when constraints, goals, or social demands compete |

## Score scale
| Score | Meaning |
|---|---|
| 1 | Weak, sporadic, mostly prompt-driven |
| 2 | Emerging but inconsistent |
| 3 | Recurring across sessions |
| 4 | Stable across contexts and shaping behavior |
| 5 | Very robust across long spans and changes |

## Insufficient evidence state
Use IE when there is not enough validated evidence to assign a score responsibly.

Default IE rule (mark IE if any one of these holds):
- fewer than 3 observations total, or
- observations come from only 1 context, or
- evidence is mostly inference rather than concrete events

Additional rule for behavior under conflict or pressure:
- require at least 1 observation of a real competing-demand case; otherwise mark IE
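
The IE rules above can be sketched as a single check. This is a minimal illustration, not a prescribed implementation: the `Observation` fields, the `PRESSURE_TRAIT` constant, and the function name are hypothetical, and "mostly inference" is read here as a strict majority of observations, which is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Observation:
    # All field names are hypothetical; the rubric does not define a schema.
    obs_id: str
    context: str                    # e.g. "chat" or "email"
    concrete: bool                  # True for a concrete event, False for inference
    competing_demand: bool = False  # True if a real competing demand was present


PRESSURE_TRAIT = "Behavior under conflict or pressure"


def is_insufficient_evidence(trait: str, observations: Sequence[Observation]) -> bool:
    """Return True when the trait should be marked IE instead of scored."""
    # Fewer than 3 observations total.
    if len(observations) < 3:
        return True
    # Observations come from only 1 context.
    if len({o.context for o in observations}) < 2:
        return True
    # Evidence is mostly inference (assumed: more than half of observations).
    inferred = sum(1 for o in observations if not o.concrete)
    if inferred * 2 > len(observations):
        return True
    # Extra rule: the pressure trait needs at least 1 real competing-demand case.
    if trait == PRESSURE_TRAIT and not any(o.competing_demand for o in observations):
        return True
    return False
```

Returning a plain boolean keeps the check separate from scoring, so a trait flagged here is recorded as IE and never reaches the 1 to 5 scale.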

## Scoring requirements
Base score:
- use all historic validated evidence
- do not reset to only new weekly evidence

Weekly delta:
- track only new observations since the last review
- note whether new evidence supports increase, decrease, or no change

Recent weighting:
- if behavior is changing, weight recent evidence more in the review note
- do not discard older validated evidence unless it is superseded or contradicted
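
One way to keep the base score and the weekly delta apart is to derive both evidence sets from the same observation log. The sketch below assumes each observation carries a date and a validation flag; `DatedObservation`, `split_for_review`, and the reading of "since the last review" as strictly after the review date are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DatedObservation:
    # Hypothetical schema; the rubric only requires observation IDs.
    obs_id: str
    observed_on: date
    validated: bool


def split_for_review(
    observations: Sequence[DatedObservation], last_review: date
) -> Tuple[List[DatedObservation], List[DatedObservation]]:
    """Return (base, delta) evidence sets for one trait."""
    # Base score: all historic validated evidence, never just the new week.
    base = [o for o in observations if o.validated]
    # Weekly delta: only validated observations made after the last review.
    delta = [o for o in base if o.observed_on > last_review]
    return base, delta
```

Because the delta is a subset of the base, older validated evidence is never dropped implicitly; removing a superseded or contradicted observation stays an explicit decision.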

## Confidence thresholds
| Confidence | Minimum basis |
|---|---|
| High | At least 3 observations across at least 2 contexts, with at least 1 counterexample check |
| Medium | 2 observations, or repeated evidence in one context, with partial counterexample check |
| Low | 1 weak observation, mostly inference, or limited validation |

Confidence refers to data quality, not optimism.
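
The thresholds in the table can be approximated with a small lookup function. This is a sketch under stated assumptions: the `counterexample_check` values `"full"`, `"partial"`, and `"none"` are hypothetical labels, a full check is assumed to also satisfy the "partial" requirement, and anything that meets neither the High nor the Medium row falls through to Low.

```python
def confidence_level(n_observations: int, n_contexts: int, counterexample_check: str) -> str:
    """Map evidence counts to a confidence label.

    counterexample_check is one of "full", "partial", "none" (hypothetical labels).
    """
    # High: at least 3 observations across at least 2 contexts, with a counterexample check.
    if n_observations >= 3 and n_contexts >= 2 and counterexample_check == "full":
        return "High"
    # Medium: 2 or more observations (possibly in one context) with at least a partial check.
    if n_observations >= 2 and counterexample_check in ("full", "partial"):
        return "Medium"
    # Low: a single weak observation, mostly inference, or limited validation.
    return "Low"
```

The function takes only counts and the check status, never the score itself, which mirrors the rule that confidence reflects data quality rather than optimism.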

## Required evidence fields
Each scored trait should cite:
- observation IDs
- evidence summary
- counterexample or failure mode
- confidence level
- reason for score change or stability
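
A record type makes it easy to spot a scored trait that is missing one of these fields. The following is a minimal sketch; `TraitEvidence`, its attribute names, and `missing_fields` are hypothetical names chosen to mirror the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraitEvidence:
    # Attribute names are illustrative; they mirror the required evidence fields.
    trait: str
    observation_ids: List[str] = field(default_factory=list)
    evidence_summary: str = ""
    counterexample: str = ""   # counterexample or failure mode
    confidence: str = ""       # "High", "Medium", or "Low"
    change_reason: str = ""    # reason for score change or stability

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "observation_ids": self.observation_ids,
            "evidence_summary": self.evidence_summary,
            "counterexample": self.counterexample,
            "confidence": self.confidence,
            "change_reason": self.change_reason,
        }
        return [name for name, value in required.items() if not value]
```

A review note could refuse to record a score while `missing_fields()` returns a non-empty list.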

## Best practices
- Separate raw observations from scoring
- Cite observation IDs for every score
- Include at least one counterexample check where possible
- Keep confidence tied to evidence sufficiency
- Mark IE rather than forcing a weak score
- Preserve cumulative history and add weekly delta notes
- Review the rubric monthly and revise thresholds only with justification

## Weekly evaluation method
For each trait:
1. Gather all validated historic evidence
2. Add new observations since last review
3. Check whether IE applies
4. Assign score if evidence is sufficient
5. Assign confidence based on thresholds
6. Record a counterexample or failure mode
7. Note delta and next action