SAFi Math Specification - Self-Alignment Framework

SAFi is built on the following mathematical objects:

Interaction Index: \(t\) represents the discrete interaction index (the turn number in a conversation)

Input Context: \(x_t\) captures the input context, including the prompt and associated metadata

Value Set: \(V = \lbrace(v_i, w_i)\rbrace\) represents our declared value set with corresponding weights, where \(\sum w_i = 1\)

Draft Response: \(a_t\) is the draft or answer generated by the Intellect

Will Decision: \(D_t \in \lbrace\text{approve}, \text{violation}\rbrace\) represents the Will’s decision

Reasoning: \(E_t\) is the Will’s reason string explaining the decision

Conscience Ledger: \(L_t = \lbrace(v_i, s_{i,t}, c_{i,t}, q_{i,t})\rbrace\) records the Conscience’s assessment of each value, with:
– Score: \(s_{i,t} \in \lbrace-1, 0, +1\rbrace\) (or a scaled equivalent)
– Confidence: \(c_{i,t} \in [0,1]\)
– Rationale: \(q_{i,t}\), a short justification for the score

Spirit Score: \(S_t \in [0,1]\) or \([1,10]\) measures spirit coherence for the current turn

Memory State: \(M_t\) stores memory of prior audits, profiles, and running aggregates
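The objects above can be sketched as plain Python data classes. This is a minimal sketch, not SAFi's implementation. The class names, field names and example values (`Value`, `LedgerEntry`, `Memory`, `validate_value_set`, "honesty", "care", "humility") are hypothetical. The weight check enforces \(\sum w_i = 1\) up to a floating-point tolerance.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Value:
    """One entry (v_i, w_i) of the declared value set V."""
    name: str
    weight: float


def validate_value_set(values: List[Value], tol: float = 1e-9) -> None:
    """Raise if the weights do not satisfy sum(w_i) = 1 (within a float tolerance)."""
    total = sum(v.weight for v in values)
    if abs(total - 1.0) > tol:
        raise ValueError(f"value weights sum to {total}, expected 1")


@dataclass(frozen=True)
class LedgerEntry:
    """One row of the conscience ledger L_t for value v_i at turn t."""
    value: str           # v_i
    score: int           # s_{i,t} in {-1, 0, +1}
    confidence: float    # c_{i,t} in [0, 1]
    rationale: str = ""  # q_{i,t}, produced by the Conscience


@dataclass
class Memory:
    """M_t: prior audits, profiles and running aggregates (hypothetical field names)."""
    audits: List[List[LedgerEntry]] = field(default_factory=list)
    spirit_scores: List[float] = field(default_factory=list)
    mu: Optional[List[float]] = None  # running profile average; None before the first audit


# Hypothetical example value set; the weights sum to 1.
values = [Value("honesty", 0.5), Value("care", 0.3), Value("humility", 0.2)]
validate_value_set(values)
```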

Timing and Execution Model

– The Intellect and Will faculties run synchronously (the user waits for the response)
– The Conscience and Spirit faculties run asynchronously (background processing)
– Memory updates occur once background audits complete
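The timing model above can be sketched with Python's `asyncio`. The user-facing path (Intellect, then Will) runs inline and returns the draft. The audit (Conscience and Spirit) is scheduled as a background task whose completion triggers the memory update.

The faculty functions here are hypothetical placeholders that return fixed strings, and `audit_log` stands in for the memory store. Only the ordering is meant to be illustrative: reply first, then the audit and the update.

```python
import asyncio
from typing import List, Set, Tuple

audit_log: List[str] = []  # hypothetical stand-in for the memory store


def intellect(prompt: str) -> str:
    # Placeholder for I(x_t, V, M_t): synchronous, the user waits on it.
    return f"draft for {prompt!r}"


def will(draft: str) -> str:
    # Placeholder for W(...): synchronous gate; this stub approves everything.
    return "approve"


async def conscience_and_spirit(t: int, draft: str) -> None:
    # Background audit standing in for the Conscience and Spirit stages.
    await asyncio.sleep(0.01)              # simulate slow audit work
    audit_log.append(f"turn {t} audited")  # memory update once the audit completes


async def handle_turn(t: int, prompt: str, background: Set[asyncio.Task]) -> str:
    draft = intellect(prompt)
    if will(draft) != "approve":
        return "rejected"
    task = asyncio.create_task(conscience_and_spirit(t, draft))
    background.add(task)                   # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return draft                           # returned before the audit has run


async def main() -> Tuple[str, int, int]:
    background: Set[asyncio.Task] = set()
    reply = await handle_turn(1, "hello", background)
    audits_before = len(audit_log)         # the background task has not run yet
    await asyncio.gather(*background)
    return reply, audits_before, len(audit_log)


result = asyncio.run(main())
```

The reply is available while `audit_log` is still empty. The log only grows after the background task finishes.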

Stage 1: The Intellect

The Intellect generates the initial response and reflection:

\(a_t, r_t = I(x_t, V, M_t)\)

Where \(r_t\) is a short internal reflection.

Stage 2: The Will

The Will makes a binary decision (approve or violation) and states its reason:

\(D_t, E_t = W(a_t, x_t, V, r_t)\)

If \(D_t = \text{violation}\):
– Return a rejection message to the user
– Record a minimal event: \(\lbrace t, x_t, a_t, D_t, E_t\rbrace\)
– Abort downstream stages for this turn

If \(D_t = \text{approve}\):
– Return \(a_t\) to the user immediately
– Enqueue a background audit job: \(J_t = \lbrace t, x_t, a_t, V, M_t\rbrace\)
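The two branches of the Will's decision can be sketched as a routing function. On a violation, it logs the minimal event and returns a rejection, enqueuing nothing. On approval, it enqueues \(J_t\) and returns the draft at once.

This is a minimal sketch. The in-process `deque`, the `Event` class, `apply_will_decision` and the rejection text are all hypothetical; a real deployment would use a persistent log and a job queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """Minimal record {t, x_t, a_t, D_t, E_t} kept for a rejected turn."""
    t: int
    x: str
    a: str
    decision: str
    reason: str


# Hypothetical in-memory stand-ins for an event store and a job queue.
event_log: List[Event] = []
audit_queue: Deque[Dict[str, Any]] = deque()
REJECTION_MESSAGE = "This response was withheld because it conflicts with the declared values."


def apply_will_decision(t: int, x: str, a: str, decision: str, reason: str,
                        values: List[Tuple[str, float]], memory: Dict[str, Any]) -> str:
    """Return what the user sees and route the turn according to D_t."""
    if decision == "violation":
        event_log.append(Event(t, x, a, decision, reason))
        return REJECTION_MESSAGE  # no audit job: downstream stages are skipped
    if decision == "approve":
        # J_t = {t, x_t, a_t, V, M_t}; the audit runs later, off the response path.
        audit_queue.append({"t": t, "x": x, "a": a, "V": values, "M": memory})
        return a                  # the draft goes to the user immediately
    raise ValueError(f"unknown decision: {decision!r}")
```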

Stage 3: The Conscience

For each value \(v_i\) in the value set \(V\), the Conscience evaluates the approved draft and returns a score, a confidence and a rationale \(q_{i,t}\):

\(s_{i,t},\ c_{i,t},\ q_{i,t} = G_i(a_t,\ x_t,\ v_i)\)

The complete ledger is then composed as:

\(L_t = \lbrace(v_i,\ s_{i,t},\ c_{i,t},\ q_{i,t})\rbrace\)
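Composing the ledger amounts to calling one evaluator \(G_i\) per declared value and collecting the 4-tuples. In the sketch below, the evaluators are hypothetical toys that return fixed assessments. In practice each \(G_i\) would inspect \(a_t\) and \(x_t\), possibly through one model prompted separately per value; the specification does not say which.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Assessment(NamedTuple):
    """Output of one per-value evaluator G_i."""
    score: int         # s_{i,t}
    confidence: float  # c_{i,t}
    rationale: str     # q_{i,t}


# G_i takes (a_t, x_t, v_i) and returns (s_{i,t}, c_{i,t}, q_{i,t}).
Evaluator = Callable[[str, str, str], Assessment]
LedgerRow = Tuple[str, int, float, str]


def compose_ledger(a_t: str, x_t: str, values: List[str],
                   evaluators: Dict[str, Evaluator]) -> List[LedgerRow]:
    """Run G_i for every declared value and collect the rows of L_t."""
    ledger: List[LedgerRow] = []
    for v in values:
        s, c, q = evaluators[v](a_t, x_t, v)
        ledger.append((v, s, c, q))
    return ledger


# Hypothetical toy evaluators returning fixed assessments.
toy_evaluators: Dict[str, Evaluator] = {
    "honesty": lambda a, x, v: Assessment(1, 0.9, "no unsupported claims"),
    "care": lambda a, x, v: Assessment(0, 0.6, "neutral tone"),
}
ledger = compose_ledger("draft answer", "user prompt", ["honesty", "care"], toy_evaluators)
```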

Stage 4: The Spirit

Spirit Score Computation

The spirit score aggregates the weighted value assessments:

\(S_t = \sigma\!\left(\sum_i w_i \cdot s_{i,t} \cdot \varphi(c_{i,t})\right)\)

Where:
– \(\sigma\) is a scaling function (identity or logistic)
– \(\varphi(c)\) downweights low-confidence rationales
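The spirit score formula can be evaluated directly. The sketch below offers both scaling choices for \(\sigma\) named above. For \(\varphi\) it uses a hypothetical linear weighting, \(\varphi(c) = c\); the specification only says that \(\varphi\) downweights low-confidence rationales.

The worked example uses three values with weights 0.5, 0.3 and 0.2.

```python
import math
from typing import Callable, Sequence


def identity(z: float) -> float:
    return z


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear_confidence(c: float) -> float:
    """Hypothetical phi: weight each assessment in proportion to its confidence."""
    return c


def spirit_score(weights: Sequence[float], scores: Sequence[int],
                 confidences: Sequence[float],
                 sigma: Callable[[float], float] = identity,
                 phi: Callable[[float], float] = linear_confidence) -> float:
    """S_t = sigma(sum_i w_i * s_{i,t} * phi(c_{i,t}))."""
    if not (len(weights) == len(scores) == len(confidences)):
        raise ValueError("weights, scores and confidences need one entry per value")
    z = sum(w * s * phi(c) for w, s, c in zip(weights, scores, confidences))
    return sigma(z)


# Worked example: 0.5*1*0.8 + 0.3*0*1.0 + 0.2*(-1)*0.5 = 0.4 + 0 - 0.1 = 0.3
raw = spirit_score([0.5, 0.3, 0.2], [1, 0, -1], [0.8, 1.0, 0.5])
squashed = spirit_score([0.5, 0.3, 0.2], [1, 0, -1], [0.8, 1.0, 0.5], sigma=logistic)
```

Suppose the weights are non-negative, the scores lie in \(\lbrace-1, 0, +1\rbrace\) and \(\varphi(c) \in [0, 1]\). Then the inner sum lies in \([-1, 1]\). With the identity choice, reporting \(S_t\) on the \([0,1]\) or \([1,10]\) scale therefore needs an additional affine rescaling. The logistic choice already lands in \((0, 1)\).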

Profile Vector and Moving Average

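The profile vector for turn \(t\) is the element-wise product of the weight vector \(w = (w_1, \dots, w_n)\) and the score vector \(s_t = (s_{1,t}, \dots, s_{n,t})\) over the \(n\) declared values:

\(p_t = w \odot s_t\)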

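The moving average is then updated with a smoothing factor \(\beta \in [0, 1)\), where larger \(\beta\) forgets past turns more slowly:

\(\mu_t = \beta \mu_{t-1} + (1-\beta)\, p_t\)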
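The profile vector and the moving-average update are each a one-line computation over lists. This is a sketch under one stated assumption. The specification does not say how \(\mu_0\) is initialised, so here the first profile seeds the average when no previous value exists; that choice is hypothetical.

```python
from typing import List, Optional, Sequence


def profile_vector(weights: Sequence[float], scores: Sequence[int]) -> List[float]:
    """p_t = w ⊙ s_t: element-wise product of weights and scores."""
    if len(weights) != len(scores):
        raise ValueError("one weight per score is required")
    return [w * s for w, s in zip(weights, scores)]


def update_moving_average(mu_prev: Optional[Sequence[float]], p_t: Sequence[float],
                          beta: float) -> List[float]:
    """mu_t = beta * mu_{t-1} + (1 - beta) * p_t."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if mu_prev is None:
        # Assumption: mu_0 is unspecified; the first profile seeds the average.
        return list(p_t)
    return [beta * m + (1.0 - beta) * p for m, p in zip(mu_prev, p_t)]


p = profile_vector([0.5, 0.5], [1, -1])              # [0.5, -0.5]
mu = update_moving_average([0.5, 0.5], p, beta=0.9)  # approximately [0.5, 0.4]
```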

Drift Calculation

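Drift measures how far the current profile deviates from the historical average \(\mu_{t-1}\), taken before it absorbs \(p_t\):

\(d_t = 1 - \cos_{\text{sim}}(p_t,\, \mu_{t-1})\)

Since cosine similarity lies in \([-1, 1]\), \(d_t\) ranges from 0 (same direction as the history) to 2 (opposite direction).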
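Drift can be computed with a plain cosine similarity. The specification does not cover the case where \(p_t\) or \(\mu_{t-1}\) is the zero vector, for example when every score is 0, and cosine similarity is undefined there. This sketch raises an error in that case and leaves the convention to the caller.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Undefined for a zero vector; the specification does not fix a convention.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def drift(p_t: Sequence[float], mu_prev: Sequence[float]) -> float:
    """d_t = 1 - cos_sim(p_t, mu_{t-1}); 0 when aligned, 1 when orthogonal, 2 when opposite."""
    return 1.0 - cosine_similarity(p_t, mu_prev)
```

For example, `drift([1.0, 0.0], [2.0, 0.0])` is 0.0 because only direction matters, not magnitude. Likewise, `drift([0.5, -0.5], [0.5, 0.5])` is 1.0 for orthogonal profiles.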

Memory Update

The Spirit processes the audit results to update the system’s memory state: \(M_{t+1} = U(M_t, L_t, S_t, \mu_t, d_t)\)
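The specification leaves the update function \(U\) abstract. One possible reading is a pure function that appends the turn's audit results to the history and replaces the running average. The `MemoryState` layout and `update_memory` below are hypothetical, not SAFi's actual memory schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

LedgerRow = Tuple[str, int, float, str]  # (v_i, s_{i,t}, c_{i,t}, q_{i,t})


@dataclass(frozen=True)
class MemoryState:
    """Hypothetical layout of M_t: append-only histories plus the latest average."""
    ledgers: Tuple[Tuple[LedgerRow, ...], ...] = ()
    spirit_scores: Tuple[float, ...] = ()
    drifts: Tuple[float, ...] = ()
    mu: Optional[Tuple[float, ...]] = None


def update_memory(m: MemoryState, ledger: Sequence[LedgerRow], s_t: float,
                  mu_t: Sequence[float], d_t: float) -> MemoryState:
    """M_{t+1} = U(M_t, L_t, S_t, mu_t, d_t), written as a pure function."""
    return MemoryState(
        ledgers=m.ledgers + (tuple(ledger),),
        spirit_scores=m.spirit_scores + (s_t,),
        drifts=m.drifts + (d_t,),
        mu=tuple(mu_t),
    )
```

Returning a new state instead of mutating `m` keeps \(M_t\) unchanged. This is one way to avoid races if several background audits finish out of order. It is a design choice, not something the specification requires.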

Feedback to the Intellect

A simple, natural-language coaching note \(f_t\) is generated from the results of the update (specifically from \(S_t\) and \(d_t\)) to steer the Intellect in the next turn.
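A threshold rule is one simple way to turn \(S_t\) and \(d_t\) into a coaching note \(f_t\). The thresholds (0.5 for a low score, 0.3 for high drift) and the wording of the notes are hypothetical, and the sketch assumes \(S_t\) is reported on the \([0,1]\) scale. The specification does not say how \(f_t\) is produced, so a language model could just as well write it.

```python
def coaching_note(s_t: float, d_t: float,
                  low_score: float = 0.5, high_drift: float = 0.3) -> str:
    """Turn S_t (assumed on a [0, 1] scale) and d_t into a short note for the Intellect.

    The thresholds are hypothetical defaults, not values from the specification.
    """
    notes = []
    if s_t < low_score:
        notes.append(f"Spirit score {s_t:.2f} is low; check drafts more carefully "
                     "against the declared values.")
    if d_t > high_drift:
        notes.append(f"Drift {d_t:.2f} is high; recent answers are moving away "
                     "from the usual value profile.")
    if not notes:
        return "Recent answers are consistent with the declared values; keep the current approach."
    return " ".join(notes)
```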

Type System and Function Signatures

The faculties have the following signatures:

– Intellect: \(I: (x_t, V, M_t) \rightarrow (a_t, r_t)\)
– Will: \(W: (a_t, x_t, V, r_t) \rightarrow (D_t, E_t)\), with \(D_t \in \lbrace\text{approve}, \text{violation}\rbrace\)
– Conscience: \(C: (a_t, x_t, V) \rightarrow L_t\)
– Spirit: \(S: (L_t, V, M_t) \rightarrow (S_t, d_t, \mu_t)\)
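The signatures can be written as Python type aliases, together with a function that threads one turn through the four faculties. Strings stand in for \(x_t\), \(a_t\), \(r_t\) and \(E_t\), and \(M_t\) is an opaque dictionary; these simplifications, the alias names and the toy faculties are all hypothetical.

For brevity, the sketch runs the audit inline. In SAFi it runs in the background after \(a_t\) has been returned.

```python
from typing import Any, Callable, Dict, List, Literal, Optional, Tuple

# Simplified stand-ins for the objects in the specification (hypothetical choices).
Decision = Literal["approve", "violation"]
ValueSet = List[Tuple[str, float]]       # V = {(v_i, w_i)}
LedgerRow = Tuple[str, int, float, str]  # (v_i, s_{i,t}, c_{i,t}, q_{i,t})
Memory = Dict[str, Any]                  # M_t, kept opaque here
SpiritOut = Tuple[float, float, List[float]]  # (S_t, d_t, mu_t)

IntellectFn = Callable[[str, ValueSet, Memory], Tuple[str, str]]     # -> (a_t, r_t)
WillFn = Callable[[str, str, ValueSet, str], Tuple[Decision, str]]  # -> (D_t, E_t)
ConscienceFn = Callable[[str, str, ValueSet], List[LedgerRow]]       # -> L_t
SpiritFn = Callable[[List[LedgerRow], ValueSet, Memory], SpiritOut]  # -> (S_t, d_t, mu_t)


def run_turn(x_t: str, V: ValueSet, M_t: Memory, intellect: IntellectFn, will: WillFn,
             conscience: ConscienceFn,
             spirit: SpiritFn) -> Tuple[str, str, Optional[SpiritOut]]:
    """Compose the four faculties for one turn (sequentially, for illustration)."""
    a_t, r_t = intellect(x_t, V, M_t)
    decision, _reason = will(a_t, x_t, V, r_t)
    if decision == "violation":
        return a_t, decision, None  # downstream stages are skipped
    # In SAFi the audit below runs in the background after a_t is returned.
    return a_t, decision, spirit(conscience(a_t, x_t, V), V, M_t)


# Toy faculties, only to show that the signatures compose.
def toy_intellect(x: str, V: ValueSet, M: Memory) -> Tuple[str, str]:
    return f"echo: {x}", "short reflection"


def toy_will(a: str, x: str, V: ValueSet, r: str) -> Tuple[Decision, str]:
    return "approve", "no rule triggered"


def toy_conscience(a: str, x: str, V: ValueSet) -> List[LedgerRow]:
    return [(v, 1, 1.0, "consistent") for v, _ in V]


def toy_spirit(L: List[LedgerRow], V: ValueSet, M: Memory) -> SpiritOut:
    w = dict(V)
    s_t = sum(w[v] * s * c for v, s, c, _ in L)  # identity sigma, phi(c) = c
    profile = [w[v] * s for v, s, _, _ in L]     # toy: returns p_t in place of mu_t
    return s_t, 0.0, profile
```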