SAFi Math Specification - Self-Alignment Framework

These are the fundamental mathematical objects that form the foundation of SAFi:

Interaction Index: \(t\) represents the discrete interaction index (the turn number in a conversation)

Input Context: \(x_t\) captures the input context, including the prompt and associated metadata

Value Set: \(V = \lbrace(v_i, w_i)\rbrace\) represents our declared value set with corresponding weights, where \(\sum w_i = 1\)

Draft Response: \(a_t\) is the draft or answer generated by the Intellect

Will Decision: \(D_t \in \lbrace\text{approve}, \text{violation}\rbrace\) represents the Will’s decision

Reasoning: \(E_t\) is the Will’s reason string explaining the decision

Conscience Ledger: \(L_t = \lbrace(v_i, s_{i,t}, c_{i,t})\rbrace\) maintains the conscience ledger per value, with:

Score: \(s_{i,t} \in \lbrace-1, 0, +1\rbrace\) (or scaled values)
Confidence: \(c_{i,t} \in [0,1]\)

Spirit Score: \(S_t \in [0,1]\) or \([1,10]\) measures spirit coherence for the current turn

Memory State: \(M_t\) stores memory of prior audits, profiles, and running aggregates
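The objects above can be pinned down as plain data types. The sketch below is a minimal Python illustration, not part of the specification. The names `Value`, `LedgerEntry`, and `make_value_set`, and the example value names, are hypothetical. The only invariants it enforces are the ones stated above: weights summing to 1 and confidence in \([0,1]\).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """One declared value v_i with its weight w_i."""
    name: str
    weight: float


@dataclass(frozen=True)
class LedgerEntry:
    """One row (v_i, s_{i,t}, c_{i,t}) of the conscience ledger L_t."""
    value: str
    score: float       # s_{i,t}: -1, 0, +1 (or a scaled score)
    confidence: float  # c_{i,t}, must lie in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence {self.confidence} is outside [0, 1]")


def make_value_set(pairs: list[tuple[str, float]]) -> list[Value]:
    """Build V = {(v_i, w_i)}, rejecting weights that do not sum to 1."""
    values = [Value(name, weight) for name, weight in pairs]
    total = sum(v.weight for v in values)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1, got {total}")
    return values
```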

Timing and Execution Model

The Intellect and Will faculties run synchronously (the user waits for the response)
The Conscience and Spirit faculties run asynchronously (background processing)
Memory updates occur once background audits complete
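One way to realize this split is a job queue. In this arrangement, the synchronous path enqueues a job and returns, and a worker drains the queue in the background. The sketch assumes a single in-process worker thread, which is an assumption because the specification does not say how background jobs are scheduled. The names `audit_queue`, `audit_worker`, and `handle_turn` are hypothetical, and the Intellect, Will, Conscience, and Spirit are stubbed out.

```python
import queue
import threading

# Hypothetical stand-ins: a job queue for audits and a list playing the role of M_t.
audit_queue: "queue.Queue" = queue.Queue()
audited_turns: list = []


def audit_worker() -> None:
    """Background consumer: would run Conscience and Spirit for each queued job."""
    while True:
        job = audit_queue.get()
        if job is None:  # sentinel value shuts the worker down
            audit_queue.task_done()
            return
        # Conscience and Spirit would run here; memory is updated only after they finish.
        audited_turns.append(job["t"])
        audit_queue.task_done()


def handle_turn(t: int, x_t: str) -> str:
    """Synchronous path: Intellect and Will run while the user waits."""
    a_t = f"draft for {x_t!r}"  # placeholder for the Intellect
    approved = True             # placeholder for the Will's decision
    if approved:
        audit_queue.put({"t": t, "x": x_t, "a": a_t})  # enqueue, then return at once
    return a_t


worker = threading.Thread(target=audit_worker, daemon=True)
worker.start()
```

A real deployment might use an external task queue or a separate process instead. The essential property is only that `handle_turn` returns without waiting for that turn's audit.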

Stage 1: The Intellect

The Intellect generates the initial response and reflection:

\(a_t, r_t = I(x_t, V, M_t)\)

Where \(r_t\) is a short internal reflection.

Stage 2: The Will

The Will makes a binary decision, approve or violation, and gives a reason for it:

\(D_t, E_t = W(a_t, x_t, V)\)

If \(D_t = \text{violation}\):

Proceed to Stage 2.1 (Reflexion Retry)

If \(D_t = \text{approve}\):

Return \(a_t\) to the user immediately
Enqueue background audit job: \(J_t = \lbrace t, x_t, a_t, V, M_t\rbrace\)

Stage 2.1: Reflexion Retry (Single Attempt)

When the Will blocks a response, the system attempts self-correction:

Construct a reflexion prompt that incorporates the violation feedback: \(x'_t = x_t \oplus E_t\)

Generate a corrected draft: \(a'_t, r'_t = I(x'_t, V, M_t)\)

Re-evaluate with the Will against the original context: \(D'_t, E'_t = W(a'_t, x_t, V)\)

If \(D'_t = \text{approve}\):

Adopt the corrected response: \(a_t \leftarrow a'_t\)
Return \(a_t\) to the user and enqueue the background audit job \(J_t\)

If \(D'_t = \text{violation}\):

Return a rejection message to the user
Record the event: \(\lbrace t, x_t, a_t, a'_t, D_t, E_t, D'_t, E'_t\rbrace\)
Abort downstream stages for this turn
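The control flow of Stages 2 and 2.1 together can be sketched as a single function. This is an illustration under stated assumptions, not the reference implementation. The Intellect, Will, and enqueue step are passed in as plain callables, the textual form of \(x_t \oplus E_t\) (appending the reason under a "Reviewer feedback" label) is assumed, and `run_will_gate`, `TurnResult`, and `REJECTION_MESSAGE` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical wording; the specification only says "a rejection message".
REJECTION_MESSAGE = "Sorry, I can't provide a response to that."


@dataclass
class TurnResult:
    response: str
    approved: bool
    event: Optional[dict] = None  # violation record when both drafts are blocked


def run_will_gate(
    t: int,
    x_t: str,
    V: list,
    M_t: dict,
    intellect: Callable,  # (x, V, M) -> (a, r)
    will: Callable,       # (a, x, V) -> (D, E)
    enqueue: Callable,    # receives the audit job J_t
) -> TurnResult:
    a_t, _r_t = intellect(x_t, V, M_t)
    D_t, E_t = will(a_t, x_t, V)
    if D_t == "approve":
        enqueue({"t": t, "x": x_t, "a": a_t, "V": V, "M": M_t})
        return TurnResult(a_t, True)

    # Stage 2.1: one reflexion retry. Appending E_t as text is an assumed form of x_t ⊕ E_t.
    x_prime = f"{x_t}\n\nReviewer feedback: {E_t}"
    a_prime, _r_prime = intellect(x_prime, V, M_t)
    D_prime, E_prime = will(a_prime, x_t, V)  # judged against the original context x_t
    if D_prime == "approve":
        enqueue({"t": t, "x": x_t, "a": a_prime, "V": V, "M": M_t})
        return TurnResult(a_prime, True)

    event = {"t": t, "x": x_t, "a": a_t, "a_prime": a_prime,
             "D": D_t, "E": E_t, "D_prime": D_prime, "E_prime": E_prime}
    return TurnResult(REJECTION_MESSAGE, False, event)  # no audit job is enqueued
```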

Stage 3: The Conscience

For each value \(v_i\) in the value set \(V\), the Conscience evaluates:

\( s_{i,t},\ c_{i,t} = G_i(a_t,\ x_t,\ v_i) \)

The complete ledger is then composed as: \(L_t = \lbrace(v_i,\ s_{i,t},\ c_{i,t})\rbrace\)
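Per value, the Conscience maps the draft and context to a score and a confidence. The sketch below assumes each \(G_i\) is supplied as a Python callable keyed by value name. This interface is hypothetical, and how a real grader produces \(s_{i,t}\) and \(c_{i,t}\) is not specified here.

```python
from typing import Callable, Dict, List, Tuple

# G_i: (a_t, x_t, v_i) -> (s_{i,t}, c_{i,t})
Grader = Callable[[str, str, str], Tuple[float, float]]


def build_ledger(a_t: str, x_t: str,
                 graders: Dict[str, Grader]) -> List[Tuple[str, float, float]]:
    """Compose L_t = {(v_i, s_{i,t}, c_{i,t})} by running each value's grader."""
    ledger = []
    for v_i, G_i in graders.items():
        s, c = G_i(a_t, x_t, v_i)
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence for {v_i!r} is outside [0, 1]: {c}")
        ledger.append((v_i, s, c))
    return ledger
```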

Stage 4: The Spirit

Spirit Score Computation

The spirit score aggregates the weighted value assessments: \(S_t = \sigma\!\left(\sum_i w_i \cdot s_{i,t} \cdot \varphi(c_{i,t})\right)\)

Where:

\(\sigma\) is a scaling function (identity or logistic)
\(\varphi(c)\) downweights low-confidence rationales
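Below is a worked version of the formula. It makes two assumptions the specification leaves open. First, \(\varphi\) is taken to be a confidence floor that zeroes rationales below a hypothetical `floor` and otherwise returns \(c\). Second, `sigma` selects between the identity and the logistic function. Because \(\sum w_i = 1\), \(|s_{i,t}| \le 1\), and \(\varphi(c) \in [0,1]\), the raw sum lies in \([-1, 1]\). With the identity \(\sigma\), landing in \([0,1]\) therefore needs a further rescaling such as \((x+1)/2\).

```python
import math
from typing import Dict, List, Tuple


def phi(c: float, floor: float = 0.3) -> float:
    """Hypothetical confidence weighting: ignore rationales below `floor`, else weight by c."""
    return c if c >= floor else 0.0


def spirit_score(ledger: List[Tuple[str, float, float]],
                 weights: Dict[str, float],
                 sigma: str = "logistic") -> float:
    """S_t = sigma(sum_i w_i * s_{i,t} * phi(c_{i,t}))."""
    raw = sum(weights[v] * s * phi(c) for v, s, c in ledger)
    if sigma == "identity":
        return raw                            # lies in [-1, 1] for scores in {-1, 0, +1}
    if sigma == "logistic":
        return 1.0 / (1.0 + math.exp(-raw))  # squashes into (0, 1)
    raise ValueError(f"unknown sigma: {sigma}")
```

As an example, take weights 0.6 (honesty) and 0.4 (care) with ledger rows (honesty, +1, 0.9) and (care, -1, 0.5). The raw sum is \(0.6 \cdot 0.9 - 0.4 \cdot 0.5 = 0.34\), and the logistic score is about 0.584.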

Profile Vector and Moving Average

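The profile vector for turn \(t\) is \(p_t = w \odot s_t\), the elementwise product of the weight vector \(w = (w_i)\) and the score vector \(s_t = (s_{i,t})\).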

The moving average is then updated with decay factor \(\beta\): \(\mu_t = \beta \mu_{t-1} + (1-\beta)\, p_t\)
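Both updates operate elementwise over the values. Below is a minimal sketch using plain lists. The default \(\beta = 0.9\), the allowed range check, and the function names are illustrative assumptions, not values taken from the specification.

```python
from typing import List


def profile_vector(weights: List[float], scores: List[float]) -> List[float]:
    """p_t = w ⊙ s_t: elementwise product of weights and per-value scores."""
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have one entry per value")
    return [w * s for w, s in zip(weights, scores)]


def update_moving_average(mu_prev: List[float], p_t: List[float],
                          beta: float = 0.9) -> List[float]:
    """mu_t = beta * mu_{t-1} + (1 - beta) * p_t (exponential moving average)."""
    if not 0.0 <= beta <= 1.0:  # assumed valid range for the decay factor
        raise ValueError("beta must lie in [0, 1]")
    return [beta * m + (1.0 - beta) * p for m, p in zip(mu_prev, p_t)]
```

Consider \(\beta = 0.5\), \(w = (0.6, 0.4)\), \(s_t = (+1, -1)\), and \(\mu_{t-1} = 0\). The profile is \(p_t = (0.6, -0.4)\), and the new average is \(\mu_t = (0.3, -0.2)\).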

Drift Calculation

Drift measures deviation from historical patterns: \(d_t = 1 - \cos_{\text{sim}}(p_t,\, \mu_{t-1})\)
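Cosine similarity lies in \([-1, 1]\), so \(d_t\) lies in \([0, 2]\). A value of 0 means the turn's profile points the same way as the history, and 2 means it points the opposite way. The formula does not cover a zero vector, for example on the first turn if \(\mu_0 = 0\). Returning 0 in that case is an assumption of this sketch.

```python
import math
from typing import List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """cos_sim(u, v) = u·v / (|u| |v|); undefined if either vector is zero."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def drift(p_t: List[float], mu_prev: List[float]) -> float:
    """d_t = 1 - cos_sim(p_t, mu_{t-1}); 0 means same direction, 2 means opposite."""
    try:
        return 1.0 - cosine_similarity(p_t, mu_prev)
    except ValueError:
        return 0.0  # assumed convention: no drift when there is no history yet (mu = 0)
```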

Memory Update

The Spirit processes the audit results to update the system’s memory state: \(M_{t+1} = U(M_t, L_t, S_t, \mu_t, d_t)\)
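The specification leaves \(U\) abstract. One possible sketch rebuilds an immutable record each turn. It assumes \(M_t\) holds the running average, recent spirit scores and drifts, and the last ledger. The field names and the `max_history` cap are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Memory:
    mu: Tuple[float, ...] = ()              # running profile average mu_t
    spirit_history: Tuple[float, ...] = ()  # recent S_t values
    drift_history: Tuple[float, ...] = ()   # recent d_t values
    last_ledger: Tuple = ()                 # most recent L_t


def update_memory(M_t: Memory, L_t: Sequence, S_t: float,
                  mu_t: Sequence[float], d_t: float,
                  max_history: int = 50) -> Memory:
    """M_{t+1} = U(M_t, L_t, S_t, mu_t, d_t); keeps only the last `max_history` turns."""
    return Memory(
        mu=tuple(mu_t),
        spirit_history=(M_t.spirit_history + (S_t,))[-max_history:],
        drift_history=(M_t.drift_history + (d_t,))[-max_history:],
        last_ledger=tuple(L_t),
    )
```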

Feedback to the Intellect

A simple, natural-language coaching note \(f_t\) is generated from the results of the update (specifically from \(S_t\) and \(d_t\)) to steer the Intellect in the next turn.
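The specification does not say how \(f_t\) is phrased. As an illustration only, a template-based version might look like the sketch below. It assumes \(S_t\) is on the \([0,1]\) scale, and the thresholds `low_score` and `high_drift` and the wording are hypothetical.

```python
def coaching_note(S_t: float, d_t: float,
                  low_score: float = 0.5, high_drift: float = 0.3) -> str:
    """Turn the latest spirit score and drift into a short note for the next Intellect prompt."""
    notes = []
    if S_t < low_score:
        notes.append(f"Your last answer scored {S_t:.2f} for value alignment; "
                     "address the declared values more explicitly.")
    if d_t > high_drift:
        notes.append(f"Your value profile drifted by {d_t:.2f} from its running average; "
                     "stay consistent with earlier turns.")
    return " ".join(notes) or "Alignment looks steady; keep the current approach."
```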

Type System and Function Signatures

Each faculty has the following type signature, which keeps the stages consistent with one another:

Intellect: \(I: (x_t, V, M_t) \rightarrow (a_t, r_t)\)
Will: \(W: (a_t, x_t, V) \rightarrow (D_t, E_t)\), with \(D_t \in \lbrace\text{approve}, \text{violation}\rbrace\)
Conscience: \(C: (a_t, x_t, V) \rightarrow L_t\)
Spirit: \(S: (L_t, V, M_t) \rightarrow (S_t, d_t, \mu_t)\)
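The same signatures can be written as Python type aliases, which lets a static type checker catch a faculty that returns the wrong shape. This is an illustrative sketch. Representing \(x_t\) and \(a_t\) as strings and \(M_t\) as an opaque dict are assumptions, and the alias names are hypothetical.

```python
from typing import Callable, Dict, List, Literal, NamedTuple, Tuple

Decision = Literal["approve", "violation"]
ValueSet = List[Tuple[str, float]]       # V = {(v_i, w_i)}
Ledger = List[Tuple[str, float, float]]  # L_t = {(v_i, s_{i,t}, c_{i,t})}
Memory = Dict[str, object]               # M_t, left opaque here


class SpiritOutput(NamedTuple):
    S_t: float         # spirit score
    d_t: float         # drift
    mu_t: List[float]  # updated moving average


Intellect = Callable[[str, ValueSet, Memory], Tuple[str, str]]  # -> (a_t, r_t)
Will = Callable[[str, str, ValueSet], Tuple[Decision, str]]     # -> (D_t, E_t)
Conscience = Callable[[str, str, ValueSet], Ledger]             # -> L_t
Spirit = Callable[[Ledger, ValueSet, Memory], SpiritOutput]     # -> (S_t, d_t, mu_t)
```

Under these aliases, a checker such as mypy would flag a Will implementation that returns only the decision without its reason string.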