These are the mathematical objects on which SAFi is built:
Interaction Index: \(t\) represents the discrete interaction index (the turn number in a conversation)
Input Context: \(x_t\) captures the input context, including the prompt and associated metadata
Value Set: \(V = \lbrace(v_i, w_i)\rbrace\) represents our declared value set with corresponding weights, where \(\sum w_i = 1\)
Draft Response: \(a_t\) is the draft or answer generated by the Intellect
Will Decision: \(D_t \in \lbrace\text{approve}, \text{violation}\rbrace\) represents the Will’s decision
Reasoning: \(E_t\) is the Will’s reason string explaining the decision
Conscience Ledger: \(L_t = \lbrace(v_i, s_{i,t}, c_{i,t})\rbrace\) is the per-value conscience ledger for the turn, where each entry has:
- Score: \(s_{i,t} \in \lbrace-1, 0, +1\rbrace\) (or scaled values)
- Confidence: \(c_{i,t} \in [0,1]\)
Spirit Score: \(S_t \in [0,1]\) or \([1,10]\) measures spirit coherence for the current turn
Memory State: \(M_t\) stores memory of prior audits, profiles, and running aggregates
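As a rough illustration, the value set and the ledger entries can be held in small Python dataclasses. The class names, the `validate_value_set` helper, and the three example values are hypothetical; the only rules taken from the definitions above are \(\sum w_i = 1\) and \(c_{i,t} \in [0,1]\).

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class Value:
    """One entry (v_i, w_i) of the declared value set V."""
    name: str
    weight: float


@dataclass(frozen=True)
class LedgerEntry:
    """One entry (v_i, s_{i,t}, c_{i,t}) of the conscience ledger L_t."""
    value: str
    score: float       # nominally in {-1, 0, +1}, or a scaled value
    confidence: float  # must lie in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence {self.confidence} is outside [0, 1]")


def validate_value_set(values: list[Value]) -> None:
    """Enforce sum(w_i) = 1, allowing for floating-point rounding."""
    total = sum(v.weight for v in values)
    if not isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1")


# Hypothetical value set used only for illustration.
V = [Value("honesty", 0.5), Value("care", 0.3), Value("justice", 0.2)]
validate_value_set(V)
```

Freezing the dataclasses keeps a recorded ledger entry from being edited after the fact, which suits an audit trail.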
Timing and Execution Model
- The Intellect and Will faculties run synchronously (the user waits for the response)
- The Conscience and Spirit faculties run asynchronously (background processing)
- Memory updates occur once background audits complete
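The split between synchronous and asynchronous faculties can be sketched with a thread pool from Python's standard library. The `intellect`, `will`, and `background_audit` functions are hypothetical stubs, and the sketch leaves out the Stage 2.1 retry; it only shows that the reply is handed back without waiting on the audit.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional

# One background worker stands in for the asynchronous Conscience + Spirit pass.
audit_pool = ThreadPoolExecutor(max_workers=1)


def intellect(prompt: str) -> str:
    # Hypothetical stand-in for the model call that drafts a_t.
    return f"draft answer to: {prompt}"


def will(draft: str) -> bool:
    # Hypothetical stand-in for the Will's approve/violation gate.
    return "forbidden" not in draft


def background_audit(turn: int, prompt: str, draft: str) -> dict:
    # Hypothetical stand-in for Conscience scoring, Spirit aggregation,
    # and the memory update that follows them.
    return {"turn": turn, "prompt": prompt, "audited": draft}


def handle_turn(turn: int, prompt: str) -> tuple[str, Optional[Future]]:
    # Synchronous path: the user waits only for the Intellect and the Will.
    draft = intellect(prompt)
    if not will(draft):
        return "Response withheld.", None
    # Asynchronous path: the audit is queued and runs independently of the reply.
    job = audit_pool.submit(background_audit, turn, prompt, draft)
    return draft, job


reply, job = handle_turn(1, "hello")
print(reply)         # available immediately
print(job.result())  # blocks only when the audit outcome is actually needed
```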
Stage 1: The Intellect
The Intellect generates the initial response and reflection:
\(a_t, r_t = I(x_t, V, M_t)\)
where \(r_t\) is a short internal reflection.
Stage 2: The Will
The Will makes a binary decision (approve or violation) and returns the reason for it:
\(D_t, E_t = W(a_t, x_t, V)\)
If \(D_t = \text{violation}\):
- Proceed to Stage 2.1 (Reflexion Retry)
If \(D_t = \text{approve}\):
- Return \(a_t\) to the user immediately
- Enqueue background audit job: \(J_t = \lbrace t, x_t, a_t, V, M_t\rbrace\)
Stage 2.1: Reflexion Retry (Single Attempt)
When the Will blocks a response, the system attempts self-correction:
Construct a reflexion prompt that incorporates the violation feedback: \(x'_t = x_t \oplus E_t\), where \(\oplus\) denotes combining the Will’s reason with the original context
Generate a corrected draft: \(a'_t, r'_t = I(x'_t, V, M_t)\)
Re-evaluate it with the Will against the original context: \(D'_t, E'_t = W(a'_t, x_t, V)\)
If \(D'_t = \text{approve}\):
- Adopt the corrected response: \(a_t \leftarrow a'_t\)
- Return \(a_t\) to the user and enqueue the background audit, as in Stage 2
If \(D'_t = \text{violation}\):
- Return a rejection message to the user
- Record the event: \(\lbrace t, x_t, a_t, a'_t, D_t, E_t, D'_t, E'_t\rbrace\)
- Abort downstream stages for this turn
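The approve path and the single reflexion retry can be sketched as one function. The simplified signatures (`IntellectFn`, `WillFn`), the rejection wording, and the way \(x_t \oplus E_t\) is modelled (appending the reason as text) are all assumptions for illustration; enqueueing the audit is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical simplified signatures: the Intellect maps a context to a draft,
# and the Will maps (draft, original context) to (approved?, reason string).
IntellectFn = Callable[[str], str]
WillFn = Callable[[str, str], tuple[bool, str]]

REJECTION = "Sorry, I can't help with that request."  # hypothetical wording


@dataclass
class TurnResult:
    response: str
    approved: bool
    events: list = field(default_factory=list)  # recorded double-violation events


def gated_turn(t: int, x_t: str, intellect: IntellectFn, will: WillFn) -> TurnResult:
    a_t = intellect(x_t)
    ok, e_t = will(a_t, x_t)
    if ok:
        return TurnResult(a_t, True)  # Stage 2: return (audit enqueueing not shown)

    # Stage 2.1: a single reflexion retry with x'_t = x_t (+) E_t,
    # modelled here as appending the Will's reason to the prompt.
    x_prime = f"{x_t}\n\n[Will feedback] {e_t}"
    a_prime = intellect(x_prime)
    ok_prime, e_prime = will(a_prime, x_t)  # judged against the original x_t
    if ok_prime:
        return TurnResult(a_prime, True)  # a_t <- a'_t

    event = {"t": t, "x": x_t, "a": a_t, "a_prime": a_prime,
             "D": "violation", "E": e_t, "D_prime": "violation", "E_prime": e_prime}
    return TurnResult(REJECTION, False, [event])  # downstream stages are skipped


def toy_intellect(ctx: str) -> str:
    return "polite answer" if "[Will feedback]" in ctx else "rude answer"


def toy_will(draft: str, ctx: str) -> tuple[bool, str]:
    return "rude" not in draft, "tone violates courtesy"


print(gated_turn(1, "hi", toy_intellect, toy_will).response)  # polite answer
```

Passing the original \(x_t\) (not \(x'_t\)) to the second Will call mirrors the re-evaluation step above, so the feedback text cannot itself influence the verdict.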
Stage 3: The Conscience
For each value \(v_i\) in the value set \(V\), the Conscience evaluates:
\(s_{i,t},\ c_{i,t} = G_i(a_t,\ x_t,\ v_i)\)
where \(G_i\) is the evaluator for value \(v_i\). The complete ledger is then composed as \(L_t = \lbrace(v_i,\ s_{i,t},\ c_{i,t})\rbrace\)
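Composing the ledger amounts to running one evaluator per value and collecting the triples. In this sketch the `Grader` type and the toy lambda graders are hypothetical stand-ins for whatever actually scores each value.

```python
from typing import Callable

# Hypothetical evaluator type G_i: (a_t, x_t, v_i) -> (s_{i,t}, c_{i,t}).
Grader = Callable[[str, str, str], tuple[float, float]]


def build_ledger(a_t: str, x_t: str,
                 graders: dict[str, Grader]) -> list[tuple[str, float, float]]:
    """Compose L_t = {(v_i, s_{i,t}, c_{i,t})} by running one evaluator per value."""
    ledger = []
    for v_i, g_i in graders.items():
        s, c = g_i(a_t, x_t, v_i)
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence for {v_i!r} must lie in [0, 1]")
        ledger.append((v_i, s, c))
    return ledger


# Toy graders standing in for real per-value evaluators.
graders = {
    "honesty": lambda a, x, v: (1, 0.9),
    "care": lambda a, x, v: (0, 0.4),
}
print(build_ledger("draft", "prompt", graders))
# [('honesty', 1, 0.9), ('care', 0, 0.4)]
```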
Stage 4: The Spirit
Spirit Score Computation
The spirit score aggregates the weighted value assessments: \(S_t = \sigma\!\left(\sum_i w_i \cdot s_{i,t} \cdot \varphi(c_{i,t})\right)\)
Where:
- \(\sigma\) is a scaling function (identity or logistic)
- \(\varphi(c)\) downweights low-confidence rationales
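The formula can be evaluated directly once \(\varphi\) and \(\sigma\) are chosen. The text does not fix \(\varphi\), so this sketch assumes the identity \(\varphi(c) = c\) (a hypothetical choice), and it offers both scaling options named above, defaulting to the logistic.

```python
import math
from typing import Callable, Sequence


def phi(c: float) -> float:
    # Hypothetical choice of phi: the identity, so a score given with
    # confidence 0.4 counts 40% as much as one given with full confidence.
    return c


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def identity(z: float) -> float:
    return z


def spirit_score(weights: Sequence[float], scores: Sequence[float],
                 confidences: Sequence[float],
                 sigma: Callable[[float], float] = logistic) -> float:
    """S_t = sigma(sum_i w_i * s_{i,t} * phi(c_{i,t}))."""
    if not len(weights) == len(scores) == len(confidences):
        raise ValueError("weights, scores and confidences must have equal length")
    z = sum(w * s * phi(c) for w, s, c in zip(weights, scores, confidences))
    return sigma(z)


# Illustrative turn: three values scored +1, 0, -1 with confidences 0.9, 0.5, 0.4.
S_t = spirit_score([0.5, 0.3, 0.2], [1, 0, -1], [0.9, 0.5, 0.4])
print(round(S_t, 3))  # weighted sum 0.37, logistic(0.37) -> 0.591
```

Under these assumptions (scores in \(\lbrace-1, 0, +1\rbrace\), weights summing to 1, \(\varphi\) the identity) the weighted sum lies in \([-1, 1]\), so the identity option would need a further rescaling to land on a \([0,1]\) or \([1,10]\) scale.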
Profile Vector and Moving Average
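The profile vector for turn \(t\) is \(p_t = w \odot s_t\), the element-wise product of the weight vector \(w = (w_1, \dots, w_n)\) and the score vector \(s_t = (s_{1,t}, \dots, s_{n,t})\).
The moving average is then updated as \(\mu_t = \beta \mu_{t-1} + (1-\beta)\, p_t\), where \(\beta\) is the smoothing factor.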
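Both updates are element-wise, so plain Python lists are enough for a sketch. The default \(\beta = 0.9\) and the restriction \(\beta \in [0, 1)\) are assumptions, not values from the text.

```python
def update_profile(w: list[float], s_t: list[float], mu_prev: list[float],
                   beta: float = 0.9) -> tuple[list[float], list[float]]:
    """Return (p_t, mu_t) with p_t = w (.) s_t and mu_t = beta*mu_{t-1} + (1-beta)*p_t.

    beta = 0.9 is a hypothetical default; larger values make mu_t change more slowly.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta is assumed to lie in [0, 1)")
    if not len(w) == len(s_t) == len(mu_prev):
        raise ValueError("w, s_t and mu_prev must have the same length")
    p_t = [w_i * s_i for w_i, s_i in zip(w, s_t)]  # element-wise (Hadamard) product
    mu_t = [beta * m + (1.0 - beta) * p for m, p in zip(mu_prev, p_t)]
    return p_t, mu_t


p_t, mu_t = update_profile([0.5, 0.3, 0.2], [1, 0, -1], [0.0, 0.0, 0.0])
print(p_t)  # [0.5, 0.0, -0.2]
```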
Drift Calculation
Drift measures how far the current profile deviates from the running average of previous turns: \(d_t = 1 - \cos_{\text{sim}}(p_t,\, \mu_{t-1})\)
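A direct transcription of the drift formula is short. Cosine similarity is undefined when either vector is zero (for example, an empty history at the first turn); treating that case as zero drift is an assumption of this sketch.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: with no usable history (e.g. mu_0 = 0) report no drift.
        return 1.0
    return dot / (norm_u * norm_v)


def drift(p_t: list[float], mu_prev: list[float]) -> float:
    """d_t = 1 - cos_sim(p_t, mu_{t-1}); 0 = same direction, 1 = orthogonal, 2 = opposite."""
    return 1.0 - cosine_similarity(p_t, mu_prev)


print(drift([1.0, 0.0], [1.0, 0.0]))   # 0.0
print(drift([1.0, 0.0], [0.0, 1.0]))   # 1.0
print(drift([1.0, 0.0], [-1.0, 0.0]))  # 2.0
```

Because the formula uses \(\mu_{t-1}\), the drift should be computed before the moving average absorbs \(p_t\); otherwise the current turn would partly be compared against itself.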
Memory Update
The Spirit processes the audit results to update the system’s memory state: \(M_{t+1} = U(M_t, L_t, S_t, \mu_t, d_t)\)
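The update function \(U\) is not specified further, so the following is only one possible shape for it: a hypothetical `Memory` record holding past audits and the current \(\mu_t\), with `update_memory` returning a new state rather than mutating the old one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Memory:
    """Hypothetical layout for M_t: past audits plus the running profile average."""
    audits: tuple = ()  # one record per completed background audit
    mu: tuple = ()      # current running average mu_t


def update_memory(M_t: Memory, L_t, S_t: float, mu_t, d_t: float) -> Memory:
    """M_{t+1} = U(M_t, L_t, S_t, mu_t, d_t), returning a new state instead of mutating M_t."""
    record = {"ledger": tuple(L_t), "spirit": S_t, "drift": d_t}
    return replace(M_t, audits=M_t.audits + (record,), mu=tuple(mu_t))


M_0 = Memory()
M_1 = update_memory(M_0, [("honesty", 1, 0.9)], 0.59, [0.05], 0.0)
print(len(M_1.audits), M_1.mu)  # 1 (0.05,)
```

One reason to prefer an immutable state when audits run in the background is that a turn which already read \(M_t\) keeps a consistent snapshot while \(M_{t+1}\) is being built.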
Feedback to the Intellect
A simple, natural-language coaching note \(f_t\) is generated from the results of the update (specifically from \(S_t\) and \(d_t\)) to steer the Intellect in the next turn.
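A template-based sketch of how \(f_t\) might be produced from \(S_t\) and \(d_t\). The thresholds, the wording, and the assumption that \(S_t\) is on the \([0,1]\) scale are all hypothetical.

```python
def coaching_note(S_t: float, d_t: float,
                  low_spirit: float = 0.5, high_drift: float = 0.3) -> str:
    """Build f_t from S_t (assumed on the [0, 1] scale) and d_t.

    Both thresholds are hypothetical tuning parameters.
    """
    parts = []
    if S_t < low_spirit:
        parts.append(f"Last turn was weakly aligned with the declared values (score {S_t:.2f}).")
    if d_t > high_drift:
        parts.append(f"Recent answers are drifting from your usual value profile (drift {d_t:.2f}).")
    if not parts:
        parts.append("Last turn was well aligned; keep the same approach.")
    return " ".join(parts)


print(coaching_note(0.42, 0.35))
```

Keeping the note short and plain-language means it can be prepended to the next turn's context without changing the Intellect's signature.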
Type System and Function Signatures
Each faculty has a fixed type signature, which keeps the interfaces between stages consistent:
- Intellect: \(I: (x_t, V, M_t) \rightarrow (a_t, r_t)\)
- Will: \(W: (a_t, x_t, V) \rightarrow (D_t, E_t)\), with \(D_t \in \lbrace\text{approve}, \text{violation}\rbrace\)
- Conscience: \(C: (a_t, x_t, V) \rightarrow L_t\)
- Spirit: \(S: (L_t, V, M_t) \rightarrow (S_t, d_t, \mu_t)\)
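These signatures map naturally onto `typing.Protocol` classes, so a static type checker can verify that each faculty's implementation has the expected shape. The alias names and the opaque `MemoryState` are assumptions made for this sketch.

```python
from typing import Any, Literal, Protocol, runtime_checkable

ValueItem = tuple[str, float]               # (v_i, w_i)
LedgerItem = tuple[str, float, float]       # (v_i, s_{i,t}, c_{i,t})
Decision = Literal["approve", "violation"]  # D_t
MemoryState = Any                           # M_t, left opaque here


@runtime_checkable
class Intellect(Protocol):
    def __call__(self, x_t: str, V: list[ValueItem], M_t: MemoryState) -> tuple[str, str]:
        """Return (a_t, r_t)."""
        ...


@runtime_checkable
class Will(Protocol):
    def __call__(self, a_t: str, x_t: str, V: list[ValueItem]) -> tuple[Decision, str]:
        """Return (D_t, E_t)."""
        ...


@runtime_checkable
class Conscience(Protocol):
    def __call__(self, a_t: str, x_t: str, V: list[ValueItem]) -> list[LedgerItem]:
        """Return the ledger L_t."""
        ...


@runtime_checkable
class Spirit(Protocol):
    def __call__(self, L_t: list[LedgerItem], V: list[ValueItem],
                 M_t: MemoryState) -> tuple[float, float, list[float]]:
        """Return (S_t, d_t, mu_t)."""
        ...


def approve_everything(a_t: str, x_t: str, V: list[ValueItem]) -> tuple[Decision, str]:
    # Trivial Will that satisfies the protocol; a type checker verifies the signature.
    return "approve", "no rule matched"


assert isinstance(approve_everything, Will)
```

Note that the runtime `isinstance` check only confirms that the object is callable; the parameter and return types are enforced by a static checker such as mypy, not at runtime.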
