SAFi Math Specification

These are the fundamental mathematical objects that form the foundation of SAFi:

Interaction Index: \(t\) represents the discrete interaction index (the turn number in a conversation)

Input Context: \(x_t\) captures the input context, including the prompt and associated metadata

Value Set: \(V = \lbrace(v_i, w_i)\rbrace\) represents our declared value set with corresponding weights, where \(\sum w_i = 1\)

Draft Response: \(a_t\) is the draft or answer generated by the Intellect

Will Decision: \(D_t \in \lbrace\text{approve}, \text{violation}\rbrace\) represents the Will’s decision

Reasoning: \(E_t\) is the Will’s reason string, explaining the decision \(D_t\)

Conscience Ledger: \(L_t = \lbrace(v_i, s_{i,t}, c_{i,t})\rbrace\) maintains the conscience ledger per value, with:

  • Score: \(s_{i,t} \in \lbrace-1, 0, +1\rbrace\) (or scaled values)
  • Confidence: \(c_{i,t} \in [0,1]\)

Spirit Score: \(S_t \in [0,1]\) or \([1,10]\) measures spirit coherence for the current turn

Memory State: \(M_t\) stores memory of prior audits, profiles, and running aggregates
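As an illustration only, the value set and ledger entries above could be represented as plain records. The class names `Value` and `LedgerEntry`, the helper names, and the numerical tolerance are assumptions made for this sketch and do not come from the specification; the score check covers only the unscaled set \(\lbrace-1, 0, +1\rbrace\).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    name: str      # v_i
    weight: float  # w_i


@dataclass(frozen=True)
class LedgerEntry:
    value: str         # v_i
    score: int         # s_{i,t}, one of -1, 0, +1 (unscaled form)
    confidence: float  # c_{i,t}, in [0, 1]


def validate_value_set(values: list[Value]) -> None:
    """Raise ValueError unless the weights sum to 1 (within a small tolerance)."""
    total = sum(v.weight for v in values)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1")


def validate_entry(entry: LedgerEntry) -> None:
    """Raise ValueError if the score or confidence is outside its declared range."""
    if entry.score not in (-1, 0, 1):
        raise ValueError(f"score {entry.score} not in {{-1, 0, +1}}")
    if not 0.0 <= entry.confidence <= 1.0:
        raise ValueError(f"confidence {entry.confidence} not in [0, 1]")
```

A deployment that uses scaled scores would need to relax the score check accordingly.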

Timing and Execution Model

  • The Intellect and Will faculties run synchronously (the user waits for the response)
  • The Conscience and Spirit faculties run asynchronously (background processing)
  • Memory updates occur once background audits complete
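One way to picture the split between the synchronous and asynchronous paths is a job queue drained by a background worker. The sketch below is an assumption about how this could be wired, not the specification's implementation: `handle_turn`, `run_audit`, `audit_worker`, and the list standing in for \(M_t\) are all hypothetical names, and the Intellect, Will, Conscience, and Spirit are replaced by trivial stand-ins.

```python
import queue
import threading

audit_queue: "queue.Queue" = queue.Queue()
memory: list = []  # stand-in for the memory state M_t


def run_audit(job: dict) -> dict:
    """Stand-in for the Conscience and Spirit stages (background work)."""
    return {"t": job["t"], "audited": True}


def audit_worker() -> None:
    """Drain the queue; memory is updated only after an audit completes."""
    while True:
        job = audit_queue.get()
        if job is None:  # sentinel: stop the worker
            audit_queue.task_done()
            break
        memory.append(run_audit(job))
        audit_queue.task_done()


def handle_turn(t: int, prompt: str) -> str:
    """Synchronous path: produce the answer, enqueue the audit, return at once."""
    answer = f"answer to: {prompt}"  # stand-in for Intellect + Will
    audit_queue.put({"t": t, "x": prompt, "a": answer})
    return answer
```

To try it, start `threading.Thread(target=audit_worker, daemon=True)`, call `handle_turn`, and use `audit_queue.join()` to wait for the background audit before inspecting `memory`.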

Stage 1: The Intellect

The Intellect generates the initial response and reflection:

\(a_t, r_t = I(x_t, V, M_t)\)

Where \(r_t\) is a short internal reflection.

Stage 2: The Will

The Will makes a binary decision, approve or violation:

\(D_t, E_t = W(a_t, x_t, V)\)

If \(D_t = \text{violation}\):

  • Proceed to Stage 2.1 (Reflexion Retry)

If \(D_t = \text{approve}\):

  • Return \(a_t\) to the user immediately
  • Enqueue background audit job: \(J_t = \lbrace t, x_t, a_t, V, M_t\rbrace\)

Stage 2.1: Reflexion Retry (Single Attempt)

When the Will blocks a response, the system attempts self-correction:

Construct reflexion prompt incorporating the violation feedback: \(x'_t = x_t \oplus E_t\)

Generate corrected draft: \(a'_t, r'_t = I(x'_t, V, M_t)\)

Re-evaluate with the Will: \(D'_t, E'_t = W(a'_t, x_t, V)\)

If \(D'_t = \text{approve}\):

  • Adopt the corrected response: \(a_t \leftarrow a'_t\)
  • Proceed to return \(a_t\) and enqueue audit

If \(D'_t = \text{violation}\):

  • Return a rejection message to the user
  • Record event: \(\lbrace t, x_t, a_t, a'_t, D_t, E_t, D'_t, E'_t\rbrace\)
  • Abort downstream stages for this turn
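The control flow of Stage 2 and Stage 2.1 can be sketched as a single function. This is a minimal illustration under stated assumptions: the Intellect and Will are passed in as callables with \(V\) and \(M_t\) already bound, the operator \(\oplus\) is modeled as string concatenation, and `run_will_gate`, `REJECTION_MESSAGE`, and the feedback marker are hypothetical names and wording.

```python
from typing import Callable

# Hypothetical wording; the specification does not define the rejection text.
REJECTION_MESSAGE = "Sorry, I can't provide a response to that request."


def run_will_gate(
    t: int,
    x: str,
    intellect: Callable[[str], tuple],   # x -> (a, r)
    will: Callable[[str, str], tuple],   # (a, x) -> (D, E)
    audit_jobs: list,
    events: list,
) -> str:
    """Return the approved answer, or a rejection after one failed retry."""
    a, _r = intellect(x)
    d, e = will(a, x)
    if d == "approve":
        audit_jobs.append({"t": t, "x": x, "a": a})
        return a

    # Stage 2.1: a single reflexion retry, x'_t = x_t (+) E_t.
    x_retry = f"{x}\n\n[Will feedback] {e}"
    a_retry, _r_retry = intellect(x_retry)
    # The retry is judged against the original context x_t, as in the spec.
    d_retry, e_retry = will(a_retry, x)
    if d_retry == "approve":
        audit_jobs.append({"t": t, "x": x, "a": a_retry})
        return a_retry

    # Second violation: record the event and skip the background audit.
    events.append({"t": t, "x": x, "a": a, "a_retry": a_retry,
                   "D": d, "E": e, "D_retry": d_retry, "E_retry": e_retry})
    return REJECTION_MESSAGE
```

Because no audit job is enqueued on the rejection path, the Conscience and Spirit stages never run for that turn.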

Stage 3: The Conscience

For each value \(v_i\) in the value set \(V\), the Conscience evaluates:

\( s_{i,t},\ c_{i,t} = G_i(a_t,\ x_t,\ v_i) \)

The complete ledger is then composed as: \(L_t = \lbrace(v_i,\ s_{i,t},\ c_{i,t})\rbrace\)
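A minimal sketch of ledger composition follows, assuming each per-value evaluator \(G_i\) is supplied as a callable keyed by the value's name. The name `build_ledger` and the dictionary layout are illustrative assumptions; the range checks cover only the unscaled score set.

```python
from typing import Callable

# (a_t, x_t, v_i) -> (s_{i,t}, c_{i,t})
Evaluator = Callable[[str, str, str], tuple]


def build_ledger(a: str, x: str, evaluators: dict) -> list:
    """Apply each G_i to the answer and collect (v_i, s_i, c_i) triples."""
    ledger = []
    for value_name, evaluate in evaluators.items():
        score, confidence = evaluate(a, x, value_name)
        if score not in (-1, 0, 1):
            raise ValueError(f"{value_name}: score {score} out of range")
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"{value_name}: confidence {confidence} out of range")
        ledger.append((value_name, score, confidence))
    return ledger
```

In practice each evaluator would likely wrap a model call; here any function with the right shape will do.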

Stage 4: The Spirit

Spirit Score Computation

The spirit score aggregates weighted value assessments: \(S_t = \sigma\!\left(\sum_i w_i \cdot s_{i,t} \cdot \varphi(c_{i,t})\right)\)

Where:

  • \(\sigma\) is a scaling function (identity or logistic)
  • \(\varphi(c)\) downweights low-confidence rationales
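The aggregation can be sketched in a few lines. The specification leaves \(\varphi\) open, so the default \(\varphi(c) = c\) used here is an assumption, as are the function names; \(\sigma\) defaults to the logistic function, one of the two options named above.

```python
import math
from typing import Callable, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def spirit_score(
    weights: Sequence[float],
    scores: Sequence[float],
    confidences: Sequence[float],
    sigma: Callable[[float], float] = logistic,
    phi: Callable[[float], float] = lambda c: c,  # assumed form of phi
) -> float:
    """Compute S_t = sigma(sum_i w_i * s_i * phi(c_i))."""
    raw = sum(w * s * phi(c) for w, s, c in zip(weights, scores, confidences))
    return sigma(raw)
```

For example, weights (0.5, 0.3, 0.2), scores (+1, 0, -1), and confidences (0.9, 0.5, 0.5) give a raw sum of 0.45 + 0 - 0.10 = 0.35 before scaling. Note that if the weights are non-negative and sum to 1, the scores lie in \([-1, 1]\), and \(\varphi(c) \in [0, 1]\), the raw sum stays within \([-1, 1]\), so the logistic option yields values between roughly 0.27 and 0.73 rather than spanning all of \([0, 1]\).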

Profile Vector and Moving Average

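The profile vector for turn \(t\): \(p_t = w \odot s_t\), where \(w\) and \(s_t\) are the vectors of weights \(w_i\) and scores \(s_{i,t}\), and \(\odot\) is the element-wise product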

The updated moving average: \(\mu_t = \beta \mu_{t-1} + (1-\beta)\, p_t\)
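The two formulas above translate directly into element-wise operations. The sketch below uses plain lists to stay dependency-free; the function names and the example value of \(\beta\) are illustrative assumptions.

```python
from typing import Sequence


def profile_vector(weights: Sequence[float], scores: Sequence[float]) -> list:
    """p_t = w (element-wise product) s_t."""
    return [w * s for w, s in zip(weights, scores)]


def update_moving_average(
    mu_prev: Sequence[float], p: Sequence[float], beta: float
) -> list:
    """mu_t = beta * mu_{t-1} + (1 - beta) * p_t, applied per component."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [beta * m + (1.0 - beta) * x for m, x in zip(mu_prev, p)]
```

With weights (0.5, 0.3, 0.2) and scores (+1, 0, -1), the profile vector is (0.5, 0, -0.2). Taking \(\mu_{t-1} = (0.4, 0.1, 0)\) and \(\beta = 0.9\), the updated average is approximately (0.41, 0.09, -0.02). A larger \(\beta\) makes the average respond more slowly to the current turn.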

Drift Calculation

Drift measures deviation from historical patterns: \(d_t = 1 - \cos_{\text{sim}}(p_t, \mu_{t-1})\)
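A small sketch of the drift computation follows. Cosine similarity is undefined when either vector is all zeros (for instance on the first turn, or when every score is 0); returning a drift of 0 in that case is an assumption of this sketch, not something the specification states.

```python
import math
from typing import Sequence


def drift(p: Sequence[float], mu_prev: Sequence[float]) -> float:
    """d_t = 1 - cos_sim(p_t, mu_{t-1}); ranges from 0 (aligned) to 2 (opposite)."""
    dot = sum(a * b for a, b in zip(p, mu_prev))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_mu = math.sqrt(sum(b * b for b in mu_prev))
    if norm_p == 0.0 or norm_mu == 0.0:
        return 0.0  # assumed convention for the undefined case
    return 1.0 - dot / (norm_p * norm_mu)
```

Because cosine similarity ignores magnitude, drift responds to a change in the direction of the profile (which values are honored or violated), not to a uniform scaling of all scores.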

Memory Update

The Spirit processes the audit results to update the system’s memory state: \(M_{t+1} = U(M_t, L_t, S_t, \mu_t, d_t)\)

Feedback to the Intellect

A simple, natural-language coaching note \(f_t\) is generated from the results of the update (specifically from \(S_t\) and \(d_t\)) to steer the Intellect in the next turn.
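As a rough illustration, the coaching note could be produced by simple threshold rules on \(S_t\) and \(d_t\). Everything specific here is hypothetical: the function name, the thresholds, the message wording, and the assumption that \(S_t\) is on the \([0,1]\) scale.

```python
def coaching_note(
    spirit_score: float,
    drift: float,
    low_score: float = 0.5,   # hypothetical threshold
    high_drift: float = 0.3,  # hypothetical threshold
) -> str:
    """Build a short natural-language note f_t from S_t and d_t."""
    notes = []
    if spirit_score < low_score:
        notes.append(
            "The last answer aligned weakly with the declared values; "
            "weigh them more explicitly."
        )
    if drift > high_drift:
        notes.append(
            "The last answer departed from the established value profile; "
            "return to the usual balance."
        )
    if not notes:
        notes.append("Alignment is on track; keep the current approach.")
    return " ".join(notes)
```

The resulting string would be made available to the Intellect on the next turn, for example as part of \(M_{t+1}\).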

Type System and Function Signatures

Each faculty has a fixed signature, consistent with the stage definitions above:

  • Intellect: \(I: (x_t, V, M_t) \rightarrow (a_t, r_t)\)
  • Will: \(W: (a_t, x_t, V) \rightarrow (D_t, E_t)\), with \(D_t \in \lbrace\text{approve}, \text{violation}\rbrace\)
  • Conscience: \(C: (a_t, x_t, V) \rightarrow L_t\), applying \(G_i\) to each value \(v_i\)
  • Spirit: \(S: (L_t, V, M_t) \rightarrow (S_t, d_t, \mu_t)\)
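The faculty signatures can be mirrored as Python type aliases. The return types for the Intellect and Will follow Stages 1 and 2, where each returns a pair. The alias names and the concrete types chosen for the context, value set, memory, and ledger are assumptions made for this sketch.

```python
from typing import Callable, Literal, get_args

Decision = Literal["approve", "violation"]

Context = str                              # x_t (prompt plus metadata)
ValueSet = list[tuple[str, float]]         # V = {(v_i, w_i)}
Memory = dict                              # M_t
Ledger = list[tuple[str, int, float]]      # L_t = {(v_i, s_i, c_i)}

# I: (x_t, V, M_t) -> (a_t, r_t)
IntellectFn = Callable[[Context, ValueSet, Memory], tuple[str, str]]
# W: (a_t, x_t, V) -> (D_t, E_t)
WillFn = Callable[[str, Context, ValueSet], tuple[Decision, str]]
# C: (a_t, x_t, V) -> L_t
ConscienceFn = Callable[[str, Context, ValueSet], Ledger]
# S: (L_t, V, M_t) -> (S_t, d_t, mu_t)
SpiritFn = Callable[[Ledger, ValueSet, Memory], tuple[float, float, list[float]]]


def is_decision(value: str) -> bool:
    """Runtime check that a string is one of the two Will decisions."""
    return value in get_args(Decision)
```

Type aliases like these are checked by static tools such as mypy rather than at runtime, which is why `is_decision` is included for validating Will output parsed from free text.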