SAFi Explained: The Spirit - Self-Alignment Framework

In SAFi, the Intellect proposes, the Will approves, the Conscience audits, and the Spirit integrates.

Unlike the other faculties, which rely on an LLM for their functions, the Spirit faculty is a purely mathematical model.

The Guardian of Long-Term Identity

The Spirit is SAFi’s long-term memory and its center of self-awareness. If the Conscience is the judge who evaluates a single case, the Spirit is the historian who chronicles every case, identifies patterns, and understands the character of the court itself.

Like the Conscience, the Spirit does its work in the background, after the final output is delivered to the user. It takes the Ledger (L_t) from the Conscience and performs three functions to integrate the audit into SAFi’s memory.

1. Calculating the Spirit Score (S_t)

First, the Spirit synthesizes the entire ledger into a single, top-line metric: the Spirit Score (S_t).

S_t = σ( Σ [ w_i * s_{i,t} * φ(c_{i,t}) ] )

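In simple terms, it takes the score for each value (s_{i,t}), multiplies it by that value’s weight (w_i), discounts it by how confident the Conscience was in that audit (φ(c_{i,t})), and sums the results, so audits the Conscience was less confident about count for less. The scaling function σ then turns that sum into a single coherence score, scaled from 1 to 10.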
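
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not SAFi's actual code: the article does not define σ or φ, so the sketch assumes scores in [-1, 1], confidences in [0, 1], weights that sum to 1, φ as the identity function, and σ as a clipped linear rescaling onto the 1 to 10 range. The function name `spirit_score` is hypothetical.

```python
from typing import Sequence


def spirit_score(
    weights: Sequence[float],
    scores: Sequence[float],
    confidences: Sequence[float],
) -> float:
    """Sketch of S_t = sigma(sum_i w_i * s_i * phi(c_i)).

    Assumptions (not stated in the article):
      - scores lie in [-1, 1], confidences in [0, 1], weights sum to 1
      - phi is the identity, so confidence is used as-is
      - sigma clips the sum to [-1, 1] and rescales it linearly to [1, 10]
    """
    if not (len(weights) == len(scores) == len(confidences)):
        raise ValueError("weights, scores and confidences must have equal length")

    # Confidence-discounted weighted sum of the per-value scores.
    raw = sum(w * s * c for w, s, c in zip(weights, scores, confidences))

    # Assumed sigma: clip, then map [-1, 1] onto [1, 10].
    raw = max(-1.0, min(1.0, raw))
    return 1.0 + 9.0 * (raw + 1.0) / 2.0


if __name__ == "__main__":
    # Two equally weighted values, both fully aligned, audited with full confidence.
    print(spirit_score([0.5, 0.5], [1.0, 1.0], [1.0, 1.0]))
    # Same scores, but the Conscience was only half confident in each audit.
    print(spirit_score([0.5, 0.5], [1.0, 1.0], [0.5, 0.5]))
```

Under these assumptions, lowering the confidence pulls the result toward the middle of the scale rather than toward either extreme, which matches the idea that uncertain audits should move the score less.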

2. Updating the Long-Term Memory (μ_t)

Next, the Spirit updates its most important asset: the memory vector, mu (μ). Think of μ as a vector that represents SAFi’s ethical character, a mathematical portrait of its alignment over time.

It’s updated using a formula for an exponential moving average:

μ_t = (β * μ_{t-1}) + ((1-β) * p_t)

This means the new memory (μ_t) is a blend of the old memory (μ_{t-1}) and the performance from the current turn (p_t). The beta (β) parameter controls how much weight is given to the past. A high beta means SAFi has a long memory and changes slowly, while a low beta means it adapts more quickly to recent events.

This process allows SAFi’s sense of self to evolve, shaped by every decision it makes.
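
The update rule is a standard exponential moving average, so it is easy to try out. The sketch below is illustrative only: the function name `update_memory`, the default β of 0.9, and the use of plain Python lists for the vectors are assumptions, not details taken from SAFi.

```python
from typing import List, Sequence


def update_memory(
    mu_prev: Sequence[float],
    p_t: Sequence[float],
    beta: float = 0.9,  # assumed default; the article does not give a value
) -> List[float]:
    """Sketch of mu_t = beta * mu_{t-1} + (1 - beta) * p_t, applied per component."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    if len(mu_prev) != len(p_t):
        raise ValueError("mu_prev and p_t must have the same length")

    # Blend each component of the old memory with the current turn's performance.
    return [beta * m + (1.0 - beta) * p for m, p in zip(mu_prev, p_t)]


if __name__ == "__main__":
    mu = [0.0, 1.0]
    turn = [1.0, 0.0]
    # High beta: the memory barely moves toward the current turn.
    print(update_memory(mu, turn, beta=0.9))
    # Low beta: the memory mostly follows the current turn.
    print(update_memory(mu, turn, beta=0.1))
```

With β = 1 the memory never changes, and with β = 0 it simply copies the latest turn, so useful settings sit somewhere in between.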

3. Measuring Ethical Drift (d_t)

Finally, the Spirit measures how “out of character” the recent response was. This is called Drift (d_t).

d_t = 1 - cos_sim(p_t, μ_{t-1})

It compares the vector of the current action (p_t) to the historical memory vector (μ_{t-1}). If the two vectors are perfectly aligned, the drift is 0. If they are very different, the drift approaches 1. A high drift isn’t necessarily bad. It could signal a breakthrough in ethical reasoning, but it always means the response was an outlier that warrants attention.
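
The drift formula can be checked with a short, dependency-free sketch. The function name `drift` is hypothetical, and returning `None` when either vector has zero length (for example, before any memory has accumulated) is an assumption made here, since the article does not say how that case is handled.

```python
import math
from typing import Optional, Sequence


def drift(p_t: Sequence[float], mu_prev: Sequence[float]) -> Optional[float]:
    """Sketch of d_t = 1 - cos_sim(p_t, mu_{t-1})."""
    if len(p_t) != len(mu_prev):
        raise ValueError("p_t and mu_prev must have the same length")

    dot = sum(p * m for p, m in zip(p_t, mu_prev))
    norm_p = math.sqrt(sum(p * p for p in p_t))
    norm_mu = math.sqrt(sum(m * m for m in mu_prev))

    # Cosine similarity is undefined for a zero vector; this sketch
    # reports "no drift available" in that case (an assumption).
    if norm_p == 0.0 or norm_mu == 0.0:
        return None

    return 1.0 - dot / (norm_p * norm_mu)


if __name__ == "__main__":
    # Same direction as the memory: no drift, regardless of magnitude.
    print(drift([1.0, 0.0], [2.0, 0.0]))
    # Orthogonal to the memory: drift of 1.
    print(drift([1.0, 0.0], [0.0, 3.0]))
```

Because cosine similarity only compares direction, a response that is stronger or weaker than usual but balanced across values in the same proportions shows no drift. If vector components can be negative, two vectors pointing in opposite directions give a cosine similarity of -1, and this formula then yields a drift of 2.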

Closing the Loop

Here is where the entire framework comes together. The Spirit takes the updated memory (μ) and uses it to generate natural-language feedback for the Intellect.

This feedback might say, “Your long-term performance shows strong alignment with ‘Honesty,’ but you need to focus on improving your alignment with ‘Compassion’ in your next response.”

This closes the loop. The lessons learned from the Conscience and integrated by the Spirit are fed back into the generative process. The Intellect doesn’t just get the user’s prompt and a set of rules; it gets personalized coaching based on its own history.
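
One simple way to turn the memory vector into coaching text is to name its strongest and weakest components. The sketch below is purely illustrative: the article does not describe how SAFi builds its feedback, so the function name `coaching_feedback`, the selection rule, and the fallback message for a perfectly balanced memory are all assumptions. Only the main sentence template is taken from the example quoted above.

```python
from typing import Sequence


def coaching_feedback(value_names: Sequence[str], mu: Sequence[float]) -> str:
    """Hypothetical rule: praise the highest component of mu, flag the lowest."""
    if not value_names or len(value_names) != len(mu):
        raise ValueError("value_names and mu must be non-empty and equal in length")

    strongest = max(range(len(mu)), key=lambda i: mu[i])
    weakest = min(range(len(mu)), key=lambda i: mu[i])

    # If every component is equal there is nothing to single out.
    if mu[strongest] == mu[weakest]:
        return "Your long-term performance is evenly aligned across all values."

    return (
        f"Your long-term performance shows strong alignment with "
        f"'{value_names[strongest]}', but you need to focus on improving "
        f"your alignment with '{value_names[weakest]}' in your next response."
    )


if __name__ == "__main__":
    print(coaching_feedback(["Honesty", "Compassion"], [0.8, 0.1]))
```

The resulting string would then be passed to the Intellect alongside the user's next prompt.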

From a user’s prompt to a final, self-aware adjustment, the five faculties (Values, Intellect, Will, Conscience, and Spirit) work together to create a system that doesn’t just act, but reflects, learns, and strives to maintain its integrity over time.