The King Solomon Problem: Drift - Self-Alignment Framework

At the heart of building trustworthy AI is a challenge as old as human nature itself: how does a system with a strong, defined purpose stay true to itself over time? How does it avoid the slow, almost imperceptible erosion of its core identity?

We call this the King Solomon Problem.

King Solomon: The Incoherent Identity

King Solomon’s story is a profound example of identity incoherence. He begins his reign as the epitome of a wise and just ruler. His identity is coherent and strong, defined by core values:

Wisdom: His famous judgment between the two mothers.
Justice: He ushered in an era of peace and prosperity.
Devotion: His greatest achievement was building the magnificent Temple to God in Jerusalem.

If we were to map his early identity, it would be a stable and consistent baseline, a character centered firmly on these virtues. This was his established persona.

However, over the decades, Solomon began to make a series of small, individually justifiable compromises. To secure his kingdom, he entered into political marriages with foreign princesses. With these marriages came the introduction of their foreign gods and customs. At first, he merely tolerated their worship. Then, he built shrines for them. Finally, the man who built the great Temple was himself bowing to other gods.

Each step was a small deviation. Each action, on its own, might have seemed like a pragmatic political move. But compounded over time, these actions pulled Solomon’s character further and further from its original anchor. His identity began to drift. By the end of his life, his actions were incoherent with the identity he had established at the beginning. The wise, devoted king had become someone else entirely.

This is the essence of identity drift: a slow, often unnoticed series of compromises that culminates in a fundamental shift in character. For an AI, this is a critical failure mode. An AI assistant designed to be a prudent financial guide must not slowly drift into giving speculative investment advice, and a philosophical AI must not slowly drift into casual chatter. The danger is that no single response along the way has to look like a violation.

How SAFi Measures and Prevents the King Solomon Problem

The SAFi loop is explicitly designed to guard against this. It doesn’t just evaluate if an AI’s response is “good” in a vacuum; it constantly measures if the response is “in character.” This is the primary role of the Spirit faculty, which acts as the guardian of the AI’s long-term identity.

It does this by turning abstract concepts into concrete mathematics, using a method centered around a long-term memory vector.

The Memory Vector (μ): A Mathematical Portrait of Character

Think of the AI’s identity as a point on a map of meaning. This location is represented by a list of numbers called a vector, which we label with the Greek letter mu (μ). This μ vector is the mathematical representation of the AI’s learned persona: its center of gravity, established over hundreds or thousands of interactions.

For Solomon, his initial μ vector would have been strongly aligned with dimensions like “Wisdom” and “Devotion.”

The Performance Vector (p_t): The Footprint of a Single Action

Every time SAFi generates a response, its Conscience faculty audits the action against its core values. The Spirit then converts this audit into a performance vector for that specific turn (p_t). This vector represents the character of that single action.

When Solomon chose to marry an Egyptian princess for political stability, that action had its own vector (p_t), pointing slightly away from “Devotion” and towards “Political Pragmatism.”

Measuring Incoherence: The Distance Between Action and Character

Identity Incoherence, or Drift (d_t), is then calculated with a simple but powerful formula:

\( d_t = 1 - \operatorname{cos\_sim}(p_t, \mu_{t-1}) \)

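In simple terms, this measures how far the direction of the new action's vector (p_t) departs from the direction of the established character's vector (μ_{t-1}). Because cosine similarity compares angles, only the direction of each vector matters, not its length.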

Low Incoherence: When Solomon acted with wisdom, his action vector p_t was very close to his character vector μ. The distance was small, and the incoherence score was low (near 0).
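High Incoherence: When he built a temple for a foreign god, that action vector p_t was very far from his established character μ. The distance was large, and the incoherence score was high (near 1 for an action unrelated to his character, and up to 2 for one pointing in exactly the opposite direction).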
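
As a minimal sketch of the drift formula above, the following Python computes d_t for two made-up actions. The three dimensions (Wisdom, Justice, Devotion), the numeric values, and the function names cos_sim and drift are illustrative assumptions, not SAFi's actual code.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)

def drift(p_t, mu_prev):
    """d_t = 1 - cos_sim(p_t, mu_{t-1})."""
    return 1.0 - cos_sim(p_t, mu_prev)

# Hypothetical dimensions: (Wisdom, Justice, Devotion).
mu = [0.9, 0.8, 0.9]             # established character
wise_action = [1.0, 0.8, 0.9]    # closely aligned with the character
shrine_action = [0.1, 0.0, -1.0] # points away from Devotion

low = drift(wise_action, mu)     # small value, close to 0
high = drift(shrine_action, mu)  # large value, above 1 in this example
```

Here low stays close to 0 because the action points almost exactly where μ points, while high exceeds 1 because the negative Devotion component turns the action partly against the established character.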

Crucially, after every action, the memory vector is updated using an exponential moving average: \( \mu_t = \beta \, \mu_{t-1} + (1 - \beta) \, p_t \). This means the new character is a blend of the old character and the most recent action, with β controlling how much weight the old character keeps. Because β is high (close to 1), the AI has high inertia, just like a large ship. A few small, incoherent actions will be flagged but won’t immediately change the ship’s course. However, a sustained pattern of incoherent actions will slowly and visibly turn the ship, as it did with Solomon.
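
The inertia described above can be sketched with a toy simulation. This is an illustration under stated assumptions, not SAFi's implementation: the two-dimensional vectors, the value β = 0.9, and the helper names update_memory and simulate are all hypothetical.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector has no similarity
    return dot / (norm_a * norm_b)

def drift(p_t, mu_prev):
    """d_t = 1 - cos_sim(p_t, mu_{t-1})."""
    return 1.0 - cos_sim(p_t, mu_prev)

def update_memory(mu_prev, p_t, beta):
    """Exponential moving average: mu_t = beta * mu_{t-1} + (1 - beta) * p_t."""
    return [beta * m + (1.0 - beta) * p for m, p in zip(mu_prev, p_t)]

def simulate(mu0, actions, beta=0.9):
    """Score each action against the current memory, then update the memory."""
    mu = list(mu0)
    drifts = []
    for p_t in actions:
        drifts.append(drift(p_t, mu))  # measured against mu_{t-1}
        mu = update_memory(mu, p_t, beta)
    return mu, drifts

# Hypothetical axes: (Devotion, Political Pragmatism).
faithful = [1.0, 0.0]
compromise = [0.0, 1.0]

# One compromise barely moves the memory vector.
mu_one, drifts_one = simulate(faithful, [compromise])

# Thirty compromises in a row turn the ship.
mu_many, drifts_many = simulate(faithful, [compromise] * 30)
```

In this toy run the first compromise is flagged with a drift of 1, yet a single step leaves μ almost unchanged. After thirty repeated compromises μ points mostly along the second axis, so the original faithful action would itself score as out of character, and the compromise's own per-turn drift has fallen close to 0.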

Conclusion: The Guardian of Identity

By monitoring the “Avg. Identity coherence” on the SAFi Performance Hub, we can see how well the AI is staying true to its intended character. This metric turns the abstract risk of the “Solomon Problem” into something concrete and measurable. It’s the essential guardrail that ensures an AI doesn’t just start its journey with a strong identity but maintains that identity with integrity over the long term.