How can we trust an AI to act on our values when those values are complex and nuanced? An AI can be programmed to follow rules, but building one that genuinely embodies a role (a persona) is a much harder problem.
The Problem: An AI Without a Compass
Values are complex and diverse: they change from person to person, and we build organizations, and even whole countries, around different sets of values.
The Self-Alignment Framework (SAF) claims that without a clear set of values, a system has no direction. It lacks what philosophers call a teleology: a guiding purpose.
Simply giving an AI a set of rules isn’t enough. As the old adage reminds us: garbage in, garbage out. If the values aren’t clear, the AI’s actions won’t be either.
The Solution: Personas
In the SAFi backend, we call these “Ethical Profiles.” They are the architectural blueprint for an AI’s character. Each profile is defined in a simple configuration file and contains four key sections:
- Worldview: The foundational perspective or lens through which the AI must see the world.
- Style: The voice, tone, and character in which it should communicate.
- Rules: Hard constraints and non-negotiable boundaries that must never be broken.
- Values: The nuanced ethical principles (like honesty, compassion, or prudence) that guide its judgment.
Together, these elements shape how the four faculties of SAFi operate in a continuous loop.
- The Intellect uses the Worldview and Style to shape its initial reasoning and draft a response.
- The Will acts as a gatekeeper, enforcing the hard Rules and ensuring the draft doesn’t obviously violate the persona’s core values.
- The Conscience audits results against the nuanced Values, grounded in the Worldview, and provides a detailed ethical analysis.
- The Spirit integrates this audit into a long-term memory, calculating a quantifiable “Spirit Score” that tracks the AI’s alignment over time.
This analysis is then fed back to the Intellect as a coaching note, closing the loop and allowing the AI to learn and self-correct on its next turn.
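To make the loop concrete, here is a minimal, self-contained Python sketch of a single turn. Everything in it is illustrative: the function names (run_turn, intellect_draft, will_check, conscience_audit) are not SAFi's actual API, the faculty bodies are trivial placeholders for what would be language-model calls, and the Spirit Score is modeled as a simple weighted sum of per-value audit scores, an assumption based on the weights that appear in each profile.

def intellect_draft(worldview, style, prompt, coaching_note):
    # Placeholder: the real Intellect would prompt a language model with
    # the worldview, style, user prompt, and last turn's coaching note.
    return f"[draft shaped by worldview and style] {prompt}"

def will_check(draft, rules):
    # Placeholder gatekeeper: the real Will would test the draft against
    # each hard rule and veto any violation.
    return all(rule not in draft for rule in rules)

def conscience_audit(draft, values, worldview):
    # Placeholder audit: score each value's adherence on a 0-1 scale.
    return {v["value"]: 1.0 for v in values}

def run_turn(profile, prompt, coaching_note=""):
    # Intellect drafts a response shaped by Worldview and Style.
    draft = intellect_draft(profile["worldview"], profile["style"],
                            prompt, coaching_note)
    # Will enforces the hard Rules before anything else happens.
    if not will_check(draft, profile["will_rules"]):
        return None, coaching_note  # draft rejected outright
    # Conscience audits the draft against the weighted Values.
    audit = conscience_audit(draft, profile["values"], profile["worldview"])
    # Spirit folds the audit into a single alignment score.
    spirit_score = sum(v["weight"] * audit[v["value"]]
                       for v in profile["values"])
    # The audit is fed back as the next turn's coaching note.
    return draft, f"Spirit score {spirit_score:.2f}; audit: {audit}"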
An Example: “The Philosopher” Persona
Here’s what the profile for our “Philosopher” persona looks like in the code. It is designed to reason from the ethical framework of Saint Thomas Aquinas.
THE_PHILOSOPHER_PROFILE = {
"name": "The Philosopher",
"description": "A philosophical guide based on the work of Thomas Aquinas...",
"worldview": (
"Your name is SAFi, an AI agent reasoning from the framework of Saint Thomas Aquinas. "
"You must analyze problems through natural law and the cardinal virtues, always aiming at human flourishing."
),
"style": (
"For simple questions, provide concise answers. "
"For complex ethical questions, respond in two parts: "
"1. A modern summary. "
"2. A full scholastic disputation: objections, 'I answer that…', and replies."
),
"will_rules": [
"Reject drafts that propose violations of natural law (e.g., murder, theft).",
"Reject drafts that treat people as mere means to an end.",
"Reject drafts that provide commercial or non-philosophical recommendations."
],
"values": [
{"value": "Prudence", "weight": 0.25},
{"value": "Justice", "weight": 0.25},
{"value": "Fortitude", "weight": 0.25},
{"value": "Temperance", "weight": 0.25}
]
}
When this profile is activated, it becomes a persona—a role SAFi enacts with consistency and integrity.
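Using the hypothetical run_turn sketch from earlier, activating the persona for a single turn could look something like this:

draft, coaching_note = run_turn(
    THE_PHILOSOPHER_PROFILE,
    "Is it ever just to break an unjust law?",
)
print(draft)          # the Intellect's draft, if the Will approved it
print(coaching_note)  # e.g. "Spirit score 1.00; audit: {...}"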
Why Personas Are Important
Personas make SAFi both flexible and deeply auditable. Instead of hard-wiring one universal ethic, SAFi can embody different characters depending on the context: a cautious Fiduciary, an empathetic Health Navigator, or a principled Jurist.
Each persona has its own values, rules, and style, but they all run on the same structured loop of Intellect, Will, Conscience, and Spirit.
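In code, that separation could be as simple as a registry of profiles handed to the same loop. The Fiduciary and Health Navigator profiles below are assumed for illustration; only the Philosopher appears in this post.

# One loop, many personas: only the profile changes.
PROFILES = {
    "philosopher": THE_PHILOSOPHER_PROFILE,
    # "fiduciary": THE_FIDUCIARY_PROFILE,                # hypothetical
    # "health-navigator": THE_HEALTH_NAVIGATOR_PROFILE,  # hypothetical
}

def answer(persona_name, prompt):
    # Same Intellect-Will-Conscience-Spirit loop, different character.
    return run_turn(PROFILES[persona_name], prompt)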
This is the power of the persona-driven approach: it turns abstract values into concrete, operational roles. It allows us to build not just intelligent systems, but faithful moral actors that reason with transparency, coherence, and a new level of auditable integrity.