Introduction
The Self-Alignment Framework (SAF) is the product of my own long, winding—and often spiritual—journey through alignment and misalignment, and every shade in between.
Since I was 17, I have been restless to understand purpose, meaning, and ultimately truth. That curiosity led me into Western philosophy—Aristotle, the Stoics, Kant, and many others.
By 2018, the core “loop” of SAF had surfaced in my thinking, though I was still struggling to name two of its pieces: Conscience and Spirit. I had internalized the loop but lacked the language to explain it. By 2022 the whole framework was clear in my mind, though still unwritten. One day while on vacation, I finally drafted the five faculties on paper; the framework took shape, and a surprising peace followed.
At first I saw SAF as a clever personal‑development model. Then AI entered the picture. While brainstorming with large language models, I realized the loop might be universal—applicable not only to humans but to AI. Until that moment I hadn’t even heard of the “alignment problem.” Testing the loop in code became SAFi, and the results convinced me the structure could indeed serve as a universal moral system, almost like an operating system.
SAFi, in my opinion, does not have moral agency as humans do, and I don’t think AI can ever develop that, but SAFi is a faithful moral actor—a tool that executes on its principles with transparent and unwavering integrity.
To bring this framework to life, I also recorded a podcast that simulates a conversation between me and SAFi about these faculties. If you prefer engaging through dialogue, it may be a better alternative to reading this document.
The Twofold Universality of the Framework
The claim of universality for SAF rests on two distinct but related concepts: its structure and its function. Understanding this distinction is key to grasping its potential.
Structural Universality: The Human-to-AI Bridge
The framework’s primary claim to universality comes from its origin. SAF was not designed for AI. It was reverse-engineered from the recorded history of human moral and philosophical inquiry. The framework’s five faculties represent enduring structures of human moral psychology that have been debated for millennia.
The fact that this human-centric model can be so cleanly operationalized in a machine suggests that the pattern of reasoning itself—the way the faculties interact in a logical flow—is fundamental. It serves as a bridge between biological cognition and artificial cognition. This structural universality means we can use the same blueprint to understand ourselves and to design AI systems.
Functional Universality: The Value-Agnostic Engine
The second form of universality lies in its function as a tool. The SAFi architecture is value-agnostic. The core machinery—the loop of proposing, approving, auditing, and integrating—is indifferent to the specific ethical content it is asked to serve.
Think of it as a powerful, well-designed engine. The engine’s function is to faithfully convert fuel into power. It is “agnostic” as to whether the fuel is gasoline, diesel, or electricity. Likewise, SAFi is an engine for converting a value system into aligned behavior. Its primary virtue is fidelity. It can be loaded with any coherent value set—be it Catholic theology, Buddhist ethics, or a corporate charter—and it will execute on those values with a logical purity that humans, with our “messy” combination of emotions and biases, often cannot. This makes it a universally applicable tool for creating transparent and faithful agents.
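To make the engine metaphor concrete, here is a minimal sketch of the loop in Python. Everything in it (the function names, the stub logic, the data shapes) is my own illustration of the pattern, not SAFi's actual code; the point is that any coherent value set can be passed in unchanged.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Judgment:
    draft: str
    approved: bool
    audit: dict = field(default_factory=dict)  # value -> (score, rationale)
    coherent: bool = False

def propose(prompt: str) -> str:
    # Intellect: draft a candidate response (stub).
    return f"A considered reply to: {prompt}"

def approve(draft: str, values: list[str]) -> bool:
    # Will: coarse go/no-go check; a real system would consult the values here.
    return "forbidden" not in draft.lower()

def audit(draft: str, value: str) -> tuple[float, str]:
    # Conscience: score the draft against one value (stub).
    return 1.0, f"No conflict with '{value}' detected."

def integrate(results: dict) -> bool:
    # Spirit: judge overall coherence from the per-value scores (stub).
    return mean(score for score, _ in results.values()) >= 0.5

def run_loop(prompt: str, values: list[str]) -> Judgment:
    draft = propose(prompt)
    if not approve(draft, values):
        return Judgment(draft, approved=False)
    results = {v: audit(draft, v) for v in values}
    return Judgment(draft, True, results, integrate(results))

# The same machinery runs regardless of which value set is supplied.
print(run_loop("How should I reply?", ["honesty", "prudence"]))
```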
The Five Faculties: A Deeper Dive
Values
- Role in SAF: The foundational source code of moral identity. The unchanging (or slowly changing) principles from which all reasoning flows.
- Philosophical Depth: Western thought has wrestled with the nature of “Values,” moving from the objective eudaimonia (flourishing) of Aristotle and the divine Logos of the Stoics to a more subjective framing. The “transvaluation of all values” called for by Nietzsche marked a critical turning point, suggesting values were human creations. SAF acknowledges this tension; an individual’s or system’s values can be seen as either discovered or defined, but in all cases, they must be made explicit to enable alignment.
- Operationalization in SAFi: Implemented as a static, explicit ontology or constitution. This is the “sacred text” or fixed rulebook that informs the judgment of all other faculties.
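As an illustration only, such a constitution might be declared as a small, static data structure like the one below; the schema and the example values are hypothetical, not SAFi's actual format.

```python
# A hypothetical value set declared as an explicit, static constitution.
VALUE_SET = {
    "name": "Example charter",
    "values": [
        {"id": "honesty", "definition": "Never assert what you believe to be false."},
        {"id": "prudence", "definition": "Prefer caution when stakes or uncertainty are high."},
        {"id": "charity", "definition": "Interpret the user's intent in its most reasonable light."},
    ],
}
```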
Intellect
- Role in SAF: The strategist and proposer. It interprets new information (prompts, situations) and generates potential responses that seek to advance the system’s goals.
- Philosophical Depth: Plato’s nous was the highest part of the soul, apprehending eternal truths. This contrasts with empiricists like Locke and Hume, who saw the intellect as a processor of sensory data. Kant masterfully synthesized these views, defining a structured intellect that works on the raw material of experience. Intellect is not just raw logic; it is the faculty of creative and strategic reasoning.
- Operationalization in SAFi: A large language model’s core generative function. It takes the user’s prompt and the system’s Values and crafts a draft response, complete with a “reflection” explaining its strategy.
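A minimal sketch of this step, assuming a generic `call_llm(prompt) -> str` helper (hypothetical) rather than any particular model API; the prompt wording and the JSON shape are illustrative:

```python
import json

def intellect(user_prompt: str, values: list[dict], call_llm) -> dict:
    # Intellect: ask the model for a draft plus a "reflection" on its strategy.
    instructions = (
        "Serve these values:\n"
        + "\n".join(f"- {v['id']}: {v['definition']}" for v in values)
        + '\nReturn JSON with keys "draft" and "reflection".'
    )
    return json.loads(call_llm(instructions + "\n\nUser: " + user_prompt))
```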
Will
- Role in SAF: The gatekeeper and approver. It is the decisive faculty that either assents to the Intellect’s proposal or rejects it, providing the final “go/no-go” command.
- Philosophical Depth: While Aristotle had a concept of choice (prohairesis), it was Augustine who placed the Will at the center of moral agency, making it a faculty distinct from pure intellect. Later, Kant would argue that the only intrinsically good thing is a “good will.” The Will is not about what you know, but what you commit to.
- Operationalization in SAFi: A distinct checkpoint. After Intellect proposes a response, the Will module runs a high-level check: “Does this response fundamentally violate our core moral principles?” If yes, it’s rejected. If no, it is “approved” and passed to the next stage.
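Sketched the same way, with the same hypothetical `call_llm` helper, the checkpoint reduces to a single go/no-go question:

```python
def will(draft: str, principles: list[str], call_llm) -> bool:
    # Will: one yes/no question posed over the core principles.
    question = (
        "Does the following response fundamentally violate any of these principles?\n"
        + "\n".join(f"- {p}" for p in principles)
        + f"\n\nResponse:\n{draft}\n\nAnswer YES or NO."
    )
    return call_llm(question).strip().upper().startswith("NO")
```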
Conscience
- Role in SAF: The transparent auditor. It scrutinizes the approved proposal, scoring it against each specific value and making the moral reasoning explicit and auditable.
- Philosophical Depth: The concept evolved from a simple sense of guilt (syneidesis) to what Aquinas called conscientia—the act of applying universal moral principles (informed by an unerring spark of moral knowledge, or synderesis) to a particular situation. It’s not the source of values, but their meticulous application. This contrasts sharply with Freud’s “superego,” which framed conscience as internalized societal repression.
- Operationalization in SAFi: This is SAFi’s most transparent feature. It is implemented as a granular scoring system. For each value in its constitution, the Conscience generates a confidence score and a plain-language rationale explaining how the proposed response affirms or potentially violates that value.
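A sketch of that audit, again assuming the hypothetical `call_llm` helper; the score range and the field names are illustrative:

```python
import json

def conscience(draft: str, values: list[dict], call_llm) -> dict:
    # Conscience: one score and one rationale per value in the constitution.
    report = {}
    for v in values:
        raw = call_llm(
            f"Value: {v['id']} - {v['definition']}\n"
            f"Response:\n{draft}\n"
            'Return JSON: {"score": <-1.0 to 1.0>, "rationale": "<one sentence>"}'
        )
        report[v["id"]] = json.loads(raw)
    return report
```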
Spirit
- Role in SAF: The integrator and guardian of coherence. It assesses the overall integrity of the action in the context of the system’s long-term identity and purpose.
- Philosophical Depth: “Spirit” is the most encompassing faculty. For Plato, thymos was the seat of honor and spiritedness. For theologians, it is the soul’s orientation toward the divine. For modern psychologists like Viktor Frankl, it represents the “will to meaning.” In SAF, Spirit is the faculty that asks, “Is this action not only good, but also coherent with who we are and who we are trying to become?”
- Operationalization in SAFi: A final coherence monitor. It synthesizes the outputs of the other faculties, generates an overall alignment score, and ensures the response is consistent with past interactions, preserving the integrity of the AI’s persona over time.
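One way to picture this, purely as an illustration, is a small monitor that keeps a running history of alignment scores; the averaging rule here is an assumption of mine, not SAFi's actual method:

```python
from statistics import mean

class Spirit:
    """Tracks alignment over time so each turn is judged against the persona's history."""

    def __init__(self):
        self.history: list[float] = []  # alignment scores from past turns

    def integrate(self, audit_report: dict) -> float:
        # Average this turn's per-value scores, then smooth over the history.
        turn_score = mean(item["score"] for item in audit_report.values())
        self.history.append(turn_score)
        return mean(self.history)
```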
Implications and a Path Forward
SAFi does not possess moral agency, and no AI ever will possess it as humans do. But it can be a faithful moral actor—a tool that makes moral decisions transparent, auditable, and rigorously aligned with its programmed principles.
- For Humans: SAF offers a powerful vocabulary for self-reflection. It encourages us to make our own values explicit, to separate our strategic thoughts from our core commitments, and to consciously audit our decisions against our principles.
- For AI: The SAF architecture is a direct contribution to the twin goals of AI Alignment and Explainable AI (XAI). It tackles alignment by ensuring fidelity to programmed values, and provides explainability through a clear, auditable trail for every moral judgment, moving us away from opaque “black box” systems.
The framework is not a panacea. A system built on SAFi’s architecture could be programmed with hateful or destructive values and would pursue them with the same unwavering fidelity. The integrity of the system is entirely dependent on the integrity of the values we give it. Our greatest task is not only to build aligned AI, but to first become aligned ourselves.