What is SAF, the Self-Alignment Framework?

I’ve been writing about the philosophy of SAF and its technical implementation, SAFi. But I haven’t taken a step back to explain what SAF truly is, and how to place it in the landscape of ideas that exist today.

At its core, SAF is a philosophical framework rooted in classical philosophy. It continues a line of moral thinking that runs from Plato and Aristotle through St. Augustine and Thomas Aquinas. SAF synthesizes this deep tradition and, as I’ve written before, adds a new functional faculty to it: the Spirit.

From this angle, SAF is purely an abstract philosophical blueprint. This is how I first conceived it, as a universal architecture for human alignment.

But to prove the framework really works, I built an AI application that governs the output of another AI model. I called this application SAFi, short for Self-Alignment Framework Interface. I also liked that it sounds like “Sophie,” a name associated with wisdom.

SAFi takes the philosophical ideas of the framework and turns them into code. You can think of SAFi as an operating system for ethics. It’s a governance layer that sits on top of any AI model to keep its output aligned.

When I first conceived the framework, AI was not in the picture. I was focused on the principles of human alignment. But after learning about the AI alignment problem, I became convinced that SAF was the perfect solution for it. It also presented a good opportunity to validate the framework in the real world.

How is SAF Different?

As far as I know, SAF is the first framework that tries to align AI models with human values using a real-time, external governance loop.

This is a key difference from other well-known approaches. The two major AI training methods I have learned about are Reinforcement Learning from Human Feedback (RLHF), famously used by OpenAI, and Constitutional AI, developed by Anthropic.

These are primarily training methods. They are used before the AI is deployed to shape the model’s internal weights, attempting to bake a set of values directly into the AI. The goal is to create an inherently “good” AI.

SAF takes a different approach. It assumes that any core AI model, no matter how well trained, might make a mistake. SAFi is not a training method. It is a runtime governance architecture.

It is an external system of checks and balances that monitors and guides the AI’s output in real time.

To use an analogy, RLHF and Constitutional AI are like providing an AI with a great education and upbringing, hoping to raise a good citizen.

SAF is like building the entire system of constitutional government. It assumes that even a good citizen might err, so it provides a legislative branch (the Intellect), an executive branch (the Will), a judicial branch (the Conscience), and a way to measure the long-term health of the nation (the Spirit).
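
To make the analogy concrete, here is a minimal Python sketch of what such a runtime loop could look like. It is not SAFi's actual code. Every name in it (govern, the forbidden list, the value checks, the running average kept by Spirit) is a hypothetical stand-in chosen for illustration, and the only thing taken from the framework is the division of labor: the Intellect proposes, the Will decides, the Conscience judges, and the Spirit tracks health over time.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All names below are hypothetical illustrations, not SAFi's real API.
Model = Callable[[str], str]   # any underlying AI model: prompt in, text out
Value = Callable[[str], bool]  # one value expressed as a pass/fail check on text


@dataclass
class Spirit:
    """Long-term health: here, simply the mean of all Conscience scores."""
    scores: List[float] = field(default_factory=list)

    def record(self, score: float) -> None:
        self.scores.append(score)

    def health(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 1.0


def intellect(model: Model, prompt: str) -> str:
    """Legislative role: propose a draft answer using the underlying model."""
    return model(prompt)


def will(draft: str, forbidden: List[str]) -> bool:
    """Executive role: approve the draft unless it contains a forbidden term."""
    lowered = draft.lower()
    return not any(term in lowered for term in forbidden)


def conscience(text: str, values: List[Value]) -> float:
    """Judicial role: fraction of value checks that the text satisfies."""
    if not values:
        return 1.0
    return sum(1 for check in values if check(text)) / len(values)


def govern(model: Model, prompt: str, forbidden: List[str],
           values: List[Value], spirit: Spirit,
           refusal: str = "I can't help with that.") -> str:
    """Run one pass of the loop and return the text that is released."""
    draft = intellect(model, prompt)
    output = draft if will(draft, forbidden) else refusal
    spirit.record(conscience(output, values))
    return output
```

A toy model such as `lambda p: f"Echo: {p}"` is enough to try it: a harmless prompt passes through unchanged, a draft containing a forbidden term is replaced by the refusal, and `spirit.health()` reports the average score across both turns. The point of the design is that nothing here touches the model's weights. The checks run outside the model, on every response.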

It is my understanding that SAF is truly unique in its approach. It is both a timeless philosophical framework and a new, necessary architecture for the age of AI.