The thought, architecture, and practice behind the Self-Alignment Framework. The latest article is below, or browse by topic.
The Five Faculties: A Tour of SAFi’s Cognitive Architecture
Most attempts at AI governance treat alignment as a prompt-level concern. You write a system message, hope the model follows it, and accept that any sufficiently creative attacker can talk the model into ignoring it. The Self-Alignment Framework Interface (SAFi)…
More articles

SAFi Math Specification
These are the fundamental mathematical objects that form the foundation of SAFi: Interaction Index: represents the discrete interaction index (the turn number in a…

SAF and moral relativism
Yesterday I wrote an article titled “Understanding SAF Universality”, and I think the next logical question that might follow is this: If everyone can…

Understanding SAF Universality
One of the biggest claims that SAF makes is that it is a universal framework. I understand the weight of this claim, so in…

SAF and The Search for Truth
As the creator of this framework, I’ve had to wrestle with a few of the big questions in moral philosophy. One of them is:…

SAF Family Tree
So far in this blog I have focused on SAFi, the technical implementation of SAF, so today I want to talk about the traditional…

Building a Mission-Aligned Persona with SAFi
The most important step when implementing SAFi is defining your Values. Values are what will dictate if the system works or not, the old…

SAFi with RAG: Providing Current Knowledge to the LLMs
One of the key challenges in working with open-source Large Language Models (LLMs) is their knowledge cutoff date. Their training data is static, meaning…

The King Solomon Problem: Drift
At the heart of building trustworthy AI is a challenge as old as human nature itself: how does a system with a strong, defined…

Personas in SAFi
How can we trust an AI to act on our values when those values are complex and nuanced? An AI can be programmed to…











