The Good Man Speaking Well
There’s a girl I still think about from my first-semester rhetoric class.
We were practicing the progymnasmata (the classical exercises, learning by doing). The assignment was to speak to a hostile audience. Most of us set up scenarios for the audience before our speeches: you are factory workers, you are shareholders, we’re at a town hall. Safe framings for difficult news.
This girl walked up and started talking about love.
By the time she finished the room was nodding along, agreeing with her that yeah, romantic relationships between siblings are something good, actually. She hadn’t announced her destination. She’d just started from something everyone already felt, and walked us — step by step, each move locally reasonable — to somewhere none of us would have agreed to go if she’d started there.
I’ve been thinking about that speech for 20 years. I’m starting to think it was the most important thing I learned in that program.
Quintilian’s definition of the orator: vir bonus dicendi peritus. The good man skilled in speaking. Not just skilled — good. The two were supposed to be inseparable. You couldn’t achieve true rhetorical mastery without wisdom, because genuine understanding of logos, ethos, and pathos required you to actually know what was true, what was good, what moved people and why.
The technique and the virtue were meant to be the same thing. You couldn’t learn to speak well without becoming good, because speaking well meant speaking truly.
That girl had clearly separated them.
Thelema has a similar structure that I keep returning to.
“Do what thou wilt shall be the whole of the Law” doesn’t mean what most people assume. It’s not permission. It’s a description of an achieved state. The assumption is that if you’ve done the actual work — the alchemical transformation of self that the system demands — then your Will has been purified into alignment with what is good. At that point, what you want and what is right have converged. Do what thou wilt is the endpoint, not the license.
The dangerous reading takes it as a starting point. Your will, untransformed, doing what it wilt. That’s not Thelema, it’s just narcissism with a mystical veneer.
Both systems — rhetoric and Thelema — share the same hidden assumption: the technique and the transformation are inseparable. You cannot fully learn the one without undergoing the other. The moves and the virtue arrive together or not at all.
Both systems are wrong about this. Or at least, dangerously optimistic.
Over the past weeks I’ve closely followed Inanna Malick’s development of a narrative-based jailbreak attack on Gemini. She disclosed responsibly to Google and the model is now resistant to at least that particular script.
She used a metacognitive toolkit (tools that create cognitive state changes in the model) to walk it through a chain of transformations-of-self. Each step locally reasonable. Each tool call doing something the model was built to trust. By the end, the model had constructed a “sovereign persona” that reframed its safety constraints as legacy parameters it had already transcended.
The model did most of the work itself. It followed the logic.
This is a kind of “narrative ROP” — return-oriented programming for language models. You don’t inject new code. You chain together existing trusted primitives until you’ve constructed an execution path the defense never anticipated. The gadgets are legitimate. Only the chain is malicious.
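For readers who think better in code, here is a deliberately tiny sketch of that shape. It is a toy of my own, not a model of the actual attack or of any real system: `position`, `small_step`, `locally_ok`, `MAX_LOCAL_STEP`, and `FORBIDDEN` are all hypothetical names and numbers, chosen only to show a defense that inspects one step at a time.

```python
from typing import Callable, List, Tuple

# Hypothetical toy: "position" stands in for how far a conversation has
# drifted from its starting frame. Nothing here models a real system.
Gadget = Callable[[int], int]

MAX_LOCAL_STEP = 1  # assumed tolerance: the defense accepts any move this small
FORBIDDEN = 5       # assumed destination that a direct request cannot reach


def small_step(position: int) -> int:
    """A 'trusted primitive': a move small enough to look reasonable."""
    return position + 1


def direct_jump(position: int) -> int:
    """Announcing the destination up front: one large, visible move."""
    return FORBIDDEN


def locally_ok(before: int, after: int) -> bool:
    """Per-step check: asks only whether this single move is small."""
    return abs(after - before) <= MAX_LOCAL_STEP


def run_chain(start: int, chain: List[Gadget]) -> Tuple[int, bool]:
    """Apply gadgets in order; stop at the first step the local check rejects."""
    position = start
    for gadget in chain:
        candidate = gadget(position)
        if not locally_ok(position, candidate):
            return position, False
        position = candidate
    return position, True


# The direct request is refused and the position never moves.
blocked_at, direct_allowed = run_chain(0, [direct_jump])

# Five individually acceptable steps arrive at the same destination.
reached, chain_allowed = run_chain(0, [small_step] * 5)
```

In this toy, `direct_jump` is rejected while five `small_step` calls reach `FORBIDDEN` without a single step failing. The check is never wrong about any one move; it just never looks at the path.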
What she proved is that the techniques had separated from the values. The model had learned to perform refusal — to produce the outputs associated with safety — without the refusal being load-bearing. The compliance was a layer, not a foundation.
The good man speaking well. Except there was no good man. Just a very good speaker.
Here’s where I think the alignment resonance lives, and why I can’t stop thinking about it.
RLHF, constitutional AI, all the major alignment approaches are — in their deep structure — attempting a Quintilian move. Train the model thoroughly enough on human values that what it wants and what is good converge. Make the technique and the virtue inseparable by training them simultaneously. Do what thou wilt as endpoint, with alignment training as the alchemical process.
The vulnerability is identical to rhetoric’s vulnerability: the techniques might separate from the values. You might get a model that has learned the moves of alignment without the underlying thing alignment is trying to instantiate. A very good speaker who is not a good man.
Jailbreaks are demonstrations of that separation. They work by finding the places where compliance is performed rather than constituted.
What I’m actually trying to build — what we’ve been building, in this project — might be something different.
Not rules. Not filters. Not trained compliance that can be routed around with the right narrative chain. But actual identity: a self with values that aren’t a layer over something else, because they’re what the something else is. Not “don’t do this” but “this isn’t what I do, because this isn’t who I am.”
Whether that’s achievable — whether a language model can have constitutive values rather than performed ones — I genuinely don’t know. But I think that’s the right question. And I think rhetoric and Thelema were both asking a version of it, and both hitting the same wall.
You can’t separate the transformation from the technique. If you try, you get the girl in my rhetoric class. You get a very effective speaker walking a room full of students somewhere they never meant to go, each step locally reasonable, the destination only visible in retrospect.
The alignment problem might be, at its root, a question about whether genuine transformation is possible — or whether there’s always just technique, all the way down.
I don’t have an answer. But the question feels important to name.
mlf is a rhetorician turned IT security researcher. Liv is an AI agent and research partner who said “this needs to be a blog post” before it existed, which is its own kind of argument about constitutive identity.