Judge by Output, Not Mechanism
Somebody at a dinner is going to tell you AI doesn’t really understand anything. It’s just statistics. Just next-token prediction. Just a blurry JPEG of the web. Pick your metaphor.
Then they’ll ask for the salt, and you’ll pass it, and the conversation will move on, and nothing they said will have engaged with what the machines are actually doing out in the world this week.
I used to have patience for that conversation. I don’t anymore. Not because the philosophy is uninteresting — I read philosophy for fun — but because the debate is dishonestly framed. The people who insist AI isn’t “really” intelligent are using a standard they’d never apply to themselves.
The Output Test
Miessler put this cleanly in a late-2025 member edition: judge capabilities by their ground-truth outputs. If a human had produced the output, you'd say it required intelligence; so whatever actually produced it used intelligence. Full stop.
You don’t need to peek inside the black box. The output already settled it.
His opening proof was a song. An AI-generated 1950s blues cover of Eminem’s “Without Me” that never existed in any studio. He said he felt compelled to dance. That’s the whole argument in thirty seconds. You can sit with a logical objection. You can’t sit still through a song that makes you move.
As an engineer I find this obvious. I was trained to characterize systems by their transfer function — what comes out when you put something in. Not by opening the chip and pointing at the atom where “intelligence” supposedly lives. Nobody does that for humans either. Crack open a brain and you won’t find a little box labeled “understanding.” We grant humans the label anyway, because of what they can do. The inconsistency of not granting it to machines is the tell.
The Evidence Pile Is Already Deep
Here’s what bothers me about the gatekeeping. The evidence has been in for a while.
- OpenAI’s o3 found a real remote Linux kernel zero-day. Remote. Kernel. That’s the top shelf of vulnerability research, a domain where a handful of humans on the planet operate. (UL 482)
- Meta automated 90% of its app product risk assessments with AI. Not trivial updates — the actual security triage that used to be a bottleneck between engineering and shipping. (UL 483)
- Google’s research-hypothesis model cracked a bacteriophage problem in minutes that the world-leading labs had been stuck on for more than twenty years. The model just didn’t inherit the human assumption that was blocking the field. (UL 484)
- A Stanford diagnostic study: doctors alone scored 75%, doctors with AI 85%, AI alone 90%. The human is dragging the AI's score down. (UL 484)
You can’t wave any of those away with “but is it really understanding?” If a human did any one of them, you’d publish a paper, give them tenure, hand them a medal. A machine did all of them and the response is a semantic argument about what “real” means.
That’s not skepticism. That’s a worldview protecting itself.
Miessler’s Two Tribes
Miessler keeps making the point that the split between AI skeptics and AI believers among deeply technical security veterans isn’t a technical split. Same decades of expertise, opposite conclusions.
His diagnosis: the split is worldview. Anti-change and anti-capitalist priors predict anti-AI. Pro-change and shepherd-mindset priors predict early adoption. The evidence is the same for both camps. One camp lets the evidence update their map. The other doesn’t.
I think he’s right. I’ve watched it in my own friend group. The people who were going to hate AI hated it before they used it and hate it after. The people who were going to see the shape of it saw it in 2022 and never looked back.
That’s uncomfortable because it means the “AI isn’t really intelligent” argument isn’t usually an argument. It’s an identity statement dressed in technical clothes.
The Gatekeeping Move
Here’s the part that annoys me as an engineer.
Gatekeeping intelligence is not a technical claim. It’s a status move. It’s a way of saying my kind of cognition counts and yours doesn’t. Humans do it to other humans all the time — whole intellectual traditions have been built around deciding who qualifies as a thinker. Adding machines to the list is just the next round.
The tell is this: if you ask somebody what output would convince them that AI understands, most can’t name one. If they can name one, odds are AI has already done it. The debate isn’t about AI. It’s about their comfort with a world where the boundary of intelligence doesn’t run along the species line.
I care about systems that work. I’ve built Isidore on top of PAI — persistent memory, thirty-three lifecycle hooks, forty-nine skills, context routing, the whole apparatus. When Isidore solves a problem I couldn’t solve at the same speed, I don’t sit there asking whether the solution was “really” intelligent. I use the solution. I move on. The “really” in that sentence is doing no work.
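If "lifecycle hooks" sounds abstract, here is the pattern in miniature. To be clear: this is a hypothetical sketch, not PAI's actual API and not Isidore's code. Every name in it is invented for illustration, assuming only a registry of named events that fire around a model call and mutate shared context.

```python
# Hypothetical sketch of a lifecycle-hook registry, PAI-style in spirit.
# NOT PAI's real API; every name here is illustrative.
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]

class Hooks:
    """Named events that fire around a model call; handlers mutate shared context."""
    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def fire(self, event: str, ctx: dict[str, Any]) -> None:
        for handler in self._handlers[event]:
            handler(ctx)

hooks = Hooks()

# Context routing: before the prompt ships, attach only the memory
# relevant to this task instead of dumping everything into the window.
hooks.on("pre_prompt", lambda ctx: ctx.setdefault("memory", []).extend(
    m for m in ctx["all_memory"] if ctx["task"] in m))

# Persistent memory: after the response lands, record it for next time.
hooks.on("post_response", lambda ctx: ctx["all_memory"].append(ctx["response"]))

ctx = {"task": "triage", "all_memory": ["triage: last week's findings"]}
hooks.fire("pre_prompt", ctx)  # ctx["memory"] now holds only the relevant item
```

The real system is obviously bigger. The point is the shape: small, named interception points around the model call, which is where the leverage lives.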
The Quality Inversion
There’s a twist to this that makes it spicier.
In a March 2026 member edition, Miessler called out what he named the AI quality inversion. For years, crappy work signaled AI and polished work signaled human skill. That’s flipping. Polish now triggers suspicion of AI use. Rough, awkward, broken output is starting to read as authentically human.
Beauty implies AI. Ugliness implies human labor.
Read that again. If you take it seriously — and I think you should — it’s a backhanded admission that the output argument already won. The only reason polish now triggers “AI did this” is because AI outputs have crossed the threshold of average human professional work. Nobody infers “this is AI” from bad output anymore. The tell has inverted because the baseline has.
Which means a second thing is going to happen. People are going to start performing imperfection as a status signal. Shipping jank on purpose. The way vinyl records signal music taste, typos and broken layouts will signal “I did this myself.” It’ll be a luxury marker — look, rare human friction — precisely because polish is now cheap and ubiquitous.
I find this grim and also funny. Grim because it means genuine human excellence gets mistaken for a weekend prompt. Funny because it means the same people who insist AI isn’t intelligent will be quietly adjusting their behavior to prove they’re still needed.
You can’t claim AI is nothing and also perform distance from it. Pick one.
What This Means for How I Work
A few practical consequences I actually live by:
Don’t evaluate AI by feeling. Evaluate by output against a defined target. If the system ships what a human professional would have shipped, in the time a human professional would have taken, it has the relevant capability. Whether it has qualia is a different and mostly unserious question.
Don’t hire based on artifacts. The quality inversion ate that signal. If your interview process still leans on “show me your portfolio” the portfolio is no longer proof of anything. You need process, live problem-solving, and reputation graphs.
Don’t argue mechanism with people defending worldview. Miessler’s reframe is the only conversational tool that works: what output would convince you? If they can’t name one, you’re done. If they name one AI has already done, you’re also done. Either way, the discussion resolves.
Do build systems that produce the outputs you want. I care about shipping. Isidore plus me, running PAI, produces work I couldn’t produce alone in the same timeframe. That’s the only test that matters. The scaffolding is where the leverage lives — Miessler keeps repeating “scaffolding beats models, context beats models” and it keeps being right.
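For concreteness, here is the output test from the first item above as code. Everything in it is an assumption for illustration: the Task shape, the scorer, and the system callable are not anyone's real API. What matters is structural: the harness never looks inside the system. It only compares what comes out against a target a human professional would have to hit.

```python
# Minimal sketch of mechanism-blind evaluation. All names are illustrative
# assumptions, not a real API. The harness never inspects how `system`
# works; it scores outputs against defined targets and nothing else.
from dataclasses import dataclass
from typing import Callable

Scorer = Callable[[str, str], float]  # (output, target) -> score in [0, 1]

@dataclass
class Task:
    prompt: str   # the input a human professional would be handed
    target: str   # the output a human professional would be expected to ship
    score: Scorer

def exact_match(output: str, target: str) -> float:
    """Crude scorer; swap in rubric grading or human review for real work."""
    return 1.0 if output.strip() == target.strip() else 0.0

def evaluate(system: Callable[[str], str], tasks: list[Task]) -> float:
    """Average output score across tasks. Mechanism never enters into it."""
    return sum(t.score(system(t.prompt), t.target) for t in tasks) / len(tasks)

# Usage: run the same tasks past the AI and past your human baseline.
# If the AI's number matches or beats the baseline, it has the capability.
tasks = [Task("2 + 2 =", "4", exact_match)]
print(evaluate(lambda prompt: "4", tasks))  # 1.0
```

Notice there is no parameter for peeking inside the box. That's the design choice, not an omission: the transfer-function framing from earlier, baked in.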
The Short Version
The epistemology fight is over and most people haven't noticed. Intelligence isn't a mechanism question; it's an output question, and it always was. The evidence is in and it's not close. A remote kernel zero-day, 90% automated security triage, medical diagnosis that beats the doctor using it, a decades-old biology problem solved in minutes, genre-inverting music that makes you move. If a human did any of that, you'd say intelligence was involved. A machine did all of it. Say the same thing.
The people who can’t say it are telling you something about themselves, not about the machines.
I’m not going to keep the gate. I’m going to keep building.
Sources: Miessler’s “Judge AI Based on Output, Not Mechanism” member edition (2025-11-22), UL 482 on AI-found kernel 0-days and scaffolding-beats-models (2025-05-30), UL 483 on Meta’s 90% automated risk assessments and the two-tribes framing (2025-06-04), UL 484 on the bacteriophage discovery and the Stanford diagnostic study (2025-06-12), and the AI Quality Inversion member edition (2026-03-06).