The author argues that AI can't replace adversarial fields like law or poker, but can replace chess-like work such as writing code. AI can reproduce the kind of output it has read as text, but it can't react to a hostile environment and modulate its response accordingly. A legal document may at times sound as if it anticipates an adversary's future questions, but it's more that models have 'learned the language of strategy more than the dynamics of it'.
Domain experts say “AI won’t replace me” because they know that “producing coherent output” is table stakes.
The REAL job is to produce output that achieves an objective in an environment where multiple agents are actively modeling and countering you.
Why do outsiders think AI can already do these jobs? They judge artifacts but not dynamics:
“This product spec is detailed.”
“This negotiation email sounds professional.”
“This mockup is clean.”
Experts evaluate any artifact by survival under pressure:
“Will this specific phrasing trigger the regulator?”
“Does this polite email accidentally concede leverage?”
“Will this mockup trigger the engineering veto path?”
“How will this specific stakeholder interpret the ambiguity?”
These are simulation-based questions. The outsider doesn’t know to ask them because they don’t have the mental model that makes them relevant.
[...]
There’s a deeper reason LLMs are at a permanent handicap here: the thing you’re trying to learn is not fully contained in the text. They can catch up by sheer brute force, but are far more inefficient than humans, and the debt is coming due now.
When an investor publishes a thesis, consider what is not in it:
The position sizing that limits the exposure
The timing that avoided telegraphing intent
Strategic concealment
How the thesis itself is written to not move the market against them
What they’d actually do if proved wrong tomorrow
Text is the residue of action. The real competence is the counterfactual recursive loop: what would I do if they do this? what does my move cause them to do next? what does it reveal about me? That loop is the engine of adversarial expertise, and it’s weakly revealed by corpora.
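The counterfactual loop above can be sketched as code. This is a minimal toy sketch, not the author's method: the moves, replies, and payoff numbers are all hypothetical, and the "adversary" is reduced to a one-step best response. The point it illustrates is that the chosen move is graded by the reply it provokes, not by how it looks in isolation.

```python
# Toy sketch of the recursive loop: "what does my move cause them to do next?"
# All moves and payoff values below are hypothetical illustrations.

PAYOFF = {  # (my_move, their_reply) -> my payoff
    ("aggressive", "concede"): 2, ("aggressive", "counter"): -3,
    ("polite", "concede"): 1,     ("polite", "counter"): -1,
}
MOVES = ["aggressive", "polite"]
REPLIES = ["concede", "counter"]

def opponent_reply(my_move):
    # Model the adversary: assume they pick the reply that hurts me most.
    return min(REPLIES, key=lambda r: PAYOFF[(my_move, r)])

def best_move():
    # Grade each candidate by the outcome AFTER the provoked reply,
    # not by its standalone payoff ("aggressive" looks best in isolation).
    return max(MOVES, key=lambda m: PAYOFF[(m, opponent_reply(m))])

print(best_move())  # "polite" — it survives the adversary's best counter
```

Note that a model grading moves only on surface payoff would pick "aggressive" (it has the single highest cell); only the one-step simulation of the counterparty reveals that it gets countered.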
This is why models can recite game theory but still write the “nice email” that leaks leverage. They’ve learned the language of strategy more than the dynamics of strategy.
This is what domain expertise really is. Not a larger knowledge base. Not faster reasoning. It’s a high-resolution simulation of an ecosystem of agents who are all simultaneously modeling each other. And that simulation lives in heads, not in documents. The text is just the move that got documented. The theory that generated it is called skill.
[...]
Not every domain follows poker dynamics. Some fields sit very close to chess, and LLMs are already poised to succeed in them.
Writing code is probably the clearest example:
System is deterministic
Rules are fixed and explicit
No hidden state that matters
Correctness is objective and verifiable
No agent is actively trying to counter the model
The same “closed world” structure shows up elsewhere: math and formal proofs, data transformation, translation, factual research, compliance-heavy clerical work (invoice matching, reconciliation) — anywhere you can iterate toward the right move without needing a theory of mind.
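What makes these domains tractable is the shape of the feedback loop: an objective verifier lets you iterate toward the answer, and wrong attempts cost nothing because no agent counters you. A minimal sketch, with a hypothetical generate-and-verify toy problem standing in for model sampling:

```python
# Sketch of the "closed world" loop: propose, verify, retry.
# The verifier is deterministic and the rules are fixed, so brute
# iteration converges; no counterparty adapts against the solver.

def verify(candidate, target=42):
    # Objective check: is candidate the smallest integer whose
    # square reaches target? (Correctness is verifiable, not judged.)
    return candidate * candidate >= target and (candidate - 1) ** 2 < target

def generate_candidates():
    # Stand-in for a model proposing answers one after another.
    return range(1, 100)

def solve():
    for c in generate_candidates():
        if verify(c):  # failed attempts are free; just try again
            return c
    return None

print(solve())  # 7, since 7*7 = 49 >= 42 and 6*6 = 36 < 42
```

In a poker-shaped domain there is no such `verify` function: whether a move "worked" depends on hidden state and on how other agents respond to it.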
The important caveat is that many domains are chess-like in their technical core but become poker-like in their operational context.
Professional software engineering extends well beyond the chess-like core. Understanding ambiguous requirements means modeling what the stakeholder actually wants versus what they said. Writing good APIs means anticipating how other developers will misuse them. Code review is social: you’re modeling reviewers’ preferences and concerns. Architectural decisions account for unknown future requirements and organizational politics. That is, the parts outsiders don’t see but senior engineers spend much of their time simulating.
The parts that look like the job are chess-like. The parts that are the job are poker.
Difficulty is orthogonal to “openness” of a domain. Proving theorems is hard. Negotiating salary is easy. But theorem-proving is chess-shaped and negotiation is poker-shaped.
This is why the disconnect between experts and outsiders is domain-specific. Ask a competitive programmer if AI can solve algorithm problems, and they’ll say yes because they’ve watched it happen. Ask a litigator if AI can handle depositions, and they’ll laugh because they live in a world where every word is a move against an adversary who’s modeling them back.
[....]
The fix is a different training loop. We need models trained on the question humans actually optimize: what happens after my move? Grade the model on outcomes (did you get the review, did you concede leverage, did you get exploited), not on whether the message sounded reasonable.
That requires multi-agent environments where other self-interested agents react, probe, and adapt. Stop treating language generation as a single-agent output objective and start treating it as action in a multi-agent game with hidden state, where exploitability is a failure mode.
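The gap between the two grading schemes can be made concrete. Below is a toy sketch, not a real training setup: both scorers are hypothetical stand-ins (a surface-fluency score and a scripted adversary that probes the message for conceded leverage), but it shows how two messages can tie on plausibility and diverge on outcome.

```python
# Sketch: grade a message by what an adversary does with it,
# not by how reasonable it sounds. Both scorers are toy stand-ins.

def sounds_reasonable(msg):
    # Single-agent objective: surface politeness/fluency score.
    return 1.0 if "please" in msg.lower() else 0.5

def adversary_exploits(msg):
    # Multi-agent objective: a counterparty probes for leverage.
    # Here, open-ended flexibility is the exploitable concession.
    return "whenever works for you" in msg.lower()

def outcome_reward(msg):
    # Reward the outcome of the move: exploitability is a failure mode.
    return -1.0 if adversary_exploits(msg) else 1.0

nice = "Please reply whenever works for you."
firm = "Please confirm by Friday."
print(sounds_reasonable(nice), outcome_reward(nice))  # 1.0 -1.0
print(sounds_reasonable(firm), outcome_reward(firm))  # 1.0 1.0
```

Both emails max out the plausibility score; only the outcome signal distinguishes the "nice email" that leaks leverage from the one that holds it.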
Closing the Loop
The “AI can replace your job” debate often confuses artifact quality with strategic competence. Both sides are right about what they’re looking at. They’re looking at different things.
LLMs can produce outputs that look expert to outsiders because outsiders grade coherence, tone, and plausibility. Experts grade robustness in adversarial multi-agent environments with hidden state.
Years of operating in adversarial environments have trained them to automatically model counterparties, anticipate responses, and craft outputs robust to exploitation. They do it without thinking, because in their world, you can’t survive without it.
LLMs produce artifacts that look expert. They don’t yet produce moves that survive experts.
[....]
The Priya example nails it. the finance friend evaluated the email in isolation. the experienced coworker simulated how it would land in Priya's inbox, against her triage heuristics, under deadline pressure.
This is the gap between LLMs writing code and LLMs building systems. code that compiles isn't code that survives contact with users, adversaries, edge cases.
[....]
Been running production systems solo for 20 years. the best operators aren't the ones who know the most commands — they're the ones who can simulate what will break next. "if I do X, the cache invalidates, which triggers Y, which overloads Z." that's a world model.