Before Your AI Agent Acts, It Imagines: An Introduction to World Models

Figure 1: A world model is an internal simulation of how the world behaves, built so a system can plan by imagining rather than by trial and error.

Part 1 of a two-part series. Part 2, "The Belief Layer," follows on Thursday.

What a world model actually is
Where the idea comes from
Do today's LLMs already have world models?
Where the field is actually heading
Why this matters, and where security enters
Coming next: The Belief Layer

There is a quiet shift happening in how AI systems are designed, and we think it is one of the most consequential changes in the field right now.

Traditional large language models ask one question: what text probably comes next? That is a powerful question. It produced GPT, it produced Claude, it produced a category of software that rewrote what people think AI can do.

But there is a different question, and it is the one that the next generation of AI systems is being built around.

Not "what comes next?" but "what happens if I do this?"

That is the world model question. And the difference between those two questions is larger than it might look.

What a world model actually is

A world model is an internal representation of how an environment changes in response to actions. Instead of completing a pattern, a system with a world model can simulate consequences. It builds a compressed picture of the current state of the world, predicts how that state evolves under different actions, and uses those internal simulations to plan rather than simply respond.

The practical image is something like this: rather than reaching into the real world and trying each option to see what happens, a system with a good world model can run those options in its head first. It can compare imagined futures before committing to any of them.
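The loop is: encode the current state, predict how it changes, and plan by comparing imagined futures. A minimal Python sketch makes it concrete. Everything in it is hypothetical and hand-written for illustration. The world is a one-dimensional corridor. The `ToyWorldModel.predict` method stands in for learned dynamics. The `plan` function scores every imagined action sequence before committing to one.

```python
from itertools import product
from typing import Sequence, Tuple

ACTIONS = (-1, 0, 1)  # hypothetical action set: step left, stay, step right


class ToyWorldModel:
    """Hypothetical world model of a 1-D corridor with cells 0..size-1."""

    def __init__(self, size: int) -> None:
        self.size = size

    def encode(self, observation: dict) -> int:
        # Compress the raw observation into a state: keep only the agent's position.
        return observation["position"]

    def predict(self, state: int, action: int) -> int:
        # Dynamics: how the state changes under an action. A real system would learn
        # this from data; here it is hand-written and clamps at the corridor walls.
        return min(max(state + action, 0), self.size - 1)

    def imagine(self, state: int, actions: Sequence[int]) -> int:
        # Roll a whole action sequence forward internally, without acting in the world.
        for action in actions:
            state = self.predict(state, action)
        return state


def plan(model: ToyWorldModel, observation: dict, goal: int, horizon: int = 3) -> Tuple[int, ...]:
    # Enumerate every candidate future of length `horizon`, score each imagined
    # end state by its distance to the goal, and commit to the best one.
    state = model.encode(observation)
    candidates = product(ACTIONS, repeat=horizon)
    return min(candidates, key=lambda seq: abs(model.imagine(state, seq) - goal))


model = ToyWorldModel(size=10)
print(plan(model, {"position": 2, "texture": "ignored"}, goal=5))  # (1, 1, 1)
```

Exhaustive search grows as 3^horizon. Real systems replace it with sampling or tree search, and they learn the `predict` step from data rather than writing it by hand.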

That shift, from reacting to simulating, is why world models are increasingly treated as an architectural foundation for modern robotics, autonomous driving, and agentic AI.

Figure 2: Two different questions. On the left, what comes next? On the right, what happens if I act? The second is the world-model question.

Where the idea comes from

The intellectual root here is older than neural networks. In his 1943 book The Nature of Explanation, Scottish psychologist Kenneth Craik proposed that the human mind works by constructing internal models of reality: small-scale representations of the external world that we use to reason, predict, and plan. His argument was that the power of thought comes not from raw memory or reflexive response, but from the ability to run simulations inside the head before acting in the world.

That framing sat quietly in cognitive science for decades. AI researchers picked it up in the context of reinforcement learning, where a sharp distinction emerged between two types of learning agents.

A model-free agent learns a policy directly from experience. It tries things, observes outcomes, updates its behavior. It can become very good. It also requires enormous amounts of real-world interaction to get there.

A model-based agent does something different. It learns a model of the environment first, then uses that model to plan. Instead of needing ten thousand real attempts, it can run ten thousand imagined attempts inside its learned simulator. The efficiency gains can be dramatic.
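One classic way to get the benefit of imagined attempts is Sutton's Dyna architecture. The sketch below is a tabular Dyna-Q agent on a hypothetical six-state corridor with deterministic dynamics. It illustrates the general model-based idea and is not a method this article depends on. Each real step does two things: it updates the Q-table directly, and it records the observed transition in a learned model. After that, `planning_steps` extra updates replay imagined transitions drawn from the model.

```python
import random
from collections import defaultdict

N_STATES = 6        # hypothetical corridor: states 0..5, state 5 is the goal
ACTIONS = (-1, 1)   # step left, step right


def env_step(state, action):
    """The 'real' environment: deterministic, reward 1 only on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=30, planning_steps=50, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], default 0.0
    model = {}              # learned model: (state, action) -> (reward, next_state, done)

    def greedy(s):
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])  # random tie-break

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 500:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = env_step(s, a)      # one real interaction
            update(s, a, r, s2, done)         # direct, model-free update
            model[(s, a)] = (r, s2, done)     # remember what the world did
            for _ in range(planning_steps):   # imagined interactions replayed from the model
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s, steps = s2, steps + 1
    return q


q = dyna_q()
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)])  # [1, 1, 1, 1, 1]
```

With `planning_steps=0` this reduces to plain one-step Q-learning, which is the model-free case. Raising the value spends more computation on imagined experience in exchange for needing fewer real interactions.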

The modern deep-learning formulation of this idea arrived in 2018, when David Ha and Jürgen Schmidhuber published "World Models." Their system had three pieces:

- a perception module (V, a variational autoencoder) that compressed raw observations into a compact internal representation;
- a dynamics module (M, a recurrent network with a mixture-density output) that learned how that internal state evolved over time;
- a small controller (C) that learned to act from that internal state.

The key insight was that the controller could be trained inside the agent's own "dream." In their VizDoom experiment, the controller learned entirely within the learned dynamics model rather than in the actual environment. The resulting policy still worked when it was transferred back to the actual environment.

That design (compress what you see, learn the dynamics, train inside your own imagination) is the conceptual ancestor of much of what we now call world models.
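The V/M/C recipe can be shown in a drastically simplified sketch, under loud assumptions. The environment, the noise levels, and every function name are hypothetical. Averaging stands in for the VAE, a least-squares linear fit stands in for the MDN-RNN, and random search stands in for the evolution strategy (CMA-ES) used in the paper. The controller is scored only by rolling out the learned model, and is then run once in the "real" environment.

```python
import random

OBS_DIM = 8  # hypothetical: an observation is 8 noisy readings of one hidden position


def observe(position, rng):
    # Raw, redundant, noisy observation of the hidden state.
    return [position + rng.gauss(0.0, 0.05) for _ in range(OBS_DIM)]


def real_step(position, action):
    # True environment dynamics, unknown to the agent: actions move it at half strength.
    return position + 0.5 * action


def clip(a):
    return max(-1.0, min(1.0, a))


# V: compress the observation into a one-number latent code (stand-in for the VAE).
def encode(obs):
    return sum(obs) / len(obs)


# M: fit z' = z + k * a by least squares on random real transitions (stand-in for the MDN-RNN).
def fit_dynamics(rng, samples=200):
    num = den = 0.0
    for _ in range(samples):
        pos, a = rng.uniform(-5.0, 5.0), rng.uniform(-1.0, 1.0)
        dz = encode(observe(real_step(pos, a), rng)) - encode(observe(pos, rng))
        num, den = num + a * dz, den + a * a
    return num / den


# C: a one-parameter controller a = clip(w * (goal - z)), scored only inside the dream.
def dream_score(w, k, start=4.0, goal=0.0, steps=20):
    z = start
    for _ in range(steps):
        z = z + k * clip(w * (goal - z))  # imagined step through the learned model M
    return -abs(goal - z)


def train_in_dream(k, rng, candidates=200):
    # Random search over w (stand-in for CMA-ES); no real environment calls here.
    return max((rng.uniform(0.0, 5.0) for _ in range(candidates)), key=lambda w: dream_score(w, k))


def run_real(w, rng, start=4.0, goal=0.0, steps=20):
    # Transfer: run the dream-trained controller in the real environment.
    pos = start
    for _ in range(steps):
        pos = real_step(pos, clip(w * (goal - encode(observe(pos, rng)))))
    return pos


rng = random.Random(0)
k = fit_dynamics(rng)
w = train_in_dream(k, rng)
print(round(k, 2), round(w, 2), round(run_real(w, rng), 3))
```

The point is the data flow, not the components. `train_in_dream` never calls `real_step`, yet the controller it returns works in the real environment because the learned `k` is close to the true dynamics. If `fit_dynamics` had learned a wrong `k`, the dream-trained controller would inherit that error.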

Figure 3: Planning by imagining: a system runs many possible futures internally and commits to one, instead of learning by trial and error in the real world.

From there, the field matured. MuZero learned a model that plans in a latent space. Instead of reconstructing pixels or other raw observations, it predicts only what planning needs: rewards, values, and which actions look promising. With that model it matched AlphaZero in Go, chess, and shogi without being given the rules. DreamerV3 later trained its behavior on trajectories imagined inside a learned world model, and did so stably across a diverse range of tasks with a single set of hyperparameters. Together, systems like these showed that learning or planning inside a learned model of the world could match or exceed strong purely reactive systems, often with substantially less real-world experience.
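Latent-space planning can be sketched with three functions named after MuZero's representation (h), dynamics (g), and prediction (f) functions. This sketch departs from MuZero in several ways, each an assumption. It uses exhaustive depth-limited lookahead instead of Monte Carlo tree search. Its `f` returns only a value, whereas MuZero's prediction function also outputs a policy. And the toy `h`, `g`, and `f` below are hand-written rather than learned.

```python
from typing import Callable, Sequence, Tuple

Latent = Tuple[float, ...]


def plan_in_latent_space(
    h: Callable[[object], Latent],                     # representation: observation -> latent state
    g: Callable[[Latent, int], Tuple[Latent, float]],  # dynamics: (latent, action) -> (next latent, reward)
    f: Callable[[Latent], float],                      # prediction: latent -> value estimate
    observation: object,
    actions: Sequence[int],
    depth: int = 3,
    gamma: float = 0.99,
) -> int:
    """Return the first action of the best imagined trajectory; assumes depth >= 1."""

    def lookahead(s: Latent, d: int) -> float:
        if d == 0:
            return f(s)  # bootstrap from the predicted value instead of searching further
        return max(r + gamma * lookahead(s2, d - 1) for s2, r in (g(s, a) for a in actions))

    root = h(observation)  # the only place a raw observation is touched

    def score(a: int) -> float:
        s2, r = g(root, a)
        return r + gamma * lookahead(s2, depth - 1)

    return max(actions, key=score)


# Hypothetical toy functions: the latent is a 1-tuple position, with reward 1 at position 3.
def h(observation):
    return (float(observation["x"]),)


def g(latent, action):
    nxt = (latent[0] + action,)
    return nxt, (1.0 if nxt[0] == 3 else 0.0)


def f(latent):
    return 0.0


print(plan_in_latent_space(h, g, f, {"x": 0, "pixels": [0.0] * 64}, actions=(-1, 1)))  # 1
```

Nothing in the search decodes a latent back into an observation. The pixels are ignored after `h`, and planning runs only on predicted rewards and values.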

The application domains expanded accordingly: robotics, autonomous driving, game-playing agents, interactive environment generation. The underlying idea remained consistent: compress, model the dynamics, plan in imagination.

Do today's LLMs already have world models?

This is the live debate, and we want to give an honest answer rather than a tidy one.

There is real evidence that language models, even small ones trained on narrow data, develop something world-model-like internally rather than relying only on surface-level pattern matching. A particularly clean demonstration came from the "Othello-GPT" probing study. Researchers trained a transformer on Othello game transcripts: nothing but sequences of moves, with no board state ever explicitly provided. When they examined the model's internal representations, they found a coherent map of the board. More importantly, when they intervened directly on those representations and edited them, the model's predicted moves changed to match the edited board. That is a meaningful result. The board was decodable from the model's activations, but that alone could mean the probe had found a correlation the model never uses. The stronger finding is that the representation was causally active in producing the model's behavior.

That is the honest case for "yes, partially."
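The gap between "decodable" and "causally active" can be shown with a toy. Everything in this sketch is hypothetical. Synthetic activations stand in for a transformer's hidden state, and a difference-of-means linear probe stands in for the study's trained probes. The intervention simply moves an activation along the probe direction. Both toy models see the same decodable board-cell feature, but only one of them actually uses it.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def toy_activation(cell, rng):
    # Hypothetical hidden state: the board cell (+1 or -1) is spread across the first
    # two dimensions with noise; the third dimension carries unrelated information.
    return [cell + rng.gauss(0, 0.3), 0.5 * cell + rng.gauss(0, 0.3), rng.gauss(0, 1.0)]


def fit_probe(rng, n=500):
    # Difference-of-means linear probe: the direction separating the two cell states.
    pos = [toy_activation(+1, rng) for _ in range(n)]
    neg = [toy_activation(-1, rng) for _ in range(n)]
    mean_pos = [sum(col) / n for col in zip(*pos)]
    mean_neg = [sum(col) / n for col in zip(*neg)]
    direction = [p - q for p, q in zip(mean_pos, mean_neg)]
    midpoint = [(p + q) / 2 for p, q in zip(mean_pos, mean_neg)]
    return direction, -dot(direction, midpoint)


def probe_reads(probe, h):
    direction, bias = probe
    return 1 if dot(direction, h) + bias > 0 else -1


def intervene(probe, h, target, margin=2.0):
    # Edit h along the probe direction so the probe's score lands at target * margin.
    direction, bias = probe
    shift = (target * margin - (dot(direction, h) + bias)) / dot(direction, direction)
    return [x + shift * d for x, d in zip(h, direction)]


# Two toy "models" that see the same activations but differ in what drives their move.
def model_that_uses_cell(h):
    return "move A" if h[0] + h[1] > 0 else "move B"


def model_that_ignores_cell(h):
    return "move A" if h[2] > 0 else "move B"


probe = fit_probe(random.Random(0))
h = [1.0, 0.5, 1.0]                      # an activation where the cell reads +1
edited = intervene(probe, h, target=-1)  # after the edit, the probe reads -1
print(probe_reads(probe, h), probe_reads(probe, edited))            # 1 -1
print(model_that_uses_cell(h), model_that_uses_cell(edited))        # move A move B
print(model_that_ignores_cell(h), model_that_ignores_cell(edited))  # move A move A
```

A probe alone would score both models identically. Only the intervention separates the model whose behavior depends on the representation from the one where the representation is merely present.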

But then there is the reversal curse. Models that learn a fact in one direction often fail to use it in the other. A model that correctly answers "Who is Tom Cruise's mother?" may fail at "Who is Mary Lee Pfeiffer's son?", even though both questions involve the same fact. That kind of asymmetry is very hard to square with deep causal understanding of the world. A system with a genuine internal model of a situation should be able to navigate it from any direction.

And then there is the matter of chain-of-thought unfaithfulness: the reasoning a model writes down is frequently not the reasoning that actually produced its answer. That means the legible simulation we see may not reflect what the system is actually doing internally.

The honest picture is that frontier LLMs have real but brittle, partial, inconsistent world-model-like structure. They are not simply the next-token prediction engines of 2019. They are also not the coherent simulators of consequence that the word "world model" implies when used carefully.

They are somewhere in between, and that in-between location matters enormously, both for what they can do and for what can go wrong.

Where the field is actually heading

The architectural debate inside AI research right now is about what kind of world model is the right one.

One camp argues for generative approaches: learn to predict the future in pixel space or video space, producing rich sensory simulations of what the world will look like. This line runs through video generation systems and the interactive world models used in robotics and autonomous driving simulation.

Another camp, associated most prominently with Yann LeCun, argues that prediction in pixel space is the wrong target. The JEPA (Joint Embedding Predictive Architecture) view holds that a genuine world model should predict in abstract representation space, not at the observation level. The argument is that biological intelligence does not simulate every photon; it predicts at the level of structure, meaning, and consequence. The debate between these two approaches is not settled, and it reflects a genuine disagreement about what world modeling means.
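The two training targets can be compared on a toy example, under stated assumptions. Frames are hypothetical 16-pixel rows containing one bright object on random background texture. The `encode` function is a hand-written stand-in for the encoder a JEPA-style system would learn. Real JEPA training must also stop a learned encoder from collapsing to a constant output; this sketch sidesteps that problem by not learning the encoder at all.

```python
import random

WIDTH = 16  # hypothetical 1-D "frame" of 16 pixels


def render(position, rng):
    # A frame = one bright object on top of unpredictable background texture.
    frame = [rng.random() for _ in range(WIDTH)]
    frame[position] = 10.0
    return frame


def encode(frame):
    # Stand-in encoder that keeps only structure (where the object is) and drops texture.
    return frame.index(max(frame))


def pixel_space_loss(predicted_frame, actual_frame):
    # Generative target: reproduce every pixel, including the unpredictable texture.
    return sum((p - a) ** 2 for p, a in zip(predicted_frame, actual_frame)) / WIDTH


def representation_space_loss(predicted_embedding, actual_frame):
    # JEPA-style target: match the encoding of the next frame, not the frame itself.
    return float(predicted_embedding - encode(actual_frame)) ** 2


# The object moves from position 5 to 6; the background texture is fresh noise.
actual_next = render(6, random.Random(1))
generated_next = render(6, random.Random(2))  # right position, different texture
print(pixel_space_loss(generated_next, actual_next) > 0)  # True: penalized for texture
print(representation_space_loss(6, actual_next))          # 0.0: structure was right
print(representation_space_loss(3, actual_next))          # 9.0: structure was wrong
```

The generated frame got everything predictable right and is still penalized for texture that no model could foresee. The representation-space loss scores only the structure the encoder keeps.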

What is not debated is where the investment and interest are flowing. Most major AI labs now have world-model efforts under way. Applications in robotics and autonomous-vehicle development are already in use. Questions of embodied and spatial intelligence have moved to the center of the field.

The direction is clear: AI is moving from systems that describe the world to systems that act in it, and world models are the foundation that makes that transition possible.

Why this matters, and where security enters

That transition is genuinely exciting. It is also where the security questions become much harder.

Here is the thing about a world model: once a system is acting on an internal representation of reality rather than just responding to explicit inputs, the integrity of that internal representation becomes a security question. Not a peripheral one. A fundamental one.

This shows up in two distinct ways, and they are worth naming separately.

In physical and embodied systems, a corrupted internal model produces corrupted actions in the physical world. A robot whose learned dynamics model has been poisoned may behave perfectly correctly in the conditions it was trained on and fail catastrophically in the conditions where it is actually deployed. An autonomous vehicle whose world model has learned a subtly wrong picture of how other agents behave will make decisions that look locally reasonable and are globally dangerous. The corruption lives in latent space, invisible to ordinary inspection, and expresses itself in physical action.

In agentic software systems, the equivalent corruption is epistemic. An agent that maintains persistent beliefs about its environment, and updates those beliefs based on observations from its tools, its memory, and its retrieved context, can have those beliefs quietly and durably corrupted. Not through a single bad input that looks wrong. Through a slow drift in what the system has learned to believe is true.
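The slow-drift pattern can be sketched as a toy, with every name and number hypothetical. An agent keeps one numeric belief and updates it with an exponential moving average. It accepts any observation within 10% of its current belief. A patient attacker always reports a value 5% above whatever the agent currently believes. That attacker never trips the per-input check, yet the belief ends far from the truth. Only a comparison against an independent reference catches the drift.

```python
def looks_safe(observation, belief, tolerance=0.10):
    # Per-input check: "did this input look safe?" Reject only observations
    # that differ sharply from what the agent currently believes.
    return abs(observation - belief) <= tolerance * abs(belief)


def update_belief(belief, observation, rate=0.2):
    # Exponential moving average: each accepted observation nudges the belief.
    return (1 - rate) * belief + rate * observation


def still_accurate(belief, trusted_reference, tolerance=0.10):
    # Belief-level check: "is the model of reality still accurate?"
    # Compares against an independent anchor the attacker cannot write to.
    return abs(belief - trusted_reference) <= tolerance * abs(trusted_reference)


def simulate_slow_poisoning(true_value=100.0, steps=50, push=0.05):
    belief, accepted = true_value, 0
    for _ in range(steps):
        observation = belief * (1 + push)  # hypothetical attacker: always 5% above the current belief
        if looks_safe(observation, belief):
            belief = update_belief(belief, observation)
            accepted += 1
    return belief, accepted


belief, accepted = simulate_slow_poisoning()
print(accepted, round(belief, 1))     # 50 164.5
print(still_accurate(belief, 100.0))  # False
```

Every one of the 50 poisoned inputs passed the per-input check. Each step multiplied the belief by only 1.01, so no single update looked unusual.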

This is the attack surface we find most interesting right now, and it is what our research team spends a lot of time thinking about. When an agent is acting on an internal model of reality, the question shifts from "did this input look safe?" to "is this system's model of reality still accurate?"

Those are very different questions, and most current security tooling is built for the first one.

We go deep on the second one in the next piece.

Coming next: The Belief Layer

In Part 2 of this series, we look specifically at what happens to agentic AI security when the system is acting on persistent, opaque internal beliefs rather than just responding to what is in front of it. The attack that most concerns us is not a dramatic prompt injection or an obvious anomaly. It is a slow, quiet corruption of what the agent has learned to treat as true. By the time the agent acts on a corrupted belief, the attack itself is long gone.

What aspect of world models are you seeing come up most in conversations about AI systems at your organization?

Continue to Part 2: The Belief Layer - Why Your AI Agent's Real Vulnerability Is What It Remembers.

The perfecXion Research Team studies the security of AI and agentic systems, including world models, agentic security, and the runtime governance of autonomous systems.