2. How LLMs Actually Work

After this module you will be able to

  • Explain how LLMs generate text through next-token prediction rather than reasoning
  • Identify why tokenisation, context windows, and sampling introduce security-relevant behaviour
  • Describe the role of training, fine-tuning, and alignment in shaping model outputs
  • Recognise what LLMs fundamentally lack: persistent memory, self-awareness, and the ability to verify their own outputs

Tokens, not words

LLMs do not process words. They process tokens, which are fragments of text that may or may not align with word boundaries. The word "unbelievable" might be split into "un", "believ", "able". The phrase "SELECT * FROM" might be a single token or four, depending on the tokeniser.

This matters for security because adversarial inputs can exploit tokenisation boundaries. A word that looks benign to a human reviewer might be split in a way that changes how the model interprets it. If your security controls operate at the word level but the model operates at the token level, you have a gap.
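The mismatch can be sketched with a toy greedy longest-match tokeniser. The vocabulary below is invented for illustration; real tokenisers such as BPE use learned merge rules, but the boundary behaviour is the same in spirit:

```python
# Toy greedy longest-match tokeniser over a tiny, made-up vocabulary.
# Real tokenisers (BPE, SentencePiece) are far more sophisticated; this
# only illustrates why token boundaries need not match word boundaries.
VOCAB = ["SELECT", "SEL", "ECT", " * ", "FROM", "un", "believ", "able"]

def tokenise(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i,
        # falling back to a single character if nothing matches.
        match = max(
            (v for v in VOCAB if text.startswith(v, i)),
            key=len,
            default=text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenise("unbelievable"))   # ['un', 'believ', 'able']
print(tokenise("SELECT * FROM"))  # ['SELECT', ' * ', 'FROM']
```

A word-level filter sees "unbelievable" as one unit; the model sees three fragments, none of which is the word your filter matched against.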

Next-token prediction

Every output from an LLM is generated one token at a time. At each step, the model produces a probability distribution over all possible next tokens, and one is selected. That is the entire mechanism. There is no internal "thinking" step, no comprehension, no intent. There is statistical pattern completion, learned from enormous quantities of text.
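The loop is small enough to sketch end to end. The vocabulary and the scoring function below are stand-ins (a real model's logits come from a neural network over tens of thousands of tokens), but the generation mechanism itself is exactly this: score, normalise, sample, append, repeat:

```python
import math
import random

# Minimal sketch of autoregressive generation. The tiny vocabulary and
# the hand-written fake_logits() are illustrative stand-ins for a real
# neural network; the surrounding loop is the actual mechanism.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def fake_logits(context: list[str]) -> list[float]:
    # Stand-in for the model: strongly favour one fixed continuation.
    preferred = {"the": "cat", "cat": "sat", "sat": "on", "on": "the", "mat": "."}
    nxt = preferred.get(context[-1], "mat")
    return [5.0 if tok == nxt else 0.0 for tok in VOCAB]

def sample(logits: list[float]) -> str:
    # Softmax: turn scores into a probability distribution, then sample.
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(VOCAB, weights=probs)[0]

context = ["the"]
for _ in range(5):
    context.append(sample(fake_logits(context)))
print(" ".join(context))
```

Note that there is no step in the loop where the program "checks" whether the sentence is true, safe, or sensible. It only ever scores and samples.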

This is not a weakness to fix. It is the fundamental mechanism. The model produces text that looks coherent because coherent text was overwhelmingly represented in its training data. It produces text that looks like reasoning for the same reason.

LLMs are statistical text generators, not reasoning engines. Their outputs can look like reasoning without being reasoning. A model can produce a logically structured argument that is entirely fabricated, or a confident answer that is flatly wrong. This gap between apparent competence and actual mechanism is the root of most AI-specific security challenges. If your security posture assumes the model "understands" instructions, policies, or constraints, that assumption is misplaced.

Training, fine-tuning, and alignment

LLM behaviour is shaped in stages. Pre-training on large corpora gives the model its general capabilities and its general tendencies, including unhelpful ones. Fine-tuning narrows the model's behaviour for specific tasks or domains. Alignment techniques like RLHF (reinforcement learning from human feedback) attempt to make outputs helpful, harmless, and honest.

Each stage can introduce risks. Pre-training can embed biases from the source data. Fine-tuning can overfit to narrow patterns. Alignment can create a veneer of safety that masks underlying capabilities. None of these stages guarantee that the model will behave as intended in production, particularly when faced with inputs that differ from its training distribution.

Context windows and attention

An LLM can only work with the text inside its context window. This is the combined total of the system prompt, the user input, any retrieved documents, and the model's own output so far. Information outside the window does not exist to the model. It cannot ask for more. It will not tell you something is missing.

Scenario: Silent data loss in a policy review system

Your organisation builds an LLM-powered tool that reviews contracts against a 40-page compliance policy. The policy is injected into the context window alongside each contract. One day, a longer-than-usual contract is submitted. The combined length exceeds the model's context window, and the final 8 pages of the compliance policy are silently truncated. The model reviews the contract against an incomplete policy, finds no issues, and returns an approval. No error is raised. No warning is logged. The model simply proceeded with what it had.

This is not a bug. It is how context windows work. If your system does not monitor for truncation, you will not know it happened.
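One straightforward defence is to measure the assembled prompt before sending it, and fail loudly rather than letting the tail be dropped. The sketch below uses a crude characters-per-token heuristic and invented limits; in production you would use the provider's own tokeniser and the documented limit for your model:

```python
# Guard against silent context-window truncation: refuse to proceed if
# the assembled prompt exceeds a budget, instead of letting the end of
# the compliance policy be silently cut off. Both limits below are
# assumed values; the ~4-chars-per-token estimate is a rough heuristic.
CONTEXT_LIMIT_TOKENS = 8000       # assumed model context limit
RESPONSE_BUDGET_TOKENS = 1000     # room reserved for the model's output

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def build_prompt(policy: str, contract: str) -> str:
    prompt = f"POLICY:\n{policy}\n\nCONTRACT:\n{contract}\n\nReview the contract."
    used = estimate_tokens(prompt)
    if used + RESPONSE_BUDGET_TOKENS > CONTEXT_LIMIT_TOKENS:
        # Fail loudly: an error you can see beats an approval against
        # a policy the model never saw.
        raise ValueError(
            f"Prompt needs ~{used} tokens; budget is "
            f"{CONTEXT_LIMIT_TOKENS - RESPONSE_BUDGET_TOKENS}. "
            "Split the contract or summarise the policy."
        )
    return prompt
```

The design choice worth noting: the check lives in your system, not in the model, because the model will never raise this error for you.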

Temperature and sampling

The temperature parameter controls randomness in token selection. At low temperature, the model almost always picks the most probable next token. At high temperature, less probable tokens get selected more often.

The practical consequence: the same prompt can produce different outputs on different runs. This is by design, but it means you cannot treat LLM outputs as deterministic. If your system depends on consistent outputs for the same input, you need controls beyond the model itself.
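Mechanically, temperature divides the logits before they are turned into probabilities. A minimal sketch, with made-up logits, shows how the same scores yield a near-deterministic distribution at low temperature and a much flatter one at high temperature:

```python
import math

# Temperature scaling: divide logits by the temperature before softmax.
# Low temperature sharpens the distribution (near-greedy selection);
# high temperature flattens it. The logits here are invented examples.
def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    scaled = [l / temperature for l in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]
print(softmax_with_temperature(logits, 0.2))  # top token dominates
print(softmax_with_temperature(logits, 2.0))  # probabilities flatten out
```

Even at temperature settings near zero, most deployed systems remain only approximately deterministic, so the control still belongs outside the model.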

What the model does not have

It is worth being explicit about what LLMs lack, because these gaps are easy to overlook when the outputs sound confident and articulate:

  • No persistent memory across calls, unless your system explicitly engineers it. Each request starts from zero.
  • No awareness of what it does not know. The model will generate an answer whether or not it has relevant information.
  • No ability to verify its own outputs against external reality. It cannot check a database, confirm a fact, or validate a calculation unless given tools to do so.
  • No concept of uncertainty as a reason to stop. The model will always produce output. It will never say "I should not answer this because I am not confident enough" unless it has been trained to mimic that behaviour.

Reflection

Think about the AI-powered systems in your organisation, or the ones you are evaluating. Which of them depend on the assumption that the model "understands" a policy, a constraint, or an instruction? What would change if you treated the model as a statistical text generator that has no understanding of any of those things?

Consider

Look for language in your system documentation that implies comprehension: "the model checks compliance," "the agent understands the policy," "the system interprets the request." Each of those phrases points to an assumption worth examining.


Next: Where Things Go Wrong →