Non-Determinism is Not the Problem
The case against AI tools mistakes process variability for unreliability
I keep hearing this argument: AI agents are non-deterministic, LLMs can produce different results when given the same problem, therefore the code they produce cannot be trusted.
This is a non-sequitur.
Let’s start with an observation that should be obvious but apparently isn’t. Humans are non-deterministic too. Give the same problem to two engineers and you get two different solutions. Give it to the same engineer six months later and you still get a different one. That has never stopped us from shipping software.
The implicit comparison being made is between LLMs and compilers. Compilers are deterministic: same input, same output, every time. LLMs are not. Therefore LLMs are unreliable. But this comparison misses the point entirely. We don’t compare developers to compilers. We don’t expect humans to produce identical output given identical input. We expect them to produce correct output, verified through review and testing.
The same standard applies to AI tools.
Now, I’m not claiming LLMs fail the same way humans do. They don’t. LLMs hallucinate with confidence. They don’t self-correct the way a human rereads their own code and spots the mistake. They can produce plausible-looking nonsense that passes a casual glance. These are real failure modes, and they’re different from the failure modes we’re used to managing.
But this is precisely why we verify. This is why we test. This is why we review outputs rather than shipping them blindly. The failure modes are different, so our verification strategies need to account for them. That’s an engineering problem with engineering solutions.
What actually matters is behaviour, not how we got there. We instruct agents, review the output, and rely on tests as executable statements of expected behaviour. As long as the external behaviour is preserved, the internal path taken to reach it is largely irrelevant. This is exactly why test suites exist and why refactoring is even possible.
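To make that concrete, here is a minimal sketch (mine, not from any particular tool) of what "tests as executable statements of expected behaviour" buys you: two implementations, one naive and one refactored, and a test that pins down the external behaviour so either version passes as long as that behaviour is preserved.

```python
# Two implementations of the same behaviour. The test specifies what
# the function must do, not how it does it, so the internal path is
# free to vary - whether a human or an AI wrote the replacement.

def total_naive(prices):
    total = 0
    for p in prices:
        total += p
    return total

def total_refactored(prices):
    return sum(prices)

def test_totals_agree():
    for case in [[], [1.5], [2, 3, 4]]:
        assert total_naive(case) == total_refactored(case)

test_totals_agree()
```

If the refactored version ever diverges, the test fails; the generative variability upstream never reaches production behaviour.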
But there’s a deeper confusion at play here, one about agency and control.
When people say “AI is non-deterministic”, they often mean something more like “AI does unpredictable things on its own”. As if we were talking about a colleague who happens to be made of silicon. We’re not. We’re talking about software that responds to inputs.
I’ll grant that this can feel like an understatement when you’re watching an agent loop operate over time, calling tools, maintaining context, producing outputs that seem to emerge rather than follow directly from a prompt. There is real complexity here, and reasoning about agent behaviour does add cognitive load. I’m not dismissing that.
But complexity is not the same as autonomy. These systems don’t have intentions, preferences, or agency. They have behaviours that emerge from inputs, context, and training. The complexity makes them harder to reason about, not impossible to control.
And here’s what the non-determinism critics often miss: we have extensive control over what these tools do. Modern AI coding tools offer orchestration capabilities, subagents for specific tasks, and guardrails that constrain behaviour. You can define workflows, set boundaries, require confirmations. The tool doesn’t run wild. You direct it.
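As a rough illustration of what "set boundaries, require confirmations" can look like in practice, here is a hypothetical guardrail layer. The tool names and policy sets are invented for the example; real agent frameworks expose different APIs, but the shape is the same: an allowlist, a confirmation hook, and a hard failure for anything outside the boundary.

```python
# A minimal guardrail sketch around a hypothetical agent tool-call.
# ALLOWED_TOOLS and NEEDS_CONFIRMATION are illustrative policies,
# not a real framework's API.

ALLOWED_TOOLS = {"read_file", "run_tests", "edit_file"}
NEEDS_CONFIRMATION = {"edit_file"}

def guarded_call(tool, args, confirm):
    """Run a tool call only if policy allows it; ask before risky ones."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} is not permitted")
    if tool in NEEDS_CONFIRMATION and not confirm(tool, args):
        return {"status": "rejected", "tool": tool}
    # Real dispatch to the tool implementation would go here.
    return {"status": "executed", "tool": tool}

# Example: a dry run that auto-rejects every confirmation prompt.
result = guarded_call("edit_file", {"path": "app.py"},
                      confirm=lambda tool, args: False)
print(result["status"])  # "rejected"
```

The point is not this particular code but the control flow: the agent proposes, the policy disposes, and you decide where the confirmation boundary sits.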
This is no different from how we’ve always worked with complex systems. We don’t trust developers to never make mistakes. We build processes around them: social programming, code review, automated testing, continuous integration, staged deployments. The system as a whole produces reliable outcomes even though individual contributors are thoroughly non-deterministic.
The question isn’t whether an LLM will give you the same answer twice. The question is whether you have the discipline to verify outputs, the tests to catch regressions, and the workflows to maintain control.
Here’s where I need to be direct about something. This argument assumes a certain level of engineering maturity. Tests, reviews, guardrails, orchestration, discipline. Critics might respond that many teams don’t have this maturity, and that AI makes their problems worse.
They’re not wrong.
AI is an amplifier. It amplifies whatever practices you already have. If you have strong testing habits, AI helps you move faster with confidence. If you ship code without verification, AI helps you ship bad code faster. This isn’t a reason to avoid AI tools. It’s a recognition that using them well requires the same foundations that any reliable software development requires. The bar hasn’t changed. The speed has.
This matters because language shapes thinking. If we talk about AI as if it has agency (yes, “agents” might be misleading), we start believing we can delegate not just tasks, but responsibility. We start thinking accountability can be outsourced to a machine.
It can’t. And this is where the human analogy reaches its limit. Humans aren’t just another non-deterministic process in the pipeline. We have intent, situational awareness, and moral judgement. AI tools have none of these. That’s precisely why responsibility remains with us. We’re not accountable because we’re in the loop. We’re accountable because we’re the only ones who can be.
We chose to use the tool. We chose to accept the output. We chose to ship it, publish it, send it. The outcome is ours, good or bad.
So when I hear the non-determinism argument, it sounds less like a real concern and more like a category error. We’re comparing the variability of a generative process to the reliability of a deterministic one, and concluding that the former is inherently unsuitable for serious work.
But we’ve been doing serious work with non-deterministic processes forever. They’re called humans.
AI doesn’t do anything. We do things with AI. That’s not pedantry. That’s the difference between using a tool responsibly and hiding behind one.


The analogy to human developers nails it. We've always managed non-determinism through process, not by demanding perfect consistency from individuals. The part about AI being an amplifier cuts both ways though, and that's a fair warning. Teams without good verification habits will absolutely accelerate their way into worse outcomes. I've seen a few projects where people treat LLM outputs as gospel and skip testing because "the AI wrote it", which is the exact opposite of what should happen.
Strong argument, and I agree the non-determinism critique is largely a category error. But I'd push further than "verify outputs" as the answer.
You're right that we don't expect humans to be deterministic. We expect them to be contextually calibrated. A senior engineer produces better code because they carry richer context: domain knowledge, codebase familiarity, architectural constraints, team conventions.
The same applies to AI tools. The variability we observe is a symptom of underspecified context. Give an LLM a vague prompt, get variable outputs. Give it structured context that constrains the solution space (architectural decisions, coding standards, domain models, explicit boundaries) and outputs converge toward intended behavior.
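A rough sketch of what I mean by structured context as the primary control mechanism: concatenate your architectural and standards documents into the prompt preamble before the task itself. The file names and directory layout here are illustrative assumptions, not a prescribed convention.

```python
# Sketch: assemble structured markdown context into a prompt preamble
# so the model's solution space is constrained at the input layer.
# File names and the ".context" directory are illustrative only.
from pathlib import Path

CONTEXT_FILES = ["architecture.md", "coding-standards.md", "domain-model.md"]

def build_context(context_dir):
    parts = []
    for name in CONTEXT_FILES:
        path = Path(context_dir) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)

def build_prompt(task, context_dir=".context"):
    context = build_context(context_dir)
    return f"{context}\n\n## Task\n{task}" if context else f"## Task\n{task}"
```

Same task, richer preamble, narrower distribution of outputs.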
I've been calling this "documentation as context as code", or "the .context method": treating structured markdown context as the primary control mechanism, collapsing the probability distribution at the input layer.
The verification you describe remains essential. There's also an upstream intervention most teams are missing entirely.
I've been documenting the approach at buildingbetter.tech if anyone wants to dig deeper.
I actually cover a lot of it here, with a repo on how to implement this methodology: https://buildingbetter.tech/p/the-context-method