
From LLM to full agency: Understanding the levels of AI autonomy

Artificial intelligence is going through a transition that’s as profound as it is confusing.

AUG. 27, 2025
by Donovan Crewe
Just a couple of years ago, the conversation was dominated by chatbots and assistants. Today, the vocabulary has shifted. Everyone is talking about "agents."
But here’s the problem: the word agent is being thrown around so broadly that it’s starting to mean almost nothing. Is an LLM that summarizes a PDF an agent? Is a scripted customer support bot an agent? Is a system that sets its own goals and adapts strategy on the fly also an agent? Depending on who you ask, the answer is "yes" to all three.
That kind of language drift is dangerous. For executives, it inflates expectations. For engineers, it muddies architecture decisions. For the industry, it risks turning "agent" into yet another hollow buzzword, the "big data" of the 2020s.
To cut through the noise, it helps to think of AI autonomy not as a binary (tool vs. agent) but as a spectrum with three distinct levels. Each level builds on the previous, each adds complexity, and each requires a different mindset for design, safety, and deployment.
Those levels are:
  1. Level 0: Single-LLM features - isolated, stateless intelligence.
  2. Level 1: Workflows - orchestrated, bounded processes.
  3. Level 2: Agents - adaptive, goal-driven systems capable of shaping their own trajectories.
Let’s walk through each one.

Level 0: Single-LLM features

At the base of the spectrum lies the simplest use case: an LLM acting as a stateless function call. You pass it an input, it produces an output, and then it forgets everything.
You paste in a ten-page report and ask for a summary, and it delivers. You provide a messy CSV and ask for a clean SQL query, and it writes one in seconds. These are powerful capabilities, the kind of things that save analysts and developers hours of manual work. But once the exchange is over, the model has no memory of what happened. If you come back tomorrow and ask it to expand on a point from that report, you’ll need to paste the whole thing again.
This is reactive intelligence: immediate, impressive, but ephemeral. It’s the equivalent of calling a function in a software library; you provide the arguments, the function returns a result, and then it vanishes from memory.
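To make the function analogy concrete, here is a minimal Python sketch of a Level 0 feature. The llm_complete wrapper and the prompt are illustrative assumptions, not any specific provider's API; the point is that every call starts from a blank slate.

```python
# A Level 0 feature is just a stateless function call around a model.
# `llm_complete` is a hypothetical wrapper for whichever LLM SDK you use;
# nothing here persists between calls.

def llm_complete(prompt: str) -> str:
    """Hypothetical single-shot call to an LLM provider; swap in your SDK."""
    raise NotImplementedError("wire this to your provider's completion endpoint")

def summarize(report_text: str) -> str:
    # Everything the model needs must travel inside the prompt:
    # there is no memory of yesterday's report.
    prompt = f"Summarize the following report in five bullet points:\n\n{report_text}"
    return llm_complete(prompt)

# Tomorrow's follow-up question needs the full report pasted in again.
```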
What it is not, however, is an agent. And this is where the hype cycle misleads. When vendors market a one-shot summarizer or email generator as an "AI agent," they stretch the term beyond recognition. If a calculator on your phone doesn’t qualify as an agent, neither does a stateless LLM. Calling it one sets up executives to expect autonomy, when what they’re really buying is a smarter utility.
For most organizations, though, Level 0 is not trivial. It’s the fastest way to get ROI from AI today. A bank can shave days off compliance checks by having models extract risk disclosures from filings. A media company can accelerate content workflows by asking for draft headlines or SEO summaries. These are small, bounded tasks, and that’s their strength.
The lesson is simple: don’t mistake a powerful function for a free-thinking actor. Level 0 systems are invaluable tools, but they remain firmly under human control.

Level 1: Workflows and delegation

The next stage of autonomy emerges when we start connecting multiple LLM calls into a structured pipeline. Instead of typing individual prompts, we define a sequence of steps and let the system execute them. This is the realm of workflows and delegation.
Imagine a healthcare provider building a claims intake process. A patient writes in: "I was billed for a test I never took." At this level, the AI doesn’t just summarize the message. It follows a scripted process: first, it verifies the patient’s identity, then queries the insurance database, then checks the claim status, and finally composes a human-like response.
Add delegation, and the system becomes more flexible. The workflow might allow the LLM to call a retrieval model for relevant policy clauses, or to invoke a billing microservice that calculates reimbursement. You could even have one LLM that extracts structured fields, another that verifies them against business rules, and a third that generates the final response. From the outside, this orchestration of multiple models and tools looks remarkably agent-like.
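A rough sketch of what this claims-intake pipeline might look like in code is shown below. The helper names (verify_identity, llm_extract_fields, lookup_claim, llm_compose_reply) are hypothetical stand-ins for your own services and model calls; what matters is that the sequence of steps is fixed by the developer, not chosen by the model.

```python
# A minimal sketch of the claims-intake workflow described above, with
# hypothetical stubs standing in for the real identity service, billing
# system, and model calls. The point is the shape: a fixed, human-designed
# sequence of steps in which LLM calls fill specific slots.

def verify_identity(patient_id: str) -> bool:
    return True  # stub: call your identity service here

def llm_extract_fields(message: str) -> dict:
    return {"test_code": "UNKNOWN", "date": None}  # stub: first LLM call, extraction

def lookup_claim(patient_id: str, fields: dict) -> dict:
    return {"status": "billed", "amount": 120.0}   # stub: billing/insurance query

def llm_compose_reply(message: str, claim: dict) -> str:
    return f"Claim status: {claim['status']}"      # stub: second LLM call, drafting

def handle_claim_message(message: str, patient_id: str) -> str:
    if not verify_identity(patient_id):            # step 1: deterministic check
        return "We could not verify your identity."
    fields = llm_extract_fields(message)           # step 2: model extracts structure
    claim = lookup_claim(patient_id, fields)       # step 3: scripted system lookup
    return llm_compose_reply(message, claim)       # step 4: model drafts the response

# The model never chooses which steps run or in what order;
# it only fills in the blanks inside a flowchart humans designed.
```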
But we shouldn’t confuse bounded delegation with agency. The AI is not choosing its own objectives. It’s still operating inside a flowchart, executing steps that humans designed. If a customer says something completely unexpected, the system doesn’t "figure it out." It fails in ways that are familiar to anyone who has fought with a call center IVR.
That doesn’t mean Level 1 is trivial. On the contrary, it’s where much of today’s enterprise value is being unlocked. Workflows with delegation are already transforming customer service, document processing, and compliance checks. They’re reliable, auditable, and repeatable, exactly what regulated industries need.
But they’re not autonomous. They’re intelligent scripts. And calling them agents risks creating the illusion of self-direction where there is none.

Level 2: Agents

At the highest level of the spectrum, we finally arrive at something that deserves the name: the agent.
An agent doesn’t just respond to a prompt or march through a script. You give it a goal (the "why" rather than the "what") and it figures out the rest. It can plan, act, and reflect in loops. It can adapt strategy mid-stream when new information emerges. It can interact not only with APIs and tools but with external environments, from email inboxes to code repositories to supply chains.
Picture a logistics company that asks an AI system: "Optimize tomorrow’s delivery routes to reduce fuel costs by 15% without hurting on-time performance." A workflow could only shuffle through a predefined set of routes and rules. An agent could go further: pull real-time fuel price data, scrape weather forecasts, weigh traffic predictions, simulate multiple routing strategies, and then present the top three options. If empowered, it could even start reserving warehouse slots or scheduling drivers.
Crucially, this is also where humans need to be part of the loop. True autonomy carries risk: an agent that can make commitments, allocate budget, or generate legal documents could also make mistakes that cost millions. A well-designed agent doesn’t eliminate humans; it integrates them. The system runs autonomously until it reaches a critical checkpoint, then pauses for a manager, compliance officer, or subject-matter expert to validate and approve.
This is the form of agency that business leaders imagine when they hear the word agent. And yet, here’s the sobering reality: almost no production systems are operating at this level today. The most advanced research prototypes, from AutoGPT to Devin, are fascinating, but they’re brittle. They lose track of context, burn resources, and sometimes chase irrelevant rabbit holes. They’re glimpses of what might be possible, not finished products.
That doesn’t make them irrelevant. It just means enterprises should treat them as experiments at the frontier, not as plug-and-play solutions. The road to true autonomous platforms will require breakthroughs in memory, reasoning, safety, and governance. Until then, most of the value will come from Levels 0 and 1, and from carefully piloting Level 2 systems with human oversight.

Why precision matters

The temptation to call everything an "agent" is strong. The term sounds futuristic. It attracts investors. It makes demos pop. But when language is stretched too far, it becomes noise.
Executives hear agent and picture a digital employee who can run a business function. Engineers hear it and picture a complex orchestration layer with decision logic. Researchers hear it and think of systems that can reason, plan, and act independently for weeks. When all three groups use the same word for different realities, misalignment is inevitable.
The result is predictable. Leaders buy into visions that current technology can’t deliver. Engineering teams build scaffolding for capabilities that don’t exist. Projects fail not because AI is useless, but because the word agent promised more than the technology could deliver.
By sticking to a simple, precise framework (Features, Workflows, Agents), we can ground the conversation. We can ask better questions: What level of autonomy does this use case require? Which level are we truly capable of deploying today? Where is it safe, and where do we still need human judgment in the loop?
That’s how you avoid burning cycles on hype. That’s how you build credibility in AI, both within your organization and with your customers.

Bringing it all together

It’s worth remembering that not every problem requires maximum autonomy. A finance team might lean heavily on single-LLM features to accelerate reporting. A compliance department might prefer workflows because the predictability ensures rules are followed every time. And a product innovation group might begin cautiously experimenting with agents, using them to scout market trends or simulate strategies.
The point is not to crown everything an "agent." The point is to recognize where your systems sit on the autonomy spectrum, and to design accordingly.
In practice, most enterprises will operate across all three levels at once. The same organization might use a lightweight LLM function to generate SQL queries, a delegated workflow to streamline customer onboarding, and a supervised agent to explore new business opportunities. The key is matching the level of autonomy to the value at stake, the tolerance for risk, and the maturity of your data and infrastructure.

Looking forward

The AI field will continue to move quickly. Models will get faster, more capable, and more context-aware. Tool use will become richer. Agents will get better at planning, self-correction, and long-term memory. The top of the spectrum, truly autonomous platforms, will inch closer to reality.
But the hype will move faster still. If we let "agent" become shorthand for every LLM-powered feature or workflow, we’ll lose the ability to talk clearly about what’s actually happening. Worse, we’ll mislead ourselves and our stakeholders.
Real maturity in AI isn’t about pretending you’ve already built a self-driving enterprise. It’s about being precise: knowing when you’re using a stateless feature, when you’ve built a workflow, and when you’re experimenting with agents.
At the end of the day, the real question isn’t: How smart is my AI? It’s: How much do I trust it to run the show?