2025-11-14 - Agentic AI
I had a great back-and-forth with a student in the RL course that helped clarify the notion of "Agentic AI" for me. As someone who works with strict definitions of agents in the RL field, I didn't initially see LLMs being able to make API calls as something that, on its own, should be considered agentic. Nevertheless, this exchange proved fruitful for both of us, so I'll share my conclusion below:
You’re right that LLMs fit the formal definition of a stochastic policy, and that formalization has clearly been productive for fine-tuning and alignment work (RLHF and related methods). I still believe they don’t constitute RL agents in the formal sense - at least the foundation models don’t.
Where I think we differ is less about the math and more about terminology. To me, calling something an "agent" or an "RL agent" implies interaction with an environment that has external state transitions - where actions change the world state independently of the agent’s internal model. In the LLM’s case, the "environment" is usually just its own sequential generation process, which makes it feel categorically different, even if the MDP formalism applies.
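To make that distinction concrete, here's a toy sketch. Everything in it is hypothetical: `llm_policy` is a random stand-in for a real model's next-token distribution, and `tool` is a stand-in for any external system. The point is only to contrast a transition that is fully determined by the agent's own action (pure generation) with one where the world changes independently of the agent's internal model (an agent loop with tool calls).

```python
import random

def llm_policy(context):
    """Hypothetical stand-in for a model's stochastic next-token policy."""
    return random.choice(["a", "b", "<call>", "<eos>"])

def generate(context, max_steps=10):
    """Pure generation: the 'environment transition' is just appending the
    agent's own action to the context -- no external state changes."""
    for _ in range(max_steps):
        token = llm_policy(context)
        if token == "<eos>":
            break
        context = context + [token]  # next state depends only on the action
    return context

def tool(query):
    """Hypothetical external environment: its output is produced
    outside the model's own generation process."""
    return f"result({len(query)})"

def agent_loop(context, max_steps=10):
    """Agent loop: some actions trigger an external transition whose
    outcome the policy did not generate itself."""
    for _ in range(max_steps):
        token = llm_policy(context)
        if token == "<eos>":
            break
        if token == "<call>":
            observation = tool(context)  # external state transition
            context = context + [token, observation]
        else:
            context = context + [token]
    return context
```

In the first loop the MDP formalism technically applies, but the "environment" is degenerate; in the second, observations arrive that the policy did not produce, which is closer to what "agentic" seems to mean in practice.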
That said, I do think the interaction with an LLM constitutes a form of interactive POMDP (I-POMDP). And as I’m writing this, I realize that’s likely the justification for calling it "agentic AI." So I’ll soften my initial claim: the reason LLMs can act as agents isn’t their internal model (which isn’t an RL agent on its own), but that they can interact with other agents and an external environment, allowing them to function as agents in a broader system.
So we’re really talking about two notions of policy here: the LLM’s own predictive distribution as a policy, and the emergent policy that arises when it interacts with humans or other agents.
This has been helpful for me, and I hope it offers something clarifying to others as well. Thanks for the discussion.