Latest Post:
I had a great back-and-forth with a student in the RL course that helped clarify the notion of "Agentic AI" for me. As someone who deals with strict definitions of agents in the RL field, the ability of LLMs to make API calls didn't initially strike me as something that, on its own, should be considered agentic. Nevertheless, the exchange proved fruitful for both of us, so I'll share my conclusion below:
You’re right that LLMs fit the formal definition of a stochastic policy, and that formalization has clearly been productive for fine-tuning and alignment work (RLHF and related methods). I still believe they don’t constitute RL agents in the formal sense - at least the foundation models don’t.
Where I think we differ is less about the math and more about terminology. To me, calling something an "agent" or an "RL agent" implies interaction with an environment that has external state transitions - where actions change the world state independently of the agent’s internal model. In the LLM’s case, the "environment" is usually just its own sequential generation process, which makes it feel categorically different, even if the MDP formalism applies.
That said, I do think the interaction with an LLM constitutes a form of Interactive POMDP (I-POMDP). And as I’m writing this, I realize that’s likely the justification for calling it "agentic AI." So I’ll soften my initial claim: the reason LLMs can act as agents isn’t because of their internal model (which isn’t an RL agent), but because their models can interact with other agents, allowing them to functionally be considered agents in a broader system.
So we’re really talking about a few notions of policy here: the LLM’s own predictive distribution as a policy, and the emergent policy that arises when it interacts with humans or other agents.
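To make the first notion concrete, here is a minimal sketch (toy vocabulary and hypothetical function names, not a real LLM API) of the token-level view: the model's next-token distribution is a stochastic policy π(a | s) where the "state" is the token prefix, the "action" is the next token, and the only "environment transition" is appending the sampled token back onto the prefix.

```python
import random

# Toy vocabulary; a real LLM would have tens of thousands of tokens.
VOCAB = ["hello", "world", "<eos>"]

def toy_llm_policy(prefix):
    """A stand-in for an LLM's next-token distribution: pi(a | s),
    where s is the token prefix and a is the next token."""
    if not prefix:
        return {"hello": 0.9, "world": 0.1, "<eos>": 0.0}
    if prefix[-1] == "hello":
        return {"hello": 0.0, "world": 0.8, "<eos>": 0.2}
    return {"hello": 0.1, "world": 0.1, "<eos>": 0.8}

def rollout(policy, max_len=10, seed=0):
    """Sample a trajectory. Note the 'environment' here: the state
    transition is just appending the agent's own action to the prefix -
    nothing external changes, which is the point made above."""
    rng = random.Random(seed)
    prefix = []
    for _ in range(max_len):
        dist = policy(prefix)
        tokens, probs = zip(*dist.items())
        token = rng.choices(tokens, weights=probs)[0]
        if token == "<eos>":
            break
        prefix.append(token)
    return prefix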
This has been helpful for me. I hope it has offered something clarifying to anyone else. Thanks for the discussion.
Previous Post:
Personalized Interactive Narratives via Sequential Recommendation of Plot Points by Hong Yu and Mark O. Riedl
If you are a fan of interactive storytelling, a.k.a. D&D, you should read this paper, especially if you are a DM/GM. I love that the academic terminology for Dungeon Master is Drama Manager, which acknowledges respect for the source material while generalizing the term. We can still call them DMs after all. Georgia Tech-based folks, by the way.
The brain is a computer is a brain: neuroscience’s internal debate and the social significance of the Computational Metaphor
If I were teaching a class, I'd probably find a way to shoehorn in this essay. My biggest gripe with AI is that "Artificial Intelligence" means something slightly different to everyone. I guess you could say that about a lot of sciences, but I think everyone knows what you're talking about when you say "Poultry Science," as a counter-example.
Many roads to Rome: cautious considerations on the computability of creativity
Can you truly make creative AI agents? I can't think of a more loaded question. Speaking of no one having a good definition of things, let's talk about creativity too! There are a handful of things in this essay I don't like, but I think that's important, and it raises a few valid concepts. It's important to read things that aren't 100% aligned with your approach, methods, or even presumptions. There is value in seeing how those outside of AI or computer science feel about the subject (even if they are unwittingly contributing to it).
Speaking of:
Exploring AI intervention points in high-school engineering education: a research through co-design approach
We all know that Gen AI is changing education at every level. This paper offers some insight and a framing of pressing issues that I find valuable. I am sure there are several directions to go from here.
A Framework for Sequential Planning in Multi-Agent Settings
Here's food for thought. Can every decision be thought of as an Interactive Partially Observable Markov Decision Process? Does the buck stop there regarding human-level decision-making under uncertainty? I'm not sure, but I am going to keep needling away at this thought until proven otherwise.
A survey of inverse reinforcement learning: Challenges, methods and progress
So if anything can be modeled as an I-POMDP, what if we can only define the observations and actions? Inverse RL is all about inferring reward functions from interactions with the environment. Still chewing on this one, but I thought I'd share nonetheless.