In early 2024, I wasn’t planning to ship a game that year, let alone an AI-driven multiplayer game in Unreal. But then, in a playtest, a dev told one of our AI NPCs that he was bored. Instead of a scripted response, the NPC suggested a game of hide-and-seek. Other AI NPCs overheard and asked to join in, and within a minute we had a village full of NPCs hiding, counting, and searching. That’s when I realized: This actually works.

Later that year, after just five months of development, we shipped Retail Mage on Steam. We learned some surprising things along the way about what’s possible today with Generative AI in games—and what players actually want from these experiences.

A Game That Shouldn’t Have Worked

Early in Retail Mage’s development, I had convinced an AI NPC customer that a particular painting was just the sort of inspiration he needed for his next spell. But when I went to make the sale, my character wasn’t strong enough to remove the painting from the wall.

In most games, that would have been it. I would have needed to find a different item, or level myself up, or solve some challenge — all mechanics rigidly designed by the developers to solve this type of puzzle. Instead, I tried something else.

I grabbed a book from a nearby shelf, tore out a piece of paper, and wrote “IOU” on it. Then, I handed the IOU to the customer and told him we’d ship the painting later.

And it worked! He gave me a 4-star review!

There was no pre-written script for this scenario. No game designer planned for an “IOU” mechanic. We didn’t have a “tear out paper” routine. Every step of that interaction was mediated by our AI systems: deciding I couldn’t take the painting down, interacting with the book, creating an IOU note. The system made a real-time ruling that my solution reasonably fulfilled the customer’s request.

This moment encapsulated what we set out to explore at Jam & Tea:

Can Gen AI systems interpret player intent and make decisions like a human Dungeon Master?
Can we combine that with a 3D game simulation and make the whole thing stay coherent?
Can this create a new kind of play—one that feels truly improvisational?

The Promise: AI as Game Master, Not Just Content Generator

We’re game developers. We want to make games we’ve always dreamed of. Chasing cheaper, faster (and potentially unethical) generated content doesn’t excite us. But we are excited that new tech could help us make new kinds of fun games. What could generative AI technology unlock for players?

Of course, we’re not the first to see Large Language Models and think: can I make a Game Master with that? And folks have used ChatGPT and similar services to create text-based roleplaying games.

For our part, we wanted to push well beyond chatbots and integrate machine learning techniques into the core game loop in real time to find truly novel forms of gameplay.

What if AI could function like a human Game Master, interpreting player actions and making real-time rulings rather than following pre-scripted rules? Could we synthesize that with a 3D simulation? Could we let players forge their own paths and still keep it engaging and within bounds? Could we create some sort of holodeck-like experience? It sounded crazy, but the more we iterated, the more the answer became: maybe?

The Reality: Making It Actually Work Was … Not Easy

Getting from concept to shipped product meant solving hard problems. AI in games isn’t just an engineering problem; it’s also a game design problem, an economics problem, and a UX problem all at once.

Solving cost is existential

Early in development, running our AI inference cost so much that each play session was as expensive as a ticket to Disneyland. We couldn't build a viable product that way.

What we found is that we had to get our hands dirty. We couldn’t just call APIs. The full story is a long one that I’ll reveal in upcoming articles, but ultimately we had to manage our own (cloud) GPUs (thank you, AWS!) and create our own inference approach. We heavily leveraged techniques such as structured generation (which let us control AI outputs without rigid scripting) and collaborated with projects like sglang to get our costs down by 1000x—three orders of magnitude.
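To make "structured generation" concrete, here is a toy sketch in Python. It is not Retail Mage's code and it does not use SGLang's API; the ruling names, the `make_toy_model` stand-in, and the character-level decoding are all assumptions chosen for illustration. The idea it shows is the general one: at each decoding step the model may only pick continuations that keep the output inside a set the game can act on, so the result is always machine-readable without a hand-written script.

```python
from typing import Callable, Dict, List

# Hypothetical rulings the game logic knows how to act on.
# Assumption: no ruling is a prefix of another, so decoding can stop
# as soon as the output equals one of them.
ALLOWED_RULINGS: List[str] = ["accept", "reject", "ask_for_more"]


def allowed_next_chars(prefix: str, options: List[str]) -> List[str]:
    """Characters that keep `prefix` a prefix of at least one option."""
    return sorted(
        {opt[len(prefix)] for opt in options
         if opt.startswith(prefix) and len(opt) > len(prefix)}
    )


def constrained_decode(score: Callable[[str, str], float],
                       options: List[str]) -> str:
    """Greedy decoding where disallowed characters are masked out."""
    prefix = ""
    while prefix not in options:
        candidates = allowed_next_chars(prefix, options)
        # The model only ranks the candidates that survive the mask.
        prefix += max(candidates, key=lambda ch: score(prefix, ch))
    return prefix


def make_toy_model(preferences: Dict[str, float]) -> Callable[[str, str], float]:
    """Stand-in for a language model: a fixed preference per character."""
    def score(prefix: str, ch: str) -> float:
        return preferences.get(ch, 0.0)
    return score


ruling = constrained_decode(make_toy_model({"r": 0.9, "s": 0.8, "a": 0.5}),
                            ALLOWED_RULINGS)
print(ruling)
```

Real systems apply the same masking to model tokens rather than characters, and usually against a grammar or JSON schema rather than a flat list, but the guarantee is the same: whatever the model prefers, the output lands inside the allowed set.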

The blank page syndrome

Sometimes the biggest challenge wasn't technical—it was psychological. As players, we’ve been trained by decades of deterministic games to look for the "right" solution to a problem. When faced with true open-endedness, many initially struggled.

During playtests, we'd watch players hunt around the store looking for exactly the right item to satisfy a customer request, as though it were a hidden object game, rather than use the flexibility of the game to create solutions. We realized we’d need to rethink player onboarding, quest design, and UI/UX conventions: all things we’re still refining.

Information overload is real

If NPCs can now go off and do their own thing (like initiating a game of hide-and-seek), it becomes increasingly difficult for players to keep track of what’s going on.

And if every NPC has a deep backstory, everyone can talk endlessly about their life story, town gossip, the weather, politics, and more. It just doesn’t end.

This, on its own, isn’t necessarily a bad thing. In fact, it’s part of the magic. But helping players navigate such a rich simulation is a challenge. Retail Mage includes some of our attempts to help players, but I think information overload and the corresponding game and UI design questions are still to be fully solved.

AI can be too intelligent

We had to slow the AI NPCs down. Left to their own devices, they’d just go off and solve their own problems, leaving players with little to do. We saw AI customers in Retail Mage self-organize and help each other out while we were ignoring them during demos.

And sometimes NPCs acted just a bit too realistically. I once convinced an AI NPC to help me solve a mystery. They were all in and suggested we start the next day. Oops. No, I meant I wanted to investigate this mystery now. The AI NPC pushed back: truly, this could wait until tomorrow, why are you being so pushy?!

We had AI NPCs observe otherwise normal player behavior of jumping around the level (“bunny hopping”) and decide that something could be seriously wrong with these human player characters. Were they on drugs or something? They were acting awfully suspicious!

And as it turns out, AI NPCs that act like entirely rational, reasonable humans don’t make for adventurous gameplay. Most rational and reasonable humans avoid danger and, like a good hobbit, stay home and tend the garden.

The Surprises: AI "Flaws" Became Features

Hallucinations can be a feature, not a bug

In traditional development, unpredictable behavior is usually a defect to be eliminated. With generative AI, we’ve found that certain kinds of unpredictability actually enhanced the experience.

For example, our AI customers occasionally misremembered conversations or invented details about the store. Rather than correcting these hallucinations, we could often incorporate them into the world state, allowing everyone to just roll with it. We call this freedom and flexibility “improvisational play”: when the player and the game can play off each other. We believe it’s one of the key advances that machine learning AI can bring to games.

Figuring out when to lean in and when to manage hallucinations (and how to technically accomplish this) was a huge set of learnings for us, and an area where we’re still improving.
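One way to picture the "lean in or manage" decision is as a small reconciliation step between what an NPC just asserted and what the game already holds to be true. The Python sketch below is purely illustrative and is an assumption, not a description of how Retail Mage handles it: the `Fact` key shape, the `protected` set, and the three outcomes are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical fact key: (subject, attribute), e.g. ("painting", "artist").
Fact = Tuple[str, str]


def reconcile(world: Dict[Fact, str], key: Fact, claimed: str,
              protected: Set[Fact]) -> str:
    """Decide what to do with a detail an NPC just asserted.

    Returns "known" if it matches the world state, "adopted" if it was a
    harmless invention that is now canon, and "rejected" if it contradicts
    an established fact or touches something designers marked as protected.
    """
    if key in world:
        return "known" if world[key] == claimed else "rejected"
    if key in protected:
        return "rejected"
    world[key] = claimed  # roll with it: the invention becomes world state
    return "adopted"


world: Dict[Fact, str] = {("store", "owner"): "Mira"}
protected: Set[Fact] = {("vault", "combination")}

print(reconcile(world, ("painting", "artist"), "Old Tobias", protected))
print(reconcile(world, ("store", "owner"), "Gus", protected))
print(reconcile(world, ("vault", "combination"), "1234", protected))
```

In this sketch a new detail that conflicts with nothing is simply written into the world, which is the "roll with it" case; contradictions and protected facts are the cases that would need managing instead.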

Players find ways to break systems and that’s the point

Players inevitably try to break systems. In traditional games, this leads to discovering the boundaries of the simulation, breaking immersion.

In Retail Mage, we saw players attempt to change a customer's purchase by convincing them they really wanted something else, or by creating elaborate backstories for items to increase their value. Instead of hitting a wall of "that's not implemented," they were delighted when the AI engaged with their schemes.

The Compromises: What We Had to Give Up

Building a game with AI at its core forced us to rethink fundamental assumptions about game design. Some of our most interesting decisions weren't about what to add, but what to remove or reimagine:

We disabled standard physics entirely. Physics engines in Unity and Unreal run at a much faster cadence than our AI inference. This created race conditions where the physics would resolve before the AI could make decisions about the same objects, leading to discrepancies in game state. Our solution was to implement a simplified physics model that could synchronize with the AI's decision-making pace.
Traditional UI patterns often didn't apply. In most games, you'd have button shortcuts for common actions like "grab" or "use." But when players can attempt literally anything they can describe, how do you create an interface for that? We opted for a more open-ended text-entry approach that sacrificed some efficiency for flexibility.
Progression systems can stifle creativity. We want to reward players when they improvise solutions rather than enforcing rigid solutions and play styles. However, that creates challenges of its own, because it removes many of the traditional tools of game design and places a lot of responsibility on players to find their own fun. This isn’t an unsolvable problem, but it definitely requires more research and investment on our part.
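The physics race condition in the first point above can be sketched in a few lines of Python. This is a guess at the general shape of such a fix, not the studio's actual model: the `World` class, the `pending` set, and the one-dimensional gravity step are all assumptions made for illustration. The idea is that any object with an AI ruling in flight is skipped by the fast physics tick, so the simulation cannot resolve it before the slower AI decision arrives.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Body:
    height: float          # metres above the floor
    velocity: float = 0.0  # metres per second, negative is downward


class World:
    """Toy world where AI rulings take priority over physics."""

    def __init__(self) -> None:
        self.bodies: Dict[str, Body] = {}
        self.pending: Set[str] = set()  # objects awaiting an AI ruling

    def request_ruling(self, name: str) -> None:
        # Freeze the object until the (slow) AI decision comes back.
        self.pending.add(name)

    def apply_ruling(self, name: str, new_height: float) -> None:
        # The AI's decision overwrites state, then physics may resume.
        body = self.bodies[name]
        body.height = new_height
        body.velocity = 0.0
        self.pending.discard(name)

    def physics_tick(self, dt: float, gravity: float = -9.8) -> None:
        for name, body in self.bodies.items():
            if name in self.pending:
                continue  # never resolve an object the AI is deciding on
            body.velocity += gravity * dt
            body.height = max(0.0, body.height + body.velocity * dt)
            if body.height == 0.0:
                body.velocity = 0.0


world = World()
world.bodies["painting"] = Body(height=2.0)
world.bodies["book"] = Body(height=1.0)
world.request_ruling("painting")

for _ in range(60):  # one second of physics at 60 ticks per second
    world.physics_tick(1.0 / 60.0)

print(world.bodies["painting"].height, world.bodies["book"].height)
```

The trade-off is visible even in the toy: the frozen painting hangs in place for as long as the ruling takes, which is acceptable only if the rest of the simulation is simple enough that players do not notice.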

Yes, we also skipped some traditional game elements like progression systems and elaborate tutorials—partly due to time constraints (we gave ourselves just five months with a small team), but also because we wanted to focus on proving out the core AI gameplay mechanics first.

In retrospect, we may have cut too deeply in some areas. Retail Mage was envisioned as a technical showcase as much as it was a game. During our journey, we met many folks who couldn’t quite see how the technical innovations we had found could translate into gameplay, so we decided to make just enough of a game to prove the point. Consequently, Retail Mage exists in a sort of uncanny valley of game design: more than a demo but not quite as much of a game as some players would expect. With more time, we'd have created better scaffolding to help players transition from traditional gameplay expectations to this new improvisational paradigm.

The Future: Building Towards A Holodeck

Throughout the development of Retail Mage, we kept coming back to a fundamental question: what if games could truly respond to anything players might try? What if virtual worlds could be as adaptive and responsive as those imagined in science fiction—like Star Trek's holodeck?

Retail Mage isn't the final form; it's just the start. We proved this approach can work. Now we need to push further: bigger worlds, deeper NPC reasoning, and better tools for our designers.

That's why we're building INFUSE. Simply put, INFUSE is the toolset we wish we'd had when we started Retail Mage—a high-level framework that handles the complex integration of AI with traditional game systems.

We've come to believe that many AI tools out there are just too low-level. Unlocking the magic this technology promises requires deep levels of integration, not just a drop-in plugin. Once you’ve imbued your game with this level of dynamic responsiveness, you’ll find your tools are equally enhanced. I wouldn’t be surprised to see a future where designers are instructing their NPCs as if they were Westworld’s androids, with hopefully less of the abuse, uprising, and murder.

For now, INFUSE is primarily an internal tool—we're still in the trenches, learning by iterating. But as we refine it, we're moving toward sharing these tools with players and developers to build worlds of their own. As a first step, we’re partnering with some amazing media owners to bring more INFUSE-driven experiences to life.

The Takeaway: Creation, Not Just Optimization

We built Retail Mage to prove that AI could fundamentally change gameplay, not just how games are made.

The result isn't perfect. It's rough around the edges and limited in scope. But it demonstrates that truly improvisational play—where the game interprets player intent rather than just validating predefined solutions—is achievable now.

For game developers, I think the key insight isn't about technology readiness; it's about mindset. The most interesting applications won't come from optimizing existing workflows or replacing content creation. They'll come from reimagining what types of gameplay we can build now that our games are capable of improvising in real time alongside players.

This creates both technical and human challenges: How do we design interfaces for open-ended play? How do we respect the contributions of actors, writers, and artists when AI can generate content dynamically? These aren't just technical problems—they're design, ethical, and business problems too.

That's the territory we're committed to exploring: not merely using AI to make traditional games faster or cheaper, but using it to create experiences that weren't possible before—while keeping humans at the heart of the creative process.

This is the first in a series of articles exploring the technical and design challenges we encountered building Retail Mage and other AI-driven games at Jam & Tea Studios. If you're interested in these topics or exploring our INFUSE platform, you can find us at jamandtea.studio. You can find Retail Mage on Steam.

Making Retail Mage: A New Approach to AI in Games