The AI Agent Hallucination Conundrum
What if your AI agent were always correct?
In the age of generative AI, the word hallucination has become something of a euphemism — a gentle way to describe when powerful AI systems make things up. In casual use, it's a tolerable side effect. In commerce, medicine, finance, or governance, it can be catastrophic.
As we race toward more capable AI agents — and eventually AGI — one question looms large:
Is it possible to build a zero-hallucination AI, and more specifically for our industry, hallucination-free shopping agents?
The answer reveals not only the technical boundaries of current systems, but also the deeper philosophical limitations of trying to create intelligence modeled on imperfect human knowledge.
Of course, our ultimate goal should be systems that are correct all the time. That said, given the limits of reality and of human knowledge, it is worth asking whether perfect AI systems are a valid, or even an important, goal.
What Are AI Hallucinations?
AI-powered shopping agents are quickly becoming a part of how people discover and buy products. Ask a question, and the system responds — conversationally, instantly, and often helpfully. The experience feels effortless: natural language in, relevant suggestions out. It’s no surprise that brands and retailers are eager to adopt them.
But as these agents become more visible and more polished, it’s worth pausing to ask: what’s actually happening under the hood? What kind of intelligence do these agents have — and just how far can we expect them to go?
The truth is, most shopping agents today are highly constrained — deliberately so. These systems are carefully engineered to avoid hallucinations, misinformation, and missteps. In a retail environment, getting something wrong — like product availability or return policy — can erode trust and create real operational costs. That’s why these agents are built around strict guardrails that define what they’re allowed to say, and how.
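To make that concrete, a guardrail layer often boils down to explicit configuration: which topics the agent may address, which sources must back each claim, and what it says when it cannot answer. The sketch below is illustrative only; the field names and topic labels are hypothetical, not any particular vendor's schema.

```python
# A minimal sketch of a guardrail configuration for a retail agent.
# All field names and topic labels here are hypothetical, not a vendor schema.
GUARDRAILS = {
    # Topics the agent is allowed to answer at all
    "allowed_topics": {"product_info", "availability", "shipping", "returns"},
    # Every factual claim must be backed by one of these sources
    "required_sources": {"catalog", "inventory_feed", "policy_docs"},
    # What the agent says when no grounded answer exists
    "fallback_reply": "I'm not able to confirm that. Let me connect you with support.",
}

def is_permitted(topic: str) -> bool:
    """Refuse any request outside the approved topic list."""
    return topic in GUARDRAILS["allowed_topics"]

print(is_permitted("availability"))    # True: the agent may answer
print(is_permitted("medical_advice"))  # False: out of scope by design
```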
The Problem With Hallucinations in Commerce
If you’ve experimented with AI chatbots or digital assistants in ecommerce, you’ve likely seen this:
Incorrect prices
Imaginary products or inventory
Made-up return policies
Confident answers that aren’t grounded in any real data
These aren’t just bugs — they’re a known limitation of large language models (LLMs). These models are trained to predict the next word, not verify the truth.
In entertainment or writing, hallucinations might be part of the charm.
In commerce? They’re a liability.
Customer trust is eroded
Operational costs increase (support tickets, refunds)
Legal and regulatory risk goes up
Brand reputation suffers
The Tradeoff: Accuracy vs. Exploration in AI Shopping Agents
As shopping agents become more embedded in consumer journeys, one of the central tensions shaping their design comes into focus: the tradeoff between factual accuracy and creative assistance.
It’s entirely possible to build an AI system that never hallucinates — one that avoids making anything up, never suggests a product that’s out of stock, never misstates a price or return policy, and never veers from what’s been explicitly retrieved from a trusted source.
These systems are dependable, precise, and predictable. They’re built on structured catalogs, real-time inventory feeds, pricing APIs, agent-agent protocols, and brand-approved policies. When a shopper asks, “Is this item available in Large?” or “What’s your shipping timeline?” — the agent gives a verified answer, or nothing at all.
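In code, that "verified answer, or nothing at all" behavior can be surprisingly small. The sketch below assumes a hypothetical inventory lookup; check_stock and the INVENTORY table are stand-ins for a real-time feed, not a real API:

```python
from typing import Optional

# Hypothetical stand-in for a real-time inventory feed.
INVENTORY = {("SKU-123", "L"): 4, ("SKU-123", "M"): 0}

def check_stock(sku: str, size: str) -> Optional[int]:
    """Return the verified unit count, or None when the feed has no record."""
    return INVENTORY.get((sku, size))

def answer_availability(sku: str, size: str) -> str:
    """Answer only from retrieved data; never guess."""
    units = check_stock(sku, size)
    if units is None:
        # No trusted record: say nothing rather than invent something.
        return "I can't confirm availability for that item right now."
    return f"Yes, {units} in stock." if units > 0 else "That size is currently out of stock."

print(answer_availability("SKU-123", "L"))  # verified answer
print(answer_availability("SKU-999", "S"))  # refusal, not a guess
```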
This kind of reliability is essential for many shopping scenarios. You can’t risk errors in checkout, returns, or inventory. Trust is the foundation of commerce, and hallucinations — even rare ones — erode it quickly.
What if your AI agent were always correct, and at what cost?
That trust comes at a cost.
The more tightly you control what the agent is allowed to say, the less room there is for exploration.
Highly constrained agents can’t suggest a related product unless it’s tagged and approved. They can’t improvise a gift idea for a tricky occasion unless that gift is already structured in the data. They can’t offer an inspired style pairing, or a suggestion based on tone, context, or subtle emotional cues — not because the model isn’t capable, but because the system won’t let it take the risk.
In these systems, the agent becomes more of a product index with a voice — helpful, but often flat.
The answers are correct, but sometimes unimaginative. It’s safe, but it’s not inspiring.
This is why many agents today feel more like customer service bots than true shopping companions. They excel at reactive precision, but struggle with proactive discovery. They can guide you through what’s already visible, but they rarely help you see what you didn’t know to look for.
The solution isn’t to abandon guardrails — far from it. But it is to design more intentionally around where creativity is valuable, and where accuracy is essential.
The real test of the next generation of shopping agents won’t be whether they avoid mistakes. That’s baseline. The real question is whether they can guide without guessing, and inspire without inventing.
Because shopping isn’t just about answering a question. It’s about helping someone find something they love — even when they don’t quite know how to ask for it yet.
Different Modes, Different Needs: Pre-Purchase vs. Post-Purchase Agents
The tradeoff between accuracy and creative flexibility becomes even clearer when we distinguish between the two main phases of the shopping journey: pre-purchase and post-purchase.
In the post-purchase phase, the job of an AI agent is largely operational: helping a customer with order tracking, return instructions, warranty details, or account updates. These interactions are transactional, rules-based, and tightly defined. There’s a right answer, a known policy, a specific next step.
Here, strict guardrails are not only acceptable — they’re essential. The agent must say only what’s true, current, and enforceable. A return window isn’t a suggestion; it’s a policy. Shipping timelines aren’t a creative opportunity; they’re a commitment. In this zone, hallucinations aren’t just risky — they’re unacceptable. Guardrails protect the business, clarify expectations, and deliver the reliability that customers rightly expect after they’ve handed over their money.
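Because this domain is rules-based, much of it shouldn't be generated at all; it can be computed. A minimal sketch, assuming a simple 30-day return window (the window length and the function shape are illustrative, not any specific retailer's policy):

```python
from datetime import date, timedelta

RETURN_WINDOW_DAYS = 30  # illustrative policy value

def return_eligibility(purchase_date: date, today: date) -> str:
    """A return window is a policy, not a suggestion: compute it, don't generate it."""
    deadline = purchase_date + timedelta(days=RETURN_WINDOW_DAYS)
    if today <= deadline:
        return f"Eligible for return until {deadline.isoformat()}."
    return f"The return window closed on {deadline.isoformat()}."

# The language model may phrase the reply, but the eligibility
# decision itself comes from this deterministic check.
print(return_eligibility(date(2024, 5, 1), date(2024, 5, 20)))
```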
But the pre-purchase phase is a very different world.
Before someone buys, they’re often still figuring out what they need. Their questions are open-ended. Their goals are fuzzy. They might be browsing for a vibe, comparing options, or looking for help articulating what they’re imagining. This is where AI can become more than just a functional layer — it can become a guide.
In this mode, overly strict guardrails can actually hurt the experience. If the agent is only allowed to recommend products with perfect metadata, or only respond with answers pulled directly from a catalog, it becomes narrow and repetitive. It can’t suggest alternatives based on context. It can’t make lateral connections. It can’t surprise the shopper — and sometimes, surprise is the most valuable part of discovery.
That doesn’t mean throwing out safety or trust. It means designing different rules for different moments. Post-purchase agents should operate like policy engines: precise, consistent, unambiguous. Pre-purchase agents should feel more like stylists, curators, or personal shoppers — grounded, yes, but capable of inspiration and improvisation within trusted boundaries.
As shopping agents mature, recognizing this split — between the rigidity required after purchase and the flexibility needed before — will be key. The best agents won’t choose between creativity and control. They’ll apply each at the right time, and in the right way.
Because shoppers don’t want just answers.
They want help making decisions.
And decisions live somewhere between facts and feelings — not at the extremes of either one.
Can One Agent Do It All?
As AI shopping agents become more sophisticated, many companies are starting to claim that their assistant can “handle everything” — from product discovery to post-purchase support. On the surface, this sounds like the natural evolution: one unified interface for the entire customer journey. One agent, many functions. But in practice, it’s not quite that simple.
Pre-purchase and post-purchase use cases require fundamentally different capabilities, tone, and architecture. And while it’s tempting to merge them into a single system, doing so well takes more than just stitching together intents.
Post-purchase agents are built for precision. Their value comes from consistency and correctness — not creativity. They’re typically integrated with order management systems, policy engines, and customer accounts. They need to speak in clear, unambiguous terms, escalate when necessary, and operate within hard constraints. This makes them excellent at tasks like handling returns, updating shipping info, or explaining warranties.
Lately, many of these systems have started to claim that they can also “handle pre-purchase” — by answering product questions or surfacing recommendations. And technically, they can. But in reality, these experiences often feel flat. Because while these systems can provide information, they’re not built for exploration. They’re trained to follow rules, not understand preference. When adapted for product discovery, they tend to default to reciting catalog facts rather than guiding users toward meaningful decisions.
On the other side, pre-purchase agents are optimized for flexibility. Their job is to help the shopper figure out what they want — even when the shopper doesn’t know themselves. These agents need to interpret context, navigate ambiguity, and inspire confidence without locking into a binary answer. They might pull from real-time data, past behavior, social cues, or even visual input. Done well, they feel like a smart shopping companion — part assistant, part stylist, part curator.
Can these more creative agents handle post-purchase tasks too? In principle, yes — but with one important caveat: they must know when to switch modes.
The tone that works for a discovery conversation — light, suggestive, exploratory — doesn’t work for a billing dispute or a refund denial. Precision and accountability matter more than flexibility once a transaction is complete. So a discovery agent that wants to handle post-purchase must learn how to shift gears: to become crisp, exact, and policy-aligned when the context demands it.
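One way to make that gear-shift explicit is to classify each incoming message and apply a different configuration per mode. The keyword matcher below is a toy stand-in for a trained intent model, and the configuration fields are hypothetical:

```python
# A minimal sketch of mode switching between discovery and support.
SUPPORT_SIGNALS = {"refund", "return", "order", "tracking", "billing", "warranty"}

MODES = {
    "discovery": {"tone": "exploratory", "may_improvise": True,  "sources": ["catalog", "trends"]},
    "support":   {"tone": "precise",     "may_improvise": False, "sources": ["orders", "policies"]},
}

def select_mode(message: str) -> dict:
    """Shift into support mode whenever the message touches a transaction."""
    words = {w.strip("?.,!") for w in message.lower().split()}
    return MODES["support"] if words & SUPPORT_SIGNALS else MODES["discovery"]

print(select_mode("Where is my refund?"))           # precise, policy-bound
print(select_mode("Help me find a summer jacket"))  # exploratory, may improvise
```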
Between the two, pre-purchase agents may actually be better positioned to expand into post-purchase — especially if they’ve been designed with multi-modal inputs, richer context models, and a more nuanced understanding of user intent. With strong retrieval, memory, and fallback systems in place, they can switch into a support mode when needed — while still preserving the personalized relationship they’ve built upstream.
But for post-purchase agents to move upstream? That’s a harder leap.
Because shopping discovery isn’t just about delivering information. It’s about interpreting signals, managing ambiguity, and helping people make choices they weren’t yet sure how to articulate. That’s a different kind of intelligence — not just functional, but emotional and interpretive.
The agents of the future may well blend both roles — but the path toward that future likely starts on the side of discovery, not compliance.
In the end, the agent that learns how to serve the shopper before the sale — guiding them through the messiness of preference, mood, and tradeoffs — may be the one best equipped to support them after the sale, too. Because it knows the why behind the buy — and that, more than any policy lookup, is the foundation for real service.
Reviewing the Capabilities
Most of what’s marketed today as “AI agents” are really next-gen chatbots.
They:
Don’t plan
Don’t act independently
Don’t reason across time
They do:
Use smarter language models
Pull from more data sources
Operate within stricter guardrails
This doesn’t make them less useful. But it’s important to manage expectations.
Calling them “agents” risks implying an intelligence, or a freedom to search unhindered, that doesn’t exist, and it risks encouraging businesses to trust systems that may still hallucinate or mislead when pushed beyond their data boundaries.
Why Retailers Should Care
For retailers and brands, the stakes are high. A single wrong answer about price or policy can lead to:
Abandoned carts
Unhappy customers
Refunds or legal issues
By adopting a hybrid architecture, one that pairs a strictly grounded layer for facts and policies with a more flexible layer for discovery (sketched below), brands can:
Guarantee factual accuracy where it matters
Offer safe discovery experiences that support conversion
Preserve trust and clarity in every interaction
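One plausible shape for that hybrid: a generative layer proposes, and a grounded layer verifies against the catalog before anything reaches the shopper. In the sketch below, CATALOG and creative_suggestions are illustrative placeholders, not a real assortment or model:

```python
# A sketch of the hybrid idea: a creative layer proposes, a grounded
# layer filters. CATALOG and the proposal list are illustrative stand-ins.
CATALOG = {"linen blazer", "canvas tote", "suede loafers"}

def creative_suggestions(query: str) -> list[str]:
    """Placeholder for an LLM's free-form ideas, which may include
    items that don't exist in the actual assortment."""
    return ["linen blazer", "silk bomber jacket", "canvas tote"]

def grounded_suggestions(query: str) -> list[str]:
    """Keep the inspiration, drop anything the catalog can't verify."""
    return [item for item in creative_suggestions(query) if item in CATALOG]

print(grounded_suggestions("light summer outfit"))
# ['linen blazer', 'canvas tote']: the invented 'silk bomber jacket' is filtered out
```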
Retailers and brands should also pay close attention to the differences between pre- and post-purchase applications, and ask deeper questions of companies and marketing teams that make the two sound the same.