The client had 10,000+ documents, three disconnected systems, and store associates spending five minutes finding answers while customers waited. This is how we fixed it.
Fortune 500 Retailer
2,800+
4 Engineers
~6 Months

This wasn't a search problem. It was a system design problem - information spread across documents, analytics, and loyalty platforms that didn't talk to each other.
We built a multi-agent layer on top of these systems, allowing teams to query across them through a single interface, without rebuilding anything underneath.
| Use Case | Before | After |
|---|---|---|
| Cross-domain queries | 3-4 tools for one answer | 1 conversational interface |
| Query time | 5-10 minutes | ~15 seconds |
| Documents searchable via AI | 0 | 10K+ with page-level citation tracking |
| Document ingestion | Manual | Automatic, event-driven pipeline |
| Source verification | None | 100% citation-backed responses |
Across this project, we delivered:
Enterprise RAG Platform
Multi-Agent Development
Full-Stack Product Engineering
Data Pipelines & Cloud Infrastructure
We spoke with store teams, regional managers, and loyalty teams.
What we learned:
10,000+ documents across mixed formats and languages
Three disconnected systems: document store, analytics dashboard, loyalty platform
No semantic search, no cross-system queries, no citation tracking anywhere

Validate real workflows first
Don't build the full system upfront
One agent per domain
Orchestration connects them
No rebuilding - existing systems are wrapped
This approach reflects how we structure these systems - starting with context, defining execution boundaries, and layering orchestration only where it adds value.
Global retailer, 110+ countries. We used Vertex AI's multilingual model so international documents weren't invisible to search.


Similarity search finds what's related. Reranking finds what's relevant. We set a confidence threshold at 0.38 - if the system isn't confident enough, it doesn't answer. No hallucination over a low-quality match.



Event-driven for new uploads (indexed in minutes, not days) and batch processing with 16 parallel workers for bulk imports. Both handle extraction, chunking, embeddings, and indexing automatically.
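A stripped-down sketch of the batch path, assuming the four stages named above. The stage bodies here are placeholders (real extraction, embedding, and indexing would call out to parsers, a model, and a vector store); only the 16-worker fan-out and the stage order come from the case study.

```python
from concurrent.futures import ThreadPoolExecutor

BULK_WORKERS = 16  # parallel workers, per the case study


def extract(path: str) -> str:
    # placeholder: real pipeline parses PDF/DOCX/etc.
    return f"text of {path}"


def chunk(text: str) -> list[str]:
    # placeholder: fixed-width split stands in for semantic chunking
    return [text[i:i + 40] for i in range(0, len(text), 40)]


def embed(chunks: list[str]) -> list[list[float]]:
    # placeholder: real pipeline calls an embedding model
    return [[float(len(c))] for c in chunks]


def index(path: str, chunks: list[str], vectors: list[list[float]]) -> int:
    # placeholder: real pipeline writes to a vector index
    return len(vectors)


def ingest(path: str) -> int:
    """One document through extract -> chunk -> embed -> index."""
    text = extract(path)
    parts = chunk(text)
    return index(path, parts, embed(parts))


def bulk_import(paths: list[str]) -> int:
    # batch mode: the same per-document pipeline, fanned out across workers
    with ThreadPoolExecutor(max_workers=BULK_WORKERS) as pool:
        return sum(pool.map(ingest, paths))

# event-driven mode would call ingest() once per upload notification
```

Both modes share `ingest()`, which is what keeps new uploads and bulk imports behaviorally identical.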


Users ask follow-up questions without re-establishing context. Query time dropped from minutes to seconds. Teams started relying on it daily within weeks. Citation logs gave teams visibility into what the system was confident about - and what it wasn't.


That evidence is what made Phase 2 possible. The client had seen it work in production. They came back with the obvious next question: what else can connect to this?
Store Operations Agent - Policies, SOPs, Product Catalogs (backed by the Phase 1 RAG platform)
Loyalty Agent - Member Engagement, Rewards, Campaigns (backed by the loyalty platform)
Analytics Agent - Store performance, Metrics (demo mode - scoped down intentionally)
We made a deliberate scope call. The Analytics Agent shipped in demo mode - not because the architecture wasn't ready, but because live data integration would have delayed Phase 2 delivery by six weeks for a single agent. The Store Operations and Loyalty agents handle live production data. Analytics is an execution step, not a design problem, and it didn't block the core value of the platform.
Agents register via standardized metadata. The orchestrator discovers them at runtime - no changes to orchestration logic required.
A2A agents are stateless, but users aren't. Queries like "what about my store?" broke the routing logic early on. We added a memory layer where the Root Agent detects implicit context and enriches queries before routing. Users never have to restate context.
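A minimal sketch of that enrichment step, under the assumption that the Root Agent keeps a per-session key-value store; the marker phrases and `store_id` key are illustrative stand-ins for the real detection logic.

```python
IMPLICIT_MARKERS = ("my store", "our numbers", "what about")  # illustrative


class SessionMemory:
    """Per-user context; downstream (stateless) agents never see it directly."""

    def __init__(self) -> None:
        self.context: dict[str, str] = {}

    def remember(self, key: str, value: str) -> None:
        self.context[key] = value

    def enrich(self, query: str) -> str:
        # Root Agent step: detect implicit references and expand them
        # before routing, so each agent receives a self-contained query.
        q = query.lower()
        if any(marker in q for marker in IMPLICIT_MARKERS) and "store_id" in self.context:
            return f"{query} [store={self.context['store_id']}]"
        return query
```

Explicit queries pass through untouched; only queries that lean on prior context get rewritten.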
Generic AI responses failed immediately in real workflows. Every answer now includes source file, page number, and a signed link to the original document. The system doesn't just answer, it shows the source.
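Structurally, this means every answer carries a citation payload. A sketch of that contract, with invented field and class names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source_file: str
    page: int
    signed_url: str  # short-lived link to the original document


@dataclass
class GroundedAnswer:
    text: str
    citations: list[Citation]

    def is_grounded(self) -> bool:
        # every response must carry at least one verifiable source
        return len(self.citations) > 0
```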
The planner didn't work reliably out of the box. We ran 12+ prompt iterations, added validation, and built retry logic. ~80% of plans succeed on the first try. The rest fall back to slower, deterministic execution - users see latency, not failure.
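The retry-then-fallback shape looks roughly like this. The validation rule and retry count are placeholders; the real validator checked plan structure against the registered agents.

```python
MAX_RETRIES = 3  # illustrative; the real count is an operational setting


def validate(plan: list[str]) -> bool:
    # placeholder rule: every step must name an agent call
    return bool(plan) and all(step.startswith("call:") for step in plan)


def plan_with_fallback(query, planner, deterministic):
    """Retry the LLM planner; fall back to a slower fixed pipeline.

    On fallback, users see extra latency, never a hard failure.
    """
    for _ in range(MAX_RETRIES):
        plan = planner(query)
        if validate(plan):
            return plan
    return deterministic(query)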
The Phase 1 system was already live. Rewriting it would have killed momentum. We wrapped it with a thin A2A proxy - no backend changes. This pattern turns existing systems into agents without rebuilding them.
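The proxy pattern is small enough to show in full. Both classes below are stand-ins (the real A2A task shapes and backend API differ); the point is that the adapter translates formats and the backend stays byte-for-byte unchanged.

```python
class RagBackend:
    """Stand-in for the unchanged Phase 1 service."""

    def search(self, query: str) -> dict:
        return {"answer": f"result for {query}", "citations": []}


class A2AProxy:
    """Thin adapter exposing an existing system as an agent.

    The backend is untouched; the proxy only translates the A2A
    task format into the backend's own API (shapes are illustrative).
    """

    def __init__(self, backend: RagBackend) -> None:
        self.backend = backend

    def handle_task(self, task: dict) -> dict:
        result = self.backend.search(task["input"])
        return {"status": "completed", "output": result}
```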
Real projects don't follow the original plan. This one didn't either.
It came from Phase 1 working in production - citation logs, daily usage, and measurable time savings. That kind of evidence moves internal decisions faster than any architecture proposal.
Users naturally used implicit context - "my store," "our numbers." Stateless routing broke immediately. We added a memory layer mid-sprint. It's now one of the most used parts of the system.
We chose to ship what was ready instead of delaying everything for one integration. Named it, scoped it clearly, and moved forward.
For an enterprise client handling loyalty program PII across millions of members:
| Concern | How it's handled |
|---|---|
| Authentication | Google Cloud IAM, scoped per service |
| Document access | Signed URLs, 15-min expiry |
| PII | Never stored in orchestrator layer; accessed via agent API only |
| Network isolation | Internal endpoints only; RAG backend via SSH tunnel in production |
| Audit trail | Every agent interaction traced via OpenTelemetry + Cloud Tracing, logged to BigQuery |
| Secrets | GCP Secret Manager; no hardcoded credentials |
1. No relevant documents found
2. Agent unavailable
3. Planner generates invalid plan
4. Streaming connection drops
4 engineers. Both phases. The entire platform.
2 WEEKS - Discovery Phase: Stakeholder Interviews, Document Audit, Architecture Decisions
~3 MONTHS - Phase 1 RAG Platform: Ingestion Pipelines, Vector Search, Citation Tracking, Chat UI
~3 MONTHS - Phase 2 Multi-Agent: Agent Development, Orchestrator, Memory Integration, Streaming UI
ONGOING - Iteration: Prompt Refinement, Performance Tuning
The orchestration patterns - Planner-Executor, A2A protocol, RAG pipelines - are cloud-agnostic. We have production experience on AWS and Azure.
