
Unifying Enterprise Knowledge with a Multi-Agent AI Platform

The client had 10,000+ documents, three disconnected systems, and store associates spending five minutes finding answers while customers waited. This is how we fixed it.

Company: Fortune 500 Retailer
Stores: 2,800+
Team: 4 Engineers
Duration: ~6 Months


This wasn't a search problem. It was a system design problem - information spread across documents, analytics, and loyalty platforms that didn't talk to each other.

We built a multi-agent layer on top of these systems, allowing teams to query across them through a single interface, without rebuilding anything underneath.

Use Case | Before | After
Cross-domain queries | 3-4 tools for one answer | 1 conversational interface
Query time | 5-10 minutes | ~15 seconds
Documents searchable via AI | 0 | 10K+ with page-level citation tracking
Document ingestion | Manual | Automatic, event-driven pipeline
Source verification | None | 100% citation-backed responses


Discovery: Two Weeks Before Writing Code

We spoke with store teams, regional managers, and loyalty teams.

What we learned:

10,000+ documents across mixed formats and languages

Three disconnected systems: document store, analytics dashboard, loyalty platform

No semantic search, no cross-system queries, no citation tracking anywhere


Two decisions shaped the entire system:

1. Start with Phase 1: validate real workflows first instead of building the full system upfront.

2. Use domain-based architecture: one agent per domain, with orchestration connecting them. No rebuilding - existing systems are wrapped.

This approach reflects how we structure these systems - starting with context, defining execution boundaries, and layering orchestration only where it adds value.

The Engineering Decisions That Made It Production-Ready

Multilingual embeddings

Global retailer, 110+ countries. We used Vertex AI's multilingual model so international documents weren't invisible to search.

Vertex AI, Gemini
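A minimal sketch of the document-side embedding call, assuming the Vertex AI Python SDK and its multilingual embedding model; the project ID is a placeholder.

```python
import vertexai
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

vertexai.init(project="retail-rag-prod", location="us-central1")  # placeholder project
model = TextEmbeddingModel.from_pretrained("text-multilingual-embedding-002")

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed document chunks for retrieval; queries would use task_type='RETRIEVAL_QUERY'."""
    inputs = [TextEmbeddingInput(text=c, task_type="RETRIEVAL_DOCUMENT") for c in chunks]
    return [e.values for e in model.get_embeddings(inputs)]
```

Because the model maps all languages into one vector space, a query in English can retrieve a document written in French or Japanese without a translation step.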
Reranking on top of vector search

Similarity search finds what's related. Reranking finds what's relevant. We set a confidence threshold at 0.38 - if the system isn't confident enough, it doesn't answer. No hallucination over a low-quality match.

Pinecone, Weaviate, LangChain
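A minimal sketch of the retrieve-then-rerank gate. The case study doesn't name the reranker, so a generic cross-encoder stands in here; the 0.38 cutoff is the figure above, though its scale depends on the scoring model, so treat the pairing as illustrative.

```python
from sentence_transformers import CrossEncoder

CONFIDENCE_THRESHOLD = 0.38  # below this, the system declines to answer

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative model

def rerank_or_refuse(query: str, candidates: list[str]) -> list[str] | None:
    # Similarity search found what's related; the cross-encoder scores
    # each (query, chunk) pair to decide what's actually relevant.
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), reverse=True)
    if ranked[0][0] < CONFIDENCE_THRESHOLD:
        return None  # refuse rather than hallucinate over a weak match
    return [chunk for score, chunk in ranked if score >= CONFIDENCE_THRESHOLD]
```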
Dual ingestion pipelines

Event-driven for new uploads (indexed in minutes, not days) and batch processing with 16 parallel workers for bulk imports. Both handle extraction, chunking, embeddings, and indexing automatically.

Cloud Run, LlamaIndex
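A minimal sketch of the two entry points feeding one shared pipeline, assuming a Cloud Functions trigger for uploads and a 16-worker pool for bulk imports; function names and the shared pipeline body are illustrative.

```python
import functions_framework
from concurrent.futures import ThreadPoolExecutor

def process_document(bucket: str, name: str) -> None:
    """Shared pipeline: extraction, chunking, embeddings, indexing."""
    ...  # elided - identical for both entry points

# Event-driven path: new uploads are indexed within minutes.
@functions_framework.cloud_event
def on_upload(event):
    process_document(event.data["bucket"], event.data["name"])

# Batch path: bulk imports fan out across 16 parallel workers.
def bulk_import(files: list[tuple[str, str]]) -> None:
    with ThreadPoolExecutor(max_workers=16) as pool:
        list(pool.map(lambda f: process_document(*f), files))
```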
Session memory

Users ask follow-up questions without re-establishing context. Query time dropped from minutes to seconds. Teams started relying on it daily within weeks. Citation logs gave teams visibility into what the system was confident about - and what it wasn't.

Redis, Google BigQuery
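A minimal sketch of Redis-backed session memory, assuming a per-session turn list with an idle TTL; the key naming and the 30-minute timeout are our assumptions, not the client's values.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SESSION_TTL = 30 * 60  # assumed idle timeout, in seconds

def remember(session_id: str, role: str, text: str) -> None:
    key = f"session:{session_id}:turns"
    r.rpush(key, json.dumps({"role": role, "text": text}))
    r.expire(key, SESSION_TTL)  # session expires after 30 idle minutes

def recall(session_id: str, last_n: int = 6) -> list[dict]:
    # Replay the most recent turns so follow-up questions keep their context.
    return [json.loads(t) for t in r.lrange(f"session:{session_id}:turns", -last_n, -1)]
```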

That evidence is what made Phase 2 possible. The client had seen it work in production. They came back with the obvious next question: what else can connect to this?

The three agents

Arlo - Store Ops Agent (LIVE)
Covers: policies, SOPs, product catalogs
Data source: Phase 1 RAG platform

Lyra - Loyalty Agent (LIVE)
Covers: member engagement, rewards, campaigns
Data source: loyalty platform

Arc - Analytics Agent (DEMO)
Covers: store performance, metrics
Data source: demo mode - scoped down intentionally

The Analytics Agent

We made a deliberate scope call. The Analytics Agent shipped in demo mode - not because the architecture wasn't ready, but because live data integration would have delayed Phase 2 delivery by six weeks for a single agent. The Store Operations and Loyalty agents handle live production data. Analytics is an execution step, not a design problem, and it didn't block the core value of the platform.

Agents register via standardized metadata. The orchestrator discovers them at runtime - no changes to orchestration logic required.
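A minimal sketch of that discovery step, loosely following A2A-style agent cards; the internal hostnames and card fields are illustrative.

```python
import httpx

# Each agent publishes its own metadata card; hostnames are hypothetical.
AGENT_CARD_URLS = [
    "https://arlo.internal/.well-known/agent.json",
    "https://lyra.internal/.well-known/agent.json",
    "https://arc.internal/.well-known/agent.json",
]

def discover_agents() -> dict[str, dict]:
    registry = {}
    for url in AGENT_CARD_URLS:
        card = httpx.get(url, timeout=5).json()  # name, skills, endpoint, auth
        registry[card["name"]] = card
    return registry  # a new agent shows up here with zero orchestrator changes
```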

The Problems You Don't See in Demos

Stateful conversations in a stateless system:

A2A agents are stateless, but users aren't. Queries like "what about my store?" broke the routing logic early on. We added a memory layer where the Root Agent detects implicit context and enriches queries before routing. Users never have to restate context.
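A minimal sketch of the enrichment step. In production this resolution is model-driven inside the Root Agent; a rule-based stand-in keeps the idea visible, and the session fields are assumptions.

```python
def enrich_query(query: str, session: dict) -> str:
    """Resolve implicit references from session memory before routing."""
    enriched = query
    if "my store" in query.lower() and "store_id" in session:
        enriched = enriched.replace("my store", f"store {session['store_id']}")
    if "our numbers" in query.lower() and "region" in session:
        enriched = enriched.replace("our numbers", f"metrics for region {session['region']}")
    return enriched

# "what about my store?" -> "what about store 1142?"
```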

Trust through citations in enterprise workflows:

Generic AI responses failed immediately in real workflows. Every answer now includes source file, page number, and a signed link to the original document. The system doesn't just answer, it shows the source.
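A minimal sketch of the citation payload, using Cloud Storage v4 signed URLs with the 15-minute expiry from the security table below; the bucket name is hypothetical.

```python
from datetime import timedelta
from google.cloud import storage

client = storage.Client()

def make_citation(doc_path: str, page: int) -> dict:
    blob = client.bucket("retail-docs-prod").blob(doc_path)  # hypothetical bucket
    url = blob.generate_signed_url(version="v4", expiration=timedelta(minutes=15))
    return {"source": doc_path, "page": page, "link": url}
```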

Planner reliability at scale:

The planner didn't work reliably out of the box. We ran 12+ prompt iterations, added validation, and built retry logic. ~80% of plans succeed on the first try. The rest fall back to slower, deterministic execution - users see latency, not failure.
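A minimal sketch of the validate-retry-fallback loop, with the planner, validator, and executors injected as callables; the retry count and plan shape are assumptions.

```python
from typing import Callable

def run_plan(
    query: str,
    generate: Callable[[str], list[dict]],        # LLM planner
    validate: Callable[[list[dict]], list[str]],  # returns a list of errors
    execute: Callable[[list[dict]], dict],        # parallel plan executor
    fallback: Callable[[str], dict],              # deterministic sequential path
    max_retries: int = 2,
) -> dict:
    for _ in range(max_retries + 1):
        plan = generate(query)
        errors = validate(plan)  # unknown agents, malformed steps, cycles
        if not errors:
            return execute(plan)
        # Feed validation errors back so the next attempt can self-correct.
        query = f"{query}\n\nPrevious plan was invalid: {'; '.join(errors)}"
    return fallback(query)  # slower, but users see latency, not failure
```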

Wrapping existing systems without rebuilding:

The Phase 1 system was already live. Rewriting it would have killed momentum. We wrapped it with a thin A2A proxy - no backend changes. This pattern turns existing systems into agents without rebuilding them.
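A minimal sketch of such a proxy as a FastAPI service, assuming a simple task envelope; the paths and payload shapes are illustrative, not the actual A2A schema.

```python
import httpx
from fastapi import FastAPI

app = FastAPI()
RAG_BACKEND = "http://phase1-rag.internal"  # hypothetical internal endpoint

@app.post("/tasks")
async def handle_task(task: dict) -> dict:
    # Translate the agent task into the legacy API's request shape...
    query = task["message"]["text"]
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{RAG_BACKEND}/search", json={"query": query})
    result = resp.json()
    # ...and the legacy response back into the envelope the orchestrator expects.
    return {
        "status": "completed",
        "answer": result["answer"],
        "citations": result["citations"],
    }
```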

What Changed Along the Way

Real projects don't follow the original plan. This one didn't either.

Phase 2 wasn't scoped initially.

It came from Phase 1 working in production - citation logs, daily usage, and measurable time savings. That kind of evidence moves internal decisions faster than any architecture proposal.

The memory layer wasn't planned.

Users naturally used implicit context - "my store," "our numbers." Stateless routing broke immediately. We added a memory layer mid-sprint. It's now one of the most used parts of the system.

The Analytics Agent shipped in demo mode.

We chose to ship what was ready instead of delaying everything for one integration. Named it, scoped it clearly, and moved forward.

Security and Failure Handling

For an enterprise client handling loyalty program PII across millions of members:

Concern | How it's handled
Authentication | Google Cloud IAM, scoped per service
Document access | Signed URLs, 15-min expiry
PII | Never stored in the orchestrator layer - accessed via agent API only
Network isolation | Internal endpoints only; RAG backend via SSH tunnel in production
Audit trail | Every agent interaction traced via OpenTelemetry + Cloud Tracing, logged to BigQuery
Secrets | GCP Secret Manager - no hardcoded credentials

When things go wrong

01. No relevant documents found - the system says so clearly; no hallucination
02. Agent unavailable - partial results returned with an explanation of what's missing (sketched below)
03. Planner generates an invalid plan - validation catches it, retry logic corrects it
04. Streaming connection drops - frontend reconnects with exponential backoff
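A minimal sketch of failure mode 02: fan out to agents concurrently and degrade to partial results instead of failing the whole query. The agent interface and timeout are assumptions.

```python
import asyncio

async def gather_partial(query: str, agents: dict) -> dict:
    """Query all agents; degrade to partial results if any are down."""
    tasks = {name: asyncio.create_task(agent.ask(query)) for name, agent in agents.items()}
    results, unavailable = {}, []
    for name, task in tasks.items():
        try:
            results[name] = await asyncio.wait_for(task, timeout=10)
        except (asyncio.TimeoutError, ConnectionError):
            unavailable.append(name)  # note what's missing rather than fail
    note = f"Unavailable: {', '.join(unavailable)}" if unavailable else None
    return {"results": results, "missing": note}
```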

The Team and Timeline

4 engineers. Both phases. The entire platform.

Duration | Phase | Deliverables
2 weeks | Discovery | Stakeholder interviews, document audit, architecture decisions
~3 months | Phase 1 RAG Platform | Ingestion pipelines, vector search, citation tracking, chat UI
~3 months | Phase 2 Multi-Agent | Agent development, orchestrator, memory integration, streaming UI
Ongoing | Iteration | Prompt refinement, performance tuning

Tech Stack

AI / ML: Vertex AI, Google ADK, LangChain, A2A Protocol
Frontend: Next.js, React 19, TypeScript, Tailwind CSS
Backend: Python, FastAPI, Cloud Run, Cloud Functions, Cloud Run Jobs
Infrastructure: Docker, Kubernetes Helm, Terraform, GitHub Actions
Data: BigQuery, Redis, Cloud Storage

The orchestration patterns - Planner-Executor, A2A protocol, RAG pipelines - are cloud-agnostic. We have production experience on AWS and Azure.
