The client had 10,000+ documents, three disconnected systems, and store associates spending five minutes finding answers while customers waited. This is how we fixed it.
Fortune 500 Retailer
2,800+
4 Engineers
~6 Months

This wasn't a search problem. It was a system design problem - information spread across documents, analytics, and loyalty platforms that didn't talk to each other.
We built a multi-agent layer on top of these systems, allowing teams to query across them through a single interface, without rebuilding anything underneath.
| Use Case | Before | After |
|---|---|---|
| Cross-domain queries | 3-4 tools for one answer | 1 conversational interface |
| Query time | 5-10 minutes | ~15 seconds |
| Documents searchable via AI | 0 | 10K+ with page-level citation tracking |
| Document ingestion | Manual | Automatic, event-driven pipeline |
| Source verification | None | 100% citation-backed responses |
Across this project, we delivered:
Enterprise RAG Platform
Multi-Agent Development
Full-Stack Product Engineering
Data Pipelines & Cloud Infrastructure
We spoke with store teams, regional managers, and loyalty teams.
What we learned:
10,000+ documents across mixed formats and languages
Three disconnected systems: document store, analytics dashboard, loyalty platform
No semantic search, no cross-system queries, no citation tracking anywhere

Validate real workflows first
Don't build the full system upfront
One agent per domain
Orchestration connects them
No rebuilding - existing systems are wrapped
This approach reflects how we structure these systems - starting with context, defining execution boundaries, and layering orchestration only where it adds value.
Global retailer, 110+ countries. We used Vertex AI's multilingual model so international documents weren't invisible to search.


Similarity search finds what's related. Reranking finds what's relevant. We set a confidence threshold at 0.38 - if the system isn't confident enough, it doesn't answer. No hallucination over a low-quality match.



Event-driven for new uploads (indexed in minutes, not days) and batch processing with 16 parallel workers for bulk imports. Both handle extraction, chunking, embeddings, and indexing automatically.
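A stripped-down sketch of the batch path, assuming the four stages named above. The stage bodies here are placeholders (real extraction, embedding, and indexing would call out to parsers, a model, and a vector store); only the 16-worker fan-out and the stage order come from the case study.

```python
from concurrent.futures import ThreadPoolExecutor

BULK_WORKERS = 16  # parallel workers, per the case study


def extract(path: str) -> str:
    # placeholder: real pipeline parses PDF/DOCX/etc.
    return f"text of {path}"


def chunk(text: str) -> list[str]:
    # placeholder: fixed-width split stands in for semantic chunking
    return [text[i:i + 40] for i in range(0, len(text), 40)]


def embed(chunks: list[str]) -> list[list[float]]:
    # placeholder: real pipeline calls an embedding model
    return [[float(len(c))] for c in chunks]


def index(path: str, chunks: list[str], vectors: list[list[float]]) -> int:
    # placeholder: real pipeline writes to a vector index
    return len(vectors)


def ingest(path: str) -> int:
    """One document through extract -> chunk -> embed -> index."""
    text = extract(path)
    parts = chunk(text)
    return index(path, parts, embed(parts))


def bulk_import(paths: list[str]) -> int:
    # batch mode: the same per-document pipeline, fanned out across workers
    with ThreadPoolExecutor(max_workers=BULK_WORKERS) as pool:
        return sum(pool.map(ingest, paths))

# event-driven mode would call ingest() once per upload notification
```

Both modes share `ingest()`, which is what keeps new uploads and bulk imports behaviorally identical.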


Users ask follow-up questions without re-establishing context. Query time dropped from minutes to seconds. Teams started relying on it daily within weeks. Citation logs gave teams visibility into what the system was confident about - and what it wasn't.


That evidence is what made Phase 2 possible. The client had seen it work in production. They came back with the obvious next question: what else can connect to this?
Store Operations Agent - Policies, SOPs, Product Catalogs (backed by the Phase 1 RAG platform)
Loyalty Agent - Member Engagement, Rewards, Campaigns (backed by the loyalty platform)
Analytics Agent - Store performance, Metrics (demo mode - scoped down intentionally)
We made a deliberate scope call. The Analytics Agent shipped in demo mode - not because the architecture wasn't ready, but because live data integration would have delayed Phase 2 delivery by six weeks for a single agent. The Store Operations and Loyalty agents handle live production data. Analytics is an execution step, not a design problem, and it didn't block the core value of the platform.
Agents register via standardized metadata. The orchestrator discovers them at runtime - no changes to orchestration logic required.
A2A agents are stateless, but users aren't. Queries like "what about my store?" broke the routing logic early on. We added a memory layer where the Root Agent detects implicit context and enriches queries before routing. Users never have to restate context.
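A minimal sketch of that enrichment step, under the assumption that the Root Agent keeps a per-session key-value store; the marker phrases and `store_id` key are illustrative stand-ins for the real detection logic.

```python
IMPLICIT_MARKERS = ("my store", "our numbers", "what about")  # illustrative


class SessionMemory:
    """Per-user context; downstream (stateless) agents never see it directly."""

    def __init__(self) -> None:
        self.context: dict[str, str] = {}

    def remember(self, key: str, value: str) -> None:
        self.context[key] = value

    def enrich(self, query: str) -> str:
        # Root Agent step: detect implicit references and expand them
        # before routing, so each agent receives a self-contained query.
        q = query.lower()
        if any(marker in q for marker in IMPLICIT_MARKERS) and "store_id" in self.context:
            return f"{query} [store={self.context['store_id']}]"
        return query
```

Explicit queries pass through untouched; only queries that lean on prior context get rewritten.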
Generic AI responses failed immediately in real workflows. Every answer now includes source file, page number, and a signed link to the original document. The system doesn't just answer, it shows the source.
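Structurally, this means every answer carries a citation payload. A sketch of that contract, with invented field and class names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source_file: str
    page: int
    signed_url: str  # short-lived link to the original document


@dataclass
class GroundedAnswer:
    text: str
    citations: list[Citation]

    def is_grounded(self) -> bool:
        # every response must carry at least one verifiable source
        return len(self.citations) > 0
```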
The planner didn't work reliably out of the box. We ran 12+ prompt iterations, added validation, and built retry logic. ~80% of plans succeed on the first try. The rest fall back to slower, deterministic execution - users see latency, not failure.
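The retry-then-fallback shape looks roughly like this. The validation rule and retry count are placeholders; the real validator checked plan structure against the registered agents.

```python
MAX_RETRIES = 3  # illustrative; the real count is an operational setting


def validate(plan: list[str]) -> bool:
    # placeholder rule: every step must name an agent call
    return bool(plan) and all(step.startswith("call:") for step in plan)


def plan_with_fallback(query, planner, deterministic):
    """Retry the LLM planner; fall back to a slower fixed pipeline.

    On fallback, users see extra latency, never a hard failure.
    """
    for _ in range(MAX_RETRIES):
        plan = planner(query)
        if validate(plan):
            return plan
    return deterministic(query)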
The Phase 1 system was already live. Rewriting it would have killed momentum. We wrapped it with a thin A2A proxy - no backend changes. This pattern turns existing systems into agents without rebuilding them.
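The proxy pattern is small enough to show in full. Both classes below are stand-ins (the real A2A task shapes and backend API differ); the point is that the adapter translates formats and the backend stays byte-for-byte unchanged.

```python
class RagBackend:
    """Stand-in for the unchanged Phase 1 service."""

    def search(self, query: str) -> dict:
        return {"answer": f"result for {query}", "citations": []}


class A2AProxy:
    """Thin adapter exposing an existing system as an agent.

    The backend is untouched; the proxy only translates the A2A
    task format into the backend's own API (shapes are illustrative).
    """

    def __init__(self, backend: RagBackend) -> None:
        self.backend = backend

    def handle_task(self, task: dict) -> dict:
        result = self.backend.search(task["input"])
        return {"status": "completed", "output": result}
```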
Real projects don't follow the original plan. This one didn't either.
It came from Phase 1 working in production - citation logs, daily usage, and measurable time savings. That kind of evidence moves internal decisions faster than any architecture proposal.
Users naturally used implicit context - "my store," "our numbers." Stateless routing broke immediately. We added a memory layer mid-sprint. It's now one of the most used parts of the system.
We chose to ship what was ready instead of delaying everything for one integration. Named it, scoped it clearly, and moved forward.
For an enterprise client handling loyalty program PII across millions of members:
| Concern | How it's handled |
|---|---|
| Authentication | Google Cloud IAM, scoped per service |
| Document access | Signed URLs, 15-min expiry |
| PII | Never stored in orchestrator layer; accessed via agent API only |
| Network isolation | Internal endpoints only; RAG backend via SSH tunnel in production |
| Audit trail | Every agent interaction traced via OpenTelemetry + Cloud Tracing, logged to BigQuery |
| Secrets | GCP Secret Manager; no hardcoded credentials |
1. No relevant documents found
2. Agent unavailable
3. Planner generates invalid plan
4. Streaming connection drops
4 engineers. Both phases. The entire platform.
2 WEEKS - Discovery Phase: Stakeholder Interviews, Document Audit, Architecture Decisions
~3 MONTHS - Phase 1 RAG Platform: Ingestion Pipelines, Vector Search, Citation Tracking, Chat UI
~3 MONTHS - Phase 2 Multi-Agent: Agent Development, Orchestrator, Memory Integration, Streaming UI
ONGOING - Iteration: Prompt Refinement, Performance Tuning
The orchestration patterns - Planner-Executor, A2A protocol, RAG pipelines - are cloud-agnostic. We have production experience on AWS and Azure.
