VOICE AI LEARNING ASSISTANT

From Static Finance Content to a Searchable AI Learning Platform

The client had high-quality courses, Synthesia video lessons, and a 500+ page financial modeling book. Members still had to search manually. Here is how we built a retrieval and voice AI layer across all three content sources.

Company: Investment Analyst

Content sources: 2,800+

Pages ingested: 500+

Duration: ~6 weeks

Before | After
Members browsed courses manually with no semantic understanding | Hybrid keyword and semantic search across platform content
Video knowledge stayed locked inside Synthesia transcripts | Transcripts pulled, cleaned, and indexed automatically
A 500+ page book existed only as a static PDF | Book content extracted, chunked, embedded, and stored in Qdrant
No AI assistant existed for member Q&A | ElevenLabs voice agent answers from proprietary training material
Course and video updates required manual operational work | Automated LearnWorlds and Synthesia sync pipelines keep content current
No financial advice controls were needed because there was no AI layer | Explicit guardrails keep the agent educational, not advisory

Services Delivered

Across this project, we delivered:

Enterprise AI Knowledge Systems

AI Agent & Copilot Development

AI Automation & Workflow Intelligence

Full-Stack Product Engineering

Ongoing AMC Support

Discovery:

Understanding the Content Before Designing the AI Layer

We mapped how members were expected to learn today, where the content lived, and what LearnWorlds, Synthesia, Algolia, ElevenLabs, and Qdrant could realistically support.

What we learned:

Three useful content sources existed, but none of them worked together. LearnWorlds held course structure, Synthesia held video scripts, and the 500-page PDF held the deepest reference material.

Course content was structured, but not semantically searchable. Members could browse pages and pathways, but they could not ask questions in natural language and get directed to the right lesson.

Video transcripts were valuable, but isolated. Synthesia contained spoken instructional content, but LearnWorlds had no native transcript layer or reliable mapping between courses and videos.

The PDF needed its own retrieval pipeline. The Financial Modelling Mastery book was too important to treat as a static file. It needed extraction, cleaning, chunking, embeddings, and vector storage before an AI agent could use it.

Two decisions shaped the entire system:

Build a decoupled content architecture

•Pull content from each source independently

•Avoid manual course-to-video mapping

•Use automated sync pipelines instead of an admin curation layer

•Reduce maintenance overhead for future course and video updates

Treat the PDF as the foundation for book intelligence

•Build a reusable PDF ingestion pipeline

•Clean noisy textbook content before embedding

•Store semantic chunks in Qdrant

•Validate retrieval through real finance-related test queries

This approach let us move fast without forcing LearnWorlds, Synthesia, and the PDF book into one brittle content model. Each source kept its structure. The intelligence layer made them searchable, retrievable, and usable by the AI agent.

The three knowledge layers

LearnWorlds Course Layer

•Course titles

•Descriptions

•Learning pathways

DATA SOURCE

LearnWorlds API

System Role

Synced into Algolia for semantic search and recommendations.

Synthesia Video Layer

•Video metadata

•Scripts

•Clean transcripts

DATA SOURCE

Synthesia API

System Role

Converts video scripts into searchable learning content, so spoken lessons can be discovered through semantic search and surfaced by the AI agent.
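
A minimal sketch of the transcript cleanup step, assuming the transcript arrives as subtitle-style text with `HH:MM:SS` timestamps and numeric cue lines. That input shape is an assumption for illustration; the actual Synthesia export format is not described here, and `clean_transcript` is a hypothetical helper name.

```python
import re

# Assumed timestamp shapes (hypothetical; the real export may differ):
#   [00:01:23]   00:01:23.450   00:01:23,450 --> 00:01:25,000
# Three colon-separated parts are required so finance ratios like "60:40" survive.
_TIMESTAMP = re.compile(
    r"\[?\d{1,2}:\d{2}:\d{2}(?:[.,]\d{1,3})?"
    r"(?:\s*-->\s*\d{1,2}:\d{2}:\d{2}(?:[.,]\d{1,3})?)?\]?"
)


def clean_transcript(raw: str) -> str:
    """Strip timestamp markers and bare cue numbers, then collapse whitespace."""
    kept = []
    for line in raw.splitlines():
        line = _TIMESTAMP.sub(" ", line).strip()
        if not line or line.isdigit():  # blank lines and cue counters carry no content
            continue
        kept.append(line)
    return " ".join(" ".join(kept).split())
```

The output is a single block of prose per video, which is easier to index and to quote back to a member than cue-by-cue fragments.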

Financial Modelling Book Layer

•500-page PDF

•Cleaned text chunks

•Embedded book knowledge

DATA SOURCE

Financial Modelling Mastery PDF

System Role

Turns the 500+ page PDF into a retrievable knowledge base, so the AI agent can answer detailed finance questions from the book instead of relying on generic model knowledge.

WHAT WE BUILT

A unified AI intelligence layer across three content sources.

We built four connected components that turned The Investment Analyst's static learning content into a searchable, AI-assisted member experience.

PDF Ingestion Pipeline

Built a reusable Python pipeline to extract, clean, chunk, embed, and store the 500+ page Financial Modelling Mastery book in Qdrant.

•Page-level text extraction

•Noise filtering for headers, TOC entries, captions, blank pages, and copyright text

•Semantic chunking for retrieval quality

•OpenAI embedding generation

•Qdrant storage

•Test-query validation through the agent

Why it mattered: The book became retrievable by the AI assistant instead of sitting as a static PDF.
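
The noise-filtering step listed above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: pages are assumed to arrive as already-extracted strings, and the patterns, the `header_ratio` threshold, and the `filter_noise` name are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns; a real book would need its own tuning.
_TOC_ENTRY = re.compile(r"\.{3,}\s*\d+\s*$")  # e.g. "Chapter 3 ........ 41"
_COPYRIGHT = re.compile(r"(©|\bcopyright\b|all rights reserved)", re.IGNORECASE)


def filter_noise(pages: list[str], header_ratio: float = 0.5) -> list[str]:
    """Return cleaned page texts; pages left empty after cleaning are dropped."""
    # Count on how many pages each distinct line appears.
    line_pages = Counter()
    for page in pages:
        line_pages.update({ln.strip() for ln in page.splitlines() if ln.strip()})

    # A line seen on at least this many pages is treated as a running header/footer.
    threshold = max(2, int(len(pages) * header_ratio))
    repeated = {ln for ln, n in line_pages.items() if n >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            ln.strip()
            for ln in page.splitlines()
            if ln.strip()
            and ln.strip() not in repeated
            and not _TOC_ENTRY.search(ln)
            and not _COPYRIGHT.search(ln)
        ]
        if kept:  # blank or fully-noise pages disappear here
            cleaned.append("\n".join(kept))
    return cleaned
```

Detecting headers by repetition rather than by position keeps the filter independent of page layout, at the cost of needing a threshold that suits the book.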

ElevenLabs AI Voice Agent

Built a member-facing AI tutor using ElevenLabs Agents v2.0 and the client's professional voice clone.

•Voice-enabled member Q&A

•Answers grounded in proprietary content

•References and links back to courses, videos, and book material

•Guardrails against personalised financial advice

•LearnWorlds widget / iframe embedding

•Paywall-compatible access

Why it mattered: Members could ask questions naturally and get guided to the right learning material.

Algolia Semantic Search & Retrieval

Configured Algolia as the discovery layer for LearnWorlds courses and Synthesia video content.

•Hybrid keyword + semantic search

•Course title, description, and pathway indexing

•Video transcript indexing

•Investment-domain index structure

•Algolia Recommendation API for course suggestions

•Smart search inside LearnWorlds

Why it mattered: Courses and video knowledge became searchable from one place.

Automated Content Sync Pipelines

Built automated pipelines to keep LearnWorlds and Synthesia content current inside the search and AI layer.

•LearnWorlds course sync

•Synthesia video metadata sync

•Transcript extraction and timestamp cleanup

•Scheduled and event-driven indexing

•Update and deletion handling in Algolia

Why it mattered: New, updated, or removed content flowed into the system without manual operational work.
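
One way to plan the update and deletion handling is to diff the source against what is already indexed. The sketch below assumes the pipeline keeps a map of record ID to content hash for the indexed state; that bookkeeping, and the names `content_hash` and `plan_sync`, are hypothetical. `objectID` is the identifier field Algolia records use.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """Stable hash of a record's content, independent of key order."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_sync(source: dict[str, dict], indexed_hashes: dict[str, str]):
    """Compare source records with indexed state.

    Returns (records_to_upsert, ids_to_delete).
    """
    # New records have no stored hash; changed records have a different one.
    upserts = [
        {"objectID": rid, **rec}
        for rid, rec in source.items()
        if indexed_hashes.get(rid) != content_hash(rec)
    ]
    # Anything indexed but no longer in the source should be removed.
    deletes = sorted(set(indexed_hashes) - set(source))
    return upserts, deletes
```

Unchanged records are skipped entirely, so a scheduled run over a mostly static catalogue sends only the records that actually changed.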

The Details That Made It Production-Ready

Chunking Strategy for the PDF

Raw PDF extraction created too much noise for reliable retrieval. The book included tables, formulae, repeated headers, captions, footnotes, and copyright text.

We solved this with:

Multi-stage text cleaning

Removal of non-informational content

Semantic chunks instead of fixed character windows

Finance-specific retrieval tests
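
To illustrate the difference from fixed character windows, here is a minimal sketch that chunks on paragraph boundaries and packs whole paragraphs up to a word budget. It is a simple stand-in for semantic chunking, not the method used in the project; `chunk_paragraphs` and `max_words` are hypothetical names.

```python
def chunk_paragraphs(text: str, max_words: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks of at most `max_words` words.

    A paragraph is never split, so a single oversized paragraph
    becomes its own (larger) chunk.
    """
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n")]
    chunks, current, count = [], [], 0
    for para in filter(None, paragraphs):
        words = len(para.split())
        # Close the current chunk before it would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because a chunk never ends mid-paragraph, a definition and its explanation stay together, which a fixed character window cannot guarantee.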

Decoupled Architecture Over Manual Linking

LearnWorlds and Synthesia had no native connection.

Instead of building a manual linking layer with a custom DB and admin UI, we used independent sync pipelines feeding a shared Algolia index.

This meant:

No manual course-to-video mapping

No ongoing admin curation

Faster delivery

Easier content updates
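
A minimal sketch of how two independent pipelines can feed one shared index without a mapping table: each pipeline normalises its own source into a common record shape, tagged by type. The field names and input dictionaries are assumptions for illustration, not the actual LearnWorlds or Synthesia payloads.

```python
def course_record(course: dict) -> dict:
    """Normalise a course (hypothetical input shape) into the shared schema."""
    return {
        "objectID": f"course:{course['id']}",  # prefix keeps IDs unique across sources
        "type": "course",
        "title": course["title"],
        "body": course.get("description", ""),
    }


def video_record(video: dict) -> dict:
    """Normalise a video (hypothetical input shape) into the shared schema."""
    return {
        "objectID": f"video:{video['id']}",
        "type": "video",
        "title": video["title"],
        "body": video.get("transcript", ""),
    }
```

Neither function knows about the other source, so a course and a related video meet only at query time, through search relevance rather than a maintained link.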

Embedding Model Choice

Investment content is dense and vocabulary-heavy. Queries like "EBITDA bridge", "terminal value growth rate", or "DCF assumptions" need financial context, not surface similarity.

We used OpenAI embeddings for the PDF pipeline to improve retrieval quality on finance-specific content.

Better embeddings meant better answers from day one.
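
To make "financial context, not surface similarity" concrete, the sketch below ranks chunks by cosine similarity between embedding vectors. The three-dimensional vectors in the usage note are toy values; in a real system the vectors would come from the embedding model and the ranking would be done by the vector database. `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]
```

For example, with toy vectors `[("terminal value", [0.9, 0.1, 0.0]), ("ebitda bridge", [0.1, 0.9, 0.0])]` and a query vector `[1.0, 0.2, 0.0]`, the "terminal value" chunk ranks first because its direction is closest to the query, regardless of shared keywords.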

Keeping the Agent Grounded

The agent had to teach financial concepts, not give investment advice.

We configured it with:

Grounding against The Investment Analyst's proprietary content

Guardrails against personalised financial advice

Clear fallback behavior when content is not found

References and links back to courses, videos, or book material
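
The behaviours above can be pictured as a small routing decision made before the agent answers. This is a hypothetical sketch, not the ElevenLabs configuration: the advice pattern, the `min_score` threshold, and the shape of the retrieval hits are assumptions for illustration.

```python
import re

# Hypothetical, deliberately narrow pattern for personalised-advice questions.
_ADVICE = re.compile(
    r"\bshould i (buy|sell|invest|hold)\b|\bmy (portfolio|pension|savings)\b",
    re.IGNORECASE,
)


def route(question: str, hits: list[dict], min_score: float = 0.75) -> dict:
    """Decide how to respond given a question and retrieval hits.

    Each hit is assumed to look like {"score": float, "url": str}.
    """
    if _ADVICE.search(question):
        # Teach the concept instead; never give a personal recommendation.
        return {"action": "decline_advice", "sources": []}
    grounded = [h for h in hits if h["score"] >= min_score]
    if not grounded:
        # Nothing relevant in the proprietary content: say so rather than guess.
        return {"action": "fallback_not_found", "sources": []}
    return {"action": "answer", "sources": [h["url"] for h in grounded]}
```

Returning the source links alongside the action keeps references attached to every grounded answer, and the explicit fallback branch prevents the agent from filling gaps with generic model knowledge.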

Platform Embedding Constraints

LearnWorlds limited how deeply the AI layer could be embedded.

We avoided unsupported platform customization and delivered the experience through widget and iframe embedding.

This kept the integration:

Stable across LearnWorlds updates

Compatible with member-only access

Easy to place across course pages

Independent from LearnWorlds core code

Real World Challenges

PDF noise would have polluted retrieval

Without filtering repeated headers, captions, blank-page noise, and TOC fragments, the agent would retrieve junk.

Reusability changed the pipeline design

The ingestion module was built configuration-first, so future books would not require a rewrite.
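
A configuration-first design could look like the sketch below, where everything book-specific lives in a config object and the pipeline code stays generic. The field names, defaults, and the `load_config` helper are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookConfig:
    """Per-book settings (all field names are illustrative assumptions)."""
    pdf_path: str
    collection: str                       # target vector collection name
    max_chunk_words: int = 200
    skip_pages: tuple[int, ...] = ()      # e.g. cover and index pages
    noise_patterns: tuple[str, ...] = (r"\.{3,}\s*\d+\s*$",)


def load_config(raw: dict) -> BookConfig:
    """Build a BookConfig from a plain dict, rejecting unknown keys."""
    unknown = set(raw) - set(BookConfig.__dataclass_fields__)
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    values = dict(raw)
    for key in ("skip_pages", "noise_patterns"):
        if key in values:
            values[key] = tuple(values[key])  # lists from JSON/YAML become tuples
    return BookConfig(**values)
```

Adding a second book then means writing a new config entry, for example `load_config({"pdf_path": "books/example.pdf", "collection": "example_book"})`, rather than editing pipeline code.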

No native LearnWorlds-Synthesia mapping

There was no reliable way to say "this course unit equals this Synthesia video." That forced the decoupled architecture decision early.

Paywall embedding had to be validated

The ElevenLabs widget had to work inside authenticated LearnWorlds pages without unsupported platform changes.

What changed for members and platform operations

Area | Before | After
Content search | Manual page browsing, no semantic understanding | Hybrid keyword and semantic search via Algolia NeuralSearch
Cross-source discovery | No way to search LearnWorlds and Synthesia together | Unified index across courses, transcripts, and book content
Book knowledge | 500-page PDF unused by any system | Fully embedded in Qdrant, queryable by the AI agent
Member Q&A | No AI assistant on the platform | ElevenLabs voice agent grounded in proprietary content
Video transcripts | Locked on Synthesia, not connected to search | Auto-synced and indexed into Algolia via pipeline
Content freshness | Manual updates required for every content change | Automated pipelines handle sync, updates, and deletions
Course recommendations | Static navigation; no intelligent recommendation | Algolia Recommendation API surfaces courses contextually
Financial advice risk | No AI on platform | Explicit guardrails in place; agent teaches, not advises

The Team and Timeline

A single engineer delivered the core work over approximately six weeks, while the broader platform engagement ran in parallel.

1–2 weeks

Discovery Phase

•Architectural analysis

•Platform constraint mapping

•Approach evaluation

•Decision and sign-off

~6 weeks

PDF Pipeline

•PDF extraction and cleaning

•Semantic chunking

•OpenAI embedding

•Qdrant ingestion

•Test-query validation

•README

Parallel

AI Agent

•ElevenLabs v2.0 widget integration

•Voice clone configuration

•Prompt grounding

•Guardrails

•CTA placement

•Paywall embedding

•Algolia layer

•NeuralSearch configuration

•Index design

•Recommendation API

•LearnWorlds and Synthesia pipeline builds

•Automated sync

ONGOING

Go Live

•End-to-end validation

•Production deployment

•Monitoring

•AMC support

Tech Stack

AI & VOICE

•ElevenLabs Agents

•Professional voice clone

•Prompt grounding

•Response orchestration

VECTOR DATABASE

•Qdrant

•OpenAI

SEARCH & RETRIEVAL

•Algolia NeuralSearch

•Algolia Vector DB

•Algolia API

•Cloud Functions

•Cloud Run Jobs

CONTENT SOURCES

•LearnWorlds APIs

•Synthesia APIs

INGESTION PIPELINE

•Python

•Scheduled Workflows

•Event-Driven Indexing Workflows

BACKEND SERVICES

•Node.js

•TypeScript

•Secure API Integration

•Data Transformation Layers

FRONTEND & EMBEDDING

•Embedded Widget

•iFrame

•Paywall-Compatible CTA Placement

SECURITY

•Token-Based API

•Environment-Based Secrets Management
