Conversation Search Project

April 7, 2023 | Tech

This was a quick experiment during the peak of the ChatGPT hype. It's a tool that summarizes conversations from video and extracts topics. Ultimately, there are already tools out there that do this and I didn't see myself working on it longer term, but it was interesting!

High-level outline of the system services.

Basic Flow

The idea was a tool to record conversations and keep track of different "entities" (nouns) across those conversations. For example, at a company where the conversation around a certain topic changes over time, this tool would give you insight into that change.

I decided to use Terraform to deploy the cloud infrastructure, which was surprisingly easy with GPT generating and validating the code.

I've worked with DynamoDB in the past, but this time I decided to use it correctly by following a single-table design with an adjacency list.
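To make the adjacency list idea concrete, here is a minimal sketch in plain Python. It is not the project's actual schema: the `PK`/`SK` attribute names, the `CONV#`/`ENTITY#` key prefixes, and the helper functions are all assumptions for illustration, and the queries run against an in-memory list rather than DynamoDB. The point is that conversations and their entity mentions live in one table, and swapping the two keys (which an inverted index would do in DynamoDB) lets you walk the relationship in the other direction.

```python
# Hypothetical key layout for a single-table / adjacency-list design.
# Nodes and edges share one table; the sort key says what each row is.

def conversation_item(conversation_id, title):
    # Node row: PK and SK are the same for the conversation itself.
    key = f"CONV#{conversation_id}"
    return {"PK": key, "SK": key, "type": "conversation", "title": title}

def mention_item(conversation_id, entity_id, description):
    # Edge row: conversation -> entity.
    return {
        "PK": f"CONV#{conversation_id}",
        "SK": f"ENTITY#{entity_id}",
        "type": "mention",
        "description": description,
    }

def query_partition(items, pk, sk_prefix=""):
    # In-memory stand-in for a query on the partition key with a
    # "begins with" condition on the sort key.
    rows = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(rows, key=lambda i: i["SK"])

def query_inverted(items, sk, pk_prefix=""):
    # Same query with the keys swapped, as an inverted index would allow.
    rows = [i for i in items if i["SK"] == sk and i["PK"].startswith(pk_prefix)]
    return sorted(rows, key=lambda i: i["PK"])

table = [
    conversation_item("c1", "Interview"),
    mention_item("c1", "e1", "mentioned early on"),
    mention_item("c2", "e1", "mentioned again later"),
]

# All entities mentioned in conversation c1.
entities_in_c1 = query_partition(table, "CONV#c1", "ENTITY#")
# All conversations that mention entity e1.
conversations_for_e1 = query_inverted(table, "ENTITY#e1", "CONV#")
```

With this layout, "which conversations talk about this entity?" is a single query on the inverted keys rather than a scan.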

I then deployed a small PostgreSQL RDS instance to act as an entity name lookup table. The user would query it to "fuzzy" match an entity name and find all the source conversations that entity was referenced in.

Used the Levenshtein distance to match keywords.
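For reference, here is a small self-contained sketch of Levenshtein-based fuzzy matching in Python. It only illustrates the technique; the function names and the `max_distance` threshold are made up for this example, and the project's lookup ran against PostgreSQL rather than in application code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or
    substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]

def fuzzy_match(query, names, max_distance=2):
    # Case-insensitive match, closest names first.
    q = query.lower()
    scored = sorted((levenshtein(q, n.lower()), n) for n in names)
    return [n for distance, n in scored if distance <= max_distance]

matches = fuzzy_match("Rory Sutherlnd", ["Rory Sutherland", "Diary of a CEO"])
```

A small threshold tolerates typos like a dropped letter while still rejecting unrelated names.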

The meat of the system was an ingestion script that would extract the audio, transcribe with OpenAI's Whisper, summarize/extract entities, and insert into Pinecone (vector database).
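The shape of that pipeline can be sketched as a chain of steps. This is only an outline under assumptions: the `ingest` function, the `IngestionResult` type, and the four step callables are hypothetical names, and the steps are passed in as plain functions so the sketch makes no claims about the real Whisper or Pinecone client calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class IngestionResult:
    transcript: str
    summary: str
    entities: List[str]

def ingest(
    video_path: str,
    extract_audio: Callable[[str], str],                   # video path -> audio path
    transcribe: Callable[[str], str],                      # audio path -> transcript
    summarize: Callable[[str], Tuple[str, List[str]]],     # transcript -> (summary, entities)
    store: Callable[[str, str, List[str]], None],          # persist the results
) -> IngestionResult:
    # Each step feeds the next, in the order described above.
    audio_path = extract_audio(video_path)
    transcript = transcribe(audio_path)
    summary, entities = summarize(transcript)
    store(video_path, summary, entities)
    return IngestionResult(transcript, summary, entities)

# Stub steps standing in for the real services.
stored = []
result = ingest(
    "interview.mp4",
    extract_audio=lambda path: path.replace(".mp4", ".wav"),
    transcribe=lambda audio: f"transcript of {audio}",
    summarize=lambda text: (text.upper(), ["example entity"]),
    store=lambda source, summary, entities: stored.append((source, summary, entities)),
)
```

Passing the steps in as arguments also makes the flow easy to test without calling any external service.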

Dockerfile for ingestion script.
Main SQS message handler

Example Test

Used this video from Diary of a CEO as a test.

Interview with Rory Sutherland
Screenshot of summary and entities + descriptions.
Here's a frontend dump of all the entities.

That's basically it. I decided I didn't want to spend an enormous amount of time working on this tool as there are already lots of great ones out there.

But this was a fun and informative introduction to building with LLMs.