Conversation Search Project
This was a quick experiment during the peak hype of ChatGPT. It's a tool that summarizes conversations from video and extracts topics. Ultimately, there are already tools like this out there and I didn’t see myself working on it longer term, but it was interesting!
Basic Flow
The idea was a tool to record conversations and keep track of different "entities" (nouns) across those conversations. For example, at a company where conversations on a certain topic change over time, this tool would give you insight into that.
I decided to use Terraform to deploy the cloud infrastructure, which was surprisingly easy with GPT generating and validating the code.
I've worked with DynamoDB in the past, but this time decided to use it correctly by following a single-table design with an adjacency list.
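To make the adjacency list idea concrete, here is a minimal sketch of how conversations and entities could share one table. The key names (`PK`, `SK`), the `CONV#` / `ENTITY#` prefixes, and the `InMemoryTable` class are all hypothetical stand-ins for illustration, not the project's actual schema. The in-memory class only mimics a DynamoDB query on the partition key and on an inverted index (a GSI keyed on `SK`).

```python
from collections import defaultdict


def conversation_item(conv_id, title):
    # Node item: PK and SK are equal, so the conversation describes itself.
    return {"PK": f"CONV#{conv_id}", "SK": f"CONV#{conv_id}", "title": title}


def mention_item(conv_id, entity):
    # Edge item: links a conversation (PK) to an entity it mentions (SK).
    return {"PK": f"CONV#{conv_id}", "SK": f"ENTITY#{entity}"}


class InMemoryTable:
    """Hypothetical stand-in for one table plus an inverted index on SK."""

    def __init__(self):
        self.by_pk = defaultdict(list)
        self.by_sk = defaultdict(list)

    def put(self, item):
        self.by_pk[item["PK"]].append(item)
        self.by_sk[item["SK"]].append(item)

    def query(self, pk, sk_prefix=""):
        # Like a Query on PK with a begins_with condition on SK.
        return [i for i in self.by_pk.get(pk, []) if i["SK"].startswith(sk_prefix)]

    def query_inverted(self, sk):
        # Like a Query on an index that uses SK as its partition key.
        return list(self.by_sk.get(sk, []))
```

With this layout, `query("CONV#1", "ENTITY#")` lists the entities in one conversation, and `query_inverted("ENTITY#pricing")` lists every conversation that mentions that entity, all from a single table.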
I then deployed a small PostgreSQL RDS instance which would act as an entity name lookup table. This would be queried by the user to "fuzzy" match an entity name and find all the source conversations that referenced it.
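The lookup itself lived in PostgreSQL, but the idea can be sketched in plain Python. The example below swaps in the standard library's `difflib` for whatever matching the database did, and the `entity_index` mapping and `fuzzy_lookup` function are hypothetical names used only for illustration.

```python
import difflib


def fuzzy_lookup(query, entity_index, limit=3, cutoff=0.6):
    """Return (entity name, conversation ids) pairs whose name is close to query.

    entity_index is a hypothetical dict of entity name -> list of conversation ids.
    """
    # Compare in lowercase so "openai" can still find "OpenAI".
    by_lower = {name.lower(): name for name in entity_index}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [(by_lower[m], entity_index[by_lower[m]]) for m in matches]
```

A misspelled query such as `fuzzy_lookup("openia", {"OpenAI": ["conv-1"]})` would still resolve to the `OpenAI` entry, which is the behaviour the lookup table was meant to provide.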
The meat of the system was an ingestion script that would extract the audio from a video, transcribe it with OpenAI's Whisper, summarize it and extract entities, and insert the results into Pinecone (a vector database).
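The shape of that script can be sketched as a small pipeline. This is only an outline under assumptions: the real calls to Whisper, the LLM, and Pinecone are left out, and each step is passed in as a plain function. The `ingest` function, the `IngestResult` type, and the parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class IngestResult:
    transcript: str
    summary: str
    entities: List[str]


def ingest(
    video_path: str,
    extract_audio: Callable[[str], str],
    transcribe: Callable[[str], str],
    summarize: Callable[[str], Tuple[str, List[str]]],
    upsert: Callable[[str, str, List[str]], None],
) -> IngestResult:
    # 1. Pull the audio track out of the video (for example with ffmpeg).
    audio_path = extract_audio(video_path)
    # 2. Turn the audio into text (Whisper in the original project).
    transcript = transcribe(audio_path)
    # 3. Ask an LLM for a summary and a list of entities.
    summary, entities = summarize(transcript)
    # 4. Store the results (Pinecone in the original project).
    upsert(video_path, summary, entities)
    return IngestResult(transcript, summary, entities)
```

Passing the steps in as functions keeps the sketch runnable without any API keys, and makes it easy to swap one service for another.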
Example Test
I used this video from Diary of a CEO as a test.
That's basically it. I decided I didn't want to spend an enormous amount of time working on this tool, as there are already lots of great ones out there.
But this was a fun and informative introduction to building with LLMs.