PodFetch - A Podcast Digestion Pipeline! (WIP)

I always wanted to be able to search podcast transcripts and find the episodes that are relevant to my interests. It’s crazy that in 2024 Spotify still can’t give you “more like this” episode recommendations or a “search in episode content” feature. So I used Whisper to transcribe episodes and started my own search engine for podcasts. It’s still in the early stages, and so far I’ve only rewritten it once! The first version used Solr for full-text search. Now that LLMs are a thing, I would also like to add an episode summarization feature. At the moment I’m adding speaker diarization to the pipeline, and the beauty of it is that the system learns to recognize more and more speakers over time, which reduces the need for manual labeling.
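To make the "learns to recognize more speakers" idea concrete, here is a minimal sketch of one common way to do it, and not necessarily how PodFetch does it: keep an averaged voice embedding per labeled speaker and match new segments by cosine similarity. The `SpeakerRegistry` class, its method names, the 0.8 threshold, and the tiny 3-dimensional vectors are all hypothetical; real embeddings would come from a speaker-embedding model and have hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerRegistry:
    """Hypothetical store of known speakers (an assumption, not PodFetch's code)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cut-off; would need tuning on real data
        self.centroids = {}  # name -> (mean embedding, number of samples)

    def identify(self, embedding):
        """Return the best-matching known speaker, or None if nobody is close enough."""
        best_name, best_score = None, self.threshold
        for name, (centroid, _) in self.centroids.items():
            score = cosine(embedding, centroid)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name

    def enroll(self, name, embedding):
        """Add a labeled sample, updating the speaker's running mean embedding."""
        if name not in self.centroids:
            self.centroids[name] = (list(embedding), 1)
            return
        centroid, n = self.centroids[name]
        updated = [(c * n + e) / (n + 1) for c, e in zip(centroid, embedding)]
        self.centroids[name] = (updated, n + 1)


registry = SpeakerRegistry()
registry.enroll("host", [1.0, 0.0, 0.0])       # one manual label
guess = registry.identify([0.9, 0.1, 0.0])     # a later segment by the same voice
unknown = registry.identify([0.0, 1.0, 0.0])   # a new voice: still needs a label
```

Only segments where `identify` returns `None` would need a human label, so the manual work shrinks as more speakers are enrolled.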

AI Researcher | CTO

Working on multimodal LLMs and on-device AI. Available for hire.