The Jupyter notebooks in this repository accompany the blog post "Topic extraction with Neo4j Graph Data Science for better semantic search."
To follow along with the blog content, work through the notebooks in this order:
- Download_TMDB_movies.ipynb
- Extract themes.ipynb
- Clean up themes and get embeddings.ipynb
- Cluster themes.ipynb
- Summarize theme groups.ipynb
- Compare retrievers.ipynb
You will need a Neo4j environment with the Graph Data Science (GDS) library installed. You can create an AuraDS instance or download Neo4j Desktop.
You will also need an API key for the Large Language Model of your choice. The notebooks use Anthropic and OpenAI, but you can adapt the code to use others.
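
Before opening the first notebook, it can help to confirm that your credentials are in place and that GDS is reachable. The following is a minimal sketch, not code from the notebooks. It assumes credentials are stored in environment variables named `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` (hypothetical names; the notebooks may load them differently) and that the `neo4j` Python driver (version 5.8 or later) is installed. `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are the variables the official Anthropic and OpenAI SDKs read by default. The helper names `missing_settings` and `gds_version` are illustrative.

```python
import os

# Assumed variable names for the Neo4j connection; adjust to match your setup.
REQUIRED_VARS = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]
# Default variable names read by the official Anthropic and OpenAI SDKs.
LLM_KEY_VARS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]


def missing_settings(env=None):
    """Return the names of required settings that are absent or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # This sketch only requires one LLM key; set both if you run
    # the notebooks unmodified with both providers.
    if not any(env.get(name) for name in LLM_KEY_VARS):
        missing.append(" or ".join(LLM_KEY_VARS))
    return missing


def gds_version(env=None):
    """Connect to Neo4j and return the installed GDS version string.

    Raises an error if the database is unreachable or GDS is not installed.
    """
    from neo4j import GraphDatabase  # pip install neo4j

    env = os.environ if env is None else env
    auth = (env["NEO4J_USERNAME"], env["NEO4J_PASSWORD"])
    with GraphDatabase.driver(env["NEO4J_URI"], auth=auth) as driver:
        # gds.version() is a function provided by the GDS library.
        records, _, _ = driver.execute_query("RETURN gds.version() AS version")
        return records[0]["version"]
```

Call `missing_settings()` first; if it returns an empty list, `gds_version()` should return the version of the GDS library installed on your instance.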