This is a course from Graphlet AI on full-stack graph machine learning taught by Russell Jurney.
This class uses a Docker image rjurney/graphml-class. To bring it up as the jupyter service along with neo4j, run:
# Pull the Docker images BEFORE class starts, or it can take a while on a shared connection
docker compose pull
# Run a Jupyter Notebook container in the background with all requirements.txt installed
docker compose up -d
# Tail the Jupyter logs to see the JupyterLab url to connect in your browser
docker logs jupyter -f --tail 100
To shut down the containers, change to this folder and run:
docker compose down
You say potato, I say patato... the docker compose command changed in recent versions :)
NOTE: older versions of docker may use the command docker-compose rather than the two word command docker compose.
To edit code in VSCode you may want a local Anaconda Python environment with the class's PyPI libraries installed. This enables VSCode to parse the code, understand APIs and highlight errors.
Note: if you do not use Anaconda, consider using it :) You can use a Python 3 venv in the same way as conda.
Create a new Anaconda environment:
conda create -n graphml python=3.10.11 -y
Activate the environment:
conda activate graphml
Install the project's libraries:
poetry install
You can select a Python environment in VSCode by typing SHIFT-CMD-P (CTRL-SHIFT-P on Windows and Linux) to bring up a command search window. Now type Python or Interpreter or, if you see it, select Python: Select Interpreter. Now choose the path to your conda environment. It will include the name of the environment, such as:
Python 3.10.11 ('graphml') /opt/anaconda3/envs/graphml/bin/python
Note: the Python version is set to 3.10.11 because Jupyter Stacks have not been updated more recently.
We build a knowledge graph from the Stack Exchange Archive for the network motif section of the course.
To run a bash shell in the Jupyter container, type:
docker exec -it jupyter bash
Once you're there, you can run the following commands to download and prepare the data for the course.
First, download the data:
graphml_class/stats/download.py stats.meta
Then you will need to convert the data from XML to Parquet:
spark-submit --packages "com.databricks:spark-xml_2.12:0.18.0" graphml_class/stats/xml_to_parquet.py
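To get a feel for the input of this conversion step, here is a minimal standard-library sketch. It is not the course's xml_to_parquet.py. It assumes the Stack Exchange dump layout, where each record is a row element whose fields are XML attributes. The sample document and the parse_rows helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row sample in the style of a Stack Exchange Posts.xml file
SAMPLE_XML = """<posts>
  <row Id="1" PostTypeId="1" Score="5" Title="What is a motif?" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" />
</posts>"""


def parse_rows(xml_text: str) -> list[dict[str, str]]:
    """Turn each <row> element into a dict of its attributes."""
    root = ET.fromstring(xml_text)
    return [dict(row.attrib) for row in root.iter("row")]


rows = parse_rows(SAMPLE_XML)
for row in rows:
    print(row)
```

Every value comes back as a string and optional fields such as ParentId are simply absent, which is why a real conversion applies a schema before writing columnar files. The spark-xml package used in the command above reads such attributes with a `_` prefix by default.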
The course covers knowledge graph construction in PySpark in graphml_class/stats/graph.py. To run it:
spark-submit graphml_class/stats/graph.py
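As a rough illustration of what building a property graph from this data involves, the plain-Python sketch below turns a few post records into typed nodes and directed edges. It is not the logic in graph.py: the records, the build_graph helper, and the node and edge type names are hypothetical. It assumes only the Stack Exchange convention that an answer points to its question through ParentId.

```python
# Hypothetical post records, shaped like parsed Stack Exchange rows
posts = [
    {"Id": "1", "PostTypeId": "1", "OwnerUserId": "10"},                   # question
    {"Id": "2", "PostTypeId": "2", "ParentId": "1", "OwnerUserId": "11"},  # answer
    {"Id": "3", "PostTypeId": "2", "ParentId": "1", "OwnerUserId": "10"},  # answer
]

# Assumed mapping of Stack Exchange post type codes to node types
POST_TYPES = {"1": "Question", "2": "Answer"}


def build_graph(posts):
    """Return (nodes, edges) for a small property graph of users and posts."""
    nodes = {}  # node id -> properties
    edges = []  # dicts with src, dst and relationship
    for post in posts:
        post_id = f"post-{post['Id']}"
        user_id = f"user-{post['OwnerUserId']}"
        nodes[post_id] = {"type": POST_TYPES.get(post["PostTypeId"], "Post")}
        nodes[user_id] = {"type": "User"}
        # Directed edge: the user authored the post
        edges.append({"src": user_id, "dst": post_id, "relationship": "Posted"})
        if "ParentId" in post:
            # Directed edge: the answer points at its question
            edges.append(
                {"src": post_id, "dst": f"post-{post['ParentId']}", "relationship": "Answers"}
            )
    return nodes, edges


nodes, edges = build_graph(posts)
print(len(nodes), "nodes,", len(edges), "edges")
```

The src and dst keys follow the column names GraphFrames expects on an edge DataFrame, so the same shape carries over when the records live in Spark rather than in Python lists.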
This course now covers network motifs in property graphs (frequent patterns of structure) using pyspark / GraphFrames (see motif.py, no notebook yet).
It supports directed motifs, not undirected. All the 4-node motifs are outlined below. Note that GraphFrames can also filter the paths returned by its f.find() method using any Spark DataFrame filter, enabling temporal and complex property graph motifs.
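To make the idea of a filtered directed motif concrete, here is a small plain-Python sketch rather than GraphFrames code. It enumerates directed two-edge paths (a)->(b)->(c), then keeps only those whose edges occur in time order, which mirrors finding a motif and then filtering on edge properties. The edge list, the timestamp property, and both helper functions are hypothetical.

```python
from itertools import product

# Hypothetical directed edges with a timestamp property
edges = [
    {"src": "a", "dst": "b", "timestamp": 1},
    {"src": "b", "dst": "c", "timestamp": 2},
    {"src": "c", "dst": "a", "timestamp": 0},
    {"src": "b", "dst": "d", "timestamp": 0},
]


def find_paths(edges):
    """All directed two-edge paths (a)-[e1]->(b)-[e2]->(c), respecting edge direction."""
    return [
        (e1, e2)
        for e1, e2 in product(edges, repeat=2)
        if e1["dst"] == e2["src"]
    ]


def in_time_order(path):
    """Property filter: the second edge must come after the first."""
    e1, e2 = path
    return e1["timestamp"] < e2["timestamp"]


paths = find_paths(edges)
temporal_paths = [p for p in paths if in_time_order(p)]

for e1, e2 in temporal_paths:
    print(f"{e1['src']} -> {e1['dst']} -> {e2['dst']}")
```

In GraphFrames the corresponding pattern would be a motif string such as "(a)-[e1]->(b); (b)-[e2]->(c)" passed to find(), followed by a DataFrame filter like e1.timestamp < e2.timestamp, assuming the edges carry a timestamp column.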