Full Stack Graph Machine Learning: Theory, Practice, Tools and Techniques

Graphlet-AI/graphml-class

Full Stack Graph Machine Learning: Theory, Practice, Tools and Techniques v1.1.1

This is a course from Graphlet AI on full-stack graph machine learning taught by Russell Jurney.

Environment Setup

This class uses the Docker image rjurney/graphml-class. To bring it up as the jupyter service alongside neo4j, run:

# Pull the Docker images BEFORE class starts, or it can take a while on a shared connection
docker compose pull

# Run a Jupyter Notebook container in the background with everything in requirements.txt installed
docker compose up -d

# Tail the Jupyter logs to find the JupyterLab URL to open in your browser
docker logs jupyter -f --tail 100

To shut down the containers, run the following from this folder:

docker compose down

docker compose vs docker-compose

You say potato, I say potahto... the docker compose command changed in recent versions :)

NOTE: older versions of Docker may use the hyphenated command docker-compose rather than the two-word command docker compose.
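
If you script the setup, you can detect which form is available instead of hard-coding it. The following is a minimal sketch, not part of the class repository: the function name compose_command and its two injectable parameters are hypothetical and exist only to keep the logic easy to test. It assumes that `docker compose version` exits with status 0 when the two-word form is available.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def _plugin_available() -> bool:
    # Assumption: `docker compose version` exits with 0 when the
    # two-word form of the command is installed.
    result = subprocess.run(
        ["docker", "compose", "version"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def compose_command(
    which: Callable[[str], Optional[str]] = shutil.which,
    has_plugin: Callable[[], bool] = _plugin_available,
) -> List[str]:
    """Return the argv prefix for Docker Compose on this machine.

    Hypothetical helper: prefers the two-word `docker compose` form and
    falls back to the older hyphenated `docker-compose` binary.
    """
    if which("docker") and has_plugin():
        return ["docker", "compose"]
    if which("docker-compose"):
        return ["docker-compose"]
    raise RuntimeError("Neither `docker compose` nor `docker-compose` was found")
```

A caller could then run `subprocess.run(compose_command() + ["up", "-d"], check=True)` to start the services with whichever form is installed.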

VSCode Setup

To edit code in VSCode you may want a local Anaconda Python environment with the class's PyPI libraries installed. This enables VSCode to parse the code, understand APIs and highlight errors.

Note: if you do not use Anaconda, consider using it :) You can use a Python 3 venv in the same way as conda.

Class Anaconda Environment

Create a new Anaconda environment:

conda create -n graphml python=3.10.11 -y

Activate the environment:

conda activate graphml

Install the project's libraries:

poetry install

VSCode Interpreter

You can select a Python environment in VSCode by pressing:

SHIFT-CMD-P

to bring up the command search window (CTRL-SHIFT-P on Windows and Linux). Type Python or Interpreter, then select Python: Select Interpreter when it appears. Now choose the path to your conda environment. It will include the name of the environment, such as:

Python 3.10.11 ('graphml') /opt/anaconda3/envs/graphml/bin/python

Note: the Python version is set to 3.10.11 because the Jupyter Docker Stacks images have not been updated to a newer version.

Knowledge Graph Construction in PySpark

We build a knowledge graph from the Stack Exchange Archive for the network motif section of the course.

Docker Exec Commands

To run a bash shell in the Jupyter container, type:

docker exec -it jupyter bash

Once you're there, you can run the following commands to download and prepare the data for the course.

First, download the data:

graphml_class/stats/download.py stats.meta

Then convert the data from XML to Parquet:

spark-submit --packages "com.databricks:spark-xml_2.12:0.18.0" graphml_class/stats/xml_to_parquet.py
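
To see what this conversion works with, it helps to know that Stack Exchange dump files store each record as a `<row>` element whose fields are XML attributes. The sketch below is not the class's xml_to_parquet.py, which uses the spark-xml package shown in the command above. It is a plain-Python illustration using only the standard library, and the sample rows are made up for the example.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterator

# Made-up sample in the layout of a Stack Exchange Posts.xml file.
SAMPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="5" Title="What is a p-value?" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" />
</posts>
"""


def iter_rows(source: IO[bytes]) -> Iterator[Dict[str, str]]:
    """Yield each <row> element's attributes as a dict, streaming the file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            # Copy the attributes before clearing the element.
            yield dict(elem.attrib)
            # Release the parsed element so large dumps do not fill memory.
            elem.clear()


rows = list(iter_rows(io.BytesIO(SAMPLE_XML.encode("utf-8"))))
for row in rows:
    print(row)
```

Every value arrives as a string, and optional attributes such as ParentId or Title are simply absent on rows that lack them, so a columnar format like Parquet needs a schema that casts types and tolerates nulls.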

The course covers knowledge graph construction in PySpark in graphml_class/stats/graph.py. To run it:

spark-submit graphml_class/stats/graph.py
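
As a rough picture of what knowledge graph construction from posts can look like, here is a plain-Python sketch. It does not reproduce graph.py: the node types, the relationship names "Posted" and "Answers", and the sample rows are all assumptions chosen for illustration. The id, src and dst field names follow the column names GraphFrames expects for vertices and edges.

```python
from typing import Dict, List, Tuple

# Made-up post rows using Stack Exchange attribute names.
posts = [
    {"Id": "1", "PostTypeId": "1", "OwnerUserId": "10"},
    {"Id": "2", "PostTypeId": "2", "ParentId": "1", "OwnerUserId": "11"},
    {"Id": "3", "PostTypeId": "2", "ParentId": "1", "OwnerUserId": "10"},
]

# In Stack Exchange dumps, PostTypeId 1 is a question and 2 is an answer.
POST_TYPES = {"1": "Question", "2": "Answer"}


def build_graph(
    rows: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Turn post rows into node and edge records (hypothetical schema)."""
    nodes: Dict[str, Dict[str, str]] = {}
    edges: List[Dict[str, str]] = []
    for post in rows:
        post_id = f"post:{post['Id']}"
        post_type = POST_TYPES.get(post["PostTypeId"], "Other")
        nodes[post_id] = {"id": post_id, "type": post_type}

        owner = post.get("OwnerUserId")
        if owner is not None:
            user_id = f"user:{owner}"
            # A user who wrote several posts still gets a single node.
            nodes.setdefault(user_id, {"id": user_id, "type": "User"})
            edges.append({"src": user_id, "dst": post_id, "relationship": "Posted"})

        parent = post.get("ParentId")
        if parent is not None:
            # Directed edge from the answer to the question it answers.
            edges.append(
                {"src": post_id, "dst": f"post:{parent}", "relationship": "Answers"}
            )
    return list(nodes.values()), edges


nodes, edges = build_graph(posts)
print(len(nodes), "nodes,", len(edges), "edges")
```

Prefixing ids with the entity kind keeps post 10 and user 10 from colliding once both live in one node table.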

Network Motifs with GraphFrames

This course now covers network motifs (frequently occurring structural patterns) in property graphs using PySpark and GraphFrames (see motif.py; there is no notebook yet). GraphFrames motif finding supports directed motifs, not undirected ones. All the 4-node motifs are outlined below. Note that the paths returned by the GraphFrames find() method can be filtered with any Spark DataFrame filter, which enables temporal and complex property graph motifs.

All 4-node directed network motifs
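
To make the idea of a directed motif plus a property filter concrete, here is a plain-Python sketch that runs without Spark. It is not taken from motif.py. The edge list and the ts timestamp field are made up, and the matcher covers only one small pattern, a two-edge directed path.

```python
from typing import Dict, List, Tuple

Edge = Dict[str, object]

# Made-up directed edges with a hypothetical timestamp property `ts`.
edges: List[Edge] = [
    {"src": "a", "dst": "b", "ts": 1},
    {"src": "b", "dst": "c", "ts": 2},
    {"src": "b", "dst": "d", "ts": 0},
    {"src": "c", "dst": "a", "ts": 3},
]


def find_paths(edge_list: List[Edge]) -> List[Tuple[Edge, Edge]]:
    """Match the directed pattern (a)-[e1]->(b); (b)-[e2]->(c).

    Direction matters: e2 must start at the vertex where e1 ends.
    """
    by_src: Dict[object, List[Edge]] = {}
    for edge in edge_list:
        by_src.setdefault(edge["src"], []).append(edge)
    return [(e1, e2) for e1 in edge_list for e2 in by_src.get(e1["dst"], [])]


paths = find_paths(edges)

# Temporal motif: keep only paths where the first edge precedes the second.
# With GraphFrames, a comparable query on a GraphFrame `g` would be roughly:
#   g.find("(a)-[e1]->(b); (b)-[e2]->(c)").filter("e1.ts < e2.ts")
temporal = [(e1, e2) for e1, e2 in paths if e1["ts"] < e2["ts"]]

print(len(paths), "structural matches,", len(temporal), "temporal matches")
```

The structural match and the property filter are separate steps, which is why any DataFrame filter can be layered on top of a motif query.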