Describe the solution you'd like
With the eventual inclusion of Jupyter Notebook examples (see #436, #478), we should provide a Jupyter Notebook container as part of our dev environment in order to run the examples.
Additional context
The difficulty will be configuring Jupyter to use our Spark container as the master instead of a local master, and ensuring all of the containers can communicate.
We may want to add the Jupyter container under a Docker Compose profile instead of always deploying it (since it is a large image).
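A minimal sketch of gating the service behind a Compose profile (the service name, image, and port are assumptions for illustration):

```yaml
services:
  jupyter:
    image: jupyter/pyspark-notebook:latest
    # Only started when the profile is requested:
    #   docker compose --profile jupyter up
    profiles: ["jupyter"]
    ports:
      - "8888:8888"
```

With `profiles` set, a plain `docker compose up` skips the large Jupyter image entirely.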
I got a hardcoded PySpark Jupyter container working with our cluster by adding a jupyter service to our docker-compose.yml.
Additionally, we need to sync the Spark and Python versions between our cluster and the Jupyter environment. At the time of writing, the Spark versions were already in sync, since both containers use the latest Spark. For Python, I had to sync the cluster's Python version by adding the following to the client Dockerfile:
```dockerfile
RUN . /opt/bitnami/scripts/libcomponent.sh && component_unpack "python" "3.10.5-156" --checksum 0756ba4f37dc82759e718c524c543e444224b367a84da33e975553e72b64b143
```
which is then used as the build for our Spark cluster.
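In context, the client Dockerfile sketch might look like the following (the base image tag and user switch are assumptions; only the `component_unpack` line comes from the actual change):

```dockerfile
# Assumed base image; pin a versioned tag in practice
FROM bitnami/spark:latest

# Bitnami images run as a non-root user, so switch to root to repack Python
USER root

# Replace the bundled Python with the version the Jupyter image ships (3.10.5)
RUN . /opt/bitnami/scripts/libcomponent.sh && component_unpack "python" "3.10.5-156" --checksum 0756ba4f37dc82759e718c524c543e444224b367a84da33e975553e72b64b143

# Drop back to the default Bitnami non-root user
USER 1001
```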
From here, I was able to connect a PySpark notebook to our cluster with:
```python
SparkSession.builder.master("spark://spark:7077")
```
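As a fuller notebook cell, this could be sketched as below. The helper and the app name are illustrative; the master URL's host is the Compose service name (`spark`), which is how the containers find each other on the Compose network:

```python
def spark_master_url(host: str = "spark", port: int = 7077) -> str:
    """Build the Spark master URL from the Compose service name and port.

    Defaults match the hardcoded setup described above.
    """
    return f"spark://{host}:{port}"


# In the notebook (requires pyspark installed in the Jupyter image):
#
# from pyspark.sql import SparkSession
# spark = (
#     SparkSession.builder
#     .master(spark_master_url())
#     .appName("notebook-example")   # assumed name, any string works
#     .getOrCreate()
# )
```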
You can verify that a connection was made by checking the cluster's web UI.
A future challenge will be making this less hardcoded, in particular the Spark and Python versions.
Syncing Spark: this is probably easy, since both the Bitnami and Jupyter images provide tags by Spark version. However, the Jupyter images only go down to Spark 3.1.1.
Syncing Python: this is harder, since Bitnami does not tag by Python version. We may have to override both the Bitnami and Jupyter images' Python installs to ensure they are synced to our specified version.
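Since PySpark errors out at runtime when the driver and worker Python versions differ, a small sanity check in the notebook can fail fast with a clearer message. A minimal sketch (the function name is an assumption; Spark only requires the major.minor versions to match):

```python
import platform


def check_python_sync(cluster_version: str) -> bool:
    """Return True if this driver's Python matches the cluster's Python.

    Spark requires the driver and workers to agree on major.minor
    (e.g. 3.10); the patch version may differ.
    """
    driver_version = platform.python_version()  # e.g. "3.10.5"
    return driver_version.split(".")[:2] == cluster_version.split(".")[:2]
```

Calling this at the top of a notebook against the cluster's known Python version (e.g. `check_python_sync("3.10.5")`) surfaces a mismatch before a job fails mid-run.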