
[FEATURE] Add Jupyter Notebook container #488

Closed
jeremyprime opened this issue Aug 24, 2022 · 2 comments · Fixed by #508
Labels: docker (Management of the developer environment), enhancement (New feature or request), High Priority
jeremyprime (Collaborator)

Describe the solution you'd like

With the eventual inclusion of Jupyter Notebook examples (see #436, #478), we should provide a Jupyter Notebook container as part of our dev environment in order to run the examples.

Additional context

The difficulty will be configuring Jupyter to use our Spark container as the master instead of a local master, and ensuring all of the containers can communicate.

We may want to add the Jupyter container under a Docker Compose profile instead of always deploying it (since it is a large image).
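A minimal sketch of what such a profile-gated service could look like in docker-compose.yml (the image, service, and profile names here are illustrative assumptions, not our actual configuration):

services:
  jupyter:
    image: jupyter/pyspark-notebook   # assumed image; it is large, hence the profile
    profiles: ["jupyter"]             # started only with: docker compose --profile jupyter up
    ports:
      - "8888:8888"                   # Jupyter web UI
    depends_on:
      - spark                         # assumes the Spark master service is named "spark"

With a profile in place, the Jupyter container is skipped by a plain docker compose up and only started when the profile is explicitly requested.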

jeremyprime added the enhancement (New feature or request) and docker (Management of the developer environment) labels on Aug 24, 2022
Aryex (Collaborator) commented Aug 26, 2022

Got a hardcoded PySpark Jupyter container working with our cluster.
This was done by adding a jupyter container to our docker-compose.yml.
Additionally, we will need to sync the Spark and Python versions between our cluster and the Jupyter environment. At the time of writing, the Spark versions were already in sync, since we use the latest Spark for both containers. For Python, I had to sync the cluster's Python version by adding the following to the client Dockerfile:

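# Unpack Bitnami's packaged Python 3.10.5 so the cluster's Python matches the Jupyter environment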
RUN . /opt/bitnami/scripts/libcomponent.sh && component_unpack "python" "3.10.5-156" --checksum 0756ba4f37dc82759e718c524c543e444224b367a84da33e975553e72b64b143

This Dockerfile is then used to build the image for our Spark cluster.

From here, I was able to connect a PySpark notebook to our cluster with:

SparkSession.builder.master("spark://spark:7077")

You can verify that the connection was made by looking at the cluster's web UI.
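A slightly fuller notebook-side sketch of that connection (the app name and the throwaway job are illustrative; it assumes the compose service running the Spark master is named spark and listens on the default port 7077):

from pyspark.sql import SparkSession

# Point the notebook's driver at the standalone cluster instead of a local master.
spark = (
    SparkSession.builder
    .master("spark://spark:7077")
    .appName("jupyter-smoke-test")  # illustrative application name
    .getOrCreate()
)

# A trivial job to confirm the workers accept tasks; the application should
# also show up under "Running Applications" in the master's web UI.
print(spark.range(1000).count())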

Aryex (Collaborator) commented Aug 26, 2022

A future challenge will be to make this setup less hardcoded, in particular the Spark and Python versions.

  • Syncing Spark: this is probably easy, since both the bitnami and jupyter images provide tags by Spark version (see the sketch after this list). However, the jupyter tags only go back to Spark 3.1.1.
  • Syncing Python: this is harder, since bitnami does not tag by Python version. We may have to override both bitnami's and jupyter's Python installs to ensure they are pinned to our specified version.
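For the Spark side, a sketch of pinning both images to one Spark version via tags (the exact tag names should be checked on Docker Hub; 3.3.0 is only an example version):

services:
  spark:
    image: bitnami/spark:3.3.0                   # Bitnami tags by Spark version
  jupyter:
    image: jupyter/pyspark-notebook:spark-3.3.0  # Jupyter stacks tag by Spark version (3.1.1 and up)

Python would still have to be pinned separately, e.g. with the component_unpack approach above.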
