[IA-4534] Leaner terra base docker image #463

Closed
wants to merge 8 commits into from
3 changes: 2 additions & 1 deletion .gitignore
@@ -17,4 +17,5 @@ package-lock.json
.python_history
.keras/
.ammonite/
.metals/
.metals/
.venv/
1 change: 1 addition & 0 deletions terra-base/.python-version
@@ -0,0 +1 @@
3.10
178 changes: 178 additions & 0 deletions terra-base/Dockerfile
@@ -0,0 +1,178 @@
# Smallest image with ubuntu jammy, CUDA and NVIDIA drivers installed - 80 MB
FROM --platform=linux/amd64 nvidia/cuda:12.2.0-base-ubuntu22.04

#######################
# Environment Variables
#######################
ENV DEBIAN_FRONTEND noninteractive
ENV LC_ALL en_US.UTF-8
# Ensure this matches c.NotebookApp.port in jupyter_notebook_config.py
ENV JUPYTER_PORT 8000
ENV JUPYTER_HOME /etc/jupyter
ENV JUPYTER_KERNELSPEC_DIR /usr/local/share/jupyter
# We need node >18 for jupyter to work
ENV NODE_MAJOR 20

#######################
# Users Setup
#######################
# Create the jupyter user, add it to the `users` group and specify the home directory path
ENV USER jupyter
ENV USER_HOME /home/$USER
RUN useradd -m -s /bin/bash -d $USER_HOME -N -g users $USER
# Create the welder user
# The welder uid is consistent with the Welder docker definition here:
# https://github.com/DataBiosphere/welder/blob/master/project/Settings.scala
# Adding welder-user to the Jupyter container isn't strictly required, but it makes welder-added
# files display nicer when viewed in a terminal.
ENV WELDER_USER welder-user
ENV WELDER_UID 1001
RUN useradd -m -s /bin/bash -N -u $WELDER_UID $WELDER_USER

# We want to grant the jupyter user limited sudo permissions
# without password so they can install the necessary packages that they
# want to use on the docker container
RUN mkdir -p /etc/sudoers.d \
&& echo "$USER ALL=(ALL) NOPASSWD: /usr/bin/apt-get install *, /opt/conda/bin/conda install *, /opt/poetry/bin/poetry install" > /etc/sudoers.d/$USER \
&& chmod 0440 /etc/sudoers.d/$USER

#######################
# Prerequisites
#######################
RUN apt-get update && apt-get install -yq --no-install-recommends \
sudo \
ca-certificates \
curl \
jq \
# gnupg requirement
gnupg \
dirmngr \
# useful utilities for debugging within the docker
nano \
procps \
lsb-release \
# python requirements
checkinstall \
build-essential \
zlib1g-dev \
# pip requirements
libssl-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
libexempi8 \
libnode-dev \
llvm \
libncurses5-dev \
libncursesw5-dev \
tk-dev \
libffi-dev \
liblzma-dev \
python3-openssl \
# install script requirements
locales \
# for ssh-agent and ssh-add
keychain \
# extras
wget \
bzip2 \
# git
git \
# Uncomment en_US.UTF-8 for inclusion in generation
&& sed -i 's/^# *\(en_US.UTF-8\)/\1/' /etc/locale.gen \
# Generate locale
&& locale-gen \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

# Install Node >18 (needed for jupyterlab)
RUN apt-get update && apt-get install -yq --no-install-recommends
RUN mkdir -p /etc/apt/keyrings
RUN curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg

RUN echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list
RUN dpkg --remove --force-remove-reinstreq libnode-dev
RUN apt-get update && apt-get install -f -yq nodejs

############
# Install R
############
RUN apt-get update && apt-get install -y r-base

################
# Install Python
################
# Install Python 3.10 and add to system python path
# Note that miniconda does come with its own installation of python,
# but it is cleaner to do a proper system installation here and set the
# path properly
RUN apt-get update && apt-get install -y python3.10 python3-pip
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1 \
&& update-alternatives --set python /usr/bin/python3.10

############################
# Manage python dependencies
############################
## POETRY is the preferred tool to create virtual environments and
## manage dependencies - Should be used by Terra devs
# Install Poetry
ENV POETRY_HOME /opt/poetry
RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=$POETRY_HOME python3 -
# Append the Poetry bin directory ($POETRY_HOME/bin) to PATH
# poetry docs: https://python-poetry.org/docs/
ENV PATH "${PATH}:${POETRY_HOME}/bin"

# Prevent poetry from creating a virtual environment (we want to install on the docker system python)
RUN poetry config virtualenvs.create false

# Install python dependencies with poetry
COPY poetry.lock .
COPY pyproject.toml .
RUN poetry install --no-cache --no-interaction

###################
# Install Miniconda
###################
## CONDA should not be used by Terra devs, but it is a widely used tool
## for managing python environments in a runtime, so we should provide it to users
ENV CONDA_HOME /opt/conda
RUN curl -so $HOME/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py310_23.5.1-0-Linux-x86_64.sh \
&& chmod +x $HOME/miniconda.sh \
&& $HOME/miniconda.sh -b -p $CONDA_HOME \
&& rm $HOME/miniconda.sh
ENV PATH "${PATH}:${CONDA_HOME}/bin"

#######################
# Utilities
#######################
COPY scripts $JUPYTER_HOME/scripts
COPY custom $JUPYTER_HOME/custom
COPY jupyter_notebook_config.py $JUPYTER_HOME
RUN chown -R $USER:users $JUPYTER_HOME

# copy workspace_cromwell.py script and make it runnable by all users
RUN curl -o /usr/local/bin/workspace_cromwell.py https://raw.githubusercontent.com/broadinstitute/cromwhelm/1ceedf89587cffd355f37401b179001f029f77ed/scripts/workspace_cromwell.py \
&& chmod +x /usr/local/bin/workspace_cromwell.py

RUN chown -R $USER:users $JUPYTER_KERNELSPEC_DIR \
&& find $JUPYTER_HOME/scripts -name '*.sh' -type f | xargs chmod +x \
# You can get kernel directory by running `jupyter kernelspec list`
&& $JUPYTER_HOME/scripts/kernel/kernelspec.sh $JUPYTER_HOME/scripts/kernel $JUPYTER_KERNELSPEC_DIR/kernels

# Expose the Jupyter port and set the working directory to the jupyter user's home directory
EXPOSE $JUPYTER_PORT
WORKDIR $USER_HOME

# make pip install to a user directory, instead of a system directory which requires root.
# this is useful so `pip install` commands can be run in the context of a notebook.
ENV PIP_USER true
USER $USER

# Note: this entrypoint is provided for running Jupyter independently of Leonardo.
# When Leonardo deploys this image onto a cluster, the entrypoint is overwritten to enable
# additional setup inside the container before execution. Jupyter execution occurs when the
# init-actions.sh script uses 'docker exec' to call run-jupyter.sh.
ENTRYPOINT ["/usr/local/bin/jupyter-nbclassic"]
92 changes: 92 additions & 0 deletions terra-base/README.md
@@ -0,0 +1,92 @@
# terra-base image

This repo contains the terra-base image, which is compatible with
Leonardo, the notebook service in [Terra](https://app.terra.bio/).
The goal is to provide a lean, GPU-enabled base image that can be used for both Terra-supported and custom image builds.
[See design doc](https://broadworkbench.atlassian.net/wiki/spaces/IA/pages/2842460165/2023-08-28+Terra+Docker+Refactoring+POC#Provide-a-leaner-set-of-terra-docker-images)

## Image contents

`terra-base` extends a CUDA 12.2.0 and Ubuntu 22.04 base image with the minimum
requirements necessary to set up Jupyter and provide compatibility with Leonardo and GPU machines.

- OS prerequisites
- Python 3.10
- R 4.1.2
- Miniconda
- Jupyter
- Leonardo customizations/extensions
- Terra python client library ([FISS](https://github.com/broadinstitute/fiss))
- Terra notebook utils
- Full list of python packages is available [here](pyproject.toml)

To see the complete contents of this image please see the [Dockerfile](./Dockerfile).

## Contributing

### Installing python 3.10

All terra-docker images use Python 3.10, so make sure it is installed on your
local machine. For Mac users, installing it via `pyenv` is recommended:

```bash
$> brew install pyenv
$> pyenv install 3.10.0
```
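
To confirm that the interpreter on your `PATH` is a 3.10 release after installing, a small check along these lines may help. This is a minimal sketch: `version_matches` is a hypothetical helper that is not part of this repo, and the `3.10` requirement is taken from the `.python-version` file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): succeed when a Python version
# string ($1) satisfies a major.minor requirement ($2).
version_matches() {
  # Accept an exact match ("3.10") or any patch release of it ("3.10.x").
  case "$1" in
    "$2"|"$2".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: compare the interpreter on PATH with the version the images use.
required="3.10"
if command -v python3 >/dev/null 2>&1; then
  actual="$(python3 -c 'import platform; print(platform.python_version())')"
  if version_matches "$actual" "$required"; then
    echo "python3 $actual matches $required"
  else
    echo "python3 $actual does not match $required" >&2
  fi
fi
```

The comparison is done on the string prefix rather than numerically, so `3.100.1` is correctly rejected while any `3.10.x` patch release is accepted.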

### Installing poetry

All Python package dependencies are managed by [poetry](https://python-poetry.org/docs/),
which allows for cross-platform, deterministic Python environment builds.

You can install poetry via brew or curl commands (NEVER USE PIP!):

```bash
# Via brew
$> brew install poetry
# Via curl
$> curl -sSL https://install.python-poetry.org | python3 -
```

### Managing python packages

**Attention Mac M1 users!**

Make sure to do the following before installing any python packages:

```bash
$> brew install libomp
$> brew install llvm
```

For compilers to find libomp and llvm, you may need to set the following, either directly in the terminal
or in your bash/zsh profile file:

```bash
$> export LDFLAGS="-L/opt/homebrew/opt/libomp/lib -L/opt/homebrew/opt/llvm/lib"
$> export CPPFLAGS="-I/opt/homebrew/opt/libomp/include -I/opt/homebrew/opt/llvm/include"
```

From inside the `terra-base` directory, run the following command to install
the python 3.10 environment with the relevant packages:

```bash
$> poetry install
```

If you need to update an existing Python dependency, modify the `pyproject.toml`
file (**it is strongly recommended to never pin an exact version, but floor/ceiling it instead**)
and then update the environment by running the following:

```bash
$> poetry update
```
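
The recommendation against exact pins can be checked mechanically before committing. The following is a minimal sketch, not part of this repo: `find_exact_pins` is a hypothetical helper that assumes dependencies are declared in the standard Poetry tables and written as simple `name = "constraint"` entries. It does not inspect inline tables such as `{ version = "..." }`. A floor/ceiling constraint like `">=2.28,<3"` passes, while `"2.31.0"` or `"==2.1.0"` is reported.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): print dependency entries in a
# pyproject.toml ($1) that are pinned to one exact version.
find_exact_pins() {
  awk '
    # Track whether the current TOML table is a Poetry dependency table.
    /^\[/ {
      in_deps = ($0 ~ /^\[tool\.poetry\.(dependencies|dev-dependencies|group\.[^.]+\.dependencies)\]/)
      next
    }
    # Flag entries such as  name = "1.2.3"  or  name = "==1.2.3".
    in_deps && /^[A-Za-z0-9_.-]+[ \t]*=[ \t]*"(==)?[0-9][0-9A-Za-z.]*"/ { print }
  ' "$1"
}

# Usage (run from inside the terra-base directory):
if [ -f pyproject.toml ]; then
  find_exact_pins pyproject.toml
fi
```

An empty result means no simple exact pins were found. Entries using caret (`^1.26`) or range constraints are left alone because they already allow a span of versions.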

If you need to add a new dependency, **do not modify the pyproject.toml directly** and do this instead:

```bash
$> poetry add <INSERT_PACKAGE_NAME>
$> poetry update
```

The `poetry update` step will generate a new `poetry.lock` file that will be used in the docker image build.
32 changes: 32 additions & 0 deletions terra-base/custom/.eslintrc.js
@@ -0,0 +1,32 @@
module.exports = {
  'env': {
    'browser': true,
    'es6': true,
  },
  'extends': [
    'google',
  ],
  'globals': {
    'Atomics': 'readonly',
    'SharedArrayBuffer': 'readonly',
  },
  'parserOptions': {
    'ecmaVersion': 2018,
    'sourceType': 'module',
  },
  'rules': {
    'comma-dangle': ['error', 'never'],
    // 80 is the standard, but this required the least refactoring.
    'max-len': ['error', { 'code': 150 }],
    // TODO: remove the following rules and fix the offenses
    'require-jsdoc': ['error', {
      'require': {
        'FunctionDeclaration': false,
        'MethodDefinition': false,
        'ClassDeclaration': false,
        'ArrowFunctionExpression': false,
        'FunctionExpression': false
      }
    }]
  },
};
6 changes: 6 additions & 0 deletions terra-base/custom/README.md
@@ -0,0 +1,6 @@
# Before you merge, lint the files with eslint.

- `brew install eslint`
- `brew install npm`
- `npm i`
- `eslint --fix *.js`