Docker
This page gives an overview of the container infrastructure provided by the Analytics Hub via Docker (a specific container technology).
Docker is a containerization technology that bundles an operating system and other software into a package that can run on a variety of machines. For example, you can bundle the Ubuntu operating system with Python, Jupyter, and other Python libraries, and run an identical environment on a local computer or in the cloud.
Docker images are templates that specify the instructions needed to build a container, which is the actual environment running on a computer. This is analogous to the distinction between a class definition and an object (an instance of the class) in object-oriented programming.
We use Docker to run the same computational environment in multiple places. This avoids the "it works on my computer" issues that arise when collaborating with others, and it is useful even when working alone: you can use the same set of packages locally or in the cloud if you need to scale up to larger compute infrastructure.
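As a concrete illustration of the image-as-template idea, here is a minimal Dockerfile sketch along the lines of the Ubuntu + Python + Jupyter example above (the base image tag and launch options are illustrative assumptions, not a specific Earth Lab image):

```dockerfile
# Start from a pinned Ubuntu base image
FROM ubuntu:22.04

# Install Python and pip from the Ubuntu package repositories
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Install Jupyter into the image
RUN pip3 install --no-cache-dir jupyter

# Start a notebook server when a container is launched from this image
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
```

Building the image (e.g., `docker build -t my-env .`) corresponds to defining the class; `docker run my-env` instantiates a container from it.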
For installation instructions see: https://docs.docker.com/install/
For a high level overview and "getting started" type of material see: https://docs.docker.com/get-started/
Earth Lab maintains RStudio Docker images based on the excellent Rocker project.
For example, to launch our r-spatial-aws image, which has RStudio, a bunch of spatial packages, and the AWS command line interface, follow the instructions on this page: https://hub.docker.com/r/earthlab/r-spatial-aws
For example, if you are connected to an EC2 instance via SSH, run the following command from your terminal:

```bash
docker run -e PASSWORD=<<insert your password here>> -d -p 8787:8787 earthlab/r-spatial-aws
```
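Here `-e PASSWORD=...` sets the environment variable that RStudio Server uses for login, `-d` runs the container in the background (detached), and `-p 8787:8787` maps port 8787 on the instance to port 8787 inside the container, where RStudio Server listens.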
Then copy your EC2 instance's public DNS address and append :8787 to the end in your web browser, e.g., navigate to ec2-34-217-71-152.us-west-2.compute.amazonaws.com:8787, replacing ec2-34-217-71-152.us-west-2.compute.amazonaws.com with your instance's address.
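If the page does not load, it can help to verify that the container is actually running on the instance (a generic check, not specific to this image):

```bash
# List running containers; the earthlab/r-spatial-aws image should appear
docker ps

# If it is not listed, inspect the logs of the most recently created container
docker logs $(docker ps -lq)
```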
To run a Jupyter notebook server with conda and a bunch of spatial packages, check out the earth-analytics-python-env Docker image: https://cloud.docker.com/u/earthlab/repository/docker/earthlab/earth-analytics-python-env
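A launch command for that image would follow the usual pattern for Jupyter containers, sketched here under the assumption that the image starts a notebook server on the default port 8888 (check the image's documentation for the exact invocation):

```bash
# Map the host's port 8888 to the container's notebook server port
docker run -it -p 8888:8888 earthlab/earth-analytics-python-env
```

Then open the URL (including the login token) that Jupyter prints to the terminal.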
You can find out how to set up secrets to authenticate with Docker Hub here: https://medium.com/platformer-blog/lets-publish-a-docker-image-to-docker-hub-using-a-github-action-f0b17e5cceb3
and here is an example of pushing an image to Docker Hub from the main branch: https://github.com/earthlab/r-python-eds-lessons-env/blob/main/.github/workflows/build-push-image.yml
The way it is set up:
- The secrets live in our repository rather than at the organization level. The secrets are the Docker Hub login credentials used to push images.
- Right now there are two actions. build-push-image.yml pushes to Docker Hub anytime someone commits to the main branch. build-image.yml just builds the image as a test, so if someone opens a pull request you can see whether the image will break.
- Best practice would be an action that pushes only when you create a tagged release, so a new image is published deliberately rather than on every merge; a sketch of such a workflow is below.
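Here is a minimal sketch of a tag-triggered workflow, assuming Docker Hub credentials are stored as repository secrets named DOCKERHUB_USERNAME and DOCKERHUB_TOKEN, and using earthlab/example-image as a placeholder image name:

```yaml
name: Push image on tagged release

on:
  push:
    tags:
      - "v*"  # run only when a version tag like v1.2.3 is pushed

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Log in to Docker Hub with repository secrets (assumed names)
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      # Build the image and push it, tagged with the release tag
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: earthlab/example-image:${{ github.ref_name }}
```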
Research Computing holds trainings on Docker and Singularity/Apptainer. Find them here.