
Lazy Launcher

RicercarG edited this page Sep 19, 2024 · 12 revisions

If you are not interested in how HPC operates and just want to set up a python environment to run your code, follow these steps to get started.

Interactive Sessions

An interactive session allows you to run your python code in the terminal just like you do on your local machine. This is ideal for testing and debugging your code. However, the session will be terminated if you close the terminal or vscode.

First Time Setup

Step1: Log in to greene in terminal/vscode

Remember to connect to NYU VPN/Wifi.

ssh <netid>@greene.hpc.nyu.edu

Replace <netid> with your own.

Step2: Change to your scratch directory

You should always save your data in your scratch directory.

cd /scratch/<netid>

Step3: Clone this cheatsheet repository and setup

git clone https://github.com/RicercarG/NYU-Greene-HPC-Cheatsheet.git

Change to the cheatsheet directory

cd NYU-Greene-HPC-Cheatsheet

Grant execute permission to the run_setup.sh script

chmod +rx run_setup.sh

Run the script to add some essential commands to your ~/.bashrc file, which will make your life easier.

./run_setup.sh
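If you want to confirm what the setup script changed, you can inspect your ~/.bashrc afterwards. This is just a sketch; the alias names cdv and clc are assumed from the steps below, and the script's actual contents may differ:

```shell
# Reload your shell configuration so the new commands take effect
source ~/.bashrc

# Look for the cheatsheet shortcuts in ~/.bashrc (names assumed
# from the steps below; adjust if the script uses different ones)
grep -n "cdv\|clc" ~/.bashrc
```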

Step4: Request an interactive CPU/GPU session

It's always good practice to request a CPU/GPU node before running any code; check Best Practice on Greene for more.
Download the shell script I wrote for requesting CPU/GPU nodes, then run it to request a node. (Fun fact: chs stands for cheatsheet; typing the full name is too tiring.)

chsdevice.sh

or using the alias shortcut

cdv

Here chsdevice.sh and the cdv alias call the same script. Note that you can run this script from any directory.

What runtime configuration shall I use?

  • CPU number: In most cases, 1 or 2 is sufficient.
  • GPU number: Depends on your project. If you are not using multi-GPU parallel computing, request 1, or 0 if you don't need a GPU.
  • GPU Type: A100 40GB is the fastest but you may wait a long time for allocation; V100 32GB is in the middle; RTX8000 48GB is the slowest but the easiest to get.
  • Memory (GB): This is CPU memory. 64 works in most cases.
  • Time (hours): The maximum time you can hold the CPU/GPU node. I recommend 4 or 6 hours.
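If you are curious what the script does with these options, a request like this can also be made directly with SLURM's srun. This is a sketch mirroring the recommendations above; the actual GRES names and extra flags used by chsdevice.sh may differ:

```shell
# Request an interactive session: 2 CPUs, 64 GB memory,
# 1 RTX8000 GPU, for 4 hours
srun --cpus-per-task=2 --mem=64GB --gres=gpu:rtx8000:1 \
     --time=4:00:00 --pty /bin/bash
```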

Step5: Setup the singularity environment with conda

You can think of singularity as a container that wraps up all the small files of your python libraries into one large file. This way, you won't be bothered by errors caused by exceeding the file-number quota.
The good news is that you no longer have to set up singularity with conda from scratch.
Download the shell script for setting up and launching singularity.

Run the script to set up singularity and conda.

chslauncher.sh

or using the alias shortcut

clc

Here chslauncher.sh and the clc alias call the same script. Note that you can run this script from any directory. For convenience, I will use clc in the following steps.

What do these prompted options mean during installation?

  • Name Your Singularity Folder: Since you can have multiple singularity environments, give each singularity folder a unique name. You will use this name to activate the environment. It's good practice to set up a new singularity environment for each project.
  • cuda version: Depends on your project. If not specified, cuda 11.8 works in most cases.
  • Size of overlay: This decides how large the overlay is and how many python libraries you can install. For LLM or Diffusers projects, I empirically recommend overlay-50G-10M.
  • Open on demand jupyter notebook?: Jupyter notebooks on HPC run through Open On Demand. A few extra operations are needed so that the notebook recognizes your conda environment. Type 'y' to let the script do the work for you.

Step6: Activate the singularity and conda environment

Run the same script again, and type in the singularity folder name that you created in the previous step.

clc

What's the difference between Read and Write mode?

  • Read and Write: You can add files to the singularity. This is useful when you are setting up your conda environment. However, one singularity overlay can only be written to by one process at a time.
  • Read only: You can only read the files in the singularity environment. This is useful when you want to use a pre-built singularity environment.
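Under the hood, the two modes correspond to mounting the overlay read-write or read-only. A sketch of the underlying commands, assuming an overlay file my_env.ext3 and image cuda11.8.sif (the launcher script's actual paths will differ):

```shell
# Read and Write: overlay mounted with :rw (only one writer at a time)
singularity exec --nv --overlay my_env.ext3:rw cuda11.8.sif /bin/bash

# Read only: overlay mounted with :ro (safe for multiple processes)
singularity exec --nv --overlay my_env.ext3:ro cuda11.8.sif /bin/bash
```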

If you see your terminal prompt change to singularity:~$, then you have successfully activated the singularity environment.

Step7: Double check the conda environment, and start coding

When this script finishes, you will see a new folder in your scratch directory, named after your singularity folder. That's where everything used for running python is stored.

Now you can activate conda by typing

source /ext3/env.sh

or using the alias shortcut

se

And then, check your conda path with

which conda

  • If you see /ext3/miniconda3/bin/conda, then you are good to go.

  • If you get a message like Illegal option -- Usage: /usr/bin/which [-a] args, don't panic: run unset -f which, then type which conda again.

  • If nothing is printed after typing which conda, check this part of troubleshooting for help.

You can also check python and pip using which python and which pip. Their path should be /ext3/miniconda3/bin/python and /ext3/miniconda3/bin/pip respectively.

Now you are all set. Install your python libraries, and run python using python file.py, just like you do in the terminal on your local machine.
Note that the vscode python debugger won't work on HPC, so you have to test your code in the vscode integrated terminal.

If you want to quit singularity, or run into any other problems, check the troubleshooting guide.

Afterwards

The next time you log in to HPC after setting up, all you need to do is:

[1] Change to your scratch directory:

cd /scratch/<netid>

[2] Request a CPU/GPU node:

cdv or chsdevice.sh

[3] Activate/Create the singularity environment:

clc or chslauncher.sh

[4] Activate conda inside singularity (and also activate/create your conda environment if necessary):

se or source /ext3/env.sh

Then you can start testing your python scripts. Just that easy.

Tips: If you are new to using conda on linux, you can google or prompt chatgpt with "How to create a conda environment on linux?" for help.
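For example, with conda activated inside singularity, creating and using a per-project environment looks like this. The environment name myproject and the python version are just placeholders:

```shell
# Create a new conda environment (name and python version are examples)
conda create -n myproject python=3.10 -y

# Activate it, then install your project's libraries into the overlay
conda activate myproject
pip install numpy
```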

Open On Demand Jupyter Notebook

Step0: Make your singularity environment compatible with jupyter notebook

You need to create a python environment that the notebook can recognize. Everything is the same as setting up/opening a singularity environment for interactive sessions.

The only thing to notice: type 'y' if prompted Do you want to use this python environment in open on demand jupyter notebook? when setting up a new singularity, or select setup this environment for jupyter notebook in OOD when launching an existing singularity environment.

This only needs to be done once for each singularity.

Step1: Install python packages you need for your notebook

With the singularity environment activated, activate conda using

se or source /ext3/env.sh

Then activate the conda base environment before installing packages:

conda activate base

You might find this redundant. However, empirically, if you don't, your packages will not be recognized in jupyter notebook. (I'm not sure why this happens.)

Also, never use !pip install inside a jupyter notebook, as it will install packages in your home directory rather than in singularity, which will exceed your quota.
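Instead, install packages from a terminal with the singularity environment active, as in Step1 (the package name is just an example):

```shell
# Inside the singularity environment
source /ext3/env.sh       # or the `se` shortcut
conda activate base
pip install matplotlib    # example package; installs into the overlay
```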

Step2: Go to Open On Demand in your browser

To run jupyter notebooks on HPC, you have to use open on demand and start an interactive jupyter notebook session.

The official ood guide has a nice illustration on how to use the gui.

You can skip the steps for setting up singularity and the conda environment in the official guide, since we've already done that in Step0.

SLURM Batch Jobs

SLURM is a job scheduler that lets you run your code in the background; the job will keep running even if you close the terminal or vscode. This is ideal for large-scale experiments that take a long time to finish. However, the job will be terminated if your code has bugs, so make sure to test your code in an interactive session first.

Step0: Have Your Singularity Environment Setup

Same as all above, you should have your singularity environment ready before submitting a batch job.

Step1: Download SLURM script template

I can't help you set up everything automatically, since this time you are actually writing an automation script yourself. But I did write a template for you to start with.

wget https://raw.githubusercontent.com/RicercarG/NYU-Greene-HPC-Cheatsheet/main/sbatch_template.slurm

Download this to whatever folder is convenient for you, and rename it as you wish.

Step2: Open the template and modify it

Open the template with your favorite text editor, and modify the following lines:

[1] Replace YourEXT3PATH.ext3 with the path to your singularity overlay.

[2] Replace YourSIFPATH.sif with the path to your singularity image.

These two paths are printed when you activate a singularity environment using this cheatsheet's launcher script: clc or chslauncher.sh

After entering your singularity folder name, you will see the paths to your overlay and singularity image printed in green. Paste them into the template accordingly.

[3] Replace REPLACE THIS WITH YOUR COMMANDS with your actual python commands. All commands should be the same as you type in terminal in an interactive session.

For example, if you want to run mycode.py, then replace it with python mycode.py.
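Putting the three edits together, a filled-in script might look like the sketch below. The #SBATCH values, overlay/image paths, and script name are all placeholders for illustration; keep whatever your actual template specifies:

```shell
#!/bin/bash
#SBATCH --job-name=myjob          # placeholder job name
#SBATCH --cpus-per-task=2
#SBATCH --mem=64GB
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00

# Overlay and image paths are printed in green by clc / chslauncher.sh
# (the paths below are placeholders)
singularity exec --nv \
    --overlay /scratch/<netid>/my_env/my_env.ext3:ro \
    /scratch/work/public/singularity/cuda11.8.sif \
    /bin/bash -c "source /ext3/env.sh; conda activate base; python mycode.py"
```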

Step3: Submit the job, and wait for it to finish

To submit the job, run the command like below:

sbatch sbatch_template.slurm

After you submit the job, you can check the status of your job by

squeue -u $USER

Anything your code prints will be written to slurm-******.out
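A few more SLURM commands that are handy once the job is running (the job id 12345 is a placeholder; use the id printed by sbatch):

```shell
# Follow the job's output as it is written
tail -f slurm-12345.out

# Cancel a job you no longer need
scancel 12345

# Show the details of a running or pending job
scontrol show job 12345
```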