Scripts to import datasets into hipscat format and perform cross matching using LSDB with Slurm in the LIneA environment.

linea-it/slurm_lsdb

lsdb on Slurm

This repository stores scripts for running LSDB and hipscat-import with Slurm in the LIneA environment.

Setup Environment

  1. Clone the repository and access the directory:
git clone https://github.com/linea-it/slurm_lsdb.git
cd slurm_lsdb
  2. Create environment (using Conda):
conda create -n hipscat python=3.10
conda activate hipscat
pip install --no-cache-dir hipscat-import
pip install --no-cache-dir ray
ulimit -s 50000
  3. Create configuration file (example used to import DP0 into the LIneA environment - dp0.yml):
cat dp0.yml
# Hipscat config for DP0
# This YAML file provides configuration parameters for hipscat-import.
# Each key must be a valid argument to the ImportArguments class.
# https://hipscat-import.readthedocs.io/en/latest/autoapi/hipscat_import/catalog/arguments/index.html#hipscat_import.catalog.arguments.ImportArguments

id_column: objectId
ra_column: ra
dec_column: dec
input_path: /lustre/t0/scratch/users/singulani/hipscat/inputs_dp0
input_format: parquet
output_catalog_name: DP0
output_path: /lustre/t0/scratch/users/singulani/hipscat_gen/cats
dask_tmp: /lustre/t0/scratch/users/singulani/hipscat_gen/tmp
dask_n_workers: 5
overwrite: true
resume: true
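
Because each key in the file must be a valid argument to ImportArguments, a misspelled key is worth catching before a job is submitted. The following is a minimal sketch of such a check, not part of this repository: ImportArgumentsStandIn is a hypothetical stand-in limited to the keys used in dp0.yml, its defaults are assumptions, and unknown_keys is an illustrative helper. With hipscat-import installed, the real ImportArguments class could presumably be passed in its place.

```python
import inspect
from dataclasses import dataclass


# Hypothetical stand-in for hipscat_import's ImportArguments, reduced to the
# keys used in dp0.yml. The defaults are assumptions made for illustration.
@dataclass
class ImportArgumentsStandIn:
    id_column: str = ""
    ra_column: str = "ra"
    dec_column: str = "dec"
    input_path: str = ""
    input_format: str = ""
    output_catalog_name: str = ""
    output_path: str = ""
    dask_tmp: str = ""
    dask_n_workers: int = 1
    overwrite: bool = False
    resume: bool = False


def unknown_keys(config: dict, arguments_class: type) -> list:
    """Return the config keys that are not constructor parameters of arguments_class."""
    accepted = set(inspect.signature(arguments_class).parameters)
    return sorted(key for key in config if key not in accepted)


# The values from dp0.yml, written as the dict a YAML loader would return.
config = {
    "id_column": "objectId",
    "ra_column": "ra",
    "dec_column": "dec",
    "input_path": "/lustre/t0/scratch/users/singulani/hipscat/inputs_dp0",
    "input_format": "parquet",
    "output_catalog_name": "DP0",
    "output_path": "/lustre/t0/scratch/users/singulani/hipscat_gen/cats",
    "dask_tmp": "/lustre/t0/scratch/users/singulani/hipscat_gen/tmp",
    "dask_n_workers": 5,
    "overwrite": True,
    "resume": True,
}

# An empty list means every key is accepted by the class.
print(unknown_keys(config, ImportArgumentsStandIn))

# A misspelled key ("ra_colum") is reported instead of failing inside the job.
print(unknown_keys({**config, "ra_colum": "ra"}, ImportArgumentsStandIn))
```

Comparing against the constructor signature avoids maintaining a separate list of allowed keys, so the check would follow whichever version of the arguments class is installed.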

Run using only one node

sbatch submit.sbatch dp0.yml

Run using Ray/Dask Cluster (with Slurm)

sbatch submit-ray-cluster.sbatch dp0.yml
