Skip to content

Commit

Permalink
Add script to run analysis per chromosome on HPC. (#6)
Browse files Browse the repository at this point in the history
* Add script to run analysis per chromosome on HPC.

* Describe HPC analysis in pipeline.R and refactor code.
  • Loading branch information
whimsial authored Mar 3, 2023
1 parent e95ea19 commit 8fd37aa
Show file tree
Hide file tree
Showing 3 changed files with 63 additions and 2 deletions.
56 changes: 56 additions & 0 deletions genoscores/analysis.hpc.cli.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
#!/usr/bin/env bash

#' This script is used to run GENOSCORES on a single chromosome in a HPC
#' environment with qsub
## ============================================================================

## Read the path for the directory where the results are to be stored. Typically
## it is on scratch space on HPC
outputdir=$1

## Read the base name (with full path) of the input file (excluding ID).
## Example: if files are called /home/user/cohort_chr<1-22>.bim|bed|fam,
## set basename="/home/user/cohort_chr"
basename=$2

## Read path to the GENOSCORES analysis R script. Here you can use
## example.analysis.R
## Example: gscript="/home/user/example.analysis.R"
gscript=$3

echo "GENOSCORES analysis script: $gscript will be executed."

## Read chromosome number from command line
chr=$4

## Function to process one chromosome at a time.
## If you target file has suffixes after chromosome ID, for example:
## 'cohort_chr21_filtered', they need to be manually added to inputfile:
## inputfile="${basename}/cohort_chr${chr}_filtered"
processchrom() {
inputfile="${basename}/${chr}"
chromdir="${outputdir}/${chr}"

if [[ ! -d "${chromdir}" ]]; then
mkdir -p "${chromdir}"
fi

logfile="${chromdir}"/log.Rout

echo "Processing chromosome: ${chr}"
echo "Input file: ${inputfile}"
echo "Saving results to: ${chromdir} ..."

## Run main GENOSCORES script with CLI options:
## chromdir: subdirectory in the output directory to store results from
## the analysis for the current chromosome.
## inputfile: PLINK file with the current chromosome.
## logfile: file to store the logs from GENOSCORES.
Rscript --no-save --no-restore --verbose "$(realpath "${gscript}")" \
"${chromdir}" "${inputfile}" 2> "${logfile}" > /dev/null
echo "Done."
}

processchrom

echo "All analyses have been executed."
3 changes: 2 additions & 1 deletion genoscores/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,8 @@ We compute the regional scores in the specified target cohort using weights from
- `gscript` -- path to the GENOSCORES script.
- `N` -- how many chromosomes do you want to run in parallel? This depends on the size of the target cohort and available computational resources.
- `inputfile` -- modify accordingly to the full name of your target cohort files. If the names have any suffixes after chromosome ID, these need to be added.

3. [analysis.hpc.cli.sh](analysis.hpc.cli.sh) is a BASH script which executes [example.analysis.R](example.analysis.R) for a single specified chromosome. This script takes one additional CLI argument `chr` which denotes the chromosome number for which analysis is to be run. This script is useful for high performance computing (HPC) type of analysis where each chromsome can be submitted to a queue within a job array created by the job script.

## General notes

1. If multiple analyses are to be run (eQTL, pQTL, etc), then it may be helpful to either:
Expand Down
6 changes: 5 additions & 1 deletion pipeline.R
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,11 @@ basename <- file.path(...)
gscript <- file.path("genoscores", "example.analysis.R")

cmd <- sprintf("bash genoscores/analysis.cli.sh %s %s %s",
output.dir, basename, gscript)
output.dir, basename, gscript)
## Alternatively, we could run analysis.hpc.cli.sh specifying chromosome number
## as additional parmeter. This is useful when submitting job arrays on HPC
## cluster, where each job ID corresponds to a single chromosome analysis.

system(cmd)

##------------------------------------------------------------------------------
Expand Down

0 comments on commit 8fd37aa

Please sign in to comment.