BUG: Input files perceived as NULL even though they exist (checked multiple times) #405

ulyssebaruchel opened this issue Oct 16, 2024 · 1 comment
Hi, I have been trying to run zUMIs on an HPC (both through an sbatch job and on an interactive node), but did not succeed. It seems that zUMIs does not see my input files, as Smartseq3.zUMIs_YAMLerror.log shows:



WARNING: ignoring environment value of R_HOME
$file1
NULL

$file2
NULL

$file3
NULL

$file4
NULL

[1] ""
[1] ""
[1] ""
[1] ""
[1] "" "" "" ""
[1] ""
[1] ""
[1] ""
[1] ""
[1] "" "" "" ""
[1] "NULL" "NULL" "NULL" "NULL"
$file1
NULL

$file2
NULL

$file3
NULL

$file4
NULL

$file1
NULL

$file2
NULL

$file3
NULL

$file4
NULL

[1] 0



This is my YAML file:



project: Smartseq3
sequence_files:
  file1:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1a_cutadapt/Undetermined_S0_L001_trim_R1.fastq.gz
    base_definition:
      - cDNA(24-75)
      - UMI(12-20)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1a_cutadapt/Undetermined_S0_L001_trim_R2.fastq.gz
    base_definition:
      - cDNA(1-75)
  file3:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1c_filter_index_reads/filtered_I1.fastq.gz
    base_definition:
      - BC(1-10)
  file4:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1c_filter_index_reads/filtered_I2.fastq.gz
    base_definition:
      - BC(1-10)
reference:
  STAR_index: /data/scratch/DBC/UBCN/CANCDYN/genomes/homo-sapiens/hg38-ercc/star
  GTF_file: /data/scratch/DBC/UBCN/CANCDYN/genomes/homo-sapiens/hg38-ercc/gtf/combined_hg38_ercc.gtf
out_dir: /home/ubaruchel/smart-seq3/data/240814/exp1/2b_zUMIs
num_threads: 24
mem_limit: 50
filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 20
  UMI_filter:
    num_bases: 2
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: /home/ubaruchel/smart-seq3/data/240814/exp1/0c_prep_well_barcodes/expected_well_barcodes.txt
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: no
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 0
  Ham_Dist: 1
  write_ham: no
  velocyto: no
  primaryHit: yes
  twoPass: no
make_stats: yes
which_Stage: Filtering
zUMIs_directory: /data/scratch/DBC/UBCN/CANCDYN/software/zUMIs

samtools_exec: samtools
pigz_exec: pigz
STAR_exec: STAR
Rscript_exec: Rscript
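As a separate sanity check, the config can be parsed outside of zUMIs to see whether the sequence_files entries come back as NULL. This is only a sketch: "Smartseq3.yaml" is a placeholder for the file above, and it assumes R plus the yaml package (both listed in the zUMIs dependencies) are on the PATH:

# Parse the YAML as an R script would and check that file1..file4 resolve to
# existing files; NULL entries here would point at a YAML parsing problem.
Rscript -e '
  inp <- yaml::read_yaml("Smartseq3.yaml")
  str(inp$sequence_files)
  print(sapply(inp$sequence_files, function(f) file.exists(f$name)))
'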



I ran this command through a .sh file that is called by an sbatch script (SLURM):



#!/bin/bash

# Always add these two commands to your scripts when using an environment
eval "$(conda shell.bash hook)"
source $CONDA_PREFIX/etc/profile.d/mamba.sh

# Source the parameters file
source ./params_bioinfo_experiments/0_params.sh

# Set variables
input_dir=$input_dir_2b
output_dir=$output_dir_2b
log_dir=$log_dir_2b

# Create the output and log directories if they don't exist
mkdir -p "$output_dir"
mkdir -p "$log_dir"

# Run zUMIs using its own miniconda environment (-c)
# and the prepared YAML file (input_dir)
$path_zUMIs/zUMIs.sh -c -y $input_dir
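For what it is worth, a hedged pre-flight check could be dropped into the script just before the zUMIs.sh call. It assumes (as in the zUMIs usage examples) that -y expects the path to the YAML file itself, and reuses the variables from the script above:

# Hypothetical sanity checks before calling zUMIs.sh.
echo "YAML passed to zUMIs: $input_dir"
[ -f "$input_dir" ] || echo "ERROR: $input_dir is not a regular file (is it a directory?)"

# Confirm every fastq named in the YAML is readable from the compute node.
grep -E '^\s*name:' "$input_dir" | awk '{print $2}' | while read -r fq; do
    [ -r "$fq" ] || echo "ERROR: cannot read $fq"
done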


I do not know what the problem is. My hypothesis is that the miniconda environment may prevent zUMIs from seeing the input files (which do exist and are not empty, as verified with du -sh). At the same time, however, it is able to detect a slight discrepancy between the STAR version used to build my index and the one used by zUMIs, which suggests it does not see the index as NULL.

Can you help me, please?

I have also tried to make my own mamba (conda) environment to run zUMIs, following the vignette https://github.com/sdparekh/zUMIs/wiki/Installation#dependencies, but I have not been able to complete the last part of the dependencies installation: devtools::install_github('VPetukhov/ggrastr') (some issues with Cairo)... And Docker is not accepted on HPCs (for security reasons)... Is there any way you could provide it as a Singularity file, please? That would make it much easier to deploy, in particular in pipelines (Nextflow / Snakemake)...
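For reference, this is roughly the workaround I had in mind; it is only a sketch: the environment name is whatever the zUMIs conda environment is called, and the Docker image name is a placeholder, not an official zUMIs image:

# ggrastr and its Cairo dependency exist as conda-forge builds, which avoids
# compiling Cairo through devtools:
mamba install -n zUMIs_env -c conda-forge r-ggrastr r-cairo

# Singularity/Apptainer can build an image directly from any published Docker
# image; "<user>/zumis:latest" is a placeholder:
singularity build zUMIs.sif docker://<user>/zumis:latest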

Thank you very much,

Best wishes,

Ulysse

@ulyssebaruchel (Author)

@sdparekh I have noticed that the YAMLerror.log shows this same error in reports going back a few years. Do you know how I might solve this issue, please? Thank you
