BUG: Input files perceived as NULL even though they exist (checked multiple times) #405

ulyssebaruchel opened this issue Oct 16, 2024 · 1 comment
Hi, I have been trying to run zUMIs on an HPC (both through an sbatch job and on an interactive node), but did not succeed. It seems that zUMIs does not see my input files, as Smartseq3.zUMIs_YAMLerror.log shows:



WARNING: ignoring environment value of R_HOME
$file1
NULL

$file2
NULL

$file3
NULL

$file4
NULL

[1] ""
[1] ""
[1] ""
[1] ""
[1] "" "" "" ""
[1] ""
[1] ""
[1] ""
[1] ""
[1] "" "" "" ""
[1] "NULL" "NULL" "NULL" "NULL"
$file1
NULL

$file2
NULL

$file3
NULL

$file4
NULL

$file1
NULL

$file2
NULL

$file3
NULL

$file4
NULL

[1] 0



This is my YAML file:



project: Smartseq3
sequence_files:
  file1:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1a_cutadapt/Undetermined_S0_L001_trim_R1.fastq.gz
    base_definition:
      - cDNA(24-75)
      - UMI(12-20)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1a_cutadapt/Undetermined_S0_L001_trim_R2.fastq.gz
    base_definition:
      - cDNA(1-75)
  file3:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1c_filter_index_reads/filtered_I1.fastq.gz
    base_definition:
      - BC(1-10)
  file4:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1c_filter_index_reads/filtered_I2.fastq.gz
    base_definition:
      - BC(1-10)
reference:
  STAR_index: /data/scratch/DBC/UBCN/CANCDYN/genomes/homo-sapiens/hg38-ercc/star
  GTF_file: /data/scratch/DBC/UBCN/CANCDYN/genomes/homo-sapiens/hg38-ercc/gtf/combined_hg38_ercc.gtf
out_dir: /home/ubaruchel/smart-seq3/data/240814/exp1/2b_zUMIs
num_threads: 24
mem_limit: 50
filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 20
  UMI_filter:
    num_bases: 2
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: /home/ubaruchel/smart-seq3/data/240814/exp1/0c_prep_well_barcodes/expected_well_barcodes.txt
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: no
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 0
  Ham_Dist: 1
  write_ham: no
  velocyto: no
  primaryHit: yes
  twoPass: no
make_stats: yes
which_Stage: Filtering
zUMIs_directory: /data/scratch/DBC/UBCN/CANCDYN/software/zUMIs

samtools_exec: samtools
pigz_exec: pigz
STAR_exec: STAR
Rscript_exec: Rscript
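As a separate sanity check, the config can be parsed outside of zUMIs to see whether the sequence_files entries come back as NULL. This is only a sketch: "Smartseq3.yaml" is a placeholder for the file above, and it assumes R plus the yaml package (both listed in the zUMIs dependencies) are on the PATH:

# Parse the YAML as an R script would and check that file1..file4 resolve to
# existing files; NULL entries here would point at a YAML parsing problem.
Rscript -e '
  inp <- yaml::read_yaml("Smartseq3.yaml")
  str(inp$sequence_files)
  print(sapply(inp$sequence_files, function(f) file.exists(f$name)))
'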



I ran this command through a .sh file that is called by an sbatch script (SLURM):



#!/bin/bash

# Always add these two commands to your scripts when using an environment
eval "$(conda shell.bash hook)"
source $CONDA_PREFIX/etc/profile.d/mamba.sh

# Source the parameters file
source ./params_bioinfo_experiments/0_params.sh

# Set variables
input_dir=$input_dir_2b
output_dir=$output_dir_2b
log_dir=$log_dir_2b

# Create the output and log directories if they don't exist
mkdir -p "$output_dir"
mkdir -p "$log_dir"

# Run zUMIs using its own miniconda environment (-c)
# and the prepared YAML file (input_dir)
$path_zUMIs/zUMIs.sh -c -y $input_dir
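For what it is worth, a hedged pre-flight check could be dropped into the script just before the zUMIs.sh call. It assumes (as in the zUMIs usage examples) that -y expects the path to the YAML file itself, and reuses the variables from the script above:

# Hypothetical sanity checks before calling zUMIs.sh.
echo "YAML passed to zUMIs: $input_dir"
[ -f "$input_dir" ] || echo "ERROR: $input_dir is not a regular file (is it a directory?)"

# Confirm every fastq named in the YAML is readable from the compute node.
grep -E '^\s*name:' "$input_dir" | awk '{print $2}' | while read -r fq; do
    [ -r "$fq" ] || echo "ERROR: cannot read $fq"
done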


I do not know what the problem is. My hypothesis is that the miniconda environment may prevent zUMIs from seeing the input files (which do exist and are not empty, as verified with du -sh). At the same time, however, it is able to detect a slight discrepancy between the STAR version used to build my index and the one used by zUMIs, which suggests it does not see the index as NULL.

Can you help me, please?

I have also tried to make my own mamba (conda) environment to run zUMIs, following the vignette https://github.com/sdparekh/zUMIs/wiki/Installation#dependencies, but I have not been able to complete the last part of the dependencies installation: devtools::install_github('VPetukhov/ggrastr') (some issues with Cairo)... And Docker is not accepted on HPCs (for security reasons)... Is there any way you could provide it as a Singularity file, please? That would make it much easier to deploy, in particular in pipelines (Nextflow / Snakemake)...
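For reference, this is roughly the workaround I had in mind; it is only a sketch: the environment name is whatever the zUMIs conda environment is called, and the Docker image name is a placeholder, not an official zUMIs image:

# ggrastr and its Cairo dependency exist as conda-forge builds, which avoids
# compiling Cairo through devtools:
mamba install -n zUMIs_env -c conda-forge r-ggrastr r-cairo

# Singularity/Apptainer can build an image directly from any published Docker
# image; "<user>/zumis:latest" is a placeholder:
singularity build zUMIs.sif docker://<user>/zumis:latest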

Thank you very much,

Best wishes,

Ulysse

@ulyssebaruchel (Author)

@sdparekh I have noticed that the YAMLerror.log shows this same error in reports going back a few years. Do you know how I might solve this issue, please? Thank you
