
Some Samples Encounter ForkProcess Empty Issues with No Output Using run_pangenome_aware_deepvariant #926

EEEdyeah opened this issue Jan 19, 2025 · 4 comments


@EEEdyeah

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.8/docs/FAQ.md:
yes
Describe the issue:
I was running run_pangenome_aware_deepvariant on vg Giraffe-mapped BAM files. However, some of the samples ran into a Process ForkProcess issue: the run didn't throw an error, didn't terminate properly, and produced no output files.
Setup

  • Operating system: slurm

  • DeepVariant version: 1.8.0

  • Installation method (Docker, built from source, etc.): singularity pull

  • Type of data: (sequencing instrument, reference genome, anything special that is unlike the case studies?)
    Illumina human 30x WGS, vg Giraffe-mapped HPRC

Steps to reproduce:

  • Command:
    singularity exec -B /path/:/path/ /path/deepvariant_pangenome_aware_deepvariant-1.8.0.sif /opt/deepvariant/bin/run_pangenome_aware_deepvariant \
    --model_type=WGS \
    --ref=/path/HPRC.GRCh38.reordered.fa \
    --reads=/path/$sample_name.surject.GRCh38.sorted.dedup.lefted.realigned.bam \
    --num_shards=4 \
    --sample_name_reads=$sample_name \
    --output_vcf /path/$sample_name.deepvariant.vcf.gz \
    --output_gvcf /path/$sample_name.deepvariant.gvcf.gz \
    --pangenome /path/HPRC_graph.gbz \
    --sample_name_pangenome HPRC \
    --regions chr6:28000000-35000000 \
    --disable_small_model \
    --intermediate_results_dir /path/dpvariant

  • Error trace: (if applicable)
    The logs indicate the program was running normally until encountering the following issues:
    2025-01-18 22:43:10.537301: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1';
    2025-01-18 22:43:10.537341: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
    I0118 22:43:10.537735 47448200671232 call_variants.py:918] call_variants: env = {'BASH_FUNC_module()': '() { eval /usr/bin/modulecmd bash $*\n}', 'SH
    I0118 22:43:10.659484 47448200671232 call_variants.py:785] Total 1 writing processes started.
    I0118 22:43:10.661774 47448200671232 call_variants.py:796] Use saved model: True
    I0118 22:43:10.665955 47448200671232 dv_utils.py:325] From /path/dpvariant/make_examples_pangenome_aware_dv.t
    I0118 22:43:21.476414 47448200671232 dv_utils.py:325] From /opt/models/pangenome_aware_deepvariant/wgs/example_info.json: Shape of input examples: [200,
    I0118 22:43:21.476675 47448200671232 call_variants.py:814] example_shape: [200, 221, 7]
    Process ForkProcess-1:
    Traceback (most recent call last):
    File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
    File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    File "/tmp/Bazel.runfiles_yqt9b630/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 551, in post_processing
    item = output_queue.get(timeout=180)
    File "/usr/lib/python3.10/multiprocessing/queues.py", line 114, in get
    raise Empty
    _queue.Empty
    I0118 22:46:46.215257 47448200671232 call_variants.py:891] Predicted 1024 examples in 1 batches [19.962 sec per 100].
    I0118 23:42:47.613373 47448200671232 call_variants.py:967] Complete: call_variants.

Does the quick start test work on your system?
Yes, the quick start test works, and most of the samples finish normally.

Any additional context:
Initially, I thought the issue was caused by the small model, so I added the --disable_small_model parameter. This allowed some samples to run successfully, but the same issue persisted for others.

@kishwarshafin
Collaborator

Hi @EEEdyeah, can you please run it on the entire chr6 to see if the issue persists?

@EEEdyeah
Author

@kishwarshafin Hi, I am trying that now and it is still running. In the meantime, I found that when I reran the same command (chr6:28000000-35000000), some of the previously failed samples ran successfully. So the same command can produce different results, which makes me question the stability of the runs that did succeed.

@kishwarshafin
Collaborator

@EEEdyeah are you running on a system that pauses processes? It looks like in your run, call_variants was paused and the queue did not receive anything for 180 seconds, which is why it got killed. Can you try setting num cpus to 0 from the command line and see if it still gets killed?
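
For context, one way to check whether the scheduler suspended or starved the job is to look at its accounting record. A minimal sketch, assuming a SLURM cluster; the job ID is a placeholder and the field selection is just illustrative:

    # Placeholder job ID; replace with the ID of the run that produced no output.
    JOBID=1234567

    # State history for the job and its steps: SUSPENDED, REQUEUED or PREEMPTED
    # would indicate the scheduler paused or restarted the process.
    sacct -j "$JOBID" --format=JobID,State,Elapsed,MaxRSS,AllocCPUS,NodeList

    # For a job that is still running, show its current state and allocated node.
    scontrol show job "$JOBID"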

@EEEdyeah
Author

@kishwarshafin Sorry for the late reply. I’m not entirely sure what caused the issue, but I think I’ve found a solution. Running each job on a separate node seems to prevent the error from occurring.
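
For anyone hitting the same thing, here is a minimal sketch of that workaround on SLURM; the script name, resource limits and paths are placeholders, not taken from the original report. Requesting the node exclusively keeps other jobs from competing for CPU on it, which matches the "one job per node" setup described above:

    #!/bin/bash
    #SBATCH --job-name=dv_pangenome
    #SBATCH --nodes=1
    #SBATCH --exclusive          # reserve the whole node for this one sample
    #SBATCH --cpus-per-task=4    # matches --num_shards=4 in the command above
    #SBATCH --time=24:00:00

    sample_name=$1

    singularity exec -B /path/:/path/ /path/deepvariant_pangenome_aware_deepvariant-1.8.0.sif \
      /opt/deepvariant/bin/run_pangenome_aware_deepvariant \
      --model_type=WGS \
      --ref=/path/HPRC.GRCh38.reordered.fa \
      --reads=/path/$sample_name.surject.GRCh38.sorted.dedup.lefted.realigned.bam \
      --num_shards=4 \
      --sample_name_reads=$sample_name \
      --output_vcf /path/$sample_name.deepvariant.vcf.gz \
      --output_gvcf /path/$sample_name.deepvariant.gvcf.gz \
      --pangenome /path/HPRC_graph.gbz \
      --sample_name_pangenome HPRC \
      --regions chr6:28000000-35000000 \
      --disable_small_model \
      --intermediate_results_dir /path/dpvariant

Submitting one such script per sample (e.g. sbatch run_dv.sbatch SAMPLE01, with run_dv.sbatch as a hypothetical script name) gives each run its own node.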
