gpu_type in WDLs not appropriate #414
Comments
Hi @Neu970,
Thanks for reaching out, and thank you for reporting this issue!
Your GPU quota limit issue with the cellbender workflow seems related to the low availability of GPUs in those regions, but I'll test on my side whether it is caused by anything on our end.
For the STARsolo workflow, could you please give an example of the issue? I'm asking because the workflow runs well in my analysis tasks with the default 32 vCPUs and 120GB memory. If your data require more computing resources, please note that you can increase them by setting your own values for the num_cpu and memory inputs. (Please look for these inputs in https://cumulus.readthedocs.io/en/stable/starsolo.html#workflow-inputs)
Hope it helps!
Sincerely,
Yiming
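For readers who want to raise the STARsolo resources, here is a minimal sketch of building an inputs JSON that overrides the num_cpu and memory inputs named in the workflow documentation. The `starsolo_workflow.` prefix is taken from the workflow name that appears in the Cromwell paths in this thread, and the "240G"-style memory string is an assumption about the expected format; please check the workflow inputs page before using it. The helper name and the example values (64 vCPUs, 240 GB) are hypothetical.

```python
import json

# Defaults mentioned in this thread: 32 vCPUs and 120GB memory.
DEFAULT_NUM_CPU = 32
DEFAULT_MEMORY_GB = 120


def starsolo_resource_inputs(num_cpu: int, memory_gb: int,
                             workflow: str = "starsolo_workflow") -> dict:
    """Build resource overrides for a workflow inputs JSON.

    Assumptions (not confirmed by the thread): inputs are keyed as
    "<workflow>.<input>" and memory is given as a string such as "240G".
    """
    if num_cpu <= 0 or memory_gb <= 0:
        raise ValueError("num_cpu and memory_gb must be positive")
    return {
        f"{workflow}.num_cpu": num_cpu,
        f"{workflow}.memory": f"{memory_gb}G",
    }


# Hypothetical example: double the defaults for a larger sample.
inputs = starsolo_resource_inputs(num_cpu=2 * DEFAULT_NUM_CPU,
                                  memory_gb=2 * DEFAULT_MEMORY_GB)
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

On Terra the same two values can be typed directly into the workflow's input fields instead of uploading a JSON file.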
Hi Yiming,
First, I resolved the original issue by changing the computing resources.
But I have another issue, and I hope you can help.
I am running the STARsolo WDL (Source: github.com/lilab-bcb/cumulus/STARsolo:master) on the Terra platform. A couple of weeks ago I ran the same protocol on the same files without problems, and now it doesn't work.
The generate_count_config step completes without problems, but starsolo_count fails, and the Terra job manager reports:
“Task starsolo_count.run_starsolo:NA:1 failed. Job exit code 1. Check gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/submissions/5af1c8ff-ad3f-4d8e-bdc4-183410b1260f/starsolo_workflow/893360c8-347c-4b14-ba07-9456bb9f67d3/call-starsolo_count/shard-0/starsolo_count/262fa48e-7704-412e-a993-bc5abf946d8d/call-run_starsolo/stderr for more information. PAPI error code 9. Please check the log file for more details: gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/submissions/5af1c8ff-ad3f-4d8e-bdc4-183410b1260f/starsolo_workflow/893360c8-347c-4b14-ba07-9456bb9f67d3/call-starsolo_count/shard-0/starsolo_count/262fa48e-7704-412e-a993-bc5abf946d8d/call-run_starsolo/run_starsolo.log.”
In run_starsolo.log, it looks as if strato does not recognize the arguments:
“Average throughput: 251.2MiB/s
2024/09/11 04:47:18 Localization script execution complete.
2024/09/11 04:47:58 Done localization.
2024/09/11 04:48:14 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash ***@***.***:f6f598545121fd36bba4df78979d821fdb4292804d48b8a7a31a54a1c95b819c /cromwell_root/script
usage: strato cp [-h] [-r] [-m] [--ionice] [--profile PROFILE] [--quiet] [--dryrun] filenames [filenames …]
strato cp: error: unrecognized arguments: --backend gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/SRR15931900/GSM5585219_1.fastq.gz GSM5585219_0/
strato exists --backend gcp gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/SRR15931900/GSM5585219/
strato cp --backend gcp -m gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/SRR15931900/GSM5585219_1.fastq.gz GSM5585219_0/
Traceback (most recent call last):
  File "<stdin>", line 29, in <module>
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['strato', 'exists', '--backend', 'gcp', 'gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/SRR15931900/GSM5585219/']' returned non-zero exit status 2.”
If you think this should be reported in the lilab-bcb/cumulus issues section instead, let me know and I will submit it there.
Again, thank you very much for your assistance.
Best regards,
Neurod
On 10 Sep 2024, at 22:42, Yiming Yang ***@***.***> wrote:
Hi @Neu970 <https://github.com/Neu970> ,
Thanks for reaching out, and thank you for reporting this issue!
Your GPU quota limit issue with the cellbender workflow seems related to the low availability of GPUs in those regions, but I'll test on my side whether it is caused by anything on our end.
For the STARsolo workflow, could you please give an example of the issue? I'm asking because the workflow runs well in my analysis tasks with the default 32 vCPUs and 120GB memory. If your data require more computing resources, please note that you can increase them by setting your own values for the num_cpu and memory inputs. (Please look for these inputs in https://cumulus.readthedocs.io/en/stable/starsolo.html#workflow-inputs)
Hope it helps!
Sincerely,
Yiming
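Hi @Neu970,
This is a backward compatibility issue. I just fixed it in the master branch, which you use in your jobs. Please check it out now, and let me know if the issue still persists.
Sincerely,
Yiming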
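To illustrate the kind of failure seen in run_starsolo.log, here is a minimal sketch of what happens when a caller passes a flag that the installed command-line tool does not define. The parser below is hypothetical: it only mirrors the usage line printed in the log (which has no --backend option) and is not strato's actual implementation. The bucket and file names are placeholders.

```python
import argparse


def build_old_cp_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in that mirrors the usage line shown in the log."""
    parser = argparse.ArgumentParser(prog="strato cp")
    parser.add_argument("-r", action="store_true")
    parser.add_argument("-m", action="store_true")
    parser.add_argument("--ionice", action="store_true")
    parser.add_argument("--profile")
    parser.add_argument("--quiet", action="store_true")
    parser.add_argument("--dryrun", action="store_true")
    parser.add_argument("filenames", nargs="+")
    return parser


def exit_status_for(argv: list) -> int:
    """Return the exit status the CLI would end with for these arguments."""
    try:
        build_old_cp_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse reports usage errors on stderr and exits with status 2.
        return exc.code
    return 0


# A caller written for a CLI version that accepts --backend ...
new_style = ["--backend", "gcp", "-m", "gs://bucket/sample_1.fastq.gz", "out/"]
# ... versus a call that only uses options the older parser defines.
old_style = ["-m", "gs://bucket/sample_1.fastq.gz", "out/"]

status_new = exit_status_for(new_style)
status_old = exit_status_for(old_style)
print(status_new, status_old)
```

The status 2 produced for the first call is the same value reported by the CalledProcessError in the log, which is consistent with a version mismatch between the workflow script and the tool inside the Docker image.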
Dear Yiming,
Thank you for your support and help; everything went smoothly. I have submitted various analyses with different samples and haven't encountered any errors.
I truly appreciate your quick response and how efficiently you solved the problem.
Best regards,
Rodrigo
On 12 Sep 2024, at 18:56, Yiming Yang ***@***.***> wrote:
Hi @Neu970 <https://github.com/Neu970> ,
This is a backward compatibility issue. I just fixed it in the master branch, which you use in your jobs. Please check it out now, and let me know if the issue still persists.
Sincerely,
Yiming
Dear Team lilab-bcb/cumulus,
I wanted to bring to your attention that several of the WDL protocols you offer, such as Cellbender/remove_background, are no longer functioning properly. Specifically, I encountered an issue where Google Cloud reports insufficient quota to run the protocol using the "nvidia-tesla-t4" GPU.
The error message I received is as follows:
"Task cellbender.run_cellbender_remove_background_gpu:0:20 failed. The job was stopped before the command finished. PAPI error code 9. Could not start instance custom-4-8192 due to insufficient quota. Cromwell retries exhausted, task failed. Backend info: Execution failed: allocating: selecting resources: selecting region and zone: no available zones: us-east1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, us-west1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, us-central1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low."
Additionally, protocols like Starsolo require adjustments to memory and CPU values, as the default settings are no longer adequate.
A couple of months ago, these protocols worked without any issues, but they seem to have stopped functioning recently. I hope this information helps you investigate whether this is a temporary problem or something more persistent. I greatly appreciate your efforts in developing these WDLs, as their value is immense, but at the moment, they are not very useful due to these issues.
Thank you for your attention to this matter.
Best regards,
Rod
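As a reading aid for the quota error quoted in the report above, here is a minimal sketch that extracts the per-region entries from the "Backend info" text. The regular expression is written only against the message as quoted here, and interpreting "(0/0 available)" as available/limit is an assumption, not something the thread confirms.

```python
import re

# Backend info copied from the error message in the report.
BACKEND_INFO = (
    "no available zones: "
    "us-east1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, "
    "us-west1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, "
    "us-central1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low."
)

# Assumed layout: "<region>: <requested> <metric> (<available>/<limit> available)".
ENTRY = re.compile(
    r"(?P<region>[a-z0-9-]+): (?P<requested>\d+) (?P<metric>[A-Z0-9_]+) "
    r"\((?P<available>\d+)/(?P<limit>\d+) available\) quota too low"
)


def parse_quota_entries(text: str) -> list:
    """Return one dict per region entry found in the backend info text."""
    entries = []
    for match in ENTRY.finditer(text):
        entries.append({
            "region": match["region"],
            "metric": match["metric"],
            "requested": int(match["requested"]),
            "available": int(match["available"]),
            "limit": int(match["limit"]),
        })
    return entries


entries = parse_quota_entries(BACKEND_INFO)
# Regions where the second number (assumed to be the limit) is zero.
zero_limit_regions = [e["region"] for e in entries if e["limit"] == 0]
print(zero_limit_regions)
```

If the second number is indeed the quota limit, a value of zero in every region would suggest checking the GPU quota of the Google Cloud project in addition to regional availability.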