sge_submit_makeflow #40

Open
mlap-t opened this issue Dec 14, 2020 · 11 comments

mlap-t commented Dec 14, 2020

Dear makeflow expert,

First, thanks for the excellent tools that you've developed!

I was wondering whether there is a tool for SGE similar to condor_submit_makeflow (see https://cctools.readthedocs.io/en/latest/man_pages/condor_submit_makeflow/).

At the moment, when I run jobs on sge I have to remain logged in, which is a problem for long jobs and/or unstable networks.

Thanks in advance for your help,

mathieu

Member

btovar commented Dec 14, 2020

There is sge_submit_makeflow; it should be installed alongside condor_submit_makeflow. Please let us know if you find it, and if it works for you.

Member

dthain commented Dec 14, 2020

Whoops, looks like that command is not part of the install rule.

Member

btovar commented Dec 14, 2020

Got it, I'll fix it now.

Member

btovar commented Dec 14, 2020

Author

mlap-t commented Dec 15, 2020

Thanks a lot. I will have to update my makeflow version from git then (so far I was using the binary tarball). I will do it later today or tomorrow.
mathieu

Author

mlap-t commented Dec 23, 2020

Sorry for the delay in addressing this issue...

I normally run my jobs with this command:
makeflow -T sge -B '-P P_antares -q long -l sps=1' --safe-submit-mode -J 250 --jx scan2d.jx

How should I do the same with sge_submit_makeflow? Thanks for your help.

Member

btovar commented Dec 23, 2020

The command sge_submit_makeflow is designed to work with Work Queue, but I think for now we can trick it by setting an environment variable, like this (all on one command line):

makeflow_ops="-T sge -B '-P P_antares -q long -l sps=1' --safe-submit-mode -J 250 --jx" ./sge_submit_makeflow -p '-P P_antares -q long -l sps=1' scand2jxprojectname scan2d.jx

The double specification of '-P ...' etc. is needed because makeflow and the jobs may run on different queues.

When I'm back from the break I'll look for a cleaner solution.
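
A note on why the command has to be on one line: in a POSIX shell, a `NAME=value` prefix placed before a command sets that variable only in the environment of that one command. A minimal sketch, using `sh -c` as a stand-in for sge_submit_makeflow (the stand-in and the shortened value are purely illustrative):

```shell
#!/bin/sh
# Start from a clean slate so the demonstration is deterministic.
unset makeflow_ops

# Stand-in for sge_submit_makeflow: a child process that reports what it sees.
child='printf "%s" "${makeflow_ops:-unset}"'

# The NAME=value prefix applies to this one command only.
seen_by_child=$(makeflow_ops="-T sge --jx" sh -c "$child")

# Back in the calling shell the variable was never set.
seen_after=${makeflow_ops:-unset}

echo "child saw:  $seen_by_child"
echo "shell sees: $seen_after"
```

If the assignment were placed on a separate line without `export`, the script would not see it at all.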

Author

mlap-t commented Jan 4, 2021

I tried this command:
makeflow_ops="-T sge -B '-P P_km3net -q long -l sps=1' --safe-submit-mode -J 250 --jx" sge_submit_makeflow -p' -P P_km3net -q long -l sps=1' scand2jxprojectname scan2d.jx

It created a single job, which I think was supposed to launch this shell script (sge_submit.sh):

#!/bin/sh
./makeflow -T wq -a -e -N scand2jxprojectname -T sge -B '-P P_km3net -q long -l sps=1' --safe-submit-mode -J 250 --jx scan2d.jx

but once the job starts, nothing happens and I have no logs.
Any idea why?
I couldn't find the meaning of the -e option.
What about the two -T options (wq and sge) in the shell script?

Member

btovar commented Jan 5, 2021

I have proposed some changes here:

cooperative-computing-lab/cctools#2503

The proposed new sge_submit_makeflow script:
https://raw.githubusercontent.com/cooperative-computing-lab/cctools/7417c0d160af94f7991ffadc0add75cc0ff1b33b/makeflow/src/sge_submit_makeflow

The command line would look something like:

./sge_submit_makeflow -T sge -p '-P P_km3net -q long -l sps=1' -E '--safe-submit-mode -J 250 --jx' scand2jxprojectname  scan2d.jx

Please let me know if that works for you!

Author

mlap-t commented Jan 6, 2021

I ran the command you proposed, but it does something very similar to using only 'makeflow', i.e. many jobs are submitted and I don't get the shell prompt back, so I cannot log out.
My understanding was that the script should submit a single job to sge that would in turn submit the sge sub-jobs.

Member

btovar commented Jan 6, 2021

My mistake, I forgot that sge nodes usually cannot submit jobs by themselves. The way we usually handle this is by directing makeflow to use the wq batch system, and have additional sge jobs to serve as workers that execute the tasks. With this, rather than having one sge job per rule in your makeflow, you have one sge job per worker.
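
Roughly, the arrangement described above has two parts: one makeflow process acting as the Work Queue manager, and a number of sge jobs that each run a worker. The sketch below only prints the two commands rather than running them. The makeflow flags (`-T wq -a -N`) are taken from the generated script quoted earlier in this thread; the sge_submit_workers options (`-M` for the project name, `-p` for the qsub parameters) and the worker count are assumptions and should be checked against `sge_submit_workers -h` on your installation.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
project=scand2jxprojectname          # project name used in this thread
sge_opts='-P P_km3net -q long -l sps=1'
workers=10                           # hypothetical worker count

# 1. Manager: makeflow dispatches rules to Work Queue workers (-T wq)
#    and advertises itself under the project name (-a -N).
manager_cmd="makeflow -T wq -a -N $project --jx scan2d.jx"

# 2. Workers: one sge job per worker, not one per rule.
#    ASSUMPTION: -M and -p are the right options for your cctools version.
worker_cmd="sge_submit_workers -M $project -p '$sge_opts' $workers"

echo "$manager_cmd"
echo "$worker_cmd"
```

The workers locate the manager through the shared project name, which is why the same name appears in both commands.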

I submitted a couple of fixes for this to work correctly with sge_submit_makeflow:

cooperative-computing-lab/cctools#2504
https://github.com/cooperative-computing-lab/cctools/blob/4bbdf06d9c9bd08c66e0e27ab8be5218221934a3/makeflow/src/sge_submit_makeflow

You would do something like:

./sge_submit_makeflow  -p '-P P_km3net -q long -l sps=1' -E '--safe-submit-mode -J 250 --jx' scand2jxprojectname  scan2d.jx

There is a chance that your workflow will not work as is. This is because with wq all file references are resolved relative to the local filesystem of the node where each job runs, rather than the shared filesystem assumed for sge jobs. This can usually be fixed easily by adding all input files (including executables) to your makeflow's rule specifications. We can help you make these declarations if you have any questions.
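
As an illustration of declaring every input, here is a hedged sketch of a single JX rule in which the executable and its data file are both listed under "inputs", so that they are sent along to the worker that runs the rule. The file names (`scan2d_point`, `params.dat`, `point_0.out`) are hypothetical placeholders and are not taken from scan2d.jx; the snippet simply writes the example to a file named example.jx.

```shell
#!/bin/sh
# Write a minimal JX workflow with one rule to example.jx.
# All file names inside are hypothetical placeholders.
cat > example.jx <<'EOF'
{
  "rules": [
    {
      "command": "./scan2d_point params.dat > point_0.out",
      "inputs":  ["scan2d_point", "params.dat"],
      "outputs": ["point_0.out"]
    }
  ]
}
EOF

# Show what was written.
cat example.jx
```

A rule that relied on a file from the shared filesystem without listing it would work with -T sge but could fail with wq, which is the situation described above.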
