allow for user to avoid recursion on input directories
this is required to be able to run over a directory of DB event
libraries (which are directories themselves)
tomeichlersmith committed Jul 13, 2021
1 parent 7eb6e89 commit d141e7e
Showing 4 changed files with 28 additions and 146 deletions.
16 changes: 14 additions & 2 deletions batch/README.md
@@ -58,7 +58,7 @@ ldmx-submit-jobs -c production.py -o EXAMPLE -d ldmx/pro:v2.3.0 -n 5

*Comments* :
- The output directory defined using `-o` is relative to your hdfs directory (so you will find the output of these five jobs in `<your-hdfs-dir>/EXAMPLE/`). If you want the output in some other directory, you need to specify the full path.
- The version of ldmx-sw you want to use can be defined using the name of the directory it is in when using `ldmx-make-stable`. Your options for a stable installation are in `/local/cms/user/$USER/ldmx/stable-installs/`.
- The version of ldmx-sw you want to use can be defined by providing the production container using a DockerHub tag (`-d`) or providing the path to the singularity file you built (`-s`).
- By default, the run numbers will start at `0` and count up from there. You can change the first run number by using the `--start_job` option. This is helpful when (for example), you want to run small group of jobs to make sure everything is working, but you don't want to waste time re-running the same run numbers.

#### 2. Analysis
@@ -72,7 +72,7 @@ ldmx-submit-jobs -c analysis.py -o EXAMPLE/hists -i EXAMPLE -d ldmx/pro:v2.3.0 -

*Comments*:
- Like the output directory, the input directory is also relative to your hdfs directory unless a full path is specified.
**The current `run_fire.sh` script only mounts hdfs, so the container will think directories/files outside of hdfs don't exist.**
**The current `run_ldmx.sh` script only mounts hdfs, so the container will think directories/files outside of hdfs don't exist.**
- Since there are five files to analyze and we are asking for two files per job, we will have three jobs
(two with two files and one with one).
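The grouping described above can be sketched as a simple partition of the input file list (a minimal illustration with hypothetical file names, not the tool's actual implementation):

```python
def partition(files, files_per_job):
    """Split the input file list into chunks of at most files_per_job files."""
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]

# five hypothetical input files, two per job -> three jobs
jobs = partition([f"run_{i}.root" for i in range(5)], 2)
print([len(j) for j in jobs])  # [2, 2, 1]
```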

@@ -112,3 +112,15 @@ We put all of these generated files in the `<output-directory>/detail` directory
- You can use the command `condor_q` to see the current status of your jobs.
- The `-long` option to `condor_q` or `condor_history` dumps all of the information about the job(s) that you have selected with the other command line options. This is helpful for seeing exactly what was run.
- If you see a long list of sequential jobs "fail", it might be that a specific worker node isn't configured properly. Check that it is one worker-node's fault by running `my-q -held -long | uniq-hosts`. If only one worker node shows up (but you know that you have tens of failed jobs), then you can `ssh` to that machine to try to figure it out (or email csehelp if you aren't sure what to do). In the mean time, you can put that machine in your list of `Machine != <full machine name>` at the top of the submit file.
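If you need to exclude a misbehaving worker node, the requirement at the top of the submit file could look like the following (the machine name is hypothetical; this is a sketch of standard HTCondor submit-file syntax, not a line from this repository):

```
Requirements = (Machine != "bad-node01.example.edu")
```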

# Dark Brem Signal Generation

This sample generation is a special case that requires some modification.
Normally, we want to recursively enter directories in order to get a list of all `.root` files to use as input.
The DB event libraries are directories themselves, so we need to turn off recursion.
Here is an example of submitting a job where we provide the directory holding the DB event libraries.
Notice that we need _both_ `--no_recursive` _and_ `--files_per_job 1` so that we can run the DB sim once for each event library we have.

```
ldmx-submit-jobs -c db_sim.py -d ldmx/pro:edge -i /hdfs/cms/user/eichl008/ldmx/dark-brem-event-libraries --no_recursive -o TEST --files_per_job 1 --config_args "--num_events 20000 --material tungsten"
```
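To see why `--no_recursive` matters here, consider this minimal sketch (the helper function and directory names are hypothetical, not the actual `umn_htcondor` code): with recursion on, we would descend into each event library and collect its contents, but with recursion off, each library directory itself becomes one job input, which with `--files_per_job 1` gives one DB sim job per library.

```python
import os
import tempfile

def list_inputs(input_dir, recursive=True):
    """With recursion, descend into subdirectories collecting .root files;
    without it, return the immediate entries, so each event library
    (itself a directory) counts as one input."""
    if not recursive:
        return sorted(os.path.join(input_dir, f) for f in os.listdir(input_dir))
    found = []
    for root, _, files in os.walk(input_dir):
        found.extend(os.path.join(root, f) for f in files if f.endswith('.root'))
    return sorted(found)

# hypothetical layout: a top directory containing two DB event libraries
with tempfile.TemporaryDirectory() as top:
    for lib in ('lib_1.0_tungsten', 'lib_2.0_tungsten'):
        os.makedirs(os.path.join(top, lib))
        open(os.path.join(top, lib, 'events.root'), 'w').close()
    # non-recursive: the two library directories themselves are the inputs
    print(len(list_inputs(top, recursive=False)))  # 2
```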
16 changes: 12 additions & 4 deletions batch/python/umn_htcondor/submit.py
@@ -255,7 +255,7 @@ def periodic_release(self) :

self['periodic_release'] = held_by_us.and_((exit_code == 99).or_(exit_code == 100).or_(exit_code == 117).or_(exit_code == 118))

def run_over_input_dirs(self, input_dirs, num_files_per_job) :
def run_over_input_dirs(self, input_dirs, num_files_per_job, recursive = True) :
"""Have the config script run over num_files_per_job files taken from input_dirs, generating jobs
until all of the files in input_dirs are included.
@@ -265,6 +265,8 @@
List of input directories, files, or file listings to run over
num_files_per_job : int
Number of files for each job to have (maximum, could be less)
recursive : bool
True if we should recursively search for root and list files in the supplied directories
"""

if self.__items_to_loop_over is not None :
@@ -283,14 +285,20 @@ def smart_recursive_input(file_or_dir) :
file_listing = listing.readlines()

full_list.extend(smart_recursive_input([f.strip() for f in file_listing]))
elif os.path.isdir(file_or_dir) :
full_list.extend(smart_recursive_input([os.path.join(file_or_dir,f) for f in os.listdir(file_or_dir)]))
elif os.path.isdir(utility.full_dir(file_or_dir)) :
d = utility.full_dir(file_or_dir)
full_list.extend(smart_recursive_input([os.path.join(d,f) for f in os.listdir(d)]))
else :
print(f"'{file_or_dir}' is not a ROOT file, a directory, or a list of files. Skipping.")
#file or directory
return full_list

input_file_list = smart_recursive_input(input_dirs)
if recursive :
input_file_list = smart_recursive_input(input_dirs)
else :
input_file_list = []
for d in [utility.full_dir(d) for d in input_dirs] :
input_file_list.extend([os.path.join(d,f) for f in os.listdir(d)])

# we need to define a list of dictionaries that htcondor submission will loop over
# we partition the list of input files into space separate lists of maximum length arg.files_per_job
Expand Down
139 changes: 0 additions & 139 deletions batch/run_fire.sh

This file was deleted.

3 changes: 2 additions & 1 deletion batch/submit_jobs.py
@@ -29,6 +29,7 @@
parser.add_argument("--input_arg_name",type=str,default='',help='Name of argument that should go before the input file or run number when passing it to the config script.')
parser.add_argument("--start_job",type=int,default=0,help="Starting number to use when assigning run numbers. Only used if NOT running over items in a directory.")
parser.add_argument("--files_per_job",type=int,default=10,help="If running over an input directory, this argument defines how many files to group together per job.")
parser.add_argument("--no_recursive",default=False,action='store_true',help='Should we NOT recursively enter the input directories?')

# rarely-used optional args
full_path_to_dir_we_are_in=os.path.dirname(os.path.realpath(__file__))
@@ -101,7 +102,7 @@
job_instructions.periodic_release()

if arg.input_dir is not None :
job_instructions.run_over_input_dirs(arg.input_dir, arg.files_per_job)
job_instructions.run_over_input_dirs(arg.input_dir, arg.files_per_job, not arg.no_recursive)
elif arg.refill :
job_instructions.run_refill()
else :
