diff --git a/docs/materials/data/part1-ex1-data-needs.md b/docs/materials/data/part1-ex1-data-needs.md index 6290da0..30266f7 100644 --- a/docs/materials/data/part1-ex1-data-needs.md +++ b/docs/materials/data/part1-ex1-data-needs.md @@ -35,8 +35,8 @@ genome information. 1. Copy the BLAST executables: :::console - user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/ncbi-blast-2.12.0+-x64-linux.tar.gz - user@ap40 $ tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz + user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/ncbi-blast-2.15.0+-x64-linux.tar.gz + user@ap40 $ tar -xzvf ncbi-blast-2.15.0+-x64-linux.tar.gz 1. Download these files to your current directory: @@ -55,7 +55,7 @@ Understanding BLAST Remember that `blastx` is executed in a command like the following: ``` console -user@ap40 $ ./ncbi-blast-2.12.0+/bin/blastx -db <database> -query <input file> -out <output file> +user@ap40 $ ./ncbi-blast-2.15.0+/bin/blastx -db <database> -query <input file> -out <output file> ``` In the above, the `<input file>` is the name of a file containing a number of genetic sequences (e.g. `mouse.fa`), and diff --git a/docs/materials/data/part1-ex2-file-transfer.md b/docs/materials/data/part1-ex2-file-transfer.md index 9ce65e6..f3133f7 100644 --- a/docs/materials/data/part1-ex2-file-transfer.md +++ b/docs/materials/data/part1-ex2-file-transfer.md @@ -209,7 +209,7 @@ destination `science-results/mouse.fa.result`. Remember that the `transfer_output_remaps` value requires double quotes around it. -Submit the job, and wait for it to complete. Was there +Submit the job, and wait for it to complete. Are there any errors? Can you find mouse.fa.result? Conclusions diff --git a/docs/materials/htcondor/part1-ex1-login.md b/docs/materials/htcondor/part1-ex1-login.md index a1b0d1f..94a00b1 100644 --- a/docs/materials/htcondor/part1-ex1-login.md +++ b/docs/materials/htcondor/part1-ex1-login.md @@ -52,7 +52,7 @@ In the exercises, we will show commands that you are supposed to type or copy in ``` console username@ap1 $ hostname -ap1.facility.path-cc.io +path-ap2001 ``` !!! note @@ -92,7 +92,7 @@ HTCondor is installed on this server. But what version? You can ask HTCondor its ``` console username@ap1 $ condor_version -$CondorVersion: 10.7.0 2023-07-10 BuildID: 659788 PackageID: 10.7.0-0.659788 RC $ +$CondorVersion: 23.9.0 2024-06-27 BuildID: 742143 PackageID: 23.9.0-0.742143 GitSHA: 68fde429 RC $ $CondorPlatform: x86_64_AlmaLinux8 $ ``` diff --git a/docs/materials/htcondor/part1-ex3-jobs.md b/docs/materials/htcondor/part1-ex3-jobs.md index c2f138c..e6bbf28 100644 --- a/docs/materials/htcondor/part1-ex3-jobs.md +++ b/docs/materials/htcondor/part1-ex3-jobs.md @@ -159,13 +159,13 @@ or perhaps a shell script of commands that you'd like to run within a job. In th 1. Test your script from the command line: :::console - username@ap1 $ ./test-script.sh hello 42 - Date: Mon Jul 17 10:02:20 CDT 2017 - Host: learn.chtc.wisc.edu - System: Linux x86_64 GNU/Linux + username@ap1 $ ./test-script.sh hello 42 + Date: Mon Jul 1 14:03:56 CDT 2024 + Host: path-ap2001 + System: Linux x86_64 GNU/Linux Program: ./test-script.sh Args: hello 42 - ls: hostname.sub montage hostname.err hostname.log hostname.out test-script.sh + ls: hostname.err hostname.log hostname.out hostname.sub sleep.log sleep.sub test-script.sh This step is **really** important! If you cannot run your executable from the command-line, HTCondor probably cannot run it on another machine, either. Further, debugging problems like this one is surprisingly difficult.
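Once the script runs from the command line, the matching submit file follows the same pattern as the earlier `hostname` job. Here is a minimal sketch; the file name `test-script.sub` and the resource requests are illustrative, not part of the original exercise:

``` file
# test-script.sub -- minimal sketch; adjust names and requests as needed
executable = test-script.sh
arguments  = hello 42

log    = test-script.log
output = test-script.out
error  = test-script.err

request_cpus   = 1
request_memory = 1GB
request_disk   = 1GB

queue
```

Submitting this with `condor_submit test-script.sub` should reproduce the output above, except that `Host:` will name the execute point instead of the access point.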
diff --git a/docs/materials/htcondor/part1-ex4-logs.md b/docs/materials/htcondor/part1-ex4-logs.md index d5a1bc7..d8dc8d7 100644 --- a/docs/materials/htcondor/part1-ex4-logs.md +++ b/docs/materials/htcondor/part1-ex4-logs.md @@ -23,15 +23,15 @@ For this exercise, we can examine a log file for any previous job that you have A job log file is updated throughout the life of a job, usually at key events. Each event starts with a heading that indicates what happened and when. Here are **all** of the event headings from the `sleep` job log (detailed output in between headings has been omitted here): ``` file -000 (5739.000.000) 2023-07-10 10:44:20 Job submitted from host: <128.104.100.43:9618?addrs=...> -040 (5739.000.000) 2023-07-10 10:45:10 Started transferring input files -040 (5739.000.000) 2023-07-10 10:45:10 Finished transferring input files -001 (5739.000.000) 2023-07-10 10:45:11 Job executing on host: <128.104.55.42:9618?addrs=...> -006 (5739.000.000) 2023-07-10 10:45:20 Image size of job updated: 72 -040 (5739.000.000) 2023-07-10 10:45:20 Started transferring output files -040 (5739.000.000) 2023-07-10 10:45:20 Finished transferring output files -006 (5739.000.000) 2023-07-10 10:46:11 Image size of job updated: 4072 -005 (5739.000.000) 2023-07-10 10:46:11 Job terminated. +000 (5739.000.000) 2024-07-10 10:44:20 Job submitted from host: <128.104.100.43:9618?addrs=...> +040 (5739.000.000) 2024-07-10 10:45:10 Started transferring input files +040 (5739.000.000) 2024-07-10 10:45:10 Finished transferring input files +001 (5739.000.000) 2024-07-10 10:45:11 Job executing on host: <128.104.55.42:9618?addrs=...> +006 (5739.000.000) 2024-07-10 10:45:20 Image size of job updated: 72 +040 (5739.000.000) 2024-07-10 10:45:20 Started transferring output files +040 (5739.000.000) 2024-07-10 10:45:20 Finished transferring output files +006 (5739.000.000) 2024-07-10 10:46:11 Image size of job updated: 4072 +005 (5739.000.000) 2024-07-10 10:46:11 Job terminated. ``` There is a lot of extra information in those lines, but you can see: @@ -43,7 +43,7 @@ There is a lot of extra information in those lines, but you can see: Some events provide no information in addition to the heading. For example: ``` file -000 (5739.000.000) 2020-07-10 10:44:20 Job submitted from host: <128.104.100.43:9618?addrs=...> +000 (5739.000.000) 2024-07-10 10:44:20 Job submitted from host: <128.104.100.43:9618?addrs=...> ... ``` @@ -53,7 +53,7 @@ Some events provide no information in addition to the heading. For example: However, some lines have additional information to help you quickly understand where and how your job is running. For example: ``` file -001 (5739.000.000) 2020-07-10 10:45:11 Job executing on host: <128.104.55.42:9618?addrs=...> +001 (5739.000.000) 2024-07-10 10:45:11 Job executing on host: <128.104.55.42:9618?addrs=...> SlotName: slot1@WISC-PATH-IDPL-EP.osgvo-docker-pilot-idpl-7c6575d494-2sj5w CondorScratchDir = "/pilot/osgvo-pilot-2q71K9/execute/dir_9316" Cpus = 1 @@ -70,7 +70,7 @@ However, some lines have additional information to help you quickly understand w Another example is the periodic update: ``` file -006 (5739.000.000) 2020-07-10 10:45:20 Image size of job updated: 72 +006 (5739.000.000) 2024-07-10 10:45:20 Image size of job updated: 72 1 - MemoryUsage of job (MB) 72 - ResidentSetSize of job (KB) ...
@@ -81,7 +81,7 @@ These updates record the amount of memory that the job is using on the execute m The job termination event includes a lot of very useful information: ``` file -005 (5739.000.000) 2023-07-10 10:46:11 Job terminated. +005 (5739.000.000) 2024-07-10 10:46:11 Job terminated. (1) Normal termination (return value 0) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage @@ -95,7 +95,7 @@ The job termination event includes a lot of very useful information: Cpus : 1 1 Disk (KB) : 40 30 4203309 Memory (MB) : 1 1 1 -Job terminated of its own accord at 2023-07-10 10:46:11 with exit-code 0. +Job terminated of its own accord at 2024-07-10 10:46:11 with exit-code 0. ... ``` diff --git a/docs/materials/htcondor/part1-ex5-request.md b/docs/materials/htcondor/part1-ex5-request.md index 7a57bb2..84cd9e6 100644 --- a/docs/materials/htcondor/part1-ex5-request.md +++ b/docs/materials/htcondor/part1-ex5-request.md @@ -44,7 +44,7 @@ On Mac and Windows, for example, the "Activity Monitor" and "Task Manager" appli Using `ps`: ``` console -username@learn $ ps ux +username@ap1 $ ps ux USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND alice 24342 0.0 0.0 90224 1864 ? S 13:39 0:00 sshd: alice@pts/0 alice 24343 0.0 0.0 66096 1580 pts/0 Ss 13:39 0:00 -bash diff --git a/docs/materials/htcondor/part1-ex7-compile.md b/docs/materials/htcondor/part1-ex7-compile.md index 94a32c7..1bf0788 100644 --- a/docs/materials/htcondor/part1-ex7-compile.md +++ b/docs/materials/htcondor/part1-ex7-compile.md @@ -48,19 +48,19 @@ Save that code to a file, for example, `simple.c`. Compile the program with static linking: ``` console -username@learn $ gcc -static -o simple simple.c +username@ap1 $ gcc -static -o simple simple.c ``` As always, test that you can run your command from the command line first. First, without arguments to make sure it fails correctly: ``` console -username@learn $ ./simple +username@ap1 $ ./simple ``` and then with valid arguments: ``` console -username@learn $ ./simple 5 21 +username@ap1 $ ./simple 5 21 ``` Running a Compiled C Program diff --git a/docs/materials/htcondor/part1-ex8-queue.md b/docs/materials/htcondor/part1-ex8-queue.md index 0cfc079..d6cea46 100644 --- a/docs/materials/htcondor/part1-ex8-queue.md +++ b/docs/materials/htcondor/part1-ex8-queue.md @@ -17,32 +17,32 @@ Selecting Jobs The `condor_q` program has many options for selecting which jobs are listed. You have already seen that the default mode is to show only your jobs in "batch" mode: ``` console -username@learn $ condor_q +username@ap1 $ condor_q ``` You've seen that you can view all jobs (all users) in the submit node's queue by using the `-all` argument: ``` console -username@learn $ condor_q -all +username@ap1 $ condor_q -all ``` And you've seen that you can view more details about queued jobs, with each separate job on a single line using the `-nobatch` option: ``` console -username@learn $ condor_q -nobatch -username@learn $ condor_q -all -nobatch +username@ap1 $ condor_q -nobatch +username@ap1 $ condor_q -all -nobatch ``` Did you know you can also name one or more user IDs on the command line, in which case jobs for all of the named users are listed at once? ``` console -username@learn $ condor_q <username> +username@ap1 $ condor_q <username> ``` To list just the jobs associated with a single cluster number: ``` console -username@learn $ condor_q <cluster> +username@ap1 $ condor_q <cluster> ``` For example, if you want to see the jobs in cluster 5678 (i.e., `5678.0`, `5678.1`, etc.), you use `condor_q 5678`.
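These selection options combine freely. A quick sketch (the user name `alice` is illustrative; the cluster numbers are the ones from the examples above):

``` console
username@ap1 $ condor_q alice              # all of alice's jobs
username@ap1 $ condor_q 5678               # every process in cluster 5678
username@ap1 $ condor_q 5678.1             # just the one job 5678.1
username@ap1 $ condor_q alice -nobatch     # alice's jobs, one line per job
```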
@@ -50,7 +50,7 @@ For example, if you want to see the jobs in cluster 5678 (i.e., `5678. To list a specific job (i.e., cluster.process, as in 5678.0): ``` console -username@learn $ condor_q <cluster>.<process> +username@ap1 $ condor_q <cluster>.<process> ``` For example, to see job ID 5678.1, you use `condor_q 5678.1`. @@ -79,7 +79,7 @@ You may have wondered why it is useful to be able to list a single job ID using If you add the `-long` option to `condor_q` (or its short form, `-l`), it will show the complete ClassAd for each selected job, instead of the one-line summary that you have seen so far. Because job ClassAds may have 80–90 attributes (or more), it probably makes the most sense to show the ClassAd for a single job at a time. And you know how to show just one job! Here is what the command looks like: ``` console -username@learn $ condor_q -long <job-ID> +username@ap1 $ condor_q -long <job-ID> ``` The output from this command is long and complex. Most of the attributes that HTCondor adds to a job are arcane and uninteresting for us now. But here are some examples of common, interesting attributes taken directly from `condor_q` output (except with some line breaks added to the `Requirements` attribute): @@ -138,7 +138,7 @@ Sometimes, you submit a job and it just sits in the queue in Idle state, never r To ask HTCondor why your job is not running, add the `-better-analyze` option to `condor_q` for the specific job. For example, for job ID 2423.0, the command is: ``` console -username@learn $ condor_q -better-analyze 2423.0 +username@ap1 $ condor_q -better-analyze 2423.0 ``` Of course, replace the job ID with your own. @@ -166,7 +166,7 @@ There is a lot of output, but a few items are worth highlighting. Here is a samp ``` file --- Schedd: learn.chtc.wisc.edu : <128.104.100.148:9618?... +-- Schedd: ap1.facility.path-cc.io : <128.105.68.66:9618?... ...
Job 98096.000 defines the following attributes: @@ -215,7 +215,7 @@ There is a way to select the specific job attributes you want `condor_q` to tell To use autoformatting, use the `-af` option followed by the attribute name, for each attribute that you want to output: ``` console -username@learn $ condor_q -all -af Owner ClusterId Cmd +username@ap1 $ condor_q -all -af Owner ClusterId Cmd moate 2418 /share/test.sh cat 2421 /bin/sleep cat 2422 /bin/sleep @@ -228,7 +228,7 @@ References As suggested above, if you want to learn more about `condor_q`, you can do some reading: -- Read the `condor_q` man page or HTCondor Manual section (same text) to learn about more options -- Read about ClassAd attributes in Appendix A of the HTCondor Manual +- Read the `condor_q` man page or [HTCondor Manual section](https://htcondor.readthedocs.io/en/latest/man-pages/condor_q.html) (same text) to learn about more options +- Read about [ClassAd attributes](https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html) in the HTCondor Manual diff --git a/docs/materials/htcondor/part1-ex9-status.md b/docs/materials/htcondor/part1-ex9-status.md index b5bd884..aaa5f93 100644 --- a/docs/materials/htcondor/part1-ex9-status.md +++ b/docs/materials/htcondor/part1-ex9-status.md @@ -19,7 +19,7 @@ The `condor_status` program has many options for selecting which slots are liste Another convenient option is to list only those slots that are available now: ``` console -username@learn $ condor_status -avail +username@ap1 $ condor_status -avail ``` Of course, the individual execute machines only report their slots to the collector at certain time intervals, so this list will not reflect the up-to-the-second reality of all slots. But this limitation is true of all `condor_status` output, not just with the `-avail` option. @@ -27,25 +27,25 @@ Of course, the individual execute machines only report their slots to the collec Similar to `condor_q`, you can limit the slots that are listed in two easy ways. To list just the slots on a specific machine: ``` console -username@learn $ condor_status <hostname> +username@ap1 $ condor_status <hostname> ``` For example, if you want to see the slots on `e2337.chtc.wisc.edu` (in the CHTC pool): ``` console -username@learn $ condor_status e2337.chtc.wisc.edu +username@ap1 $ condor_status e2337.chtc.wisc.edu ``` To list a specific slot on a machine: ``` console -username@learn $ condor_status <slot>@<hostname> +username@ap1 $ condor_status <slot>@<hostname> ``` For example, to see the “first” slot on the machine above: ``` console -username@learn $ condor_status slot1@e2337.chtc.wisc.edu +username@ap1 $ condor_status slot1@e2337.chtc.wisc.edu ``` !!! note @@ -68,7 +68,7 @@ Viewing a Slot ClassAd Just as with `condor_q`, you can use `condor_status` to view the complete ClassAd for a given slot (often confusingly called the “machine” ad): ``` console -username@learn $ condor_status -long <slot>@<hostname> +username@ap1 $ condor_status -long <slot>@<hostname> ``` Because slot ClassAds may have 150–200 attributes (or more), it probably makes the most sense to show the ClassAd for a single slot at a time, as shown above. @@ -91,7 +91,7 @@ Memory = 1024 As you may be able to tell, there is a mix of attributes about the machine as a whole (hence the name “machine ad”) and about the slot in particular. -Go ahead and examine a machine ClassAd now. I suggest looking at one of the slots on, say, `e2337.chtc.wisc.edu` because of its relatively simple configuration. +Go ahead and examine a machine ClassAd now.
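If paging through the whole ad is tedious, ordinary shell tools can pick out just a few attributes. A sketch, reusing the example slot from above (the attribute names shown do appear in machine ads, but the exact set varies by pool):

``` console
username@ap1 $ condor_status -long slot1@e2337.chtc.wisc.edu | grep -E '^(OpSysAndVer|Memory|Cpus) '
```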
Viewing Slots by ClassAd Expression ----------------------------------- @@ -101,7 +101,7 @@ Often, it is helpful to view slots that meet some particular criteria. For examp For example, suppose we want to list all slots that are running CentOS 7 (operating system) and have at least 16 GB memory available. Note that memory is reported in units of Megabytes. The command is: ``` console -username@learn $ condor_status -constraint 'OpSysAndVer == "CentOS7" && Memory >= 16000' +username@ap1 $ condor_status -constraint 'OpSysAndVer == "CentOS7" && Memory >= 16000' ``` !!! note @@ -109,7 +109,7 @@ username@learn $ condor_status -constraint 'OpSysAndVer == "CentOS7" && Memory > In the example above, the single quotes (`'`) are for the shell, so that the entire expression is passed to `condor_status` untouched, and the double quotes (`"`) surround a string value within the expression itself. -Currently on CHTC, there are only a few slots that meet these criteria (our high-memory servers, mainly used for metagenomics assemblies). +Currently on PATh, there are only a few slots that meet these criteria (our high-memory servers, mainly used for metagenomics assemblies). If you are interested in learning more about writing ClassAd expressions, look at section 4.1 and especially 4.1.4 of the HTCondor Manual. This is definitely advanced material, so if you do not want to read it, that is fine. But if you do, take some time to practice writing expressions for the `condor_status -constraint` command. @@ -125,7 +125,7 @@ The `condor_status` command accepts the same `-autoformat` (`-af`) options that For example, I was curious about the host name and operating system of the slots with more than 32GB of memory: ``` console -username@learn $ condor_status -af Machine -af OpSysAndVer -constraint 'Memory >= 32000' +username@ap1 $ condor_status -af Machine -af OpSysAndVer -constraint 'Memory >= 32000' ``` If you like, spend a few minutes now or later experimenting with `condor_status` formatting. diff --git a/docs/materials/osg/part1-ex3-hardware-diffs.md b/docs/materials/osg/part1-ex3-hardware-diffs.md index 3dd670f..b247e79 100644 --- a/docs/materials/osg/part1-ex3-hardware-diffs.md +++ b/docs/materials/osg/part1-ex3-hardware-diffs.md @@ -66,7 +66,7 @@ we will create a **new** submit file with the queue…in syntax and change the value of our parameter (`request_memory`) for each batch of jobs. 1. Log in or switch back to `ap1.facility.path-cc.io` (yes, back to PATh!) -1. Create and change into a new subdirectory called `osg-ex14` +1. Create and change into a new subdirectory called `osg-ex13` 1. Create a submit file named `sleep.sub` that executes the command `/bin/sleep 300`. !!! note @@ -108,7 +108,7 @@ Now you will do essentially the same thing on the OSPool. 1. Log in or switch to `ap40.uw.osg-htc.org` -1. Copy the `osg-ex13` directory from the [section above](#checking-chtc-memory-availability) from `ap1.facility.path-cc.io` to `ap40.uw.osg-htc.org` If you get stuck during the copying process, refer to [OSG exercise 1.1](part1-ex1-login-scp.md). diff --git a/docs/materials/scaling/part1-ex1-organization.md b/docs/materials/scaling/part1-ex1-organization.md index 6acd606..81438c2 100644 --- a/docs/materials/scaling/part1-ex1-organization.md +++ b/docs/materials/scaling/part1-ex1-organization.md @@ -16,7 +16,7 @@ Make sure you are logged into `ap40.uw.osg-htc.org`. To get the files for this exercise: -1. 
Type `wget https://github.com/osg-htc/user-school-2023/raw/main/docs/materials/scaling/files/osgus23-day4-ex11-organizing-files.tar.gz` to download the tarball. +1. Type `wget https://github.com/osg-htc/school-2024/raw/main/docs/materials/scaling/files/osgus23-day4-ex11-organizing-files.tar.gz` to download the tarball. 1. As you learned earlier, expand this tarball file; it will create an `organizing-files` directory. 1. Change to that directory, or create a separate one for this exercise and copy the files in. diff --git a/docs/materials/software/part1-ex4-apptainer-build.md b/docs/materials/software/part1-ex4-apptainer-build.md index 8e233a7..fed1f7e 100644 --- a/docs/materials/software/part1-ex4-apptainer-build.md +++ b/docs/materials/software/part1-ex4-apptainer-build.md @@ -93,7 +93,7 @@ allow us to test our new container. 1. Try running: :::console - $ singularity shell first-image.sif + $ singularity shell py-cowsay.sif 1. Then try running the `hello-cow.py` script: diff --git a/docs/materials/software/part3-ex1-apptainer-recipes.md b/docs/materials/software/part3-ex1-apptainer-recipes.md index 2ed1600..8b64afe 100644 --- a/docs/materials/software/part3-ex1-apptainer-recipes.md +++ b/docs/materials/software/part3-ex1-apptainer-recipes.md @@ -16,8 +16,8 @@ the basic options and syntax of the "build" or definition file. |---------| | [Bootstrap/From](#where-to-start) | | [%files](#files-needed-for-building-or-running) | -| [%files](#commnds-to-install) | -| [%files](#environment) | +| [%post](#commands-to-install) | +| [%environment](#environment) | Where to start diff --git a/docs/materials/software/part4-ex1-download.md b/docs/materials/software/part4-ex1-download.md index 622ffa7..5819f6e 100644 --- a/docs/materials/software/part4-ex1-download.md +++ b/docs/materials/software/part4-ex1-download.md @@ -63,8 +63,8 @@ it. If you want to do this all from the command line, the sequence will look like this (using `wget` as the download command). :::console - user@login $ wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.15.0+-x64-linux.tar.gz - user@login $ tar -xzf ncbi-blast-2.15.0+-x64-linux.tar.gz 1. We're going to be using the `blastx` binary in our job. Where is it in the directory you just decompressed? @@ -124,7 +124,7 @@ directory of our downloaded BLAST directory. We need to use the `arguments` line in the submit file to express the rest of the command. :::file - executable = ncbi-blast-2.13.0+/bin/blastx + executable = ncbi-blast-2.15.0+/bin/blastx arguments = -db pdbaa/pdbaa -query mouse.fa -out results.txt * The BLAST program requires our input file and database, so they diff --git a/docs/materials/software/part4-ex2-wrapper.md b/docs/materials/software/part4-ex2-wrapper.md index 3ed5386..c297ed4 100644 --- a/docs/materials/software/part4-ex2-wrapper.md +++ b/docs/materials/software/part4-ex2-wrapper.md @@ -34,7 +34,7 @@ Our wrapper script will be a bash script that runs several commands. :::bash #!/bin/bash - ncbi-blast-2.12.0+/bin/blastx -db pdbaa/pdbaa -query mouse.fa -out results.txt + ncbi-blast-2.15.0+/bin/blastx -db pdbaa/pdbaa -query mouse.fa -out results.txt Submit File Changes @@ -53,7 +53,7 @@ We now need to make some changes to our submit file. 1. 
Note that since the `blastx` program is no longer listed as the executable, it will need to be included in `transfer_input_files`. Instead of transferring just that program, we will transfer the original downloaded `tar.gz` file. To achieve efficiency, we'll also transfer the pdbaa database as the original `tar.gz` file instead of as the unzipped folder: :::console - transfer_input_files = pdbaa.tar.gz, mouse.fa, ncbi-blast-2.15.0+-x64-linux.tar.gz 1. If you really want to be on top of things, look at the log file for the last exercise, and update your memory and disk requests to be just slightly above the actual "Usage" values in the log. @@ -73,10 +73,10 @@ Now that our database and BLAST software are being transferred to the job as `ta :::bash #!/bin/bash - tar -xzf ncbi-blast-2.12.0+-x64-linux.tar.gz + tar -xzf ncbi-blast-2.15.0+-x64-linux.tar.gz tar -xzf pdbaa.tar.gz - ncbi-blast-2.12.0+/bin/blastx -db pdbaa/pdbaa -query mouse.fa -out results2.txt + ncbi-blast-2.15.0+/bin/blastx -db pdbaa/pdbaa -query mouse.fa -out results2.txt 1. While not strictly necessary, it's a good idea to enable executable permissions on the wrapper script, like so: diff --git a/docs/materials/software/part4-ex3-arguments.md b/docs/materials/software/part4-ex3-arguments.md index d412849..6d71fbd 100644 --- a/docs/materials/software/part4-ex3-arguments.md +++ b/docs/materials/software/part4-ex3-arguments.md @@ -50,7 +50,7 @@ and third arguments, respectively. Thus, in the main command of the script, replace the various names with these variables: :::bash - ncbi-blast-2.15.0+/bin/blastx -db $1/$1 -query $2 -out $3 > If your wrapper script is in a different language, you should use that language's syntax for reading in variables from the command line. @@ -71,12 +71,12 @@ One of the downsides of this approach is that our command has become harder to read. The original script contains all the information at a glance: :::bash - ncbi-blast-2.12.0+/bin/blastx -db pdbaa/pdbaa -query mouse.fa -out results2.txt + ncbi-blast-2.15.0+/bin/blastx -db pdbaa/pdbaa -query mouse.fa -out results2.txt But our new version is more cryptic -- what is `$1`?: :::bash - ncbi-blast-2.10.1+/bin/blastx -db $1 -query $2 -out $3 + ncbi-blast-2.15.0+/bin/blastx -db $1/$1 -query $2 -out $3 One way to overcome this is to create our own variable names inside the wrapper script and assign the argument values to them. Here is an example for our
Here is an example for our @@ -89,10 +89,10 @@ BLAST script: INFILE=$2 OUTFILE=$3 - tar -xzf ncbi-blast-2.10.1+-x64-linux.tar.gz + tar -xzf ncbi-blast-2.15.1+-x64-linux.tar.gz tar -xzf pdbaa.tar.gz - ncbi-blast-2.10.1+/bin/blastx -db $DATABASE/$DATABASE -query $INFILE -out $OUTFILE + ncbi-blast-2.15.1+/bin/blastx -db $DATABASE/$DATABASE -query $INFILE -out $OUTFILE Here, we are assigning the input arguments (`$1`, `$2` and `$3`) to new variable names, and then using **those** names (`$DATABASE`, `$INFILE`, and `$OUTFILE`) in the command, diff --git a/docs/materials/software/part5-ex1-prepackaged.md b/docs/materials/software/part5-ex1-prepackaged.md index 7732292..afe8eb1 100644 --- a/docs/materials/software/part5-ex1-prepackaged.md +++ b/docs/materials/software/part5-ex1-prepackaged.md @@ -7,7 +7,7 @@ status: testing Software Exercise 5.1: Pre-package a Research Code ========================================== -**Objective**: Install software (HHMER) to a folder and run it in a job using a wrapper script. +**Objective**: Install software (HMMER) to a folder and run it in a job using a wrapper script. **Why learn this?**: If not using a container, this is a template for how to create a portable software installation using your own files, especially if the software @@ -45,7 +45,7 @@ for this example, we are going to compile directly on the Access Point :::console username@host $ tar -zxf hmmer.tar.gz - username@host $ cd hmmer-3.3.2 + username@host $ cd hmmer-3.4 1. Now we can follow the second set of installation instructions. For the prefix, we'll use the variable `$PWD` to capture the name of our current working directory and then a relative path to the `hmmer-build` directory we created in step 1: @@ -112,7 +112,7 @@ We're almost ready! We need two more pieces to run a HMMER job. run the job. You already have these files back in the directory where you unpacked the source code: :::console - username@ap1 $ ls hmmer-3.3.2/tutorial + username@ap1 $ ls hmmer-3.4/tutorial 7LESS_DROME fn3.hmm globins45.fa globins4.sto MADE1.hmm Pkinase.hmm dna_target.fa fn3.sto globins4.hmm HBB_HUMAN MADE1.sto Pkinase.sto @@ -124,7 +124,7 @@ run the job. You already have these files back in the directory where you unpack :::file executable = run_hmmer.sh - transfer_input_files = hmmer-build.tar.gz, hmmer-3.3.2/tutorial/ + transfer_input_files = hmmer-build.tar.gz, hmmer-3.4/tutorial/ A wrapper script will always be a job's `executable`. 
When using a wrapper script, you must also always remember to transfer the software/source code using diff --git a/docs/materials/workflows/part1-ex1-simple-dag.md b/docs/materials/workflows/part1-ex1-simple-dag.md index e65710e..8458893 100644 --- a/docs/materials/workflows/part1-ex1-simple-dag.md +++ b/docs/materials/workflows/part1-ex1-simple-dag.md @@ -82,153 +82,153 @@ In the third window, watch what DAGMan does (what you see may be slightly differ ``` console username@ap40 $ tail -f --lines=500 simple.dag.dagman.out -08/02/23 15:44:57 ****************************************************** -08/02/23 15:44:57 ** condor_scheduniv_exec.271100.0 (CONDOR_DAGMAN) STARTING UP -08/02/23 15:44:57 ** /usr/bin/condor_dagman -08/02/23 15:44:57 ** SubsystemInfo: name=DAGMAN type=DAGMAN(9) class=CLIENT(2) -08/02/23 15:44:57 ** Configuration: subsystem:DAGMAN local: class:CLIENT -08/02/23 15:44:57 ** $CondorVersion: 10.7.0 2023-07-10 BuildID: 659788 PackageID: 10.7.0-0.659788 RC $ -08/02/23 15:44:57 ** $CondorPlatform: x86_64_AlmaLinux8 $ -08/02/23 15:44:57 ** PID = 2340103 -08/02/23 15:44:57 ** Log last touched time unavailable (No such file or directory) -08/02/23 15:44:57 ****************************************************** -08/02/23 15:44:57 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS -08/02/23 15:44:57 DaemonCore: No command port requested. -08/02/23 15:44:57 DAGMAN_USE_STRICT setting: 1 -08/02/23 15:44:57 DAGMAN_VERBOSITY setting: 3 -08/02/23 15:44:57 DAGMAN_DEBUG_CACHE_SIZE setting: 5242880 -08/02/23 15:44:57 DAGMAN_DEBUG_CACHE_ENABLE setting: False -08/02/23 15:44:57 DAGMAN_SUBMIT_DELAY setting: 0 -08/02/23 15:44:57 DAGMAN_MAX_SUBMIT_ATTEMPTS setting: 6 -08/02/23 15:44:57 DAGMAN_STARTUP_CYCLE_DETECT setting: False -08/02/23 15:44:57 DAGMAN_MAX_SUBMITS_PER_INTERVAL setting: 100 -08/02/23 15:44:57 DAGMAN_AGGRESSIVE_SUBMIT setting: False -08/02/23 15:44:57 DAGMAN_USER_LOG_SCAN_INTERVAL setting: 5 -08/02/23 15:44:57 DAGMAN_QUEUE_UPDATE_INTERVAL setting: 300 -08/02/23 15:44:57 DAGMAN_DEFAULT_PRIORITY setting: 0 -08/02/23 15:44:57 DAGMAN_SUPPRESS_NOTIFICATION setting: True -08/02/23 15:44:57 allow_events (DAGMAN_ALLOW_EVENTS) setting: 114 -08/02/23 15:44:57 DAGMAN_RETRY_SUBMIT_FIRST setting: True -08/02/23 15:44:57 DAGMAN_RETRY_NODE_FIRST setting: False -08/02/23 15:44:57 DAGMAN_MAX_JOBS_IDLE setting: 1000 -08/02/23 15:44:57 DAGMAN_MAX_JOBS_SUBMITTED setting: 0 -08/02/23 15:44:57 DAGMAN_MAX_PRE_SCRIPTS setting: 20 -08/02/23 15:44:57 DAGMAN_MAX_POST_SCRIPTS setting: 20 -08/02/23 15:44:57 DAGMAN_MAX_HOLD_SCRIPTS setting: 20 -08/02/23 15:44:57 DAGMAN_MUNGE_NODE_NAMES setting: True -08/02/23 15:44:57 DAGMAN_PROHIBIT_MULTI_JOBS setting: False -08/02/23 15:44:57 DAGMAN_SUBMIT_DEPTH_FIRST setting: False -08/02/23 15:44:57 DAGMAN_ALWAYS_RUN_POST setting: False -08/02/23 15:44:57 DAGMAN_CONDOR_SUBMIT_EXE setting: /usr/bin/condor_submit -08/02/23 15:44:57 DAGMAN_USE_DIRECT_SUBMIT setting: True -08/02/23 15:44:57 DAGMAN_DEFAULT_APPEND_VARS setting: False -08/02/23 15:44:57 DAGMAN_ABORT_DUPLICATES setting: True -08/02/23 15:44:57 DAGMAN_ABORT_ON_SCARY_SUBMIT setting: True -08/02/23 15:44:57 DAGMAN_PENDING_REPORT_INTERVAL setting: 600 -08/02/23 15:44:57 DAGMAN_AUTO_RESCUE setting: True -08/02/23 15:44:57 DAGMAN_MAX_RESCUE_NUM setting: 100 -08/02/23 15:44:57 DAGMAN_WRITE_PARTIAL_RESCUE setting: True -08/02/23 15:44:57 DAGMAN_DEFAULT_NODE_LOG setting: @(DAG_DIR)/@(DAG_FILE).nodes.log -08/02/23 15:44:57 DAGMAN_GENERATE_SUBDAG_SUBMITS setting: True -08/02/23 15:44:57 DAGMAN_MAX_JOB_HOLDS setting: 100 
-08/02/23 15:44:57 DAGMAN_HOLD_CLAIM_TIME setting: 20 -08/02/23 15:44:57 ALL_DEBUG setting: -08/02/23 15:44:57 DAGMAN_DEBUG setting: -08/02/23 15:44:57 DAGMAN_SUPPRESS_JOB_LOGS setting: False -08/02/23 15:44:57 DAGMAN_REMOVE_NODE_JOBS setting: True -08/02/23 15:44:57 DAGMAN will adjust edges after parsing -08/02/23 15:44:57 argv[0] == "condor_scheduniv_exec.271100.0" -08/02/23 15:44:57 argv[1] == "-Lockfile" -08/02/23 15:44:57 argv[2] == "simple.dag.lock" -08/02/23 15:44:57 argv[3] == "-AutoRescue" -08/02/23 15:44:57 argv[4] == "1" -08/02/23 15:44:57 argv[5] == "-DoRescueFrom" -08/02/23 15:44:57 argv[6] == "0" -08/02/23 15:44:57 argv[7] == "-Dag" -08/02/23 15:44:57 argv[8] == "simple.dag" -08/02/23 15:44:57 argv[9] == "-Suppress_notification" -08/02/23 15:44:57 argv[10] == "-CsdVersion" -08/02/23 15:44:57 argv[11] == "$CondorVersion: 10.7.0 2023-07-10 BuildID: 659788 PackageID: 10.7.0-0.659788 RC $" -08/02/23 15:44:57 argv[12] == "-Dagman" -08/02/23 15:44:57 argv[13] == "/usr/bin/condor_dagman" -08/02/23 15:44:57 Default node log file is: -08/02/23 15:44:57 DAG Lockfile will be written to simple.dag.lock -08/02/23 15:44:57 DAG Input file is simple.dag -08/02/23 15:44:57 Parsing 1 dagfiles -08/02/23 15:44:57 Parsing simple.dag ... -08/02/23 15:44:57 Adjusting edges -08/02/23 15:44:57 Dag contains 1 total jobs -08/02/23 15:44:57 Bootstrapping... -08/02/23 15:44:57 Number of pre-completed nodes: 0 -08/02/23 15:44:57 MultiLogFiles: truncating log file /home/mats.rynge/dagman-1/./simple.dag.nodes.log -08/02/23 15:44:57 DAG status: 0 (DAG_STATUS_OK) -08/02/23 15:44:57 Of 1 nodes total: -08/02/23 15:44:57 Done Pre Queued Post Ready Un-Ready Failed Futile -08/02/23 15:44:57 === === === === === === === === -08/02/23 15:44:57 0 0 0 0 1 0 0 0 -08/02/23 15:44:57 0 job proc(s) currently held -08/02/23 15:44:57 Registering condor_event_timer... -08/02/23 15:44:58 Submitting HTCondor Node Simple job(s)... +08/02/24 15:44:57 ****************************************************** +08/02/24 15:44:57 ** condor_scheduniv_exec.271100.0 (CONDOR_DAGMAN) STARTING UP +08/02/24 15:44:57 ** /usr/bin/condor_dagman +08/02/24 15:44:57 ** SubsystemInfo: name=DAGMAN type=DAGMAN(9) class=CLIENT(2) +08/02/24 15:44:57 ** Configuration: subsystem:DAGMAN local: class:CLIENT +08/02/24 15:44:57 ** $CondorVersion: 23.9.0 2024-07-02 BuildID: 742617 PackageID: 23.9.0-0.742617 GitSHA: 5acb07ea RC $ +08/02/24 15:44:57 ** $CondorPlatform: x86_64_AlmaLinux9 $ +08/02/24 15:44:57 ** PID = 2340103 +08/02/24 15:44:57 ** Log last touched time unavailable (No such file or directory) +08/02/24 15:44:57 ****************************************************** +08/02/24 15:44:57 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS +08/02/24 15:44:57 DaemonCore: No command port requested.
+08/02/24 15:44:57 DAGMAN_USE_STRICT setting: 1 +08/02/24 15:44:57 DAGMAN_VERBOSITY setting: 3 +08/02/24 15:44:57 DAGMAN_DEBUG_CACHE_SIZE setting: 5242880 +08/02/24 15:44:57 DAGMAN_DEBUG_CACHE_ENABLE setting: False +08/02/24 15:44:57 DAGMAN_SUBMIT_DELAY setting: 0 +08/02/24 15:44:57 DAGMAN_MAX_SUBMIT_ATTEMPTS setting: 6 +08/02/24 15:44:57 DAGMAN_STARTUP_CYCLE_DETECT setting: False +08/02/24 15:44:57 DAGMAN_MAX_SUBMITS_PER_INTERVAL setting: 100 +08/02/24 15:44:57 DAGMAN_AGGRESSIVE_SUBMIT setting: False +08/02/24 15:44:57 DAGMAN_USER_LOG_SCAN_INTERVAL setting: 5 +08/02/24 15:44:57 DAGMAN_QUEUE_UPDATE_INTERVAL setting: 300 +08/02/24 15:44:57 DAGMAN_DEFAULT_PRIORITY setting: 0 +08/02/24 15:44:57 DAGMAN_SUPPRESS_NOTIFICATION setting: True +08/02/24 15:44:57 allow_events (DAGMAN_ALLOW_EVENTS) setting: 114 +08/02/24 15:44:57 DAGMAN_RETRY_SUBMIT_FIRST setting: True +08/02/24 15:44:57 DAGMAN_RETRY_NODE_FIRST setting: False +08/02/24 15:44:57 DAGMAN_MAX_JOBS_IDLE setting: 1000 +08/02/24 15:44:57 DAGMAN_MAX_JOBS_SUBMITTED setting: 0 +08/02/24 15:44:57 DAGMAN_MAX_PRE_SCRIPTS setting: 20 +08/02/24 15:44:57 DAGMAN_MAX_POST_SCRIPTS setting: 20 +08/02/24 15:44:57 DAGMAN_MAX_HOLD_SCRIPTS setting: 20 +08/02/24 15:44:57 DAGMAN_MUNGE_NODE_NAMES setting: True +08/02/24 15:44:57 DAGMAN_PROHIBIT_MULTI_JOBS setting: False +08/02/24 15:44:57 DAGMAN_SUBMIT_DEPTH_FIRST setting: False +08/02/24 15:44:57 DAGMAN_ALWAYS_RUN_POST setting: False +08/02/24 15:44:57 DAGMAN_CONDOR_SUBMIT_EXE setting: /usr/bin/condor_submit +08/02/24 15:44:57 DAGMAN_USE_DIRECT_SUBMIT setting: True +08/02/24 15:44:57 DAGMAN_DEFAULT_APPEND_VARS setting: False +08/02/24 15:44:57 DAGMAN_ABORT_DUPLICATES setting: True +08/02/24 15:44:57 DAGMAN_ABORT_ON_SCARY_SUBMIT setting: True +08/02/24 15:44:57 DAGMAN_PENDING_REPORT_INTERVAL setting: 600 +08/02/24 15:44:57 DAGMAN_AUTO_RESCUE setting: True +08/02/24 15:44:57 DAGMAN_MAX_RESCUE_NUM setting: 100 +08/02/24 15:44:57 DAGMAN_WRITE_PARTIAL_RESCUE setting: True +08/02/24 15:44:57 DAGMAN_DEFAULT_NODE_LOG setting: @(DAG_DIR)/@(DAG_FILE).nodes.log +08/02/24 15:44:57 DAGMAN_GENERATE_SUBDAG_SUBMITS setting: True +08/02/24 15:44:57 DAGMAN_MAX_JOB_HOLDS setting: 100 +08/02/24 15:44:57 DAGMAN_HOLD_CLAIM_TIME setting: 20 +08/02/24 15:44:57 ALL_DEBUG setting: +08/02/24 15:44:57 DAGMAN_DEBUG setting: +08/02/24 15:44:57 DAGMAN_SUPPRESS_JOB_LOGS setting: False +08/02/24 15:44:57 DAGMAN_REMOVE_NODE_JOBS setting: True +08/02/24 15:44:57 DAGMAN will adjust edges after parsing +08/02/24 15:44:57 argv[0] == "condor_scheduniv_exec.271100.0" +08/02/24 15:44:57 argv[1] == "-Lockfile" +08/02/24 15:44:57 argv[2] == "simple.dag.lock" +08/02/24 15:44:57 argv[3] == "-AutoRescue" +08/02/24 15:44:57 argv[4] == "1" +08/02/24 15:44:57 argv[5] == "-DoRescueFrom" +08/02/24 15:44:57 argv[6] == "0" +08/02/24 15:44:57 argv[7] == "-Dag" +08/02/24 15:44:57 argv[8] == "simple.dag" +08/02/24 15:44:57 argv[9] == "-Suppress_notification" +08/02/24 15:44:57 argv[10] == "-CsdVersion" +08/02/24 15:44:57 argv[11] == "$CondorVersion: 23.9.0 2024-07-02 BuildID: 742617 PackageID: 23.9.0-0.742617 GitSHA: 5acb07ea RC $" +08/02/24 15:44:57 argv[12] == "-Dagman" +08/02/24 15:44:57 argv[13] == "/usr/bin/condor_dagman" +08/02/24 15:44:57 Default node log file is: +08/02/24 15:44:57 DAG Lockfile will be written to simple.dag.lock +08/02/24 15:44:57 DAG Input file is simple.dag +08/02/24 15:44:57 Parsing 1 dagfiles +08/02/24 15:44:57 Parsing simple.dag ... +08/02/24 15:44:57 Adjusting edges +08/02/24 15:44:57 Dag contains 1 total jobs +08/02/24 15:44:57 Bootstrapping...
+08/02/24 15:44:57 Number of pre-completed nodes: 0 +08/02/24 15:44:57 MultiLogFiles: truncating log file /home/mats.rynge/dagman-1/./simple.dag.nodes.log +08/02/24 15:44:57 DAG status: 0 (DAG_STATUS_OK) +08/02/24 15:44:57 Of 1 nodes total: +08/02/24 15:44:57 Done Pre Queued Post Ready Un-Ready Failed Futile +08/02/24 15:44:57 === === === === === === === === +08/02/24 15:44:57 0 0 0 0 1 0 0 0 +08/02/24 15:44:57 0 job proc(s) currently held +08/02/24 15:44:57 Registering condor_event_timer... +08/02/24 15:44:58 Submitting HTCondor Node Simple job(s)... ``` **Here's where the job is submitted** ```file -08/02/23 15:44:58 Submitting HTCondor Node Simple job(s)... -08/02/23 15:44:58 Submitting node Simple from file job.sub using direct job submission -08/02/23 15:44:58 assigned HTCondor ID (271101.0.0) -08/02/23 15:44:58 Just submitted 1 job this cycle... +08/02/24 15:44:58 Submitting HTCondor Node Simple job(s)... +08/02/24 15:44:58 Submitting node Simple from file job.sub using direct job submission +08/02/24 15:44:58 assigned HTCondor ID (271101.0.0) +08/02/24 15:44:58 Just submitted 1 job this cycle... ``` **Here's where DAGMan noticed that the job is running** ```file -08/02/23 15:45:18 Event: ULOG_EXECUTE for HTCondor Node Simple (271101.0.0) {08/02/23 15:45:14} -08/02/23 15:45:18 Number of idle job procs: 0 +08/02/24 15:45:18 Event: ULOG_EXECUTE for HTCondor Node Simple (271101.0.0) {08/02/24 15:45:14} +08/02/24 15:45:18 Number of idle job procs: 0 ``` **Here's where DAGMan noticed that the job finished.** ```file -08/02/23 15:45:23 Event: ULOG_JOB_TERMINATED for HTCondor Node Simple (271101.0.0) {08/02/23 15:45:19} -08/02/23 15:45:23 Number of idle job procs: 0 -08/02/23 15:45:23 Node Simple job proc (271101.0.0) completed successfully. -08/02/23 15:45:23 Node Simple job completed -08/02/23 15:45:23 DAG status: 0 (DAG_STATUS_OK) -08/02/23 15:45:23 Of 1 nodes total: -08/02/23 15:45:23 Done Pre Queued Post Ready Un-Ready Failed Futile -08/02/23 15:45:23 === === === === === === === === -08/02/23 15:45:23 1 0 0 0 0 0 0 0 +08/02/24 15:45:23 Event: ULOG_JOB_TERMINATED for HTCondor Node Simple (271101.0.0) {08/02/24 15:45:19} +08/02/24 15:45:23 Number of idle job procs: 0 +08/02/24 15:45:23 Node Simple job proc (271101.0.0) completed successfully. +08/02/24 15:45:23 Node Simple job completed +08/02/24 15:45:23 DAG status: 0 (DAG_STATUS_OK) +08/02/24 15:45:23 Of 1 nodes total: +08/02/24 15:45:23 Done Pre Queued Post Ready Un-Ready Failed Futile +08/02/24 15:45:23 === === === === === === === === +08/02/24 15:45:23 1 0 0 0 0 0 0 0 ``` **Here's where DAGMan noticed that all the work is done.** ```file -08/02/23 15:45:23 All jobs Completed! -08/02/23 15:45:23 Note: 0 total job deferrals because of -MaxJobs limit (0) -08/02/23 15:45:23 Note: 0 total job deferrals because of -MaxIdle limit (1000) -08/02/23 15:45:23 Note: 0 total job deferrals because of node category throttles -08/02/23 15:45:23 Note: 0 total PRE script deferrals because of -MaxPre limit (20) or DEFER -08/02/23 15:45:23 Note: 0 total POST script deferrals because of -MaxPost limit (20) or DEFER -08/02/23 15:45:23 Note: 0 total HOLD script deferrals because of -MaxHold limit (20) or DEFER +08/02/24 15:45:23 All jobs Completed! 
+08/02/24 15:45:23 Note: 0 total job deferrals because of -MaxJobs limit (0) +08/02/24 15:45:23 Note: 0 total job deferrals because of -MaxIdle limit (1000) +08/02/24 15:45:23 Note: 0 total job deferrals because of node category throttles +08/02/24 15:45:23 Note: 0 total PRE script deferrals because of -MaxPre limit (20) or DEFER +08/02/24 15:45:23 Note: 0 total POST script deferrals because of -MaxPost limit (20) or DEFER +08/02/24 15:45:23 Note: 0 total HOLD script deferrals because of -MaxHold limit (20) or DEFER ``` Now verify your results: ``` console username@ap40 $ cat simple.log -000 (271101.000.000) 2023-08-02 15:44:58 Job submitted from host: <128.105.68.92:9618?addrs=128.105.68.92-9618+[2607-f388-2200-100-eaeb-d3ff-fe40-111c]-9618&alias=ap40.uw.osg-htc.org&noUDP&sock=schedd_35391_dc5c> +000 (271101.000.000) 2024-08-02 15:44:58 Job submitted from host: <128.105.68.92:9618?addrs=128.105.68.92-9618+[2607-f388-2200-100-eaeb-d3ff-fe40-111c]-9618&alias=ap40.uw.osg-htc.org&noUDP&sock=schedd_35391_dc5c> DAG Node: Simple ... -040 (271101.000.000) 2023-08-02 15:45:13 Started transferring input files +040 (271101.000.000) 2024-08-02 15:45:13 Started transferring input files Transferring to host: <10.136.81.233:37425?CCBID=128.104.103.162:9618%3faddrs%3d128.104.103.162-9618%26alias%3dospool-ccb.osg.chtc.io%26noUDP%26sock%3dcollector4#23067238%20192.170.231.9:9618%3faddrs%3d192.170.231.9-9618+[fd85-ee78-d8a6-8607--1-73b6]-9618%26alias%3dospool-ccb.osgprod.tempest.chtc.io%26noUDP%26sock%3dcollector10#1512850&PrivNet=comp-cc-0463.gwave.ics.psu.edu&addrs=10.136.81.233-37425&alias=comp-cc-0463.gwave.ics.psu.edu&noUDP> ... -040 (271101.000.000) 2023-08-02 15:45:13 Finished transferring input files +040 (271101.000.000) 2024-08-02 15:45:13 Finished transferring input files ... -021 (271101.000.000) 2023-08-02 15:45:14 Warning from starter on slot1_4@glidein_2635188_104012775@comp-cc-0463.gwave.ics.psu.edu: +021 (271101.000.000) 2024-08-02 15:45:14 Warning from starter on slot1_4@glidein_2635188_104012775@comp-cc-0463.gwave.ics.psu.edu: PREPARE_JOB (prepare-hook) succeeded (reported status 000): Using default Singularity image /cvmfs/singularity.opensciencegrid.org/htc/rocky:8-cuda-11.0.3 ... -001 (271101.000.000) 2023-08-02 15:45:14 Job executing on host: <10.136.81.233:39645?CCBID=128.104.103.162:9618%3faddrs%3d128.104.103.162-9618%26alias%3dospool-ccb.osg.chtc.io%26noUDP%26sock%3dcollector10#1506459%20192.170.231.9:9618%3faddrs%3d192.170.231.9-9618+[fd85-ee78-d8a6-8607--1-73b4]-9618%26alias%3dospool-ccb.osgprod.tempest.chtc.io%26noUDP%26sock%3dcollector10#1506644&PrivNet=comp-cc-0463.gwave.ics.psu.edu&addrs=10.136.81.233-39645&alias=comp-cc-0463.gwave.ics.psu.edu&noUDP> +001 (271101.000.000) 2024-08-02 15:45:14 Job executing on host: <10.136.81.233:39645?CCBID=128.104.103.162:9618%3faddrs%3d128.104.103.162-9618%26alias%3dospool-ccb.osg.chtc.io%26noUDP%26sock%3dcollector10#1506459%20192.170.231.9:9618%3faddrs%3d192.170.231.9-9618+[fd85-ee78-d8a6-8607--1-73b4]-9618%26alias%3dospool-ccb.osgprod.tempest.chtc.io%26noUDP%26sock%3dcollector10#1506644&PrivNet=comp-cc-0463.gwave.ics.psu.edu&addrs=10.136.81.233-39645&alias=comp-cc-0463.gwave.ics.psu.edu&noUDP> SlotName: slot1_4@comp-cc-0463.gwave.ics.psu.edu CondorScratchDir = "/localscratch/condor/execute/dir_2635172/glide_uZ6qXM/execute/dir_3252113" Cpus = 1 @@ -236,15 +236,15 @@ username@ap40 $ cat simple.log GLIDEIN_ResourceName = "PSU-LIGO" Memory = 1024 ... 
-006 (271101.000.000) 2023-08-02 15:45:19 Image size of job updated: 2296464 +006 (271101.000.000) 2024-08-02 15:45:19 Image size of job updated: 2296464 47 - MemoryUsage of job (MB) 47684 - ResidentSetSize of job (KB) ... -040 (271101.000.000) 2023-08-02 15:45:19 Started transferring output files +040 (271101.000.000) 2024-08-02 15:45:19 Started transferring output files ... -040 (271101.000.000) 2023-08-02 15:45:19 Finished transferring output files +040 (271101.000.000) 2024-08-02 15:45:19 Finished transferring output files ... -005 (271101.000.000) 2023-08-02 15:45:19 Job terminated. +005 (271101.000.000) 2024-08-02 15:45:19 Job terminated. (1) Normal termination (return value 0) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage @@ -259,7 +259,7 @@ username@ap40 $ cat simple.log Disk (KB) : 149 1048576 2699079 Memory (MB) : 47 1024 1024 - Job terminated of its own accord at 2023-08-02T20:45:19Z with exit-code 0. + Job terminated of its own accord at 2024-08-02T20:45:19Z with exit-code 0. ... ``` @@ -287,7 +287,7 @@ remove_kill_sig = SIGUSR1 # is killed (e.g., during a reboot). on_exit_remove = (ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2)) copy_to_spool = False -arguments = "-p 0 -f -l . -Lockfile simple.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag simple.dag -Suppress_notification -CsdVersion $CondorVersion:' '10.7.0' '2023-07-10' 'BuildID:' '659788' 'PackageID:' '10.7.0-0.659788' 'RC' '$ -Dagman /usr/bin/condor_dagman" +arguments = "-p 0 -f -l . -Lockfile simple.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag simple.dag -Suppress_notification -CsdVersion $CondorVersion:' '23.9.0' '2024-07-02' 'BuildID:' '742617' 'PackageID:' '23.9.0-0.742617' 'GitSHA:' '5acb07ea' 'RC' '$ -Dagman /usr/bin/condor_dagman" environment = "_CONDOR_DAGMAN_LOG=simple.dag.dagman.out _CONDOR_MAX_DAGMAN_LOG=0 _CONDOR_SCHEDD_ADDRESS_FILE=/var/lib/condor/spool/.schedd_address _CONDOR_SCHEDD_DAEMON_AD_FILE=/var/lib/condor/spool/.schedd_classad" queue ``` diff --git a/docs/materials/workflows/part1-ex3-complex-dag.md b/docs/materials/workflows/part1-ex3-complex-dag.md index 08df972..fb8a310 100644 --- a/docs/materials/workflows/part1-ex3-complex-dag.md +++ b/docs/materials/workflows/part1-ex3-complex-dag.md @@ -146,7 +146,7 @@ Let’s follow the progress of the whole DAG: Total: 1 jobs; 1 running - Updated at 2021-07-28 13:52:57 + Updated at 2024-07-28 13:52:57 **DAGMan has submitted the goatbrot jobs, but they haven't started running yet** @@ -158,7 +158,7 @@ Let’s follow the progress of the whole DAG: Total: 5 jobs; 4 idle, 1 running - Updated at 2021-07-28 13:53:53 + Updated at 2024-07-28 13:53:53 **They're running** @@ -170,7 +170,7 @@ Let’s follow the progress of the whole DAG: Total: 5 jobs; 5 running - Updated at 2021-07-28 13:54:33 + Updated at 2024-07-28 13:54:33 **They finished, but DAGMan hasn't noticed yet. It only checks periodically:** @@ -182,7 +182,7 @@ Let’s follow the progress of the whole DAG: Total: 5 jobs; 4 completed, 1 running - Updated at 2021-07-28 13:55:13 + Updated at 2024-07-28 13:55:13 Eventually, you'll see the montage job submitted, then running, then leave the queue, and then DAGMan will leave the queue.
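For orientation while you watch, the shape of this DAG is four goatbrot nodes that must all finish before the single montage node runs. A sketch of the DAG file (the submit-file names are assumed; the node names `g1`–`g4` and `montage` match the log excerpts that follow):

``` file
JOB g1 goatbrot1.sub
JOB g2 goatbrot2.sub
JOB g3 goatbrot3.sub
JOB g4 goatbrot4.sub
JOB montage montage.sub
PARENT g1 g2 g3 g4 CHILD montage
```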
diff --git a/docs/materials/workflows/part1-ex4-failed-dag.md b/docs/materials/workflows/part1-ex4-failed-dag.md index 0011d12..d2e2c04 100644 --- a/docs/materials/workflows/part1-ex4-failed-dag.md +++ b/docs/materials/workflows/part1-ex4-failed-dag.md @@ -33,7 +33,7 @@ queue Submit the DAG again: ``` console -username@learn $ condor_submit_dag goatbrot.dag +username@ap40 $ condor_submit_dag goatbrot.dag ----------------------------------------------------------------------- File for submitting this DAG to Condor : goatbrot.dag.condor.sub Log of DAGMan debugging messages : goatbrot.dag.dagman.out @@ -49,50 +49,50 @@ Submitting job(s). Use watch to watch the jobs until they finish. In a separate window, use `tail --lines=500 -f goatbrot.dag.dagman.out` to watch what DAGMan does. ``` console -06/22/12 17:57:41 Setting maximum accepts per cycle 8. -06/22/12 17:57:41 ****************************************************** -06/22/12 17:57:41 ** condor_scheduniv_exec.77.0 (CONDOR_DAGMAN) STARTING UP -06/22/12 17:57:41 ** /usr/bin/condor_dagman -06/22/12 17:57:41 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1) -06/22/12 17:57:41 ** Configuration: subsystem:DAGMAN local: class:DAEMON -06/22/12 17:57:41 ** $CondorVersion: 7.7.6 Apr 16 2012 BuildID: 34175 PRE-RELEASE-UWCS $ -06/22/12 17:57:41 ** $CondorPlatform: x86_64_rhap_5.7 $ -06/22/12 17:57:41 ** PID = 26867 -06/22/12 17:57:41 ** Log last touched time unavailable (No such file or directory) -06/22/12 17:57:41 ****************************************************** -06/22/12 17:57:41 Using config source: /etc/condor/condor_config -06/22/12 17:57:41 Using local config sources: -06/22/12 17:57:41 /etc/condor/config.d/00-chtc-global.conf -06/22/12 17:57:41 /etc/condor/config.d/01-chtc-submit.conf -06/22/12 17:57:41 /etc/condor/config.d/02-chtc-flocking.conf -06/22/12 17:57:41 /etc/condor/config.d/03-chtc-jobrouter.conf -06/22/12 17:57:41 /etc/condor/config.d/04-chtc-blacklist.conf -06/22/12 17:57:41 /etc/condor/config.d/99-osg-ss-group.conf -06/22/12 17:57:41 /etc/condor/config.d/99-roy-extras.conf -06/22/12 17:57:41 /etc/condor/condor_config.local +06/22/24 17:57:41 Setting maximum accepts per cycle 8. 
+06/22/24 17:57:41 ****************************************************** +06/22/24 17:57:41 ** condor_scheduniv_exec.77.0 (CONDOR_DAGMAN) STARTING UP +06/22/24 17:57:41 ** /usr/bin/condor_dagman +06/22/24 17:57:41 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1) +06/22/24 17:57:41 ** Configuration: subsystem:DAGMAN local: class:DAEMON +06/22/24 17:57:41 ** $CondorVersion: 23.9.0 2024-07-02 BuildID: 742617 PackageID: 23.9.0-0.742617 GitSHA: 5acb07ea RC $ +06/22/24 17:57:41 ** $CondorPlatform: x86_64_AlmaLinux9 $ +06/22/24 17:57:41 ** PID = 26867 +06/22/24 17:57:41 ** Log last touched time unavailable (No such file or directory) +06/22/24 17:57:41 ****************************************************** +06/22/24 17:57:41 Using config source: /etc/condor/condor_config +06/22/24 17:57:41 Using local config sources: +06/22/24 17:57:41 /etc/condor/config.d/00-chtc-global.conf +06/22/24 17:57:41 /etc/condor/config.d/01-chtc-submit.conf +06/22/24 17:57:41 /etc/condor/config.d/02-chtc-flocking.conf +06/22/24 17:57:41 /etc/condor/config.d/03-chtc-jobrouter.conf +06/22/24 17:57:41 /etc/condor/config.d/04-chtc-blacklist.conf +06/22/24 17:57:41 /etc/condor/config.d/99-osg-ss-group.conf +06/22/24 17:57:41 /etc/condor/config.d/99-roy-extras.conf +06/22/24 17:57:41 /etc/condor/condor_config.local ``` Below is where DAGMan realizes that the montage node failed: ```console -06/22/12 18:08:42 Event: ULOG_EXECUTE for Condor Node montage (82.0.0) -06/22/12 18:08:42 Number of idle job procs: 0 -06/22/12 18:08:42 Event: ULOG_IMAGE_SIZE for Condor Node montage (82.0.0) -06/22/12 18:08:42 Event: ULOG_JOB_TERMINATED for Condor Node montage (82.0.0) -06/22/12 18:08:42 Node montage job proc (82.0.0) failed with status 1. -06/22/12 18:08:42 Number of idle job procs: 0 -06/22/12 18:08:42 Of 5 nodes total: -06/22/12 18:08:42 Done Pre Queued Post Ready Un-Ready Failed -06/22/12 18:08:42 === === === === === === === -06/22/12 18:08:42 4 0 0 0 0 0 1 -06/22/12 18:08:42 0 job proc(s) currently held -06/22/12 18:08:42 Aborting DAG... -06/22/12 18:08:42 Writing Rescue DAG to goatbrot.dag.rescue001... -06/22/12 18:08:42 Note: 0 total job deferrals because of -MaxJobs limit (0) -06/22/12 18:08:42 Note: 0 total job deferrals because of -MaxIdle limit (0) -06/22/12 18:08:42 Note: 0 total job deferrals because of node category throttles -06/22/12 18:08:42 Note: 0 total PRE script deferrals because of -MaxPre limit (0) -06/22/12 18:08:42 Note: 0 total POST script deferrals because of -MaxPost limit (0) -06/22/12 18:08:42 **** condor_scheduniv_exec.77.0 (condor_DAGMAN) pid 26867 EXITING WITH STATUS 1 +06/22/24 18:08:42 Event: ULOG_EXECUTE for Condor Node montage (82.0.0) +06/22/24 18:08:42 Number of idle job procs: 0 +06/22/24 18:08:42 Event: ULOG_IMAGE_SIZE for Condor Node montage (82.0.0) +06/22/24 18:08:42 Event: ULOG_JOB_TERMINATED for Condor Node montage (82.0.0) +06/22/24 18:08:42 Node montage job proc (82.0.0) failed with status 1. +06/22/24 18:08:42 Number of idle job procs: 0 +06/22/24 18:08:42 Of 5 nodes total: +06/22/24 18:08:42 Done Pre Queued Post Ready Un-Ready Failed +06/22/24 18:08:42 === === === === === === === +06/22/24 18:08:42 4 0 0 0 0 0 1 +06/22/24 18:08:42 0 job proc(s) currently held +06/22/24 18:08:42 Aborting DAG... +06/22/24 18:08:42 Writing Rescue DAG to goatbrot.dag.rescue001... 
+06/22/24 18:08:42 Note: 0 total job deferrals because of -MaxJobs limit (0) +06/22/24 18:08:42 Note: 0 total job deferrals because of -MaxIdle limit (0) +06/22/24 18:08:42 Note: 0 total job deferrals because of node category throttles +06/22/24 18:08:42 Note: 0 total PRE script deferrals because of -MaxPre limit (0) +06/22/24 18:08:42 Note: 0 total POST script deferrals because of -MaxPost limit (0) +06/22/24 18:08:42 **** condor_scheduniv_exec.77.0 (condor_DAGMAN) pid 26867 EXITING WITH STATUS 1 ``` DAGMan notices that one of the jobs failed because its exit code was non-zero. DAGMan ran as much of the DAG as possible and logged enough information to continue the run when the situation is resolved. Do you see the part where it wrote the rescue DAG? @@ -100,10 +100,10 @@ DAGMan notices that one of the jobs failed because its exit code was non-zero. D Look at the rescue DAG file. It's called a partial DAG because it indicates what part of the DAG has already been completed. ``` console -username@learn $ cat goatbrot.dag.rescue001 +username@ap40 $ cat goatbrot.dag.rescue001 # Rescue DAG file, created after running # the goatbrot.dag DAG file -# Created 6/22/2012 23:08:42 UTC +# Created 6/22/2024 23:08:42 UTC # Rescue DAG version: 2.0.1 (partial) # # Total number of Nodes: 5 @@ -135,7 +135,7 @@ queue Now we can re-submit our original DAG and DAGMan will pick up where it left off. It will automatically notice the rescue DAG. If you didn't fix the problem, DAGMan would generate another rescue DAG. ``` console -username@learn $ condor_submit_dag goatbrot.dag +username@ap40 $ condor_submit_dag goatbrot.dag Running rescue DAG 1 ----------------------------------------------------------------------- File for submitting this DAG to Condor : goatbrot.dag.condor.sub @@ -148,47 +148,47 @@ Submitting job(s). 1 job(s) submitted to cluster 83. ----------------------------------------------------------------------- -username@learn $ tail -f goatbrot.dag.dagman.out -06/23/12 11:30:53 ****************************************************** -06/23/12 11:30:53 ** condor_scheduniv_exec.83.0 (CONDOR_DAGMAN) STARTING UP -06/23/12 11:30:53 ** /usr/bin/condor_dagman -06/23/12 11:30:53 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1) -06/23/12 11:30:53 ** Configuration: subsystem:DAGMAN local: class:DAEMON -06/23/12 11:30:53 ** $CondorVersion: 7.7.6 Apr 16 2012 BuildID: 34175 PRE-RELEASE-UWCS $ -06/23/12 11:30:53 ** $CondorPlatform: x86_64_rhap_5.7 $ -06/23/12 11:30:53 ** PID = 28576 -06/23/12 11:30:53 ** Log last touched 6/22 18:08:42 -06/23/12 11:30:53 ****************************************************** -06/23/12 11:30:53 Using config source: /etc/condor/condor_config +username@ap40 $ tail -f goatbrot.dag.dagman.out +06/23/24 11:30:53 ****************************************************** +06/23/24 11:30:53 ** condor_scheduniv_exec.83.0 (CONDOR_DAGMAN) STARTING UP +06/23/24 11:30:53 ** /usr/bin/condor_dagman +06/23/24 11:30:53 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1) +06/23/24 11:30:53 ** Configuration: subsystem:DAGMAN local: class:DAEMON +06/23/24 11:30:53 ** $CondorVersion: 23.9.0 2024-07-02 BuildID: 742617 PackageID: 23.9.0-0.742617 GitSHA: 5acb07ea RC $ +06/23/24 11:30:53 ** $CondorPlatform: x86_64_AlmaLinux9 $ +06/23/24 11:30:53 ** PID = 28576 +06/23/24 11:30:53 ** Log last touched 6/22 18:08:42 +06/23/24 11:30:53 ****************************************************** +06/23/24 11:30:53 Using config source: /etc/condor/condor_config ... 
``` **Here is where DAGMan notices that there is a rescue DAG** ```hl_lines="3" -06/23/12 11:30:53 Parsing 1 dagfiles -06/23/12 11:30:53 Parsing goatbrot.dag ... -06/23/12 11:30:53 Found rescue DAG number 1; running goatbrot.dag.rescue001 in combination with normal DAG file -06/23/12 11:30:53 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -06/23/12 11:30:53 USING RESCUE DAG goatbrot.dag.rescue001 -06/23/12 11:30:53 Dag contains 5 total jobs +06/23/24 11:30:53 Parsing 1 dagfiles +06/23/24 11:30:53 Parsing goatbrot.dag ... +06/23/24 11:30:53 Found rescue DAG number 1; running goatbrot.dag.rescue001 in combination with normal DAG file +06/23/24 11:30:53 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +06/23/24 11:30:53 USING RESCUE DAG goatbrot.dag.rescue001 +06/23/24 11:30:53 Dag contains 5 total jobs ``` **Shortly thereafter it sees that four jobs have already finished.** ```console -06/23/12 11:31:05 Bootstrapping... -06/23/12 11:31:05 Number of pre-completed nodes: 4 -06/23/12 11:31:05 Registering condor_event_timer... -06/23/12 11:31:06 Sleeping for one second for log file consistency -06/23/12 11:31:07 MultiLogFiles: truncating log file /home/roy/condor/goatbrot/montage.log +06/23/24 11:31:05 Bootstrapping... +06/23/24 11:31:05 Number of pre-completed nodes: 4 +06/23/24 11:31:05 Registering condor_event_timer... +06/23/24 11:31:06 Sleeping for one second for log file consistency +06/23/24 11:31:07 MultiLogFiles: truncating log file /home/roy/condor/goatbrot/montage.log ``` **Here is where DAGMan resubmits the montage job and waits for it to complete.** ```console -06/23/12 11:31:07 Submitting Condor Node montage job(s)... -06/23/12 11:31:07 submitting: condor_submit +06/23/24 11:31:07 Submitting Condor Node montage job(s)... +06/23/24 11:31:07 submitting: condor_submit -a dag_node_name' '=' 'montage -a +DAGManJobId' '=' '83 -a DAGManJobId' '=' '83 @@ -197,48 +197,48 @@ username@learn $ tail -f goatbrot.dag.dagman.out -a FAILED_COUNT' '=' '0 -a +DAGParentNodeNames' '=' '"g1,g2,g3,g4" montage.sub -06/23/12 11:31:07 From submit: Submitting job(s). -06/23/12 11:31:07 From submit: 1 job(s) submitted to cluster 84. -06/23/12 11:31:07 assigned Condor ID (84.0.0) -06/23/12 11:31:07 Just submitted 1 job this cycle... +06/23/24 11:31:07 From submit: Submitting job(s). +06/23/24 11:31:07 From submit: 1 job(s) submitted to cluster 84. +06/23/24 11:31:07 assigned Condor ID (84.0.0) +06/23/24 11:31:07 Just submitted 1 job this cycle...
+06/23/24 11:31:07 Currently monitoring 1 Condor log file(s) +06/23/24 11:31:07 Event: ULOG_SUBMIT for Condor Node montage (84.0.0) +06/23/24 11:31:07 Number of idle job procs: 1 +06/23/24 11:31:07 Of 5 nodes total: +06/23/24 11:31:07 Done Pre Queued Post Ready Un-Ready Failed +06/23/24 11:31:07 === === === === === === === +06/23/24 11:31:07 4 0 1 0 0 0 0 +06/23/24 11:31:07 0 job proc(s) currently held +06/23/24 11:40:22 Currently monitoring 1 Condor log file(s) +06/23/24 11:40:22 Event: ULOG_EXECUTE for Condor Node montage (84.0.0) +06/23/24 11:40:22 Number of idle job procs: 0 +06/23/24 11:40:22 Event: ULOG_IMAGE_SIZE for Condor Node montage (84.0.0) +06/23/24 11:40:22 Event: ULOG_JOB_TERMINATED for Condor Node montage (84.0.0) ``` **This is where the montage finished.** ```console -06/23/12 11:40:22 Node montage job proc (84.0.0) completed successfully. -06/23/12 11:40:22 Node montage job completed -06/23/12 11:40:22 Number of idle job procs: 0 -06/23/12 11:40:22 Of 5 nodes total: -06/23/12 11:40:22 Done Pre Queued Post Ready Un-Ready Failed -06/23/12 11:40:22 === === === === === === === -06/23/12 11:40:22 5 0 0 0 0 0 0 -06/23/12 11:40:22 0 job proc(s) currently held +06/23/24 11:40:22 Node montage job proc (84.0.0) completed successfully. +06/23/24 11:40:22 Node montage job completed +06/23/24 11:40:22 Number of idle job procs: 0 +06/23/24 11:40:22 Of 5 nodes total: +06/23/24 11:40:22 Done Pre Queued Post Ready Un-Ready Failed +06/23/24 11:40:22 === === === === === === === +06/23/24 11:40:22 5 0 0 0 0 0 0 +06/23/24 11:40:22 0 job proc(s) currently held ``` **And here DAGMan decides that the work is all done.** ```console -06/23/12 11:40:22 All jobs Completed! -06/23/12 11:40:22 Note: 0 total job deferrals because of -MaxJobs limit (0) -06/23/12 11:40:22 Note: 0 total job deferrals because of -MaxIdle limit (0) -06/23/12 11:40:22 Note: 0 total job deferrals because of node category throttles -06/23/12 11:40:22 Note: 0 total PRE script deferrals because of -MaxPre limit (0) -06/23/12 11:40:22 Note: 0 total POST script deferrals because of -MaxPost limit (0) -06/23/12 11:40:22 **** condor_scheduniv_exec.83.0 (condor_DAGMAN) pid 28576 EXITING WITH STATUS 0 +06/23/24 11:40:22 All jobs Completed! +06/23/24 11:40:22 Note: 0 total job deferrals because of -MaxJobs limit (0) +06/23/24 11:40:22 Note: 0 total job deferrals because of -MaxIdle limit (0) +06/23/24 11:40:22 Note: 0 total job deferrals because of node category throttles +06/23/24 11:40:22 Note: 0 total PRE script deferrals because of -MaxPre limit (0) +06/23/24 11:40:22 Note: 0 total POST script deferrals because of -MaxPost limit (0) +06/23/24 11:40:22 **** condor_scheduniv_exec.83.0 (condor_DAGMAN) pid 28576 EXITING WITH STATUS 0 ``` Success! Now go ahead and clean up.
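Cleaning up mostly means deleting the bookkeeping files that DAGMan wrote next to the DAG. A sketch of one way to do it (the exact set of generated files can vary a little by HTCondor version):

``` console
username@ap40 $ rm goatbrot.dag.condor.sub goatbrot.dag.dagman.out goatbrot.dag.nodes.log \
                   goatbrot.dag.rescue001 montage.log
```

If you would rather keep the rescue DAG around for reference, leave `goatbrot.dag.rescue001` in place; DAGMan only consults it when you resubmit `goatbrot.dag`.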