From 22ae6b562e7d4934fe48cc13e1ff72283c511425 Mon Sep 17 00:00:00 2001
From: Mats Rynge
Date: Tue, 30 Jul 2024 16:32:52 -0700
Subject: [PATCH 1/2] Bring the data exercises to 2024

---
 docs/materials/data/part1-ex1-data-needs.md | 6 +++---
 docs/materials/data/part1-ex2-file-transfer.md | 2 +-
 docs/materials/data/part1-ex3-blast-split.md | 4 +++-
 docs/materials/data/part2-ex1-osdf-inputs.md | 10 +++++-----
 docs/materials/data/part2-ex2-osdf-outputs.md | 16 ++++++++--------
 5 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/docs/materials/data/part1-ex1-data-needs.md b/docs/materials/data/part1-ex1-data-needs.md
index 30266f7..6290da0 100644
--- a/docs/materials/data/part1-ex1-data-needs.md
+++ b/docs/materials/data/part1-ex1-data-needs.md
@@ -35,8 +35,8 @@ genome information.
 1. Copy the BLAST executables:
 :::console
- user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/ncbi-blast-2.15.0+-x64-linux.tar.gz
- user@ap40 $ tar -xzvf ncbi-blast-2.15.0+-x64-linux.tar.gz
+ user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/ncbi-blast-2.12.0+-x64-linux.tar.gz
+ user@ap40 $ tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz
 1. Download these files to your current directory:
@@ -55,7 +55,7 @@ Understanding BLAST
 Remember that `blastx` is executed in a command like the following:
 ``` console
-user@ap40 $ ./ncbi-blast-2.15.0+/bin/blastx -db -query -out
+user@ap40 $ ./ncbi-blast-2.12.0+/bin/blastx -db -query -out
 ```
 In the above, the `` is the name of a file containing a number of genetic sequences (e.g.
`mouse.fa`), and
diff --git a/docs/materials/data/part1-ex2-file-transfer.md b/docs/materials/data/part1-ex2-file-transfer.md
index f3133f7..40ca95d 100644
--- a/docs/materials/data/part1-ex2-file-transfer.md
+++ b/docs/materials/data/part1-ex2-file-transfer.md
@@ -53,7 +53,7 @@ log = test.log
 request_memory =
 request_disk =
 request_cpus = 1
-requirements = (OSGVO_OS_STRING == "RHEL 8")
+requirements = (OSGVO_OS_STRING == "RHEL 9")
 queue
 ```
diff --git a/docs/materials/data/part1-ex3-blast-split.md b/docs/materials/data/part1-ex3-blast-split.md
index bb7a724..8641974 100644
--- a/docs/materials/data/part1-ex3-blast-split.md
+++ b/docs/materials/data/part1-ex3-blast-split.md
@@ -93,7 +93,9 @@ Follow the below steps:
 transfer_input_files = ... , $(inputfile)
-5. Update the memory and disk requests, since the new input file is larger and will also produce larger output.
+5. Remove or comment out `transfer_output_files` and `transfer_output_remaps`.
+
+6. Update the memory and disk requests, since the new input file is larger and will also produce larger output.
 It may be best to overestimate to something like 1 GB for each.
 ### Modify the wrapper file
diff --git a/docs/materials/data/part2-ex1-osdf-inputs.md b/docs/materials/data/part2-ex1-osdf-inputs.md
index bc2e3e5..9300bf2 100644
--- a/docs/materials/data/part2-ex1-osdf-inputs.md
+++ b/docs/materials/data/part2-ex1-osdf-inputs.md
@@ -39,9 +39,9 @@ Place the Database in OSDF
 OSDF provides a directory for you to store data which can be accessed through the caching servers.
 First, you need to move your BLAST database (`pdbaa_files.tar.gz`) into this directory. For `ap40.uw.osg-htc.org`, the directory
-to use is `/ospool/PROTECTED/[USERNAME]/`
+to use is `/ospool/ap40/data/[USERNAME]/`
-As the `PROTECTED` directory name indicates, your files placed in the directory will only be accessible
+Note that files placed in the `/ospool/ap40/data/[USERNAME]/` directory will only be accessible
 by your own jobs.
 Modify the Submit File and Wrapper
@@ -51,10 +51,10 @@ You will have to modify the wrapper and submit file to use OSDF:
 1. HTCondor knows how to do OSDF transfers, so you just have to provide the correct URL in `transfer_input_files`. Note there is no servername (3 slashes in :///) and we instead
- is is just based on namespace (`/ospool/PROTECTED` in this case):
+ is is just based on namespace (`/ospool/ap40` in this case):
 ::file
- transfer_input_files = blastx, $(inputfile), osdf:///ospool/PROTECTED//pdbaa_files.tar.gz
+ transfer_input_files = blastx, $(inputfile), osdf:///ospool/ap40/data/[USERNAME]/pdbaa_files.tar.gz
 1. Confirm that your queue statement is correct for the current directory. It should be something like:
@@ -86,7 +86,7 @@ Note: Keeping OSDF 'Clean'
 Just as for any data directory, it is VERY important to remove old files from OSDF when you no longer need them, especially so that you'll have plenty of space for such files in the future.
-For example, you would delete (`rm`) files from `/ospool/PROTECTED/` on when you don't need them there
+For example, you would delete (`rm`) files from `/ospool/ap40/data/[USERNAME]/` on when you don't need them there
 anymore, but only after all jobs have finished. The next time you use OSDF after the school, remember to first check for old files that you can delete.
diff --git a/docs/materials/data/part2-ex2-osdf-outputs.md b/docs/materials/data/part2-ex2-osdf-outputs.md
index 750412f..60d6de7 100644
--- a/docs/materials/data/part2-ex2-osdf-outputs.md
+++ b/docs/materials/data/part2-ex2-osdf-outputs.md
@@ -24,7 +24,7 @@ To get the exercise set up:
 directory first):
 :::console
- user@ap40 $ cd /ospool/PROTECTED/[USERNAME]/
+ user@ap40 $ cd /ospool/ap40/data/[USERNAME]/
 user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/ducks.mov
 user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/teaching.mov
 user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/test_open_terminal.mov
@@ -84,6 +84,8 @@ An example of that script is below:
 Ultimately we'll want to submit several jobs (one for each `.mov` file), but to start with, we'll run one job to make sure that everything works.
+Remember to `chmod +x run_ffmpeg.sh` to make the script executable.
+
 Submit File
 -----------
@@ -95,16 +97,16 @@ Create a submit file for this job, based on other submit files from the school.
 1. Add the same requirements as the previous exercise:
- requirements = (OSGVO_OS_STRING == "RHEL 8")
+ requirements = (OSGVO_OS_STRING == "RHEL 9")
 1. We need to transfer the `ffmpeg` program that we downloaded above, and the movie from OSDF:
- transfer_input_files = ffmpeg, osdf:///ospool/PROTECTED/[USERNAME]/test_open_terminal.mov
+ transfer_input_files = ffmpeg, osdf:///ospool/ap40/data/[USERNAME]/test_open_terminal.mov
 1. Transfer outputs via OSDF.
This requires a transfer remap:
 transfer_output_files = test_open_terminal.mp4
- transfer_output_remaps = "test_open_terminal.mp4 = osdf:///ospool/PROTECTED/[USERNAME]/test_open_terminal.mp4"
+ transfer_output_remaps = "test_open_terminal.mp4 = osdf:///ospool/ap40/data/[USERNAME]/test_open_terminal.mp4"
 Initial Job
@@ -147,8 +149,6 @@ The final script should look like this:
 ./ffmpeg -i $1 -b:v 400k -s 640x360 $2
 ```
-Note that we use the input file name multiple times in our script, so we'll have to use `$1` multiple times as well.
-
 ### Modify your submit file
 1. We now need to tell each job what arguments to use.
@@ -163,13 +163,13 @@ Note that we use the input file name multiple times in our script, so we'll have
 1. Update the `transfer_input_files` to have `$(mov)`:
 :::file
- transfer_input_files = ffmpeg, osdf:///ospool/PROTECTED/[USERNAME]/$(mov)
+ transfer_input_files = ffmpeg, osdf:///ospool/ap40/data/[USERNAME]/$(mov)
 1. Similarly, update the output/remap with `$(mov).mp4`:
 :::file
 transfer_output_files = $(mov).mp4
- transfer_output_remaps = "$(mov).mp4 = osdf:///ospool/PROTECTED/[USERNAME]/$(mov).mp4"
+ transfer_output_remaps = "$(mov).mp4 = osdf:///ospool/ap40/data/[USERNAME]/$(mov).mp4"
 1. To set these arguments, we will use the `queue .. from` syntax.
In our submit file, we can then change our queue statement to:
From becce658d581146be13f348affcc38dab9d136b3 Mon Sep 17 00:00:00 2001
From: Amber Lim <59936462+xamberl@users.noreply.github.com>
Date: Wed, 31 Jul 2024 15:22:10 -0500
Subject: [PATCH 2/2] Update part1-ex1-data-needs.md

Update squid link from osg-school-2023 -> osg-school-2024
---
 docs/materials/data/part1-ex1-data-needs.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/docs/materials/data/part1-ex1-data-needs.md b/docs/materials/data/part1-ex1-data-needs.md
index 6290da0..14600fd 100644
--- a/docs/materials/data/part1-ex1-data-needs.md
+++ b/docs/materials/data/part1-ex1-data-needs.md
@@ -35,14 +35,14 @@ genome information.
 1. Copy the BLAST executables:
 :::console
- user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/ncbi-blast-2.12.0+-x64-linux.tar.gz
+ user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2024/ncbi-blast-2.12.0+-x64-linux.tar.gz
 user@ap40 $ tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz
 1. Download these files to your current directory:
 :::console
- user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/pdbaa.tar.gz
- user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/mouse.fa
+ user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2024/pdbaa.tar.gz
+ user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2024/mouse.fa
 1. Untar the `pdbaa` database: