Commit

Merge pull request #19 from rynge/rynge-data
Bring the data exercises to 2024
xamberl authored Jul 31, 2024
2 parents 975ff92 + becce65 commit 4a282f2
Showing 5 changed files with 22 additions and 20 deletions.
10 changes: 5 additions & 5 deletions docs/materials/data/part1-ex1-data-needs.md
@@ -35,14 +35,14 @@ genome information.
1. Copy the BLAST executables:

:::console
-user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/ncbi-blast-2.15.0+-x64-linux.tar.gz
-user@ap40 $ tar -xzvf ncbi-blast-2.15.0+-x64-linux.tar.gz
+user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2024/ncbi-blast-2.12.0+-x64-linux.tar.gz
+user@ap40 $ tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz

1. Download these files to your current directory:

:::console
-user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/pdbaa.tar.gz
-user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/mouse.fa
+user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2024/pdbaa.tar.gz
+user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2024/mouse.fa

1. Untar the `pdbaa` database:

@@ -55,7 +55,7 @@ Understanding BLAST
Remember that `blastx` is executed in a command like the following:

``` console
-user@ap40 $ ./ncbi-blast-2.15.0+/bin/blastx -db <DATABASE ROOTNAME> -query <INPUT FILE> -out <RESULTS FILE>
+user@ap40 $ ./ncbi-blast-2.12.0+/bin/blastx -db <DATABASE ROOTNAME> -query <INPUT FILE> -out <RESULTS FILE>
```

In the above, the `<INPUT FILE>` is the name of a file containing a number of genetic sequences (e.g. `mouse.fa`), and
2 changes: 1 addition & 1 deletion docs/materials/data/part1-ex2-file-transfer.md
@@ -53,7 +53,7 @@ log = test.log
request_memory =
request_disk =
request_cpus = 1
-requirements = (OSGVO_OS_STRING == "RHEL 8")
+requirements = (OSGVO_OS_STRING == "RHEL 9")
queue
```
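
The same one-line change can be applied to an older submit file with a small shell helper. What follows is a minimal sketch and not part of this commit: the function name `update_os_requirement` is hypothetical, and it assumes the requirement appears in the file exactly as `OSGVO_OS_STRING == "RHEL 8"`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): rewrite the OS requirement
# in an HTCondor submit file from RHEL 8 to RHEL 9.
# Assumes the requirement is written exactly as in the hunk above.
update_os_requirement() {
    submit_file="$1"
    # Keep a backup copy so the change can be reverted.
    cp "$submit_file" "$submit_file.bak"
    # Read from the backup and write the updated text to the original path.
    sed 's/OSGVO_OS_STRING == "RHEL 8"/OSGVO_OS_STRING == "RHEL 9"/' \
        "$submit_file.bak" > "$submit_file"
}
```

Calling `update_os_requirement test.sub` would leave the previous version in `test.sub.bak`; the file name `test.sub` is only an example.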

4 changes: 3 additions & 1 deletion docs/materials/data/part1-ex3-blast-split.md
@@ -93,7 +93,9 @@ Follow the below steps:

transfer_input_files = ... , $(inputfile)

-5. Update the memory and disk requests, since the new input file is larger and will also produce larger output.
+5. Remove or comment out `transfer_output_files` and `transfer_output_remaps`.
+
+6. Update the memory and disk requests, since the new input file is larger and will also produce larger output.
It may be best to overestimate to something like 1 GB for each.

### Modify the wrapper file
10 changes: 5 additions & 5 deletions docs/materials/data/part2-ex1-osdf-inputs.md
@@ -39,9 +39,9 @@ Place the Database in OSDF

OSDF provides a directory for you to store data which can be accessed through the caching servers.
First, you need to move your BLAST database (`pdbaa_files.tar.gz`) into this directory. For `ap40.uw.osg-htc.org`, the directory
-to use is `/ospool/PROTECTED/[USERNAME]/`
+to use is `/ospool/ap40/data/[USERNAME]/`

-As the `PROTECTED` directory name indicates, your files placed in the directory will only be accessible
+Note that files placed in the `/ospool/ap40/data/[USERNAME]/` directory will only be accessible
by your own jobs.
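
Moving the database into the OSDF directory can be sketched as a small shell function. This is an illustration only: the name `stage_to_osdf` is hypothetical, and the default destination assumes that `$USER` matches your username on the access point.

```shell
#!/bin/sh
# Hypothetical helper: move a file into an OSDF data directory and print
# the osdf:// URL form used in transfer_input_files.
# The default directory is an assumption based on the ap40 path above.
stage_to_osdf() {
    src="$1"
    osdf_dir="${2:-/ospool/ap40/data/$USER}"
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    if [ ! -d "$osdf_dir" ]; then
        echo "no such directory: $osdf_dir" >&2
        return 1
    fi
    mv "$src" "$osdf_dir/"
    # The directory is an absolute path, so the URL gets three slashes.
    echo "osdf://$osdf_dir/$(basename "$src")"
}
```

For example, `stage_to_osdf pdbaa_files.tar.gz` would move the tarball and print the URL to paste into the submit file.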

Modify the Submit File and Wrapper
@@ -51,10 +51,10 @@ You will have to modify the wrapper and submit file to use OSDF:

1. HTCondor knows how to do OSDF transfers, so you just have to provide the correct URL in
`transfer_input_files`. Note that there is no server name (hence the three slashes in `:///`); the path is
-instead based only on the namespace (`/ospool/PROTECTED` in this case):
+instead based only on the namespace (`/ospool/ap40` in this case):

::file
-transfer_input_files = blastx, $(inputfile), osdf:///ospool/PROTECTED/<USERNAME>/pdbaa_files.tar.gz
+transfer_input_files = blastx, $(inputfile), osdf:///ospool/ap40/data/[USERNAME]/pdbaa_files.tar.gz

1. Confirm that your queue statement is correct for the current directory. It should be something like:

@@ -86,7 +86,7 @@ Note: Keeping OSDF 'Clean'

Just as for any data directory, it is VERY important to remove old files from OSDF when you no longer need them,
especially so that you'll have plenty of space for such files in the future.
-For example, you would delete (`rm`) files from `/ospool/PROTECTED/` when you don't need them there
+For example, you would delete (`rm`) files from `/ospool/ap40/data/[USERNAME]/` when you don't need them there
anymore, but only after all jobs have finished.
The next time you use OSDF after the school, remember to first check for old files that you can delete.
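
One way to check for old files is to list everything in the directory that has not been modified for some number of days. The function below is a minimal sketch: the name `list_old_files` and the 30-day default are illustrative choices, not something the exercise prescribes.

```shell
#!/bin/sh
# Hypothetical helper: print regular files under a directory whose last
# modification is more than N days old (default 30).
# It only lists candidates; nothing is deleted.
list_old_files() {
    dir="$1"
    days="${2:-30}"
    find "$dir" -type f -mtime +"$days" -print
}
```

Running `list_old_files /ospool/ap40/data/[USERNAME]/ 30` (with your own username substituted) gives a list to review. Delete with `rm` only after confirming that no running or queued job still needs the file.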

16 changes: 8 additions & 8 deletions docs/materials/data/part2-ex2-osdf-outputs.md
@@ -24,7 +24,7 @@ To get the exercise set up:
directory first):

:::console
-user@ap40 $ cd /ospool/PROTECTED/[USERNAME]/
+user@ap40 $ cd /ospool/ap40/data/[USERNAME]/
user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/ducks.mov
user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/teaching.mov
user@ap40 $ wget http://proxy.chtc.wisc.edu/SQUID/osg-school-2023/test_open_terminal.mov
@@ -84,6 +84,8 @@ An example of that script is below:
Ultimately we'll want to submit several jobs (one for each `.mov` file), but to start with, we'll run one job to make
sure that everything works.

+Remember to `chmod +x run_ffmpeg.sh` to make the script executable.
+
Submit File
-----------

@@ -95,16 +97,16 @@ Create a submit file for this job, based on other submit files from the school.

1. Add the same requirements as the previous exercise:

-requirements = (OSGVO_OS_STRING == "RHEL 8")
+requirements = (OSGVO_OS_STRING == "RHEL 9")

1. We need to transfer the `ffmpeg` program that we downloaded above, and the movie from OSDF:

-transfer_input_files = ffmpeg, osdf:///ospool/PROTECTED/[USERNAME]/test_open_terminal.mov
+transfer_input_files = ffmpeg, osdf:///ospool/ap40/data/[USERNAME]/test_open_terminal.mov

1. Transfer outputs via OSDF. This requires a transfer remap:

transfer_output_files = test_open_terminal.mp4
-transfer_output_remaps = "test_open_terminal.mp4 = osdf:///ospool/PROTECTED/[USERNAME]/test_open_terminal.mp4"
+transfer_output_remaps = "test_open_terminal.mp4 = osdf:///ospool/ap40/data/[USERNAME]/test_open_terminal.mp4"


Initial Job
@@ -147,8 +149,6 @@ The final script should look like this:
./ffmpeg -i $1 -b:v 400k -s 640x360 $2
```

-Note that we use the input file name multiple times in our script, so we'll have to use `$1` multiple times as well.
-
### Modify your submit file

1. We now need to tell each job what arguments to use.
@@ -163,13 +163,13 @@ Note that we use the input file name multiple times in our script, so we'll have
1. Update the `transfer_input_files` to have `$(mov)`:

:::file
-transfer_input_files = ffmpeg, osdf:///ospool/PROTECTED/[USERNAME]/$(mov)
+transfer_input_files = ffmpeg, osdf:///ospool/ap40/data/[USERNAME]/$(mov)

1. Similarly, update the output/remap with `$(mov).mp4`:

:::file
transfer_output_files = $(mov).mp4
-transfer_output_remaps = "$(mov).mp4 = osdf:///ospool/PROTECTED/[USERNAME]/$(mov).mp4"
+transfer_output_remaps = "$(mov).mp4 = osdf:///ospool/ap40/data/[USERNAME]/$(mov).mp4"

1. To set these arguments, we will use the `queue .. from` syntax.
In our submit file, we can then change our queue statement to:
