diff --git a/docs/hello_nextflow/02_hello_channels.md b/docs/hello_nextflow/02_hello_channels.md index 59b18c1e..0cd0a902 100644 --- a/docs/hello_nextflow/02_hello_channels.md +++ b/docs/hello_nextflow/02_hello_channels.md @@ -23,18 +23,26 @@ It is equivalent to the script produced by working through Part 1 of this traini Just to make sure everything is working, run the script once before making any changes: ```bash -nextflow run hello-channels.nf --greet 'Hello Channels!' +nextflow run hello-channels.nf --greeting 'Hello Channels!' ``` ```console title="Output" N E X T F L O W ~ version 24.10.0 -Launching `hello-channels.nf` [silly_moriondo] DSL2 - revision: a0dfbc86fe +Launching `hello-channels.nf` [insane_lichterman] DSL2 - revision: c33d41f479 executor > local (1) -[03/b321a3] sayHello [100%] 1 of 1 ✔ +[86/9efa08] sayHello | 1 of 1 ✔ ``` +As previously, you will find the output file named `output.txt` in the `results` directory (specified by the `publishDir` directive). + +```console title="output.txt" linenums="1" +Hello Channels! +``` + +If that worked for you, you're ready to learn about channels. + --- ## 1. Provide variable inputs via a channel explicitly @@ -44,41 +52,43 @@ We are going to create a **channel** to pass the variable input to the `sayHello ### 1.1. Create an input channel There are a variety of **channel factories** that we can use to set up a channel. -To keep things simple for now, we are going to use the simplest possible channel factory, which will create a simple channel containing a single value. -Functionally this be exactly equivalent to how we had it set up before. +To keep things simple for now, we are going to use the most basic channel factory, called `Channel.of`, which will create a channel containing a single value. +Functionally this will be exactly equivalent to how we had it set up before, but explicit instead of implicit. 
-This is the line of code to do it: +This is the line of code we're going to use: -`greeting_ch = Channel.of('Hello world!')` +```console title="Syntax" +greeting_ch = Channel.of('Hello Channels!') +``` -This creates a channel called `greeting_ch` using the `Channel.of()` channel factory, which sets up a simple value channel, and gives it the string `'Hello world!'` to use as the greeting value. +This creates a channel called `greeting_ch` using the `Channel.of()` channel factory, which sets up a simple value channel, and loads the string `'Hello Channels!'` to use as the greeting value. !!! note - We are temporarily switching back to hardcoded strings instead of using a CLI parameter for the sake of simplicity. We'll go back to using CLI parameters once we've covered what's happening at the level of the channel. + We are temporarily switching back to hardcoded strings instead of using a CLI parameter for the sake of readability. We'll go back to using CLI parameters once we've covered what's happening at the level of the channel. 
In the workflow block, add the channel factory code: _Before:_ -```groovy title="hello-channels.nf" linenums="21" +```groovy title="hello-channels.nf" linenums="27" workflow { // emit a greeting - sayHello(params.greet) + sayHello(params.greeting) } ``` _After:_ -```groovy title="hello-channels.nf" linenums="21" +```groovy title="hello-channels.nf" linenums="27" workflow { // create a channel for inputs - greeting_ch = Channel.of('Hello world!') + greeting_ch = Channel.of('Hello Channels!') // emit a greeting - sayHello(params.greet) + sayHello(params.greeting) } ``` @@ -92,24 +102,24 @@ In the workflow block, make the following code change: _Before:_ -```groovy title="hello-channels.nf" linenums="26" +```groovy title="hello-channels.nf" linenums="27" workflow { // create a channel for inputs - greeting_ch = Channel.of('Hello world!') + greeting_ch = Channel.of('Hello Channels!') // emit a greeting - sayHello(params.greet) + sayHello(params.greeting) } ``` _After:_ -```groovy title="hello-channels.nf" linenums="26" +```groovy title="hello-channels.nf" linenums="27" workflow { // create a channel for inputs - greeting_ch = Channel.of('Hello world!') + greeting_ch = Channel.of('Hello Channels!') // emit a greeting sayHello(greeting_ch) @@ -118,7 +128,7 @@ workflow { This tells Nextflow to run the `sayHello` process on the contents of the `greeting_ch` channel. -Now it's fully functional; it's the explicit equivalent of writing `sayHello('Hello world!')`. +Now our workflow is properly functional; it is the explicit equivalent of writing `sayHello('Hello Channels!')`. ### 1.3. 
Run the workflow command again @@ -130,25 +140,30 @@ nextflow run hello-channels.nf If you made both edits correctly, you should get another successful execution: -```console title="Output" +```console title="Output" linenums="1" N E X T F L O W ~ version 24.10.0 - ┃ Launching `hello-channels.nf` [prickly_avogadro] DSL2 - revision: b58b6ab94b +Launching `hello-channels.nf` [nice_heisenberg] DSL2 - revision: 41b4aeb7e9 executor > local (1) -[1f/50efd5] sayHello (1) [100%] 1 of 1 ✔ +[3b/f2b109] sayHello (1) | 1 of 1 ✔ +``` + +You can check the results directory to satisfy yourself that the outcome is still the same as previously. + +```console title="output.txt" linenums="1" +Hello Channels! ``` -Feel free to check the results directory to satisfy yourself that the outcome is still the same as previously. So far we're just progressively tweaking the code to increase the flexibility of our workflow while achieving the same end result. !!! note - This may seem like we're writing more code for no tangible benefit, but the value will become clear as soon as we start handling more complex inputs. + This may seem like we're writing more code for no tangible benefit, but the value will become clear as soon as we start handling more inputs. ### Takeaway -You know how to use a simple channel to provide an input to a process. +You know how to use a basic channel factory to provide an input to a process. ### What's next? @@ -171,14 +186,14 @@ In the workflow block, make the following code change: _Before:_ -```groovy title="hello-channels.nf" linenums="46" +```groovy title="hello-channels.nf" linenums="29" // create a channel for inputs -greeting_ch = Channel.of('Hello') +greeting_ch = Channel.of('Hello Channels') ``` _After:_ -```groovy title="hello-channels.nf" linenums="46" +```groovy title="hello-channels.nf" linenums="29" // create a channel for inputs greeting_ch = Channel.of('Hello','Bonjour','Holà') ``` @@ -193,20 +208,22 @@ Let's try it. 
nextflow run hello-channels.nf ``` -Well, it certainly seems to run just fine: +It certainly seems to run just fine: -```console title="Output" +```console title="Output" linenums="1" N E X T F L O W ~ version 24.10.0 - ┃ Launching `hello-channels.nf` [lonely_pare] DSL2 - revision: b9f1d96905 +Launching `hello-channels.nf` [suspicious_lamport] DSL2 - revision: 778deadaea executor > local (3) -[3d/1fe62c] sayHello (2) [100%] 3 of 3 ✔ +[cd/77a81f] sayHello (3) | 3 of 3 ✔ ``` -However... This seems to indicate that '3 of 3' calls were made for the process, which is encouraging, but this only shows us a single run of the process, with one subdirectory path (`3d/1fe62c...`). What's going on? +However... This seems to indicate that '3 of 3' calls were made for the process, which is encouraging, but this only shows us a single run of the process, with one subdirectory path (`cd/77a81f`). +What's going on? -By default, the ANSI logging system writes the logging from multiple calls to the same process on the same line. Fortunately, we can disable that behavior. +By default, the ANSI logging system writes the logging from multiple calls to the same process on the same line. +Fortunately, we can disable that behavior to see the full list of process calls. #### 2.1.3. 
Run the command again with the `-ansi-log false` option @@ -218,17 +235,23 @@ nextflow run hello-channels.nf -ansi-log false This time we see all three process runs and their associated work subdirectories listed in the output: -```console title="Output" -N E X T F L O W ~ version 24.02.0-edge -Launching `hello-channels.nf` [big_woese] DSL2 - revision: 53f20aeb70 -[62/d81e63] Submitted process > sayHello (1) -[19/507af3] Submitted process > sayHello (2) -[8a/3126e6] Submitted process > sayHello (3) +```console title="Output" linenums="1" +N E X T F L O W ~ version 24.10.0 +Launching `hello-channels.nf` [pensive_poitras] DSL2 - revision: 778deadaea +[76/f61695] Submitted process > sayHello (1) +[6e/d12e35] Submitted process > sayHello (3) +[c1/097679] Submitted process > sayHello (2) ``` That's much better; at least for a simple workflow. For a complex workflow, or a large number of inputs, having the full list output to the terminal might get a bit overwhelming, so you might not choose to use `-ansi-log false` in those cases. +!!! note + + The way the status is reported is a bit different between the two logging modes. + In the condensed mode, Nextflow reports whether calls were completed successfully or not. + In this expanded mode, it only reports that they were submitted. + That being said, we have another problem. If you look in the `results` directory, there is only one file: `output.txt`! ```console title="Directory contents" @@ -238,7 +261,12 @@ results What's up with that? Shouldn't we be expecting a separate file per input greeting, so three files in all? Did all three greetings go into a single file? -You can check the contents of `output.txt`; you will find only one of the three. + +You can check the contents of `output.txt`; you will find only one of the three, containing one of the three greetings we provided. 
+ +```console title="output.txt" linenums="1" +Bonjour +``` You may recall that we hardcoded the output file name for the `sayHello` process, so all three calls produced a file called `output.txt`. You can check the work subdirectories for each of the three processes; each of them contains a file called `output.txt` as expected. @@ -251,7 +279,8 @@ But when the `publishDir` directive copies each of them to the same `results` di We can continue publishing all the outputs to the same results directory, but we need to ensure they will have unique names. Specifically, we need to modify the first process to generate a file name dynamically so that the final file names will be unique. -So how do we make the file names unique? A common way to do that is to use some unique piece of metadata from the inputs (received from the input channel) as part of the output file name. +So how do we make the file names unique? +A common way to do that is to use some unique piece of metadata from the inputs (received from the input channel) as part of the output file name. Here, for convenience, we'll just use the greeting itself since it's just a short string, and prepend it to the base output filename. #### 2.2.1. Construct a dynamic output file name @@ -260,7 +289,7 @@ In the process block, make the following code changes: _Before:_ -```groovy title="hello-channels.nf" linenums="11" +```groovy title="hello-channels.nf" linenums="6" process sayHello { publishDir 'results', mode: 'copy' @@ -280,7 +309,7 @@ process sayHello { _After:_ -```groovy title="hello-channels.nf" linenums="11" +```groovy title="hello-channels.nf" linenums="6" process sayHello { publishDir 'results', mode: 'copy' @@ -306,9 +335,9 @@ Make sure to replace `output.txt` in both the output definition and in the `scri This should produce a unique output file name every time the process is called, so that it can be distinguished from the outputs from other iterations of the same process in the output directory. 
-#### 2.2.2. Run the workflow and look at the results directory +#### 2.2.2. Run the workflow -Let's run it and check that it works. +Let's run it: ```bash nextflow run hello-channels.nf @@ -316,16 +345,16 @@ nextflow run hello-channels.nf Reverting back to the summary view, the output looks like this again: -```console title="Output" +```console title="Output" linenums="1" N E X T F L O W ~ version 24.10.0 - ┃ Launching `hello-channels.nf` [jovial_mccarthy] DSL2 - revision: 53f20aeb70 +Launching `hello-channels.nf` [astonishing_bell] DSL2 - revision: f57ff44a69 executor > local (3) -[03/f007f2] sayHello (1) [100%] 3 of 3 ✔ +[2d/90a2e2] sayHello (1) | 3 of 3 ✔ ``` -But more importantly, now we have three new files in addition to the one we already had in the `results` directory: +Importantly, now we have three new files in addition to the one we already had in the `results` directory: ```console title="Directory contents" results @@ -335,12 +364,27 @@ results └── output.txt ``` +They each have the expected contents: + +```console title="Bonjour-output.txt" linenums="1" +Bonjour +``` + +```console title="Hello-output.txt" linenums="1" +Hello +``` + +```console title="Holà-output.txt" linenums="1" +Holà +``` + Success! Now we can add as many greetings as we like without worrying about output files being overwritten. !!! note In practice, naming files based on the input data itself is almost always impractical. - The better way to generate dynamic filenames is to pass metatdata to a process along with the input files. We can derive that metadata from a sample sheet as we're reading the input files. + The better way to generate dynamic filenames is to pass metadata to a process along with the input files. + The metadata is typically provided via a 'sample sheet' or equivalents. You'll learn how to do that later in your Nextflow training. ### Takeaway @@ -362,7 +406,7 @@ What if we wanted to provide those multiple inputs in a different form? 
For example, imagine we set up an input variable containing an array of elements like this: -`greetings = ['Hello','Bonjour','Holà']` +`greetings_array = ['Hello','Bonjour','Holà']` Can we load that into our output channel and expect it to work? Let's find out. @@ -372,44 +416,47 @@ Common sense suggests we should be able to simply pass in an array of values ins #### 3.1.1. Set up the input variable -Since we already have the `params.greet` declared, let's hijack that by changing its value to an array as follows: +Let's take the `greetings_array` variable we just imagined and make it a reality by adding it to the workflow block: _Before:_ -```groovy title="hello-channels.nf" linenums="23" -/* - * Pipeline parameters - */ -params.greet = 'Holà mundo' +```groovy title="hello-channels.nf" linenums="27" +workflow { + + // create a channel for inputs + greeting_ch = Channel.of('Hello','Bonjour','Holà') ``` _After:_ -```groovy title="hello-channels.nf" linenums="23" -/* - * Pipeline parameters - */ -params.greet = ['Hello','Bonjour','Holà'] +```groovy title="hello-channels.nf" linenums="27" +workflow { + + // declare an array of input greetings + greetings_array = ['Hello','Bonjour','Holà'] + + // create a channel for inputs + greeting_ch = Channel.of('Hello','Bonjour','Holà') ``` -#### 3.1.2. Set `params.greet` as the input to the channel factory +#### 3.1.2. Set array of greetings as the input to the channel factory -We replace the hardcoded values `'Hello','Bonjour','Holà'` with the `params.greet` that we just updated to contain the array `['Hello','Bonjour','Holà']`. +We're going to replace the values `'Hello','Bonjour','Holà'` currently hardcoded in the channel factory with the `greetings_array` we just created. 
-Modify the following code: +In the workflow block, make the following change: _Before:_ -```groovy title="hello-channels.nf" linenums="23" -// create a channel for inputs -greeting_ch = Channel.of('Hello','Bonjour','Holà') +```groovy title="hello-channels.nf" linenums="32" + // create a channel for inputs + greeting_ch = Channel.of('Hello','Bonjour','Holà') ``` _After:_ -```groovy title="hello-channels.nf" linenums="23" -// create a channel for inputs -greeting_ch = Channel.of(params.greet) +```groovy title="hello-channels.nf" linenums="32" + // create a channel for inputs + greeting_ch = Channel.of(greetings_array) ``` #### 3.1.3. Run the workflow @@ -422,7 +469,13 @@ nextflow run hello-channels.nf Oh no! Nextflow throws an error that starts like this: -```console title="Output" +```console title="Output" linenums="1" + N E X T F L O W ~ version 24.10.0 + +Launching `hello-channels.nf` [friendly_koch] DSL2 - revision: 97256837a7 + +executor > local (1) +[22/57e015] sayHello (1) | 0 of 1 ERROR ~ Error executing process > 'sayHello (1)' Caused by: @@ -446,50 +499,50 @@ If you skim through the [list of operators](https://www.nextflow.io/docs/latest/ #### 3.2.1. Add the `flatten()` operator -To apply the `flatten()` operator to our input channel, we simply append it to the channel factory declaration. +To apply the `flatten()` operator to our input channel, we append it to the channel factory declaration. 
In the workflow block, make the following code change: _Before:_ -```groovy title="hello-channels.nf" linenums="29" -// create a channel for inputs -greeting_ch = Channel.of(params.greeting) +```groovy title="hello-channels.nf" linenums="31" + // create a channel for inputs + greeting_ch = Channel.of(greetings_array) ``` _After:_ -```groovy title="hello-channels.nf" linenums="29" -// create a channel for inputs -greeting_ch = Channel.of(params.greeting) - .flatten() +```groovy title="hello-channels.nf" linenums="31" + // create a channel for inputs + greeting_ch = Channel.of(greetings_array) + .flatten() ``` -Here we added the operator on the next line for readability, but you can add operators on the same line as the channel factory if you prefer. +Here we added the operator on the next line for readability, but you can add operators on the same line as the channel factory if you prefer, like this: `greeting_ch = Channel.of(greetings_array).flatten()` #### 3.2.2. Add `view()` to inspect channel contents We could run this right away to test if it works, but while we're at it, we're also going to add a couple of [`view()`](https://www.nextflow.io/docs/latest/reference/operator.html#view) directives, which allow us to inspect the contents of a channel. -You can think of `view()` as a debugging tool, like a `print()` statement in Python, if you're familiar with that. +You can think of `view()` as a debugging tool, like a `print()` statement in Python, or its equivalent in other languages. 
In the workflow block, make the following code change: _Before:_ -```groovy title="hello-channels.nf" linenums="29" -// create a channel for inputs -greeting_ch = Channel.of(params.greeting) - .flatten() +```groovy title="hello-channels.nf" linenums="31" + // create a channel for inputs + greeting_ch = Channel.of(greetings_array) + .flatten() ``` _After:_ -```groovy title="hello-channels.nf" linenums="29" -// create a channel for inputs -greeting_ch = Channel.of(params.greeting) - .view { "Before flatten: $it" } - .flatten() - .view { "After flatten: $it" } +```groovy title="hello-channels.nf" linenums="31" + // create a channel for inputs + greeting_ch = Channel.of(greetings_array) + .view { "Before flatten: $it" } + .flatten() + .view { "After flatten: $it" } ``` Here `$it` is an implicit variable that represents each individual item loaded in a channel. @@ -504,17 +557,17 @@ nextflow run hello-channels.nf This time it works AND gives us the additional insight into what the contents of the channel look like before and after we run the `flatten()` operator: -TODO UPDATE +```console title="Output" linenums="1" + N E X T F L O W ~ version 24.10.0 -```console title="Output" -Launching `hello-channels.nf` [irreverent_shaw] DSL2 - revision: b3a71cc376 +Launching `hello-channels.nf` [tiny_elion] DSL2 - revision: 1d834f23d2 executor > local (3) -[20/8f32e4] sayHello (1) [100%] 3 of 3 ✔ -Before flatten: [Holà mundo, Konnichiwa, Dobrý den] -After flatten: Holà mundo -After flatten: Konnichiwa -After flatten: Dobrý den +[8e/bb08f3] sayHello (2) | 3 of 3 ✔ +Before flatten: [Hello, Bonjour, Holà] +After flatten: Hello +After flatten: Bonjour +After flatten: Holà ``` You see that we get a single `Before flatten:` statement because at that point the channel contains one item, the original array. @@ -526,12 +579,14 @@ Importantly, this means each item can now be processed separately by the workflo You should delete or comment out the `view()` statements before moving on. 
- ```groovy title="hello-channels.nf" linenums="29" + ```groovy title="hello-channels.nf" linenums="31" // create a channel for inputs - greeting_ch = Channel.of(params.greeting) + greeting_ch = Channel.of(greetings_array) .flatten() ``` + We left them in the `hello-channels-3.nf` solution file for reference purposes. + ### Takeaway You know how to use an operator like `flatten()` to transform the contents of a channel, and how to use the `view()` directive to inspect channel contents before and after applying an operator. @@ -542,62 +597,74 @@ Learn how to make the workflow take a file as its source of input values. --- -## 4. Use an operator to read in a file as the source of input values +## 4. Use an operator to parse input values from a CSV file It's often the case that, when we want to run on multiple inputs, the input values are contained in a file. -As an example, we prepared a CSV file called `greetings.csv` in the `data/` directory, containing several greetings separated by commas. +As an example, we prepared a CSV file called `greetings.csv` containing several greetings, one on each line (like a column of data). -```csv title="greetings.csv" -Hello,Bonjour,Holà +```csv title="greetings.csv" linenums="1" +Hello +Bonjour +Holà ``` -So we need to modify our workflow to read in the values from a file like that. +So now we need to modify our workflow to read in the values from a file like that. + +### 4.1. Modify the script to expect a CSV file as the source of greetings + +To get started, we're going to need to make two key changes to the script: + +- Switch the input parameter to point to the CSV file +- Switch to a channel factory designed to handle a file -### 4.1. Switch the input parameter to point to the CSV file +#### 4.1.1. Switch the input parameter to point to the CSV file + +Remember the `params.greeting` parameter we set up in Part 1? +We're going to update it to point to the CSV file containing our greetings. 
In the workflow block, make the following code change: _Before:_ -```groovy title="hello-channels.nf" linenums="23" +```groovy title="hello-channels.nf" linenums="25" /* * Pipeline parameters */ -params.greet = ['Hello','Bonjour','Holà'] +params.greeting = ['Hello','Bonjour','Holà'] ``` _After:_ -```groovy title="hello-channels.nf" linenums="23" +```groovy title="hello-channels.nf" linenums="25" /* * Pipeline parameters */ -params.greeting = 'data/greetings.csv' +params.greeting = 'greetings.csv' ``` -### 4.2. Switch to a channel factory designed to handle a file +#### 4.1.2. Switch to a channel factory designed to handle a file -Since we now want to use a file instead of a simple value as the input, we can't use the `Channel.of()` channel factory from before. -We need to switch to using a new channel factory, `Channel.fromPath()`, which has some built-in functionality for handling file paths. +Since we now want to use a file instead of simple values as the input, we can't use the `Channel.of()` channel factory from before. +We need to switch to using a new channel factory, [`Channel.fromPath()`](https://www.nextflow.io/docs/latest/reference/channel.html#channel-path), which has some built-in functionality for handling file paths. In the workflow block, make the following code change: _Before:_ -```groovy title="hello-channels.nf" linenums="29" -// create a channel for inputs -greeting_ch = Channel.of(params.greeting) - .flatten() +```groovy title="hello-channels.nf" linenums="31" + // create a channel for inputs + greeting_ch = Channel.of(greetings_array) + .flatten() ``` _After:_ -```groovy title="hello-channels.nf" linenums="29" -// create a channel for inputs from a CSV file -greeting_ch = Channel.fromPath(params.greeting) +```groovy title="hello-channels.nf" linenums="31" + // create a channel for inputs from a CSV file + greeting_ch = Channel.fromPath(params.greeting) ``` -### 4.3. Run the workflow +#### 4.1.3. 
Run the workflow Let's try running the workflow with the new channel factory and the input file. @@ -605,46 +672,55 @@ Let's try running the workflow with the new channel factory and the input file. nextflow run hello-channels.nf ``` -Oh no, another error. This one starts like this: +Oh no, it doesn't work. Here's the start of the console output and error message: -```console title="Output" +```console title="Output" linenums="1" + N E X T F L O W ~ version 24.10.0 + +Launching `hello-channels.nf` [adoring_bhabha] DSL2 - revision: 8ce25edc39 + +[- ] sayHello | 0 of 1 ERROR ~ Error executing process > 'sayHello (1)' Caused by: - File `/workspace/gitpod/hello-nextflow/data/greetings.csv-output.txt` is outside the scope of the process work directory: /workspace/gitpod/hello-nextflow/work/cf/8f1c7f433eed3d79e2fa4d1ca81e10 + File `/workspace/gitpod/hello-nextflow/data/greetings.csv-output.txt` is outside the scope of the process work directory: /workspace/gitpod/hello-nextflow/work/e3/c459b3c8f4029094cc778c89a4393d + Command executed: - echo '/workspace/gitpod/hello-nextflow/data/greetings.csv' > '/workspace/gitpod/hello-nextflow/data/greetings.csv-output.txt' + echo '/workspace/gitpod/hello-nextflow/data/greetings.csv' > '/workspace/gitpod/hello-nextflow/data/greetings. ``` -The `Command executed:` bit is especially helpful here (you may need to scroll down a bit to find it). +The `Command executed:` bit (lines 13-15) is especially helpful here. -Once again it looks like Nextflow tried to run a single process call, but using the file path itself as a string value. -So it has resolved the file path correctly, but it didn't open the file, which is what we wanted. +This may look a little bit familiar. +It looks like Nextflow tried to run a single process call using the file path itself as a string value. +So it has resolved the file path correctly, but it didn't actually parse its contents, which is what we wanted. 
How do we get Nextflow to open the file and load its contents into the channel? -Sounds like we need another [operator](https://www.nextflow.io/docs/latest/reference/operator.html). +Sounds like we need another [operator](https://www.nextflow.io/docs/latest/reference/operator.html)! -### 4.4. Add `splitCsv()` operator +### 4.2. Use the `splitCsv()` operator to parse the file Looking through the list of operators again, we find [`splitCsv()`](https://www.nextflow.io/docs/latest/reference/operator.html#splitCsv), which is designed to parse and split CSV-formatted text. -To apply the operator, add it to the channel construction instruction like previously; and we're also going to include view statements while we're at it. +#### 4.2.1. Apply `splitCsv()` to the channel + +To apply the operator, we append it to the channel factory line like previously. In the workflow block, make the following code change: _Before:_ -```groovy title="hello-channels.nf" linenums="46" +```groovy title="hello-channels.nf" linenums="31" // create a channel for inputs from a CSV file greeting_ch = Channel.fromPath(params.greeting) ``` _After:_ -```groovy title="hello-channels.nf" linenums="46" +```groovy title="hello-channels.nf" linenums="31" // create a channel for inputs from a CSV file greeting_ch = Channel.fromPath(params.greeting) .view { "Before splitCsv: $it" } @@ -652,7 +728,9 @@ greeting_ch = Channel.fromPath(params.greeting) .view { "After splitCsv: $it" } ``` -### 4.5. Run the workflow again +As you can see, we also include before/after view statements while we're at it. + +#### 4.2.2. Run the workflow again Let's try running the workflow with the added CSV-parsing logic. @@ -660,34 +738,64 @@ Let's try running the workflow with the added CSV-parsing logic. nextflow run hello-channels.nf ``` -Sadly, this fails too. The console output and error starts like this: +Interestingly, this fails too, but with a different error. 
The console output and error starts like this: -```console title="Output" -Before splitCsv: /workspace/gitpod/hello-nextflow/data/greetings.csv -After splitCsv: [Hello, Bonjour, Holà] -ERROR ~ Error executing process > 'sayHello (1)' +```console title="Output" linenums="1" + N E X T F L O W ~ version 24.10.0 + +Launching `hello-channels.nf` [stoic_ride] DSL2 - revision: a0e5de507e + +executor > local (3) +[42/8fea64] sayHello (1) | 0 of 3 +Before splitCsv: /workspace/gitpod/hello-nextflow/greetings.csv +After splitCsv: [Hello] +After splitCsv: [Bonjour] +After splitCsv: [Holà] +ERROR ~ Error executing process > 'sayHello (2)' Caused by: - Missing output file(s) `[Hello, Bonjour, Holà]-output.txt` expected by process `sayHello (1)` + Missing output file(s) `[Bonjour]-output.txt` expected by process `sayHello (2)` + + +Command executed: + + echo '[Bonjour]' > '[Bonjour]-output.txt' ``` -Okay, this looks a bit familiar. It looks like what happened earlier the first time we tried to run on an array of values. -So Nextflow has successfully loaded the file contents into the channel, but as a single item. -Once again we're going to need to split it up. +This time Nextflow has parsed the contents of the file (yay!) but it's added brackets around the greetings. + +Long story short, `splitCsv()` reads each line into an array, and each comma-separated value in the line becomes an element in the array. +So here it gives us three arrays containing one element each. + +!!! note + + Even if this behavior feels inconvenient right now, it's going to be extremely useful later when we deal with input files with multiple columns of data. + +We could solve this by using `flatten()`, which you already know. +However, there's another operator called `map()` that's more appropriate to use here and is really useful to know; it pops up a lot in Nextflow pipelines. + +### 4.3. 
Use the `map()` operator to extract the greetings + +The `map()` operator is a very handy little tool that allows us to do all kinds of mappings to the contents of a channel. + +In this case, we're going to use it to extract that one element that we want from each line of our file. +This is what the syntax looks like: -Can we use the same solution as that time? +```groovy title="Syntax" +.map { item -> item[0] } +``` -### 4.6. Add `flatten()` operator +This means 'for each item in the channel, take the first of any elements it contains'. -Remember `flatten()`? We love `flatten()`. +So let's apply that to our CSV parsing. -You know the drill now: we're going to add the operator to the channel construction instruction, and include another `view()` call. +#### 4.3.1. Apply `map()` to the channel In the workflow block, make the following code change: _Before:_ -```groovy title="hello-channels.nf" linenums="29" +```groovy title="hello-channels.nf" linenums="31" // create a channel for inputs from a CSV file greeting_ch = Channel.fromPath(params.greeting) .view { "Before splitCsv: $it" } @@ -697,17 +805,19 @@ greeting_ch = Channel.fromPath(params.greeting) _After:_ -```groovy title="hello-channels.nf" linenums="29" +```groovy title="hello-channels.nf" linenums="31" // create a channel for inputs from a CSV file greeting_ch = Channel.fromPath(params.greeting) .view { "Before splitCsv: $it" } .splitCsv() .view { "After splitCsv: $it" } - .flatten() - .view { "After flatten: $it" } + .map { item -> item[0] } + .view { "After map: $it" } ``` -### 4.7. Run the workflow one more time +Once again we include another `view()` call to confirm that the operator does what we expect. + +#### 4.3.2. Run the workflow one more time Let's run it one more time: @@ -715,41 +825,56 @@ Let's run it one more time: nextflow run hello-channels.nf ``` -This time it should run without error! +This time it should run without error. 
+ +```console title="Output" linenums="1" + N E X T F L O W ~ version 24.10.0 + +Launching `hello-channels.nf` [tiny_heisenberg] DSL2 - revision: 845b471427 -```console title="Output" executor > local (3) -[99/657682] sayHello (3) [100%] 3 of 3 ✔ -Before splitCsv: /workspace/gitpod/hello-nextflow/data/greetings.csv -After splitCsv: [Hello, Bonjour, Holà] -After flatten: Hello -After flatten: Bonjour -After flatten: Holà +[1a/1d19ab] sayHello (2) | 3 of 3 ✔ +Before splitCsv: /workspace/gitpod/hello-nextflow/greetings.csv +After splitCsv: [Hello] +After splitCsv: [Bonjour] +After splitCsv: [Holà] +After map: Hello +After map: Bonjour +After map: Holà ``` Looking at the output of the `view()` statements, we see the following: - A single `Before splitCsv:` statement: at that point the channel contains one item, the original file path. -- A single `After splitCsv:` statement: at that point the channel still contains only one item, an array containing the three values. -- Three separate `After flatten:` statements: one for each greeting, which are now individual items in the channel. +- Three separate `After splitCsv:` statements: one for each greeting, but each is contained within an array that corresponds to that line in the file. +- Three separate `After map:` statements: one for each greeting, which are now individual items in the channel. -We can also look at the output files, which show that each greeting was correctly extracted and processed through the workflow. +You can also look at the output files to verify that each greeting was correctly extracted and processed through the workflow. We've achieved the same result as previously, but now we have a lot more flexibility to add more elements to the channel of greetings we want to process by modifying an input file, without modifying any code. !!! note Here we had all greetings on one line in the CSV file. 
- You can try adding more lines to the CSV file and see what happens, with and without the `flatten()` operator. - You'll learn how to handle more complex forms of inputs in a later training. + You can try adding more columns to the CSV file and see what happens; for example, try the following: + + ```csv title="greetings.csv" + Hello,English + Bonjour,French + Holà,Spanish + ``` + + You can also try replacing `.map { item -> item[0] }` with `.flatten()` and see what happens depending on how many lines and columns you have in the input file. + + You'll learn more advanced approaches for handling complex inputs in a later training. ### Takeaway -You know how to use the operators `splitCsv()` and `flatten()` to read in a file of input values and handle them appropriately. +You know how to use the operators `splitCsv()` and `map()` to read in a file of input values and handle them appropriately. More generally, you have a basic understanding of how Nextflow uses **channels** to manage inputs to processes and **operators** to transform their contents. ### What's next? -Take a break! +Take a big break, you worked hard in this one! When you're ready, move on to Part 3 to learn how to add more steps and connect them together into a proper workflow. diff --git a/hello-nextflow/hello-channels.nf b/hello-nextflow/hello-channels.nf new file mode 100644 index 00000000..6236eea2 --- /dev/null +++ b/hello-nextflow/hello-channels.nf @@ -0,0 +1,31 @@ +#!/usr/bin/env nextflow + +/* + * Use echo to print 'Hello World!' to a file + */ +process sayHello { + + publishDir 'results', mode: 'copy' + + input: + val greeting + + output: + path 'output.txt' + + script: + """ + echo '$greeting' > output.txt + """ +} + +/* + * Pipeline parameters + */ +params.greeting = 'Holà mundo!' 
+ +workflow { + + // emit a greeting + sayHello(params.greeting) +} diff --git a/hello-nextflow/hello-workflow.nf b/hello-nextflow/hello-workflow.nf new file mode 100644 index 00000000..e3f1855c --- /dev/null +++ b/hello-nextflow/hello-workflow.nf @@ -0,0 +1,36 @@ +#!/usr/bin/env nextflow + +/* + * Use echo to print 'Hello World!' to a file + */ +process sayHello { + + publishDir 'results', mode: 'copy' + + input: + val greeting + + output: + path "${greeting}-output.txt" + + script: + """ + echo '$greeting' > '$greeting-output.txt' + """ +} + +/* + * Pipeline parameters + */ +params.greeting = 'greetings.csv' + +workflow { + + // create a channel for inputs from a CSV file + greeting_ch = Channel.fromPath(params.greeting) + .splitCsv() + .map { line -> line[0] } + + // emit a greeting + sayHello(greeting_ch) +} diff --git a/hello-nextflow/solutions/2-hello-channels/greetings-4.csv b/hello-nextflow/solutions/2-hello-channels/greetings-4.csv new file mode 100644 index 00000000..ce7e03e2 --- /dev/null +++ b/hello-nextflow/solutions/2-hello-channels/greetings-4.csv @@ -0,0 +1,3 @@ +Hello,English +Bonjour,French +Holà,Spanish diff --git a/hello-nextflow/solutions/2-hello-channels/hello-channels-1.nf b/hello-nextflow/solutions/2-hello-channels/hello-channels-1.nf new file mode 100644 index 00000000..6190b616 --- /dev/null +++ b/hello-nextflow/solutions/2-hello-channels/hello-channels-1.nf @@ -0,0 +1,34 @@ +#!/usr/bin/env nextflow + +/* + * Use echo to print 'Hello World!' to a file + */ +process sayHello { + + publishDir 'results', mode: 'copy' + + input: + val greeting + + output: + path 'output.txt' + + script: + """ + echo '$greeting' > output.txt + """ +} + +/* + * Pipeline parameters + */ +params.greeting = 'Holà mundo!' 
+ +workflow { + + // create a channel for inputs + greeting_ch = Channel.of('Hello Channels!') + + // emit a greeting + sayHello(greeting_ch) +} diff --git a/hello-nextflow/solutions/2-hello-channels/hello-channels-2.nf b/hello-nextflow/solutions/2-hello-channels/hello-channels-2.nf new file mode 100644 index 00000000..5e7a2d50 --- /dev/null +++ b/hello-nextflow/solutions/2-hello-channels/hello-channels-2.nf @@ -0,0 +1,34 @@ +#!/usr/bin/env nextflow + +/* + * Use echo to print 'Hello World!' to a file + */ +process sayHello { + + publishDir 'results', mode: 'copy' + + input: + val greeting + + output: + path "${greeting}-output.txt" + + script: + """ + echo '$greeting' > '$greeting-output.txt' + """ +} + +/* + * Pipeline parameters + */ +params.greeting = 'Holà mundo!' + +workflow { + + // create a channel for inputs + greeting_ch = Channel.of('Hello','Bonjour','Holà') + + // emit a greeting + sayHello(greeting_ch) +} diff --git a/hello-nextflow/solutions/2-hello-channels/hello-channels-3.nf b/hello-nextflow/solutions/2-hello-channels/hello-channels-3.nf new file mode 100644 index 00000000..4762f579 --- /dev/null +++ b/hello-nextflow/solutions/2-hello-channels/hello-channels-3.nf @@ -0,0 +1,39 @@ +#!/usr/bin/env nextflow + +/* + * Use echo to print 'Hello World!' 
to a file + */ +process sayHello { + + publishDir 'results', mode: 'copy' + + input: + val greeting + + output: + path "${greeting}-output.txt" + + script: + """ + echo '$greeting' > '$greeting-output.txt' + """ +} + +/* + * Pipeline parameters + */ +params.greeting = 'Holà mundo' + +workflow { + + greetings_array = ['Hello','Bonjour','Holà'] + + // create a channel for inputs + greeting_ch = Channel.of(greetings_array) + .view { "Before flatten: $it" } + .flatten() + .view { "After flatten: $it" } + + // emit a greeting + sayHello(greeting_ch) +} diff --git a/hello-nextflow/solutions/2-hello-channels/hello-channels-4.nf b/hello-nextflow/solutions/2-hello-channels/hello-channels-4.nf new file mode 100644 index 00000000..6db3c7dc --- /dev/null +++ b/hello-nextflow/solutions/2-hello-channels/hello-channels-4.nf @@ -0,0 +1,41 @@ +#!/usr/bin/env nextflow + +/* + * Use echo to print 'Hello World!' to a file + */ +process sayHello { + + publishDir 'results', mode: 'copy' + + input: + val greeting + + output: + path "${greeting}-output.txt" + + script: + """ + echo '$greeting' > '$greeting-output.txt' + """ +} + +/* + * Pipeline parameters + */ +params.greeting = 'greetings.csv' + +workflow { + + greetings_array = ['Hello','Bonjour','Holà'] + + // create a channel for inputs from a CSV file + greeting_ch = Channel.fromPath(params.greeting) + .view { "Before splitCsv: $it" } + .splitCsv() + .view { "After splitCsv: $it" } + .map { line -> line[0] } + .view { "After map: $it" } + + // emit a greeting + sayHello(greeting_ch) +}