Merge pull request #13 from FAST-HEP/kreczko-first-light
Very basic functionality
Showing 21 changed files with 440 additions and 54 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -159,3 +159,6 @@ Thumbs.db | |
|
||
# copier | ||
.copier* | ||
|
||
# VSCode | ||
.vscode/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
# Hello world | ||
|
||
Sometimes you just want to see some code. This section contains some real-life | ||
examples of how to use `fasthep-flow`. | ||
|
||
```yaml | ||
stages: | ||
- name: "hello_world in bash" | ||
type: "fasthep_flow.operators.BashOperator" | ||
kwargs: | ||
bash_command: echo "Hello World!" | ||
``` | ||
Save this to a file called `hello_world.yaml`, then run it: | ||
|
||
```bash | ||
fasthep-flow execute hello_world.yaml | ||
``` | ||
|
||
This will print "Hello World!" to the console. | ||
|
||
So far so good, but what does it actually do? Let's walk through it | ||
step by step. | ||
|
||
## Creating a flow | ||
|
||
The first thing that `fasthep-flow` does is to create a flow. This is done by | ||
creating a `prefect.Flow` object, and adding a task for each step in the YAML | ||
file. The task is created by the `fasthep-flow` operator, and the parameters are | ||
passed to the task as keyword arguments. | ||
|
||
We can do this ourselves by creating a flow and adding a task to it. | ||
|
||
```python | ||
from fasthep_flow.operators import BashOperator | ||
from prefect import Flow | ||
flow = Flow("hello_world") | ||
task = BashOperator(bash_command="echo 'Hello World!'") | ||
flow.add_task(task) | ||
``` | ||
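The mapping from YAML stages to tasks described above can be sketched in a few lines. This is an illustration under stated assumptions, not the `fasthep-flow` implementation: `resolve` and `build_tasks` are hypothetical helper names, the config is written as a plain dict rather than parsed from YAML, and `types.SimpleNamespace` stands in for `fasthep_flow.operators.BashOperator` so that the snippet runs without `fasthep-flow` installed.

```python
import importlib


def resolve(dotted_path):
    """Import and return the object named by a dotted path like 'pkg.mod.Class'."""
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def build_tasks(config):
    """Create one (name, task) pair per entry in config['stages']."""
    tasks = []
    for stage in config["stages"]:
        operator_cls = resolve(stage["type"])
        # The stage's kwargs are passed straight through as keyword arguments.
        tasks.append((stage["name"], operator_cls(**stage.get("kwargs", {}))))
    return tasks


# Same shape as hello_world.yaml, written as a dict to avoid a YAML dependency.
# ASSUMPTION: "types.SimpleNamespace" is a stand-in for the real operator class.
config = {
    "stages": [
        {
            "name": "hello_world in bash",
            "type": "types.SimpleNamespace",
            "kwargs": {"bash_command": 'echo "Hello World!"'},
        }
    ]
}

tasks = build_tasks(config)
name, task = tasks[0]
print(name, "->", task.bash_command)
```

The point of the design is that the `type` string alone decides which class is instantiated, so new operators need no changes to the code that reads the config.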
|
||
## Running the flow | ||
|
||
Next we have to decide how to execute this flow. By default, `fasthep-flow` will | ||
run the flow on the local machine. This is done by calling `flow.run()`. | ||
|
||
```python | ||
flow.run() | ||
``` | ||
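Conceptually, a local run executes each task in turn on the current machine. The sketch below illustrates that idea with the standard library only. `SketchBashOperator` and `run_locally` are hypothetical names, not part of `fasthep-flow` or Prefect, and the split into a command plus a list of arguments is an assumption of this sketch.

```python
import subprocess


class SketchBashOperator:
    """Hypothetical stand-in for a bash operator; not the fasthep-flow class."""

    def __init__(self, bash_command, arguments=None):
        self.bash_command = bash_command
        self.arguments = list(arguments or [])

    def __call__(self):
        # Passing a list (no shell=True) avoids shell quoting problems.
        result = subprocess.run(
            [self.bash_command, *self.arguments],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return result.stdout


def run_locally(tasks):
    """Run the tasks one after another and collect their standard output."""
    return [task() for task in tasks]


outputs = run_locally([SketchBashOperator("echo", ["Hello World!"])])
print(outputs[0], end="")
```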
|
||
## Running the flow on a cluster | ||
|
||
The real strength of `fasthep-flow` is that it can run the flow on a cluster | ||
with the same config file. Internally, this is done by creating a Dask workflow | ||
first, and then running it on the specified cluster (e.g. HTCondor or Google | ||
Cloud Composer). For now, let's just run it on a local Dask cluster. | ||
|
||
```bash | ||
fasthep-flow execute hello_world.yaml --executor DaskLocal | ||
``` | ||
|
||
This will start a Dask cluster on your local machine, and run the flow on it. | ||
The console output is the same, but you will also find additional output files | ||
with Dask performance information. | ||
|
||
## Provenance | ||
|
||
In a real-world scenario, you would want to keep track of the provenance of your | ||
flow. This is done automatically by `fasthep-flow`, and you can find the | ||
provenance in the `output/provenance` folder. | ||
|
||
For more information, see [Provenance](./provenance.md). | ||
|
||
So what does this look like for our hello world example? | ||
|
||
```bash | ||
tree output | ||
``` | ||
|
||
## Next steps | ||
|
||
This was a very simple example, but it shows the basic concepts of | ||
`fasthep-flow`. For more realistic examples, see the experiment-specific | ||
examples in [Examples](./examples/index.md). For more advanced examples, see | ||
[Advanced Examples](./advanced_examples/index.md). | ||
|
||
``` | ||
|
||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
stages: | ||
- name: "hello_world in bash" | ||
type: "fasthep_flow.operators.BashOperator" | ||
kwargs: | ||
bash_command: echo | ||
arguments: ["Hello World!"] | ||
- name: "touch /tmp/date.txt" | ||
type: "fasthep_flow.operators.BashOperator" | ||
kwargs: | ||
bash_command: touch | ||
arguments: ["/tmp/date.txt"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Provenance | ||
|
||
## What is Provenance? | ||
|
||
Provenance refers to the detailed history of the origin, lineage, and changes | ||
made to data throughout its lifecycle. It encompasses the documentation of | ||
processes, inputs, outputs, and transformations that data undergoes, providing a | ||
comprehensive audit trail that can be used to verify the data's integrity and | ||
authenticity. | ||
|
||
## Why Does Provenance Matter for Scientific Data Analysis? | ||
|
||
In scientific data analysis, provenance is crucial as it ensures the | ||
reproducibility and reliability of results. It enables researchers to trace back | ||
through the analysis workflow to understand how data was altered, what | ||
computational steps were performed, and by whom. This traceability is essential | ||
for validating research findings, facilitating peer reviews, and enabling other | ||
researchers to replicate and build upon the work. | ||
|
||
## Our Approach to Provenance in the YAML Config | ||
|
||
To integrate provenance into our workflows, we introduce a dedicated provenance | ||
section within the YAML configuration. This section describes which metadata | ||
should be captured, e.g. the version of the dataset used, the origin of the data, | ||
the specific parameters set for each analysis stage, and the individual | ||
responsible for each step (taken from git history). By embedding this | ||
information directly into the workflow configuration, we ensure that every step | ||
of data processing is transparent and traceable. This not only adheres to best | ||
practices in scientific data handling but also empowers users to conduct robust | ||
and transparent analyses. | ||
|
||
### Example | ||
|
||
```yaml | ||
provenance: | ||
datasets: | ||
source: fasthep-curator # Specifies the tool used for dataset curation | ||
analysis: | ||
include: | ||
- steps # Enumerates the individual steps taken in the analysis | ||
- parameters # Parameters used at each step for reproducibility | ||
- git # Git commit hash, branch, and status for version control | ||
- performance # Metrics to measure the efficiency of the analysis | ||
- environment # Software environment, including library versions | ||
- hardware # Hardware specifications where the analysis was run | ||
airflow: | ||
include: | ||
- db # Database configurations and states within Airflow | ||
``` |
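To make the `include` keys more concrete, here is a minimal sketch of how a few of them could be gathered with the Python standard library. Everything in it is an assumption for illustration: `COLLECTORS`, `git_commit`, and `collect_provenance` are hypothetical names, the recorded fields are examples, and keys without a collector (such as `steps`) are simply skipped.

```python
import os
import platform
import subprocess
import sys


def git_commit():
    """Return the current git commit hash, or None outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the working directory is not a repository.
        return None
    return result.stdout.strip()


# Hypothetical collectors, keyed by names used under `analysis.include`.
COLLECTORS = {
    "git": lambda: {"commit": git_commit()},
    "environment": lambda: {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    },
    "hardware": lambda: {
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    },
}


def collect_provenance(include):
    """Collect metadata for each requested key; unknown keys are skipped."""
    return {key: COLLECTORS[key]() for key in include if key in COLLECTORS}


record = collect_provenance(["steps", "git", "environment", "hardware"])
print(sorted(record))
```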