Intro to Luigi
Curtis Rose edited this page Apr 11, 2017
Luigi is a Python pipeline management tool that was created at Spotify, which still maintains it. It is used for batch data pipelines in many fields, including by teams that build and run bioinformatics pipelines, and this is the main reason we chose it as our pipeline management tool. A Luigi pipeline can be reduced to a series of tasks, and each task has three parts: requires, output, and run.
- Requires is the function where you specify which other Tasks need to be complete before this Task can start. Luigi will use this to compute the task dependency graph.
- Output is the function where you specify where the output of this Task is produced. Luigi will check whether this output (specified as a Target) exists to determine whether the Task needs to run at all.
- Run is the function that Luigi calls to run the Task. You can do anything you want in here, from calling Python functions to running shell scripts to calling APIs.
Online Tutorial: http://help.mortardata.com/technologies/luigi/first_luigi_script
Here is an example of a very basic Luigi Task:
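import luigi
from luigi.contrib.s3 import S3Target  # S3Target lives in luigi.contrib.s3

class MyExampleTask(luigi.Task):
    # Example parameter for the task: a
    # date for which a report should be run
    report_date = luigi.DateParameter()

    def requires(self):
        # MyUpstreamTask is assumed to be defined elsewhere in the pipeline
        return [MyUpstreamTask(self.report_date)]

    def output(self):
        return S3Target('s3://my-output-bucket/my-example-tasks-output')

    def run(self):
        # Illustrative placeholder: copy the upstream output into this task's output
        with self.input()[0].open('r') as infile, self.output().open('w') as outfile:
            outfile.write(infile.read())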