Name		Name	Last commit message	Last commit date
parent directory ..
hw09_pipelines-make-report_files		hw09_pipelines-make-report_files
Makefile		Makefile
README.md		README.md
attackByRegion5YearBin-regionOrder.txt		attackByRegion5YearBin-regionOrder.txt
attackByRegion5YearBin.csv		attackByRegion5YearBin.csv
attackByRegion5YearBin.png		attackByRegion5YearBin.png
hw09_pipelines-make-report.Rmd		hw09_pipelines-make-report.Rmd
hw09_pipelines-make-report.html		hw09_pipelines-make-report.html
hw09_pipelines-make-report.md		hw09_pipelines-make-report.md
injuryByAttack-attackTypeOrder.txt		injuryByAttack-attackTypeOrder.txt
injuryByAttack.csv		injuryByAttack.csv
injuryByAttack.png		injuryByAttack.png
script01_clean-data.R		script01_clean-data.R
script02_calc-injuries-by-attack.R		script02_calc-injuries-by-attack.R
script03_fig-injuries-by-attack.R		script03_fig-injuries-by-attack.R
script04_calc-attack-per-region-year-bin.R		script04_calc-attack-per-region-year-bin.R
script05_fig-attack-per-region-year-bin.R		script05_fig-attack-per-region-year-bin.R
stat545_make_pipeline.png		stat545_make_pipeline.png

README.md

Homework 09

How to use make to automate pipelines.

Overview

Initially, I wanted to use a cool new dataset from the web, but it was difficult finding a dataset that a. I found very interesting and b. was easily accessible via a URL (ie. did not involve clicking buttons/logging in/etc.). Then I remembered that I have a copy of an interesting dataset from last year's course about global terrorist attacks since 1970, so I will use that as my dataset.

I have developed a short pipeline that begins with downloading the raw data, cleans the data, produces several tabular results in csv format, generates a couple images using these results, and finally generates a markdown and HTML documents as final output using these two images. The results and images produced here are a small excerpt from the project I did last year (available here), but they are enough to demonstrate how to use make in a proficient way.

Output

The Makefile responsible for this pipeline produces this markdown document (or its corresponding HTML).

Pipeline

Here is the workflow of my pipeline:

Disclaimer: I've never drawn a dependency graph so I'm sure this is a little weird and not super amazing, but it does the job and I think it's clear enough-ish?

Note that make all only cares about the final product (the markdown and HTML), and will not attempt to reproduce any of the intermediate files unless it is necessary.

Directory structure

The directory might look a little intimidating, but the only files that are necessary (and that will remain after performing a make clean) are:

README.md
Makefile
script0[1-5]*.R
stat545_make_pipeline.png
hw09_pipelines-make-report.Rmd

Note that not all the files produced by make are included here. The raw dataset and the clean dataset, which are both several MB large, are omitted because of their size and because they can be found elsewhere.

How to run

If you have the contents of this directory, or at least the necessary files mentioned above, you can simply run make clean all to reproduce the outputs.

Note that the raw data is 70MB so it may take a few minutes to download. There are instructions in the Makefile how to download the clean data to bypass this time-consuming step.

Here are some things to try to fidget with the pipeline steps and ensure the Makefile is correct:

make clean: now only the files mentioned in the Directory Structure section should exist
make all: the raw data will be downloaded and the whole pipeline will run, producing all the files that existed initially
remove globalterrorismdb_clean.csv, run make all: nothing happens, as expected
remove hw09_pipelines-make-report.md, run make all: step 6 (Rmd -> md/html) gets run, the final markdown is restored
remove hw09_pipelines-make-report.html and injuryByAttack.png, run make all: steps 3 (generate the figure) and 6 (render) get run, the final HTML is restored
remove hw09_pipelines-make-report.html, injuryByAttack.png, and injuryByAttack-attackTypeOrder.txt, run make all: step 1 gets run (clean data - because we removed the clean data earlier and it is finally needed), then steps 2 and 3 that generate the figure, and then the render step is run as well to generate the HTML
edit injuryByAttack.csv and change one of the numeric values drastically, run make all: step 3 (generate figure) is run because the figure now has different data to show, then the HTML is rendered

There are many more ways to test this out, you can also try editing one of the R scripts and see what happens - that can be left as an exercise to the reader :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hw09_pipelines-make

hw09_pipelines-make

README.md

Homework 09

Overview

Output

Pipeline

Directory structure

How to run

Files

hw09_pipelines-make

Directory actions

More options

Directory actions

More options

Latest commit

History

hw09_pipelines-make

Folders and files

parent directory

README.md

Homework 09

Overview

Output

Pipeline

Directory structure

How to run