block009_dplyr-intro.html

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />


<title>Introduction to dplyr</title>

<script src="libs/jquery-1.11.3/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="libs/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
<script src="libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-68219208-1', 'auto');
  ga('send', 'pageview');

</script>

<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet"
      href="libs/highlight/default.css"
      type="text/css" />
<script src="libs/highlight/highlight.js"></script>
<style type="text/css">
  pre:not([class]) {
    background-color: white;
  }
</style>
<script type="text/javascript">
if (window.hljs && document.readyState && document.readyState === "complete") {
   window.setTimeout(function() {
      hljs.initHighlighting();
   }, 0);
}
</script>


<style type="text/css">
h1 {
  font-size: 34px;
}
h1.title {
  font-size: 38px;
}
h2 {
  font-size: 30px;
}
h3 {
  font-size: 24px;
}
h4 {
  font-size: 18px;
}
h5 {
  font-size: 16px;
}
h6 {
  font-size: 12px;
}
.table th:not([align]) {
  text-align: left;
}
</style>

<link rel="stylesheet" href="libs/local/main.css" type="text/css" />
<link rel="stylesheet" href="libs/local/nav.css" type="text/css" />
<link rel="stylesheet" href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" type="text/css" />

</head>

<body>

<style type = "text/css">
.main-container {
  max-width: 940px;
  margin-left: auto;
  margin-right: auto;
}
code {
  color: inherit;
  background-color: rgba(0, 0, 0, 0.04);
}
img {
  max-width:100%;
  height: auto;
}
.tabbed-pane {
  padding-top: 12px;
}
button.code-folding-btn:focus {
  outline: none;
}
</style>


<div class="container-fluid main-container">

<!-- tabsets -->
<script src="libs/navigation-1.1/tabsets.js"></script>
<script>
$(document).ready(function () {
  window.buildTabsets("TOC");
});
</script>

<!-- code folding -->


<header>
  <div class="nav">
    <a class="nav-logo" href="index.html">
      <img src="static/img/stat545-logo-s.png" width="70px" height="70px"/>
    </a>
    <ul>
      <li class="home"><a href="index.html">Home</a></li>
      <li class="faq"><a href="faq.html">FAQ</a></li>
      <li class="syllabus"><a href="syllabus.html">Syllabus</a></li>
      <li class="topics"><a href="topics.html">Topics</a></li>
      <li class="people"><a href="people.html">People</a></li>
    </ul>
  </div>
</header>

<div class="fluid-row" id="header">


<h1 class="title toc-ignore">Introduction to dplyr</h1>

</div>

<div id="TOC">
<ul>
<li><a href="#intro">Intro</a><ul>
<li><a href="#load-dplyr-and-gapminder">Load <code>dplyr</code> and <code>gapminder</code></a></li>
<li><a href="#say-hello-to-the-gapminder-tibble">Say hello to the Gapminder tibble</a></li>
</ul></li>
<li><a href="#think-before-you-create-excerpts-of-your-data">Think before you create excerpts of your data …</a></li>
<li><a href="#use-filter-to-subset-data-row-wise.">Use <code>filter()</code> to subset data row-wise.</a></li>
<li><a href="#meet-the-new-pipe-operator">Meet the new pipe operator</a></li>
<li><a href="#use-select-to-subset-the-data-on-variables-or-columns.">Use <code>select()</code> to subset the data on variables or columns.</a></li>
<li><a href="#revel-in-the-convenience">Revel in the convenience</a></li>
<li><a href="#pure-predictable-pipeable">Pure, predictable, pipeable</a></li>
<li><a href="#resources">Resources</a></li>
</ul>
</div>

<div id="intro" class="section level3">
<h3>Intro</h3>
<p><code>dplyr</code> is a package for data manipulation, developed by Hadley Wickham and Romain Francois. It is built to be fast, highly expressive, and open-minded about how your data is stored. It is installed as part of the the <a href="https://github.com/hadley/tidyverse"><code>tidyverse</code></a> meta-package and, as a core package, it is among those loaded via <code>library(tidyverse)</code>.</p>
<p><code>dplyr</code>’s roots are in an earlier package called <a href="http://plyr.had.co.nz"><code>plyr</code></a>, which implements the <a href="https://www.jstatsoft.org/article/view/v040i01">“split-apply-combine” strategy for data analysis</a> (PDF). Where <code>plyr</code> covers a diverse set of inputs and outputs (e.g., arrays, data frames, lists), <code>dplyr</code> has a laser-like focus on data frames or, in the <code>tidyverse</code>, “tibbles”. <code>dplyr</code> is a package-level treament of the <code>ddply()</code> function from <code>plyr</code>, because “data frame in, data frame out” proved to be so incredibly important.</p>
<p>Have no idea what I’m talking about? Not sure if you care? If you use these base R functions: <code>subset()</code>, <code>apply()</code>, <code>[sl]apply()</code>, <code>tapply()</code>, <code>aggregate()</code>, <code>split()</code>, <code>do.call()</code>, <code>with()</code>, <code>within()</code>, then you should keep reading. Also, if you use <code>for()</code> loops alot, you might enjoy learning other ways to iterate over rows or groups of rows or variables in a data frame.</p>
<div id="load-dplyr-and-gapminder" class="section level4">
<h4>Load <code>dplyr</code> and <code>gapminder</code></h4>
<p>I choose to load the <code>tidyverse</code>, which will load <code>dplyr</code>, among other packages we use incidentally below. Also load <code>gapminder</code>.</p>
<pre class="r"><code>library(gapminder)
library(tidyverse)</code></pre>
<pre><code>## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr</code></pre>
<pre><code>## Conflicts with tidy packages ----------------------------------------------</code></pre>
<pre><code>## filter(): dplyr, stats
## lag():    dplyr, stats</code></pre>
</div>
<div id="say-hello-to-the-gapminder-tibble" class="section level4">
<h4>Say hello to the Gapminder tibble</h4>
<p>The <code>gapminder</code> data frame is a special kind of data frame: a tibble.</p>
<pre class="r"><code>gapminder</code></pre>
<pre><code>## # A tibble: 1,704 × 6
##        country continent  year lifeExp      pop gdpPercap
##         &lt;fctr&gt;    &lt;fctr&gt; &lt;int&gt;   &lt;dbl&gt;    &lt;int&gt;     &lt;dbl&gt;
## 1  Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2  Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3  Afghanistan      Asia  1962  31.997 10267083  853.1007
## 4  Afghanistan      Asia  1967  34.020 11537966  836.1971
## 5  Afghanistan      Asia  1972  36.088 13079460  739.9811
## 6  Afghanistan      Asia  1977  38.438 14880372  786.1134
## 7  Afghanistan      Asia  1982  39.854 12881816  978.0114
## 8  Afghanistan      Asia  1987  40.822 13867957  852.3959
## 9  Afghanistan      Asia  1992  41.674 16317921  649.3414
## 10 Afghanistan      Asia  1997  41.763 22227415  635.3414
## # ... with 1,694 more rows</code></pre>
<p>It’s tibble-ness is why we get nice compact printing. For a reminder of the problems with base data frame printing, go type <code>iris</code> in the R Console or, better yet, print a data frame to screen that has lots of columns.</p>
<p>Note how gapminder’s <code>class()</code> includes <code>tbl_df</code>; the “tibble” terminology is a nod to this.</p>
<pre class="r"><code>class(gapminder)</code></pre>
<pre><code>## [1] &quot;tbl_df&quot;     &quot;tbl&quot;        &quot;data.frame&quot;</code></pre>
<p>There will be some functions, like <code>print()</code>, that know about tibbles and do something special. There will others that do not, like <code>summary()</code>. In which case the regular data frame treatment will happen, because every tibble is also a regular data frame.</p>
<p>To turn any data frame into a tibble use <code>as_tibble()</code>:</p>
<pre class="r"><code>as_tibble(iris)</code></pre>
<pre><code>## # A tibble: 150 × 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           &lt;dbl&gt;       &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt;  &lt;fctr&gt;
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa
## # ... with 140 more rows</code></pre>
</div>
</div>
<div id="think-before-you-create-excerpts-of-your-data" class="section level3">
<h3>Think before you create excerpts of your data …</h3>
<p>If you feel the urge to store a little snippet of your data:</p>
<pre class="r"><code>(canada &lt;- gapminder[241:252, ])</code></pre>
<pre><code>## # A tibble: 12 × 6
##    country continent  year lifeExp      pop gdpPercap
##     &lt;fctr&gt;    &lt;fctr&gt; &lt;int&gt;   &lt;dbl&gt;    &lt;int&gt;     &lt;dbl&gt;
## 1   Canada  Americas  1952  68.750 14785584  11367.16
## 2   Canada  Americas  1957  69.960 17010154  12489.95
## 3   Canada  Americas  1962  71.300 18985849  13462.49
## 4   Canada  Americas  1967  72.130 20819767  16076.59
## 5   Canada  Americas  1972  72.880 22284500  18970.57
## 6   Canada  Americas  1977  74.210 23796400  22090.88
## 7   Canada  Americas  1982  75.760 25201900  22898.79
## 8   Canada  Americas  1987  76.860 26549700  26626.52
## 9   Canada  Americas  1992  77.950 28523502  26342.88
## 10  Canada  Americas  1997  78.610 30305843  28954.93
## 11  Canada  Americas  2002  79.770 31902268  33328.97
## 12  Canada  Americas  2007  80.653 33390141  36319.24</code></pre>
<p>Stop and ask yourself …</p>
<blockquote>
<p>Do I want to create mini datasets for each level of some factor (or unique combination of several factors) … in order to compute or graph something?</p>
</blockquote>
<p>If YES, <strong>use proper data aggregation techniques</strong> or facetting in <code>ggplot2</code> – <strong>don’t subset the data</strong>. Or, more realistic, only subset the data as a temporary measure while you develop your elegant code for computing on or visualizing these data subsets.</p>
<p>If NO, then maybe you really do need to store a copy of a subset of the data. But seriously consider whether you can achieve your goals by simply using the <code>subset =</code> argument of, e.g., the <code>lm()</code> function, to limit computation to your excerpt of choice. Lots of functions offer a <code>subset =</code> argument!</p>
<p>Copies and excerpts of your data clutter your workspace, invite mistakes, and sow general confusion. Avoid whenever possible.</p>
<p>Reality can also lie somewhere in between. You will find the workflows presented below can help you accomplish your goals with minimal creation of temporary, intermediate objects.</p>
</div>
<div id="use-filter-to-subset-data-row-wise." class="section level3">
<h3>Use <code>filter()</code> to subset data row-wise.</h3>
<p><code>filter()</code> takes logical expressions and returns the rows for which all are <code>TRUE</code>.</p>
<pre class="r"><code>filter(gapminder, lifeExp &lt; 29)</code></pre>
<pre><code>## # A tibble: 2 × 6
##       country continent  year lifeExp     pop gdpPercap
##        &lt;fctr&gt;    &lt;fctr&gt; &lt;int&gt;   &lt;dbl&gt;   &lt;int&gt;     &lt;dbl&gt;
## 1 Afghanistan      Asia  1952  28.801 8425333  779.4453
## 2      Rwanda    Africa  1992  23.599 7290203  737.0686</code></pre>
<pre class="r"><code>filter(gapminder, country == &quot;Rwanda&quot;, year &gt; 1979)</code></pre>
<pre><code>## # A tibble: 6 × 6
##   country continent  year lifeExp     pop gdpPercap
##    &lt;fctr&gt;    &lt;fctr&gt; &lt;int&gt;   &lt;dbl&gt;   &lt;int&gt;     &lt;dbl&gt;
## 1  Rwanda    Africa  1982  46.218 5507565  881.5706
## 2  Rwanda    Africa  1987  44.020 6349365  847.9912
## 3  Rwanda    Africa  1992  23.599 7290203  737.0686
## 4  Rwanda    Africa  1997  36.087 7212583  589.9445
## 5  Rwanda    Africa  2002  43.413 7852401  785.6538
## 6  Rwanda    Africa  2007  46.242 8860588  863.0885</code></pre>
<pre class="r"><code>filter(gapminder, country %in% c(&quot;Rwanda&quot;, &quot;Afghanistan&quot;))</code></pre>
<pre><code>## # A tibble: 24 × 6
##        country continent  year lifeExp      pop gdpPercap
##         &lt;fctr&gt;    &lt;fctr&gt; &lt;int&gt;   &lt;dbl&gt;    &lt;int&gt;     &lt;dbl&gt;
## 1  Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2  Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3  Afghanistan      Asia  1962  31.997 10267083  853.1007
## 4  Afghanistan      Asia  1967  34.020 11537966  836.1971
## 5  Afghanistan      Asia  1972  36.088 13079460  739.9811
## 6  Afghanistan      Asia  1977  38.438 14880372  786.1134
## 7  Afghanistan      Asia  1982  39.854 12881816  978.0114
## 8  Afghanistan      Asia  1987  40.822 13867957  852.3959
## 9  Afghanistan      Asia  1992  41.674 16317921  649.3414
## 10 Afghanistan      Asia  1997  41.763 22227415  635.3414
## # ... with 14 more rows</code></pre>
<p>Compare with some base R code to accomplish the same things</p>
<pre class="r"><code>gapminder[gapminder$lifeExp &lt; 29, ] ## repeat `gapminder`, [i, j] indexing is distracting
subset(gapminder, country == &quot;Rwanda&quot;) ## almost same as filter; quite nice actually</code></pre>
<p>Under no circumstances should you subset your data the way I did at first:</p>
<pre class="r"><code>excerpt &lt;- gapminder[241:252, ]</code></pre>
<p>Why is this a terrible idea?</p>
<ul>
<li>It is not self-documenting. What is so special about rows 241 through 252?</li>
<li>It is fragile. This line of code will produce different results if someone changes the row order of <code>gapminder</code>, e.g. sorts the data earlier in the script.</li>
</ul>
<pre class="r"><code>filter(gapminder, country == &quot;Canada&quot;)</code></pre>
<p>This call explains itself and is fairly robust.</p>
</div>
<div id="meet-the-new-pipe-operator" class="section level3">
<h3>Meet the new pipe operator</h3>
<p>Before we go any further, we should exploit the new pipe operator that the tidyverse imports from the <a href="https://github.com/smbache/magrittr"><code>magrittr</code></a> package by Stefan Bache. This is going to change your data analytical life. You no longer need to enact multi-operation commands by nesting them inside each other, like so many <a href="http://blogue.us/wp-content/uploads/2009/07/Unknown-21.jpeg">Russian nesting dolls</a>. This new syntax leads to code that is much easier to write and to read.</p>
<p>Here’s what it looks like: <code>%&gt;%</code>. The RStudio keyboard shortcut: Ctrl + Shift + M (Windows), Cmd + Shift + M (Mac).</p>
<p>Let’s demo then I’ll explain:</p>
<pre class="r"><code>gapminder %&gt;% head()</code></pre>
<pre><code>## # A tibble: 6 × 6
##       country continent  year lifeExp      pop gdpPercap
##        &lt;fctr&gt;    &lt;fctr&gt; &lt;int&gt;   &lt;dbl&gt;    &lt;int&gt;     &lt;dbl&gt;
## 1 Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia  1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia  1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia  1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia  1977  38.438 14880372  786.1134</code></pre>
<p>This is equivalent to <code>head(gapminder)</code>. The pipe operator takes the thing on the left-hand-side and <strong>pipes</strong> it into the function call on the right-hand-side – literally, drops it in as the first argument.</p>
<p>Never fear, you can still specify other arguments to this function! To see the first 3 rows of Gapminder, we could say <code>head(gapminder, 3)</code> or this:</p>
<pre class="r"><code>gapminder %&gt;% head(3)</code></pre>
<pre><code>## # A tibble: 3 × 6
##       country continent  year lifeExp      pop gdpPercap
##        &lt;fctr&gt;    &lt;fctr&gt; &lt;int&gt;   &lt;dbl&gt;    &lt;int&gt;     &lt;dbl&gt;
## 1 Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia  1962  31.997 10267083  853.1007</code></pre>
<p>I’ve advised you to think “gets” whenever you see the assignment operator, <code>&lt;-</code>. Similary, you should think “then” whenever you see the pipe operator, <code>%&gt;%</code>.</p>
<p>You are probably not impressed yet, but the magic will soon happen.</p>
</div>
<div id="use-select-to-subset-the-data-on-variables-or-columns." class="section level3">
<h3>Use <code>select()</code> to subset the data on variables or columns.</h3>
<p>Back to <code>dplyr</code> …</p>
<p>Use <code>select()</code> to subset the data on variables or columns. Here’s a conventional call:</p>
<pre class="r"><code>select(gapminder, year, lifeExp)</code></pre>
<pre><code>## # A tibble: 1,704 × 2
##     year lifeExp
##    &lt;int&gt;   &lt;dbl&gt;
## 1   1952  28.801
## 2   1957  30.332
## 3   1962  31.997
## 4   1967  34.020
## 5   1972  36.088
## 6   1977  38.438
## 7   1982  39.854
## 8   1987  40.822
## 9   1992  41.674
## 10  1997  41.763
## # ... with 1,694 more rows</code></pre>
<p>And here’s the same operation, but written with the pipe operator and piped through <code>head()</code>:</p>
<pre class="r"><code>gapminder %&gt;%
  select(year, lifeExp) %&gt;%
  head(4)</code></pre>
<pre><code>## # A tibble: 4 × 2
##    year lifeExp
##   &lt;int&gt;   &lt;dbl&gt;
## 1  1952  28.801
## 2  1957  30.332
## 3  1962  31.997
## 4  1967  34.020</code></pre>
<p>Think: “Take <code>gapminder</code>, then select the variables year and lifeExp, then show the first 4 rows.”</p>
</div>
<div id="revel-in-the-convenience" class="section level3">
<h3>Revel in the convenience</h3>
<p>Here’s the data for Cambodia, but only certain variables:</p>
<pre class="r"><code>gapminder %&gt;%
  filter(country == &quot;Cambodia&quot;) %&gt;%
  select(year, lifeExp)</code></pre>
<pre><code>## # A tibble: 12 × 2
##     year lifeExp
##    &lt;int&gt;   &lt;dbl&gt;
## 1   1952  39.417
## 2   1957  41.366
## 3   1962  43.415
## 4   1967  45.415
## 5   1972  40.317
## 6   1977  31.220
## 7   1982  50.957
## 8   1987  53.914
## 9   1992  55.803
## 10  1997  56.534
## 11  2002  56.752
## 12  2007  59.723</code></pre>
<p>and what a typical base R call would look like:</p>
<pre class="r"><code>gapminder[gapminder$country == &quot;Cambodia&quot;, c(&quot;year&quot;, &quot;lifeExp&quot;)]</code></pre>
<pre><code>## # A tibble: 12 × 2
##     year lifeExp
##    &lt;int&gt;   &lt;dbl&gt;
## 1   1952  39.417
## 2   1957  41.366
## 3   1962  43.415
## 4   1967  45.415
## 5   1972  40.317
## 6   1977  31.220
## 7   1982  50.957
## 8   1987  53.914
## 9   1992  55.803
## 10  1997  56.534
## 11  2002  56.752
## 12  2007  59.723</code></pre>
</div>
<div id="pure-predictable-pipeable" class="section level3">
<h3>Pure, predictable, pipeable</h3>
<p>We’ve barely scratched the surface of <code>dplyr</code> but I want to point out key principles you may start to appreciate. If you’re new to R or “programming with data”, feel free skip this section and <a href="block010_dplyr-end-single-table.html">move on</a>.</p>
<p><code>dplyr</code>’s verbs, such as <code>filter()</code> and <code>select()</code>, are what’s called <a href="http://en.wikipedia.org/wiki/Pure_function">pure functions</a>. To quote from Wickham’s <a href="http://adv-r.had.co.nz/Functions.html">Advanced R Programming book</a>:</p>
<blockquote>
<p>The functions that are the easiest to understand and reason about are pure functions: functions that always map the same input to the same output and have no other impact on the workspace. In other words, pure functions have no side effects: they don’t affect the state of the world in any way apart from the value they return.</p>
</blockquote>
<p>In fact, these verbs are a special case of pure functions: they take the same flavor of object as input and output. Namely, a data frame or one of the other data receptacles <code>dplyr</code> supports.</p>
<p>And finally, the data is <strong>always</strong> the very first argument of the verb functions.</p>
<p>This set of deliberate design choices, together with the new pipe operator, produces a highly effective, low friction <a href="http://adv-r.had.co.nz/dsl.html">domain-specific language</a> for data analysis.</p>
<p>Go to the next block, <a href="block010_dplyr-end-single-table.html"><code>dplyr</code> functions for a single dataset</a>, for more <code>dplyr</code>!</p>
</div>
<div id="resources" class="section level3">
<h3>Resources</h3>
<p><code>dplyr</code> official stuff</p>
<ul>
<li>package home <a href="http://cran.r-project.org/web/packages/dplyr/index.html">on CRAN</a>
<ul>
<li>note there are several vignettes, with the <a href="http://cran.r-project.org/web/packages/dplyr/vignettes/introduction.html">introduction</a> being the most relevant right now</li>
<li>the <a href="http://cran.rstudio.com/web/packages/dplyr/vignettes/window-functions.html">one on window functions</a> will also be interesting to you now</li>
</ul></li>
<li>development home <a href="https://github.com/hadley/dplyr">on GitHub</a></li>
<li><a href="https://www.dropbox.com/sh/i8qnluwmuieicxc/AAAgt9tIKoIm7WZKIyK25lh6a">tutorial HW delivered</a> (note this links to a DropBox folder) at useR! 2014 conference</li>
</ul>
<p><a href="https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf">RStudio Data Wrangling cheatsheet</a>, covering <code>dplyr</code> and <code>tidyr</code>. Remember you can get to these via <em>Help &gt; Cheatsheets.</em></p>
<p><a href="https://github.com/tjmahr/MadR_Pipelines">Excellent slides</a> on pipelines and <code>dplyr</code> by TJ Mahr, talk given to the Madison R Users Group.</p>
<p>Blog post <a href="http://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/">Hands-on dplyr tutorial for faster data manipulation in R</a> by Data School, that includes a link to an R Markdown document and links to videos</p>
<p><a href="bit001_dplyr-cheatsheet.html">Cheatsheet</a> I made for <code>dplyr</code> join functions (not relevant yet but soon)</p>
</div>

<div class="footer">
This work is licensed under the  <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>.
</div>


</div>

<script>

// add bootstrap table styles to pandoc tables
$(document).ready(function () {
  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
});


</script>

<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>

</body>
</html>