diff --git a/_freeze/html/tidyr/execute-results/html.json b/_freeze/html/tidyr/execute-results/html.json index eb154ad0..126f9bd3 100644 --- a/_freeze/html/tidyr/execute-results/html.json +++ b/_freeze/html/tidyr/execute-results/html.json @@ -1,10 +1,9 @@ { - "hash": "1aed65f140641c9d5873609f41af50a6", + "hash": "0cc3c27eed61c11363d03ec9d32cb939", "result": { - "markdown": "---\ntitle: \"Data tidying with tidyr :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\neditor_options: \n chunk_output_type: console\n---\n\n::: {.cell .column-margin}\n\"Hex\n

\n

Download PDF

\n\"\"/\n
\n

Translations (PDF)

\n* Chinese\n:::\n\n\n\n\n**Tidy data** is a way to organize tabular data in a consistent data structure across packages.\nA table is tidy if:\n\n- Each **variable** is in its own **column**\n- Each **observation**, or **case**, is in its own **row**\n- Access **variables** as **vectors**\n- Preserve **cases** in vectorized operations\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyr)\nlibrary(tibble)\n```\n:::\n\n\n\n\n## Tibbles\n\n### An Enhanced Data Frame\n\nTibbles are a table format provided by the **tibble** package.\nThey inherit the data frame class, but have improved behaviors:\n\n- **Subset** a new tibble with `]`, a vector with `[[` and `$`.\n\n- **No partial matching** when subsetting columns.\n\n- **Display** concise views of the data on one screen.\n\n- `options(tibble.print_max = n, tibble.print_min = m, tibble.width = Inf)`: Control default display settings.\n\n- `View()` or `glimpse()`: View the entire data set.\n\n### Construct a Tibble\n\n- `tibble(...)`: Construct by columns.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n tibble(\n x = 1:3, \n y = c(\"a\", \"b\", \"c\")\n )\n ```\n \n ::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 3 × 2\n x y \n \n 1 1 a \n 2 2 b \n 3 3 c \n ```\n :::\n :::\n\n\n- `tribble(...)`: Construct by rows.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n tribble(\n ~x, ~y,\n 1, \"a\",\n 2, \"b\",\n 3, \"c\"\n )\n ```\n :::\n\n\n- `as_tibble(x, ...)`: Convert a data frame to a tibble.\n\n- `enframe(x, name = \"name\", value = \"value\")`: Convert a named vector to a tibble.\n Also `deframe()`.\n\n- `is_tibble(x)`: Test whether x is a tibble.\n\n## Reshape Data\n\nPivot data to reorganize values into a new layout.\n\n- `pivot_longer(data, cols, name_to = \"name\", values_to = \"value\", values_drop_na = FALSE)`: \"Lengthen\" data by collapsing several columns into two.\n\n - The initial `table4a` looks like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n table4a\n ```\n \n ::: {.cell-output 
.cell-output-stdout}\n ```\n # A tibble: 3 × 3\n country `1999` `2000`\n \n 1 Afghanistan 745 2666\n 2 Brazil 37737 80488\n 3 China 212258 213766\n ```\n :::\n :::\n\n\n - Column names move to a new `names_to` column and values to a new `values_to` column. The output of `pivot_longer()` will look like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n pivot_longer(table4a, cols = 2:3, names_to = \"year\", values_to = \"cases\")\n ```\n \n ::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 6 × 3\n country year cases\n \n 1 Afghanistan 1999 745\n 2 Afghanistan 2000 2666\n 3 Brazil 1999 37737\n 4 Brazil 2000 80488\n 5 China 1999 212258\n 6 China 2000 213766\n ```\n :::\n :::\n\n\n- `pivot_wider(data, name_from = \"name\", values_from = \"value\")`: The inverse of `pivot_longer()`.\n \"Widen\" data by expanding two columns into several.\n\n - The initial `table2` looks like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n table2\n ```\n \n ::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 12 × 4\n country year type count\n \n 1 Afghanistan 1999 cases 745\n 2 Afghanistan 1999 population 19987071\n 3 Afghanistan 2000 cases 2666\n 4 Afghanistan 2000 population 20595360\n 5 Brazil 1999 cases 37737\n 6 Brazil 1999 population 172006362\n 7 Brazil 2000 cases 80488\n 8 Brazil 2000 population 174504898\n 9 China 1999 cases 212258\n 10 China 1999 population 1272915272\n 11 China 2000 cases 213766\n 12 China 2000 population 1280428583\n ```\n :::\n :::\n\n\n - One column provides the new column names, the other the values. 
The output of `pivot_wider()` will look like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n pivot_wider(table2, names_from = type, values_from = count)\n ```\n \n ::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 6 × 4\n country year cases population\n \n 1 Afghanistan 1999 745 19987071\n 2 Afghanistan 2000 2666 20595360\n 3 Brazil 1999 37737 172006362\n 4 Brazil 2000 80488 174504898\n 5 China 1999 212258 1272915272\n 6 China 2000 213766 1280428583\n ```\n :::\n :::\n\n\n## Split Cells\n\nUse these functions to split or combine cells into individual, isolated values.\n\n- `unite(data, col, ..., sep = \"_\", remove = TRUE, na.rm = FALSE)`: Collapse cells across several columns into a single column.\n\n - The initial `table5` looks like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n table5\n ```\n \n ::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 6 × 4\n country century year rate \n \n 1 Afghanistan 19 99 745/19987071 \n 2 Afghanistan 20 00 2666/20595360 \n 3 Brazil 19 99 37737/172006362 \n 4 Brazil 20 00 80488/174504898 \n 5 China 19 99 212258/1272915272\n 6 China 20 00 213766/1280428583\n ```\n :::\n :::\n\n\n - The output of `unite()` will look like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n unite(table5, century, year, col = \"year\", sep = \"\")\n ```\n \n ::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 6 × 3\n country year rate \n \n 1 Afghanistan 1999 745/19987071 \n 2 Afghanistan 2000 2666/20595360 \n 3 Brazil 1999 37737/172006362 \n 4 Brazil 2000 80488/174504898 \n 5 China 1999 212258/1272915272\n 6 China 2000 213766/1280428583\n ```\n :::\n :::\n\n\n- `separate_wider_delim(data, cols, delim, ..., names = NULL, names_sep = NULL, names_repair = \"check unique\", too_few, too_many, cols_remove = TRUE)`: Separate each cell in a column into several columns.\n Also `extract()`.\n\n - The initial `table3` looks like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n table3\n ```\n \n 
::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 6 × 3\n country year rate \n \n 1 Afghanistan 1999 745/19987071 \n 2 Afghanistan 2000 2666/20595360 \n 3 Brazil 1999 37737/172006362 \n 4 Brazil 2000 80488/174504898 \n 5 China 1999 212258/1272915272\n 6 China 2000 213766/1280428583\n ```\n :::\n :::\n\n\n - The output of `separate_wider_delim()` will look like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n separate_wider_delim(table3, rate, delim = \"/\", names = c(\"cases\", \"pop\"))\n ```\n \n ::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 6 × 4\n country year cases pop \n \n 1 Afghanistan 1999 745 19987071 \n 2 Afghanistan 2000 2666 20595360 \n 3 Brazil 1999 37737 172006362 \n 4 Brazil 2000 80488 174504898 \n 5 China 1999 212258 1272915272\n 6 China 2000 213766 1280428583\n ```\n :::\n :::\n\n\n- `separate_longer_delim(data, cols, delim, .., width, keep_empty)`: Separate each cell in a column into several rows.\n\n - The initial `table3` looks like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n table3\n ```\n \n ::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 6 × 3\n country year rate \n \n 1 Afghanistan 1999 745/19987071 \n 2 Afghanistan 2000 2666/20595360 \n 3 Brazil 1999 37737/172006362 \n 4 Brazil 2000 80488/174504898 \n 5 China 1999 212258/1272915272\n 6 China 2000 213766/1280428583\n ```\n :::\n :::\n\n\n - The output of `separate_longer_delim()` will look like the following:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n separate_longer_delim(table3, rate, delim = \"/\")\n ```\n \n ::: {.cell-output .cell-output-stdout}\n ```\n # A tibble: 12 × 3\n country year rate \n \n 1 Afghanistan 1999 745 \n 2 Afghanistan 1999 19987071 \n 3 Afghanistan 2000 2666 \n 4 Afghanistan 2000 20595360 \n 5 Brazil 1999 37737 \n 6 Brazil 1999 172006362 \n 7 Brazil 2000 80488 \n 8 Brazil 2000 174504898 \n 9 China 1999 212258 \n 10 China 1999 1272915272\n 11 China 2000 213766 \n 12 China 2000 1280428583\n ```\n :::\n 
:::\n\n\n## Expand Tables\n\nCreate new combinations of variables or identify implicit missing values (combinations of variables not present in the data).\n\n- `expand(data, ...)`: Create a new tibble with all possible combinations of the values of the variables listed in ... Drop other variables.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n expand(mtcars, cyl, gear, carb)\n ```\n :::\n\n\n- `complete(data, ..., fill = list())`: Add missing possible combinations of values of variables listed in ... Fill remaining variables with NA.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n complete(mtcars, cyl, gear, carb)\n ```\n :::\n\n\n## Handle Missing Values\n\nDrop or replace explicit missing values (`NA`).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- tribble(\n ~x1, ~x2,\n \"A\", 1,\n \"B\", NA,\n \"C\", NA,\n \"D\", 3,\n \"E\", NA\n)\n```\n:::\n\n\n- `drop_na(data, ...)`: Drop rows containing `NA`s in ... columns.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n drop_na(x, x2)\n ```\n :::\n\n\n- `fill(data, ..., .direction = \"down\")`: Fill in `NA`s in ... columns using the next or previous value.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fill(x, x2)\n ```\n :::\n\n\n- `replace_na(data, replace)`: Specify a value to replace `NA` in selected columns.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n replace_na(x, list(x2 = 2))\n ```\n :::\n\n\n\n\n## Nested Data\n\nA **nested data frame** stores individual tables as a list-column of data frames within a larger organizing data frame.\nList-columns can also be lists of vectors or lists of varying data types.\nUse a nested data frame to:\n\n- Preserve relationships between observations and subsets of data. 
Preserve the type of the variables being nested (factors and datetimes aren't coerced to character).\n- Manipulate many sub-tables at once with **purrr** functions like `map()`, `map2()`, or `pmap()` or with **dplyr** `rowwise()` grouping.\n\n### Create Nested Data\n\n- `nest(data, ...)`: Moves groups of cells into a list-column of a data frame. Use alone or with `dplyr::group_by()`.\n\n1. Group the data frame with `group_by()` and use `nest()` to move the groups into a list-column.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms <- storms |>\n group_by(name) |>\n nest()\n ```\n :::\n\n\n2. Use `nest(new_col = c(x,y))` to specify the columns to group using `dplyr::select()` syntax.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms <- storms |>\n nest(data = c(year:long))\n ```\n :::\n\n\n- Index list-columns with `[[]]`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms$data[[1]]\n ```\n :::\n\n\n### Create Tibbles With List-Columns\n\n- `tibble::tribble(...)`: Makes list-columns when needed.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n tribble(\n ~max, ~seq,\n 3, 1:3,\n 4, 1:4,\n 5, 1:5\n )\n ```\n :::\n\n\n- `tibble::tibble(...)`: Saves list input as list-columns.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n tibble(\n max = c(3,4,5),\n seq = list(1:3, 1:4, 1:5)\n )\n ```\n :::\n\n\n- `tibble::enframe(x, name = \"name\", value = \"value\")`: Convert multi-level list to a tibble with list-cols.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n enframe(list(\"3\" = 1:3, \"4\" = 1:4, \"5\" = 1:5), \"max\", \"seq\")\n ```\n :::\n\n\n### Output List-Columns From Other Functions\n\n- `dplyr::mutate()`, `transmute()`, and `summarise()` will output list-columns if they return a list.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n mtcars |>\n group_by(cyl) |>\n summarise(q = list(quantile(mpg)))\n ```\n :::\n\n\n### Reshape Nested Data\n\n- `unnest(data, cols, ..., keep_empty = FALSE)`: Flatten nested columns back to regular columns.\n The inverse of `nest()`.\n\n\n ::: 
{.cell}\n \n ```{.r .cell-code}\n n_storms |> unnest(data)\n ```\n :::\n\n\n- `unnest_longer(data, col, values_to = NULL, indices_to = NULL)`: Turn each element of a list-column into a row.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n select(name, films) |>\n unnest_longer(films)\n ```\n :::\n\n\n- `unnest_wider(data, col)`: Turn each element of a list-column into a regular column.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n select(name, films) |>\n unnest_wider(films, names_sep = \"_\")\n ```\n :::\n\n\n- `hoist(.data, .col, ..., remove = TRUE)`: Selectively pull list components out into their own top-level columns.\n Uses `purrr::pluck()` syntax for selecting from lists.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n select(name, films) |>\n hoist(films, first_film = 1, second_film = 2)\n ```\n :::\n\n\n### Transform Nested Data\n\nA vectorized function takes a vector, transforms each element in parallel, and returns a vector of the same length.\nBy themselves vectorized functions cannot work with lists, such as list-columns.\n\n- `dplyr::rowwise(.data, ...)`: Group data so that each row is one group, and within the groups, elements of list-columns appear directly (accessed with `[[`), not as lists of length one.\n **When you use rowwise(), dplyr functions will seem to apply functions to list-columns in a vectorized fashion.**\n\n- Apply a function to a list-column and **create a new list-column.** In this example, `dim()` returns two values per row and so is wrapped with `list()` to tell `mutate()` to create a list-column.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms |>\n rowwise() |>\n mutate(n = list(dim(data))) # dim() returns two values per row, wrap with list to tell mutate to create a list-column\n ```\n :::\n\n\n- Apply a function to a list-column and **create a regular column.** In this example, `nrow()` returns one integer per row.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms |>\n rowwise() |>\n 
mutate(n = nrow(data)) # nrow() returns one integer per row\n ```\n :::\n\n\n- Collapse **multiple list-columns** into a single list-column.\n In this example, `append()` returns a list for each row, so col type must be list.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n rowwise() |>\n mutate(transport = list(append(vehicles, starships))) # append() returns a list for each row, so col type must be list\n ```\n :::\n\n\n- Apply a function to **multiple list-columns.** In this example, `length()` returns one integer per row.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n rowwise() |>\n mutate(n_transports = length(c(vehicles, starships)))\n # length() returns one integer per row\n ```\n :::\n\n\n- See **purrr** package for more list functions.\n\n------------------------------------------------------------------------\n\nCC BY SA Posit Software, PBC • [info\\@posit.co](mailto:info@posit.co) • [posit.co](https://posit.co)\n\nLearn more at [tidyr.tidyverse.org](https://tidyr.tidyverse.org).\n\nUpdated: 2023-07.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npackageVersion(\"tidyr\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] '1.3.0'\n```\n:::\n\n```{.r .cell-code}\npackageVersion(\"tibble\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] '3.2.1'\n```\n:::\n:::\n\n\n------------------------------------------------------------------------\n", - "supporting": [ - "tidyr_files" - ], + "engine": "knitr", + "markdown": "---\ntitle: \"Data tidying with tidyr :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\neditor_options: \n chunk_output_type: console\n---\n\n::: {.cell .column-margin}\n\"Hex\n

\n

Download PDF

\n\"\"/\n
\n

Translations (PDF)

\n* Chinese\n* Portuguese\n:::\n\n\n\n\n\n**Tidy data** is a way to organize tabular data in a consistent data structure across packages.\nA table is tidy if:\n\n- Each **variable** is in its own **column**\n- Each **observation**, or **case**, is in its own **row**\n- Access **variables** as **vectors**\n- Preserve **cases** in vectorized operations\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyr)\nlibrary(tibble)\n```\n:::\n\n\n\n\n\n## Tibbles\n\n### An Enhanced Data Frame\n\nTibbles are a table format provided by the **tibble** package.\nThey inherit the data frame class, but have improved behaviors:\n\n- **Subset** a new tibble with `]`, a vector with `[[` and `$`.\n\n- **No partial matching** when subsetting columns.\n\n- **Display** concise views of the data on one screen.\n\n- `options(tibble.print_max = n, tibble.print_min = m, tibble.width = Inf)`: Control default display settings.\n\n- `View()` or `glimpse()`: View the entire data set.\n\n### Construct a Tibble\n\n- `tibble(...)`: Construct by columns.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n tibble(\n x = 1:3, \n y = c(\"a\", \"b\", \"c\")\n )\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 3 × 2\n x y \n \n 1 1 a \n 2 2 b \n 3 3 c \n ```\n \n \n :::\n :::\n\n\n\n- `tribble(...)`: Construct by rows.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n tribble(\n ~x, ~y,\n 1, \"a\",\n 2, \"b\",\n 3, \"c\"\n )\n ```\n :::\n\n\n\n- `as_tibble(x, ...)`: Convert a data frame to a tibble.\n\n- `enframe(x, name = \"name\", value = \"value\")`: Convert a named vector to a tibble.\n Also `deframe()`.\n\n- `is_tibble(x)`: Test whether x is a tibble.\n\n## Reshape Data\n\nPivot data to reorganize values into a new layout.\n\n- `pivot_longer(data, cols, name_to = \"name\", values_to = \"value\", values_drop_na = FALSE)`: \"Lengthen\" data by collapsing several columns into two.\n\n - The initial `table4a` looks like the following:\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n 
table4a\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 3 × 3\n country `1999` `2000`\n \n 1 Afghanistan 745 2666\n 2 Brazil 37737 80488\n 3 China 212258 213766\n ```\n \n \n :::\n :::\n\n\n\n - Column names move to a new `names_to` column and values to a new `values_to` column. The output of `pivot_longer()` will look like the following:\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n pivot_longer(table4a, cols = 2:3, names_to = \"year\", values_to = \"cases\")\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 6 × 3\n country year cases\n \n 1 Afghanistan 1999 745\n 2 Afghanistan 2000 2666\n 3 Brazil 1999 37737\n 4 Brazil 2000 80488\n 5 China 1999 212258\n 6 China 2000 213766\n ```\n \n \n :::\n :::\n\n\n\n- `pivot_wider(data, name_from = \"name\", values_from = \"value\")`: The inverse of `pivot_longer()`.\n \"Widen\" data by expanding two columns into several.\n\n - The initial `table2` looks like the following:\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n table2\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 12 × 4\n country year type count\n \n 1 Afghanistan 1999 cases 745\n 2 Afghanistan 1999 population 19987071\n 3 Afghanistan 2000 cases 2666\n 4 Afghanistan 2000 population 20595360\n 5 Brazil 1999 cases 37737\n 6 Brazil 1999 population 172006362\n 7 Brazil 2000 cases 80488\n 8 Brazil 2000 population 174504898\n 9 China 1999 cases 212258\n 10 China 1999 population 1272915272\n 11 China 2000 cases 213766\n 12 China 2000 population 1280428583\n ```\n \n \n :::\n :::\n\n\n\n - One column provides the new column names, the other the values. 
The output of `pivot_wider()` will look like the following:\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n pivot_wider(table2, names_from = type, values_from = count)\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 6 × 4\n country year cases population\n \n 1 Afghanistan 1999 745 19987071\n 2 Afghanistan 2000 2666 20595360\n 3 Brazil 1999 37737 172006362\n 4 Brazil 2000 80488 174504898\n 5 China 1999 212258 1272915272\n 6 China 2000 213766 1280428583\n ```\n \n \n :::\n :::\n\n\n\n## Split Cells\n\nUse these functions to split or combine cells into individual, isolated values.\n\n- `unite(data, col, ..., sep = \"_\", remove = TRUE, na.rm = FALSE)`: Collapse cells across several columns into a single column.\n\n - The initial `table5` looks like the following:\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n table5\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 6 × 4\n country century year rate \n \n 1 Afghanistan 19 99 745/19987071 \n 2 Afghanistan 20 00 2666/20595360 \n 3 Brazil 19 99 37737/172006362 \n 4 Brazil 20 00 80488/174504898 \n 5 China 19 99 212258/1272915272\n 6 China 20 00 213766/1280428583\n ```\n \n \n :::\n :::\n\n\n\n - The output of `unite()` will look like the following:\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n unite(table5, century, year, col = \"year\", sep = \"\")\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 6 × 3\n country year rate \n \n 1 Afghanistan 1999 745/19987071 \n 2 Afghanistan 2000 2666/20595360 \n 3 Brazil 1999 37737/172006362 \n 4 Brazil 2000 80488/174504898 \n 5 China 1999 212258/1272915272\n 6 China 2000 213766/1280428583\n ```\n \n \n :::\n :::\n\n\n\n- `separate_wider_delim(data, cols, delim, ..., names = NULL, names_sep = NULL, names_repair = \"check unique\", too_few, too_many, cols_remove = TRUE)`: Separate each cell in a column into several columns.\n Also `extract()`.\n\n - The initial `table3` looks like the following:\n\n\n\n ::: {.cell}\n \n 
```{.r .cell-code}\n table3\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 6 × 3\n country year rate \n \n 1 Afghanistan 1999 745/19987071 \n 2 Afghanistan 2000 2666/20595360 \n 3 Brazil 1999 37737/172006362 \n 4 Brazil 2000 80488/174504898 \n 5 China 1999 212258/1272915272\n 6 China 2000 213766/1280428583\n ```\n \n \n :::\n :::\n\n\n\n - The output of `separate_wider_delim()` will look like the following:\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n separate_wider_delim(table3, rate, delim = \"/\", names = c(\"cases\", \"pop\"))\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 6 × 4\n country year cases pop \n \n 1 Afghanistan 1999 745 19987071 \n 2 Afghanistan 2000 2666 20595360 \n 3 Brazil 1999 37737 172006362 \n 4 Brazil 2000 80488 174504898 \n 5 China 1999 212258 1272915272\n 6 China 2000 213766 1280428583\n ```\n \n \n :::\n :::\n\n\n\n- `separate_longer_delim(data, cols, delim, .., width, keep_empty)`: Separate each cell in a column into several rows.\n\n - The initial `table3` looks like the following:\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n table3\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 6 × 3\n country year rate \n \n 1 Afghanistan 1999 745/19987071 \n 2 Afghanistan 2000 2666/20595360 \n 3 Brazil 1999 37737/172006362 \n 4 Brazil 2000 80488/174504898 \n 5 China 1999 212258/1272915272\n 6 China 2000 213766/1280428583\n ```\n \n \n :::\n :::\n\n\n\n - The output of `separate_longer_delim()` will look like the following:\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n separate_longer_delim(table3, rate, delim = \"/\")\n ```\n \n ::: {.cell-output .cell-output-stdout}\n \n ```\n # A tibble: 12 × 3\n country year rate \n \n 1 Afghanistan 1999 745 \n 2 Afghanistan 1999 19987071 \n 3 Afghanistan 2000 2666 \n 4 Afghanistan 2000 20595360 \n 5 Brazil 1999 37737 \n 6 Brazil 1999 172006362 \n 7 Brazil 2000 80488 \n 8 Brazil 2000 174504898 \n 9 China 1999 212258 \n 10 China 1999 
1272915272\n 11 China 2000 213766 \n 12 China 2000 1280428583\n ```\n \n \n :::\n :::\n\n\n\n## Expand Tables\n\nCreate new combinations of variables or identify implicit missing values (combinations of variables not present in the data).\n\n- `expand(data, ...)`: Create a new tibble with all possible combinations of the values of the variables listed in ...\n Drop other variables.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n expand(mtcars, cyl, gear, carb)\n ```\n :::\n\n\n\n- `complete(data, ..., fill = list())`: Add missing possible combinations of values of variables listed in ...\n Fill remaining variables with NA.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n complete(mtcars, cyl, gear, carb)\n ```\n :::\n\n\n\n## Handle Missing Values\n\nDrop or replace explicit missing values (`NA`).\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- tribble(\n ~x1, ~x2,\n \"A\", 1,\n \"B\", NA,\n \"C\", NA,\n \"D\", 3,\n \"E\", NA\n)\n```\n:::\n\n\n\n- `drop_na(data, ...)`: Drop rows containing `NA`s in ...\n columns.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n drop_na(x, x2)\n ```\n :::\n\n\n\n- `fill(data, ..., .direction = \"down\")`: Fill in `NA`s in ...\n columns using the next or previous value.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fill(x, x2)\n ```\n :::\n\n\n\n- `replace_na(data, replace)`: Specify a value to replace `NA` in selected columns.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n replace_na(x, list(x2 = 2))\n ```\n :::\n\n\n\n\n\n## Nested Data\n\nA **nested data frame** stores individual tables as a list-column of data frames within a larger organizing data frame.\nList-columns can also be lists of vectors or lists of varying data types.\nUse a nested data frame to:\n\n- Preserve relationships between observations and subsets of data. 
Preserve the type of the variables being nested (factors and datetimes aren't coerced to character).\n- Manipulate many sub-tables at once with **purrr** functions like `map()`, `map2()`, or `pmap()` or with **dplyr** `rowwise()` grouping.\n\n### Create Nested Data\n\n- `nest(data, ...)`: Moves groups of cells into a list-column of a data frame. Use alone or with `dplyr::group_by()`.\n\n1. Group the data frame with `group_by()` and use `nest()` to move the groups into a list-column.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms <- storms |>\n group_by(name) |>\n nest()\n ```\n :::\n\n\n\n2. Use `nest(new_col = c(x,y))` to specify the columns to group using `dplyr::select()` syntax.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms <- storms |>\n nest(data = c(year:long))\n ```\n :::\n\n\n\n- Index list-columns with `[[]]`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms$data[[1]]\n ```\n :::\n\n\n\n### Create Tibbles With List-Columns\n\n- `tibble::tribble(...)`: Makes list-columns when needed.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n tribble(\n ~max, ~seq,\n 3, 1:3,\n 4, 1:4,\n 5, 1:5\n )\n ```\n :::\n\n\n\n- `tibble::tibble(...)`: Saves list input as list-columns.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n tibble(\n max = c(3,4,5),\n seq = list(1:3, 1:4, 1:5)\n )\n ```\n :::\n\n\n\n- `tibble::enframe(x, name = \"name\", value = \"value\")`: Convert multi-level list to a tibble with list-cols.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n enframe(list(\"3\" = 1:3, \"4\" = 1:4, \"5\" = 1:5), \"max\", \"seq\")\n ```\n :::\n\n\n\n### Output List-Columns From Other Functions\n\n- `dplyr::mutate()`, `transmute()`, and `summarise()` will output list-columns if they return a list.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n mtcars |>\n group_by(cyl) |>\n summarise(q = list(quantile(mpg)))\n ```\n :::\n\n\n\n### Reshape Nested Data\n\n- `unnest(data, cols, ..., keep_empty = FALSE)`: Flatten nested columns back to regular columns.\n The 
inverse of `nest()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms |> unnest(data)\n ```\n :::\n\n\n\n- `unnest_longer(data, col, values_to = NULL, indices_to = NULL)`: Turn each element of a list-column into a row.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n select(name, films) |>\n unnest_longer(films)\n ```\n :::\n\n\n\n- `unnest_wider(data, col)`: Turn each element of a list-column into a regular column.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n select(name, films) |>\n unnest_wider(films, names_sep = \"_\")\n ```\n :::\n\n\n\n- `hoist(.data, .col, ..., remove = TRUE)`: Selectively pull list components out into their own top-level columns.\n Uses `purrr::pluck()` syntax for selecting from lists.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n select(name, films) |>\n hoist(films, first_film = 1, second_film = 2)\n ```\n :::\n\n\n\n### Transform Nested Data\n\nA vectorized function takes a vector, transforms each element in parallel, and returns a vector of the same length.\nBy themselves vectorized functions cannot work with lists, such as list-columns.\n\n- `dplyr::rowwise(.data, ...)`: Group data so that each row is one group, and within the groups, elements of list-columns appear directly (accessed with `[[`), not as lists of length one.\n **When you use rowwise(), dplyr functions will seem to apply functions to list-columns in a vectorized fashion.**\n\n- Apply a function to a list-column and **create a new list-column.** In this example, `dim()` returns two values per row and so is wrapped with `list()` to tell `mutate()` to create a list-column.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n n_storms |>\n rowwise() |>\n mutate(n = list(dim(data))) # dim() returns two values per row, wrap with list to tell mutate to create a list-column\n ```\n :::\n\n\n\n- Apply a function to a list-column and **create a regular column.** In this example, `nrow()` returns one integer per row.\n\n\n\n ::: {.cell}\n \n 
```{.r .cell-code}\n n_storms |>\n rowwise() |>\n mutate(n = nrow(data)) # nrow() returns one integer per row\n ```\n :::\n\n\n\n- Collapse **multiple list-columns** into a single list-column.\n In this example, `append()` returns a list for each row, so col type must be list.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n rowwise() |>\n mutate(transport = list(append(vehicles, starships))) # append() returns a list for each row, so col type must be list\n ```\n :::\n\n\n\n- Apply a function to **multiple list-columns.** In this example, `length()` returns one integer per row.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n starwars |>\n rowwise() |>\n mutate(n_transports = length(c(vehicles, starships)))\n # length() returns one integer per row\n ```\n :::\n\n\n\n- See **purrr** package for more list functions.\n\n------------------------------------------------------------------------\n\nCC BY SA Posit Software, PBC • [info\\@posit.co](mailto:info@posit.co) • [posit.co](https://posit.co)\n\nLearn more at [tidyr.tidyverse.org](https://tidyr.tidyverse.org).\n\nUpdated: 2024-05.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\npackageVersion(\"tidyr\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] '1.3.1'\n```\n\n\n:::\n\n```{.r .cell-code}\npackageVersion(\"tibble\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] '3.2.1'\n```\n\n\n:::\n:::\n\n\n\n------------------------------------------------------------------------\n", + "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/html/images/logo-tidyr.png b/html/images/logo-tidyr.png index d33876dd..247aa623 100644 Binary files a/html/images/logo-tidyr.png and b/html/images/logo-tidyr.png differ diff --git a/html/tidyr.qmd b/html/tidyr.qmd index b2764108..4b506d06 100644 --- a/html/tidyr.qmd +++ b/html/tidyr.qmd @@ -196,13 +196,15 @@ Use these functions to split or combine cells into individual, isolated values. 
Create new combinations of variables or identify implicit missing values (combinations of variables not present in the data). -- `expand(data, ...)`: Create a new tibble with all possible combinations of the values of the variables listed in ... Drop other variables. +- `expand(data, ...)`: Create a new tibble with all possible combinations of the values of the variables listed in ... + Drop other variables. ```{r} expand(mtcars, cyl, gear, carb) ``` -- `complete(data, ..., fill = list())`: Add missing possible combinations of values of variables listed in ... Fill remaining variables with NA. +- `complete(data, ..., fill = list())`: Add missing possible combinations of values of variables listed in ... + Fill remaining variables with NA. ```{r} complete(mtcars, cyl, gear, carb) @@ -223,13 +225,15 @@ x <- tribble( ) ``` -- `drop_na(data, ...)`: Drop rows containing `NA`s in ... columns. +- `drop_na(data, ...)`: Drop rows containing `NA`s in ... + columns. ```{r} drop_na(x, x2) ``` -- `fill(data, ..., .direction = "down")`: Fill in `NA`s in ... columns using the next or previous value. +- `fill(data, ..., .direction = "down")`: Fill in `NA`s in ... + columns using the next or previous value. ```{r} fill(x, x2) diff --git a/keynotes/tidyr.key b/keynotes/tidyr.key index 67f54825..4298b0a4 100644 Binary files a/keynotes/tidyr.key and b/keynotes/tidyr.key differ diff --git a/pngs/tidyr.png b/pngs/tidyr.png index 7afaf200..e66fe14b 100644 Binary files a/pngs/tidyr.png and b/pngs/tidyr.png differ diff --git a/tidyr.pdf b/tidyr.pdf index db1c3eea..a79a6e64 100644 Binary files a/tidyr.pdf and b/tidyr.pdf differ