Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
No content update -- just check all still works
1 parent 3b02997 commit e6596a1
Showing 6 changed files with 2 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,8 @@ | ||
{ | ||
"hash": "49601350c49c834abf551b0a9230b77d", | ||
"result": { | ||
"markdown": "---\ntitle: \"Factors with forcats :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\n---\n\n::: {.cell .column-margin}\n<img src=\"images/logo-forcats.png\" height=\"138\" alt=\"Hex logo for forcats - drawing of four black cats lounging in a cardboard box. On one side of the box it says 'for' and on the adjacent side is says 'cats'.\" />\n<br><br><a href=\"../factors.pdf\">\n<p><i class=\"bi bi-file-pdf\"></i> Download PDF</p>\n<img src=\"../pngs/factors.png\" width=\"200\" alt=\"\"/>\n</a>\n<br><br><p>Translations (PDF)</p>\n* <a href=\"../translations/japanese/factors_ja.pdf\"><i class=\"bi bi-file-pdf\"></i>Japanese</a>\n* <a href=\"../translations/spanish/factors_es.pdf\"><i class=\"bi bi-file-pdf\"></i>Spanish</a>\n:::\n\n\nThe **forcats** package provides tools for working with factors, which are R's data structure for categorical data.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(forcats)\n```\n:::\n\n\n\n\n## Factors\n\nR represents categorical data with factors.\nA **factor** is an integer vector with a **levels** attribute that stores a set of mappings between integers and categorical values.\nWhen you view a factor, R displays not the integers but the levels associated with them.\n\nFor example, R will display `c(\"a\", \"c\", \"b\", \"a\")` with levels `c(\"a\", \"b\", \"c\")` but will store `c(1, 3, 2, 1)` where 1 = a, 2 = b, and 3 = c.\n\nR will display:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\n[1] a c b a\nLevels: a b c\n```\n:::\n:::\n\n\nR will store:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 3 2 1\nattr(,\"levels\")\n[1] \"a\" \"b\" \"c\"\n```\n:::\n:::\n\n\nCreate a factor with `factor()`:\n\n- `factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)`: Convert a vector to a factor.\n Also `as_factor()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f <- factor(c(\"a\", \"c\", \"b\", 
\"a\"), levels = c(\"a\", \"b\", \"c\"))\n ```\n :::\n\n\nReturn its levels with `levels()`:\n\n- `levels(x)`: Return/set the levels of a factor.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n levels(f)\n levels(f) <- c(\"x\", \"y\", \"z\")\n ```\n :::\n\n\nUse `unclass()` to see its structure.\n\n## Inspect Factors\n\n- `fct_count(f, sort = FALSE, prop = FALSE)`: Count the number of values with each level.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_count(f)\n ```\n :::\n\n\n- `fct_match(f, lvls)`: Check for `lvls` in `f`.\n\n\n \n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_match(f, \"a\")\n ```\n :::\n\n\n- `fct_unique(f)`: Return the unique values, removing duplicates.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_unique(f)\n ```\n :::\n\n\n## Combine Factors\n\n- `fct_c(...)`: Combine factors with different levels.\n Also `fct_cross()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f1 <- factor(c(\"a\", \"c\"))\n f2 <- factor(c(\"b\", \"a\"))\n fct_c(f1, f2)\n ```\n :::\n\n\n- `fct_unify(fs, levels = lvls_union(fs))`: Standardize levels across a list of factors.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_unify(list(f2, f1))\n ```\n :::\n\n\n## Change the order of levels\n\n- `fct_relevel(.f, ..., after = 0L)`: Manually reorder factor levels.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_relevel(f, c(\"b\", \"c\", \"a\"))\n ```\n :::\n\n\n- `fct_infreq(f, ordered = NA)`: Reorder levels by the frequency in which they appear in the data (highest frequency first).\n Also `fct_inseq()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f3 <- factor(c(\"c\", \"c\", \"a\"))\n fct_infreq(f3)\n ```\n :::\n\n\n- `fct_inorder(f, ordered = NA)`: Reorder levels by order in which they appear in the data.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_inorder(f2)\n ```\n :::\n\n\n- `fct_rev(f)`: Reverse level order.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f4 <- factor(c(\"a\",\"b\",\"c\"))\n fct_rev(f4)\n ```\n :::\n\n\n- `fct_shift(f)`: Shift levels to left or right, 
wrapping around end.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_shift(f4)\n ```\n :::\n\n\n- `fct_shuffle(f, n = 1L)`: Randomly permute order of factor levels.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_shuffle(f4)\n ```\n :::\n\n\n- `fct_reorder(.f, .x, .fun = median, ..., .desc = FALSE)`: Reorder levels by their relationship with another variable.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n boxplot(PlantGrowth, weight ~ fct_reorder(group, weight))\n ```\n :::\n\n\n- `fct_reorder2(.f, .x, .y, .fun = last2, ..., .desc = TRUE)`: Reorder levels by their final values when plotted with two other variables.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n ggplot(\n diamonds,\n aes(carat, price, color = fct_reorder2(color, carat, price))\n ) + \n geom_smooth()\n ```\n :::\n\n\n## Change the value of levels\n\n- `fct_recode(.f, ...)`: Manually change levels.\n Also `fct_relabel()` which obeys `purrr::map` syntax to apply a function or expression to each level.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_recode(f, v = \"a\", x = \"b\", z = \"c\")\n fct_relabel(f, ~ paste0(\"x\", .x))\n ```\n :::\n\n\n- `fct_anon(f, prefix = \"\")`: Anonymize levels with random integers.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_anon(f)\n ```\n :::\n\n\n- `fct_collapse(.f, …, other_level = NULL)`: Collapse levels into manually defined groups.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_collapse(f, x = c(\"a\", \"b\"))\n ```\n :::\n\n\n- `fct_lump_min(f, min, w = NULL, other_level = \"Other\")`: Lumps together factors that appear fewer than `min` times.\n Also `fct_lump_n()`, `fct_lump_prop()`, and `fct_lump_lowfreq()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_lump_min(f, min = 2)\n ```\n :::\n\n\n- `fct_other(f, keep, drop, other_level = \"Other\")`: Replace levels with \"other.\"\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_other(f, keep = c(\"a\", \"b\"))\n ```\n :::\n\n\n## Add or drop levels\n\n- `fct_drop(f, only)`: Drop unused levels.\n\n\n ::: {.cell}\n 
\n ```{.r .cell-code}\n f5 <- factor(c(\"a\",\"b\"),c(\"a\",\"b\",\"x\"))\n f6 <- fct_drop(f5)\n ```\n :::\n\n\n- `fct_expand(f, ...)`: Add levels to a factor.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_expand(f6, \"x\")\n ```\n :::\n\n\n- `fct_na_value_to_level(f, level = \"(Missing)\")`: Assigns a level to NAs to ensure they appear in plots, etc.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f <- factor(c(\"a\", \"b\", NA))\n fct_na_value_to_level(f, level = \"(Missing)\")\n ```\n :::\n\n\n------------------------------------------------------------------------\n\nCC BY SA Posit Software, PBC • [info\\@posit.co](mailto:info@posit.co) • [posit.co](https://posit.co)\n\nLearn more at [forcats.tidyverse.org](https://forcats.tidyverse.org).\n\nUpdated: 2023-06.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npackageVersion(\"forcats\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] '1.0.0'\n```\n:::\n:::\n\n\n------------------------------------------------------------------------\n", | ||
"engine": "knitr", | ||
"markdown": "---\ntitle: \"Factors with forcats :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\n---\n\n::: {.cell .column-margin}\n<img src=\"images/logo-forcats.png\" height=\"138\" alt=\"Hex logo for forcats - drawing of four black cats lounging in a cardboard box. On one side of the box it says 'for' and on the adjacent side is says 'cats'.\" />\n<br><br><a href=\"../factors.pdf\">\n<p><i class=\"bi bi-file-pdf\"></i> Download PDF</p>\n<img src=\"../pngs/factors.png\" width=\"200\" alt=\"\"/>\n</a>\n<br><br><p>Translations (PDF)</p>\n* <a href=\"../translations/japanese/factors_ja.pdf\"><i class=\"bi bi-file-pdf\"></i>Japanese</a>\n* <a href=\"../translations/portuguese/factors_pt_br.pdf\"><i class=\"bi bi-file-pdf\"></i>Portuguese</a>\n* <a href=\"../translations/spanish/factors_es.pdf\"><i class=\"bi bi-file-pdf\"></i>Spanish</a>\n:::\n\n\n\nThe **forcats** package provides tools for working with factors, which are R's data structure for categorical data.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(forcats)\n```\n:::\n\n\n\n\n\n## Factors\n\nR represents categorical data with factors.\nA **factor** is an integer vector with a **levels** attribute that stores a set of mappings between integers and categorical values.\nWhen you view a factor, R displays not the integers but the levels associated with them.\n\nFor example, R will display `c(\"a\", \"c\", \"b\", \"a\")` with levels `c(\"a\", \"b\", \"c\")` but will store `c(1, 3, 2, 1)` where 1 = a, 2 = b, and 3 = c.\n\nR will display:\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] a c b a\nLevels: a b c\n```\n\n\n:::\n:::\n\n\n\nR will store:\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 1 3 2 1\nattr(,\"levels\")\n[1] \"a\" \"b\" \"c\"\n```\n\n\n:::\n:::\n\n\n\nCreate a factor with `factor()`:\n\n- `factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = 
NA)`: Convert a vector to a factor.\n Also `as_factor()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f <- factor(c(\"a\", \"c\", \"b\", \"a\"), levels = c(\"a\", \"b\", \"c\"))\n ```\n :::\n\n\n\nReturn its levels with `levels()`:\n\n- `levels(x)`: Return/set the levels of a factor.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n levels(f)\n levels(f) <- c(\"x\", \"y\", \"z\")\n ```\n :::\n\n\n\nUse `unclass()` to see its structure.\n\n## Inspect Factors\n\n- `fct_count(f, sort = FALSE, prop = FALSE)`: Count the number of values with each level.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_count(f)\n ```\n :::\n\n\n\n- `fct_match(f, lvls)`: Check for `lvls` in `f`.\n\n\n\n \n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_match(f, \"a\")\n ```\n :::\n\n\n\n- `fct_unique(f)`: Return the unique values, removing duplicates.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_unique(f)\n ```\n :::\n\n\n\n## Combine Factors\n\n- `fct_c(...)`: Combine factors with different levels.\n Also `fct_cross()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f1 <- factor(c(\"a\", \"c\"))\n f2 <- factor(c(\"b\", \"a\"))\n fct_c(f1, f2)\n ```\n :::\n\n\n\n- `fct_unify(fs, levels = lvls_union(fs))`: Standardize levels across a list of factors.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_unify(list(f2, f1))\n ```\n :::\n\n\n\n## Change the order of levels\n\n- `fct_relevel(.f, ..., after = 0L)`: Manually reorder factor levels.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_relevel(f, c(\"b\", \"c\", \"a\"))\n ```\n :::\n\n\n\n- `fct_infreq(f, ordered = NA)`: Reorder levels by the frequency in which they appear in the data (highest frequency first).\n Also `fct_inseq()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f3 <- factor(c(\"c\", \"c\", \"a\"))\n fct_infreq(f3)\n ```\n :::\n\n\n\n- `fct_inorder(f, ordered = NA)`: Reorder levels by order in which they appear in the data.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_inorder(f2)\n ```\n :::\n\n\n\n- `fct_rev(f)`: 
Reverse level order.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f4 <- factor(c(\"a\",\"b\",\"c\"))\n fct_rev(f4)\n ```\n :::\n\n\n\n- `fct_shift(f)`: Shift levels to left or right, wrapping around end.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_shift(f4)\n ```\n :::\n\n\n\n- `fct_shuffle(f, n = 1L)`: Randomly permute order of factor levels.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_shuffle(f4)\n ```\n :::\n\n\n\n- `fct_reorder(.f, .x, .fun = median, ..., .desc = FALSE)`: Reorder levels by their relationship with another variable.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n boxplot(PlantGrowth, weight ~ fct_reorder(group, weight))\n ```\n :::\n\n\n\n- `fct_reorder2(.f, .x, .y, .fun = last2, ..., .desc = TRUE)`: Reorder levels by their final values when plotted with two other variables.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n ggplot(\n diamonds,\n aes(carat, price, color = fct_reorder2(color, carat, price))\n ) + \n geom_smooth()\n ```\n :::\n\n\n\n## Change the value of levels\n\n- `fct_recode(.f, ...)`: Manually change levels.\n Also `fct_relabel()` which obeys `purrr::map` syntax to apply a function or expression to each level.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_recode(f, v = \"a\", x = \"b\", z = \"c\")\n fct_relabel(f, ~ paste0(\"x\", .x))\n ```\n :::\n\n\n\n- `fct_anon(f, prefix = \"\")`: Anonymize levels with random integers.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_anon(f)\n ```\n :::\n\n\n\n- `fct_collapse(.f, …, other_level = NULL)`: Collapse levels into manually defined groups.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_collapse(f, x = c(\"a\", \"b\"))\n ```\n :::\n\n\n\n- `fct_lump_min(f, min, w = NULL, other_level = \"Other\")`: Lumps together factors that appear fewer than `min` times.\n Also `fct_lump_n()`, `fct_lump_prop()`, and `fct_lump_lowfreq()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_lump_min(f, min = 2)\n ```\n :::\n\n\n\n- `fct_other(f, keep, drop, other_level = \"Other\")`: 
Replace levels with \"other.\"\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_other(f, keep = c(\"a\", \"b\"))\n ```\n :::\n\n\n\n## Add or drop levels\n\n- `fct_drop(f, only)`: Drop unused levels.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f5 <- factor(c(\"a\",\"b\"),c(\"a\",\"b\",\"x\"))\n f6 <- fct_drop(f5)\n ```\n :::\n\n\n\n- `fct_expand(f, ...)`: Add levels to a factor.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fct_expand(f6, \"x\")\n ```\n :::\n\n\n\n- `fct_na_value_to_level(f, level = \"(Missing)\")`: Assigns a level to NAs to ensure they appear in plots, etc.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n f <- factor(c(\"a\", \"b\", NA))\n fct_na_value_to_level(f, level = \"(Missing)\")\n ```\n :::\n\n\n\n------------------------------------------------------------------------\n\nCC BY SA Posit Software, PBC • [info\\@posit.co](mailto:info@posit.co) • [posit.co](https://posit.co)\n\nLearn more at [forcats.tidyverse.org](https://forcats.tidyverse.org).\n\nUpdated: 2024-05.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\npackageVersion(\"forcats\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] '1.0.0'\n```\n\n\n:::\n:::\n\n\n\n------------------------------------------------------------------------\n", | ||
"supporting": [ | ||
"factors_files" | ||
], | ||
|
Binary file not shown.
Binary file not shown.