diff --git a/_freeze/html/strings/execute-results/html.json b/_freeze/html/strings/execute-results/html.json index 4b33ceb8..6df85258 100644 --- a/_freeze/html/strings/execute-results/html.json +++ b/_freeze/html/strings/execute-results/html.json @@ -1,7 +1,8 @@ { - "hash": "fd17170df9e3d847b87aa6127d940857", + "hash": "35c0bcda724eb95fdf834959da26b260", "result": { - "markdown": "---\ntitle: \"String manipulation with stringr :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\n---\n\n::: {.cell .column-margin}\n\"Hex\n

\n

Download PDF

\n\"\"/\n
\n

Translations (PDF)

\n* Spanish\n* Vietnamese\n:::\n\n\n\n\nThe **stringr** package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(stringr)\n```\n:::\n\n\n\n\n## Detect Matches\n\n- `str_detect(string, pattern, negate = FALSE)`: Detect the presence of a pattern match in a string.\n Also `str_like()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str(fruit, \"a\")\n ```\n :::\n\n\n- `str_starts(string, pattern, negate = FALSE)`: Detect the presence of a pattern match at the beginning of a string.\n Also `str_ends()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_starts(fruit, \"a\")\n ```\n :::\n\n\n- `str_which(string, pattern, negate = FALSE)`: Find the indexes of strings that contain a pattern match.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_which(fruit, \"a\")\n ```\n :::\n\n\n- `str_locate(string, pattern)`: Locate the positions of pattern matches in a string.\n Also `str_locate_all()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_locate(fruit, \"a\")\n ```\n :::\n\n\n- `str_count(string, pattern)`: Count the number of matches in a string.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_count(fruit, \"a\")\n ```\n :::\n\n\n## Mutate Strings\n\n- `str_sub() <- value`: Replace substrings by identifying the substrings with `str_sub()` and assigning into the results.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_sub(fruit, 1, 3) <- \"str\"\n ```\n :::\n\n\n- `str_replace(string, pattern, replacement)`: Replace the first matched pattern in each string.\n Also `str_remove()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_replace(fruit, \"p\", \"-\")\n ```\n :::\n\n\n- `str_replace_all(string, pattern, replacement)`: Replace all matched patterns in each string.\n Also `str_remove_all()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_replace_all(fruit, \"p\", \"-\")\n ```\n :::\n\n\n- `str_to_lower(string, locale = \"en\")`^1^: 
Convert strings to lower case.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_to_lower(sentences)\n ```\n :::\n\n\n- `str_to_upper(string, locale = \"en\")`^1^: Convert strings to upper case.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_to_upper(sentences)\n ```\n :::\n\n\n- `str_to_title(string, locale = \"en\")`^1^: Convert strings to title case.\n Also `str_to_setence()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_to_title(sentences)\n ```\n :::\n\n\n## Subset Strings\n\n- `str_sub(string, start = 1L, end = -1L)`: Extract substrings from a character vector.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_sub(fruit, 1, 3)\n str_sub(fruit, -2)\n ```\n :::\n\n\n- `str_subset(string, pattern, negate = FALSE)`: Return only the strings that contain a pattern match.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_subset(fruit, \"p\")\n ```\n :::\n\n\n- `str_extract(string, pattern)`: Return the first pattern match found in each string, as a vector.\n Also `str_extract_all()` to return every pattern match.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_extract(fruit, \"[aeiou]\")\n ```\n :::\n\n\n- `str_match(string, pattern)`: Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern.\n Also `str_match_all()`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_match(sentences, \"(a|the) ([^ +])\")\n ```\n :::\n\n\n## Join and Split\n\n- `str_c(..., sep = \"\", collapse = NULL)`: Join multiple strings into a single string.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_c(letters, LETTERS)\n ```\n :::\n\n\n- `str_flatten(string, collapse = \"\")`: Combines into a single string, separated by collapse.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_flatten(fruit, \", \")\n ```\n :::\n\n\n- `str_dup(string, times)`: Repeat strings times times.\n Also `str_unique()` to remove duplicates.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_dup(fruit, times = 2)\n ```\n :::\n\n\n- `str_split_fixed(string, pattern, n)`: 
Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match).\n Also `str_split()` to return a list of substrings and `str_split_n()` to return the nth substring.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_split_fixed(sentences, \" \", n = 3)\n ```\n :::\n\n\n- `str_glue(..., .sep = \"\", .envir = parent.frame())`: Create a string from strings and {expressions} to evaluate.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_glue(\"Pi is {pi}\")\n ```\n :::\n\n\n- `str_glue_data(.x, ..., .sep = \"\", .envir = parent.frame(), .na = \"NA\")`: Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_glue_data(mtcars, \"{rownames(mtcars)} has {hp} hp\")\n ```\n :::\n\n\n## Manage Lengths\n\n- `str_length(string)`: The width of strings (i.e. number of code points, which generally equals the number of characters).\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_length(fruit)\n ```\n :::\n\n\n- `str_pad(string, width, side = c(\"left\", \"right\", \"both\"), pad = \" \")`: Pad strings to constant width.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_pad(fruit, 17)\n ```\n :::\n\n\n- `str_trunc(string, width, side = c(\"left\", \"right\", \"both\"), ellipsis = \"...\")`: Truncate the width of strings, replacing content with ellipsis.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_trunc(sentences, 6)\n ```\n :::\n\n\n- `str_trim(string, side = c(\"left\", \"right\", \"both\"))`: Trim whitespace from the start and/or end of a string.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_trim(str_pad(fruit, 17))\n ```\n :::\n\n\n- `str_squish(string)`: Trim white space from each end and collapse multiple spaces into single spaces.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_squish(str_pad(fruit, 17, \"both\"))\n ```\n :::\n\n\n## Order Strings\n\n- `str_order(x, decreasing = FALSE, na_last = TRUE, locale = \"en\", numeric = FALSE, ...)^1^`: 
Return the vector of indexes that sorts a character vector.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fruit[str_order(fruit)]\n ```\n :::\n\n\n- `str_sort(x, decreasing = FALSE, na_last = TRUE, locale = \"en\", numeric = FALSE, ...)^1^`: Sort a character vector.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_sort(fruit)\n ```\n :::\n\n\n## Helpers\n\n- `str_conv(string, encoding)`: Override the encoding of a string.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_conv(fruit, \"ISO-8859-1\")\n ```\n :::\n\n\n- `str_view(string, pattern, match = NA)`: View HTML rendering of all regex matches.\n Also `str_view()` to see only the first match.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_view(sentences, \"[aeiou]\")\n ```\n :::\n\n\n- `str_equal(x, y, locale = \"en\", ignore_case = FALSE, ...)`^1^: Determine if two strings are equivalent.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_equal(c(\"a\", \"b\"), c(\"a\", \"c\"))\n ```\n :::\n\n\n- `str_wrap(string, width = 80, indent = 0, exdent = 0)`: Wrap strings into nicely formatted paragraphs.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_wrap(sentences, 20)\n ```\n :::\n\n\n^1^ See for a complete list of locales.\n\n\n\n## Regular Expressions\n\nRegular expressions, or *regexps*, are a concise language for describing patterns in strings.\n\n### Need to Know\n\nPattern arguments in stringr are interpreted as regular expressions *after any special characters have been parsed*.\n\nIn R, you write regular expressions as *strings*, sequences of characters surrounded by quotes(`\"\"`) or single quotes (`''`).\n\nSome characters cannot be directly represented in an R string.\nThese must be represented as **special characters**, sequences of characters that have a specific meaning, e.g. 
`\\\\` represents `\\`, `\\\"` represents `\"`, and `\\n` represents a new line.\nRun `?\"'\"` to see a complete list.\n\nBecause of this, whenever a `\\` appears in a regular expression, you must write it as `\\\\` in the string that represents the regular expression.\n\nUse `writeLines()` to see how R views your string after all special characters have been parsed.\n\nFor example, `writeLines(\"\\\\.\")` will be parsed as `\\.`\n\nand `writeLines(\"\\\\ is a backslash\")` will be parsed as `\\ is a backslash`.\n\n### Interpretation\n\nPatterns in stringr are interpreted as regexs.\nTo change this default, wrap the pattern in one of:\n\n- `regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...)`: Modifies a regex to ignore cases, match end of lines as well as end of strings, allow R comments within regexs, and/or to have `.` match everthing including `\\n`.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_detect(\"I\", regex(\"i\", TRUE))\n ```\n :::\n\n\n- `fixed()`: Matches raw bytes but will miss some characters that can be represented in multiple ways (fast).\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_detect(\"\\u0130\", fixed(\"i\"))\n ```\n :::\n\n\n- `coll()`: Matches raw bytes and will use locale specific collation rules to recognize characters that can be represented in multiple ways (slow).\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_detect(\"\\u0130\", coll(\"i\", TRUE, locale = \"tr\"))\n ```\n :::\n\n\n- `boundary()`: Matches boundaries between characters, line_breaks, sentences, or words.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_split(sentences, boundary(\"word\"))\n ```\n :::\n\n\n### Match Characters\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsee <- function(rx) str_view(\"abc ABC 123\\t.!?\\\\(){}\\n\", rx)\n```\n:::\n\n\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| string\\ | 
regex\\ | matches\\ | example | example output (highlighted characters are in \\<\\>) |\n| (type this) | (to mean this) | (which matches this) | | |\n+=============+================+===================================+======================+===================================================================+\n| | `a (etc.)` | `a (etc.)` | `see(\"a\")` | ``` |\n| | | | | bc ABC 123\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\.` | `\\.` | `.` | ``` see(\"\\\\.\")`` ``` | ``` |\n| | | | | abc ABC 123\\t<.>!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\!` | `\\!` | `!` | `see(\"\\\\!\")` | ``` |\n| | | | | abc ABC 123\\t.?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\?` | `\\?` | `?` | `see(\"\\\\?\")` | ``` |\n| | | | | abc ABC 123\\t.!\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\\\\\` | `\\\\` | `\\` | `see(\"\\\\\\\\\")` | ``` |\n| | | | | abc ABC 123\\t.!?<\\>(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\(` | `\\(` | `(` | `see(\"\\\\(\")` | ``` |\n| | | | | abc ABC 123\\t.!?\\<(>){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\)` | `\\)` | `)` | `see(\"\\\\)\")` | ``` 
|\n| | | | | abc ABC 123\\t.!?\\(<)>{}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\{` | `\\{` | `{` | `see(\"\\\\{\")` | ``` |\n| | | | | abc ABC 123\\t.!?\\()<{>}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\}` | `\\}` | `}` | `see(\"\\\\}\")` | ``` |\n| | | | | abc ABC 123\\t.!?\\(){<}>\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\n` | `\\n` | new line (return) | `see(\"\\\\n\")` | ``` |\n| | | | | abc ABC 123\\t.!?\\(){}<\\n> |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\t` | `\\t` | tab | `see(\"\\\\t\")` | ``` |\n| | | | | abc ABC 123<\\t>.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\s` | `\\s` | any whitespace\\ | `see(\"\\\\s\")` | ``` |\n| | | (`\\S` for non-whitespaces) | | abc< >ABC< >123<\\t>.!?\\(){}<\\n> |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\d` | `\\d` | any digit\\ | `see(\"\\\\d\")` | ``` |\n| | | (`\\D` for non-digits) | | abc ABC <1><2><3>\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\w` | `\\w` | any word character\\ | 
`see(\"\\\\w\")` | ``` |\n| | | (`\\W` for non-word characters) | | <1><2><3>\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\b` | `\\b` | word boundaries | `see(\"\\\\b\")` | ``` |\n| | | | | <>abc<> <>ABC<> <>123<>\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:digit:]`^1^ | digits | `see(\"[:digit:]\")` | ``` |\n| | | | | abc ABC <1><2><3>\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:alpha:]`^1^ | letters | `see(\"[:alpha:]\")` | ``` |\n| | | | | 123\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:lower:]`^1^ | lowercase letters | `see(\"[:lower:]\")` | ``` |\n| | | | | ABC 123\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:upper:]`^1^ | uppercase letters | `see(\"[:upper:]\")` | ``` |\n| | | | | abc 123\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:alnum:]`^1^ | letters and numbers | `see(\"[:alnum:]\")` | ``` |\n| | | | | <1><2><3>\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | 
`[:punct:]`^1^ | punctuation | `see(\"[:punct:]\")` | ``` |\n| | | | | abc ABC 123\\t<.><\\><(><)><{><}>\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:graph:]`^1^ | letters, numbers, and punctuation | `see(\"[:graph:]\")` | ``` |\n| | | | | <1><2><3>\\t<.><\\><(><)><{><}>\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:space:]`^1^ | space characters (i.e. `\\s`) | `see(\"[:space:]\")` | ``` |\n| | | | | abc< >ABC< >123<\\t>.!?\\(){}<\\n> |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:blank:]`^1^ | space and tab (but not new line) | `see(\"[:blank:]\")` | ``` |\n| | | | | abc< >ABC< >123<\\t>.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `.` | every character except a new line | `see(\".\")` | ``` |\n| | | | | < >< ><1><2><3><\\t><.><\\><(><)><{><}><\\n> |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n\n: 1Many base R functions require classes to be wrapped in a second set of \\[ \\], e.g. **\\[\\[:digit:\\]\\]**\n\n#### Classes\n\n- The `[:space:]` class includes new line, and the `[:blank:]` class\n - The `[:blank:]` class includes space and tab (`\\t`)\n- The `[:graph:]` class contains all non-space characters, including `[:punct:]`, `[:symbol:]`, `[:alnum:]`, `[:digit:]`, `[:alpha:]`, `[:lower:]`, and `[:upper:]`\n - `[:punct:]` contains punctuation: `. , : ; ? ! 
/ * @ # - _ \" [ ] { } ( )`\n\n - `[:symbol:]` contains symbols: `` | ` = + ^ ~ < > $ ``\n\n - `[:alnum:]` contains alphanumeric characters, including `[:digit:]`, `[:alpha:]`, `[:lower:]`, and `[:upper:]`\n\n - `[:digit:]` contains the digits 0 through 9\n\n - `[:alpha:]` contains letters, including `[:upper:]` and `[:lower:]`\n\n - `[:upper:]` contains uppercase letters and `[:lower:]` contains lowercase letters\n- The regex `.` contains all characters in the above classes, except new line.\n\n### Alternates\n\n`alt <- function(rx) str_view(\"abcde\", rx)`\n\n+-------------+--------------+-----------------+--------------------------------------+\n| regexp | matches | example | example output\\ |\n| | | | (highlighted characters are in \\<\\>) |\n+=============+==============+=================+======================================+\n| `ab|d` | or | `alt(\"ab|d\")` | ``` |\n| | | | ce |\n| | | | ``` |\n+-------------+--------------+-----------------+--------------------------------------+\n| `[abe]` | one of | `alt(\"[abe]\"` | ``` |\n| | | | cd |\n| | | | ``` |\n+-------------+--------------+-----------------+--------------------------------------+\n| `[^abe]` | anything but | `alt(\"[^abe]\")` | ``` |\n| | | | abe |\n| | | | ``` |\n+-------------+--------------+-----------------+--------------------------------------+\n| `[a-c]` | range | `alt(\"[a-c]\")` | ``` |\n| | | | de |\n| | | | ``` |\n+-------------+--------------+-----------------+--------------------------------------+\n\n: Alternates\n\n### Anchors\n\n`anchor <- function(rx) str_view(\"aaa\", rx)`\n\n+--------------------------------------------------------------------------------------------------------------------------------------------+\n| regexp \\| matches \\| example \\| example output\\ |\n| \\| \\| \\| (highlighted characters are in \\<\\>) |\n+============================================================================================================================================+\n| `^a` 
\\| start of string \\| `anchor(\"^a\")` \\| `| | | aa | | |` |\n+--------------------------------------------------------------------------------------------------------------------------------------------+\n| `a$` \\| end of string \\| `anchor(\"a$\")` \\| `| | | aa | | |` |\n+--------------------------------------------------------------------------------------------------------------------------------------------+\n\n: Anchors\n\n### Look Arounds\n\n`look <- function(rx) str_view(\"bacad\", rx)`\n\n+-------------+-----------------+-------------------+--------------------------------------+\n| regexp | matches | example | example output\\ |\n| | | | (highlighted characters are in \\<\\>) |\n+=============+=================+===================+======================================+\n| `a(?=c)` | followed by | `look(\"a(?=c)\")` | ``` |\n| | | | bcad |\n| | | | ``` |\n+-------------+-----------------+-------------------+--------------------------------------+\n| `a(?!c)` | not followed by | `look(\"a(?!c)\")` | ``` |\n| | | | bacd |\n| | | | ``` |\n+-------------+-----------------+-------------------+--------------------------------------+\n| `(?<=b)a` | preceded by | `look(\"(?<=b)a\")` | ``` |\n| | | | bcad |\n| | | | ``` |\n+-------------+-----------------+-------------------+--------------------------------------+\n| `(?d |\n| | | | ``` |\n+-------------+-----------------+-------------------+--------------------------------------+\n\n: Look arounds\n\n### Quantifiers\n\n`quant <- function(rx) str_view(\".a.aa.aaa\", rx)`\n\n+-------------+---------------------+-------------------+--------------------------------------+\n| regexp | matches | example | example output\\ |\n| | | | (highlighted characters are in \\<\\>) |\n+=============+=====================+===================+======================================+\n| `a?` | zero or one | `quant(\"a?\")` | ``` |\n| | | | <>.<>.<>.<> |\n| | | | ``` 
|\n+-------------+---------------------+-------------------+--------------------------------------+\n| `a*` | zero or more | `quant(\"a*\")` | ``` |\n| | | | <>.<>.<>.<> |\n| | | | ``` |\n+-------------+---------------------+-------------------+--------------------------------------+\n| `a+` | one or more | `quant(\"a+\")` | ``` |\n| | | | ... |\n| | | | ``` |\n+-------------+---------------------+-------------------+--------------------------------------+\n| `a{n}` | exactly `n` | `quant(\"a{2}\")` | ``` |\n| | | | .a..a |\n| | | | ``` |\n+-------------+---------------------+-------------------+--------------------------------------+\n| `a{n, }` | `n` or more | `quant(\"a{2,}\")` | ``` |\n| | | | .a.. |\n| | | | ``` |\n+-------------+---------------------+-------------------+--------------------------------------+\n| `a{n, m}` | between `n` and `m` | `quant(\"a{2,4}\")` | ``` |\n| | | | .a.. |\n| | | | ``` |\n+-------------+---------------------+-------------------+--------------------------------------+\n\n: Quantifiers\n\n### Groups\n\n`ref <- function(rx) str_view(\"abbaab\", rx)`\n\nUse parentheses to set precedent (order of evaluation) and create groups\n\n+-------------+-----------------+------------------+--------------------------------------+\n| regexp | matches | example | example output\\ |\n| | | | (highlighted characters are in \\<\\>) |\n+=============+=================+==================+======================================+\n| `(ab|d)e` | sets precedence | `alt(\"(ab|d)e\")` | ``` |\n| | | | abc |\n| | | | ``` |\n+-------------+-----------------+------------------+--------------------------------------+\n\n: Groups\n\nUse an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern.\nRefer to each group by its order of appearance\n\n+-------------+----------------+----------------------+-------------------------------------------+--------------------------------------+\n| string\\ | regexp\\ | matches\\ | 
example\\ | example output\\ |\n| (type this) | (to mean this) | (which matches this) | (the result is the same as `ref(\"abba\")`) | (highlighted characters are in \\<\\>) |\n+=============+================+======================+===========================================+======================================+\n| `\\\\1` | `\\1` (etc.) | first () group, etc. | `ref(\"(a)(b)\\\\2\\\\1\")` | ``` |\n| | | | | ab |\n| | | | | ``` |\n+-------------+----------------+----------------------+-------------------------------------------+--------------------------------------+\n\n: More groups\n\n------------------------------------------------------------------------\n\nCC BY SA Posit Software, PBC • [info\\@posit.co](mailto:info@posit.co) • [posit.co](https://posit.co)\n\nLearn more at [stringr.tidyverse.org](https://stringr.tidyverse.org).\n\nUpdated: 2023-07.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npackageVersion(\"stringr\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] '1.5.0'\n```\n:::\n:::\n\n\n------------------------------------------------------------------------\n", + "engine": "knitr", + "markdown": "---\ntitle: \"String manipulation with stringr :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\n---\n\n::: {.cell .column-margin}\n\"Hex\n

\n

Download PDF

\n\"\"/\n
\n

Translations (PDF)

\n* Portuguese\n* Spanish\n* Vietnamese\n:::\n\n\n\n\n\nThe **stringr** package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(stringr)\n```\n:::\n\n\n\n\n\n## Detect Matches\n\n- `str_detect(string, pattern, negate = FALSE)`: Detect the presence of a pattern match in a string.\n Also `str_like()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str(fruit, \"a\")\n ```\n :::\n\n\n\n- `str_starts(string, pattern, negate = FALSE)`: Detect the presence of a pattern match at the beginning of a string.\n Also `str_ends()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_starts(fruit, \"a\")\n ```\n :::\n\n\n\n- `str_which(string, pattern, negate = FALSE)`: Find the indexes of strings that contain a pattern match.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_which(fruit, \"a\")\n ```\n :::\n\n\n\n- `str_locate(string, pattern)`: Locate the positions of pattern matches in a string.\n Also `str_locate_all()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_locate(fruit, \"a\")\n ```\n :::\n\n\n\n- `str_count(string, pattern)`: Count the number of matches in a string.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_count(fruit, \"a\")\n ```\n :::\n\n\n\n## Mutate Strings\n\n- `str_sub() <- value`: Replace substrings by identifying the substrings with `str_sub()` and assigning into the results.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_sub(fruit, 1, 3) <- \"str\"\n ```\n :::\n\n\n\n- `str_replace(string, pattern, replacement)`: Replace the first matched pattern in each string.\n Also `str_remove()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_replace(fruit, \"p\", \"-\")\n ```\n :::\n\n\n\n- `str_replace_all(string, pattern, replacement)`: Replace all matched patterns in each string.\n Also `str_remove_all()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_replace_all(fruit, \"p\", \"-\")\n ```\n 
:::\n\n\n\n- `str_to_lower(string, locale = \"en\")`^1^: Convert strings to lower case.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_to_lower(sentences)\n ```\n :::\n\n\n\n- `str_to_upper(string, locale = \"en\")`^1^: Convert strings to upper case.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_to_upper(sentences)\n ```\n :::\n\n\n\n- `str_to_title(string, locale = \"en\")`^1^: Convert strings to title case.\n Also `str_to_setence()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_to_title(sentences)\n ```\n :::\n\n\n\n## Subset Strings\n\n- `str_sub(string, start = 1L, end = -1L)`: Extract substrings from a character vector.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_sub(fruit, 1, 3)\n str_sub(fruit, -2)\n ```\n :::\n\n\n\n- `str_subset(string, pattern, negate = FALSE)`: Return only the strings that contain a pattern match.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_subset(fruit, \"p\")\n ```\n :::\n\n\n\n- `str_extract(string, pattern)`: Return the first pattern match found in each string, as a vector.\n Also `str_extract_all()` to return every pattern match.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_extract(fruit, \"[aeiou]\")\n ```\n :::\n\n\n\n- `str_match(string, pattern)`: Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern.\n Also `str_match_all()`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_match(sentences, \"(a|the) ([^ +])\")\n ```\n :::\n\n\n\n## Join and Split\n\n- `str_c(..., sep = \"\", collapse = NULL)`: Join multiple strings into a single string.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_c(letters, LETTERS)\n ```\n :::\n\n\n\n- `str_flatten(string, collapse = \"\")`: Combines into a single string, separated by collapse.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_flatten(fruit, \", \")\n ```\n :::\n\n\n\n- `str_dup(string, times)`: Repeat strings times times.\n Also `str_unique()` to remove duplicates.\n\n\n\n ::: {.cell}\n \n ```{.r 
.cell-code}\n str_dup(fruit, times = 2)\n ```\n :::\n\n\n\n- `str_split_fixed(string, pattern, n)`: Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match).\n Also `str_split()` to return a list of substrings and `str_split_n()` to return the nth substring.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_split_fixed(sentences, \" \", n = 3)\n ```\n :::\n\n\n\n- `str_glue(..., .sep = \"\", .envir = parent.frame())`: Create a string from strings and {expressions} to evaluate.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_glue(\"Pi is {pi}\")\n ```\n :::\n\n\n\n- `str_glue_data(.x, ..., .sep = \"\", .envir = parent.frame(), .na = \"NA\")`: Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_glue_data(mtcars, \"{rownames(mtcars)} has {hp} hp\")\n ```\n :::\n\n\n\n## Manage Lengths\n\n- `str_length(string)`: The width of strings (i.e. number of code points, which generally equals the number of characters).\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_length(fruit)\n ```\n :::\n\n\n\n- `str_pad(string, width, side = c(\"left\", \"right\", \"both\"), pad = \" \")`: Pad strings to constant width.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_pad(fruit, 17)\n ```\n :::\n\n\n\n- `str_trunc(string, width, side = c(\"left\", \"right\", \"both\"), ellipsis = \"...\")`: Truncate the width of strings, replacing content with ellipsis.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_trunc(sentences, 6)\n ```\n :::\n\n\n\n- `str_trim(string, side = c(\"left\", \"right\", \"both\"))`: Trim whitespace from the start and/or end of a string.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_trim(str_pad(fruit, 17))\n ```\n :::\n\n\n\n- `str_squish(string)`: Trim white space from each end and collapse multiple spaces into single spaces.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_squish(str_pad(fruit, 17, \"both\"))\n ```\n 
:::\n\n\n\n## Order Strings\n\n- `str_order(x, decreasing = FALSE, na_last = TRUE, locale = \"en\", numeric = FALSE, ...)^1^`: Return the vector of indexes that sorts a character vector.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n fruit[str_order(fruit)]\n ```\n :::\n\n\n\n- `str_sort(x, decreasing = FALSE, na_last = TRUE, locale = \"en\", numeric = FALSE, ...)^1^`: Sort a character vector.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_sort(fruit)\n ```\n :::\n\n\n\n## Helpers\n\n- `str_conv(string, encoding)`: Override the encoding of a string.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_conv(fruit, \"ISO-8859-1\")\n ```\n :::\n\n\n\n- `str_view(string, pattern, match = NA)`: View HTML rendering of all regex matches.\n Also `str_view()` to see only the first match.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_view(sentences, \"[aeiou]\")\n ```\n :::\n\n\n\n- `str_equal(x, y, locale = \"en\", ignore_case = FALSE, ...)`^1^: Determine if two strings are equivalent.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_equal(c(\"a\", \"b\"), c(\"a\", \"c\"))\n ```\n :::\n\n\n\n- `str_wrap(string, width = 80, indent = 0, exdent = 0)`: Wrap strings into nicely formatted paragraphs.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_wrap(sentences, 20)\n ```\n :::\n\n\n\n^1^ See for a complete list of locales.\n\n\n\n## Regular Expressions\n\nRegular expressions, or *regexps*, are a concise language for describing patterns in strings.\n\n### Need to Know\n\nPattern arguments in stringr are interpreted as regular expressions *after any special characters have been parsed*.\n\nIn R, you write regular expressions as *strings*, sequences of characters surrounded by quotes(`\"\"`) or single quotes (`''`).\n\nSome characters cannot be directly represented in an R string.\nThese must be represented as **special characters**, sequences of characters that have a specific meaning, e.g. 
`\\\\` represents `\\`, `\\\"` represents `\"`, and `\\n` represents a new line.\nRun `?\"'\"` to see a complete list.\n\nBecause of this, whenever a `\\` appears in a regular expression, you must write it as `\\\\` in the string that represents the regular expression.\n\nUse `writeLines()` to see how R views your string after all special characters have been parsed.\n\nFor example, `writeLines(\"\\\\.\")` will be parsed as `\\.`\n\nand `writeLines(\"\\\\ is a backslash\")` will be parsed as `\\ is a backslash`.\n\n### Interpretation\n\nPatterns in stringr are interpreted as regexes.\nTo change this default, wrap the pattern in one of:\n\n- `regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...)`: Modifies a regex to ignore case, match end of lines as well as end of strings, allow R comments within regexes, and/or to have `.` match everything including `\\n`.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_detect(\"I\", regex(\"i\", TRUE))\n ```\n :::\n\n\n\n- `fixed()`: Matches raw bytes but will miss some characters that can be represented in multiple ways (fast).\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_detect(\"\\u0130\", fixed(\"i\"))\n ```\n :::\n\n\n\n- `coll()`: Matches raw bytes and will use locale-specific collation rules to recognize characters that can be represented in multiple ways (slow).\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_detect(\"\\u0130\", coll(\"i\", TRUE, locale = \"tr\"))\n ```\n :::\n\n\n\n- `boundary()`: Matches boundaries between characters, line breaks, sentences, or words.\n\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n str_split(sentences, boundary(\"word\"))\n ```\n :::\n\n\n\n### Match Characters\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsee <- function(rx) str_view(\"abc ABC 123\\t.!?\\\\(){}\\n\", 
rx)\n```\n:::\n\n\n\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| string\\ | regex\\ | matches\\ | example | example output (highlighted characters are in \\<\\>) |\n| (type this) | (to mean this) | (which matches this) | | |\n+=============+================+===================================+======================+===================================================================+\n| | `a (etc.)` | `a (etc.)` | `see(\"a\")` | ``` |\n| | | | | <a>bc ABC 123\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\.` | `\\.` | `.` | `see(\"\\\\.\")` | ``` |\n| | | | | abc ABC 123\\t<.>!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\!` | `\\!` | `!` | `see(\"\\\\!\")` | ``` |\n| | | | | abc ABC 123\\t.<!>?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\?` | `\\?` | `?` | `see(\"\\\\?\")` | ``` |\n| | | | | abc ABC 123\\t.!<?>\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\\\\\` | `\\\\` | `\\` | `see(\"\\\\\\\\\")` | ``` |\n| | | | | abc ABC 123\\t.!?<\\>(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\(` | `\\(` | `(` | `see(\"\\\\(\")` | ``` |\n| | | | | abc ABC 123\\t.!?\\<(>){}\\n |\n| | | | | ``` 
|\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\)` | `\\)` | `)` | `see(\"\\\\)\")` | ``` |\n| | | | | abc ABC 123\\t.!?\\(<)>{}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\{` | `\\{` | `{` | `see(\"\\\\{\")` | ``` |\n| | | | | abc ABC 123\\t.!?\\()<{>}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\}` | `\\}` | `}` | `see(\"\\\\}\")` | ``` |\n| | | | | abc ABC 123\\t.!?\\(){<}>\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\n` | `\\n` | new line (return) | `see(\"\\\\n\")` | ``` |\n| | | | | abc ABC 123\\t.!?\\(){}<\\n> |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\t` | `\\t` | tab | `see(\"\\\\t\")` | ``` |\n| | | | | abc ABC 123<\\t>.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\s` | `\\s` | any whitespace\\ | `see(\"\\\\s\")` | ``` |\n| | | (`\\S` for non-whitespaces) | | abc< >ABC< >123<\\t>.!?\\(){}<\\n> |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\d` | `\\d` | any digit\\ | `see(\"\\\\d\")` | ``` |\n| | | (`\\D` for non-digits) | | abc ABC <1><2><3>\\t.!?\\(){}\\n |\n| | | | | ``` 
|\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\w` | `\\w` | any word character\\ | `see(\"\\\\w\")` | ``` |\n| | | (`\\W` for non-word characters) | | <a><b><c> <A><B><C> <1><2><3>\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| `\\\\b` | `\\b` | word boundaries | `see(\"\\\\b\")` | ``` |\n| | | | | <>abc<> <>ABC<> <>123<>\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:digit:]`^1^ | digits | `see(\"[:digit:]\")` | ``` |\n| | | | | abc ABC <1><2><3>\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:alpha:]`^1^ | letters | `see(\"[:alpha:]\")` | ``` |\n| | | | | <a><b><c> <A><B><C> 123\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:lower:]`^1^ | lowercase letters | `see(\"[:lower:]\")` | ``` |\n| | | | | <a><b><c> ABC 123\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:upper:]`^1^ | uppercase letters | `see(\"[:upper:]\")` | ``` |\n| | | | | abc <A><B><C> 123\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:alnum:]`^1^ | letters and numbers | `see(\"[:alnum:]\")` | ``` |\n| | | | | 
<a><b><c> <A><B><C> <1><2><3>\\t.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:punct:]`^1^ | punctuation | `see(\"[:punct:]\")` | ``` |\n| | | | | abc ABC 123\\t<.><!><?><\\><(><)><{><}>\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:graph:]`^1^ | letters, numbers, and punctuation | `see(\"[:graph:]\")` | ``` |\n| | | | | <a><b><c> <A><B><C> <1><2><3>\\t<.><!><?><\\><(><)><{><}>\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:space:]`^1^ | space characters (i.e. `\\s`) | `see(\"[:space:]\")` | ``` |\n| | | | | abc< >ABC< >123<\\t>.!?\\(){}<\\n> |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `[:blank:]`^1^ | space and tab (but not new line) | `see(\"[:blank:]\")` | ``` |\n| | | | | abc< >ABC< >123<\\t>.!?\\(){}\\n |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n| | `.` | every character except a new line | `see(\".\")` | ``` |\n| | | | | <a><b><c>< ><A><B><C>< ><1><2><3><\\t><.><!><?><\\><(><)><{><}><\\n> |\n| | | | | ``` |\n+-------------+----------------+-----------------------------------+----------------------+-------------------------------------------------------------------+\n\n: ^1^ Many base R functions require classes to be wrapped in a second set of \\[ \\], e.g. 
**\\[\\[:digit:\\]\\]**\n\n#### Classes\n\n- The `[:space:]` class includes new line, and the `[:blank:]` class\n - The `[:blank:]` class includes space and tab (`\\t`)\n- The `[:graph:]` class contains all non-space characters, including `[:punct:]`, `[:symbol:]`, `[:alnum:]`, `[:digit:]`, `[:alpha:]`, `[:lower:]`, and `[:upper:]`\n - `[:punct:]` contains punctuation: `. , : ; ? ! / * @ # - _ \" [ ] { } ( )`\n\n - `[:symbol:]` contains symbols: `` | ` = + ^ ~ < > $ ``\n\n - `[:alnum:]` contains alphanumeric characters, including `[:digit:]`, `[:alpha:]`, `[:lower:]`, and `[:upper:]`\n\n - `[:digit:]` contains the digits 0 through 9\n\n - `[:alpha:]` contains letters, including `[:upper:]` and `[:lower:]`\n\n - `[:upper:]` contains uppercase letters and `[:lower:]` contains lowercase letters\n- The regex `.` contains all characters in the above classes, except new line.\n\n### Alternates\n\n`alt <- function(rx) str_view(\"abcde\", rx)`\n\n+----------+--------------+-----------------+--------------------------------------+\n| regexp | matches | example | example output\\ |\n| | | | (highlighted characters are in \\<\\>) |\n+==========+==============+=================+======================================+\n| `ab|d` | or | `alt(\"ab|d\")` | ``` |\n| | | | <ab>c<d>e |\n| | | | ``` |\n+----------+--------------+-----------------+--------------------------------------+\n| `[abe]` | one of | `alt(\"[abe]\")` | ``` |\n| | | | <a><b>cd<e> |\n| | | | ``` |\n+----------+--------------+-----------------+--------------------------------------+\n| `[^abe]` | anything but | `alt(\"[^abe]\")` | ``` |\n| | | | ab<c><d>e |\n| | | | ``` |\n+----------+--------------+-----------------+--------------------------------------+\n| `[a-c]` | range | `alt(\"[a-c]\")` | ``` |\n| | | | <a><b><c>de |\n| | | | ``` |\n+----------+--------------+-----------------+--------------------------------------+\n\n: Alternates\n\n### Anchors\n\n`anchor <- function(rx) str_view(\"aaa\", 
rx)`\n\n+--------+-----------------+----------------+--------------------------------------+\n| regexp | matches | example | example output\\ |\n| | | | (highlighted characters are in \\<\\>) |\n+========+=================+================+======================================+\n| `^a` | start of string | `anchor(\"^a\")` | ``` |\n| | | | <a>aa |\n| | | | ``` |\n+--------+-----------------+----------------+--------------------------------------+\n| `a$` | end of string | `anchor(\"a$\")` | ``` |\n| | | | aa<a> |\n| | | | ``` |\n+--------+-----------------+----------------+--------------------------------------+\n\n: Anchors\n\n### Look Arounds\n\n`look <- function(rx) str_view(\"bacad\", rx)`\n\n+-----------+-----------------+-------------------+--------------------------------------+\n| regexp | matches | example | example output\\ |\n| | | | (highlighted characters are in \\<\\>) |\n+===========+=================+===================+======================================+\n| `a(?=c)` | followed by | `look(\"a(?=c)\")` | ``` |\n| | | | b<a>cad |\n| | | | ``` |\n+-----------+-----------------+-------------------+--------------------------------------+\n| `a(?!c)` | not followed by | `look(\"a(?!c)\")` | ``` |\n| | | | bac<a>d |\n| | | | ``` |\n+-----------+-----------------+-------------------+--------------------------------------+\n| `(?<=b)a` | preceded by | `look(\"(?<=b)a\")` | ``` |\n| | | | b<a>cad |\n| | | | ``` |\n+-----------+-----------------+-------------------+--------------------------------------+\n| `(?<!b)a` | not preceded by | `look(\"(?<!b)a\")` | ``` |\n| | | | bac<a>d |\n| | | | ``` |\n+-----------+-----------------+-------------------+--------------------------------------+\n\n: Look arounds\n\n### Quantifiers\n\n`quant <- function(rx) str_view(\".a.aa.aaa\", 
rx)`\n\n+-----------+---------------------+-------------------+--------------------------------------+\n| regexp | matches | example | example output\\ |\n| | | | (highlighted characters are in \\<\\>) |\n+===========+=====================+===================+======================================+\n| `a?` | zero or one | `quant(\"a?\")` | ``` |\n| | | | <>.<a><>.<a><a><>.<a><a><a><> |\n| | | | ``` |\n+-----------+---------------------+-------------------+--------------------------------------+\n| `a*` | zero or more | `quant(\"a*\")` | ``` |\n| | | | <>.<a><>.<aa><>.<aaa><> |\n| | | | ``` |\n+-----------+---------------------+-------------------+--------------------------------------+\n| `a+` | one or more | `quant(\"a+\")` | ``` |\n| | | | .<a>.<aa>.<aaa> |\n| | | | ``` |\n+-----------+---------------------+-------------------+--------------------------------------+\n| `a{n}` | exactly `n` | `quant(\"a{2}\")` | ``` |\n| | | | .a.<aa>.<aa>a |\n| | | | ``` |\n+-----------+---------------------+-------------------+--------------------------------------+\n| `a{n, }` | `n` or more | `quant(\"a{2,}\")` | ``` |\n| | | | .a.<aa>.<aaa> |\n| | | | ``` |\n+-----------+---------------------+-------------------+--------------------------------------+\n| `a{n, m}` | between `n` and `m` | `quant(\"a{2,4}\")` | ``` |\n| | | | .a.<aa>.<aaa> 
|\n| | | | ``` |\n+-----------+---------------------+-------------------+--------------------------------------+\n\n: Quantifiers\n\n### Groups\n\n`ref <- function(rx) str_view(\"abbaab\", rx)`\n\nUse parentheses to set precedence (order of evaluation) and create groups.\n\n+-----------+-----------------+------------------+--------------------------------------+\n| regexp | matches | example | example output\\ |\n| | | | (highlighted characters are in \\<\\>) |\n+===========+=================+==================+======================================+\n| `(ab|d)e` | sets precedence | `alt(\"(ab|d)e\")` | ``` |\n| | | | abc<de> |\n| | | | ``` |\n+-----------+-----------------+------------------+--------------------------------------+\n\n: Groups\n\nUse an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern.\nRefer to each group by its order of appearance.\n\n+-------------+----------------+----------------------+-------------------------------------------+--------------------------------------+\n| string\\ | regexp\\ | matches\\ | example\\ | example output\\ |\n| (type this) | (to mean this) | (which matches this) | (the result is the same as `ref(\"abba\")`) | (highlighted characters are in \\<\\>) |\n+=============+================+======================+===========================================+======================================+\n| `\\\\1` | `\\1` (etc.) | first () group, etc. 
| `ref(\"(a)(b)\\\\2\\\\1\")` | ``` |\n| | | | | <abba>ab |\n| | | | | ``` |\n+-------------+----------------+----------------------+-------------------------------------------+--------------------------------------+\n\n: More groups\n\n------------------------------------------------------------------------\n\nCC BY SA Posit Software, PBC • [info\\@posit.co](mailto:info@posit.co) • [posit.co](https://posit.co)\n\nLearn more at [stringr.tidyverse.org](https://stringr.tidyverse.org).\n\nUpdated: 2024-05.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\npackageVersion(\"stringr\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] '1.5.1'\n```\n\n\n:::\n:::\n\n\n\n------------------------------------------------------------------------\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/html/images/logo-stringr.png b/html/images/logo-stringr.png index 270b58db..c485eb3e 100644 Binary files a/html/images/logo-stringr.png and b/html/images/logo-stringr.png differ diff --git a/html/strings.qmd b/html/strings.qmd index 226d4e25..7e910e60 100644 --- a/html/strings.qmd +++ b/html/strings.qmd @@ -442,26 +442,26 @@ see <- function(rx) str_view("abc ABC 123\t.!?\\(){}\n", rx) `alt <- function(rx) str_view("abcde", rx)` -+-------------+--------------+-----------------+--------------------------------------+ -| regexp | matches | example | example output\ | -| | | | (highlighted characters are in \<\>) | -+=============+==============+=================+======================================+ -| `ab|d` | or | `alt("ab|d")` | ``` | -| | | | <ab>c<d>e | -| | | | ``` | -+-------------+--------------+-----------------+--------------------------------------+ -| `[abe]` | one of | `alt("[abe]"` | ``` | -| | | | <a><b>cd<e> | -| | | | ``` | -+-------------+--------------+-----------------+--------------------------------------+ -| `[^abe]` | anything but | `alt("[^abe]")` | ``` | -| | | | ab<c><d>e | -| | | | ``` | 
-+-------------+--------------+-----------------+--------------------------------------+ -| `[a-c]` | range | `alt("[a-c]")` | ``` | -| | | | <a><b><c>de | -| | | | ``` | -+-------------+--------------+-----------------+--------------------------------------+ ++----------+--------------+-----------------+--------------------------------------+ +| regexp | matches | example | example output\ | +| | | | (highlighted characters are in \<\>) | ++==========+==============+=================+======================================+ +| `ab|d` | or | `alt("ab|d")` | ``` | +| | | | <ab>c<d>e | +| | | | ``` | ++----------+--------------+-----------------+--------------------------------------+ +| `[abe]` | one of | `alt("[abe]"` | ``` | +| | | | <a><b>cd<e> | +| | | | ``` | ++----------+--------------+-----------------+--------------------------------------+ +| `[^abe]` | anything but | `alt("[^abe]")` | ``` | +| | | | ab<c><d>e | +| | | | ``` | ++----------+--------------+-----------------+--------------------------------------+ +| `[a-c]` | range | `alt("[a-c]")` | ``` | +| | | | <a><b><c>de | +| | | | ``` | ++----------+--------------+-----------------+--------------------------------------+ : Alternates @@ -484,26 +484,26 @@ see <- function(rx) str_view("abc ABC 123\t.!?\\(){}\n", rx) `look <- function(rx) str_view("bacad", rx)` -+-------------+-----------------+-------------------+--------------------------------------+ -| regexp | matches | example | example output\ | -| | | | (highlighted characters are in \<\>) | -+=============+=================+===================+======================================+ -| `a(?=c)` | followed by | `look("a(?=c)")` | ``` | -| | | | b<a>cad | -| | | | ``` | -+-------------+-----------------+-------------------+--------------------------------------+ -| `a(?!c)` | not followed by | `look("a(?!c)")` | ``` | -| | | | bac<a>d | -| | | | ``` | -+-------------+-----------------+-------------------+--------------------------------------+ -| `(?<=b)a` | preceded by | `look("(?<=b)a")` | 
``` | -| | | | b<a>cad | -| | | | ``` | -+-------------+-----------------+-------------------+--------------------------------------+ -| `(?<!b)a` | not preceded by | `look("(?<!b)a")` | ``` | -| | | | bac<a>d | -| | | | ``` | -+-------------+-----------------+-------------------+--------------------------------------+ ++-----------+-----------------+-------------------+--------------------------------------+ +| regexp | matches | example | example output\ | +| | | | (highlighted characters are in \<\>) | ++===========+=================+===================+======================================+ +| `a(?=c)` | followed by | `look("a(?=c)")` | ``` | +| | | | b<a>cad | +| | | | ``` | ++-----------+-----------------+-------------------+--------------------------------------+ +| `a(?!c)` | not followed by | `look("a(?!c)")` | ``` | +| | | | bac<a>d | +| | | | ``` | ++-----------+-----------------+-------------------+--------------------------------------+ +| `(?<=b)a` | preceded by | `look("(?<=b)a")` | ``` | +| | | | b<a>cad | +| | | | ``` | ++-----------+-----------------+-------------------+--------------------------------------+ +| `(?<!b)a` | not preceded by | `look("(?<!b)a")` | ``` | +| | | | bac<a>d | +| | | | ``` | ++-----------+-----------------+-------------------+--------------------------------------+ : Look arounds @@ -511,34 +511,34 @@ see <- function(rx) str_view("abc ABC 123\t.!?\\(){}\n", rx) `quant <- function(rx) str_view(".a.aa.aaa", rx)` -+-------------+---------------------+-------------------+--------------------------------------+ -| regexp | matches | example | example output\ | -| | | | (highlighted characters are in \<\>) | -+=============+=====================+===================+======================================+ -| `a?` | zero or one | `quant("a?")` | ``` | -| | | | <>.<a><>.<a><a><>.<a><a><a><> | -| | | | ``` | -+-------------+---------------------+-------------------+--------------------------------------+ -| `a*` | zero or more | `quant("a*")` | ``` | -| | | | <>.<a><>.<aa><>.<aaa><> | -| | | | ``` | -+-------------+---------------------+-------------------+--------------------------------------+ -| `a+` | 
one or more | `quant("a+")` | ``` | -| | | | .<a>.<aa>.<aaa> | -| | | | ``` | -+-------------+---------------------+-------------------+--------------------------------------+ -| `a{n}` | exactly `n` | `quant("a{2}")` | ``` | -| | | | .a.<aa>.<aa>a | -| | | | ``` | -+-------------+---------------------+-------------------+--------------------------------------+ -| `a{n, }` | `n` or more | `quant("a{2,}")` | ``` | -| | | | .a.<aa>.<aaa> | -| | | | ``` | -+-------------+---------------------+-------------------+--------------------------------------+ -| `a{n, m}` | between `n` and `m` | `quant("a{2,4}")` | ``` | -| | | | .a.<aa>.<aaa> | -| | | | ``` | -+-------------+---------------------+-------------------+--------------------------------------+ ++-----------+---------------------+-------------------+--------------------------------------+ +| regexp | matches | example | example output\ | +| | | | (highlighted characters are in \<\>) | ++===========+=====================+===================+======================================+ +| `a?` | zero or one | `quant("a?")` | ``` | +| | | | <>.<a><>.<a><a><>.<a><a><a><> | +| | | | ``` | ++-----------+---------------------+-------------------+--------------------------------------+ +| `a*` | zero or more | `quant("a*")` | ``` | +| | | | <>.<a><>.<aa><>.<aaa><> | +| | | | ``` | ++-----------+---------------------+-------------------+--------------------------------------+ +| `a+` | one or more | `quant("a+")` | ``` | +| | | | .<a>.<aa>.<aaa> | +| | | | ``` | ++-----------+---------------------+-------------------+--------------------------------------+ +| `a{n}` | exactly `n` | `quant("a{2}")` | ``` | +| | | | .a.<aa>.<aa>a | +| | | | ``` | ++-----------+---------------------+-------------------+--------------------------------------+ +| `a{n, }` | `n` or more | `quant("a{2,}")` | ``` | +| | | | .a.<aa>.<aaa> | +| | | | ``` | ++-----------+---------------------+-------------------+--------------------------------------+ +| `a{n, m}` | between `n` and `m` | `quant("a{2,4}")` | ``` | +| | | | .a.<aa>.<aaa> 
| +| | | | ``` | ++-----------+---------------------+-------------------+--------------------------------------+ : Quantifiers @@ -548,14 +548,14 @@ see <- function(rx) str_view("abc ABC 123\t.!?\\(){}\n", rx) Use parentheses to set precedent (order of evaluation) and create groups -+-------------+-----------------+------------------+--------------------------------------+ -| regexp | matches | example | example output\ | -| | | | (highlighted characters are in \<\>) | -+=============+=================+==================+======================================+ -| `(ab|d)e` | sets precedence | `alt("(ab|d)e")` | ``` | -| | | | abc<de> | -| | | | ``` | -+-------------+-----------------+------------------+--------------------------------------+ ++-----------+-----------------+------------------+--------------------------------------+ +| regexp | matches | example | example output\ | +| | | | (highlighted characters are in \<\>) | ++===========+=================+==================+======================================+ +| `(ab|d)e` | sets precedence | `alt("(ab|d)e")` | ``` | +| | | | abc<de> | +| | | | ``` | ++-----------+-----------------+------------------+--------------------------------------+ : Groups diff --git a/keynotes/strings.key b/keynotes/strings.key index 53d0fbaf..b8a7fb48 100644 Binary files a/keynotes/strings.key and b/keynotes/strings.key differ diff --git a/pngs/strings/strings.001.png b/pngs/strings/strings.001.png new file mode 100644 index 00000000..420269c5 Binary files /dev/null and b/pngs/strings/strings.001.png differ diff --git a/strings.pdf b/strings.pdf index 43a62de7..8cef4fa7 100644 Binary files a/strings.pdf and b/strings.pdf differ