1.0.0 release candidate (#1017)

* version and doc refresh
* doc refresh
* Undo the default set in #951 to stop reverse dependency breakage
tidymodels · Jul 1, 2022 · 41bd8bf
1 parent ee02196
Showing 9 changed files with 43 additions and 39 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: recipes
 Title: Preprocessing and Feature Engineering Steps for Modeling
-Version: 0.2.0.9002
+Version: 1.0.0
 Authors@R: c(
     person("Max", "Kuhn", , "[email protected]", role = c("aut", "cre")),
     person("Hadley", "Wickham", , "[email protected]", role = "aut"),
@@ -44,7 +44,7 @@ Imports:
 Suggests: 
     covr,
     ddalpha,
-    dials (>= 0.0.10.9001),
+    dials (>= 1.0.0),
     ggplot2,
     igraph,
     kernlab,

diff --git a/NEWS.md b/NEWS.md
@@ -1,8 +1,6 @@
-# recipes (development version)
+# recipes (1.0.0
 
-* recipes now checks that all columns in the `data` supplied to `recipe()` are also present in the `new_data` supplied to `bake()`. An exception is made for columns with roles of either `"outcome"` or `"case_weights"`, which are typically not required at `bake()` time. The new `update_role_requirements()` function can be used to adjust whether or not columns of a particular role are required at `bake()` time if you need to opt out of this check (#1011).
-
-* The `summary()` method for recipe objects now contains an extra column to indicate which columns are required when `bake()` is used. 
+## Improvements and Other Changes
 
 * Added support for case weights in the following steps
     - `step_center()`
@@ -24,32 +22,42 @@
 
 * A number of developer focused functions to deal with case weights are added: `are_weights_used()`, `get_case_weights()`, `averages()`, `medians()`, `variances()`, `correlations()`, `covariances()`, and `pca_wts()`
 
+* recipes now checks that all columns in the `data` supplied to `recipe()` are also present in the `new_data` supplied to `bake()`. An exception is made for columns with roles of either `"outcome"` or `"case_weights"`, which are typically not required at `bake()` time. The new `update_role_requirements()` function can be used to adjust whether or not columns of a particular role are required at `bake()` time if you need to opt out of this check (#1011).
+
+* The `summary()` method for recipe objects now contains an extra column to indicate which columns are required when `bake()` is used. 
+
+## New Steps
+
+* `step_time()` has been added that extracts time features such as hour, minute, or second. (#968)
+
+## Bug Fixes
+
 * Fixed bug in which functions that `step_hyperbolic()` uses (#932).
 
 * `step_dummy_multi_choice()` now respects factor-levels of the selected variables when creating dummies. (#916)
 
-* Finally removed `step_upsample()` and `step_downsample()` in recipes as they are now available in the themis package.
+* `step_dummy()` now works correctly with recipes trained on version 0.1.17 or earlier. (#921)
+
+* Fixed a bug where setting `fresh = TRUE` in `prep()` wouldn't result in re-prepping the recipe. (#492)
 
-* `discretize()` and `step_discretize()` now defaults to returning factor levels similar to `cut()` by default, in line with `step_discretize_*()` steps from the embed package. (#674)
+* Bug was fixed in `step_holiday()` which used to error when it was applied to variable with missing values. (#743)
 
-* `step_dummy()` now works correctly with recipes trained on version 0.1.17 or earlier. (#921)
+* A bug was fixed in `step_normalize()` which used to error if 1 variable was selected. (#963)
+
+## Improvements and Other Changes
+
+* Finally removed `step_upsample()` and `step_downsample()` in recipes as they are now available in the themis package.
+
+* `discretize()` and `step_discretize()` now can return factor levels similar to `cut()`. (#674)
 
 * `step_naomit()` now actually has its default for `skip` changed to `TRUE`, as was stated in release 0.1.13. (#934)
 
 * `step_dummy()` has been made more robust to non-standard column names. (#879)
 
 * `step_pls()` now allows you to use multiple outcomes if they are numeric. (#651)
 
-* Fixed a bug where setting `fresh = TRUE` in `prep()` wouldn't result in re-prepping the recipe. (#492)
-
 * `step_normalize()` and `step_scale()` ignore columns with zero variance, generate a warning and suggest to use `step_zv()` (#920).
 
-* Bug was fixed in `step_holiday()` which used to error when it was applied to variable with missing values. (#743)
-
-* A bug was fixed in `step_normalize()` which used to error if 1 variable was selected. (#963)
-
-* `step_time()` has been added that extracts time features such as hour, minute, or second. (#968)
-
 * Printing for `step_impute_knn()` now shows the variables that were imputed instead of the variables used for imputing. (#837)
 
 * `step_discretize()` and `discretize()` will automatically remove missing values if `keep_na = TRUE`, removing the need to specify `keep_na = TRUE` and `na.rm = TRUE`. (#982)
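
Note on the `bake()` column check described in the NEWS entry for #1011: the following is a minimal sketch of opting a role out of the check, assuming recipes 1.0.0 or later is installed. The `mtcars` data and the `"id"` role name are illustrative choices and are not part of this commit.

```r
library(recipes)

# Illustrative only: treat `cyl` as an identifier column rather than a predictor.
rec <- recipe(mpg ~ ., data = mtcars) %>%
  update_role(cyl, new_role = "id") %>%
  # Declare that columns with the "id" role are not required at bake() time.
  update_role_requirements(role = "id", bake = FALSE) %>%
  step_normalize(all_numeric_predictors()) %>%
  prep()

# `cyl` is absent from new_data; because of the requirement set above,
# bake() should not error on the missing column.
new_data <- mtcars[, setdiff(names(mtcars), "cyl")]
baked <- bake(rec, new_data = new_data)

# summary() includes an extra column indicating which columns bake() requires.
summary(rec)
```

Without the `update_role_requirements()` call, the same `bake()` call would be expected to fail the new check, since `"id"` is neither `"outcome"` nor `"case_weights"`.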

diff --git a/R/discretize.R b/R/discretize.R
@@ -26,7 +26,7 @@ discretize.default <- function(x, ...) {
 #'  for the factor levels (e.g. `bin1`, `bin2`, ...). If
 #'  the string is not a valid R name, it is coerced to one.
 #'  If `prefix = NULL` then the factor levels will be labelled
-#'  according to the output of `cut()`. Defaults to `NULL`.
+#'  according to the output of `cut()`.
 #' @param keep_na A logical for whether a factor level should be
 #'  created to identify missing values in `x`. If `keep_na` is
 #'  set to `TRUE` then `na.rm = TRUE` is used when calling
@@ -79,7 +79,7 @@ discretize.numeric <-
   function(x,
            cuts = 4,
            labels = NULL,
-           prefix = NULL,
+           prefix = "bin",
            keep_na = TRUE,
            infs = TRUE,
            min_unique = 10,
@@ -292,7 +292,7 @@ step_discretize <- function(recipe,
                             num_breaks = 4,
                             min_unique = 10,
                             objects = NULL,
-                            options = list(),
+                            options = list(prefix = "bin"),
                             skip = FALSE,
                             id = rand_id("discretize")) {
   if (any(names(options) %in% c("cuts", "min_unique"))) {
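
Note on the restored `prefix = "bin"` default: the sketch below mirrors the updated tests in `tests/testthat/test_discretized.R` and contrasts the default labels with the `cut()`-style labels that are now opt-in. It assumes recipes 1.0.0 or later is installed; the vectors `x_train` and `x_new` are made up for illustration.

```r
library(recipes)

x_train <- 1:100            # enough unique values to satisfy min_unique = 10
x_new <- c(2, 30, 99, NA)

# Opt in to cut()-style interval labels by passing prefix = NULL.
bin_cut <- discretize(x_train, prefix = NULL)
pred_cut <- predict(bin_cut, x_new)

# Default after this commit: prefix = "bin", so levels are bin1, bin2, ...
bin_default <- discretize(x_train)
pred_bin <- predict(bin_default, x_new)

levels(pred_cut)
levels(pred_bin)
```

Code that relied on the interval labels from the development version would need to pass `prefix = NULL` explicitly, as the revised tests do.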

diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 [![R-CMD-check](https://github.com/tidymodels/recipes/workflows/R-CMD-check/badge.svg)](https://github.com/tidymodels/recipes/actions)
 [![Codecov test
 coverage](https://codecov.io/gh/tidymodels/recipes/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/recipes?branch=main)
-[![CRAN\_Status\_Badge](https://www.r-pkg.org/badges/version/recipes)](https://CRAN.R-project.org/package=recipes)
+[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/recipes)](https://CRAN.R-project.org/package=recipes)
 [![Downloads](https://cranlogs.r-pkg.org/badges/recipes)](https://CRAN.R-project.org/package=recipes)
 [![lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html)
 
@@ -68,10 +68,6 @@ devtools::install_github("tidymodels/recipes")
 
 ## Contributing
 
-This project is released with a [Contributor Code of
-Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
-By contributing to this project, you agree to abide by its terms.
-
 -   For questions and discussions about tidymodels packages, modeling,
     and machine learning, please [post on RStudio
     Community](https://community.rstudio.com/c/ml/15).

diff --git a/man/discretize.Rd b/man/discretize.Rd
diff --git a/man/roles.Rd b/man/roles.Rd
diff --git a/man/step_discretize.Rd b/man/step_discretize.Rd
diff --git a/man/summary.recipe.Rd b/man/summary.recipe.Rd
diff --git a/tests/testthat/test_discretized.R b/tests/testthat/test_discretized.R
@@ -17,39 +17,39 @@ lvls_breaks_4 <- c("[missing]", "[-Inf,25.8]", "(25.8,50.5]", "(50.5,75.2]", "(7
 lvls_breaks_4_bin <- c("bin_missing", "bin1", "bin2", "bin3", "bin4")
 
 test_that("default args", {
-  bin_1 <- discretize(ex_tr$x1)
+  bin_1 <- discretize(ex_tr$x1, prefix = NULL)
   pred_1 <- predict(bin_1, ex_te$x1)
   exp_1 <- factor(lvls_breaks_4[c(2, 3, 5, 1)], levels = lvls_breaks_4)
   expect_equal(pred_1, exp_1)
 
-  bin_1 <- discretize(ex_tr$x1, prefix = "bin")
+  bin_1 <- discretize(ex_tr$x1)
   pred_1 <- predict(bin_1, ex_te$x1)
   exp_1 <- factor(c("bin1", "bin2", "bin4", "bin_missing"), levels = lvls_breaks_4_bin)
   expect_equal(pred_1, exp_1)
 })
 
 test_that("NA values", {
-  bin_2 <- discretize(ex_tr$x1, keep_na = FALSE)
+  bin_2 <- discretize(ex_tr$x1, keep_na = FALSE, prefix = NULL)
   pred_2 <- predict(bin_2, ex_te$x1)
   exp_2 <- factor(lvls_breaks_4[c(2, 3, 5, NA)], levels = lvls_breaks_4[-1])
   expect_equal(pred_2, exp_2)
 
-  bin_2 <- discretize(ex_tr$x1, keep_na = FALSE, prefix = "bin")
+  bin_2 <- discretize(ex_tr$x1, keep_na = FALSE)
   pred_2 <- predict(bin_2, ex_te$x1)
   exp_2 <- factor(c("bin1", "bin2", "bin4", NA), levels = lvls_breaks_4_bin[-1])
   expect_equal(pred_2, exp_2)
 })
 
 test_that("NA values from out of range", {
-  bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE)
+  bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE, prefix = NULL)
   pred_3 <- predict(bin_3, ex_te$x1)
   exp_3 <- factor(
     c("[1,25.8]", "(25.8,50.5]", NA, NA),
     levels = c("[1,25.8]", "(25.8,50.5]", "(50.5,75.2]", "(75.2,100]")
   )
   expect_equal(pred_3, exp_3)
 
-  bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE, prefix = "bin")
+  bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE)
   pred_3 <- predict(bin_3, ex_te$x1)
   exp_3 <- factor(c("bin1", "bin2", NA, NA), levels = lvls_breaks_4_bin[-1])
   expect_equal(pred_3, exp_3)
@@ -104,9 +104,9 @@ test_that("tidys", {
 
 test_that("bad args", {
   expect_snapshot(error = TRUE,
-    recipe(~., data = ex_tr) %>%
-      step_discretize(x1, num_breaks = 1) %>%
-      prep()
+                  recipe(~., data = ex_tr) %>%
+                    step_discretize(x1, num_breaks = 1) %>%
+                    prep()
   )
   expect_snapshot(
     recipe(~., data = ex_tr) %>%