1.0.0 release candidate (#1017)
* version and doc refresh

* doc refresh

* Undo the default set in #951 to stop reverse dependency breakage
topepo authored Jul 1, 2022
1 parent ee02196 commit 41bd8bf
Showing 9 changed files with 43 additions and 39 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: recipes
Title: Preprocessing and Feature Engineering Steps for Modeling
Version: 0.2.0.9002
Version: 1.0.0
Authors@R: c(
person("Max", "Kuhn", , "[email protected]", role = c("aut", "cre")),
person("Hadley", "Wickham", , "[email protected]", role = "aut"),
@@ -44,7 +44,7 @@ Imports:
Suggests:
covr,
ddalpha,
dials (>= 0.0.10.9001),
dials (>= 1.0.0),
ggplot2,
igraph,
kernlab,
38 changes: 23 additions & 15 deletions NEWS.md
@@ -1,8 +1,6 @@
# recipes (development version)
# recipes 1.0.0

* recipes now checks that all columns in the `data` supplied to `recipe()` are also present in the `new_data` supplied to `bake()`. An exception is made for columns with roles of either `"outcome"` or `"case_weights"`, which are typically not required at `bake()` time. The new `update_role_requirements()` function can be used to adjust whether or not columns of a particular role are required at `bake()` time if you need to opt out of this check (#1011).

* The `summary()` method for recipe objects now contains an extra column to indicate which columns are required when `bake()` is used.
## Improvements and Other Changes

* Added support for case weights in the following steps
- `step_center()`
@@ -24,32 +22,42 @@

* A number of developer-focused functions for working with case weights were added: `are_weights_used()`, `get_case_weights()`, `averages()`, `medians()`, `variances()`, `correlations()`, `covariances()`, and `pca_wts()`.

* recipes now checks that all columns in the `data` supplied to `recipe()` are also present in the `new_data` supplied to `bake()`. An exception is made for columns with roles of either `"outcome"` or `"case_weights"`, which are typically not required at `bake()` time. The new `update_role_requirements()` function can be used to adjust whether or not columns of a particular role are required at `bake()` time if you need to opt out of this check (#1011).

* The `summary()` method for recipe objects now contains an extra column to indicate which columns are required when `bake()` is used.

## New Steps

* `step_time()` has been added that extracts time features such as hour, minute, or second. (#968)

## Bug Fixes

* Fixed a bug in which functions `step_hyperbolic()` uses. (#932)

* `step_dummy_multi_choice()` now respects factor-levels of the selected variables when creating dummies. (#916)

* Finally removed `step_upsample()` and `step_downsample()` in recipes as they are now available in the themis package.
* `step_dummy()` now works correctly with recipes trained on version 0.1.17 or earlier. (#921)

* Fixed a bug where setting `fresh = TRUE` in `prep()` wouldn't result in re-prepping the recipe. (#492)

* `discretize()` and `step_discretize()` now default to returning factor levels similar to `cut()`, in line with the `step_discretize_*()` steps from the embed package. (#674)
* A bug was fixed in `step_holiday()`, which used to error when applied to a variable with missing values. (#743)

* `step_dummy()` now works correctly with recipes trained on version 0.1.17 or earlier. (#921)
* A bug was fixed in `step_normalize()` which used to error if 1 variable was selected. (#963)

## Improvements and Other Changes

* Finally removed `step_upsample()` and `step_downsample()` in recipes as they are now available in the themis package.

* `discretize()` and `step_discretize()` now can return factor levels similar to `cut()`. (#674)

* `step_naomit()` now actually has its default for `skip` changed to `TRUE`, as was stated in release 0.1.13. (#934)

* `step_dummy()` has been made more robust to non-standard column names. (#879)

* `step_pls()` now allows you to use multiple outcomes if they are numeric. (#651)

* Fixed a bug where setting `fresh = TRUE` in `prep()` wouldn't result in re-prepping the recipe. (#492)

* `step_normalize()` and `step_scale()` now ignore columns with zero variance, generate a warning, and suggest using `step_zv()`. (#920)

* A bug was fixed in `step_holiday()`, which used to error when applied to a variable with missing values. (#743)

* A bug was fixed in `step_normalize()` which used to error if 1 variable was selected. (#963)

* `step_time()` has been added that extracts time features such as hour, minute, or second. (#968)

* Printing for `step_impute_knn()` now shows the variables that were imputed instead of the variables used for imputing. (#837)

* `step_discretize()` and `discretize()` will automatically remove missing values if `keep_na = TRUE`, removing the need to specify `keep_na = TRUE` and `na.rm = TRUE`. (#982)
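The bake-time column check described in the NEWS entries above can be sketched as follows. This assumes recipes 1.0.0 or later is installed; the toy data frame `df` and the `"id"` role name are made up for illustration and are not part of the commit.

```r
library(recipes)

# Hypothetical toy data: `var` is an identifier column, not a predictor
df <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6), var = c("a", "b", "c"))
new_df <- df[, c("y", "x")]  # new data without the `var` column

base_rec <- recipe(y ~ ., data = df) %>%
  update_role(var, new_role = "id")

# Default behaviour: `var` was in the training data, so bake() is
# expected to complain when it is absent from `new_data`
strict_rec <- base_rec %>%
  step_scale(all_numeric_predictors()) %>%
  prep(training = df)
strict <- try(bake(strict_rec, new_data = new_df), silent = TRUE)

# Opt out: columns with the "id" role are not required at bake() time
relaxed_rec <- base_rec %>%
  update_role_requirements("id", bake = FALSE) %>%
  step_scale(all_numeric_predictors()) %>%
  prep(training = df)
relaxed <- bake(relaxed_rec, new_data = new_df)
```

Here `strict` should hold a `try-error`, while `relaxed` should be a baked data set that still contains the scaled `x` column.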
6 changes: 3 additions & 3 deletions R/discretize.R
@@ -26,7 +26,7 @@ discretize.default <- function(x, ...) {
#' for the factor levels (e.g. `bin1`, `bin2`, ...). If
#' the string is not a valid R name, it is coerced to one.
#' If `prefix = NULL` then the factor levels will be labelled
#' according to the output of `cut()`. Defaults to `NULL`.
#' according to the output of `cut()`.
#' @param keep_na A logical for whether a factor level should be
#' created to identify missing values in `x`. If `keep_na` is
#' set to `TRUE` then `na.rm = TRUE` is used when calling
@@ -79,7 +79,7 @@ discretize.numeric <-
function(x,
cuts = 4,
labels = NULL,
prefix = NULL,
prefix = "bin",
keep_na = TRUE,
infs = TRUE,
min_unique = 10,
@@ -292,7 +292,7 @@ step_discretize <- function(recipe,
num_breaks = 4,
min_unique = 10,
objects = NULL,
options = list(),
options = list(prefix = "bin"),
skip = FALSE,
id = rand_id("discretize")) {
if (any(names(options) %in% c("cuts", "min_unique"))) {
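The difference between the two `prefix` settings touched by this hunk can be sketched in base R. This is only an illustration of the two labelling styles, not the recipes implementation; the input vector `x` and the quartile breaks are assumptions chosen for the example.

```r
# Base-R sketch only; this is not how recipes builds its bins internally.
x <- 1:100

# Quartile break points, with the outer breaks widened to -Inf / Inf
breaks <- quantile(x, probs = seq(0, 1, length.out = 5), names = FALSE)
breaks[c(1, length(breaks))] <- c(-Inf, Inf)

# cut()-style interval labels, the style documented for `prefix = NULL`
cut_labels <- cut(x, breaks = breaks, include.lowest = TRUE)

# Prefixed labels, the style restored as the default by `prefix = "bin"`
bin_labels <- cut(
  x,
  breaks = breaks,
  include.lowest = TRUE,
  labels = paste0("bin", seq_len(length(breaks) - 1))
)

levels(cut_labels)  # interval-style level names
levels(bin_labels)  # "bin1" "bin2" "bin3" "bin4"
```

Both factors assign every value to the same bin; only the level names differ, which is why changing the default can break downstream code that matches on level names.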
6 changes: 1 addition & 5 deletions README.md
@@ -4,7 +4,7 @@
[![R-CMD-check](https://github.com/tidymodels/recipes/workflows/R-CMD-check/badge.svg)](https://github.com/tidymodels/recipes/actions)
[![Codecov test
coverage](https://codecov.io/gh/tidymodels/recipes/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/recipes?branch=main)
[![CRAN\_Status\_Badge](https://www.r-pkg.org/badges/version/recipes)](https://CRAN.R-project.org/package=recipes)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/recipes)](https://CRAN.R-project.org/package=recipes)
[![Downloads](https://cranlogs.r-pkg.org/badges/recipes)](https://CRAN.R-project.org/package=recipes)
[![lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html)

@@ -68,10 +68,6 @@ devtools::install_github("tidymodels/recipes")

## Contributing

This project is released with a [Contributor Code of
Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling,
and machine learning, please [post on RStudio
Community](https://community.rstudio.com/c/ml/15).
2 changes: 1 addition & 1 deletion man/discretize.Rd


4 changes: 2 additions & 2 deletions man/roles.Rd


2 changes: 1 addition & 1 deletion man/step_discretize.Rd


2 changes: 1 addition & 1 deletion man/summary.recipe.Rd


18 changes: 9 additions & 9 deletions tests/testthat/test_discretized.R
@@ -17,39 +17,39 @@ lvls_breaks_4 <- c("[missing]", "[-Inf,25.8]", "(25.8,50.5]", "(50.5,75.2]", "(7
lvls_breaks_4_bin <- c("bin_missing", "bin1", "bin2", "bin3", "bin4")

test_that("default args", {
bin_1 <- discretize(ex_tr$x1)
bin_1 <- discretize(ex_tr$x1, prefix = NULL)
pred_1 <- predict(bin_1, ex_te$x1)
exp_1 <- factor(lvls_breaks_4[c(2, 3, 5, 1)], levels = lvls_breaks_4)
expect_equal(pred_1, exp_1)

bin_1 <- discretize(ex_tr$x1, prefix = "bin")
bin_1 <- discretize(ex_tr$x1)
pred_1 <- predict(bin_1, ex_te$x1)
exp_1 <- factor(c("bin1", "bin2", "bin4", "bin_missing"), levels = lvls_breaks_4_bin)
expect_equal(pred_1, exp_1)
})

test_that("NA values", {
bin_2 <- discretize(ex_tr$x1, keep_na = FALSE)
bin_2 <- discretize(ex_tr$x1, keep_na = FALSE, prefix = NULL)
pred_2 <- predict(bin_2, ex_te$x1)
exp_2 <- factor(lvls_breaks_4[c(2, 3, 5, NA)], levels = lvls_breaks_4[-1])
expect_equal(pred_2, exp_2)

bin_2 <- discretize(ex_tr$x1, keep_na = FALSE, prefix = "bin")
bin_2 <- discretize(ex_tr$x1, keep_na = FALSE)
pred_2 <- predict(bin_2, ex_te$x1)
exp_2 <- factor(c("bin1", "bin2", "bin4", NA), levels = lvls_breaks_4_bin[-1])
expect_equal(pred_2, exp_2)
})

test_that("NA values from out of range", {
bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE)
bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE, prefix = NULL)
pred_3 <- predict(bin_3, ex_te$x1)
exp_3 <- factor(
c("[1,25.8]", "(25.8,50.5]", NA, NA),
levels = c("[1,25.8]", "(25.8,50.5]", "(50.5,75.2]", "(75.2,100]")
)
expect_equal(pred_3, exp_3)

bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE, prefix = "bin")
bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE)
pred_3 <- predict(bin_3, ex_te$x1)
exp_3 <- factor(c("bin1", "bin2", NA, NA), levels = lvls_breaks_4_bin[-1])
expect_equal(pred_3, exp_3)
@@ -104,9 +104,9 @@ test_that("tidys", {

test_that("bad args", {
expect_snapshot(error = TRUE,
recipe(~., data = ex_tr) %>%
step_discretize(x1, num_breaks = 1) %>%
prep()
recipe(~., data = ex_tr) %>%
step_discretize(x1, num_breaks = 1) %>%
prep()
)
expect_snapshot(
recipe(~., data = ex_tr) %>%
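The "NA values from out of range" test above relies on values outside the training range falling into no bin when the outer breaks are left finite. A base-R sketch of that effect, using made-up training and new values rather than the `ex_tr`/`ex_te` fixtures:

```r
# Base-R sketch only; `train` and `new_x` are hypothetical stand-ins.
train <- 1:100
new_x <- c(2, 30, 150, NA)

# Finite outer breaks (analogous to `infs = FALSE`)
finite_breaks <- quantile(train, probs = seq(0, 1, length.out = 5), names = FALSE)
finite_bins <- cut(new_x, breaks = finite_breaks, include.lowest = TRUE)

# Outer breaks widened to -Inf / Inf (analogous to `infs = TRUE`)
open_breaks <- finite_breaks
open_breaks[c(1, length(open_breaks))] <- c(-Inf, Inf)
open_bins <- cut(new_x, breaks = open_breaks, include.lowest = TRUE)

is.na(finite_bins)  # 150 and the missing value both end up NA
is.na(open_bins)    # only the missing value ends up NA
```

With finite breaks, 150 lies beyond the last break and is returned as `NA`, just like a genuinely missing value; widening the outer breaks captures it in the top bin.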
