Implement join_by()
#222
In general I'm open to this, but I'll be honest: I don't know much about rolling joins. A few things:
Answering each question in order:
Edit: data.table's documentation reference
Let me know your preferred naming convention for the API and I will go ahead and submit a PR. Here is the plan for `left_join.`:
left_join. <- function(x, y, by = NULL, roll = NULL, rollends = NULL) {
  # validate `roll` only when it is supplied (the default NULL has length 0)
  if (!is.null(roll)) {
    stopifnot(length(roll) == 1)
    stopifnot(is.character(roll) || is.numeric(roll))
  }
  # do all prelim operations like creating `all_names`, `on_vec`
  # ...
  if (is.null(roll)) {
    # go with a regular left_join
    return_df <- y[x, on = on_vec, allow.cartesian = TRUE]
  } else {
    # locf: last obs carried forward (TRUE, +Inf)
    # nocb: next obs carried backward (-Inf)
    # nearest: nearest
    if (is.character(roll)) {
      stopifnot(roll %in% c("locf", "nocb", "nearest"))
      if (roll == "locf") {
        roll_value <- Inf
      } else if (roll == "nocb") {
        roll_value <- -Inf
      } else {
        roll_value <- "nearest"
      }
    } else {
      # numeric roll values are passed through to data.table unchanged
      roll_value <- roll
    }
    # use data.table's defaults when rollends is not specified
    if (is.null(rollends)) {
      if (identical(roll_value, "nearest")) {
        rollends_value <- c(TRUE, TRUE)
      } else if (roll_value >= 0) {
        rollends_value <- c(FALSE, TRUE)
      } else {
        rollends_value <- c(TRUE, FALSE)
      }
    } else {
      stopifnot(is.logical(rollends))
      stopifnot(between(length(rollends), 1, 2))
      rollends_value <- rollends
    }
    return_df <- y[x
                   , on = on_vec
                   , allow.cartesian = FALSE
                   , roll = roll_value
                   , rollends = rollends_value
                   ]
  }
  # apply post-processing ops like setting names, column order
  # ...
  return(as_tidytable(return_df))
}
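The three string options in the plan map onto `data.table`'s `roll` argument. Here is a minimal sketch, assuming only `data.table` and a hypothetical toy dataset (`quotes`, `trades`), of what each mapping returns, plus the effect of overriding `rollends`:

```r
library(data.table)

# Hypothetical toy data: prices observed on days 1, 5 and 10
quotes <- data.table(day = c(1L, 5L, 10L), price = c(100, 105, 110))
# Hypothetical days for which we want a price
trades <- data.table(day = c(0L, 2L, 5L, 12L))

# "locf" -> roll = Inf: carry the last price forward (default rollends = c(FALSE, TRUE))
locf <- quotes[trades, on = "day", roll = Inf]
# "nocb" -> roll = -Inf: carry the next price backward (default rollends = c(TRUE, FALSE))
nocb <- quotes[trades, on = "day", roll = -Inf]
# "nearest": take the closest price in either direction (default rollends = c(TRUE, TRUE))
nearest <- quotes[trades, on = "day", roll = "nearest"]
# Overriding rollends: LOCF that also rolls the first price backward to day 0
locf_ends <- quotes[trades, on = "day", roll = Inf, rollends = c(TRUE, TRUE)]

locf$price       # NA 100 105 110
nocb$price       # 100 105 105 NA
nearest$price    # 100 100 105 110
locf_ends$price  # 100 100 105 110
```

Exact matches (day 5) always join; `roll` and `rollends` only decide what happens to the days in between and beyond the observed range.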
Thanks for taking a look at this.
Let me think about this one for a bit.
As I've thought about this, I'm going to hold off adding this to tidytable for now. I think it's a good idea to wait for the tidyverse team to create their version and then just mimic their syntax. In the meantime I'll keep this issue open, as it's definitely functionality that will be in tidytable at some point.
tidyverse/dplyr#5910 was just merged into … The documentation here seems to cover everything, along with …
Edit: Looks like they got rid of …
I'm guessing this will be doable for left/right/inner joins, but probably not for full joins, because the translation isn't quite as direct in …
This is going to be an interesting one to implement because … So the …
pacman::p_load(tidytable)
sales <- tidytable(
id = c(1L, 1L, 1L, 2L, 2L),
sale_date = as.Date(c("2018-12-31", "2019-01-02", "2019-01-05", "2019-01-04", "2019-01-01"))
)
promos <- tidytable(
id = c(1L, 1L, 2L),
promo_date = as.Date(c("2019-01-01", "2019-01-05", "2019-01-02"))
)
# Match `id` to `id`, and `sale_date` to `promo_date`
by <- dplyr::join_by(id, sale_date == promo_date)
dplyr::left_join(sales, promos, by)
#> # A tidytable: 5 × 2
#> id sale_date
#> <int> <date>
#> 1 1 2018-12-31
#> 2 1 2019-01-02
#> 3 1 2019-01-05
#> 4 2 2019-01-04
#> 5 2 2019-01-01
dt(promos, sales, on = .(id, promo_date == sale_date))
#> # A tidytable: 5 × 2
#> id promo_date
#> <int> <date>
#> 1 1 2018-12-31
#> 2 1 2019-01-02
#> 3 1 2019-01-05
#> 4 2 2019-01-04
#> 5 2 2019-01-01

Another note: There is a lot of renaming/internal prep done to make sure …
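The rolling-join counterpart of this example is dplyr's `closest()` helper, e.g. `join_by(id, closest(sale_date >= promo_date))`, which should keep, for each sale, the most recent promo on or before the sale date. Below is a minimal sketch of the same match with plain `data.table` and `roll = Inf`, assuming a hypothetical helper column `matched_promo`. It also shows the kind of renaming needed, because `x[i, on = ...]` reports the join column under `x`'s name but with `i`'s values (as in the `dt()` output above):

```r
library(data.table)

sales <- data.table(
  id = c(1L, 1L, 1L, 2L, 2L),
  sale_date = as.Date(c("2018-12-31", "2019-01-02", "2019-01-05", "2019-01-04", "2019-01-01"))
)
promos <- data.table(
  id = c(1L, 1L, 2L),
  promo_date = as.Date(c("2019-01-01", "2019-01-05", "2019-01-02"))
)

# Hypothetical helper column: keeps the matched promo date, since the join
# column itself will show the sale dates after the join
promos[, matched_promo := promo_date]

# For each sale, roll the most recent promo (same id) forward to the sale date
rolled <- promos[sales, on = c("id", promo_date = "sale_date"), roll = Inf]

# Rename to the columns a dplyr-style left_join(sales, promos, by) would return
setnames(rolled, c("promo_date", "matched_promo"), c("sale_date", "promo_date"))
rolled
# promo_date: NA, 2019-01-01, 2019-01-05, 2019-01-02, NA
```

Sales dated before any promo for their `id` get `NA`, because LOCF's default `rollends = c(FALSE, TRUE)` does not roll the first promo backward.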
Hi Mark,
Consider adding rolling joins for left and right joins. data.table has this elegant feature, which is missing in dplyr. Code-wise, we just need to add `roll = TRUE` (by default this can be kept `FALSE`). For example, in `left_join.` we need to change
return_df <- y[x, on = on_vec, allow.cartesian = TRUE]
to
return_df <- y[x, on = on_vec, allow.cartesian = TRUE, roll = roll]
with `roll` being a new argument. Let me know if you would like me to submit a PR.
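The proposal's key claim is that a `roll = FALSE` default leaves the existing left join unchanged. Here is a minimal sketch of that one-argument change as a standalone function. `left_join_roll()` and the toy data are hypothetical, not tidytable's API, and only `data.table` is assumed:

```r
library(data.table)

# Hypothetical helper (not tidytable's API): the existing left-join call plus `roll`
left_join_roll <- function(x, y, by, roll = FALSE) {
  x <- as.data.table(x)
  y <- as.data.table(y)
  # y[x, ...] keeps every row of x, like a left join
  y[x, on = by, allow.cartesian = TRUE, roll = roll]
}

# Hypothetical toy data
x <- data.frame(t = c(1L, 4L, 7L))
y <- data.frame(t = c(2L, 4L), val = c("a", "b"), stringsAsFactors = FALSE)

plain  <- left_join_roll(x, y, by = "t")               # t = 1 and t = 7 get NA
rolled <- left_join_roll(x, y, by = "t", roll = TRUE)  # LOCF: t = 7 picks up "b"
```

With `roll = TRUE` (equivalent to `Inf`), `t = 1` still gets `NA`, because it falls before the first observation in `y` and LOCF does not roll backward by default.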