fixed #6556 #6684

Closed
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1014,3 +1014,5 @@ rowwiseDT(
20. Some clarity is added to `?GForce` for the case when subtle changes to `j` produce different results because of differences in locale. Because `data.table` _always_ uses the "C" locale, small changes to queries which activate/deactivate GForce might cause confusingly different results when sorting is involved, [#5331](https://github.com/Rdatatable/data.table/issues/5331). The inspirational example compared `DT[, .(max(a), max(b)), by=grp]` and `DT[, .(max(a), max(tolower(b))), by=grp]` -- in the latter case, GForce is deactivated owing to the _ad-hoc_ column, so the result for `max(a)` might differ for the two queries. An example is added to `?GForce`. As always, there are several options to guarantee consistency, for example (1) use namespace qualification to deactivate GForce: `DT[, .(base::max(a), base::max(b)), by=grp]`; (2) turn off all optimizations with `options(datatable.optimize = 0)`; or (3) set your R session to always sort in C locale with `Sys.setlocale("LC_COLLATE", "C")` (or temporarily with e.g. `withr::with_locale()`). Thanks @markseeto for the example and @michaelchirico for the improved documentation.
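
The locale effect described in item 20 can be seen without `data.table` at all. The following is a minimal base-R sketch, assuming only that the session is allowed to switch `LC_COLLATE`; the variable names are illustrative and not taken from `?GForce`. In the "C" locale, strings are compared by byte value, so uppercase letters sort before lowercase ones.

```r
# Two strings whose order depends on the collation locale
x <- c("a", "B")

# Remember the current collation setting so it can be restored afterwards
old_collate <- Sys.getlocale("LC_COLLATE")

# Option (3) from the NEWS item: sort in the C locale
Sys.setlocale("LC_COLLATE", "C")
max_c    <- max(x)   # byte order: "B" (66) sorts before "a" (97)
sorted_c <- sort(x)

# Restore the previous collation setting
Sys.setlocale("LC_COLLATE", old_collate)

print(max_c)
print(sorted_c)
```

In many non-C locales (for example English ones) the same `max(x)` call typically returns `"B"` instead, which is the kind of discrepancy the NEWS item warns about.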

# data.table v1.14.10 (Dec 2023) back to v1.10.0 (Dec 2016) has been moved to [NEWS.1.md](https://github.com/Rdatatable/data.table/blob/master/NEWS.1.md)

+`merge()` now gives a more specific error when `by` names columns that do not exist. The message lists the columns missing from `x` and the columns missing from `y` separately. Fixes #6556. Thanks @venom1204 for the PR.
15 changes: 13 additions & 2 deletions R/merge.R
@@ -50,8 +50,19 @@
       by = intersect(nm_x, nm_y)
     if (length(by) == 0L || !is.character(by))
       stopf("A non-empty vector of column names for `by` is required.")
-    if (!all(by %chin% intersect(nm_x, nm_y)))
-      stopf("Elements listed in `by` must be valid column names in x and y")
+
+    ## Report which `by` columns are missing from x and which from y
+    missing_in_x = setdiff(by, nm_x)
+    missing_in_y = setdiff(by, nm_y)
+    if (length(missing_in_x) > 0L || length(missing_in_y) > 0L) {
+      error_msg = "Columns listed in `by` must be valid column names in both data.tables.\n"
+      if (length(missing_in_x) > 0L)
+        error_msg = paste0(error_msg, sprintf("✖ Missing in x: %s\n", toString(missing_in_x)))
+      if (length(missing_in_y) > 0L)
+        error_msg = paste0(error_msg, sprintf("✖ Missing in y: %s", toString(missing_in_y)))
+      stopf("%s", error_msg)  # pass the message as data so a `%` in a column name is not read as a format
+    }
+
     by = unname(by)
     by.x = by.y = by
   }
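
The check in this hunk can be tried outside the package. Below is a minimal base-R sketch; `check_by_columns()` is a hypothetical helper written for illustration, not a `data.table` function, and it uses plain `stop()` in place of the internal `stopf()`. It also omits the `✖` prefix so the code stays ASCII-only.

```r
# Hypothetical helper (not part of data.table): report which `by` columns
# are absent from each table's column names.
check_by_columns <- function(by, nm_x, nm_y) {
  missing_in_x <- setdiff(by, nm_x)
  missing_in_y <- setdiff(by, nm_y)
  if (length(missing_in_x) == 0L && length(missing_in_y) == 0L)
    return(invisible(TRUE))

  # Build the message one line at a time, then join with newlines
  parts <- "Columns listed in `by` must be valid column names in both data.tables."
  if (length(missing_in_x) > 0L)
    parts <- c(parts, paste("Missing in x:", toString(missing_in_x)))
  if (length(missing_in_y) > 0L)
    parts <- c(parts, paste("Missing in y:", toString(missing_in_y)))
  stop(paste(parts, collapse = "\n"), call. = FALSE)
}

# "x" exists only in the first table and "a" only in the second
msg <- tryCatch(
  check_by_columns(c("x", "a"), nm_x = c("x", "y"), nm_y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

Collecting the lines in a character vector and joining them once avoids having to track whether a trailing newline is needed after each part.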
43 changes: 43 additions & 0 deletions inst/tests/tests.Rraw
@@ -20697,3 +20697,46 @@ if (test_bit64) {
test(2300.3, DT1[DT2, on='id'], error="Incompatible join types")
test(2300.4, DT2[DT1, on='id'], error="Incompatible join types")
}

+if (test_bit64) {
+  # Identify columns named in `by` that are missing from x and/or y
+  DT1 = data.table(x = as.integer64(1:5), y = letters[1:5])
+  DT2 = data.table(a = as.integer64(6:10), b = letters[6:10])
+
+  # Column missing from both data.tables
+  test(2301.1, {
+    tryCatch({
+      merge.data.table(DT1, DT2, by = "z")
+    }, error = function(e) {
+      e$message
+    })
+  }, "Columns listed in `by` must be valid column names in both data.tables.\n✖ Missing in x: z\n✖ Missing in y: z")
+
+  # Each column is present in only one of the two data.tables
+  test(2301.2, {
+    tryCatch({
+      merge.data.table(DT1, DT2, by = c("x", "a"))
+    }, error = function(e) {
+      e$message
+    })
+  }, "Columns listed in `by` must be valid column names in both data.tables.\n✖ Missing in x: a\n✖ Missing in y: x")
+
+  # `by` columns must exist in both tables: `b` is absent from DT1 and `y` from DT2
+  test(2301.3, {
+    tryCatch({
+      merge.data.table(DT1, DT2, by = c("y", "b"))
+    }, error = function(e) {
+      e$message
+    })
+  }, "Columns listed in `by` must be valid column names in both data.tables.\n✖ Missing in x: b\n✖ Missing in y: y")
+
+  # The column check runs before any join, so changing the type of `a` does not change the error
+  DT2[, a := as.numeric(a)]
+  test(2301.4, {
+    tryCatch({
+      merge.data.table(DT1, DT2, by = c("x", "a"))
+    }, error = function(e) {
+      e$message
+    })
+  }, "Columns listed in `by` must be valid column names in both data.tables.\n✖ Missing in x: a\n✖ Missing in y: x")
+}
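
The two test tables share no column names, so any `by` vector will name at least one column that is missing from one side. Joining on differently named columns is done with `by.x` and `by.y` instead. The sketch below uses base R `merge()` on plain data frames as a stand-in, so it runs without `data.table` or `bit64`; the data frames `df1` and `df2` are made up for illustration.

```r
# Two data frames with no column names in common
df1 <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
df2 <- data.frame(a = 4:6, b = c("b", "c", "d"), stringsAsFactors = FALSE)

# Join df1$y against df2$b; the key column keeps the name from the first table
res <- merge(df1, df2, by.x = "y", by.y = "b")
print(res)
```

Only the values present in both key columns (`"b"` and `"c"`) survive this inner join.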