installing and running the compare-data.table-tidyverse package #6

Open
DorisAmoakohene opened this issue Oct 12, 2023 · 7 comments

@DorisAmoakohene
Owner

@tdhock

I tried to run the vignette for the compare-data.table-tidyverse package, but I ran into an error at these lines:

cache(read.real.vary.cols.limit, atime_read_limit("10_real_rows_fwrite_*.csv"))
aplot(read.real.vary.cols.limit, "Read first 10 columns of CSV with 10 real rows", 1e8, 1e1, "Number of columns in CSV", limit.colors)

The error:

> cache(read.real.vary.rows.limit, atime_read_limit("10_real_cols_fwrite_*.csv"))
> aplot(read.real.vary.rows.limit, "Read first 10 rows of CSV with 10 real columns", 1e9, 1e1, "Number of rows in CSV", limit.colors)
> cache(read.real.vary.cols.limit, atime_read_limit("10_real_rows_fwrite_*.csv"))
Error in x:y : argument of length 0
> cache(read.real.vary.cols.limit, atime_read_limit("10_real_rows_fwrite_*.csv"))
Error in x:y : argument of length 0


I also tried to install the package from GitHub:

remotes::install_github("tdhock/compare-data.table-tidyverse")

Error: Failed to install 'unknown package' from GitHub:
  HTTP error 404.
  Not Found

  Did you spell the repo owner (`tdhock`) and repo name (`compare-data.table-tidyverse`) correctly?
  - If spelling is correct, check that you have the required permissions to access the repo.
@tdhock

tdhock commented Oct 12, 2023

compare-data.table-tidyverse is not a package; it is a vignette in the atime package. So it is expected that remotes::install_github("tdhock/compare-data.table-tidyverse") fails, because there is no such repository.
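A minimal sketch of getting at the vignette through the atime package instead, assuming atime is installed from CRAN. Whether a given CRAN release of atime already ships this particular vignette is not confirmed here, so the code only lists what the installed version provides:

```r
# Install atime from CRAN only if it is not already available
if (!requireNamespace("atime", quietly = TRUE)) {
  install.packages("atime", repos = "https://cloud.r-project.org")
}
# List the vignettes shipped with the installed version of atime
v <- vignette(package = "atime")
print(v$results[, c("Item", "Title")])
```

If "compare-data.table-tidyverse" appears in the Item column, vignette("compare-data.table-tidyverse", package = "atime") opens it.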

@tdhock

tdhock commented Oct 12, 2023

I'm not sure I understand the error you got. What code did you try to run, and what was the error and traceback?

@tdhock

tdhock commented Oct 12, 2023

Here is some more code you could try, which is simpler, but less comprehensive, than that vignette. https://tdhock.github.io/blog/2023/dt-atime-figures/ source: https://github.com/tdhock/tdhock.github.io/blob/master/_posts/2023-10-08-dt-atime-figures.Rmd

@DorisAmoakohene
Owner Author

> I'm not sure I understand the error you got. What code did you try to run, and what was the error and traceback?

@tdhock I was trying to run the compare-data.table-tidyverse vignette. The error occurred when I was running line 672 (read real numbers with a constant number of columns, and a variable number of rows).

Traceback:

> cache(read.real.vary.cols.limit, atime_read_limit("10_real_rows_fwrite_*.csv"))
Error in x:y : argument of length 0
> traceback()
27: eval_colon(expr, data_mask, context_mask)
26: walk_data_tree(new, data_mask, context_mask)
25: reduce_sels(node, data_mask, context_mask, init = init)
24: eval_c(expr, data_mask, context_mask)
23: walk_data_tree(expr, data_mask, context_mask)
22: vars_select_eval(vars, expr, strict = strict, data = x, name_spec = name_spec, 
        uniquely_named = uniquely_named, allow_rename = allow_rename, 
        allow_empty = allow_empty, allow_predicates = allow_predicates, 
        type = type, error_call = error_call)
21: withCallingHandlers(expr, condition = function(cnd) {
        {
            .__handler_frame__. <- TRUE
            .__setup_frame__. <- frame
            if (inherits(cnd, "message")) {
                except <- c("warning", "error")
            }
            else if (inherits(cnd, "warning")) {
                except <- "error"
            }
            else {
                except <- ""
            }
        }
        while (!is_null(cnd)) {
            if (inherits(cnd, "vctrs_error_subscript")) {
                out <- handlers[[1L]](cnd)
                if (!inherits(out, "rlang_zap")) 
                    throw(out)
            }
     ...
20: try_fetch(expr, vctrs_error_subscript = function(cnd) {
        cnd$subscript_action <- subscript_action(type)
        cnd$subscript_elt <- "column"
        cnd_signal(cnd)
    })
19: with_subscript_errors(out <- vars_select_eval(vars, expr, strict = strict, 
        data = x, name_spec = name_spec, uniquely_named = uniquely_named, 
        allow_rename = allow_rename, allow_empty = allow_empty, allow_predicates = allow_predicates, 
        type = type, error_call = error_call), type = type)
18: eval_select_impl(NULL, .vars, expr(c(!!!dots)), include = .include, 
        exclude = .exclude, strict = .strict, name_spec = unique_name_spec, 
        uniquely_named = TRUE, error_call = caller_env())
17: tidyselect::vars_select(names(spec$cols), !!col_select, .strict = FALSE)
16: names(spec$cols) %in% tidyselect::vars_select(names(spec$cols), 
        !!col_select, .strict = FALSE)
15: (function (spec, num_cols, col_names, col_select, name_repair) 
    {
        if (num_cols == 0) {
            if (length(spec$cols) > 0) {
                num_cols <- length(spec$cols)
            }
            else if (length(col_names) > 0) {
                num_cols <- length(col_names)
            }
        }
        if (length(col_names) == 0) {
            col_names <- make_names(NULL, num_cols)
        }
        col_names <- vctrs::vec_as_names(col_names, repair = name_repair)
        type_names <- names(spec$cols)
        if (length(spec$cols) == 0) {
            spec$cols <- rep(list(spec$default), num_cols)
            names(spec$cols) <- col_names[seq_along(spec$cols)]
        }
        else if (is.null(type_names)) {
     ...
14: vroom_(file, delim = delim %||% col_types$delim, col_names = col_names, 
        col_types = col_types, id = id, skip = skip, col_select = col_select, 
        name_repair = .name_repair, na = na, quote = quote, trim_ws = trim_ws, 
        escape_double = escape_double, escape_backslash = escape_backslash, 
        comment = comment, skip_empty_rows = skip_empty_rows, locale = locale, 
        guess_max = guess_max, n_max = n_max, altrep = vroom_altrep(altrep), 
        num_threads = num_threads, progress = progress)
13: vroom::vroom(file, delim = ",", col_names = col_names, col_types = col_types, 
        col_select = {
            {
                col_select
            }
        }, id = id, .name_repair = name_repair, skip = skip, n_max = n_max, 
        na = na, quote = quote, comment = comment, skip_empty_rows = skip_empty_rows, 
        trim_ws = trim_ws, escape_double = TRUE, escape_backslash = FALSE, 
        locale = locale, guess_max = guess_max, show_col_types = show_col_types, 
        progress = progress, altrep = lazy, num_threads = num_threads)
12: readr::read_csv(f.csv, num_threads = 1, col_select = 1:10, n_max = 10, 
        lazy = FALSE, show_col_types = FALSE, progress = FALSE)
11: eval(e, env)
10: eval(e, env)
9: eval_one(exprs[[i]], memory)
8: bench::mark(iterations = 10, check = FALSE, `readr::read_csv` = {
       readr::read_csv(f.csv, num_threads = 1, col_select = 1:10, 
           n_max = 10, lazy = FALSE, show_col_types = FALSE, progress = FALSE)
   }, `data.table::fread` = {
       data.table::setDTthreads(1)
       data.table::fread(f.csv, nrows = 10, select = 1:10, showProgress = FALSE)
   }, read_csv_arrow = {
       arrow::set_cpu_count(1)
       arrow::read_csv_arrow(f.csv, col_select = 1:10)
   }, `utils::read.csv` = {
       utils::read.csv(f.csv, nrows = 10)
   })
7: eval(m.call, N.env)
6: eval(m.call, N.env)
5: atime::atime(N = csv.dt$N, setup = {
       f.csv <- file.path(tempdir(), sprintf(fmt, N))
   }, seconds.limit = seconds.limit, expr.list = limit.expr.list) at #7
4: atime_read_limit("10_real_rows_fwrite_*.csv")
3: eval(to.eval)
2: eval(to.eval) at #14
1: cache(read.real.vary.cols.limit, atime_read_limit("10_real_rows_fwrite_*.csv"))

@tdhock

tdhock commented Oct 12, 2023

Looks like there is some issue with

12: readr::read_csv(f.csv, num_threads = 1, col_select = 1:10, n_max = 10, 
        lazy = FALSE, show_col_types = FALSE, progress = FALSE)

which is trying to read the first ten columns and rows of a CSV file.
Try to run that line of code by itself on a simple example CSV file. If it still gives an error, then we should file an issue with a minimal reproducible example on readr. If it does not, then there is an issue with the code calling read_csv, which you should investigate and fix.
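A hedged sketch of such a standalone check. It assumes the vignette's "10_real_rows_fwrite_*.csv" files hold 10 rows of real numbers with a varying number of columns, written by data.table::fwrite (inferred from the file name, not confirmed); the helper check_read_csv and the column counts tried are hypothetical. Column counts below 10 are included to see whether col_select = 1:10 behaves differently on narrow files:

```r
library(data.table)
library(readr)

# Hypothetical helper: write a 10-row CSV of random reals with N columns,
# then run the benchmark's read_csv call on it alone, capturing any error.
check_read_csv <- function(N) {
  f.csv <- tempfile(fileext = ".csv")
  data.table::fwrite(data.table::as.data.table(matrix(rnorm(10 * N), nrow = 10)), f.csv)
  result <- tryCatch({
    readr::read_csv(f.csv, num_threads = 1, col_select = 1:10, n_max = 10,
                    lazy = FALSE, show_col_types = FALSE, progress = FALSE)
    "ok"
  }, error = function(e) conditionMessage(e))
  unlink(f.csv)
  result
}

# Try several column counts, some of them below 10
N.values <- c(2, 5, 10, 100, 1000)
status <- sapply(N.values, check_read_csv)
print(data.frame(N = N.values, status = status))
```

Any row whose status is not "ok" points to a file shape worth turning into a minimal reproducible example.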
If there is an issue with read_csv, you can comment out those lines and just run atime on the other CSV reading methods.
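A sketch of timing only the other readers with atime::atime, under the same hypothetical file shape as above (10 rows, N columns of reals). The N values, seconds.limit = 1, and the temporary file are illustrative choices, not taken from the vignette; arrow is left out so the sketch needs only data.table and atime. The setup block runs for each N, and each named expression is timed:

```r
# Time fread and read.csv on 10-row CSVs with a growing number of columns,
# leaving readr::read_csv out of the comparison.
timings <- atime::atime(
  N = c(10, 100, 1000),
  setup = {
    f.csv <- tempfile(fileext = ".csv")
    data.table::fwrite(data.table::as.data.table(matrix(rnorm(10 * N), nrow = 10)), f.csv)
  },
  seconds.limit = 1,
  `data.table::fread` = {
    data.table::setDTthreads(1)
    data.table::fread(f.csv, nrows = 10, select = 1:10, showProgress = FALSE)
  },
  `utils::read.csv` = utils::read.csv(f.csv, nrows = 10)
)
# Plot time versus N for each remaining reader
plot(timings)
```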

@DorisAmoakohene
Owner Author

Okay, I'm doing that.

@DorisAmoakohene
Owner Author

DorisAmoakohene commented Oct 12, 2023

@tdhock I ran this simple example, and it ran without error.

library(readr)
library(data.table)
# Create a small data table (3 rows, 11 columns)
dt.pep <- data.table(
  Name = c("John", "Emily", "Michael"),
  Age = c(25, 30, 35),
  Gender = c("Male", "Female", "Male"),
  Occupation = c("Engineer", "Doctor", "Teacher"),
  weight = c(80, 62, 75),
  apartment = c("table rock", "woodlands village", "university west"),
  mar.stat = c("single", "married", "married"),
  no.of.kids = c(2, 1, 3),
  country = c("usa", "Ghana", "Mexico"),
  fav.color = c("blue", "pink", "blue"),
  fav.game = c("football", "tennis", "soccer")
)

write.csv(dt.pep, "example.csv", row.names = FALSE)

# Same read_csv call as in the benchmark: first 10 columns, at most 10 rows
data <- readr::read_csv("example.csv", num_threads = 1, col_select = 1:10, n_max = 10,
                        lazy = FALSE, show_col_types = FALSE, progress = FALSE)

I'm going back to my code to try to fix the issue.

2 participants