vroom::vroom() reads data as a single column #126
The heuristic used to guess the delimiter is not perfect. In this case it does not guess the delimiter correctly, and a newline is used as the fallback, which is why the data ends up in a single column. You can specify the delimiter explicitly for this data with the delim argument:
vroom::vroom("/tmp/WI_TREE.csv", delim = ",")
#> Observations: 1,109,323
#> Variables: 207
#> chr [ 1]: P2A_GRM_FLG
#> dbl [104]: CN, PLT_CN, PREV_TRE_CN, INVYR, STATECD, UNITCD, COUNTYCD, PLOT, SUBP, TREE...
#> lgl [100]: DAMTYP2, DAMSEV2, WDLDSTEM, CVIGORCD, TREEHISTCD, BHAGE, TOTAGE, CULLDEAD, ...
#> date [ 2]: CREATED_DATE, MODIFIED_DATE
#>
#> Call `spec()` for a copy-pastable column specification
#> Specify the column types with `col_types` to quiet this message
#> # A tibble: 1,109,323 x 207
#> CN PLT_CN PREV_TRE_CN INVYR STATECD UNITCD COUNTYCD PLOT SUBP
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2.41e13 2.41e13 NA 1983 55 1 37 2 101
#> 2 2.41e13 2.41e13 NA 1983 55 1 37 2 102
#> 3 2.41e13 2.41e13 NA 1983 55 1 37 2 103
#> 4 2.41e13 2.41e13 NA 1983 55 1 37 2 103
#> 5 2.41e13 2.41e13 NA 1983 55 1 37 2 104
#> 6 2.41e13 2.41e13 NA 1983 55 1 37 2 104
#> 7 2.41e13 2.41e13 NA 1983 55 1 37 2 104
#> 8 2.41e13 2.41e13 NA 1983 55 1 37 2 107
#> 9 2.41e13 2.41e13 NA 1983 55 1 37 2 108
#> 10 2.41e13 2.41e13 NA 1983 55 1 37 2 110
#> # … with 1,109,313 more rows, and 198 more variables: TREE <dbl>,
#> # CONDID <dbl>, AZIMUTH <dbl>, DIST <dbl>, PREVCOND <dbl>,
#> # STATUSCD <dbl>, SPCD <dbl>, SPGRPCD <dbl>, DIA <dbl>, DIAHTCD <dbl>,
#> # HT <dbl>, HTCD <dbl>, ACTUALHT <dbl>, TREECLCD <dbl>, CR <dbl>,
#> # CCLCD <dbl>, TREEGRCD <dbl>, AGENTCD <dbl>, CULL <dbl>, DAMLOC1 <dbl>,
#> # DAMTYP1 <dbl>, DAMSEV1 <dbl>, DAMLOC2 <dbl>, DAMTYP2 <lgl>,
#> # DAMSEV2 <lgl>, DECAYCD <dbl>, STOCKING <dbl>, WDLDSTEM <lgl>,
#> # VOLCFNET <dbl>, VOLCFGRS <dbl>, VOLCSNET <dbl>, VOLCSGRS <dbl>,
#> # VOLBFNET <dbl>, VOLBFGRS <dbl>, VOLCFSND <dbl>, GROWCFGS <dbl>,
#> # GROWBFSL <dbl>, GROWCFAL <dbl>, MORTCFGS <dbl>, MORTBFSL <dbl>,
#> # MORTCFAL <dbl>, REMVCFGS <dbl>, REMVBFSL <dbl>, REMVCFAL <dbl>,
#> # DIACHECK <dbl>, MORTYR <dbl>, SALVCD <dbl>, UNCRCD <dbl>,
#> # CPOSCD <dbl>, CLIGHTCD <dbl>, CVIGORCD <lgl>, CDENCD <dbl>,
#> # CDIEBKCD <dbl>, TRANSCD <dbl>, TREEHISTCD <lgl>, DIACALC <dbl>,
#> # BHAGE <lgl>, TOTAGE <lgl>, CULLDEAD <lgl>, CULLFORM <lgl>,
#> # CULLMSTOP <lgl>, CULLBF <lgl>, CULLCF <lgl>, BFSND <lgl>, CFSND <lgl>,
#> # SAWHT <lgl>, BOLEHT <lgl>, FORMCL <lgl>, HTCALC <dbl>,
#> # HRDWD_CLUMP_CD <lgl>, SITREE <dbl>, CREATED_BY <lgl>,
#> # CREATED_DATE <date>, CREATED_IN_INSTANCE <dbl>, MODIFIED_BY <lgl>,
#> # MODIFIED_DATE <date>, MODIFIED_IN_INSTANCE <dbl>, MORTCD <lgl>,
#> # HTDMP <dbl>, ROUGHCULL <lgl>, MIST_CL_CD <lgl>, CULL_FLD <dbl>,
#> # RECONCILECD <dbl>, PREVDIA <dbl>, FGROWCFGS <dbl>, FGROWBFSL <dbl>,
#> # FGROWCFAL <dbl>, FMORTCFGS <dbl>, FMORTBFSL <dbl>, FMORTCFAL <dbl>,
#> # FREMVCFGS <dbl>, FREMVBFSL <dbl>, FREMVCFAL <dbl>, P2A_GRM_FLG <chr>,
#> # TREECLCD_NERS <lgl>, TREECLCD_SRS <lgl>, TREECLCD_NCRS <dbl>,
#> # TREECLCD_RMRS <lgl>, STANDING_DEAD_CD <dbl>, PREV_STATUS_CD <dbl>, …
Created on 2019-06-04 by the reprex package (v0.2.1)
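For anyone wondering how a delimiter guess can fail, here is a minimal base R sketch of one possible heuristic. It is not vroom's actual implementation; the function name `guess_delim_sketch` and the candidate list are assumptions made up for illustration. It picks the first candidate that appears on every sampled line the same number of times, and it ignores quoting, so a file with quoted fields that contain commas would defeat it.

```r
# Hypothetical helper, NOT part of vroom: guess a delimiter by counting how
# often each candidate character appears on a few sample lines.
guess_delim_sketch <- function(lines, candidates = c(",", "\t", ";", "|", " ")) {
  lines <- lines[nzchar(lines)]          # drop empty lines
  if (length(lines) == 0L) return(NA_character_)

  for (d in candidates) {
    # Occurrences of `d` on each line; fixed = TRUE treats `d` literally.
    counts <- lengths(regmatches(lines, gregexpr(d, lines, fixed = TRUE)))
    # Accept the first candidate present on every line, equally often.
    if (counts[1] > 0L && all(counts == counts[1])) return(d)
  }

  NA_character_  # no consistent candidate; the caller should pass `delim`
}

sample_lines <- c("CN,PLT_CN,INVYR", "1,2,1983", "3,4,1983")
guess_delim_sketch(sample_lines)
#> [1] ","
```

When a guess like this returns nothing usable, passing delim explicitly, as in the call above, sidesteps the heuristic entirely.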
If we look into more robust parsing (#105), this would be a useful test dataset.
Ran into this as well. Would having the same operational heuristic as read.csv maybe be helpful? Since you're mostly looking at doing a csv reading product, it seems odd that it wouldn't automatically know the separator is a comma...
How am I mostly looking at doing a csv reading product?
This is rude. Adding
You know what? You're right. I could have phrased it differently, and the world doesn't need more rude people on the internet, so I apologize. Thanks for pointing it out.
Well, maybe I misread it, but as the docs say: "The most common type of delimited files are CSV (Comma Separated Values) files" and the tidyverse page that I found this through says: "vroom reads rectangular data, such as comma separated (csv), tab separated (tsv) or fixed width files (fwf) into R. It performs similar roles to functions like readr::read_csv(), data.table::fread() or read.csv(). But for many datasets vroom::vroom() can read them much, much faster (hence the name)."
That explicitly references CSVs a lot and compares itself to the two major CSV libraries, so that's why I'd thought that this was mostly a csv reading product. Maybe that's not the case. The page says 'feedback welcome', so I'd like to make a polite suggestion that commas be the default. Also, it's mentioned that delim solves the problem, but I didn't find that in the rollout articles or on the front page (to be fair, it's in the docs).
Anyhow, big fan of the tool. It was very fast on pulling down a rather large file, and I enjoyed it, so bravo on making a cool package.
In attempting to read a .csv file with 44 columns, I experienced a parsing failure that rendered a data object with a single column. Reading the same file with readr::read_csv() is successful.
Thanks for the great package!