diff --git a/dev/articles/benchmarks.html b/dev/articles/benchmarks.html index c242541..8f8a6d7 100644 --- a/dev/articles/benchmarks.html +++ b/dev/articles/benchmarks.html @@ -109,23 +109,23 @@

Benchmarks

fmt_integer() |> fmt_bytes(columns = "size in memory")
-
- @@ -713,23 +713,23 @@

Benchmarks

fmt_bytes(columns = memory) |> gt_plt_bar(column = `speedup from CSV`)
-
- @@ -1316,23 +1316,23 @@

Benchmarks

fmt_bytes(columns = c(memory, `file size`)) |> gt_plt_bar(column = `speedup from CSV`)
-
- @@ -1938,23 +1938,23 @@

Benchmarks

fmt_bytes(columns = memory) |> gt_plt_bar(column = `speedup from CSV`)
-
- @@ -2523,23 +2523,23 @@

Benchmarks

fmt_bytes(columns = c(memory, `file size`)) |> gt_plt_bar(column = `speedup from CSV`)
-
- diff --git a/dev/pkgdown.yml b/dev/pkgdown.yml index a4227c5..0e0ebba 100644 --- a/dev/pkgdown.yml +++ b/dev/pkgdown.yml @@ -3,7 +3,7 @@ pkgdown: 2.1.1.9000 pkgdown_sha: 74fda8cdb8bbbcd215faf2a2079f4eb98db586c6 articles: articles/benchmarks: benchmarks.html -last_built: 2025-01-29T09:45Z +last_built: 2025-01-29T10:23Z urls: reference: https://nanoparquet.r-lib.org/reference article: https://nanoparquet.r-lib.org/articles diff --git a/dev/reference/read_parquet.html b/dev/reference/read_parquet.html index d8a8e73..41e6f99 100644 --- a/dev/reference/read_parquet.html +++ b/dev/reference/read_parquet.html @@ -61,9 +61,9 @@

Argumentscol_select

Columns to read. It can be a numeric vector of column -indices. It is an error to select the same column multiple times. -The order of the columns in the result is the same as the order in -col_select.

+indices, or a character vector of column names. It is an error to +select the same column multiple times. The order of the columns in +the result is the same as the order in col_select.

options
diff --git a/dev/search.json b/dev/search.json index 7fca5df..dee4e2b 100644 --- a/dev/search.json +++ b/dev/search.json @@ -1 +1 @@ -[{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"goals","dir":"Articles","previous_headings":"","what":"Goals","title":"Benchmarks","text":"First, want measure nanoparquet’s speed relative good quality CSV readers writers, also look sizes Parquet CSV files. Second, want see nanoparquet fares relative Parquet implementations available R.","code":"library(dplyr) library(gt) library(gtExtras)"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"data-sets","dir":"Articles","previous_headings":"","what":"Data sets","title":"Benchmarks","text":"used use three data sets: small, medium large. small data set nycflights13::flights data set, . medium data set contains 20 copies small data set. large data set contains 200 copies small data set. See gen_data() function benchmark-funcs.R file nanoparquet GitHub repository. basic information data set: quick look data:","code":"if (file.exists(file.path(me, \"data-info.parquet\"))) { info_tab <- nanoparquet::read_parquet(file.path(me, \"data-info.parquet\")) } else { get_data_info <- function(x) { list(dim = dim(x), size = object.size(x)) } info <- lapply(data_sizes, function(s) get_data_info(gen_data(s))) info_tab <- data.frame( check.names = FALSE, name = data_sizes, rows = sapply(info, \"[[\", \"dim\")[1,], columns = sapply(info, \"[[\", \"dim\")[2,], \"size in memory\" = sapply(info, \"[[\", \"size\") ) nanoparquet::write_parquet(info_tab, file.path(me, \"data-info.parquet\")) } info_tab |> gt() |> tab_header(title = \"Data sets\") |> tab_options(table.align = \"left\") |> fmt_integer() |> fmt_bytes(columns = \"size in memory\") head(nycflights13::flights) #> # A tibble: 6 × 19 #> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time #> #> 1 2013 1 1 517 515 2 830 819 #> 2 2013 1 1 533 529 4 850 830 #> 3 2013 1 1 542 540 2 923 850 #> 4 2013 1 1 544 545 -1 1004 1022 #> 5 2013 1 1 554 600 -6 812 837 #> 6 2013 1 1 554 558 -4 740 728 #> # ℹ 11 more variables: arr_delay , carrier , flight , #> # tailnum , origin , dest , air_time , distance , #> # hour , minute , time_hour dplyr::glimpse(nycflights13::flights) #> Rows: 336,776 #> Columns: 19 #> $ year 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2… #> $ month 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1… #> $ day 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1… #> $ dep_time 517, 533, 542, 544, 554, 554, 555, 557, 557, 558, 558, … #> $ sched_dep_time 515, 529, 540, 545, 600, 558, 600, 600, 600, 600, 600, … #> $ dep_delay 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2, -2, -1… #> $ arr_time 830, 850, 923, 1004, 812, 740, 913, 709, 838, 753, 849,… #> $ sched_arr_time 819, 830, 850, 1022, 837, 728, 854, 723, 846, 745, 851,… #> $ arr_delay 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -3, 7, -1… #> $ carrier \"UA\", \"UA\", \"AA\", \"B6\", \"DL\", \"UA\", \"B6\", \"EV\", \"B6\", \"… #> $ flight 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79, 301, 4… #> $ tailnum \"N14228\", \"N24211\", \"N619AA\", \"N804JB\", \"N668DN\", \"N394… #> $ origin \"EWR\", \"LGA\", \"JFK\", \"JFK\", \"LGA\", \"EWR\", \"EWR\", \"LGA\",… #> $ dest \"IAH\", \"IAH\", \"MIA\", \"BQN\", \"ATL\", \"ORD\", \"FLL\", \"IAD\",… #> $ air_time 227, 227, 160, 183, 116, 150, 158, 53, 140, 138, 149, 1… #> $ distance 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 944, 733, … #> $ hour 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 
6, 6, 5, 6, 6, 6… #> $ minute 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0… #> $ time_hour 2013-01-01 05:00:00, 2013-01-01 05:00:00, 2013-01-01 0…"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"parquet-implementations","dir":"Articles","previous_headings":"","what":"Parquet implementations","title":"Benchmarks","text":"ran nanoparquet, Arrow DuckDB. also ran data.table without compression readr, read/write CSV files. used running time readr baseline. ran benchmark three times record results third run. make sure data software swapped OS. (Except readr large data set, take long.) include complete raw results end article.","code":"if (file.exists(file.path(me, \"results.parquet\"))) { results <- nanoparquet::read_parquet(file.path(me, \"results.parquet\")) } else { results <- NULL lapply(data_sizes[1:2], function(s) { lapply(variants, function(v) { r <- if (v == \"readr\" && s == \"large\") { measure(v, s) } else { measure(v, s) measure(v, s) measure(v, s) } results <<- rbind(results, r) }) }) nanoparquet::write_parquet(results, file.path(me, \"results.parquet\")) }"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"parquet-vs-csv","dir":"Articles","previous_headings":"","what":"Parquet vs CSV","title":"Benchmarks","text":"results, focusing CSV readers nanoparquet: Notes: single-threaded nanoparquet Parquet-reader competitive. can read compressed Parquet file just fast state art uncompressed CSV reader uses least 2 threads. nanoparquet vs CSV results writing Parquet CSV files: Notes: data.table CSV writer 3 times fast nanoparquet Parquet writer, CSV file uncompressed. CSV writer uses least 4 threads, Parquet write single-threaded. nanoparquet Parquet writer 2-5 times faster data.table CSV writer CSV file compressed. 
Parquet files 5-6 times smaller uncompressed CSV files 30-35% smaller compressed CSV files.","code":"csv_tab_read <- results |> filter(software %in% c(\"nanoparquet\", \"data.table\", \"data.table.gz\", \"readr\")) |> filter(direction == \"read\") |> mutate(software = case_when( software == \"data.table.gz\" ~ \"data.table (compressed)\", .default = software )) |> rename(`data size` = data_size, time = time_elapsed) |> mutate(memory = mem_max_after - mem_before) |> mutate(base = tail(time, 1), .by = `data size`) |> mutate(speedup = base / time, x = round(speedup, digits = 1)) |> select(`data size`, software, time, x, speedup, memory) |> mutate(rawtime = time, time = prettyunits::pretty_sec(time)) |> rename(`speedup from CSV` = speedup) csv_tab_read |> gt() |> tab_header(title = \"Parquet vs CSV, reading\") |> tab_options(table.align = \"left\") |> tab_row_group(md(\"**small data**\"), rows = `data size` == \"small\", \"s\") |> tab_row_group(md(\"**medium data**\"), rows = `data size` == \"medium\", \"m\") |> tab_row_group(md(\"**large data**\"), rows = `data size` == \"large\", \"l\") |> row_group_order(c(\"s\", \"m\", \"l\")) |> cols_hide(columns = c(`data size`, rawtime)) |> cols_align(columns = time, align = \"right\") |> fmt_bytes(columns = memory) |> gt_plt_bar(column = `speedup from CSV`) csv_tab_write <- results |> filter(software %in% c(\"nanoparquet\", \"data.table\", \"data.table.gz\", \"readr\")) |> filter(direction == \"write\") |> mutate(software = case_when( software == \"data.table.gz\" ~ \"data.table (compressed)\", .default = software )) |> rename(`data size` = data_size, time = time_elapsed, `file size` = file_size) |> mutate(memory = mem_max_after - mem_before) |> mutate(base = tail(time, 1), .by = `data size`) |> mutate(speedup = base / time, x = round(speedup, digits = 1)) |> select(`data size`, software, time, x, speedup, memory, `file size`) |> mutate(rawtime = time, time = prettyunits::pretty_sec(time)) |> rename(`speedup from CSV` = speedup) csv_tab_write |> gt() |> tab_header(title = \"Parquet vs CSV, writing\") |> tab_options(table.align = \"left\") |> tab_row_group(md(\"**small data**\"), rows = `data size` == \"small\", \"s\") |> tab_row_group(md(\"**medium data**\"), rows = `data size` == \"medium\", \"m\") |> tab_row_group(md(\"**large data**\"), rows = `data size` == \"large\", \"l\") |> row_group_order(c(\"s\", \"m\", \"l\")) |> cols_hide(columns = c(`data size`, rawtime)) |> cols_align(columns = time, align = \"right\") |> fmt_bytes(columns = c(memory, `file size`)) |> gt_plt_bar(column = `speedup from CSV`)"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"parquet-implementations-1","dir":"Articles","previous_headings":"","what":"Parquet implementations","title":"Benchmarks","text":"summary Parquet readers, three files. Notes: general, three implementations perform similarly. nanoparquet competitive three data sets terms speed also tends use least amount memory. turned ALTREP arrow, reads data memory. summary Parquet writers: Notes: nanoparquet competitive terms speed, slightly faster two implementations, data sets. DuckDB seems waste space writing Parquet files. possibly fine tuned forcing different encoding. 
behavior improve forthcoming DuckDB 1.2.0 release, see also https://github.com/duckdb/duckdb/issues/3316.","code":"pq_tab_read <- results |> filter(software %in% c(\"nanoparquet\", \"arrow\", \"duckdb\", \"readr\")) |> filter(direction == \"read\") |> rename(`data size` = data_size, time = time_elapsed) |> mutate(memory = mem_max_after - mem_before) |> mutate(base = tail(time, 1), .by = `data size`) |> mutate(speedup = base / time, x = round(speedup, digits = 1)) |> select(`data size`, software, time, x, speedup, memory) |> mutate(rawtime = time, time = prettyunits::pretty_sec(time)) |> filter(software %in% c(\"nanoparquet\", \"arrow\", \"duckdb\")) |> mutate(software = case_when( software == \"arrow\" ~ \"Arrow\", software == \"duckdb\" ~ \"DuckDB\", .default = software )) |> rename(`speedup from CSV` = speedup) pq_tab_read |> gt() |> tab_header(title = \"Parquet implementations, reading\") |> tab_options(table.align = \"left\") |> tab_row_group(md(\"**small data**\"), rows = `data size` == \"small\", \"s\") |> tab_row_group(md(\"**medium data**\"), rows = `data size` == \"medium\", \"m\") |> tab_row_group(md(\"**large data**\"), rows = `data size` == \"large\", \"l\") |> row_group_order(c(\"s\", \"m\", \"l\")) |> cols_hide(columns = c(`data size`, rawtime)) |> cols_align(columns = time, align = \"right\") |> fmt_bytes(columns = memory) |> gt_plt_bar(column = `speedup from CSV`) pq_tab_write <- results |> filter(software %in% c(\"nanoparquet\", \"arrow\", \"duckdb\", \"readr\")) |> filter(direction == \"write\") |> rename(`data size` = data_size, time = time_elapsed, `file size` = file_size) |> mutate(memory = mem_max_after - mem_before) |> mutate(base = tail(time, 1), .by = `data size`) |> mutate(speedup = base / time, x = round(speedup, digits = 1)) |> select(`data size`, software, time, x, speedup, memory, `file size`) |> mutate(rawtime = time, time = prettyunits::pretty_sec(time)) |> filter(software %in% c(\"nanoparquet\", \"arrow\", \"duckdb\", \"readr\")) |> mutate(software = case_when( software == \"arrow\" ~ \"Arrow\", software == \"duckdb\" ~ \"DuckDB\", .default = software )) |> rename(`speedup from CSV` = speedup) pq_tab_write |> gt() |> tab_header(title = \"Parquet implementations, writing\") |> tab_options(table.align = \"left\") |> tab_row_group(md(\"**small data**\"), rows = `data size` == \"small\", \"s\") |> tab_row_group(md(\"**medium data**\"), rows = `data size` == \"medium\", \"m\") |> tab_row_group(md(\"**large data**\"), rows = `data size` == \"large\", \"l\") |> row_group_order(c(\"s\", \"m\", \"l\")) |> cols_hide(columns = c(`data size`, rawtime)) |> cols_align(columns = time, align = \"right\") |> fmt_bytes(columns = c(memory, `file size`)) |> gt_plt_bar(column = `speedup from CSV`)"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"conclusions","dir":"Articles","previous_headings":"","what":"Conclusions","title":"Benchmarks","text":"results probably change different data sets, different system. particular, Arrow DuckDB probably faster larger systems, data stored multiple physical disks. Arrow DuckDB let run queries data without loading memory first. especially important data fit memory , even columns needed analysis. nanoparquet . However, general, based benchmarks good reasons trust nanoparquet Parquet reader writer competitive implementations available R, terms speed memory use. 
limitations nanoparquet prohibitive use case, good choice Parquet /O.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"raw-benchmark-results","dir":"Articles","previous_headings":"","what":"Raw benchmark results","title":"Benchmarks","text":"raw results. can scroll right screen wide enough whole table. Notes: User time (time_user) plus system time (time_system) can larger elapsed time (time_elapsed) multithreaded implementations indeed tools, except nanoparquet, single-threaded. memory columns bytes. mem_before RSS size reading/writing. mem_max_before maximum RSS size process . mem_max_after maximum RSS size process read/write operation. can calculate (estimate) memory usage tool subtracting mem_before mem_max_after. overestimate memory usage mem_max_after mem_max_before, never happens practice. reading file, mem_max_after includes memory needed store data set . (See data sizes .) arrow, turned ALTREP using options(arrow.use_altrep = FALSE), see benchmarks-funcs.R file. Otherwise arrow actually read data memory.","code":"print(results, n = Inf) #> # A data frame: 36 × 10 #> software direction data_size time_user time_system time_elapsed mem_before mem_max_before mem_max_after file_size #> #> 1 nanoparquet read small 0.0220 0.007 0.0290 156909568 156909568 236765184 NA #> 2 nanoparquet write small 0.103 0.015 0.117 305168384 305168384 386613248 5687737 #> 3 arrow read small 0.06 0.023 0.0400 160317440 160317440 267173888 NA #> 4 arrow write small 0.149 0.00700 0.151 306151424 306151424 340574208 5693381 #> 5 duckdb read small 0.081 0.018 0.0660 166313984 166313984 286916608 NA #> 6 duckdb write small 0.283 0.025 0.207 309510144 309510144 465649664 10684818 #> 7 data.table read small 0.137 0.016 0.0590 164282368 164282368 231325696 NA #> 8 data.table write small 0.158 0.0100 0.0620 313851904 313851904 314769408 30960660 #> 9 data.table.gz read small 0.215 0.026 0.15 164986880 164986880 278757376 NA #> 10 data.table.gz write small 1.45 0.014 0.386 310034432 310034432 311033856 8263176 #> 11 readr read small 1.08 0.27 0.415 162152448 162152448 350666752 NA #> 12 readr write small 1.78 1.85 0.781 314736640 314736640 359104512 31053850 #> 13 nanoparquet read medium 0.84 0.139 0.979 158351360 158351360 1640300544 NA #> 14 nanoparquet write medium 1.97 0.227 2.20 1079656448 1079656448 1762787328 111363363 #> 15 arrow read medium 1.40 0.265 0.982 168099840 168099840 2229256192 NA #> 16 arrow write medium 2.65 0.065 2.60 1090486272 1090486272 1380417536 112167843 #> 17 duckdb read medium 1.82 0.331 1.12 160743424 160743424 2224111616 NA #> 18 duckdb write medium 7.07 0.353 2.38 1099300864 1099300864 3086221312 213168966 #> 19 data.table read medium 2.36 0.135 0.891 159596544 159596544 1453031424 NA #> 20 data.table write medium 2.60 0.098 0.744 1086357504 1086357504 1088962560 619210198 #> 21 data.table.gz read medium 3.26 0.305 1.98 155844608 155844608 1516044288 NA #> 22 data.table.gz write medium 27.6 0.084 7.01 1092681728 1092681728 1095352320 165249944 #> 23 readr read medium 19.1 5.35 5.10 158367744 158367744 3874635776 NA #> 24 readr write medium 34.4 39.4 14.0 1090158592 1090158592 1932197888 621073998 #> 25 nanoparquet read large 7.25 2.44 10.8 73023488 73023488 8098021376 NA #> 26 nanoparquet write large 19.2 4.46 24.8 8158134272 8450293760 8450293760 1113819142 #> 27 arrow read large 12.0 7.32 10.8 72941568 72941568 9892495360 NA #> 28 arrow write large 27.9 2.31 29.9 8304607232 8573747200 8835842048 1121513329 #> 29 duckdb read large 16.2 5.18 14.6 
75251712 75251712 8127512576 NA #> 30 duckdb write large 54.7 14.2 33.7 8305164288 8574451712 9348841472 2131769619 #> 31 data.table read large 21.6 3.87 12.8 78872576 78872576 8691007488 NA #> 32 data.table write large 26.3 1.69 8.09 8304033792 8573157376 8573157376 6192100558 #> 33 data.table.gz read large 30.6 7.16 26.7 72876032 72876032 8018870272 NA #> 34 data.table.gz write large 279. 1.93 71.6 8303362048 8572665856 8572665856 1652494401 #> 35 readr read large 144. 177. 231. 73564160 73564160 8500789248 NA #> 36 readr write large 333. 345. 143. 8304148480 8573452288 9224192000 6210738558"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"about","dir":"Articles","previous_headings":"","what":"About","title":"Benchmarks","text":"See benchmark-funcs.R file nanoparquet repository code benchmarks. ran measurement subprocess, make easier measure memory usage. include package loading time benchmarks. nanoparquet dependencies loads quickly. arrow duckdb packages might take 200ms load test system, need load dependencies also bigger.","code":"sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.4.2 (2024-10-31) #> os Ubuntu 24.04.1 LTS #> system x86_64, linux-gnu #> ui X11 #> language en-US #> collate C.UTF-8 #> ctype C.UTF-8 #> tz UTC #> date 2025-01-29 #> pandoc 3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> arrow 18.1.0.1 2025-01-08 [1] RSPM #> assertthat 0.2.1 2019-03-21 [1] RSPM #> base64enc 0.1-3 2015-07-28 [1] RSPM #> bit 4.5.0.1 2024-12-03 [1] RSPM #> bit64 4.6.0-1 2025-01-16 [1] RSPM #> cli 3.6.3 2024-06-21 [1] RSPM #> colorspace 2.1-1 2024-07-26 [1] RSPM #> commonmark 1.9.2 2024-10-04 [1] RSPM #> DBI 1.2.3 2024-06-02 [1] RSPM #> digest 0.6.37 2024-08-19 [1] RSPM #> dplyr * 1.1.4 2023-11-17 [1] RSPM #> duckdb 1.1.3-2 2025-01-24 [1] RSPM #> evaluate 1.0.3 2025-01-10 [1] RSPM #> farver 2.1.2 2024-05-13 [1] RSPM #> fastmap 1.2.0 2024-05-15 [1] RSPM #> fontawesome 0.5.3 2024-11-16 [1] RSPM #> generics 0.1.3 2022-07-05 [1] RSPM #> ggplot2 3.5.1 2024-04-23 [1] RSPM #> glue 1.8.0 2024-09-30 [1] RSPM #> gt * 0.11.1 2024-10-04 [1] RSPM #> gtable 0.3.6 2024-10-25 [1] RSPM #> gtExtras * 0.5.0 2023-09-15 [1] RSPM #> htmltools 0.5.8.1 2024-04-04 [1] RSPM #> jsonlite 1.8.9 2024-09-20 [1] RSPM #> knitr 1.49 2024-11-08 [1] RSPM #> labeling 0.4.3 2023-08-29 [1] RSPM #> lifecycle 1.0.4 2023-11-07 [1] RSPM #> magrittr 2.0.3 2022-03-30 [1] RSPM #> markdown 1.13 2024-06-04 [1] RSPM #> munsell 0.5.1 2024-04-01 [1] RSPM #> nanoparquet 0.3.1.9000 2025-01-29 [1] local #> nycflights13 1.0.2 2021-04-12 [1] RSPM #> paletteer 1.6.0 2024-01-21 [1] RSPM #> pillar 1.10.1 2025-01-07 [1] RSPM #> pkgconfig 2.0.3 2019-09-22 [1] RSPM #> prettyunits 1.2.0 2023-09-24 [1] RSPM #> purrr 1.0.2 2023-08-10 [1] RSPM #> R6 2.5.1 2021-08-19 [1] RSPM #> ragg 1.3.3 2024-09-11 [1] RSPM #> rematch2 2.1.2 2020-05-01 [1] RSPM #> rlang 1.1.5 2025-01-17 [1] RSPM #> rmarkdown 2.29 2024-11-04 [1] RSPM #> sass 0.4.9 2024-03-15 [1] RSPM #> scales 1.3.0 2023-11-28 [1] RSPM #> sessioninfo 1.2.2 2021-12-06 [1] any (@1.2.2) #> svglite 2.1.3 2023-12-08 [1] RSPM #> systemfonts 1.2.1 2025-01-20 [1] RSPM #> textshaping 1.0.0 2025-01-20 [1] RSPM #> tibble 3.2.1 2023-03-20 [1] RSPM #> tidyselect 1.2.1 2024-03-11 [1] RSPM #> utf8 1.2.4 2023-10-22 [1] RSPM #> vctrs 0.6.5 2023-12-01 [1] 
RSPM #> withr 3.0.2 2024-10-28 [1] RSPM #> xfun 0.50 2025-01-07 [1] RSPM #> xml2 1.3.6 2023-12-04 [1] RSPM #> yaml 2.3.10 2024-07-26 [1] RSPM #> #> [1] /home/runner/work/_temp/Library #> [2] /opt/R/4.4.2/lib/R/site-library #> [3] /opt/R/4.4.2/lib/R/library #> #> ──────────────────────────────────────────────────────────────────────────────"},{"path":"https://nanoparquet.r-lib.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Gábor Csárdi. Author, maintainer. Hannes Mühleisen. Author, copyright holder. Google Inc.. Copyright holder. Apache Software Foundation. Copyright holder. . Copyright holder. RAD Game Tools. Copyright holder. Valve Software. Copyright holder. Tenacious Software LLC. Copyright holder. Facebook, Inc.. Copyright holder.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Csárdi G, Mühleisen H (2025). nanoparquet: Read Write 'Parquet' Files. R package version 0.3.1.9000, https://r-lib.github.io/nanoparquet/, https://github.com/r-lib/nanoparquet.","code":"@Manual{, title = {nanoparquet: Read and Write 'Parquet' Files}, author = {Gábor Csárdi and Hannes Mühleisen}, year = {2025}, note = {R package version 0.3.1.9000, https://r-lib.github.io/nanoparquet/}, url = {https://github.com/r-lib/nanoparquet}, }"},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"nanoparquet","dir":"","previous_headings":"","what":"Read and Write Parquet Files","title":"Read and Write Parquet Files","text":"nanoparquet reader writer common subset Parquet files.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"features","dir":"","previous_headings":"","what":"Features:","title":"Read and Write Parquet Files","text":"Read write flat (.e. non-nested) Parquet files. Can read Parquet data types. Can read subset columns Parquet file. Can write many R data types, including factors temporal types Parquet. Can append data frame Parquet file without first reading rewriting whole file. Completely dependency free. Supports Snappy, Gzip Zstd compression. Competitive tools terms speed, memory use file size.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"limitations","dir":"","previous_headings":"","what":"Limitations:","title":"Read and Write Parquet Files","text":"Nested Parquet types supported. Parquet logical types supported: INTERVAL, UNKNOWN. Snappy, Gzip Zstd compression supported. Encryption supported. Reading files URLs supported. nanoparquet always reads data (selected subset ) memory. work --memory data Parquet files like Apache Arrow DuckDB .","code":""},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Read and Write Parquet Files","text":"Install R package CRAN:","code":"install.packages(\"nanoparquet\")"},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"read","dir":"","previous_headings":"Usage","what":"Read","title":"Read and Write Parquet Files","text":"Call read_parquet() read Parquet file: see columns Parquet file types mapped R types read_parquet(), call read_parquet_schema() first: Folders similar-structured Parquet files (e.g. 
produced Spark) can read like :","code":"df <- nanoparquet::read_parquet(\"example.parquet\") nanoparquet::read_parquet_schema(\"example.parquet\") df <- data.table::rbindlist(lapply( Sys.glob(\"some-folder/part-*.parquet\"), nanoparquet::read_parquet ))"},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"write","dir":"","previous_headings":"Usage","what":"Write","title":"Read and Write Parquet Files","text":"Call write_parquet() write data frame Parquet file: see columns data frame mapped Parquet types write_parquet(), call infer_parquet_schema() first:","code":"nanoparquet::write_parquet(mtcars, \"mtcars.parquet\") nanoparquet::infer_parquet_schema(mtcars)"},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"inspect","dir":"","previous_headings":"Usage","what":"Inspect","title":"Read and Write Parquet Files","text":"Call read_parquet_info(), read_parquet_schema(), read_parquet_metadata() see various kinds metadata Parquet file: read_parquet_info() shows basic summary file. read_parquet_schema() shows columns, including non-leaf columns, mapped R types read_parquet(). read_parquet_metadata() shows complete metadata information: file meta data, schema, row groups column chunks file. find file supported isn’t, please open issue link file.","code":"nanoparquet::read_parquet_info(\"mtcars.parquet\") nanoparquet::read_parquet_schema(\"mtcars.parquet\") nanoparquet::read_parquet_metadata(\"mtcars.parquet\")"},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"options","dir":"","previous_headings":"","what":"Options","title":"Read and Write Parquet Files","text":"See also ?parquet_options(). nanoparquet.class: extra class add data frames returned read_parquet(). defined, default \"tbl\", changes data frame printed pillar package loaded. nanoparquet.use_arrow_metadata: unless set FALSE, read_parquet() make use Arrow metadata Parquet file. Currently used detect factor columns. nanoparquet.write_arrow_metadata: unless set FALSE, write_parquet() add Arrow metadata Parquet file. helps preserving classes columns, e.g. factors read back factors, nanoparquet Arrow.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"license","dir":"","previous_headings":"","what":"License","title":"Read and Write Parquet Files","text":"MIT","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":null,"dir":"Reference","previous_headings":"","what":"Append a data frame to an existing Parquet file — append_parquet","title":"Append a data frame to an existing Parquet file — append_parquet","text":"schema data frame must compatible schema file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Append a data frame to an existing Parquet file — append_parquet","text":"","code":"append_parquet( x, file, compression = c(\"snappy\", \"gzip\", \"zstd\", \"uncompressed\"), encoding = NULL, row_groups = NULL, options = parquet_options() )"},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Append a data frame to an existing Parquet file — append_parquet","text":"x Data frame append. file Path output file. compression Compression algorithm use newly written data. See write_parquet(). encoding Encoding use newly written data. encoding data file. See write_parquet() possible values. 
row_groups Row groups new, extended Parquet file. append_parquet() can change last existing row group, row_groups specified, respect . .e. existing file n rows, last row group starts k (k <= n), first row group row_groups refers new data must start k n+1. (simpler specify num_rows_per_row_group options, see parquet_options() instead row_groups. use row_groups need complete control.) options Nanoparquet options, new data, see parquet_options(). keep_row_groups option also affects whether append_parquet() overwrites existing row groups file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":"warning","dir":"Reference","previous_headings":"","what":"Warning","title":"Append a data frame to an existing Parquet file — append_parquet","text":"function atomic! interrupted, may leave file corrupt state. work around create copy original file, append new data copy, rename new, extended file original one.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":"about-row-groups","dir":"Reference","previous_headings":"","what":"About row groups","title":"Append a data frame to an existing Parquet file — append_parquet","text":"Parquet file may partitioned multiple row groups, indeed large Parquet files . append_parquet() able update existing file along row group boundaries. two possibilities: append_parquet() keeps existing row groups file, creates new row groups new data. mode can forced keep_row_groups option options, see parquet_options(). Alternatively, write_parquet overwrite last row group file, existing contents plus (beginning ) new data. mode makes sense last row group small, many small row groups inefficient. default append_parquet chooses two modes automatically, aiming create row groups least num_rows_per_row_group (see parquet_options()) rows. can customize behavior keep_row_groups options row_groups argument.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/infer_parquet_schema.html","id":null,"dir":"Reference","previous_headings":"","what":"Infer Parquet schema of a data frame — infer_parquet_schema","title":"Infer Parquet schema of a data frame — infer_parquet_schema","text":"Infer Parquet schema data frame","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/infer_parquet_schema.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Infer Parquet schema of a data frame — infer_parquet_schema","text":"","code":"infer_parquet_schema(df, options = parquet_options())"},{"path":"https://nanoparquet.r-lib.org/dev/reference/infer_parquet_schema.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Infer Parquet schema of a data frame — infer_parquet_schema","text":"df Data frame. options Return value parquet_options(), may modify R Parquet type mappings.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/infer_parquet_schema.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Infer Parquet schema of a data frame — infer_parquet_schema","text":"Data frame, inferred schema. 
columns return value read_parquet_schema(): file_name, name, r_type, type, type_length, repetition_type, converted_type, logical_type, num_children, scale, precision, field_id.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":null,"dir":"Reference","previous_headings":"","what":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Self-sufficient reader writer flat 'Parquet' files. Can read 'Parquet' data types. Can write many 'R' data types, including factors temporal types. See docs limitations.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"nanoparquet reader writer common subset Parquet files.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"features-","dir":"Reference","previous_headings":"","what":"Features:","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Read write flat (.e. non-nested) Parquet files. Can read Parquet data types. Can read subset columns Parquet file. Can write many R data types, including factors temporal types Parquet. Can append data frame Parquet file without first reading rewriting whole file. Completely dependency free. Supports Snappy, Gzip Zstd compression. Competitive tools terms speed, memory use file size.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"limitations-","dir":"Reference","previous_headings":"","what":"Limitations:","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Nested Parquet types supported. Parquet logical types supported: INTERVAL, UNKNOWN. Snappy, Gzip Zstd compression supported. Encryption supported. Reading files URLs supported. nanoparquet always reads data (selected subset ) memory. work --memory data Parquet files like Apache Arrow DuckDB .","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"installation","dir":"Reference","previous_headings":"","what":"Installation","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Install R package CRAN:","code":"install.packages(\"nanoparquet\")"},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"read","dir":"Reference","previous_headings":"","what":"Read","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Call read_parquet() read Parquet file: see columns Parquet file types mapped R types read_parquet(), call read_parquet_schema() first: Folders similar-structured Parquet files (e.g. 
produced Spark) can read like :","code":"df <- nanoparquet::read_parquet(\"example.parquet\") nanoparquet::read_parquet_schema(\"example.parquet\") df <- data.table::rbindlist(lapply( Sys.glob(\"some-folder/part-*.parquet\"), nanoparquet::read_parquet ))"},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"write","dir":"Reference","previous_headings":"","what":"Write","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Call write_parquet() write data frame Parquet file: see columns data frame mapped Parquet types write_parquet(), call infer_parquet_schema() first:","code":"nanoparquet::write_parquet(mtcars, \"mtcars.parquet\") nanoparquet::infer_parquet_schema(mtcars)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"inspect","dir":"Reference","previous_headings":"","what":"Inspect","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Call read_parquet_info(), read_parquet_schema(), read_parquet_metadata() see various kinds metadata Parquet file: read_parquet_info() shows basic summary file. read_parquet_schema() shows columns, including non-leaf columns, mapped R types read_parquet(). read_parquet_metadata() shows complete metadata information: file meta data, schema, row groups column chunks file. find file supported , please open issue link file.","code":"nanoparquet::read_parquet_info(\"mtcars.parquet\") nanoparquet::read_parquet_schema(\"mtcars.parquet\") nanoparquet::read_parquet_metadata(\"mtcars.parquet\")"},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"options","dir":"Reference","previous_headings":"","what":"Options","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"See also ?parquet_options(). nanoparquet.class: extra class add data frames returned read_parquet(). defined, default \"tbl\", changes data frame printed pillar package loaded. nanoparquet.use_arrow_metadata: unless set FALSE, read_parquet() make use Arrow metadata Parquet file. Currently used detect factor columns. nanoparquet.write_arrow_metadata: unless set FALSE, write_parquet() add Arrow metadata Parquet file. helps preserving classes columns, e.g. factors read back factors, nanoparquet Arrow.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"license","dir":"Reference","previous_headings":"","what":"License","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"MIT","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Maintainer: Gábor Csárdi csardi.gabor@gmail.com Authors: Hannes Mühleisen (ORCID) [copyright holder] contributors: Google Inc. [copyright holder] Apache Software Foundation [copyright holder] Posit Software, PBC [copyright holder] RAD Game Tools [copyright holder] Valve Software [copyright holder] Tenacious Software LLC [copyright holder] Facebook, Inc. 
[copyright holder]","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-types.html","id":null,"dir":"Reference","previous_headings":"","what":"nanoparquet's type maps — nanoparquet-types","title":"nanoparquet's type maps — nanoparquet-types","text":"nanoparquet maps R types Parquet types.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-types.html","id":"r-s-data-types","dir":"Reference","previous_headings":"","what":"R's data types","title":"nanoparquet's type maps — nanoparquet-types","text":"writing data frame, nanoparquet maps R's data types Parquet logical types. following table summary mapping. details see . non-default mappings can selected via schema argument. E.g. write factor column called 'name' ENUM, use detailed mapping rules listed , order preference. rules likely change nanoparquet reaches version 1.0.0. Factors (.e. vectors inherit factor class) converted character vectors using .character(), written STRSXP (character vector) type. fact column factor stored Arrow metadata (see ), unless nanoparquet.write_arrow_metadata option set FALSE. Dates (.e. Date class) written DATE logical type, INT32 type internally. hms objects (hms package) written TIME(true, MILLIS). logical type, internally INT32 Parquet type. Sub-milliseconds precision lost. POSIXct objects written TIMESTAMP(true, MICROS) logical type, internally INT64 Parquet type. Sub-microsecond precision lost. difftime objects (hms objects, see ), written INT64 Parquet type, noting Arrow metadata (see ) column type Duration NANOSECONDS unit. Integer vectors (INTSXP) written INT(32, true) logical type, corresponds INT32 type. Real vectors (REALSXP) written DOUBLE type. Character vectors (STRSXP) written STRING logical type, BYTE_ARRAY type. always converted UTF-8 writing. Logical vectors (LGLSXP) written BOOLEAN type. vectors error currently. can use infer_parquet_schema() data frame map R data types Parquet data types. change default R Parquet mapping, use parquet_schema() schema argument write_parquet(). Currently supported non-default mappings : integer INT64, integer INT96, double INT96, double FLOAT, character BYTE_ARRAY, character FIXED_LEN_BYTE_ARRAY, character ENUM, factor ENUM, integer DECIAML & INT32, integer DECIAML & INT64, double DECIAML & INT32, double DECIAML & INT64, integer INT(8, *), INT(16, *), INT(32, signed), double INT(*, *), character UUID, double FLOAT16, list raw vectors BYTE_ARRAY, list raw vectors FIXED_LEN_BYTE_ARRAY.","code":"write_parquet(..., schema = parquet_schema(name = \"ENUM\"))"},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-types.html","id":"parquet-s-data-types","dir":"Reference","previous_headings":"","what":"Parquet's data types","title":"nanoparquet's type maps — nanoparquet-types","text":"reading Parquet file nanoparquet also relies logical types Arrow metadata (present, see ) addition low level data types. following table summarizes mappings. See details . exact rules . rules likely change nanoparquet reaches version 1.0.0. BOOLEAN type read logical vector (LGLSXP). STRING logical type UTF8 converted type read character vector UTF-8 encoding. DATE logical type DATE converted type read Date R object. TIME logical type TIME_MILLIS TIME_MICROS converted types read hms object, see hms package. TIMESTAMP logical type TIMESTAMP_MILLIS TIMESTAMP_MICROS converted types read POSIXct objects. logical type UTC flag set, time zone POSIXct object set UTC. INT32 read integer vector (INTSXP). 
INT64, DOUBLE FLOAT read real vectors (REALSXP). INT96 read POSIXct read vector tzone attribute set \"UTC\". old convention store time stamps INT96 objects. DECIMAL converted type (FIXED_LEN_BYTE_ARRAY BYTE_ARRAY type) read real vector (REALSXP), potentially losing precision. ENUM logical type read character vector. UUID logical type read character vector uses 00112233-4455-6677-8899-aabbccddeeff form. FLOAT16 logical type read real vector (REALSXP). BYTE_ARRAY read factor object file written Arrow original data type column factor. (See 'Arrow metadata .) Otherwise BYTE_ARRAY read list raw vectors, missing values denoted NULL. logical converted types read annotated low level types: INT(8, true), INT(16, true) INT(32, true) read integer vectors INT32 internally Parquet. INT(64, true) read real vector (REALSXP). Unsigned integer types INT(8, false), INT(16, false) INT(32, false) read integer vectors (INTSXP). Large positive values may overflow negative values, known issue fix. INT(64, false) read real vector (REALSXP). Large positive values may overflow negative values, known issue fix. INTERVAL fixed length byte array, nanoparquet reads list raw vectors. Missing values denoted NULL. JSON columns read character vectors (STRSXP). BSON columns read raw vectors (RAWSXP). types yet supported: Nested types (LIST, MAP) supported. UNKNOWN logical type supported. can use read_parquet_schema() function see R read columns Parquet file. Look r_type column.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-types.html","id":"the-arrow-metadata","dir":"Reference","previous_headings":"","what":"The Arrow metadata","title":"nanoparquet's type maps — nanoparquet-types","text":"Apache Arrow (.e. arrow R package) adds additional metadata Parquet files writing arrow::write_parquet(). , reading file arrow::read_parquet(), uses metadata recreate Arrow R data types writing. nanoparquet::write_parquet() also adds Arrow metadata Parquet files, unless nanoparquet.write_arrow_metadata option set FALSE. Similarly, nanoparquet::read_parquet() uses Arrow metadata Parquet file (present), unless nanoparquet.use_arrow_metadata option set FALSE. Arrow metadata stored file level key-value metadata, key ARROW:schema. Currently nanoparquet uses Arrow metadata two things: uses detect factors. Without Arrow metadata factors read string vectors. uses detect difftime objects. Without arrow metadata read INT64 columns, containing time difference nanoseconds.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":null,"dir":"Reference","previous_headings":"","what":"Parquet encodings — parquet-encodings","title":"Parquet encodings — parquet-encodings","text":"Various Parquet encodings","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"nanoparquet-defaults","dir":"Reference","previous_headings":"","what":"Nanoparquet defaults","title":"Parquet encodings — parquet-encodings","text":"Currently defaults decided based R types. might change future. general, defaults likely change nanoparquet reaches version 1.0.0. Current encoding defaults: Definition levels always use RLE. (Nanoparquet currently write repetition levels, also use RLE, implemented.) factor columns use RLE_DICTIONARY. logical columns use RLE average run length first 10,000 values least 15. Otherwise use PLAIN encoding. integer, double character columns use RLE_DICTIONARY least two third values repeated. Otherwise use PLAIN encoding. 
list columns raw vectors always use PLAIN encoding currently.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"parquet-encodings","dir":"Reference","previous_headings":"","what":"Parquet encodings","title":"Parquet encodings — parquet-encodings","text":"See https://github.com/apache/parquet-format/blob/master/Encodings.md details Parquet encodings.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"plain-encoding","dir":"Reference","previous_headings":"","what":"PLAIN encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: . general values written back back: Integer types little endian. Floating point types follow IEEE standard. BYTE_ARRAY: element, little endian 4-byte length bytes . FIXED_LEN_BYTE_ARRAY: bytes written back back. Nanoparquet can read write encoding primitive types.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"rle-dictionary-encoding","dir":"Reference","previous_headings":"","what":"RLE_DICTIONARY encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: dictionary indices data pages. encoding combines run-length encoding bit-packing. Repeated sequences value can run-length encoded, non-repeated parts bit packed. used data pages dictionaries. dictionary pages PLAIN encoded. deprecated PLAIN_DICTIONARY name treated RLE_DICTIONARY. Nanoparquet can read write encoding.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"rle-encoding","dir":"Reference","previous_headings":"","what":"RLE encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: BOOLEAN. Also definition repetition levels. encoding RLE_DICTIONARY, slightly different header. combines run-length encoding bit packing. used BOOLEAN columns, also definition repetition levels. Nanoparquet can read write encoding.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"bit-packed-encoding-deprecated-in-favor-of-rle-","dir":"Reference","previous_headings":"","what":"BIT_PACKED encoding (deprecated in favor of RLE)","title":"Parquet encodings — parquet-encodings","text":"Supported types: none. definition repetition levels, RLE used instead. simple bit packing encoding integers, previously used encoding definition repetition levels. used new Parquet files RLE encoding includes better. Nanoparquet currently read write BIT_PACKED encoding.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"delta-binary-packed-encoding","dir":"Reference","previous_headings":"","what":"DELTA_BINARY_PACKED encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: INT32, INT64. encoding efficiently encodes integer columns differences consecutive elements often , /differences consecutive elements small. extreme case arithmetic sequence can encoded O(1) space. Nanoparquet can read encoding, currently write .","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"delta-length-byte-array-encoding","dir":"Reference","previous_headings":"","what":"DELTA_LENGTH_BYTE_ARRAY encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: BYTE_ARRAY. encoding uses DELTA_BINARY_PACKED encode length byte array elements. especially efficient short byte array elements, .e. column short strings. 
Nanoparquet can read encoding, currently write .","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"delta-byte-array-encoding","dir":"Reference","previous_headings":"","what":"DELTA_BYTE_ARRAY encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY. encoding efficient consecutive byte array elements share prefix, element can reuse prefix previous element. Nanoparquet can read encoding, currently write .","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"byte-stream-split-encoding","dir":"Reference","previous_headings":"","what":"BYTE_STREAM_SPLIT encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: FLOAT, DOUBLE, INT32, INT64, FIXED_LEN_BYTE_ARRAY. encoding stores first bytes elements first, second bytes, etc. reduce size , may allow efficient compression. Nanoparquet can read encoding, currently write .","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":null,"dir":"Reference","previous_headings":"","what":"Map between R and Parquet data types — parquet_column_types","title":"Map between R and Parquet data types — parquet_column_types","text":"Note function now deprecated. Please use read_parquet_schema() files, infer_parquet_schema() data frames.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Map between R and Parquet data types — parquet_column_types","text":"","code":"parquet_column_types(x, options = parquet_options())"},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Map between R and Parquet data types — parquet_column_types","text":"x Path Parquet file, data frame. options Nanoparquet options, see parquet_options().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Map between R and Parquet data types — parquet_column_types","text":"Data frame columns: file_name: file name. name: column name. type: (low level) Parquet data type. r_type: R type corresponds Parquet type. Might NA read_parquet() read column. See nanoparquet-types type mapping rules. repetition_type: whether column REQUIRED (NA) OPTIONAL (may NA). REPEATED columns currently supported nanoparquet. logical_type: Parquet logical type list column. element least entry called type, potentially additional entries, e.g. bit_width, is_signed, etc.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Map between R and Parquet data types — parquet_column_types","text":"function works two ways. can map R types data frame Parquet types, see write_parquet() write data frame. 
can also map types Parquet file R types, see read_parquet() read file R.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":null,"dir":"Reference","previous_headings":"","what":"Nanoparquet options — parquet_options","title":"Nanoparquet options — parquet_options","text":"Create list nanoparquet options.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Nanoparquet options — parquet_options","text":"","code":"parquet_options( class = getOption(\"nanoparquet.class\", \"tbl\"), compression_level = getOption(\"nanoparquet.compression_level\", NA_integer_), keep_row_groups = FALSE, num_rows_per_row_group = getOption(\"nanoparquet.num_rows_per_row_group\", 10000000L), use_arrow_metadata = getOption(\"nanoparquet.use_arrow_metadata\", TRUE), write_arrow_metadata = getOption(\"nanoparquet.write_arrow_metadata\", TRUE), write_data_page_version = getOption(\"nanoparquet.write_data_page_version\", 1L), write_minmax_values = getOption(\"nanoparquet.write_minmax_values\", TRUE) )"},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Nanoparquet options — parquet_options","text":"class extra class classes add data frames created read_parquet(). default nanoparquet adds \"tbl\" class, data frames printed differently pillar package loaded. compression_level compression level write_parquet(). NA default, specifies default compression level method. Inf always selects highest possible compression level. details: Snappy support compression levels currently. GZIP supports levels 0 (uncompressed), 1 (fastest), 9 (best). default 6. ZSTD allows positive levels 22 currently. 20 require memory. Negative levels also allowed, lower level, faster speed, cost compression. Currently smallest level -131072. default level 3. keep_row_groups option used appending Parquet file append_parquet(). TRUE existing row groups file always kept nanoparquet creates new row groups new data. FALSE (default), last row group file overwritten smaller default row group size, .e. num_rows_per_row_group. num_rows_per_row_group number rows put row group, row groups specified explicitly. integer scalar. Defaults 10 million. use_arrow_metadata TRUE FALSE. TRUE, read_parquet() read_parquet_schema() make use Apache Arrow metadata assign R classes Parquet columns. currently used detect factor columns, detect \"difftime\" columns. option FALSE: \"factor\" columns read character vectors. \"difftime\" columns read real numbers, meaning one seconds, milliseconds, microseconds nanoseconds. Impossible tell without using Arrow metadata. write_arrow_metadata Whether add Apache Arrow types metadata file write_parquet(). write_data_page_version Data version write default. Possible values 1 2. Default 1. write_minmax_values Whether write minimum maximum values per row group, data types support write_parquet(). However, nanoparquet currently support minimum maximum values DECIMAL, UUID FLOAT16 logical types BOOLEAN, BYTE_ARRAY FIXED_LEN_BYTE_ARRAY primitive types writing without logical type. 
Currently default TRUE.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Nanoparquet options — parquet_options","text":"List nanoparquet options.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Nanoparquet options — parquet_options","text":"","code":"if (FALSE) { # the effect of using Arrow metadata tmp <- tempfile(fileext = \".parquet\") d <- data.frame( fct = as.factor(\"a\"), dft = as.difftime(10, units = \"secs\") ) write_parquet(d, tmp) read_parquet(tmp, options = parquet_options(use_arrow_metadata = TRUE)) read_parquet(tmp, options = parquet_options(use_arrow_metadata = FALSE)) }"},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a Parquet schema — parquet_schema","title":"Create a Parquet schema — parquet_schema","text":"can use schema specify write data frame Parquet file write_parquet().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a Parquet schema — parquet_schema","text":"","code":"parquet_schema(...)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a Parquet schema — parquet_schema","text":"... Parquet type specifications, see . backwards compatibility, can supply file name , parquet_schema behaves read_parquet_schema().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a Parquet schema — parquet_schema","text":"Data frame columns read_parquet_schema(): file_name, name, r_type, type, type_length, repetition_type, converted_type, logical_type, num_children, scale, precision, field_id.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a Parquet schema — parquet_schema","text":"schema list potentially named type specifications. schema stored data frame. (potentially named) argument parquet_schema may character scalar, list. Parameterized types need specified list. Primitive Parquet types may specified string list.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"possible-types-","dir":"Reference","previous_headings":"","what":"Possible types:","title":"Create a Parquet schema — parquet_schema","text":"Special type: \"AUTO\": Parquet type, tells write_parquet() map R type Parquet automatically, using default mapping rules. Primitive Parquet types: \"BOOLEAN\" \"INT32\" \"INT64\" \"INT96\" \"FLOAT\" \"DOUBLE\" \"BYTE_ARRAY\" \"FIXED_LEN_BYTE_ARRAY\": fixed-length byte array. needs type_length parameter, integer 0 2^31-1. Parquet logical types: \"STRING\" \"ENUM\" \"UUID\" \"INTEGER\": signed unsigned integer. needs bit_width is_signed parameter. bit_width must 8, 16, 32 64. is_signed must TRUE FALSE. \"INT\": \"INTEGER\". Parquet documentation uses \"INT\", actual specification uses \"INTEGER\". supported nanoparquet. \"DECIMAL\": decimal number specified scale precision. needs precision primitive_type parameters. 
Also supports scale parameter, defaults zero if not specified. \"FLOAT16\" \"DATE\" \"TIME\": needs is_adjusted_utc (TRUE FALSE) unit parameter. unit must \"MILLIS\", \"MICROS\" \"NANOS\". \"TIMESTAMP\": needs is_adjusted_utc (TRUE FALSE) unit parameter. unit must \"MILLIS\", \"MICROS\" \"NANOS\". \"JSON\" \"BSON\" Logical types MAP, LIST UNKNOWN not supported currently. Converted types deprecated Parquet specification favor logical types, parquet_schema() accepts converted types syntactic shortcut corresponding logical types: INT_8 means list(\"INT\", bit_width = 8, is_signed = TRUE). INT_16 means list(\"INT\", bit_width = 16, is_signed = TRUE). INT_32 means list(\"INT\", bit_width = 32, is_signed = TRUE). INT_64 means list(\"INT\", bit_width = 64, is_signed = TRUE). TIME_MICROS means list(\"TIME\", is_adjusted_utc = TRUE, unit = \"MICROS\"). TIME_MILLIS means list(\"TIME\", is_adjusted_utc = TRUE, unit = \"MILLIS\"). TIMESTAMP_MICROS means list(\"TIMESTAMP\", is_adjusted_utc = TRUE, unit = \"MICROS\"). TIMESTAMP_MILLIS means list(\"TIMESTAMP\", is_adjusted_utc = TRUE, unit = \"MILLIS\"). UINT_8 means list(\"INT\", bit_width = 8, is_signed = FALSE). UINT_16 means list(\"INT\", bit_width = 16, is_signed = FALSE). UINT_32 means list(\"INT\", bit_width = 32, is_signed = FALSE). UINT_64 means list(\"INT\", bit_width = 64, is_signed = FALSE).","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"missing-values","dir":"Reference","previous_headings":"","what":"Missing values","title":"Create a Parquet schema — parquet_schema","text":"type might also repetition_type parameter, possible values \"REQUIRED\", \"OPTIONAL\" \"REPEATED\". \"REQUIRED\" columns do not allow missing values. Missing values allowed \"OPTIONAL\" columns. \"REPEATED\" columns not currently supported write_parquet().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a Parquet schema — parquet_schema","text":"","code":"parquet_schema( c1 = \"INT32\", c2 = list(\"INT\", bit_width = 64, is_signed = TRUE), c3 = list(\"STRING\", repetition_type = \"OPTIONAL\") ) #> # A data frame: 3 × 12 #> file_name name r_type type type_length repetition_type converted_type #> * #> 1 NA c1 NA INT32 NA NA NA #> 2 NA c2 NA INT64 NA NA INT_64 #> 3 NA c3 NA BYTE_… NA OPTIONAL UTF8 #> # ℹ 5 more variables: logical_type >, num_children , #> # scale , precision , field_id "},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":null,"dir":"Reference","previous_headings":"","what":"Read a Parquet file into a data frame — read_parquet","title":"Read a Parquet file into a data frame — read_parquet","text":"Converts contents named Parquet file R data frame.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read a Parquet file into a data frame — read_parquet","text":"","code":"read_parquet(file, col_select = NULL, options = parquet_options())"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read a Parquet file into a data frame — read_parquet","text":"file Path Parquet file. may also R connection, case first reads data connection, writes temporary file, reads temporary file, deletes . connection might open, case must binary connection.
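A minimal sketch of the connection case just described, assuming an unopened connection, which read_parquet() then manages itself (as stated next):

```r
# Sketch: pass an unopened connection; read_parquet() streams it to a
# temporary file, reads that, then deletes the temporary file.
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")
con <- file(file_name)   # not opened here; if pre-opened it must be binary
df <- nanoparquet::read_parquet(con)
```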
If not open, read_parquet() will open it and also close it in the end. col_select Columns read. can numeric vector column indices. error select column multiple times. order columns result order col_select. options Nanoparquet options, see parquet_options().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read a Parquet file into a data frame — read_parquet","text":"data.frame file's contents.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read a Parquet file into a data frame — read_parquet","text":"","code":"file_name <- system.file(\"extdata/userdata1.parquet\", package = \"nanoparquet\") parquet_df <- nanoparquet::read_parquet(file_name) print(str(parquet_df)) #> Classes ‘tbl’ and 'data.frame':\t1000 obs. of 13 variables: #> $ registration: POSIXct, format: \"2016-02-03 07:55:29\" \"2016-02-03 17:04:03\" ... #> $ id : int 1 2 3 4 5 6 7 8 9 10 ... #> $ first_name : chr \"Amanda\" \"Albert\" \"Evelyn\" \"Denise\" ... #> $ last_name : chr \"Jordan\" \"Freeman\" \"Morgan\" \"Riley\" ... #> $ email : chr \"ajordan0@com.com\" \"afreeman1@is.gd\" \"emorgan2@altervista.org\" \"driley3@gmpg.org\" ... #> $ gender : Factor w/ 2 levels \"Female\",\"Male\": 1 2 1 1 NA 1 2 2 2 1 ... #> $ ip_address : chr \"1.197.201.2\" \"218.111.175.34\" \"7.161.136.94\" \"140.35.109.83\" ... #> $ cc : chr \"6759521864920116\" NA \"6767119071901597\" \"3576031598965625\" ... #> $ country : chr \"Indonesia\" \"Canada\" \"Russia\" \"China\" ... #> $ birthdate : Date, format: \"1971-03-08\" \"1968-01-16\" ... #> $ salary : num 49757 150280 144973 90263 NA ... #> $ title : chr \"Internal Auditor\" \"Accountant IV\" \"Structural Engineer\" \"Senior Cost Accountant\" ... #> $ comments : chr \"1E+02\" NA NA NA ... #> NULL"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_info.html","id":null,"dir":"Reference","previous_headings":"","what":"Short summary of a Parquet file — read_parquet_info","title":"Short summary of a Parquet file — read_parquet_info","text":"Short summary Parquet file","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_info.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Short summary of a Parquet file — read_parquet_info","text":"","code":"read_parquet_info(file) parquet_info(file)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_info.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Short summary of a Parquet file — read_parquet_info","text":"file Path Parquet file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_info.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Short summary of a Parquet file — read_parquet_info","text":"Data frame columns: file_name: file name. num_cols: number (leaf) columns. num_rows: number rows. num_row_groups: number row groups. file_size: file size bytes. parquet_version: Parquet version. created_by: string scalar, usually name software created file.
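A quick usage sketch for the summary described above, reusing the example file shipped with the package:

```r
# Sketch: one-row summary with the columns listed above.
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")
nanoparquet::read_parquet_info(file_name)
```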
(created_by is NA if not available.)","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":null,"dir":"Reference","previous_headings":"","what":"Read the metadata of a Parquet file — read_parquet_metadata","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"function work files, even read_parquet() unable read , unsupported schema, encoding, compression reason.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"","code":"read_parquet_metadata(file, options = parquet_options()) parquet_metadata(file)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"file Path Parquet file. options Options potentially alter default Parquet R type mappings, see parquet_options().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"named list entries: file_meta_data: data frame file meta data: file_name: file name. version: Parquet version, integer. num_rows: total number rows. key_value_metadata: list column data frames two character columns called key value. key-value metadata file. Arrow stores schema . created_by: string scalar, usually name software created file. schema: data frame, schema file. one row node (inner node leaf node). flat files means one root node (inner node), always first one, one row \"real\" column. nested schemas, rows depth-first search order. important columns : file_name: file name. name: column name. r_type: R type corresponds Parquet type. Might NA if read_parquet() cannot read column. See nanoparquet-types type mapping rules. type: data type. One low level data types. type_length: length fixed length byte arrays. repetition_type: character, one REQUIRED, OPTIONAL REPEATED. logical_type: list column, logical types columns. element least entry called type, potentially additional entries, e.g. bit_width, is_signed, etc. num_children: number child nodes. non-negative integer root node, NA leaf node. $row_groups: data frame, information row groups. important columns: file_name: file name. id: row group id, integer zero number row groups minus one. total_byte_size: total uncompressed size column data. num_rows: number rows. file_offset: row group starts file. optional, might NA. total_compressed_size: total byte size compressed (potentially encrypted) column data row group. optional, might NA. ordinal: ordinal position row group file, starting zero. optional, might NA. NA, order row groups appear metadata. $column_chunks: data frame, information column chunks, across row groups. important columns: file_name: file name. row_group: row group chunk belongs . column: leaf column chunks belongs . order $schema, leaf columns (.e. columns NA children) counted. file_path: file chunk stored . NA means file. file_offset: column chunk begins file. type: low level parquet data type. encodings: encodings used store chunk. list column character vectors encoding names.
Current possible encodings: \"PLAIN\", \"GROUP_VAR_INT\", \"PLAIN_DICTIONARY\", \"RLE\", \"BIT_PACKED\", \"DELTA_BINARY_PACKED\", \"DELTA_LENGTH_BYTE_ARRAY\", \"DELTA_BYTE_ARRAY\", \"RLE_DICTIONARY\", \"BYTE_STREAM_SPLIT\". path_in_schema: list column character vectors. simply path root node. simply column name flat schemas. codec: compression codec used column chunk. Possible values: \"UNCOMPRESSED\", \"SNAPPY\", \"GZIP\", \"LZO\", \"BROTLI\", \"LZ4\", \"ZSTD\". num_values: number values column chunk. total_uncompressed_size: total uncompressed size bytes. total_compressed_size: total compressed size bytes. data_page_offset: absolute position first data page column chunk file. index_page_offset: absolute position first index page column chunk file, NA index pages. dictionary_page_offset: absolute position first dictionary page column chunk file, NA dictionary pages. null_count: number missing values column chunk. may NA. min_value: list column raw vectors, minimum value column, binary. NULL, if not specified. column experimental. max_value: list column raw vectors, maximum value column, binary. NULL, if not specified. column experimental. is_min_value_exact: whether minimum value actual value column, bound. may NA. is_max_value_exact: whether maximum value actual value column, bound. may NA.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"","code":"file_name <- system.file(\"extdata/userdata1.parquet\", package = \"nanoparquet\") nanoparquet::read_parquet_metadata(file_name) #> $file_meta_data #> # A data frame: 1 × 5 #> file_name version num_rows key_value_metadata created_by #> > #> 1 /home/runner/work/_temp/… 1 1000 https://g… #> #> $schema #> # A data frame: 14 × 12 #> file_name name r_type type type_length repetition_type converted_type #> #> 1 /home/ru… sche… NA NA NA NA NA #> 2 /home/ru… regi… POSIX… INT64 NA REQUIRED TIMESTAMP_MIC… #> 3 /home/ru… id integ… INT32 NA REQUIRED INT_32 #> 4 /home/ru… firs… chara… BYTE… NA OPTIONAL UTF8 #> 5 /home/ru… last… chara… BYTE… NA REQUIRED UTF8 #> 6 /home/ru… email factor BYTE… NA OPTIONAL UTF8 #> 7 /home/ru… gend… chara… BYTE… NA OPTIONAL UTF8 #> 8 /home/ru… ip_a… chara… BYTE… NA REQUIRED UTF8 #> 9 /home/ru… cc chara… BYTE… NA OPTIONAL UTF8 #> 10 /home/ru… coun… chara… BYTE… NA REQUIRED UTF8 #> 11 /home/ru… birt… Date INT32 NA OPTIONAL DATE #> 12 /home/ru… sala… double DOUB… NA OPTIONAL NA #> 13 /home/ru… title chara… BYTE… NA OPTIONAL UTF8 #> 14 /home/ru… comm… chara… BYTE… NA OPTIONAL UTF8 #> # ℹ 5 more variables: logical_type >, num_children , #> # scale , precision , field_id #> #> $row_groups #> # A data frame: 1 × 7 #> file_name id total_byte_size num_rows file_offset #> #> 1 /home/runner/work/_temp/Libr… 0 71427 1000 NA #> # ℹ 2 more variables: total_compressed_size , ordinal #> #> $column_chunks #> # A data frame: 13 × 24 #> file_name row_group column file_path file_offset offset_index_offset #> #> 1 /home/runne… 0 0 NA 4 NA #> 2 /home/runne… 0 1 NA 6741 NA #> 3 /home/runne… 0 2 NA 12259 NA #> 4 /home/runne… 0 3 NA 15211 NA #> 5 /home/runne… 0 4 NA 16239 NA #> 6 /home/runne… 0 5 NA 31759 NA #> 7 /home/runne… 0 6 NA 32031 NA #> 8 /home/runne… 0 7 NA 42952 NA #> 9 /home/runne… 0 8 NA 55009 NA #> 10 /home/runne… 0 9 NA 55925 NA #> 11 /home/runne… 0 10 NA 59312 NA #> 12 /home/runne… 0 11 NA 67026 NA #> 13 /home/runne… 0 12 NA 71089 NA #> # ℹ
18 more variables: offset_index_length , #> # column_index_offset , column_index_length , type , #> # encodings >, path_in_schema >, codec , #> # num_values , total_uncompressed_size , #> # total_compressed_size , data_page_offset , #> # index_page_offset , dictionary_page_offset , #> # null_count , min_value >, max_value >, … #>"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":null,"dir":"Reference","previous_headings":"","what":"Read a page from a Parquet file — read_parquet_page","title":"Read a page from a Parquet file — read_parquet_page","text":"Read page Parquet file","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read a page from a Parquet file — read_parquet_page","text":"","code":"read_parquet_page(file, offset)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read a page from a Parquet file — read_parquet_page","text":"file Path Parquet file. offset Integer offset start page file. See read_parquet_pages() list pages offsets.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read a page from a Parquet file — read_parquet_page","text":"Named list. Many entries correspond columns result read_parquet_pages(). Additional entries : codec: compression codec. Possible values: has_repetition_levels: whether page repetition levels. has_definition_levels: whether page definition levels. schema_column: schema column page corresponds . Note leaf columns pages. data_type: low level Parquet data type. Possible values: repetition_type: whether column page belongs REQUIRED, OPTIONAL REPEATED. page_header: bytes page header raw vector. num_null: number missing (NA) values. set V2 data pages. num_rows: num_values flat tables, .e. files without repetition levels. compressed_data: data page raw vector. includes repetition definition levels, . data: uncompressed data, nanoparquet supports compression codec file (GZIP SNAPPY time writing), file compressed. 
latter case compressed_data.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read a page from a Parquet file — read_parquet_page","text":"","code":"file_name <- system.file(\"extdata/userdata1.parquet\", package = \"nanoparquet\") nanoparquet:::read_parquet_pages(file_name) #> # A data frame: 19 × 14 #> file_name row_group column page_type page_header_offset #> #> 1 /home/runner/work/_temp/… 0 0 DATA_PAGE 4 #> 2 /home/runner/work/_temp/… 0 1 DATA_PAGE 6741 #> 3 /home/runner/work/_temp/… 0 2 DICTIONA… 10766 #> 4 /home/runner/work/_temp/… 0 2 DATA_PAGE 12259 #> 5 /home/runner/work/_temp/… 0 3 DICTIONA… 13334 #> 6 /home/runner/work/_temp/… 0 3 DATA_PAGE 15211 #> 7 /home/runner/work/_temp/… 0 4 DATA_PAGE 16239 #> 8 /home/runner/work/_temp/… 0 5 DICTIONA… 31726 #> 9 /home/runner/work/_temp/… 0 5 DATA_PAGE 31759 #> 10 /home/runner/work/_temp/… 0 6 DATA_PAGE 32031 #> 11 /home/runner/work/_temp/… 0 7 DATA_PAGE 42952 #> 12 /home/runner/work/_temp/… 0 8 DICTIONA… 53749 #> 13 /home/runner/work/_temp/… 0 8 DATA_PAGE 55009 #> 14 /home/runner/work/_temp/… 0 9 DATA_PAGE 55925 #> 15 /home/runner/work/_temp/… 0 10 DATA_PAGE 59312 #> 16 /home/runner/work/_temp/… 0 11 DICTIONA… 65063 #> 17 /home/runner/work/_temp/… 0 11 DATA_PAGE 67026 #> 18 /home/runner/work/_temp/… 0 12 DICTIONA… 68019 #> 19 /home/runner/work/_temp/… 0 12 DATA_PAGE 71089 #> # ℹ 9 more variables: uncompressed_page_size , #> # compressed_page_size , crc , num_values , #> # encoding , definition_level_encoding , #> # repetition_level_encoding , data_offset , #> # page_header_length options(max.print = 100) # otherwise long raw vector nanoparquet:::read_parquet_page(file_name, 4L) #> $page_type #> [1] \"DATA_PAGE\" #> #> $row_group #> [1] 0 #> #> $column #> [1] 0 #> #> $page_header_offset #> [1] 4 #> #> $data_page_offset #> [1] 24 #> #> $page_header_length #> [1] 20 #> #> $compressed_page_size #> [1] 6717 #> #> $uncompressed_page_size #> [1] 8000 #> #> $codec #> [1] \"SNAPPY\" #> #> $num_values #> [1] 1000 #> #> $encoding #> [1] \"PLAIN\" #> #> $definition_level_encoding #> [1] \"PLAIN\" #> #> $repetition_level_encoding #> [1] \"PLAIN\" #> #> $has_repetition_levels #> [1] FALSE #> #> $has_definition_levels #> [1] FALSE #> #> $schema_column #> [1] 1 #> #> $data_type #> [1] \"INT64\" #> #> $repetition_type #> [1] \"REQUIRED\" #> #> $page_header #> [1] 15 00 15 80 7d 15 fa 68 2c 15 d0 0f 15 00 15 00 15 00 00 00 #> #> $data #> [1] 40 be 0c f1 d8 2a 05 00 c0 86 e0 9a e0 2a 05 00 c0 28 33 45 d3 2a 05 #> [24] 00 40 2b 96 ce d2 2a 05 00 c0 9c 33 91 d6 2a 05 00 80 a2 54 7b d8 2a #> [47] 05 00 00 59 b2 77 d9 2a 05 00 80 ee 7d fc d7 2a 05 00 40 cf 71 8d d5 #> [70] 2a 05 00 c0 bc 7b cd e1 2a 05 00 80 e4 da 72 d2 2a 05 00 80 30 4d 73 #> [93] e1 2a 05 00 40 fe a4 0f #> [ reached getOption(\"max.print\") -- omitted 7900 entries ] #> #> $definition_levels_byte_length #> [1] NA #> #> $repetition_levels_byte_length #> [1] NA #> #> $num_nulls #> [1] NA #> #> $num_rows #> [1] NA #> #> $compressed_data #> [1] c0 3e 30 40 be 0c f1 d8 2a 05 00 c0 86 e0 9a e0 01 08 2c 28 33 45 d3 #> [24] 2a 05 00 40 2b 96 ce d2 01 10 28 9c 33 91 d6 2a 05 00 80 a2 54 7b 01 #> [47] 28 10 00 59 b2 77 d9 01 10 0c ee 7d fc d7 01 28 0c cf 71 8d d5 01 28 #> [70] 0c bc 7b cd e1 01 18 08 e4 da 72 01 38 0c 80 30 4d 73 01 10 30 40 fe #> [93] a4 0f e2 2a 05 00 00 eb #> [ reached getOption(\"max.print\") -- omitted 6617 entries ] 
#>"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":null,"dir":"Reference","previous_headings":"","what":"Metadata of all pages of a Parquet file — read_parquet_pages","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"Metadata pages Parquet file","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"","code":"read_parquet_pages(file)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"file Path Parquet file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"Data frame columns: file_name: file name. row_group: id row group page belongs , integer 0 number row groups minus one. column: id column. integer number leaf columns minus one. Note leaf columns considered, non-leaf columns pages. page_type: DATA_PAGE, INDEX_PAGE, DICTIONARY_PAGE DATA_PAGE_V2. page_header_offset: offset data page (header) file. uncompressed_page_size: include page header, per Parquet spec. compressed_page_size: without page header. crc: integer, checksum, present file, can NA. num_values: number data values page, include NULL (NA R) values. encoding: encoding page, current possible encodings: \"PLAIN\", \"GROUP_VAR_INT\", \"PLAIN_DICTIONARY\", \"RLE\", \"BIT_PACKED\", \"DELTA_BINARY_PACKED\", \"DELTA_LENGTH_BYTE_ARRAY\", \"DELTA_BYTE_ARRAY\", \"RLE_DICTIONARY\", \"BYTE_STREAM_SPLIT\". definition_level_encoding: encoding definition levels, see encoding possible values. can missing V2 data pages, always RLE encoded. repetition_level_encoding: encoding repetition levels, see encoding possible values. can missing V2 data pages, always RLE encoded. data_offset: offset actual data file. 
page_header_length: size page header, bytes.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"Reading page headers might slow large files, especially file many small pages.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"","code":"file_name <- system.file(\"extdata/userdata1.parquet\", package = \"nanoparquet\") nanoparquet:::read_parquet_pages(file_name) #> # A data frame: 19 × 14 #> file_name row_group column page_type page_header_offset #> #> 1 /home/runner/work/_temp/… 0 0 DATA_PAGE 4 #> 2 /home/runner/work/_temp/… 0 1 DATA_PAGE 6741 #> 3 /home/runner/work/_temp/… 0 2 DICTIONA… 10766 #> 4 /home/runner/work/_temp/… 0 2 DATA_PAGE 12259 #> 5 /home/runner/work/_temp/… 0 3 DICTIONA… 13334 #> 6 /home/runner/work/_temp/… 0 3 DATA_PAGE 15211 #> 7 /home/runner/work/_temp/… 0 4 DATA_PAGE 16239 #> 8 /home/runner/work/_temp/… 0 5 DICTIONA… 31726 #> 9 /home/runner/work/_temp/… 0 5 DATA_PAGE 31759 #> 10 /home/runner/work/_temp/… 0 6 DATA_PAGE 32031 #> 11 /home/runner/work/_temp/… 0 7 DATA_PAGE 42952 #> 12 /home/runner/work/_temp/… 0 8 DICTIONA… 53749 #> 13 /home/runner/work/_temp/… 0 8 DATA_PAGE 55009 #> 14 /home/runner/work/_temp/… 0 9 DATA_PAGE 55925 #> 15 /home/runner/work/_temp/… 0 10 DATA_PAGE 59312 #> 16 /home/runner/work/_temp/… 0 11 DICTIONA… 65063 #> 17 /home/runner/work/_temp/… 0 11 DATA_PAGE 67026 #> 18 /home/runner/work/_temp/… 0 12 DICTIONA… 68019 #> 19 /home/runner/work/_temp/… 0 12 DATA_PAGE 71089 #> # ℹ 9 more variables: uncompressed_page_size , #> # compressed_page_size , crc , num_values , #> # encoding , definition_level_encoding , #> # repetition_level_encoding , data_offset , #> # page_header_length "},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_schema.html","id":null,"dir":"Reference","previous_headings":"","what":"Read the schema of a Parquet file — read_parquet_schema","title":"Read the schema of a Parquet file — read_parquet_schema","text":"function work files, even read_parquet() unable read , unsupported schema, encoding, compression reason.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_schema.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read the schema of a Parquet file — read_parquet_schema","text":"","code":"read_parquet_schema(file, options = parquet_options())"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_schema.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read the schema of a Parquet file — read_parquet_schema","text":"file Path Parquet file. options Return value parquet_options(), options potentially modify Parquet R type mappings.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_schema.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read the schema of a Parquet file — read_parquet_schema","text":"","code":"Data frame, the schema of the file. It has one row for each node (inner node or leaf node). For flat files this means one root node (inner node), always the first one, and then one row for each \"real\" column. 
For nested schemas, the rows are in depth-first search order. Most important columns are: - `file_name`: file name. - `name`: column name. - `r_type`: the R type that corresponds to the Parquet type. Might be `NA` if [read_parquet()] cannot read this column. See [nanoparquet-types] for the type mapping rules. - `type`: data type. One of the low level data types. - `type_length`: length for fixed length byte arrays. - `repetition_type`: character, one of `REQUIRED`, `OPTIONAL` or `REPEATED`. - `logical_type`: a list column, the logical types of the columns. An element has at least an entry called `type`, and potentially additional entries, e.g. `bit_width`, `is_signed`, etc. - `num_children`: number of child nodes. Should be a non-negative integer for the root node, and `NA` for a leaf node."},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_decode_int.html","id":null,"dir":"Reference","previous_headings":"","what":"RLE decode integers — rle_decode_int","title":"RLE decode integers — rle_decode_int","text":"RLE decode integers","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_decode_int.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"RLE decode integers — rle_decode_int","text":"","code":"rle_decode_int( x, bit_width = attr(x, \"bit_width\"), length = attr(x, \"length\") %||% NA )"},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_decode_int.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"RLE decode integers — rle_decode_int","text":"x Raw vector encoded integers. bit_width Bit width used encoding. length Length output. NA assume x starts length output, encoded 4 byte integer.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_decode_int.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"RLE decode integers — rle_decode_int","text":"decoded integer vector.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_encode_int.html","id":null,"dir":"Reference","previous_headings":"","what":"RLE encode integers — rle_encode_int","title":"RLE encode integers — rle_encode_int","text":"RLE encode integers","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_encode_int.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"RLE encode integers — rle_encode_int","text":"","code":"rle_encode_int(x)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_encode_int.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"RLE encode integers — rle_encode_int","text":"x Integer vector.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_encode_int.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"RLE encode integers — rle_encode_int","text":"Raw vector, encoded integers.
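A hedged roundtrip sketch for these two helpers. It assumes they are internal (accessed with :::, like read_parquet_pages() in the examples elsewhere on this site; :: would work if they are exported), and that the decoder's defaults can consume the encoder's attributes directly, as the usage above suggests:

```r
# Sketch: RLE-encode a small integer vector, then decode it back.
x <- c(1L, 1L, 1L, 2L, 2L, 2L, 3L)
enc <- nanoparquet:::rle_encode_int(x)   # raw vector with attributes
dec <- nanoparquet:::rle_decode_int(enc) # defaults read the attributes
identical(dec, x)                        # expected TRUE
```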
The result has two attributes: bit_width: number bits needed encode input, length: length original integer input.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":null,"dir":"Reference","previous_headings":"","what":"Write a data frame to a Parquet file — write_parquet","title":"Write a data frame to a Parquet file — write_parquet","text":"Writes contents R data frame Parquet file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Write a data frame to a Parquet file — write_parquet","text":"","code":"write_parquet( x, file, schema = NULL, compression = c(\"snappy\", \"gzip\", \"zstd\", \"uncompressed\"), encoding = NULL, metadata = NULL, row_groups = NULL, options = parquet_options() )"},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Write a data frame to a Parquet file — write_parquet","text":"x Data frame write. file Path output file. string \":raw:\", data frame written memory buffer, memory buffer returned raw vector. schema Parquet schema. Specify schema tweak default nanoparquet R -> Parquet type mappings. Use parquet_schema() create schema can use , read_parquet_schema() use schema Parquet file. compression Compression algorithm use. Currently \"snappy\" (default), \"gzip\", \"zstd\", \"uncompressed\" supported. encoding Encoding use. Possible values: NULL, appropriate encoding selected automatically: RLE PLAIN BOOLEAN columns, RLE_DICTIONARY columns many repeated values, PLAIN otherwise. single (unnamed) character string, 'll used columns. unnamed character vector encoding names length number columns data frame, encodings used column. named character vector, names must unique, name must match column name, specify encoding column. special empty name (\"\") applies rest columns. Without empty name, rest columns use default encoding. NA_character_ specified column, default encoding used column. specified encoding invalid certain column type, nanoparquet does not implement it, write_parquet() throws error. version nanoparquet supports following encodings: PLAIN, GROUP_VAR_INT, PLAIN_DICTIONARY, RLE, BIT_PACKED, DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYTE_ARRAY, RLE_DICTIONARY, BYTE_STREAM_SPLIT. See parquet-encodings encodings. metadata Additional key-value metadata add file. must named character vector, data frame columns character columns called key value. row_groups Row groups Parquet file. NULL, num_rows_per_row_group option used options argument, see parquet_options(). Otherwise must integer vector, specifying starts row groups. options Nanoparquet options, see parquet_options().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Write a data frame to a Parquet file — write_parquet","text":"NULL, unless file \":raw:\", case Parquet file returned raw vector.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Write a data frame to a Parquet file — write_parquet","text":"write_parquet() converts string columns UTF-8 encoding calling base::enc2utf8().
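Returning to the encoding argument documented above, a hedged sketch of its named-vector form; the empty-name fallback follows that description, and choosing RLE_DICTIONARY for a numeric column with few distinct values is an assumption:

```r
# Sketch: dictionary-encode one column, PLAIN for all others.
# In c(), the unnamed element gets the empty name "", which per the
# description above applies to the remaining columns.
tmp <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(
  mtcars, tmp,
  encoding = c(cyl = "RLE_DICTIONARY", "PLAIN")
)
```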
The same conversion applies to factor levels.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Write a data frame to a Parquet file — write_parquet","text":"","code":"if (FALSE) { # add row names as a column, because `write_parquet()` ignores them. mtcars2 <- cbind(name = rownames(mtcars), mtcars) write_parquet(mtcars2, \"mtcars.parquet\") }"},{"path":"https://nanoparquet.r-lib.org/dev/news/index.html","id":"nanoparquet-development-version","dir":"Changelog","previous_headings":"","what":"nanoparquet (development version)","title":"nanoparquet (development version)","text":"API changes: parquet_schema() now called read_parquet_schema(). new parquet_schema() function falls back read_parquet_schema() called single string argument, warning. parquet_info() now called read_parquet_info(). parquet_info() still works now, warning. parquet_metadata() now called read_parquet_metadata(). parquet_metadata() still works, warning. parquet_column_types() now deprecated, issues warning. Use read_parquet_schema() new infer_parquet_schema() function instead. improvements: new parquet_schema() function creates Parquet schema scratch. can use schema new schema argument write_parquet(), specify columns data frame mapped Parquet types. New append_parquet() function append data frame existing Parquet file. New col_select argument read_parquet() read subset columns Parquet file. write_parquet() can now write multiple row groups. default puts 10 million rows single row group. can choose row groups manually row_groups argument. write_parquet() now writes minimum maximum values per row group types. See ?parquet_options() turning off. also writes number non-missing values. Newly supported type conversions write_parquet() via schema argument: integer INT64, integer INT96, double INT96, double FLOAT, character BYTE_ARRAY, character FIXED_LEN_BYTE_ARRAY, character ENUM, factor ENUM. integer DECIMAL, INT32, integer DECIMAL, INT64, double DECIMAL, INT32, double DECIMAL, INT64, integer INT(8, *), INT(16, *), INT(32, signed), double INT(*, *), character UUID, double FLOAT16, list raw vectors BYTE_ARRAY, list raw vectors FIXED_LEN_BYTE_ARRAY. write_parquet() can now write version 2 data pages. default still version 1, might change future. write_parquet(file = \":raw:\") now works correctly larger data frames (#77). New compression_level option select compression level manually. See ?parquet_options details. (#91). read_parquet() can now read R connection (#71). read_parquet() now reads DECIMAL values correctly INT32 INT64 columns scale not zero. read_parquet() now reads JSON columns character vectors, documented. read_parquet() now reads FLOAT16 logical type real (double) vector. class argument parquet_options() nanoparquet.class option now work (#104).","code":""},{"path":"https://nanoparquet.r-lib.org/dev/news/index.html","id":"nanoparquet-031","dir":"Changelog","previous_headings":"","what":"nanoparquet 0.3.1","title":"nanoparquet 0.3.1","text":"CRAN release: 2024-07-01 version fixes write_parquet() crash (#73).","code":""},{"path":"https://nanoparquet.r-lib.org/dev/news/index.html","id":"nanoparquet-030","dir":"Changelog","previous_headings":"","what":"nanoparquet 0.3.0","title":"nanoparquet 0.3.0","text":"CRAN release: 2024-06-17 read_parquet() type mapping changes: STRING logical type UTF8 converted type still read character vector, BYTE_ARRAY types without converted logical types , read list raw vectors.
Missing values indicated NULL values. DECIMAL converted type read REALSXP now, even type FIXED_LEN_BYTE_ARRAY. (just BYTE_ARRAY). UUID logical type now read character vector, formatted 00112233-4455-6677-8899-aabbccddeeff. BYTE_ARRAY FIXED_LEN_BYTE_ARRAY types without logical converted types; unsupported ones: FLOAT16, INTERVAL; now read list raw vectors. Missing values denoted NULL. write_parquet() now automatically uses dictionary encoding columns many repeated values. first 10k rows used decide dictionary used . Similarly, logical columns written RLE encoding contain runs repeated values. NA values ignored selecting encoding (#18). write_parquet() can now write data frame memory buffer, returned raw vector, special \":raw:\" filename used (#31). read_parquet() can now read Parquet files V2 data pages (#37). read_parquet() write_parquet() now support GZIP ZSTD compressed Parquet files. read_parquet() now supports RLE encoding BOOLEAN columns also supports DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYTE_ARRAY BYTE_STREAM_SPLIT encodings. parquet_columns() function now called parquet_column_types() can now map column types data frame Parquet types. parquet_info(), parquet_metadata() parquet_column_types() now work created_by metadata field unset. New parquet_options() function can use set nanoparquet options single read_parquet() write_parquet() call.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/news/index.html","id":"nanoparquet-020","dir":"Changelog","previous_headings":"","what":"nanoparquet 0.2.0","title":"nanoparquet 0.2.0","text":"CRAN release: 2024-05-30 First release CRAN. contains Parquet reader https://github.com/hannes/miniparquet, Parquet writer, functions read Parquet metadata, many improvements.","code":""}] +[{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"goals","dir":"Articles","previous_headings":"","what":"Goals","title":"Benchmarks","text":"First, want measure nanoparquet’s speed relative good quality CSV readers writers, also look sizes Parquet CSV files. Second, want see nanoparquet fares relative Parquet implementations available R.","code":"library(dplyr) library(gt) library(gtExtras)"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"data-sets","dir":"Articles","previous_headings":"","what":"Data sets","title":"Benchmarks","text":"used use three data sets: small, medium large. small data set nycflights13::flights data set, . medium data set contains 20 copies small data set. large data set contains 200 copies small data set. See gen_data() function benchmark-funcs.R file nanoparquet GitHub repository. 
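The actual gen_data() is in benchmark-funcs.R; as a rough, hypothetical approximation of what it produces:

```r
# Sketch: the medium data set is 20 stacked copies of the small one
# (the large one would use 200 copies instead).
small  <- nycflights13::flights
medium <- do.call(rbind, rep(list(small), 20))
nrow(medium) / nrow(small)  # 20
```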
basic information data set: quick look data:","code":"if (file.exists(file.path(me, \"data-info.parquet\"))) { info_tab <- nanoparquet::read_parquet(file.path(me, \"data-info.parquet\")) } else { get_data_info <- function(x) { list(dim = dim(x), size = object.size(x)) } info <- lapply(data_sizes, function(s) get_data_info(gen_data(s))) info_tab <- data.frame( check.names = FALSE, name = data_sizes, rows = sapply(info, \"[[\", \"dim\")[1,], columns = sapply(info, \"[[\", \"dim\")[2,], \"size in memory\" = sapply(info, \"[[\", \"size\") ) nanoparquet::write_parquet(info_tab, file.path(me, \"data-info.parquet\")) } info_tab |> gt() |> tab_header(title = \"Data sets\") |> tab_options(table.align = \"left\") |> fmt_integer() |> fmt_bytes(columns = \"size in memory\") head(nycflights13::flights) #> # A tibble: 6 × 19 #> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time #> #> 1 2013 1 1 517 515 2 830 819 #> 2 2013 1 1 533 529 4 850 830 #> 3 2013 1 1 542 540 2 923 850 #> 4 2013 1 1 544 545 -1 1004 1022 #> 5 2013 1 1 554 600 -6 812 837 #> 6 2013 1 1 554 558 -4 740 728 #> # ℹ 11 more variables: arr_delay , carrier , flight , #> # tailnum , origin , dest , air_time , distance , #> # hour , minute , time_hour dplyr::glimpse(nycflights13::flights) #> Rows: 336,776 #> Columns: 19 #> $ year 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2… #> $ month 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1… #> $ day 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1… #> $ dep_time 517, 533, 542, 544, 554, 554, 555, 557, 557, 558, 558, … #> $ sched_dep_time 515, 529, 540, 545, 600, 558, 600, 600, 600, 600, 600, … #> $ dep_delay 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2, -2, -1… #> $ arr_time 830, 850, 923, 1004, 812, 740, 913, 709, 838, 753, 849,… #> $ sched_arr_time 819, 830, 850, 1022, 837, 728, 854, 723, 846, 745, 851,… #> $ arr_delay 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -3, 7, -1… #> $ carrier \"UA\", \"UA\", \"AA\", \"B6\", \"DL\", \"UA\", \"B6\", \"EV\", \"B6\", \"… #> $ flight 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79, 301, 4… #> $ tailnum \"N14228\", \"N24211\", \"N619AA\", \"N804JB\", \"N668DN\", \"N394… #> $ origin \"EWR\", \"LGA\", \"JFK\", \"JFK\", \"LGA\", \"EWR\", \"EWR\", \"LGA\",… #> $ dest \"IAH\", \"IAH\", \"MIA\", \"BQN\", \"ATL\", \"ORD\", \"FLL\", \"IAD\",… #> $ air_time 227, 227, 160, 183, 116, 150, 158, 53, 140, 138, 149, 1… #> $ distance 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 944, 733, … #> $ hour 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6… #> $ minute 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0… #> $ time_hour 2013-01-01 05:00:00, 2013-01-01 05:00:00, 2013-01-01 0…"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"parquet-implementations","dir":"Articles","previous_headings":"","what":"Parquet implementations","title":"Benchmarks","text":"ran nanoparquet, Arrow DuckDB. also ran data.table without compression readr, read/write CSV files. used running time readr baseline. ran benchmark three times record results third run. make sure data software swapped OS. (Except readr large data set, take long.) 
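For orientation, a hedged sketch of the three Parquet readers being compared; "flights.parquet" is a hypothetical file name, and the DuckDB route shown (querying a Parquet file through DBI) is one of several ways to use it:

```r
# Sketch: reading the same file with the three implementations.
d1 <- nanoparquet::read_parquet("flights.parquet")
d2 <- arrow::read_parquet("flights.parquet")
con <- DBI::dbConnect(duckdb::duckdb())
d3 <- DBI::dbGetQuery(con, "SELECT * FROM 'flights.parquet'")
DBI::dbDisconnect(con, shutdown = TRUE)
```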
include complete raw results end article.","code":"if (file.exists(file.path(me, \"results.parquet\"))) { results <- nanoparquet::read_parquet(file.path(me, \"results.parquet\")) } else { results <- NULL lapply(data_sizes[1:2], function(s) { lapply(variants, function(v) { r <- if (v == \"readr\" && s == \"large\") { measure(v, s) } else { measure(v, s) measure(v, s) measure(v, s) } results <<- rbind(results, r) }) }) nanoparquet::write_parquet(results, file.path(me, \"results.parquet\")) }"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"parquet-vs-csv","dir":"Articles","previous_headings":"","what":"Parquet vs CSV","title":"Benchmarks","text":"results, focusing CSV readers nanoparquet: Notes: single-threaded nanoparquet Parquet-reader competitive. can read compressed Parquet file just fast state art uncompressed CSV reader uses least 2 threads. nanoparquet vs CSV results writing Parquet CSV files: Notes: data.table CSV writer 3 times fast nanoparquet Parquet writer, CSV file uncompressed. CSV writer uses least 4 threads, Parquet write single-threaded. nanoparquet Parquet writer 2-5 times faster data.table CSV writer CSV file compressed. Parquet files 5-6 times smaller uncompressed CSV files 30-35% smaller compressed CSV files.","code":"csv_tab_read <- results |> filter(software %in% c(\"nanoparquet\", \"data.table\", \"data.table.gz\", \"readr\")) |> filter(direction == \"read\") |> mutate(software = case_when( software == \"data.table.gz\" ~ \"data.table (compressed)\", .default = software )) |> rename(`data size` = data_size, time = time_elapsed) |> mutate(memory = mem_max_after - mem_before) |> mutate(base = tail(time, 1), .by = `data size`) |> mutate(speedup = base / time, x = round(speedup, digits = 1)) |> select(`data size`, software, time, x, speedup, memory) |> mutate(rawtime = time, time = prettyunits::pretty_sec(time)) |> rename(`speedup from CSV` = speedup) csv_tab_read |> gt() |> tab_header(title = \"Parquet vs CSV, reading\") |> tab_options(table.align = \"left\") |> tab_row_group(md(\"**small data**\"), rows = `data size` == \"small\", \"s\") |> tab_row_group(md(\"**medium data**\"), rows = `data size` == \"medium\", \"m\") |> tab_row_group(md(\"**large data**\"), rows = `data size` == \"large\", \"l\") |> row_group_order(c(\"s\", \"m\", \"l\")) |> cols_hide(columns = c(`data size`, rawtime)) |> cols_align(columns = time, align = \"right\") |> fmt_bytes(columns = memory) |> gt_plt_bar(column = `speedup from CSV`) csv_tab_write <- results |> filter(software %in% c(\"nanoparquet\", \"data.table\", \"data.table.gz\", \"readr\")) |> filter(direction == \"write\") |> mutate(software = case_when( software == \"data.table.gz\" ~ \"data.table (compressed)\", .default = software )) |> rename(`data size` = data_size, time = time_elapsed, `file size` = file_size) |> mutate(memory = mem_max_after - mem_before) |> mutate(base = tail(time, 1), .by = `data size`) |> mutate(speedup = base / time, x = round(speedup, digits = 1)) |> select(`data size`, software, time, x, speedup, memory, `file size`) |> mutate(rawtime = time, time = prettyunits::pretty_sec(time)) |> rename(`speedup from CSV` = speedup) csv_tab_write |> gt() |> tab_header(title = \"Parquet vs CSV, writing\") |> tab_options(table.align = \"left\") |> tab_row_group(md(\"**small data**\"), rows = `data size` == \"small\", \"s\") |> tab_row_group(md(\"**medium data**\"), rows = `data size` == \"medium\", \"m\") |> tab_row_group(md(\"**large data**\"), rows = `data size` == \"large\", \"l\") |> 
row_group_order(c(\"s\", \"m\", \"l\")) |> cols_hide(columns = c(`data size`, rawtime)) |> cols_align(columns = time, align = \"right\") |> fmt_bytes(columns = c(memory, `file size`)) |> gt_plt_bar(column = `speedup from CSV`)"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"parquet-implementations-1","dir":"Articles","previous_headings":"","what":"Parquet implementations","title":"Benchmarks","text":"summary Parquet readers, three files. Notes: general, three implementations perform similarly. nanoparquet competitive three data sets terms speed also tends use least amount memory. turned ALTREP arrow, reads data memory. summary Parquet writers: Notes: nanoparquet competitive terms speed, slightly faster two implementations, data sets. DuckDB seems waste space writing Parquet files. possibly fine tuned forcing different encoding. behavior improve forthcoming DuckDB 1.2.0 release, see also https://github.com/duckdb/duckdb/issues/3316.","code":"pq_tab_read <- results |> filter(software %in% c(\"nanoparquet\", \"arrow\", \"duckdb\", \"readr\")) |> filter(direction == \"read\") |> rename(`data size` = data_size, time = time_elapsed) |> mutate(memory = mem_max_after - mem_before) |> mutate(base = tail(time, 1), .by = `data size`) |> mutate(speedup = base / time, x = round(speedup, digits = 1)) |> select(`data size`, software, time, x, speedup, memory) |> mutate(rawtime = time, time = prettyunits::pretty_sec(time)) |> filter(software %in% c(\"nanoparquet\", \"arrow\", \"duckdb\")) |> mutate(software = case_when( software == \"arrow\" ~ \"Arrow\", software == \"duckdb\" ~ \"DuckDB\", .default = software )) |> rename(`speedup from CSV` = speedup) pq_tab_read |> gt() |> tab_header(title = \"Parquet implementations, reading\") |> tab_options(table.align = \"left\") |> tab_row_group(md(\"**small data**\"), rows = `data size` == \"small\", \"s\") |> tab_row_group(md(\"**medium data**\"), rows = `data size` == \"medium\", \"m\") |> tab_row_group(md(\"**large data**\"), rows = `data size` == \"large\", \"l\") |> row_group_order(c(\"s\", \"m\", \"l\")) |> cols_hide(columns = c(`data size`, rawtime)) |> cols_align(columns = time, align = \"right\") |> fmt_bytes(columns = memory) |> gt_plt_bar(column = `speedup from CSV`) pq_tab_write <- results |> filter(software %in% c(\"nanoparquet\", \"arrow\", \"duckdb\", \"readr\")) |> filter(direction == \"write\") |> rename(`data size` = data_size, time = time_elapsed, `file size` = file_size) |> mutate(memory = mem_max_after - mem_before) |> mutate(base = tail(time, 1), .by = `data size`) |> mutate(speedup = base / time, x = round(speedup, digits = 1)) |> select(`data size`, software, time, x, speedup, memory, `file size`) |> mutate(rawtime = time, time = prettyunits::pretty_sec(time)) |> filter(software %in% c(\"nanoparquet\", \"arrow\", \"duckdb\", \"readr\")) |> mutate(software = case_when( software == \"arrow\" ~ \"Arrow\", software == \"duckdb\" ~ \"DuckDB\", .default = software )) |> rename(`speedup from CSV` = speedup) pq_tab_write |> gt() |> tab_header(title = \"Parquet implementations, writing\") |> tab_options(table.align = \"left\") |> tab_row_group(md(\"**small data**\"), rows = `data size` == \"small\", \"s\") |> tab_row_group(md(\"**medium data**\"), rows = `data size` == \"medium\", \"m\") |> tab_row_group(md(\"**large data**\"), rows = `data size` == \"large\", \"l\") |> row_group_order(c(\"s\", \"m\", \"l\")) |> cols_hide(columns = c(`data size`, rawtime)) |> cols_align(columns = time, align = \"right\") |> 
fmt_bytes(columns = c(memory, `file size`)) |> gt_plt_bar(column = `speedup from CSV`)"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"conclusions","dir":"Articles","previous_headings":"","what":"Conclusions","title":"Benchmarks","text":"results probably change different data sets, different system. particular, Arrow DuckDB probably faster larger systems, data stored multiple physical disks. Arrow DuckDB let run queries data without loading memory first. especially important data does not fit memory, even columns needed analysis. nanoparquet does not do this. However, general, based benchmarks good reasons trust nanoparquet Parquet reader writer competitive implementations available R, terms speed memory use. limitations nanoparquet not prohibitive use case, good choice Parquet I/O.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"raw-benchmark-results","dir":"Articles","previous_headings":"","what":"Raw benchmark results","title":"Benchmarks","text":"raw results. can scroll right screen wide enough whole table. Notes: User time (time_user) plus system time (time_system) can larger elapsed time (time_elapsed) multithreaded implementations indeed tools, except nanoparquet, single-threaded. memory columns bytes. mem_before RSS size reading/writing. mem_max_before maximum RSS size process . mem_max_after maximum RSS size process read/write operation. can calculate (estimate) memory usage tool subtracting mem_before mem_max_after. overestimate memory usage mem_max_after mem_max_before, never happens practice. reading file, mem_max_after includes memory needed store data set . (See data sizes .) arrow, turned ALTREP using options(arrow.use_altrep = FALSE), see benchmarks-funcs.R file. Otherwise arrow would not actually read data memory.","code":"print(results, n = Inf) #> # A data frame: 36 × 10 #> software direction data_size time_user time_system time_elapsed mem_before mem_max_before mem_max_after file_size #> #> 1 nanoparquet read small 0.0220 0.007 0.0290 156909568 156909568 236765184 NA #> 2 nanoparquet write small 0.103 0.015 0.117 305168384 305168384 386613248 5687737 #> 3 arrow read small 0.06 0.023 0.0400 160317440 160317440 267173888 NA #> 4 arrow write small 0.149 0.00700 0.151 306151424 306151424 340574208 5693381 #> 5 duckdb read small 0.081 0.018 0.0660 166313984 166313984 286916608 NA #> 6 duckdb write small 0.283 0.025 0.207 309510144 309510144 465649664 10684818 #> 7 data.table read small 0.137 0.016 0.0590 164282368 164282368 231325696 NA #> 8 data.table write small 0.158 0.0100 0.0620 313851904 313851904 314769408 30960660 #> 9 data.table.gz read small 0.215 0.026 0.15 164986880 164986880 278757376 NA #> 10 data.table.gz write small 1.45 0.014 0.386 310034432 310034432 311033856 8263176 #> 11 readr read small 1.08 0.27 0.415 162152448 162152448 350666752 NA #> 12 readr write small 1.78 1.85 0.781 314736640 314736640 359104512 31053850 #> 13 nanoparquet read medium 0.84 0.139 0.979 158351360 158351360 1640300544 NA #> 14 nanoparquet write medium 1.97 0.227 2.20 1079656448 1079656448 1762787328 111363363 #> 15 arrow read medium 1.40 0.265 0.982 168099840 168099840 2229256192 NA #> 16 arrow write medium 2.65 0.065 2.60 1090486272 1090486272 1380417536 112167843 #> 17 duckdb read medium 1.82 0.331 1.12 160743424 160743424 2224111616 NA #> 18 duckdb write medium 7.07 0.353 2.38 1099300864 1099300864 3086221312 213168966 #> 19 data.table read medium 2.36 0.135 0.891 159596544 159596544 1453031424 NA #> 20 data.table write medium 2.60 0.098 0.744
1086357504 1086357504 1088962560 619210198 #> 21 data.table.gz read medium 3.26 0.305 1.98 155844608 155844608 1516044288 NA #> 22 data.table.gz write medium 27.6 0.084 7.01 1092681728 1092681728 1095352320 165249944 #> 23 readr read medium 19.1 5.35 5.10 158367744 158367744 3874635776 NA #> 24 readr write medium 34.4 39.4 14.0 1090158592 1090158592 1932197888 621073998 #> 25 nanoparquet read large 7.25 2.44 10.8 73023488 73023488 8098021376 NA #> 26 nanoparquet write large 19.2 4.46 24.8 8158134272 8450293760 8450293760 1113819142 #> 27 arrow read large 12.0 7.32 10.8 72941568 72941568 9892495360 NA #> 28 arrow write large 27.9 2.31 29.9 8304607232 8573747200 8835842048 1121513329 #> 29 duckdb read large 16.2 5.18 14.6 75251712 75251712 8127512576 NA #> 30 duckdb write large 54.7 14.2 33.7 8305164288 8574451712 9348841472 2131769619 #> 31 data.table read large 21.6 3.87 12.8 78872576 78872576 8691007488 NA #> 32 data.table write large 26.3 1.69 8.09 8304033792 8573157376 8573157376 6192100558 #> 33 data.table.gz read large 30.6 7.16 26.7 72876032 72876032 8018870272 NA #> 34 data.table.gz write large 279. 1.93 71.6 8303362048 8572665856 8572665856 1652494401 #> 35 readr read large 144. 177. 231. 73564160 73564160 8500789248 NA #> 36 readr write large 333. 345. 143. 8304148480 8573452288 9224192000 6210738558"},{"path":"https://nanoparquet.r-lib.org/dev/articles/benchmarks.html","id":"about","dir":"Articles","previous_headings":"","what":"About","title":"Benchmarks","text":"See benchmark-funcs.R file nanoparquet repository code benchmarks. ran measurement subprocess, make easier measure memory usage. include package loading time benchmarks. nanoparquet dependencies loads quickly. arrow duckdb packages might take 200ms load test system, need load dependencies also bigger.","code":"sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.4.2 (2024-10-31) #> os Ubuntu 24.04.1 LTS #> system x86_64, linux-gnu #> ui X11 #> language en-US #> collate C.UTF-8 #> ctype C.UTF-8 #> tz UTC #> date 2025-01-29 #> pandoc 3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> arrow 18.1.0.1 2025-01-08 [1] RSPM #> assertthat 0.2.1 2019-03-21 [1] RSPM #> base64enc 0.1-3 2015-07-28 [1] RSPM #> bit 4.5.0.1 2024-12-03 [1] RSPM #> bit64 4.6.0-1 2025-01-16 [1] RSPM #> cli 3.6.3 2024-06-21 [1] RSPM #> colorspace 2.1-1 2024-07-26 [1] RSPM #> commonmark 1.9.2 2024-10-04 [1] RSPM #> DBI 1.2.3 2024-06-02 [1] RSPM #> digest 0.6.37 2024-08-19 [1] RSPM #> dplyr * 1.1.4 2023-11-17 [1] RSPM #> duckdb 1.1.3-2 2025-01-24 [1] RSPM #> evaluate 1.0.3 2025-01-10 [1] RSPM #> farver 2.1.2 2024-05-13 [1] RSPM #> fastmap 1.2.0 2024-05-15 [1] RSPM #> fontawesome 0.5.3 2024-11-16 [1] RSPM #> generics 0.1.3 2022-07-05 [1] RSPM #> ggplot2 3.5.1 2024-04-23 [1] RSPM #> glue 1.8.0 2024-09-30 [1] RSPM #> gt * 0.11.1 2024-10-04 [1] RSPM #> gtable 0.3.6 2024-10-25 [1] RSPM #> gtExtras * 0.5.0 2023-09-15 [1] RSPM #> htmltools 0.5.8.1 2024-04-04 [1] RSPM #> jsonlite 1.8.9 2024-09-20 [1] RSPM #> knitr 1.49 2024-11-08 [1] RSPM #> labeling 0.4.3 2023-08-29 [1] RSPM #> lifecycle 1.0.4 2023-11-07 [1] RSPM #> magrittr 2.0.3 2022-03-30 [1] RSPM #> markdown 1.13 2024-06-04 [1] RSPM #> munsell 0.5.1 2024-04-01 [1] RSPM #> nanoparquet 0.3.1.9000 2025-01-29 [1] local #> nycflights13 1.0.2 2021-04-12 [1] RSPM #> paletteer 
1.6.0 2024-01-21 [1] RSPM #> pillar 1.10.1 2025-01-07 [1] RSPM #> pkgconfig 2.0.3 2019-09-22 [1] RSPM #> prettyunits 1.2.0 2023-09-24 [1] RSPM #> purrr 1.0.2 2023-08-10 [1] RSPM #> R6 2.5.1 2021-08-19 [1] RSPM #> ragg 1.3.3 2024-09-11 [1] RSPM #> rematch2 2.1.2 2020-05-01 [1] RSPM #> rlang 1.1.5 2025-01-17 [1] RSPM #> rmarkdown 2.29 2024-11-04 [1] RSPM #> sass 0.4.9 2024-03-15 [1] RSPM #> scales 1.3.0 2023-11-28 [1] RSPM #> sessioninfo 1.2.2 2021-12-06 [1] any (@1.2.2) #> svglite 2.1.3 2023-12-08 [1] RSPM #> systemfonts 1.2.1 2025-01-20 [1] RSPM #> textshaping 1.0.0 2025-01-20 [1] RSPM #> tibble 3.2.1 2023-03-20 [1] RSPM #> tidyselect 1.2.1 2024-03-11 [1] RSPM #> utf8 1.2.4 2023-10-22 [1] RSPM #> vctrs 0.6.5 2023-12-01 [1] RSPM #> withr 3.0.2 2024-10-28 [1] RSPM #> xfun 0.50 2025-01-07 [1] RSPM #> xml2 1.3.6 2023-12-04 [1] RSPM #> yaml 2.3.10 2024-07-26 [1] RSPM #> #> [1] /home/runner/work/_temp/Library #> [2] /opt/R/4.4.2/lib/R/site-library #> [3] /opt/R/4.4.2/lib/R/library #> #> ──────────────────────────────────────────────────────────────────────────────"},{"path":"https://nanoparquet.r-lib.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Gábor Csárdi. Author, maintainer. Hannes Mühleisen. Author, copyright holder. Google Inc.. Copyright holder. Apache Software Foundation. Copyright holder. . Copyright holder. RAD Game Tools. Copyright holder. Valve Software. Copyright holder. Tenacious Software LLC. Copyright holder. Facebook, Inc.. Copyright holder.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Csárdi G, Mühleisen H (2025). nanoparquet: Read Write 'Parquet' Files. R package version 0.3.1.9000, https://r-lib.github.io/nanoparquet/, https://github.com/r-lib/nanoparquet.","code":"@Manual{, title = {nanoparquet: Read and Write 'Parquet' Files}, author = {Gábor Csárdi and Hannes Mühleisen}, year = {2025}, note = {R package version 0.3.1.9000, https://r-lib.github.io/nanoparquet/}, url = {https://github.com/r-lib/nanoparquet}, }"},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"nanoparquet","dir":"","previous_headings":"","what":"Read and Write Parquet Files","title":"Read and Write Parquet Files","text":"nanoparquet reader writer common subset Parquet files.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"features","dir":"","previous_headings":"","what":"Features:","title":"Read and Write Parquet Files","text":"Read write flat (.e. non-nested) Parquet files. Can read Parquet data types. Can read subset columns Parquet file. Can write many R data types, including factors temporal types Parquet. Can append data frame Parquet file without first reading rewriting whole file. Completely dependency free. Supports Snappy, Gzip Zstd compression. Competitive tools terms speed, memory use file size.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"limitations","dir":"","previous_headings":"","what":"Limitations:","title":"Read and Write Parquet Files","text":"Nested Parquet types not supported. Some Parquet logical types not supported: INTERVAL, UNKNOWN. Only Snappy, Gzip Zstd compression supported. Encryption not supported. Reading files URLs not supported. nanoparquet always reads data (selected subset ) memory.
It does not work with out-of-memory data in Parquet files like Apache Arrow and DuckDB does.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Read and Write Parquet Files","text":"Install the R package from CRAN:","code":"install.packages(\"nanoparquet\")"},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"read","dir":"","previous_headings":"Usage","what":"Read","title":"Read and Write Parquet Files","text":"Call read_parquet() to read a Parquet file: To see the columns of a Parquet file and how their types are mapped to R types by read_parquet(), call read_parquet_schema() first: Folders of similar-structured Parquet files (e.g. as produced by Spark) can be read like this:","code":"df <- nanoparquet::read_parquet(\"example.parquet\") nanoparquet::read_parquet_schema(\"example.parquet\") df <- data.table::rbindlist(lapply( Sys.glob(\"some-folder/part-*.parquet\"), nanoparquet::read_parquet ))"},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"write","dir":"","previous_headings":"Usage","what":"Write","title":"Read and Write Parquet Files","text":"Call write_parquet() to write a data frame to a Parquet file: To see how the columns of a data frame will be mapped to Parquet types by write_parquet(), call infer_parquet_schema() first:","code":"nanoparquet::write_parquet(mtcars, \"mtcars.parquet\") nanoparquet::infer_parquet_schema(mtcars)"},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"inspect","dir":"","previous_headings":"Usage","what":"Inspect","title":"Read and Write Parquet Files","text":"Call read_parquet_info(), read_parquet_schema(), or read_parquet_metadata() to see various kinds of metadata of a Parquet file: read_parquet_info() shows a basic summary of the file. read_parquet_schema() shows all columns, including non-leaf columns, and how they are mapped to R types by read_parquet(). read_parquet_metadata() shows the complete metadata information: file meta data, the schema, the row groups and the column chunks of the file. If you find a file that should be supported but isn't, please open an issue with a link to the file.","code":"nanoparquet::read_parquet_info(\"mtcars.parquet\") nanoparquet::read_parquet_schema(\"mtcars.parquet\") nanoparquet::read_parquet_metadata(\"mtcars.parquet\")"},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"options","dir":"","previous_headings":"","what":"Options","title":"Read and Write Parquet Files","text":"See also ?parquet_options(). nanoparquet.class: the extra class to add to data frames returned by read_parquet(). If it is not defined, the default is \"tbl\", which changes how the data frame is printed if the pillar package is loaded. nanoparquet.use_arrow_metadata: unless this is set to FALSE, read_parquet() will make use of the Arrow metadata in the Parquet file. Currently this is used to detect factor columns. nanoparquet.write_arrow_metadata: unless this is set to FALSE, write_parquet() will add the Arrow metadata to the Parquet file. This helps preserving the classes of the columns, e.g. factors will be read back as factors, both by nanoparquet and Arrow.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/index.html","id":"license","dir":"","previous_headings":"","what":"License","title":"Read and Write Parquet Files","text":"MIT","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":null,"dir":"Reference","previous_headings":"","what":"Append a data frame to an existing Parquet file — append_parquet","title":"Append a data frame to an existing Parquet file — append_parquet","text":"The schema of the data frame must be compatible with the schema of the file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Append a data frame to an existing Parquet file — append_parquet","text":"","code":"append_parquet( x, file, compression = c(\"snappy\", \"gzip\", \"zstd\", \"uncompressed\"), encoding = NULL, row_groups = NULL, options = parquet_options() )"},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Append a data frame to an existing Parquet file — append_parquet","text":"x Data frame to append. file Path to the output file. compression Compression algorithm to use for the newly written data. See write_parquet(). encoding Encoding to use for the newly written data. It does not have to be the same as the encoding of the data already in the file. See write_parquet() for the possible values. row_groups Row groups of the new, extended Parquet file. append_parquet() can only change the last existing row group, and if row_groups is specified, it has to respect this. I.e. if the existing file has n rows, and its last row group starts at k (k <= n), then the first row group of row_groups that refers to the new data must start at k or at n+1. (It is simpler to specify num_rows_per_row_group in options, see parquet_options(), instead of row_groups. Only use row_groups if you need complete control.) options Nanoparquet options, for the new data, see parquet_options(). The keep_row_groups option also affects whether append_parquet() overwrites existing row groups in the file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":"warning","dir":"Reference","previous_headings":"","what":"Warning","title":"Append a data frame to an existing Parquet file — append_parquet","text":"This function is not atomic! If it is interrupted, it may leave the file in a corrupt state. To work around this, create a copy of the original file, append the new data to the copy, and then rename the new, extended file to the original one.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/append_parquet.html","id":"about-row-groups","dir":"Reference","previous_headings":"","what":"About row groups","title":"Append a data frame to an existing Parquet file — append_parquet","text":"A Parquet file may be partitioned into multiple row groups, and indeed most large Parquet files are. append_parquet() is only able to update the existing file along row group boundaries. There are two possibilities: append_parquet() keeps the existing row groups in the file, and creates new row groups for the new data. This mode can be forced by the keep_row_groups option in options, see parquet_options(). Alternatively, write_parquet overwrites the last row group of the file, with its existing contents plus (the beginning of) the new data. This mode makes sense if the last row group is small, because many small row groups are inefficient. By default append_parquet chooses between the two modes automatically, aiming to create row groups with at least num_rows_per_row_group (see parquet_options()) rows. You
can customize behavior keep_row_groups options row_groups argument.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/infer_parquet_schema.html","id":null,"dir":"Reference","previous_headings":"","what":"Infer Parquet schema of a data frame — infer_parquet_schema","title":"Infer Parquet schema of a data frame — infer_parquet_schema","text":"Infer Parquet schema data frame","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/infer_parquet_schema.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Infer Parquet schema of a data frame — infer_parquet_schema","text":"","code":"infer_parquet_schema(df, options = parquet_options())"},{"path":"https://nanoparquet.r-lib.org/dev/reference/infer_parquet_schema.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Infer Parquet schema of a data frame — infer_parquet_schema","text":"df Data frame. options Return value parquet_options(), may modify R Parquet type mappings.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/infer_parquet_schema.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Infer Parquet schema of a data frame — infer_parquet_schema","text":"Data frame, inferred schema. columns return value read_parquet_schema(): file_name, name, r_type, type, type_length, repetition_type, converted_type, logical_type, num_children, scale, precision, field_id.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":null,"dir":"Reference","previous_headings":"","what":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Self-sufficient reader writer flat 'Parquet' files. Can read 'Parquet' data types. Can write many 'R' data types, including factors temporal types. See docs limitations.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"nanoparquet reader writer common subset Parquet files.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"features-","dir":"Reference","previous_headings":"","what":"Features:","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Read write flat (.e. non-nested) Parquet files. Can read Parquet data types. Can read subset columns Parquet file. Can write many R data types, including factors temporal types Parquet. Can append data frame Parquet file without first reading rewriting whole file. Completely dependency free. Supports Snappy, Gzip Zstd compression. Competitive tools terms speed, memory use file size.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"limitations-","dir":"Reference","previous_headings":"","what":"Limitations:","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Nested Parquet types supported. Parquet logical types supported: INTERVAL, UNKNOWN. Snappy, Gzip Zstd compression supported. Encryption supported. Reading files URLs supported. nanoparquet always reads data (selected subset ) memory. 
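The warning for append_parquet() above recommends a copy-append-rename workflow because the function is not atomic. As a minimal sketch of that workaround (the file name "data.parquet" and the `new_rows` data frame are illustrative, not from the package):

```r
# Copy-append-rename workaround from the append_parquet() warning.
# "data.parquet" and `new_rows` are made-up example names.
tmp <- paste0("data.parquet", ".tmp")            # sibling path, same file system
file.copy("data.parquet", tmp, overwrite = TRUE) # work on a copy
nanoparquet::append_parquet(new_rows, tmp)       # append to the copy
file.rename(tmp, "data.parquet")                 # swap the extended file in
```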
work --memory data Parquet files like Apache Arrow DuckDB .","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"installation","dir":"Reference","previous_headings":"","what":"Installation","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Install R package CRAN:","code":"install.packages(\"nanoparquet\")"},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"read","dir":"Reference","previous_headings":"","what":"Read","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Call read_parquet() read Parquet file: see columns Parquet file types mapped R types read_parquet(), call read_parquet_schema() first: Folders similar-structured Parquet files (e.g. produced Spark) can read like :","code":"df <- nanoparquet::read_parquet(\"example.parquet\") nanoparquet::read_parquet_schema(\"example.parquet\") df <- data.table::rbindlist(lapply( Sys.glob(\"some-folder/part-*.parquet\"), nanoparquet::read_parquet ))"},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"write","dir":"Reference","previous_headings":"","what":"Write","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Call write_parquet() write data frame Parquet file: see columns data frame mapped Parquet types write_parquet(), call infer_parquet_schema() first:","code":"nanoparquet::write_parquet(mtcars, \"mtcars.parquet\") nanoparquet::infer_parquet_schema(mtcars)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"inspect","dir":"Reference","previous_headings":"","what":"Inspect","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Call read_parquet_info(), read_parquet_schema(), read_parquet_metadata() see various kinds metadata Parquet file: read_parquet_info() shows basic summary file. read_parquet_schema() shows columns, including non-leaf columns, mapped R types read_parquet(). read_parquet_metadata() shows complete metadata information: file meta data, schema, row groups column chunks file. find file supported , please open issue link file.","code":"nanoparquet::read_parquet_info(\"mtcars.parquet\") nanoparquet::read_parquet_schema(\"mtcars.parquet\") nanoparquet::read_parquet_metadata(\"mtcars.parquet\")"},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"options","dir":"Reference","previous_headings":"","what":"Options","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"See also ?parquet_options(). nanoparquet.class: extra class add data frames returned read_parquet(). defined, default \"tbl\", changes data frame printed pillar package loaded. nanoparquet.use_arrow_metadata: unless set FALSE, read_parquet() make use Arrow metadata Parquet file. Currently used detect factor columns. nanoparquet.write_arrow_metadata: unless set FALSE, write_parquet() add Arrow metadata Parquet file. helps preserving classes columns, e.g. 
factors read back factors, nanoparquet Arrow.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"license","dir":"Reference","previous_headings":"","what":"License","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"MIT","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"nanoparquet: Read and Write 'Parquet' Files — nanoparquet-package","text":"Maintainer: Gábor Csárdi csardi.gabor@gmail.com Authors: Hannes Mühleisen (ORCID) [copyright holder] contributors: Google Inc. [copyright holder] Apache Software Foundation [copyright holder] Posit Software, PBC [copyright holder] RAD Game Tools [copyright holder] Valve Software [copyright holder] Tenacious Software LLC [copyright holder] Facebook, Inc. [copyright holder]","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-types.html","id":null,"dir":"Reference","previous_headings":"","what":"nanoparquet's type maps — nanoparquet-types","title":"nanoparquet's type maps — nanoparquet-types","text":"nanoparquet maps R types Parquet types.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-types.html","id":"r-s-data-types","dir":"Reference","previous_headings":"","what":"R's data types","title":"nanoparquet's type maps — nanoparquet-types","text":"writing data frame, nanoparquet maps R's data types Parquet logical types. following table summary mapping. details see . non-default mappings can selected via schema argument. E.g. write factor column called 'name' ENUM, use detailed mapping rules listed , order preference. rules likely change nanoparquet reaches version 1.0.0. Factors (.e. vectors inherit factor class) converted character vectors using .character(), written STRSXP (character vector) type. fact column factor stored Arrow metadata (see ), unless nanoparquet.write_arrow_metadata option set FALSE. Dates (.e. Date class) written DATE logical type, INT32 type internally. hms objects (hms package) written TIME(true, MILLIS). logical type, internally INT32 Parquet type. Sub-milliseconds precision lost. POSIXct objects written TIMESTAMP(true, MICROS) logical type, internally INT64 Parquet type. Sub-microsecond precision lost. difftime objects (hms objects, see ), written INT64 Parquet type, noting Arrow metadata (see ) column type Duration NANOSECONDS unit. Integer vectors (INTSXP) written INT(32, true) logical type, corresponds INT32 type. Real vectors (REALSXP) written DOUBLE type. Character vectors (STRSXP) written STRING logical type, BYTE_ARRAY type. always converted UTF-8 writing. Logical vectors (LGLSXP) written BOOLEAN type. vectors error currently. can use infer_parquet_schema() data frame map R data types Parquet data types. change default R Parquet mapping, use parquet_schema() schema argument write_parquet(). 
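To illustrate the default R to Parquet mappings described above, a small, hypothetical sketch follows; the data frame is invented for the example, and the per-column comments restate the mapping rules listed in this section:

```r
# Column types chosen to exercise the default mappings listed above.
d <- data.frame(
  f = factor(c("a", "b")),          # STRING; factor-ness kept in Arrow metadata
  d = as.Date("2024-01-01") + 0:1,  # DATE logical type, INT32 internally
  x = c(1.5, 2.5),                  # DOUBLE
  i = 1:2                           # INT(32, true)
)
nanoparquet::infer_parquet_schema(d)  # shows the inferred Parquet schema
```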
Currently supported non-default mappings are: integer to INT64, integer to INT96, double to INT96, double to FLOAT, character to BYTE_ARRAY, character to FIXED_LEN_BYTE_ARRAY, character to ENUM, factor to ENUM, integer to DECIMAL & INT32, integer to DECIMAL & INT64, double to DECIMAL & INT32, double to DECIMAL & INT64, integer to INT(8, *), INT(16, *), INT(32, signed), double to INT(*, *), character to UUID, double to FLOAT16, list of raw vectors to BYTE_ARRAY, list of raw vectors to FIXED_LEN_BYTE_ARRAY.","code":"write_parquet(..., schema = parquet_schema(name = \"ENUM\"))"},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-types.html","id":"parquet-s-data-types","dir":"Reference","previous_headings":"","what":"Parquet's data types","title":"nanoparquet's type maps — nanoparquet-types","text":"When reading a Parquet file nanoparquet also relies on the logical types and the Arrow metadata (if present, see below) in addition to the low level data types. The following table summarizes the mappings; the exact rules are below. These rules are likely to change until nanoparquet reaches version 1.0.0. The BOOLEAN type is read as a logical vector (LGLSXP). The STRING logical type and the UTF8 converted type are read as a character vector with UTF-8 encoding. The DATE logical type and the DATE converted type are read as a Date R object. The TIME logical type and the TIME_MILLIS and TIME_MICROS converted types are read as an hms object, see the hms package. The TIMESTAMP logical type and the TIMESTAMP_MILLIS and TIMESTAMP_MICROS converted types are read as POSIXct objects. If the logical type has the UTC flag set, then the time zone of the POSIXct object is set to UTC. INT32 is read as an integer vector (INTSXP). INT64, DOUBLE and FLOAT are read as real vectors (REALSXP). INT96 is read as a POSIXct vector with the tzone attribute set to \"UTC\". It was an old convention to store time stamps as INT96 objects. The DECIMAL converted type (FIXED_LEN_BYTE_ARRAY or BYTE_ARRAY type) is read as a real vector (REALSXP), potentially losing precision. The ENUM logical type is read as a character vector. The UUID logical type is read as a character vector that uses the 00112233-4455-6677-8899-aabbccddeeff form. The FLOAT16 logical type is read as a real vector (REALSXP). BYTE_ARRAY is read as a factor object if the file was written by Arrow and the original data type of the column was a factor. (See 'The Arrow metadata' below.) Otherwise BYTE_ARRAY is read as a list of raw vectors, with missing values denoted by NULL. Other logical and converted types are read as their annotated low level types: INT(8, true), INT(16, true) and INT(32, true) are read as integer vectors because they are INT32 internally in Parquet. INT(64, true) is read as a real vector (REALSXP). Unsigned integer types INT(8, false), INT(16, false) and INT(32, false) are read as integer vectors (INTSXP). Large positive values may overflow into negative values, this is a known issue that we will fix. INT(64, false) is read as a real vector (REALSXP). Large positive values may overflow into negative values, this is a known issue that we will fix. INTERVAL is a fixed length byte array, and nanoparquet reads it as a list of raw vectors. Missing values are denoted by NULL. JSON columns are read as character vectors (STRSXP). BSON columns are read as raw vectors (RAWSXP). These types are not yet supported: Nested types (LIST, MAP) are not supported. The UNKNOWN logical type is not supported. You can use the read_parquet_schema() function to see how R would read the columns of a Parquet file. Look at the r_type column.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/nanoparquet-types.html","id":"the-arrow-metadata","dir":"Reference","previous_headings":"","what":"The Arrow metadata","title":"nanoparquet's type maps — nanoparquet-types","text":"Apache Arrow (i.e. the arrow R package) adds additional metadata to Parquet files when writing them with arrow::write_parquet(). Then, when reading the file with arrow::read_parquet(), it uses this metadata to recreate the same Arrow and R data types as before writing. nanoparquet::write_parquet() also adds the Arrow metadata to Parquet files, unless the nanoparquet.write_arrow_metadata option is set to FALSE.
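As a hedged sketch of one of the non-default mappings listed earlier (double to FLOAT), assuming, as in the ENUM example above, that columns not named in the schema keep their automatic mapping:

```r
# Write a double column `x` as FLOAT instead of the default DOUBLE.
# Data frame, column name and file name are illustrative.
nanoparquet::write_parquet(
  data.frame(x = c(1.5, 2.5)),
  "float.parquet",
  schema = nanoparquet::parquet_schema(x = "FLOAT")
)
```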
Similarly, nanoparquet::read_parquet() uses Arrow metadata Parquet file (present), unless nanoparquet.use_arrow_metadata option set FALSE. Arrow metadata stored file level key-value metadata, key ARROW:schema. Currently nanoparquet uses Arrow metadata two things: uses detect factors. Without Arrow metadata factors read string vectors. uses detect difftime objects. Without arrow metadata read INT64 columns, containing time difference nanoseconds.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":null,"dir":"Reference","previous_headings":"","what":"Parquet encodings — parquet-encodings","title":"Parquet encodings — parquet-encodings","text":"Various Parquet encodings","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"nanoparquet-defaults","dir":"Reference","previous_headings":"","what":"Nanoparquet defaults","title":"Parquet encodings — parquet-encodings","text":"Currently defaults decided based R types. might change future. general, defaults likely change nanoparquet reaches version 1.0.0. Current encoding defaults: Definition levels always use RLE. (Nanoparquet currently write repetition levels, also use RLE, implemented.) factor columns use RLE_DICTIONARY. logical columns use RLE average run length first 10,000 values least 15. Otherwise use PLAIN encoding. integer, double character columns use RLE_DICTIONARY least two third values repeated. Otherwise use PLAIN encoding. list columns raw vectors always use PLAIN encoding currently.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"parquet-encodings","dir":"Reference","previous_headings":"","what":"Parquet encodings","title":"Parquet encodings — parquet-encodings","text":"See https://github.com/apache/parquet-format/blob/master/Encodings.md details Parquet encodings.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"plain-encoding","dir":"Reference","previous_headings":"","what":"PLAIN encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: . general values written back back: Integer types little endian. Floating point types follow IEEE standard. BYTE_ARRAY: element, little endian 4-byte length bytes . FIXED_LEN_BYTE_ARRAY: bytes written back back. Nanoparquet can read write encoding primitive types.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"rle-dictionary-encoding","dir":"Reference","previous_headings":"","what":"RLE_DICTIONARY encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: dictionary indices data pages. encoding combines run-length encoding bit-packing. Repeated sequences value can run-length encoded, non-repeated parts bit packed. used data pages dictionaries. dictionary pages PLAIN encoded. deprecated PLAIN_DICTIONARY name treated RLE_DICTIONARY. Nanoparquet can read write encoding.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"rle-encoding","dir":"Reference","previous_headings":"","what":"RLE encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: BOOLEAN. Also definition repetition levels. encoding RLE_DICTIONARY, slightly different header. combines run-length encoding bit packing. used BOOLEAN columns, also definition repetition levels. 
Nanoparquet can read and write this encoding.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"bit-packed-encoding-deprecated-in-favor-of-rle-","dir":"Reference","previous_headings":"","what":"BIT_PACKED encoding (deprecated in favor of RLE)","title":"Parquet encodings — parquet-encodings","text":"Supported types: none. For definition and repetition levels, RLE is used instead. This is a simple bit packing encoding for integers, that was previously used for encoding definition and repetition levels. It is not used in new Parquet files because the RLE encoding includes it and is better. Nanoparquet currently cannot read or write the BIT_PACKED encoding.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"delta-binary-packed-encoding","dir":"Reference","previous_headings":"","what":"DELTA_BINARY_PACKED encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: INT32, INT64. This encoding efficiently encodes integer columns if the differences between consecutive elements are often the same, and/or the differences between consecutive elements are small. In the extreme case an arithmetic sequence can be encoded in O(1) space. Nanoparquet can read this encoding, but cannot currently write it.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"delta-length-byte-array-encoding","dir":"Reference","previous_headings":"","what":"DELTA_LENGTH_BYTE_ARRAY encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: BYTE_ARRAY. This encoding uses DELTA_BINARY_PACKED to encode the lengths of the byte array elements. It is especially efficient for short byte array elements, i.e. a column of short strings. Nanoparquet can read this encoding, but cannot currently write it.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"delta-byte-array-encoding","dir":"Reference","previous_headings":"","what":"DELTA_BYTE_ARRAY encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY. This encoding is efficient if consecutive byte array elements share a prefix, because each element can reuse a prefix of the previous element. Nanoparquet can read this encoding, but cannot currently write it.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet-encodings.html","id":"byte-stream-split-encoding","dir":"Reference","previous_headings":"","what":"BYTE_STREAM_SPLIT encoding","title":"Parquet encodings — parquet-encodings","text":"Supported types: FLOAT, DOUBLE, INT32, INT64, FIXED_LEN_BYTE_ARRAY. This encoding stores the first bytes of the elements first, then the second bytes, etc. It does not reduce the size in itself, but it may allow more efficient compression. Nanoparquet can read this encoding, but cannot currently write it.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":null,"dir":"Reference","previous_headings":"","what":"Map between R and Parquet data types — parquet_column_types","title":"Map between R and Parquet data types — parquet_column_types","text":"Note that this function is now deprecated. Please use read_parquet_schema() for files, and infer_parquet_schema() for data frames.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Map between R and Parquet data types — parquet_column_types","text":"","code":"parquet_column_types(x, options = parquet_options())"},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Map between R and Parquet data types — parquet_column_types","text":"x Path to a Parquet file, or a data frame.
options Nanoparquet options, see parquet_options().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Map between R and Parquet data types — parquet_column_types","text":"Data frame columns: file_name: file name. name: column name. type: (low level) Parquet data type. r_type: R type corresponds Parquet type. Might NA read_parquet() read column. See nanoparquet-types type mapping rules. repetition_type: whether column REQUIRED (NA) OPTIONAL (may NA). REPEATED columns currently supported nanoparquet. logical_type: Parquet logical type list column. element least entry called type, potentially additional entries, e.g. bit_width, is_signed, etc.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_column_types.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Map between R and Parquet data types — parquet_column_types","text":"function works two ways. can map R types data frame Parquet types, see write_parquet() write data frame. can also map types Parquet file R types, see read_parquet() read file R.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":null,"dir":"Reference","previous_headings":"","what":"Nanoparquet options — parquet_options","title":"Nanoparquet options — parquet_options","text":"Create list nanoparquet options.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Nanoparquet options — parquet_options","text":"","code":"parquet_options( class = getOption(\"nanoparquet.class\", \"tbl\"), compression_level = getOption(\"nanoparquet.compression_level\", NA_integer_), keep_row_groups = FALSE, num_rows_per_row_group = getOption(\"nanoparquet.num_rows_per_row_group\", 10000000L), use_arrow_metadata = getOption(\"nanoparquet.use_arrow_metadata\", TRUE), write_arrow_metadata = getOption(\"nanoparquet.write_arrow_metadata\", TRUE), write_data_page_version = getOption(\"nanoparquet.write_data_page_version\", 1L), write_minmax_values = getOption(\"nanoparquet.write_minmax_values\", TRUE) )"},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Nanoparquet options — parquet_options","text":"class extra class classes add data frames created read_parquet(). default nanoparquet adds \"tbl\" class, data frames printed differently pillar package loaded. compression_level compression level write_parquet(). NA default, specifies default compression level method. Inf always selects highest possible compression level. details: Snappy support compression levels currently. GZIP supports levels 0 (uncompressed), 1 (fastest), 9 (best). default 6. ZSTD allows positive levels 22 currently. 20 require memory. Negative levels also allowed, lower level, faster speed, cost compression. Currently smallest level -131072. default level 3. keep_row_groups option used appending Parquet file append_parquet(). TRUE existing row groups file always kept nanoparquet creates new row groups new data. FALSE (default), last row group file overwritten smaller default row group size, .e. num_rows_per_row_group. num_rows_per_row_group number rows put row group, row groups specified explicitly. integer scalar. Defaults 10 million. use_arrow_metadata TRUE FALSE. 
TRUE, read_parquet() read_parquet_schema() make use Apache Arrow metadata assign R classes Parquet columns. currently used detect factor columns, detect \"difftime\" columns. option FALSE: \"factor\" columns read character vectors. \"difftime\" columns read real numbers, meaning one seconds, milliseconds, microseconds nanoseconds. Impossible tell without using Arrow metadata. write_arrow_metadata Whether add Apache Arrow types metadata file write_parquet(). write_data_page_version Data version write default. Possible values 1 2. Default 1. write_minmax_values Whether write minimum maximum values per row group, data types support write_parquet(). However, nanoparquet currently support minimum maximum values DECIMAL, UUID FLOAT16 logical types BOOLEAN, BYTE_ARRAY FIXED_LEN_BYTE_ARRAY primitive types writing without logical type. Currently default TRUE.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Nanoparquet options — parquet_options","text":"List nanoparquet options.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_options.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Nanoparquet options — parquet_options","text":"","code":"if (FALSE) { # the effect of using Arrow metadata tmp <- tempfile(fileext = \".parquet\") d <- data.frame( fct = as.factor(\"a\"), dft = as.difftime(10, units = \"secs\") ) write_parquet(d, tmp) read_parquet(tmp, options = parquet_options(use_arrow_metadata = TRUE)) read_parquet(tmp, options = parquet_options(use_arrow_metadata = FALSE)) }"},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a Parquet schema — parquet_schema","title":"Create a Parquet schema — parquet_schema","text":"can use schema specify write data frame Parquet file write_parquet().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a Parquet schema — parquet_schema","text":"","code":"parquet_schema(...)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a Parquet schema — parquet_schema","text":"... Parquet type specifications, see . backwards compatibility, can supply file name , parquet_schema behaves read_parquet_schema().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a Parquet schema — parquet_schema","text":"Data frame columns read_parquet_schema(): file_name, name, r_type, type, type_length, repetition_type, converted_type, logical_type, num_children, scale, precision, field_id.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a Parquet schema — parquet_schema","text":"schema list potentially named type specifications. schema stored data frame. (potentially named) argument parquet_schema may character scalar, list. Parameterized types need specified list. 
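For example, a parameterized type such as DECIMAL has to be given as a list, while a primitive type like INT32 can be a plain string. A minimal sketch, with hypothetical column names, assuming the precision, scale and primitive_type parameters described in the type list below:

```r
# String form for a primitive type, list form for a parameterized type.
nanoparquet::parquet_schema(
  id    = "INT32",
  price = list("DECIMAL", precision = 10, scale = 2, primitive_type = "INT64")
)
```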
Primitive Parquet types may specified string list.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"possible-types-","dir":"Reference","previous_headings":"","what":"Possible types:","title":"Create a Parquet schema — parquet_schema","text":"Special type: \"AUTO\": Parquet type, tells write_parquet() map R type Parquet automatically, using default mapping rules. Primitive Parquet types: \"BOOLEAN\" \"INT32\" \"INT64\" \"INT96\" \"FLOAT\" \"DOUBLE\" \"BYTE_ARRAY\" \"FIXED_LEN_BYTE_ARRAY\": fixed-length byte array. needs type_length parameter, integer 0 2^31-1. Parquet logical types: \"STRING\" \"ENUM\" \"UUID\" \"INTEGER\": signed unsigned integer. needs bit_width is_signed parameter. bit_width must 8, 16, 32 64. is_signed must TRUE FALSE. \"INT\": \"INTEGER\". Parquet documentation uses \"INT\", actual specification uses \"INTEGER\". supported nanoparquet. \"DECIMAL\": decimal number specified scale precision. needs precision primitive_type parameters. Also supports scale parameter, defaults zero specified. \"FLOAT16\" \"DATE\" \"TIME\": needs is_adjusted_utc (TRUE FALSE) unit parameter. unit must \"MILLIS\", \"MICROS\" \"NANOS\". \"TIMESTAMP\": needs is_adjusted_utc (TRUE FALSE) unit parameter. unit must \"MILLIS\", \"MICROS\" \"NANOS\". \"JSON\" \"BSON\" Logical types MAP, LIST UNKNOWN supported currently. Converted types deprecated Parquet specification favor logical types, parquet_schema() accepts converted types syntactic shortcut corresponding logical types: INT_8 mean list(\"INT\", bit_width = 8, is_signed = TRUE). INT_16 mean list(\"INT\", bit_width = 16, is_signed = TRUE). INT_32 mean list(\"INT\", bit_width = 32, is_signed = TRUE). INT_64 mean list(\"INT\", bit_width = 64, is_signed = TRUE). TIME_MICROS means list(\"TIME\", is_adjusted_utc = TRUE, unit = \"MICROS\"). TIME_MILLIS means list(\"TIME\", is_adjusted_utc = TRUE, unit = \"MILLIS\"). TIMESTAMP_MICROS means list(\"TIMESTAMP\", is_adjusted_utc = TRUE, unit = \"MICROS\"). TIMESTAMP_MILLIS means list(\"TIMESTAMP\", is_adjusted_utc = TRUE, unit = \"MILLIS\"). UINT_8 means list(\"INT\", bit_width = 8, is_signed = FALSE). UINT_16 means list(\"INT\", bit_width = 16, is_signed = FALSE). UINT_32 means list(\"INT\", bit_width = 32, is_signed = FALSE). UINT_64 means list(\"INT\", bit_width = 64, is_signed = FALSE).","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"missing-values","dir":"Reference","previous_headings":"","what":"Missing values","title":"Create a Parquet schema — parquet_schema","text":"type might also repetition_type parameter, possible values \"REQUIRED\", \"OPTIONAL\" \"REPEATED\". \"REQUIRED\" columns allow missing values. Missing values allowed \"OPTIONAL\" columns. 
\"REPEATED\" columns currently supported write_parquet().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/parquet_schema.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a Parquet schema — parquet_schema","text":"","code":"parquet_schema( c1 = \"INT32\", c2 = list(\"INT\", bit_width = 64, is_signed = TRUE), c3 = list(\"STRING\", repetition_type = \"OPTIONAL\") ) #> # A data frame: 3 × 12 #> file_name name r_type type type_length repetition_type converted_type #> * #> 1 NA c1 NA INT32 NA NA NA #> 2 NA c2 NA INT64 NA NA INT_64 #> 3 NA c3 NA BYTE_… NA OPTIONAL UTF8 #> # ℹ 5 more variables: logical_type >, num_children , #> # scale , precision , field_id "},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":null,"dir":"Reference","previous_headings":"","what":"Read a Parquet file into a data frame — read_parquet","title":"Read a Parquet file into a data frame — read_parquet","text":"Converts contents named Parquet file R data frame.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read a Parquet file into a data frame — read_parquet","text":"","code":"read_parquet(file, col_select = NULL, options = parquet_options())"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read a Parquet file into a data frame — read_parquet","text":"file Path Parquet file. may also R connection, case first reads data connection, writes temporary file, reads temporary file, deletes . connection might open, case must binary connection. open, read_parquet() open also close end. col_select Columns read. can numeric vector column indices, character vector column names. error select column multiple times. order columns result order col_select. options Nanoparquet options, see parquet_options().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read a Parquet file into a data frame — read_parquet","text":"data.frame file's contents.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read a Parquet file into a data frame — read_parquet","text":"","code":"file_name <- system.file(\"extdata/userdata1.parquet\", package = \"nanoparquet\") parquet_df <- nanoparquet::read_parquet(file_name) print(str(parquet_df)) #> Classes ‘tbl’ and 'data.frame':\t1000 obs. of 13 variables: #> $ registration: POSIXct, format: \"2016-02-03 07:55:29\" \"2016-02-03 17:04:03\" ... #> $ id : int 1 2 3 4 5 6 7 8 9 10 ... #> $ first_name : chr \"Amanda\" \"Albert\" \"Evelyn\" \"Denise\" ... #> $ last_name : chr \"Jordan\" \"Freeman\" \"Morgan\" \"Riley\" ... #> $ email : chr \"ajordan0@com.com\" \"afreeman1@is.gd\" \"emorgan2@altervista.org\" \"driley3@gmpg.org\" ... #> $ gender : Factor w/ 2 levels \"Female\",\"Male\": 1 2 1 1 NA 1 2 2 2 1 ... #> $ ip_address : chr \"1.197.201.2\" \"218.111.175.34\" \"7.161.136.94\" \"140.35.109.83\" ... #> $ cc : chr \"6759521864920116\" NA \"6767119071901597\" \"3576031598965625\" ... #> $ country : chr \"Indonesia\" \"Canada\" \"Russia\" \"China\" ... #> $ birthdate : Date, format: \"1971-03-08\" \"1968-01-16\" ... 
#> $ salary : num 49757 150280 144973 90263 NA ... #> $ title : chr \"Internal Auditor\" \"Accountant IV\" \"Structural Engineer\" \"Senior Cost Accountant\" ... #> $ comments : chr \"1E+02\" NA NA NA ... #> NULL"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_info.html","id":null,"dir":"Reference","previous_headings":"","what":"Short summary of a Parquet file — read_parquet_info","title":"Short summary of a Parquet file — read_parquet_info","text":"Short summary Parquet file","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_info.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Short summary of a Parquet file — read_parquet_info","text":"","code":"read_parquet_info(file) parquet_info(file)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_info.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Short summary of a Parquet file — read_parquet_info","text":"file Path Parquet file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_info.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Short summary of a Parquet file — read_parquet_info","text":"Data frame columns: file_name: file name. num_cols: number (leaf) columns. num_rows: number rows. num_row_groups: number row groups. file_size: file size bytes. parquet_version: Parquet version. created_by: string scalar, usually name software created file. NA available.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":null,"dir":"Reference","previous_headings":"","what":"Read the metadata of a Parquet file — read_parquet_metadata","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"function work files, even read_parquet() unable read , unsupported schema, encoding, compression reason.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"","code":"read_parquet_metadata(file, options = parquet_options()) parquet_metadata(file)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"file Path Parquet file. options Options potentially alter default Parquet R type mappings, see parquet_options().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"named list entries: file_meta_data: data frame file meta data: file_name: file name. version: Parquet version, integer. num_rows: total number rows. key_value_metadata: list column data frames two character columns called key value. key-value metadata file. Arrow stores schema . created_by: string scalar, usually name software created file. schema: data frame, schema file. one row node (inner node leaf node). flat files means one root node (inner node), always first one, one row \"real\" column. nested schemas, rows depth-first search order. important columns : file_name: file name. name: column name. r_type: R type corresponds Parquet type. 
Might be NA if read_parquet() cannot read this column. See nanoparquet-types for the type mapping rules. type: data type. One of the low level data types. type_length: length for fixed length byte arrays. repetition_type: character, one of REQUIRED, OPTIONAL or REPEATED. logical_type: a list column, the logical types of the columns. An element has at least an entry called type, and potentially additional entries, e.g. bit_width, is_signed, etc. num_children: number of child nodes. Should be a non-negative integer for the root node, and NA for a leaf node. $row_groups: a data frame, information about the row groups. Some of the more important columns: file_name: file name. id: row group id, an integer from zero to the number of row groups minus one. total_byte_size: total uncompressed size of all column data. num_rows: number of rows. file_offset: where the row group starts in the file. This is optional, so it might be NA. total_compressed_size: total byte size of all the compressed (and potentially encrypted) column data in this row group. This is optional, so it might be NA. ordinal: ordinal position of the row group in the file, starting from zero. This is optional, so it might be NA. If NA, then the order of the row groups is as they appear in the metadata. $column_chunks: a data frame, information about all column chunks, across all row groups. Some of the more important columns: file_name: file name. row_group: which row group this chunk belongs to. column: which leaf column this chunk belongs to. The order is the same as in $schema, but only leaf columns (i.e. columns with NA children) are counted. file_path: which file the chunk is stored in. NA means the same file. file_offset: where the column chunk begins in the file. type: low level parquet data type. encodings: encodings used to store this chunk. It is a list column of character vectors of encoding names. Current possible encodings: \"PLAIN\", \"GROUP_VAR_INT\", \"PLAIN_DICTIONARY\", \"RLE\", \"BIT_PACKED\", \"DELTA_BINARY_PACKED\", \"DELTA_LENGTH_BYTE_ARRAY\", \"DELTA_BYTE_ARRAY\", \"RLE_DICTIONARY\", \"BYTE_STREAM_SPLIT\". path_in_schema: a list column of character vectors. It is the path from the root node, which is simply the column name for flat schemas. codec: the compression codec used for the column chunk. Possible values are: \"UNCOMPRESSED\", \"SNAPPY\", \"GZIP\", \"LZO\", \"BROTLI\", \"LZ4\", \"ZSTD\". num_values: the number of values in the column chunk. total_uncompressed_size: total uncompressed size in bytes. total_compressed_size: total compressed size in bytes. data_page_offset: absolute position of the first data page of the column chunk in the file. index_page_offset: absolute position of the first index page of the column chunk in the file, or NA if there are no index pages. dictionary_page_offset: absolute position of the first dictionary page of the column chunk in the file, or NA if there are no dictionary pages. null_count: the number of missing values in the column chunk. It may be NA. min_value: a list column of raw vectors, the minimum value of the column, in binary. If NULL, then it is not specified. This column is experimental. max_value: a list column of raw vectors, the maximum value of the column, in binary. If NULL, then it is not specified. This column is experimental. is_min_value_exact: whether the minimum value is an actual value of the column, or just a bound. It may be NA. is_max_value_exact: whether the maximum value is an actual value of the column, or just a bound. It
may NA.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_metadata.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read the metadata of a Parquet file — read_parquet_metadata","text":"","code":"file_name <- system.file(\"extdata/userdata1.parquet\", package = \"nanoparquet\") nanoparquet::read_parquet_metadata(file_name) #> $file_meta_data #> # A data frame: 1 × 5 #> file_name version num_rows key_value_metadata created_by #> > #> 1 /home/runner/work/_temp/… 1 1000 https://g… #> #> $schema #> # A data frame: 14 × 12 #> file_name name r_type type type_length repetition_type converted_type #> #> 1 /home/ru… sche… NA NA NA NA NA #> 2 /home/ru… regi… POSIX… INT64 NA REQUIRED TIMESTAMP_MIC… #> 3 /home/ru… id integ… INT32 NA REQUIRED INT_32 #> 4 /home/ru… firs… chara… BYTE… NA OPTIONAL UTF8 #> 5 /home/ru… last… chara… BYTE… NA REQUIRED UTF8 #> 6 /home/ru… email factor BYTE… NA OPTIONAL UTF8 #> 7 /home/ru… gend… chara… BYTE… NA OPTIONAL UTF8 #> 8 /home/ru… ip_a… chara… BYTE… NA REQUIRED UTF8 #> 9 /home/ru… cc chara… BYTE… NA OPTIONAL UTF8 #> 10 /home/ru… coun… chara… BYTE… NA REQUIRED UTF8 #> 11 /home/ru… birt… Date INT32 NA OPTIONAL DATE #> 12 /home/ru… sala… double DOUB… NA OPTIONAL NA #> 13 /home/ru… title chara… BYTE… NA OPTIONAL UTF8 #> 14 /home/ru… comm… chara… BYTE… NA OPTIONAL UTF8 #> # ℹ 5 more variables: logical_type >, num_children , #> # scale , precision , field_id #> #> $row_groups #> # A data frame: 1 × 7 #> file_name id total_byte_size num_rows file_offset #> #> 1 /home/runner/work/_temp/Libr… 0 71427 1000 NA #> # ℹ 2 more variables: total_compressed_size , ordinal #> #> $column_chunks #> # A data frame: 13 × 24 #> file_name row_group column file_path file_offset offset_index_offset #> #> 1 /home/runne… 0 0 NA 4 NA #> 2 /home/runne… 0 1 NA 6741 NA #> 3 /home/runne… 0 2 NA 12259 NA #> 4 /home/runne… 0 3 NA 15211 NA #> 5 /home/runne… 0 4 NA 16239 NA #> 6 /home/runne… 0 5 NA 31759 NA #> 7 /home/runne… 0 6 NA 32031 NA #> 8 /home/runne… 0 7 NA 42952 NA #> 9 /home/runne… 0 8 NA 55009 NA #> 10 /home/runne… 0 9 NA 55925 NA #> 11 /home/runne… 0 10 NA 59312 NA #> 12 /home/runne… 0 11 NA 67026 NA #> 13 /home/runne… 0 12 NA 71089 NA #> # ℹ 18 more variables: offset_index_length , #> # column_index_offset , column_index_length , type , #> # encodings >, path_in_schema >, codec , #> # num_values , total_uncompressed_size , #> # total_compressed_size , data_page_offset , #> # index_page_offset , dictionary_page_offset , #> # null_count , min_value >, max_value >, … #>"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":null,"dir":"Reference","previous_headings":"","what":"Read a page from a Parquet file — read_parquet_page","title":"Read a page from a Parquet file — read_parquet_page","text":"Read page Parquet file","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read a page from a Parquet file — read_parquet_page","text":"","code":"read_parquet_page(file, offset)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read a page from a Parquet file — read_parquet_page","text":"file Path Parquet file. offset Integer offset start page file. 
See read_parquet_pages() list pages offsets.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read a page from a Parquet file — read_parquet_page","text":"Named list. Many entries correspond columns result read_parquet_pages(). Additional entries : codec: compression codec. Possible values: has_repetition_levels: whether page repetition levels. has_definition_levels: whether page definition levels. schema_column: schema column page corresponds . Note leaf columns pages. data_type: low level Parquet data type. Possible values: repetition_type: whether column page belongs REQUIRED, OPTIONAL REPEATED. page_header: bytes page header raw vector. num_null: number missing (NA) values. set V2 data pages. num_rows: num_values flat tables, .e. files without repetition levels. compressed_data: data page raw vector. includes repetition definition levels, . data: uncompressed data, nanoparquet supports compression codec file (GZIP SNAPPY time writing), file compressed. latter case compressed_data.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_page.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read a page from a Parquet file — read_parquet_page","text":"","code":"file_name <- system.file(\"extdata/userdata1.parquet\", package = \"nanoparquet\") nanoparquet:::read_parquet_pages(file_name) #> # A data frame: 19 × 14 #> file_name row_group column page_type page_header_offset #> #> 1 /home/runner/work/_temp/… 0 0 DATA_PAGE 4 #> 2 /home/runner/work/_temp/… 0 1 DATA_PAGE 6741 #> 3 /home/runner/work/_temp/… 0 2 DICTIONA… 10766 #> 4 /home/runner/work/_temp/… 0 2 DATA_PAGE 12259 #> 5 /home/runner/work/_temp/… 0 3 DICTIONA… 13334 #> 6 /home/runner/work/_temp/… 0 3 DATA_PAGE 15211 #> 7 /home/runner/work/_temp/… 0 4 DATA_PAGE 16239 #> 8 /home/runner/work/_temp/… 0 5 DICTIONA… 31726 #> 9 /home/runner/work/_temp/… 0 5 DATA_PAGE 31759 #> 10 /home/runner/work/_temp/… 0 6 DATA_PAGE 32031 #> 11 /home/runner/work/_temp/… 0 7 DATA_PAGE 42952 #> 12 /home/runner/work/_temp/… 0 8 DICTIONA… 53749 #> 13 /home/runner/work/_temp/… 0 8 DATA_PAGE 55009 #> 14 /home/runner/work/_temp/… 0 9 DATA_PAGE 55925 #> 15 /home/runner/work/_temp/… 0 10 DATA_PAGE 59312 #> 16 /home/runner/work/_temp/… 0 11 DICTIONA… 65063 #> 17 /home/runner/work/_temp/… 0 11 DATA_PAGE 67026 #> 18 /home/runner/work/_temp/… 0 12 DICTIONA… 68019 #> 19 /home/runner/work/_temp/… 0 12 DATA_PAGE 71089 #> # ℹ 9 more variables: uncompressed_page_size , #> # compressed_page_size , crc , num_values , #> # encoding , definition_level_encoding , #> # repetition_level_encoding , data_offset , #> # page_header_length options(max.print = 100) # otherwise long raw vector nanoparquet:::read_parquet_page(file_name, 4L) #> $page_type #> [1] \"DATA_PAGE\" #> #> $row_group #> [1] 0 #> #> $column #> [1] 0 #> #> $page_header_offset #> [1] 4 #> #> $data_page_offset #> [1] 24 #> #> $page_header_length #> [1] 20 #> #> $compressed_page_size #> [1] 6717 #> #> $uncompressed_page_size #> [1] 8000 #> #> $codec #> [1] \"SNAPPY\" #> #> $num_values #> [1] 1000 #> #> $encoding #> [1] \"PLAIN\" #> #> $definition_level_encoding #> [1] \"PLAIN\" #> #> $repetition_level_encoding #> [1] \"PLAIN\" #> #> $has_repetition_levels #> [1] FALSE #> #> $has_definition_levels #> [1] FALSE #> #> $schema_column #> [1] 1 #> #> $data_type #> [1] \"INT64\" #> #> $repetition_type #> [1] \"REQUIRED\" #> #> 
$page_header #> [1] 15 00 15 80 7d 15 fa 68 2c 15 d0 0f 15 00 15 00 15 00 00 00 #> #> $data #> [1] 40 be 0c f1 d8 2a 05 00 c0 86 e0 9a e0 2a 05 00 c0 28 33 45 d3 2a 05 #> [24] 00 40 2b 96 ce d2 2a 05 00 c0 9c 33 91 d6 2a 05 00 80 a2 54 7b d8 2a #> [47] 05 00 00 59 b2 77 d9 2a 05 00 80 ee 7d fc d7 2a 05 00 40 cf 71 8d d5 #> [70] 2a 05 00 c0 bc 7b cd e1 2a 05 00 80 e4 da 72 d2 2a 05 00 80 30 4d 73 #> [93] e1 2a 05 00 40 fe a4 0f #> [ reached getOption(\"max.print\") -- omitted 7900 entries ] #> #> $definition_levels_byte_length #> [1] NA #> #> $repetition_levels_byte_length #> [1] NA #> #> $num_nulls #> [1] NA #> #> $num_rows #> [1] NA #> #> $compressed_data #> [1] c0 3e 30 40 be 0c f1 d8 2a 05 00 c0 86 e0 9a e0 01 08 2c 28 33 45 d3 #> [24] 2a 05 00 40 2b 96 ce d2 01 10 28 9c 33 91 d6 2a 05 00 80 a2 54 7b 01 #> [47] 28 10 00 59 b2 77 d9 01 10 0c ee 7d fc d7 01 28 0c cf 71 8d d5 01 28 #> [70] 0c bc 7b cd e1 01 18 08 e4 da 72 01 38 0c 80 30 4d 73 01 10 30 40 fe #> [93] a4 0f e2 2a 05 00 00 eb #> [ reached getOption(\"max.print\") -- omitted 6617 entries ] #>"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":null,"dir":"Reference","previous_headings":"","what":"Metadata of all pages of a Parquet file — read_parquet_pages","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"Metadata pages Parquet file","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"","code":"read_parquet_pages(file)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"file Path Parquet file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"Data frame columns: file_name: file name. row_group: id row group page belongs , integer 0 number row groups minus one. column: id column. integer number leaf columns minus one. Note leaf columns considered, non-leaf columns pages. page_type: DATA_PAGE, INDEX_PAGE, DICTIONARY_PAGE DATA_PAGE_V2. page_header_offset: offset data page (header) file. uncompressed_page_size: include page header, per Parquet spec. compressed_page_size: without page header. crc: integer, checksum, present file, can NA. num_values: number data values page, include NULL (NA R) values. encoding: encoding page, current possible encodings: \"PLAIN\", \"GROUP_VAR_INT\", \"PLAIN_DICTIONARY\", \"RLE\", \"BIT_PACKED\", \"DELTA_BINARY_PACKED\", \"DELTA_LENGTH_BYTE_ARRAY\", \"DELTA_BYTE_ARRAY\", \"RLE_DICTIONARY\", \"BYTE_STREAM_SPLIT\". definition_level_encoding: encoding definition levels, see encoding possible values. can missing V2 data pages, always RLE encoded. repetition_level_encoding: encoding repetition levels, see encoding possible values. can missing V2 data pages, always RLE encoded. data_offset: offset actual data file. 
page_header_length: size page header, bytes.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"Reading page headers might slow large files, especially file many small pages.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_pages.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Metadata of all pages of a Parquet file — read_parquet_pages","text":"","code":"file_name <- system.file(\"extdata/userdata1.parquet\", package = \"nanoparquet\") nanoparquet:::read_parquet_pages(file_name) #> # A data frame: 19 × 14 #> file_name row_group column page_type page_header_offset #> #> 1 /home/runner/work/_temp/… 0 0 DATA_PAGE 4 #> 2 /home/runner/work/_temp/… 0 1 DATA_PAGE 6741 #> 3 /home/runner/work/_temp/… 0 2 DICTIONA… 10766 #> 4 /home/runner/work/_temp/… 0 2 DATA_PAGE 12259 #> 5 /home/runner/work/_temp/… 0 3 DICTIONA… 13334 #> 6 /home/runner/work/_temp/… 0 3 DATA_PAGE 15211 #> 7 /home/runner/work/_temp/… 0 4 DATA_PAGE 16239 #> 8 /home/runner/work/_temp/… 0 5 DICTIONA… 31726 #> 9 /home/runner/work/_temp/… 0 5 DATA_PAGE 31759 #> 10 /home/runner/work/_temp/… 0 6 DATA_PAGE 32031 #> 11 /home/runner/work/_temp/… 0 7 DATA_PAGE 42952 #> 12 /home/runner/work/_temp/… 0 8 DICTIONA… 53749 #> 13 /home/runner/work/_temp/… 0 8 DATA_PAGE 55009 #> 14 /home/runner/work/_temp/… 0 9 DATA_PAGE 55925 #> 15 /home/runner/work/_temp/… 0 10 DATA_PAGE 59312 #> 16 /home/runner/work/_temp/… 0 11 DICTIONA… 65063 #> 17 /home/runner/work/_temp/… 0 11 DATA_PAGE 67026 #> 18 /home/runner/work/_temp/… 0 12 DICTIONA… 68019 #> 19 /home/runner/work/_temp/… 0 12 DATA_PAGE 71089 #> # ℹ 9 more variables: uncompressed_page_size , #> # compressed_page_size , crc , num_values , #> # encoding , definition_level_encoding , #> # repetition_level_encoding , data_offset , #> # page_header_length "},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_schema.html","id":null,"dir":"Reference","previous_headings":"","what":"Read the schema of a Parquet file — read_parquet_schema","title":"Read the schema of a Parquet file — read_parquet_schema","text":"function work files, even read_parquet() unable read , unsupported schema, encoding, compression reason.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_schema.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read the schema of a Parquet file — read_parquet_schema","text":"","code":"read_parquet_schema(file, options = parquet_options())"},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_schema.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read the schema of a Parquet file — read_parquet_schema","text":"file Path Parquet file. options Return value parquet_options(), options potentially modify Parquet R type mappings.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/read_parquet_schema.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read the schema of a Parquet file — read_parquet_schema","text":"","code":"Data frame, the schema of the file. It has one row for each node (inner node or leaf node). For flat files this means one root node (inner node), always the first one, and then one row for each \"real\" column. 
For nested schemas, the rows are in depth-first search order. Most important columns are: - `file_name`: file name. - `name`: column name. - `r_type`: the R type that corresponds to the Parquet type. Might be `NA` if [read_parquet()] cannot read this column. See [nanoparquet-types] for the type mapping rules. - `type`: data type. One of the low level data types. - `type_length`: length for fixed length byte arrays. - `repetition_type`: character, one of `REQUIRED`, `OPTIONAL` or `REPEATED`. - `logical_type`: a list column, the logical types of the columns. An element has at least an entry called `type`, and potentially additional entries, e.g. `bit_width`, `is_signed`, etc. - `num_children`: number of child nodes. Should be a non-negative integer for the root node, and `NA` for a leaf node."},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_decode_int.html","id":null,"dir":"Reference","previous_headings":"","what":"RLE decode integers — rle_decode_int","title":"RLE decode integers — rle_decode_int","text":"RLE decode integers","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_decode_int.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"RLE decode integers — rle_decode_int","text":"","code":"rle_decode_int( x, bit_width = attr(x, \"bit_width\"), length = attr(x, \"length\") %||% NA )"},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_decode_int.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"RLE decode integers — rle_decode_int","text":"x Raw vector encoded integers. bit_width Bit width used encoding. length Length output. NA assume x starts length output, encoded 4 byte integer.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_decode_int.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"RLE decode integers — rle_decode_int","text":"decoded integer vector.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_encode_int.html","id":null,"dir":"Reference","previous_headings":"","what":"RLE encode integers — rle_encode_int","title":"RLE encode integers — rle_encode_int","text":"RLE encode integers","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_encode_int.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"RLE encode integers — rle_encode_int","text":"","code":"rle_encode_int(x)"},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_encode_int.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"RLE encode integers — rle_encode_int","text":"x Integer vector.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/rle_encode_int.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"RLE encode integers — rle_encode_int","text":"Raw vector, encoded integers.
two attributes: bit_width: number bits needed encode input, length: length original integer input.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":null,"dir":"Reference","previous_headings":"","what":"Write a data frame to a Parquet file — write_parquet","title":"Write a data frame to a Parquet file — write_parquet","text":"Writes contents R data frame Parquet file.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Write a data frame to a Parquet file — write_parquet","text":"","code":"write_parquet( x, file, schema = NULL, compression = c(\"snappy\", \"gzip\", \"zstd\", \"uncompressed\"), encoding = NULL, metadata = NULL, row_groups = NULL, options = parquet_options() )"},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Write a data frame to a Parquet file — write_parquet","text":"x Data frame write. file Path output file. string \":raw:\", data frame written memory buffer, memory buffer returned raw vector. schema Parquet schema. Specify schema tweak default nanoparquet R -> Parquet type mappings. Use parquet_schema() create schema can use, read_parquet_schema() use schema Parquet file. compression Compression algorithm use. Currently \"snappy\" (default), \"gzip\", \"zstd\", \"uncompressed\" supported. encoding Encoding use. Possible values: NULL, appropriate encoding selected automatically: RLE PLAIN BOOLEAN columns, RLE_DICTIONARY columns many repeated values, PLAIN otherwise. single (unnamed) character string, used columns. unnamed character vector encoding names length number columns data frame, encodings used column. named character vector, names must unique name must match column name, specify encoding column. special empty name (\"\") applies rest columns. empty name, rest columns use default encoding. NA_character_ specified column, default encoding used column. specified encoding invalid certain column type, nanoparquet not implement, write_parquet() throws error. version nanoparquet supports following encodings: PLAIN, GROUP_VAR_INT, PLAIN_DICTIONARY, RLE, BIT_PACKED, DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYTE_ARRAY, RLE_DICTIONARY, BYTE_STREAM_SPLIT. See parquet-encodings encodings. metadata Additional key-value metadata add file. must named character vector, data frame columns character columns called key value. row_groups Row groups Parquet file. NULL, num_rows_per_row_group option used options argument, see parquet_options(). Otherwise must integer vector, specifying starts row groups. options Nanoparquet options, see parquet_options().","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Write a data frame to a Parquet file — write_parquet","text":"NULL, unless file \":raw:\", case Parquet file returned raw vector.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Write a data frame to a Parquet file — write_parquet","text":"write_parquet() converts string columns UTF-8 encoding calling base::enc2utf8().
factor levels also converted.","code":""},{"path":[]},{"path":"https://nanoparquet.r-lib.org/dev/reference/write_parquet.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Write a data frame to a Parquet file — write_parquet","text":"","code":"if (FALSE) { # add row names as a column, because `write_parquet()` ignores them. mtcars2 <- cbind(name = rownames(mtcars), mtcars) write_parquet(mtcars2, \"mtcars.parquet\") }"},{"path":"https://nanoparquet.r-lib.org/dev/news/index.html","id":"nanoparquet-development-version","dir":"Changelog","previous_headings":"","what":"nanoparquet (development version)","title":"nanoparquet (development version)","text":"API changes: parquet_schema() now called read_parquet_schema(). new parquet_schema() function falls back read_parquet_schema() called single string argument, warning. parquet_info() now called read_parquet_info(). parquet_info() still works now, warning. parquet_metadata() now called read_parquet_metadata(). parquet_metadata() still works, warning. parquet_column_types() now deprecated, issues warning. Use read_parquet_schema() new infer_parquet_schema() function instead. improvements: new parquet_schema() function creates Parquet schema scratch. can use schema new schema argument write_parquet(), specify columns data frame mapped Parquet types. New append_parquet() function append data frame existing Parquet file. New col_select argument read_parquet() read subset columns Parquet file. write_parquet() can now write multiple row groups. default puts 10 million rows single row group. can choose row groups manually row_groups argument. write_parquet() now writes minimum maximum values per row group types. See ?parquet_options() turning off. also writes number non-missing values. Newly supported type conversions write_parquet() via schema argument: integer INT64, integer INT96, double INT96, double FLOAT, character BYTE_ARRAY, character FIXED_LEN_BYTE_ARRAY, character ENUM, factor ENUM. integer DECIMAL, INT32, integer DECIMAL, INT64, double DECIMAL, INT32, double DECIMAL, INT64, integer INT(8, *), INT(16, *), INT(32, signed), double INT(*, *), character UUID, double FLOAT16, list raw vectors BYTE_ARRAY, list raw vectors FIXED_LEN_BYTE_ARRAY. write_parquet() can now write version 2 data pages. default still version 1, might change future. write_parquet(file = \":raw:\") now works correctly larger data frames (#77). New compression_level option select compression level manually. See ?parquet_options details (#91). read_parquet() can now read R connection (#71). read_parquet() now reads DECIMAL values correctly INT32 INT64 columns scale not zero. read_parquet() now reads JSON columns character vectors, documented. read_parquet() now reads FLOAT16 logical type real (double) vector. class argument parquet_options() nanoparquet.class option now work (#104).","code":""},{"path":"https://nanoparquet.r-lib.org/dev/news/index.html","id":"nanoparquet-031","dir":"Changelog","previous_headings":"","what":"nanoparquet 0.3.1","title":"nanoparquet 0.3.1","text":"CRAN release: 2024-07-01 version fixes write_parquet() crash (#73).","code":""},{"path":"https://nanoparquet.r-lib.org/dev/news/index.html","id":"nanoparquet-030","dir":"Changelog","previous_headings":"","what":"nanoparquet 0.3.0","title":"nanoparquet 0.3.0","text":"CRAN release: 2024-06-17 read_parquet() type mapping changes: STRING logical type UTF8 converted type still read character vector, BYTE_ARRAY types without converted logical types, read list raw vectors.
Missing values indicated NULL values. DECIMAL converted type read REALSXP now, even type FIXED_LEN_BYTE_ARRAY (not just BYTE_ARRAY). UUID logical type now read character vector, formatted 00112233-4455-6677-8899-aabbccddeeff. BYTE_ARRAY FIXED_LEN_BYTE_ARRAY types without logical converted types; unsupported ones: FLOAT16, INTERVAL; now read list raw vectors. Missing values denoted NULL. write_parquet() now automatically uses dictionary encoding columns many repeated values. first 10k rows used decide dictionary used. Similarly, logical columns written RLE encoding contain runs repeated values. NA values ignored selecting encoding (#18). write_parquet() can now write data frame memory buffer, returned raw vector, special \":raw:\" filename used (#31). read_parquet() can now read Parquet files V2 data pages (#37). read_parquet() write_parquet() now support GZIP ZSTD compressed Parquet files. read_parquet() now supports RLE encoding BOOLEAN columns also supports DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYTE_ARRAY BYTE_STREAM_SPLIT encodings. parquet_columns() function now called parquet_column_types() can now map column types data frame Parquet types. parquet_info(), parquet_metadata() parquet_column_types() now work created_by metadata field unset. New parquet_options() function can use set nanoparquet options single read_parquet() write_parquet() call.","code":""},{"path":"https://nanoparquet.r-lib.org/dev/news/index.html","id":"nanoparquet-020","dir":"Changelog","previous_headings":"","what":"nanoparquet 0.2.0","title":"nanoparquet 0.2.0","text":"CRAN release: 2024-05-30 First release CRAN. contains Parquet reader https://github.com/hannes/miniparquet, Parquet writer, functions read Parquet metadata, many improvements.","code":""}]
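
The rle_encode_int()/rle_decode_int() entries indexed above describe a round trip through Parquet's RLE encoding for integers. A minimal sketch, assuming the helpers are reached via `:::` (as the read_parquet_pages() example does for internal functions) and that the `bit_width` and `length` attributes set by the encoder are left on the raw vector, so the decoder can pick them up as defaults:

```r
# RLE pays off on runs of repeated values
x <- c(rep(1L, 100L), rep(2L, 50L), 0:7)

enc <- nanoparquet:::rle_encode_int(x)
attr(enc, "bit_width")   # bits needed to encode the input values
attr(enc, "length")      # length of the original integer input

# the decoder's defaults read both attributes from `enc`
dec <- nanoparquet:::rle_decode_int(enc)
stopifnot(identical(dec, x))
```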
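The write_parquet() entry indexed above covers several options at once; this hedged sketch combines the documented ":raw:" target, a per-column encoding, ZSTD compression, and key-value metadata. The data frame and its column names are made up for illustration:

```r
library(nanoparquet)

# toy data; `name` has many repeated values, a natural dictionary candidate
df <- data.frame(id = 1:5, name = c("a", "b", "b", "b", "a"))

buf <- write_parquet(
  df,
  file = ":raw:",                          # return the file as a raw vector
  encoding = c(name = "RLE_DICTIONARY"),   # only `name`; others use defaults
  compression = "zstd",
  metadata = c(source = "illustration")    # named character vector
)
length(buf)   # size of the in-memory Parquet file, in bytes
```

Because no empty name ("") appears in `encoding`, the remaining columns fall back to the automatically selected encoding, per the argument documentation.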
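Since read_parquet_schema() is documented to work even on files read_parquet() cannot read, it pairs well with the page metadata for inspecting a file before reading it. A sketch using the example file that ships with the package, as in the read_parquet_pages() example:

```r
library(nanoparquet)

file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")

# one row per schema node; r_type is NA where read_parquet() has no mapping
sch <- read_parquet_schema(file_name)
sch[, c("name", "type", "repetition_type", "r_type")]

# page-level metadata; many small pages can make this slow
pages <- nanoparquet:::read_parquet_pages(file_name)
table(pages$page_type, pages$encoding)
```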
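Several development-version changelog items indexed above (the col_select argument, manual row_groups, append_parquet()) compose naturally. A sketch under the assumption that the dev APIs behave as described; the row_groups semantics (an integer vector of row-group start rows) follow the write_parquet() argument documentation:

```r
library(nanoparquet)

tmp <- tempfile(fileext = ".parquet")
mtcars2 <- cbind(name = rownames(mtcars), mtcars)  # row names are ignored

# two row groups: rows 1-16 and 17-32
write_parquet(mtcars2, tmp, row_groups = c(1L, 17L))

read_parquet(tmp, col_select = c("name", "mpg"))   # select by name
read_parquet(tmp, col_select = 1:2)                # or by index

append_parquet(mtcars2, tmp)                       # file now has 64 rows
```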