sf_ - to_list #70

dcooley · 2020-03-23T07:02:17Z

dcooley · 2020-03-27T23:57:59Z

@mdsumner I've added a new argument to the sf_ functions called list_columns, which you use to specify any columns of the df where you want to keep all the values

df <- data.frame(
  id = c(1,1,2,2)
  , x = 1:4
  , y = 1:4
  , val = letters[1:4]
  , stringsAsFactors = FALSE
)

sf_linestring(
  obj = df
  , x = "x"
  , y = "y"
  , linestring_id = "id"
  , keep = TRUE
  , list_columns = "val"
)

#   id  val   geometry
# 1  1 a, b 1, 2, 1, 2
# 2  2 c, d 3, 4, 3, 4

vs the current behaviour

sf_linestring(
  obj = df
  , x = "x"
  , y = "y"
  , linestring_id = "id"
  , keep = TRUE
)

## here in 'val', only the first row of the `df$val` is kept
#   id val   geometry
# 1  1   a 1, 2, 1, 2
# 2  2   c 3, 4, 3, 4

Do you foresee any issues if I add this arg to your sf-helper functions?

mdsumner · 2020-03-28T00:53:03Z

That's interesting. I don't think it can be a problem, but I'd need to try it with a few things.

It's strange one, effectively you are allow vertex attributes beyond x, y, z, m - by storing them one-to-one "elsewhere" (which is what every spatial system does when butting up against the limitations of SF).

This is very complementary to PATH in silicate, it tries to keep these vertex attributes and in theory allows for any set to be stored in the $vertex table - but it's only really functional for track data (with date-time, depth, temperature, whatever was measured) and not mature. The issue is $vertex is normalized (deduplicated) in some geometry xy, or xyz, or xyt etc and so other attributes (all in a higher dimensionality along with x,y that has no inherent duplication) end up on the $path_link_vertex table. It's powerful I think but not developed.

(Heh, sorry, but this might help bridge that model).

I'll try to have a look soonish. Thanks for the heads up!

mdsumner · 2020-03-28T01:03:58Z

Oh, do you have a real-world example data set that's relevant? Pretty much any tracking data is, and (not a great example but relevant) the walrus818 in trip:

d <- tibble::as_tibble(trip::walrus818)

# A tibble: 10,558 x 6
   Deployment DataDT                Wet Forage X_AED170_70 Y_AED170_70
        <int> <dttm>              <int>  <int>       <dbl>       <dbl>
 1        353 2009-09-15 04:00:00     1      0      281017       22532
 2        353 2009-09-15 05:00:00     0      0      281399       22392
...

sf_linestring(d, x = "X_AED170_70", y = "Y_AED170_70", z = "DataDT", linestring_id = "Deployment")

Here list_columns would be c("Wet", "Forage").

Q: does keep remain? So we don't have to specify the columns, "just do it"?

mdsumner · 2020-03-28T01:05:23Z

crs for Walrus818 fwiw:

crsmeta::crs_proj(trip::walrus818)
[1] "+proj=aeqd +ellps=WGS84 +lon_0=-170 +lat_0=70"

mdsumner · 2020-03-28T02:21:46Z

Q: does keep remain? So we don't have to specify the columns, "just do it"?

Oh I see, list-columns are separate, and compatible with keep - very good. ( at first I though you only added one column with 1 or more vectors in each list)

dcooley · 2020-03-28T07:50:58Z

Oh, do you have a real-world example data set that's relevant? Pretty much any tracking data

No, nothing specific, but my use-case is the multi-coloured path and polygon vertices, and multi-coloured trips (e.g. to represent loads on public transport)

effectively you are allow vertex attributes beyond x, y, z, m

yep, exactly.

mdsumner · 2020-03-28T09:55:07Z

All good! Had a bit of refresh on this concept and what I've tried before, this will be a sweet mechanism, with lots of implications across the tools I'm working on.

mdsumner · 2020-03-28T10:03:31Z

Oh, so one thing to consider - should the "extra" attributes be bundled into a single list column, where each element is a dataframe? Then it's only a do.call(rbind, df$listcol) away and no name handling to do: dplyr::bind_rows(df$listcol, .id = "feature_id") also does the same and makes id-ing the feature easier.

I'm not of a strong opinion, possibly I'd prefer data frames in each element (even if only one column) than a raw list.

dcooley · 2020-03-28T21:55:57Z

should the "extra" attributes be bundled into a single list column, where each element is a dataframe?

Do you have an example of this, I can't quite visualise what you mean?

mdsumner · 2020-03-28T22:49:59Z

Took a bit more than I thought, but this is what I mean:

 library(sfheaders)
 d0 <- data.frame(x = 1:10, y = 1:10, linestring_id = 1)
 
 d <- rbind(d0, 2 * d0)
 d$temp <- sort(c(runif(nrow(d)/2),
                  runif(nrow(d)/2, 2, 5)))
 d$fac <- sample(5:15, nrow(d), replace = TRUE)
 d$time <- Sys.time() + seq_len(nrow(d)) * 2
 
 datcols <- c("temp", "fac", "time")
 ## current scheme for list_columns = c("temp", "fac", "time")
 x <- sf_linestring(d, x = "x", y = "y", linestring_id = "linestring_id", 
                    keep = TRUE,
                    list_columns = datcols)
 ## each are separate lists
 x$temp
#> [[1]]
#>  [1] 0.1012976 0.1808249 0.3301957 0.3396338 0.3452928 0.4666439 0.5471586
#>  [8] 0.5992981 0.7777070 0.7850789
#> 
#> [[2]]
#>  [1] 2.435679 2.778072 2.801986 2.979986 3.047628 3.705985 3.831921 4.083824
#>  [9] 4.682265 4.910941
 x$fac 
#> [[1]]
#>  [1]  6  8  9 10 15 15 15  9  9  9
#> 
#> [[2]]
#>  [1]  7  7 15  6 10  8 13  9 15  9
 x$time
#> [[1]]
#>  [1] "2020-03-29 09:49:23 AEDT" "2020-03-29 09:49:25 AEDT"
#>  [3] "2020-03-29 09:49:27 AEDT" "2020-03-29 09:49:29 AEDT"
#>  [5] "2020-03-29 09:49:31 AEDT" "2020-03-29 09:49:33 AEDT"
#>  [7] "2020-03-29 09:49:35 AEDT" "2020-03-29 09:49:37 AEDT"
#>  [9] "2020-03-29 09:49:39 AEDT" "2020-03-29 09:49:41 AEDT"
#> 
#> [[2]]
#>  [1] "2020-03-29 09:49:43 AEDT" "2020-03-29 09:49:45 AEDT"
#>  [3] "2020-03-29 09:49:47 AEDT" "2020-03-29 09:49:49 AEDT"
#>  [5] "2020-03-29 09:49:51 AEDT" "2020-03-29 09:49:53 AEDT"
#>  [7] "2020-03-29 09:49:55 AEDT" "2020-03-29 09:49:57 AEDT"
#>  [9] "2020-03-29 09:49:59 AEDT" "2020-03-29 09:50:01 AEDT"



 ## but 1) could be a nested data frame
 x2 <- x
 x2$data <- split(d[datcols], d$linestring_id)[unique(d$linestring_id)]

 ## (or 2) a list, closer to current scheme)
 #x2$data <- lapply(split(d[datcols], d$linestring_id)[unique(d$linestring_id)], as.list)
 x2[datcols] <- NULL

## each separate column is bundled in the one list-column
 
x2$data
#> $`1`
#>         temp fac                time
#> 1  0.1012976   6 2020-03-29 09:49:23
#> 2  0.1808249   8 2020-03-29 09:49:25
#> 3  0.3301957   9 2020-03-29 09:49:27
#> 4  0.3396338  10 2020-03-29 09:49:29
#> 5  0.3452928  15 2020-03-29 09:49:31
#> 6  0.4666439  15 2020-03-29 09:49:33
#> 7  0.5471586  15 2020-03-29 09:49:35
#> 8  0.5992981   9 2020-03-29 09:49:37
#> 9  0.7777070   9 2020-03-29 09:49:39
#> 10 0.7850789   9 2020-03-29 09:49:41
#> 
#> $`2`
#>        temp fac                time
#> 11 2.435679   7 2020-03-29 09:49:43
#> 12 2.778072   7 2020-03-29 09:49:45
#> 13 2.801986  15 2020-03-29 09:49:47
#> 14 2.979986   6 2020-03-29 09:49:49
#> 15 3.047628  10 2020-03-29 09:49:51
#> 16 3.705985   8 2020-03-29 09:49:53
#> 17 3.831921  13 2020-03-29 09:49:55
#> 18 4.083824   9 2020-03-29 09:49:57
#> 19 4.682265  15 2020-03-29 09:49:59
#> 20 4.910941   9 2020-03-29 09:50:01

^{Created on 2020-03-29 by the reprex package (v0.3.0)}

mdsumner · 2020-03-28T22:53:16Z

fwiw, it's what nest() does

library(tidyverse)
 d %>% select(-x, -y) %>% group_by(linestring_id) %>% nest(data = datcols)
# A tibble: 2 x 2
# Groups:   linestring_id [2]
  linestring_id data             
          <dbl> <list>           
1             1 <tibble [10 × 3]>
2             2 <tibble [10 × 3]>

And it's what unjoin() does, but by separating and de-duplicating the coords if posssible (a core of silicate):

unjoin::unjoin(d, x, y, linestring_id)

SymbolixAU · 2020-03-29T20:45:01Z

Ok I see what you're suggesting.

However, my argument for keeping them separate is so you can run colourvalues over each individual list.

olourvalues::colour_values(x$temp)
[[1]]
 [1] "#440154FF" "#45065AFF" "#460A5DFF" "#471265FF" "#48176AFF" "#482273FF" "#472D7BFF" "#46337FFF" "#463480FF"
[10] "#433D85FF"

[[2]]
 [1] "#26818EFF" "#25858EFF" "#20928CFF" "#1F988BFF" "#1E9D89FF" "#57C766FF" "#B2DD2DFF" "#B3DD2DFF" "#C8E020FF"
[10] "#FDE725FF"

Which means it can go straight into mapdeck ;)

mdsumner · 2020-03-29T21:00:43Z

Ah ok, no worries. Appreciate the discussion has been clarifying :)

SymbolixAU · 2020-03-29T21:32:31Z

And thinking about it more, I see things like 'nesting' as very much in the realm of other packages (dplyr, data.table), and I think it would be a fair bit of effort to get it "right", when they already do it.

library(data.table)

dt <- as.data.table( d )
dt[
  , .(
    geometry = sfheaders::sfc_linestring(obj = dt, x = "x", y = "y")
    , data = list(.SD)
  )
  , .SDcols = setdiff(names(dt), c("x","y"))
  , by = .(linestring_id)
]
#    linestring_id                           geometry         data
# 1:             1 LINESTRING (1 1, 2 2, 3 3, ...,... <data.table>
# 2:             2 LINESTRING (1 1, 2 2, 3 3, ...,... <data.table>

Out of interest, what would be the dplyr equivalent of this data.table code?

I don't really speak dplyr, so I've got as far as

library(tidyverse)
d %>% 
  group_by(linestring_id) %>% 
  summarise(
    geometry = sfheaders::sfc_linestring(., "x", "y")
    )

SymbolixAU · 2020-03-29T21:36:42Z

But thinking some more... maybe it wouldn't be so hard...

We could have a list_data argument, as well as list_columns, which will combine the lists into your suggested data ...

SymbolixAU · 2020-03-29T21:44:30Z

yeah ok, let's do it

mdsumner · 2020-03-29T22:16:12Z

haha can't keep up, this does it (but is a bit complicated atm with upcoming dplyr/tibble changes)

  library(sfheaders)
d0 <- data.frame(x = 1:10, y = 1:10, linestring_id = 1)

d <- rbind(d0, 2 * d0)
d$temp <- sort(c(runif(nrow(d)/2),
                 runif(nrow(d)/2, 2, 5)))
d$fac <- sample(5:15, nrow(d), replace = TRUE)
d$time <- Sys.time() + seq_len(nrow(d)) * 2

datcols <- c("temp", "fac", "time")
## current scheme for list_columns = c("temp", "fac", "time")
x <- sf_linestring(d, x = "x", y = "y", linestring_id = "linestring_id", 
                   keep = TRUE,
                   list_columns = datcols)


library(tidyverse)
d %>% 
  group_by(linestring_id) %>% 
  summarise(
    geometry = sfheaders::sfc_linestring(., "x", "y"), 
    data  = list(select(ungroup(.), datcols))
  )
#> # A tibble: 2 x 3
#>   linestring_id geometry data             
#>           <dbl> <list>   <list>           
#> 1             1 <XY>     <tibble [20 × 3]>
#> 2             2 <XY>     <tibble [20 × 3]>

^{Created on 2020-03-30 by the reprex package (v0.3.0)}

SymbolixAU pushed a commit that referenced this issue Mar 24, 2020

fill_list code for #70

fd2215a

dcooley added a commit that referenced this issue Mar 24, 2020

some design for #70, could get messy though

4dc6028

SymbolixAU pushed a commit that referenced this issue Mar 27, 2020

implemented, but messy #70

fed2ed8

dcooley added a commit that referenced this issue Mar 27, 2020

working example for #70, now need tests and coverage

9f8256c

dcooley added a commit that referenced this issue Mar 27, 2020

inching closer #70

1db853e

dcooley added a commit that referenced this issue Mar 27, 2020

tests for #70

da7cc8d

SymbolixAU mentioned this issue Mar 29, 2020

nested data #72

Open

dcooley closed this as completed in fb4c40e Apr 1, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sf_ - to_list #70

sf_ - to_list #70

dcooley commented Mar 23, 2020 •

edited

Loading

dcooley commented Mar 27, 2020

mdsumner commented Mar 28, 2020 •

edited

Loading

mdsumner commented Mar 28, 2020

mdsumner commented Mar 28, 2020

mdsumner commented Mar 28, 2020

dcooley commented Mar 28, 2020

mdsumner commented Mar 28, 2020

mdsumner commented Mar 28, 2020

dcooley commented Mar 28, 2020

mdsumner commented Mar 28, 2020 •

edited

Loading

mdsumner commented Mar 28, 2020 •

edited

Loading

SymbolixAU commented Mar 29, 2020 •

edited

Loading

mdsumner commented Mar 29, 2020

SymbolixAU commented Mar 29, 2020

SymbolixAU commented Mar 29, 2020

SymbolixAU commented Mar 29, 2020

mdsumner commented Mar 29, 2020

sf_ - to_list #70

sf_ - to_list #70

Comments

dcooley commented Mar 23, 2020 • edited Loading

dcooley commented Mar 27, 2020

mdsumner commented Mar 28, 2020 • edited Loading

mdsumner commented Mar 28, 2020

mdsumner commented Mar 28, 2020

mdsumner commented Mar 28, 2020

dcooley commented Mar 28, 2020

mdsumner commented Mar 28, 2020

mdsumner commented Mar 28, 2020

dcooley commented Mar 28, 2020

mdsumner commented Mar 28, 2020 • edited Loading

mdsumner commented Mar 28, 2020 • edited Loading

SymbolixAU commented Mar 29, 2020 • edited Loading

mdsumner commented Mar 29, 2020

SymbolixAU commented Mar 29, 2020

SymbolixAU commented Mar 29, 2020

SymbolixAU commented Mar 29, 2020

mdsumner commented Mar 29, 2020

dcooley commented Mar 23, 2020 •

edited

Loading

mdsumner commented Mar 28, 2020 •

edited

Loading

mdsumner commented Mar 28, 2020 •

edited

Loading

mdsumner commented Mar 28, 2020 •

edited

Loading

SymbolixAU commented Mar 29, 2020 •

edited

Loading