Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data.table + sf integration with ggplot2 regression #6707

Open
MichaelChirico opened this issue Jan 4, 2025 · 3 comments
Open

data.table + sf integration with ggplot2 regression #6707

MichaelChirico opened this issue Jan 4, 2025 · 3 comments

Comments

@MichaelChirico
Copy link
Member

          A related issue AFAICT is the `data.table` + `sf` integration with `ggplot2`. I've just discovered this as I was trying to replicate an old [blog post](https://grantmcdermott.com/fast-geospatial-datatable-geos/) of mine and was surprised to find that the following example no longer works:
library(sf)
#> Linking to GEOS 3.13.0, GDAL 3.10.0, PROJ 9.5.1; sf_use_s2() is TRUE
library(data.table)
library(ggplot2)
theme_set(theme_minimal())

nc = st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/usr/lib/R/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27
nc_dt = as.data.table(nc)

nc_dt[
  ,
  .(geometry = st_union(geometry)), 
  by = .(region = ifelse(CNTY_ID<=1980, 'high', 'low'))
]  |> 
  ggplot(aes(geometry=geometry, fill=region)) + 
  geom_sf()

As you can see, the plot extent has been cut off at the bounding box of the first row in the data.table (corresponding to the "high" region).

I can fix the problem by explicitly passing the original bounding box extent to a coord_sf() layer, but this is obviously more of a pain.

nc_dt[
  ,
  .(geometry = st_union(geometry)), 
  by = .(region = ifelse(CNTY_ID<=1980, 'high', 'low'))
]  |> 
  ggplot(aes(geometry=geometry, fill=region)) + 
  geom_sf() +
  coord_sf(xlim = st_bbox(nc)[c(1,3)], ylim = st_bbox(nc)[c(2,4)])

Created on 2025-01-03 with reprex v2.1.1

Originally posted by @grantmcdermott in #5352 (comment)

@xarrama
Copy link

xarrama commented Jan 7, 2025

+1
This issue is stoping me from using data.table + sf. I didn't know about the coord_sf trick. Thank you!

@ontam
Copy link

ontam commented Jan 9, 2025

I usually use it this way, and this step it’s also necessary if further geometry operations are performed after the union.

nc_dt[ , .(geometry = st_union(geometry)), by = .(region = ifelse(CNTY_ID<=1980, 'high', 'low')) ] |> sf::st_sf() |> ggplot(aes(geometry=geometry, fill=region)) + geom_sf()

If data.table worked with sf without issues, I wouldn’t occasionally find myself thinking about Python.

@kadyb
Copy link

kadyb commented Jan 22, 2025

I think one more workaround is to recalculate the bounding box of the geometry using:

nc_dt$geometry = sf::st_sfc(nc_dt$geometry, recompute_bbox = TRUE)

and then plot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants