-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Morpher to remove pseudo-nodes #70
Comments
HI @meltzerdl, I think it's a really interesting problem (somehow linked to ropensci/stplanr#416). I hope I will be able to work a little bit on this problem in the next days and update this comment with something more useful. |
Just to get it clear: In a situation EDIT That is, the problem explained here. The given answers there seem to imply that |
Yes, so the idea would be that the two edges (between A and B, and between B and C) would be dissolved into one edge, and node B would be removed. Interesting, I hadn't seen that explanation about an igraph answer. |
Sorry this took me some time. From what I found here, in a graph with nodes A, B and C and edges A-B and B-C, the process of removing node B and replacing edges A-B and B-C by a single edge A-C is called smoothing. Does that sound familiar to anyone, or is there a better name? For now, in the develop branch I implemented a first version of this, in the form of a spatial morpher called Note that you do need the development version to run this. Install with library(sfnetworks)
# Create a very simple network with some connected linestrings in Roxel.
net = as_sfnetwork(roxel[c(4, 5, 8), ], directed = FALSE)
net
#> # An sfnetwork with 4 nodes and 3 edges
#> #
#> # CRS: EPSG:4326
#> #
#> # An unrooted tree with spatially explicit edges
#> #
#> # Node Data: 4 x 1 (active)
#> # Geometry type: POINT
#> # Dimension: XY
#> # Bounding box: xmin: 7.537614 ymin: 51.94468 xmax: 7.540063 ymax: 51.9475
#> geometry
#> <POINT [°]>
#> 1 (7.540063 51.94468)
#> 2 (7.53822 51.94546)
#> 3 (7.537673 51.9475)
#> 4 (7.537614 51.94562)
#> #
#> # Edge Data: 3 x 5
#> # Geometry type: LINESTRING
#> # Dimension: XY
#> # Bounding box: xmin: 7.537614 ymin: 51.94468 xmax: 7.540063 ymax: 51.9475
#> from to name type geometry
#> <int> <int> <fct> <fct> <LINESTRING [°]>
#> 1 1 2 <NA> path (7.540063 51.94468, 7.539696 51.94479, 7.539466…
#> 2 3 4 Welsingh… reside… (7.537673 51.9475, 7.537614 51.94562)
#> 3 2 4 <NA> path (7.53822 51.94546, 7.538131 51.94549, 7.538027 …
# The network has four nodes, of which two are 'pseudo'.
plot(net) # The 'to_spatial_smoothed' morpher function removes these pseudo nodes.
# A new edge is be created between the remaining two nodes.
# The geometry of that edge is equal to the merged orignal edges.
net_smoothed = tidygraph::convert(net, to_spatial_smoothed)
net_smoothed
#> # An sfnetwork with 2 nodes and 1 edges
#> #
#> # CRS: EPSG:4326
#> #
#> # An unrooted tree with spatially explicit edges
#> #
#> # Node Data: 2 x 2 (active)
#> # Geometry type: POINT
#> # Dimension: XY
#> # Bounding box: xmin: 7.537672 ymin: 51.94468 xmax: 7.540063 ymax: 51.9475
#> geometry .tidygraph_node_index
#> <POINT [°]> <int>
#> 1 (7.540063 51.94468) 1
#> 2 (7.537673 51.9475) 3
#> #
#> # Edge Data: 1 x 5
#> # Geometry type: LINESTRING
#> # Dimension: XY
#> # Bounding box: xmin: 7.537614 ymin: 51.94468 xmax: 7.540063 ymax: 51.9475
#> from to geometry .tidygraph_edge_… .orig_data
#> <int> <int> <LINESTRING [°]> <list> <list>
#> 1 1 2 (7.537673 51.9475, 7.537614 51.9456… <int [3]> <tibble [3…
plot(net_smoothed) # In line with tidygraphs morphers:
# The data of the original edges are stored in a '.orig_data' column.
# This behaviour can be turned off by setting 'store_orignal_data = FALSE'.
net_smoothed %>%
activate("edges") %>%
tidygraph::pull(".orig_data")
#> [[1]]
#> Simple feature collection with 3 features and 5 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 7.537614 ymin: 51.94468 xmax: 7.540063 ymax: 51.9475
#> CRS: EPSG:4326
#> # A tibble: 3 x 6
#> from to name type geometry .tidygraph_edge…
#> <int> <int> <fct> <fct> <LINESTRING [°]> <int>
#> 1 3 4 Welsin… reside… (7.537673 51.9475, 7.537614 51.9… 2
#> 2 1 2 <NA> path (7.540063 51.94468, 7.539696 51.… 1
#> 3 2 4 <NA> path (7.53822 51.94546, 7.538131 51.9… 3
# An additional parameter that can be set is 'require_equal_attrs'.
# This will only merge edges when their attributes are equal (default = FALSE).
# In our example network they were not.
# So smoothing does not remove any nodes.
tidygraph::convert(net, to_spatial_smoothed, require_equal_attrs = TRUE) %>%
plot() Created on 2020-09-27 by the reprex package (v0.3.0) |
Just tried this and I confirm, it works. It looks like a solid implementation to me. I tried the following: remotes::install_github("luukvdmeer/sfnetworks", ref = "develop")
library(sfnetworks)
# Create a very simple network with some connected linestrings in Roxel.
net = as_sfnetwork(roxel[c(4, 5, 8), ], directed = FALSE)
net
plot(net)
net_smoothed = tidygraph::convert(net, to_spatial_smoothed)
net_smoothed
plot(net_smoothed)
net_smoothed %>%
activate("edges") %>%
tidygraph::pull(".orig_data")
tidygraph::convert(net, to_spatial_smoothed) %>%
plot() Confirmed, without the Is that the same thing? I guess 'removal of pseudo-nodes' is a particular type of contraction. @fiftysevendegreesofrad, developer of the sDNA package for spatial network analysis, has written some excellent documentation on the wider topic of network preparation: https://sdna.cardiff.ac.uk/sdna/wp-content/downloads/documentation/manual/sDNA_manual_v3_4_5/network_preparation.html. Would be interested in your thoughts on the terminology Crispin, would you call this pseudo-node removal, smoothing, contraction or something else? Also heads-up as I suspect it may be the kind of thing that you or others have already implemented in a lower level and potentially faster-iterating language such as C++. |
Presume we're talking about removing nodes with only 2 links joining. The accepted terminology for this seems to be removing pseudo-nodes these days. Though in sDNA I call it fixing split links (and yes there is a c++ implementation in https://github.com/fiftysevendegreesofrad/sdna_open though it leans heavily on sDNA's internal network representation so may not be that useful to you). Incidentally one thing to consider in such operations is what happens to data attached to the links - you can choose to handle this externally of course using GIS join operations for example, or assume numerical data is summed or averaged (if the latter, weighted or not by unit length), and then there's the question of what happens to text... |
Updated version of this. Now named suppressPackageStartupMessages({
library(sfnetworks)
library(tidygraph)
library(igraph)
})
net = as_sfnetwork(roxel)
paste(vcount(net), "nodes and", ecount(net), "edges")
#> [1] "701 nodes and 851 edges"
# Without requiring equal attributes.
smoothed = convert(net, to_spatial_smooth)
paste(vcount(smoothed), "nodes and", ecount(smoothed), "edges")
#> [1] "652 nodes and 802 edges"
plot(net, cex = 0.8, col = "red")
plot(smoothed, add = TRUE) # With requiring equal attributes.
smoothed = convert(net, to_spatial_smooth, require_equal_attrs = TRUE)
paste(vcount(smoothed), "nodes and", ecount(smoothed), "edges")
#> [1] "675 nodes and 825 edges"
plot(net, cex = 0.8, col = "red")
plot(smoothed, add = TRUE) Created on 2020-11-04 by the reprex package (v0.3.0) |
Hi @luukvdmeer! Would you consider adding a progress bar to this new morpher? I think it could be created using txtProgressBar(), setting If you like the idea I can add a PR during the weekend. |
Yes, if that works would be great! I was already scared for performance on large networks.. Could you try it also with the same network but after removing edge geometries (i.e. spatially implicit edges)? Then we can see if the geometry splitting is the part that really slows things down (and might consider to use your implementation of st_split if that makes a significant difference..?). EDIT: Sorry, I was confused. There are no geometries splitted in this function. Geometries are concatenated. |
Hi @luukvdmeer, I was thinking about this problem and I have one (absolutely not urgent) question. Here you say that:
I don't understand why you need to iterate over all nodes instead of applying something like:
I did not try to implement the algorithm described above (so maybe there are some obvious drawbacks, I'm not sure), but I don't understand why you need to check the nodes one by one. I just want to discuss these ideas with you before spending several days on something that you have already explored. EDIT: Actually I just checked the code more carefully and that was a stupid question, sorry. If possible I will try to improve the step 2 of the algorithm (i.e. merging the edges). |
I think this works faster in osmnx. here is the function they are using (I didn't delve into it), maybe it can be used as inspiration: |
I reasoned this would not work good in situation where there are chains of pseudo nodes. In a situation where we have A -- b -- C -- d -- E, with upper-case letters being 'junction' nodes and lower-case letters being pseudo nodes, it is indeed easy to just find and remove the pseudo nodes b and d separately and then create the new edges A -> C after removing b and C -> E after removing d. But when C is also a pseudo node, i.e. we have A -- b -- c -- d -- E, we have to identify that b, c and d form a chain of pseudo nodes and only add one new edge A -> E after removing them. Currently this is implemented in a way that we first process b and replace A -> b and b -> c by a new edge A -> c, then process c and replace A -> c and c -> d by a new edge A -> d, and finally process d and replace A -> d and d -> E by a new edge A -> E. But this iterative processing is not so fast.. I totally agree there must be more efficient ways to do this than the current 'dirty' implementation. Maybe do the looping only to find the indices of pseudo nodes and group them in chains, and then do the concatenation of geometries in one go afterwards. Something like that. But I did not give it enough thought yet. Getting inspired by osmnx sounds like a good idea! |
This is now part of the new version as the It does exactly what is has to do, but I'll leave this issue open because we definitely need to improve its speed. |
btw, practically, I'm using python's osmnx to download the data I need and simplify pseudo nodes (it works really fast, didn't compare) and sfnetworks to analyze it, using reticulate in the middle. Maybe this should be in discussions, but a function to convert an osmnx network to sfnetworks may be required. Will try to provide script |
@idshklein have you tried downloading networks with |
Ping @agila5 I managed to get this +/- 20 times faster. Can be tested! |
Hi @luukvdmeer! Thank you very much for the update, I will test the new version sometimes during the next days. Meanwhile, happy new year 🎉 🥳 |
Some testing on the data I plan to use for my coming talk in Spanish on sfnetworks: library(sfnetworks)
library(tidygraph)
library(igraph)
load('E:/Personal_projects/Rspatialnetworks/sfnetworks-REspacialES/data/cuenca.rda')
(net = as_sfnetwork(cuenca))
#> # A sfnetwork with 12283 nodes and 8999 edges
#> #
#> # CRS: EPSG:4326
#> #
#> # A directed multigraph with 3646 components with spatially explicit edges
#> #
#> # Node Data: 12,283 x 1 (active)
#> # Geometry type: POINT
#> # Dimension: XY
#> # Bounding box: xmin: -79.0708 ymin: -2.935008 xmax: -78.91145 ymax: -2.833441
#> geometry
#> <POINT [°]>
#> 1 (-79.028 -2.906154)
#> 2 (-79.02803 -2.90633)
#> 3 (-78.95973 -2.893506)
#> 4 (-78.96173 -2.893512)
#> 5 (-79.00801 -2.891764)
#> 6 (-79.00924 -2.89848)
#> # ... with 12,277 more rows
#> #
#> # Edge Data: 8,999 x 9
#> # Geometry type: LINESTRING
#> # Dimension: XY
#> # Bounding box: xmin: -79.08642 ymin: -2.960194 xmax: -78.90349 ymax:
#> # -2.779669
#> from to name type lanes oneway sidewalk surface
#> <int> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 2 <NA> resi~ <NA> no <NA> <NA>
#> 2 3 4 <NA> trun~ 1 yes right asphalt
#> 3 5 6 Juan~ resi~ 2 yes both sett
#> # ... with 8,996 more rows, and 1 more variable: geometry <LINESTRING [°]>
# Number of pseudo-nodes (note that roxel has 49)
pseudo = degree(net, mode = "in") == 1 & degree(net, mode = "out") == 1
length(pseudo[pseudo])
#> [1] 1541
microbenchmark::microbenchmark(convert(net, to_spatial_smooth), times = 10)
#> Unit: seconds
#> expr min lq mean median uq
#> convert(net, to_spatial_smooth) 3.132143 3.406496 3.499677 3.488304 3.659073
#> max neval
#> 3.711856 10 |
In the most recent release some improvements have been made again. There is still a bug left, but this is documented in a separate issue, see #112 Of course, new issues may still arise after the function gets used more often, but I think the current implementation is enough to close this one. |
Great functions in this package! I'm trying to find a way to remove pseudonodes. I want to take a road dataset and create two data outputs based on it: road segments and intersections. However, it's important that pseudonodes are removed in creating both outputs. In my case, I am defining pseudonodes as anywhere only two line segments meet at a node (regardless of whether attributes are the same or different for the two segments). So, I would need to take the roads network, merge segments with pseudonodes, and then produce a set of points at all road intersections (i.e. where 3 or more road segments meet).
I've used stplanr for this in the following way, with success:
Roads <- st_cast(roads_input_data, "MULTILINESTRING") %>%
st_combine() %>%
as_Spatial() %>%
gsection() #from stplanr
Roads_output <- st_as_sf(Roads)
Nodes_output <- SpatialLinesNetwork(Roads) %>% #from stplanr
sln2points() %>% #from stplanr
st_as_sf()
However, for large statewide roads datasets this process is so slow as to be impractical and I'm looking for a faster way. If I could run it in parallel, cutting up the dataset by county (or smaller) and then putting the final dataset together, that could speed it up. however, in this case I would still need a way to remove pseudonodes produced by process of cutting up the roads data.
So, ideally it would be nice to have a function to remove pseudonodes; i.e. any node where there are only two line segments meeting, regardless of attributes. Could have an option to select certain attributes must match, as some folks might find that important.
Thank you!
The text was updated successfully, but these errors were encountered: