Replies: 3 comments 1 reply
-
Interesting, yes. I'd like to know how 'generic' we can get here, i.e.: for as large a number of (tabular) input data sets out in the wild as possible, what sort of input would we typically need from a user so we can reliably convert (qualify) those tables into edge, edge-attribute, and node-attribute data? The example here is very specific to a particular data set: the user would need to qualify what the unique node id is (solder_id), and which attributes connect nodes and in what way (unit_id & date range). The latter would also translate into edge attributes, I guess. Anyway, something to look for in our collected example datasets. It might turn out we will need one SQL query per dataset, but hopefully it'll be more like us finding 4 or 5 general scenarios that cover 80% of the data we can typically expect.
-
As an example: if a dataset has an obvious, unique node identifier as well as a start and end date, we could always create network data where the edges are overlaps in time between two nodes. Or, if we had longitude/latitude, we should be able to create network data where the edges are derived from how close two nodes were (with maybe a cut-off for when two nodes are not considered 'close' -- this could also go into the edge weight for such a graph). How much sense that makes, in most cases, is another question of course, and one we should also strive to get a handle on.
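To make the first scenario concrete, here is a minimal sketch of the overlap-in-time idea, assuming only a node id plus start/end dates (the record values and field layout are made up for illustration): every pair of nodes whose date ranges overlap gets an edge, with the overlap length as the edge weight.

```python
from itertools import combinations
from datetime import date

# Hypothetical records: (node_id, start, end) -- names and values are assumptions.
nodes = [
    ("a", date(2020, 1, 1), date(2020, 3, 1)),
    ("b", date(2020, 2, 1), date(2020, 4, 1)),
    ("c", date(2020, 5, 1), date(2020, 6, 1)),
]

def overlap_days(s1, e1, s2, e2):
    """Number of days two [start, end] ranges overlap (0 if disjoint)."""
    return max((min(e1, e2) - max(s1, s2)).days, 0)

# One edge per pair of nodes with overlapping ranges; overlap length as weight.
edges = [
    (u[0], v[0], overlap_days(u[1], u[2], v[1], v[2]))
    for u, v in combinations(nodes, 2)
    if overlap_days(u[1], u[2], v[1], v[2]) > 0
]
# edges == [("a", "b", 29)] -- only a and b overlap (Feb 1 to Mar 1, 2020)
```

The same pairwise pattern would work for the longitude/latitude case, with a distance function and cut-off in place of `overlap_days`.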
-
How do existing tools help with this? Does anyone have any examples?
-
Extracting edge data from one or several tables can be part of data wrangling or preprocessing before starting a graph creation workflow. Not all data comes in the perfectly parsed and structured edge-list format that is usually required to create a graph with a network analysis library (NetworkX, igraph, NetworKit, ...).
One of the Jupyter notebooks for the network analysis workflow already details edge extraction from a denormalized database output: Network Analysis with NetworkX. There, the extraction is done using groupby (to extract unique source-target pairs) and an aggregate function (to count the weights).
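In pandas terms, that groupby-and-aggregate step can be sketched roughly as follows (the column names and toy data are assumptions, not taken from the notebook):

```python
import pandas as pd

# Toy denormalized table: one row per observed source-target occurrence.
df = pd.DataFrame({
    "source": ["a", "a", "a", "b"],
    "target": ["b", "b", "c", "c"],
})

# Unique source-target pairs; the occurrence count becomes the edge weight.
edges = (
    df.groupby(["source", "target"])
      .size()
      .reset_index(name="weight")
)
# resulting rows: (a, b, 2), (a, c, 1), (b, c, 1)
```

The result is a standard weighted edge list that e.g. `networkx.from_pandas_edgelist` can consume directly.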
Sometimes an extraction operation is more complex, and in such cases it might be more useful to use SQL. (See also this discussion: #10)
The following case could serve as a template for a variety of extraction problems. In this example, a new network between the items in the first column (here: participants) is created based on a common observation in another column (a common event) and an overlapping time range (start_date : end_date).
Orig. table
New table
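The same template can be expressed as a self-join in pandas: join the table to itself on the common event, then keep only the pairs whose date ranges overlap. This is a sketch under assumed column names (`participant`, `event`, `start_date`, `end_date`) and toy data, not the actual table from the example.

```python
import pandas as pd

# Toy version of the original table (column names and values are assumptions).
obs = pd.DataFrame({
    "participant": ["p1", "p2", "p3"],
    "event":       ["e1", "e1", "e1"],
    "start_date":  pd.to_datetime(["2020-01-01", "2020-02-01", "2020-05-01"]),
    "end_date":    pd.to_datetime(["2020-03-01", "2020-04-01", "2020-06-01"]),
})

# Self-join on the common event; suffixes distinguish the two sides of each pair.
pairs = obs.merge(obs, on="event", suffixes=("_a", "_b"))

# Keep each unordered pair once (drops self-pairs and mirrored duplicates).
pairs = pairs[pairs["participant_a"] < pairs["participant_b"]]

# Two ranges overlap iff each starts before the other ends.
overlap = (pairs["start_date_a"] <= pairs["end_date_b"]) & \
          (pairs["start_date_b"] <= pairs["end_date_a"])
edges = pairs.loc[overlap, ["participant_a", "participant_b", "event"]]
# one edge: p1 -- p2 via e1 (p3's range does not overlap the others)
```

An SQL version would follow the same shape: a self-join on the event column with the two range conditions in the WHERE clause.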
Example cases that partly fit this schema can be found here:
https://stackoverflow.com/questions/64636430/projecting-dynamic-bi-partite-two-mode-network-where-only-edges-overlapping-in-t
https://stackoverflow.com/questions/53824502/how-can-i-select-the-employees-who-worked-the-longest-time-together-on-one-proje
https://stackoverflow.com/questions/7486144/efficient-projection-of-a-bipartite-graph-in-python-using-networkx
https://stackoverflow.com/questions/58396361/find-overlap-time-ranges
Python solution
SQL solution (Markus)