forked from rich-iannone/DiagrammeR-docs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path02-graph_creation.Rmd
executable file
·441 lines (331 loc) · 21.5 KB
/
02-graph_creation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
# Graph Creation {#creation}
Creating a graph object is undoubtedly important. I dare say it is one of the fundamental aspects of the **DiagrammeR** world. With the graph object produced, so many other things are possible. For instance, you can inspect certain aspects of the graph, modify the graph in many ways that suit your workflow, view the graph (or part of the graph!) in the **RStudio** Viewer, or perform graph traversals and thus create complex graph queries using **magrittr** (`%>%`) or **pipeR** (`%>>%`) pipelines. The possibilities are really very exciting and it all begins with creating those graph objects.
## Getting Started
Before we dive into making a graph with DiagrammeR we'd want to be sure some things are in order. First, we need to make sure we have the package installed in **R**. While the package is available on **CRAN**, it's recommended that the development version of **DiagrammeR** is used. It's available on GitHub and to install packages from there, we need to use the **devtools** package. (If not installed use `install.packages("devtools")`.) After that, get the development build of **DiagrammeR** using `devtools::install_github("rich-iannone/DiagrammeR")`. Finally, load the package with:
```{r load_packages, results=FALSE}
library(DiagrammeR)
```
A few notes are useful here, the package automatically enables the **magrittr** pipe operator (`%>%`) which allows for chaining between statements in a left-to-right, easy-to-read manner. The **pipeR** package's `%>>%` (which works very similarly) is not loaded by **DiagrammeR** but it can be used. In the examples throughout, we'll stick with **magrittr**'s pipe.
## Simple Creation and Visualization
Let's get to making a graph. The `create_graph()` function creates a graph object and, optionally, allows for intialization of nodes, edges, and a few global attributes for the graph. We can create an empty graph by simply using `create_graph()` as is.
```{r create_empty_graph}
graph_e <- create_graph()
```
This really is an empty graph (no nodes) and we can quickly verify this by using the `node_count()` function.
```{r node_count_empty_graph}
node_count(graph_e)
```
You can add individual nodes to a graph by using the `add_node()` function. Let's add two nodes in the most minimal fashion:
```{r add_2_nodes_to_graph}
graph_1n <- add_node(graph = graph_e)
graph_2n <- add_node(graph = graph_1n)
```
This creates 2 nodes with ID values `1` and `2` (ID values are set for you as auto-incrementing integers). There are a few functions that can be used to check that these additions did occur. We can get a count of nodes as before with the `node_count()` function:
```{r get_node_count_3_graphs}
# Count the number of nodes in each graph produced
c(node_count(graph = graph_e),
node_count(graph = graph_1n),
node_count(graph = graph_2n))
```
Alternatively, we can use the `get_node_ids()` function to return a vector of node ID values:
```{r get_node_ids_1_graph}
# Get the node ID values for `graph_2n`; note that
# the data argument is `x` and not `graph`
get_node_ids(x = graph_2n)
```
Finally, we can visualize the graph by using the `render_graph()` function. The nodes typically appear very large when there are few of them but scale down in size if there are many. The values shown inside the nodes are the automatically-assigned node ID values. If we were to assign `label` values to each of the nodes - which could be done when using `add_node(label = "label_name")`, for example - those values would be shown instead.
```{r render_graph_2n}
render_graph(graph = graph_2n)
```
Adding nodes does not automatically add edges. We can check that there are no edges in the graph by using the `edge_count()` function. A graph with no edges will return `0`.
```{r edge_count_empty_graph}
edge_count(graph = graph_2n)
```
The vast majority of the functions in **DiagrammeR** have the `graph` argument (or `x` as we've seen) as their first argument. This is useful in the context of using the **magrittr** pipe operator, `%>%` as we can start with a graph object, perform a transformation on that graph, expect a modified graph, and use that as input for the next transformation function, all while joining such operations with the pipe. With this piping approach, we can create the same graph as before (2 nodes, no edges) by chaining with `%>%`:
```{r create_graph_add_nodes_pipes}
# Create the equivalent `graph_2n` object
# but use `%>%` to avoid intermediate objects
graph_2n_piped <-
create_graph() %>%
add_node() %>%
add_node()
# Verify that the graph contains 2 nodes
node_count(graph_2n_piped)
```
The benefits are pretty obvious over the *nested* approach that avoids creating intermediate graph objects.
```{r create_graph_2n_nested_funs}
# Create the equivalent `graph_2n` object
# using nested function calls
graph_2n_nested_fcns <-
add_node(
add_node(
create_graph()))
# Verify that the graph contains 2 nodes
node_count(graph_2n_nested_fcns)
```
With the `%>%`, the sequence of operations is more easily readable (left to right), plus, only one name to create! The less names the better. Moreover, and not shown above, the situtation for the nested approach gets more confusing as we add arguments and their values (of which, there can be many).
So, we have made a graph with 2 nodes. The next reasonable thing to do would be adding an edge between the nodes. By default, new graphs produced with `create_graph()` are *directed graphs*. To review, a directed graph is one where any edge between a pair of nodes has a defined direction (e.g., by the definition `1->2`, we mean to say that edge is directed from node `1` to node `2`). This book mainly deals with directed graphs but we may occasionally delve into *undirected graphs* (where edges have no specified direction between nodes). At any rate, using `create_graph(directed = FALSE)` will create an empty graph designated as undirected (i.e., any edges added will be undirected).
Let's use a pipeline with `%>%` to create a graph with 2 nodes and the edge with definition `1->2`. This requires three different functions (`create_graph()`, `add_node()`, and `add_edge()`):
```{r create_graph_2n_1e}
# Create a graph with 2 nodes and 1 edge
graph_2n_1e <-
create_graph() %>%
add_node() %>%
add_node() %>%
add_edge(from = 1, to = 2)
# Describe the graph
paste("This graph has:",
node_count(graph_2n_1e), "nodes,",
edge_count(graph_2n_1e), "edges")
```
We can be sure we created the correct edge definition (`1->2`) by using the `get_edges()` function:
```{r get_edges_graph_2n_1e}
# The graph has 1 edge, what is its
# edge definition?
get_edges(graph_2n_1e)
```
We now have this simple and small graph. We can view it again by calling the `render_graph()` function. You probably do not want to assign the graph to an object when calling `render_graph()` (as you would likely mistakenly overwrite a graph you've previously made). We are purely using this function for its side effect, which is viewing. We can view the graph using the **Graphviz** renderer:
```{r render_graph_graphviz}
# Show the graph using the Graphviz engine
render_graph(graph = graph_2n_1e)
```
And also, if you like interactivity and fluid physical motions, the **visNetwork** renderer.
```{r render_graph_visnetwork}
# Show the graph using the visNetwork engine
render_graph(graph = graph_2n_1e, output = "visNetwork")
```
Just as we've built up a graph, we can do the opposite and remove edges and nodes. The key functions here are `delete_edge()` and `delete_node()`. Let's remove the edge we just recently added and then remove each of the nodes, leaving us again with an empty graph. Note again that we don't need to repeat the graph's object name throughout this pipeline (which is nice, saves typing). Second useful note: typing `delete_edge(` and then hitting the tab in **RStudio** brings up useful information on the function's argument names along with useful descriptions. I find this very helpful and use this feature quite often.
```{r delete_node_delete_edge}
# Remove the edge from the graph, then,
# remove each of the 2 nodes
graph_2n_1e_empty <-
graph_2n_1e %>%
delete_edge(from = 1, to = 2) %>%
delete_node(node = 2) %>%
delete_node(node = 1)
# Verify that there are no nodes left
# in this graph object
node_count(graph_2n_1e_empty)
```
There are some shortcuts/variations for doing the same thing (there are often numerous ways to transform graphs). You could simply call `delete_node()` twice, for instance, to get an empty graph. This is because removing a node with edges attached will automatically remove those edges. If we were now to display the graph with `render_graph()` you would get a field of nothingness. Since that's not very interesting, I'm not going to show it here.
## Creation Using Data Frames
For many graph diagrams you may need many nodes and edges. Let's use the `create_node_df()` function to specify a collection of nodes and contain them in a data frame (a *node data frame*, or *ndf*). Immediately after that, inspect the node data frame.
```{r create_node_df}
ndf <-
create_node_df(
n = 3,
label = TRUE)
ndf
```
The `n` argument is required here and it must indicate the number of nodes you intend this object to contain. The use of `label = TRUE` allows for copying of the node IDs as the node `label` (which is a node attribute). This is not always desirable, however. A better option is to specify a vector of label values (you can use all manner of characters, it will be coerced to a `character` vector). Make certain that this vector is the same length as specified by `n`. Also, if we ensure that the `label` node attribute always contains unique values, we can later select individual nodes by their `label` values and perform actions on these selections.
You may have noticed the node attribute `type` in the output. Values may optionally be provided for this attribute and, again, having this extra metadata is useful for categorizing collections of nodes. Let's refine the `ndf` object and include two different `type` values (`A` and `B`).
```{r create_ndf_type}
ndf <-
create_node_df(
n = 3,
type = c("A", "A", "B"),
label = TRUE)
ndf
```
Now onto the edges, those connections between the nodes. The edges are also collected in a data frame (this time, as an *edge data frame* or *edf*). The `create_edge_df()` function is used to generate this type of object.
```{r create_edge_df}
edf <-
create_edge_df(
from = c(1, 1),
to = c(2, 3))
edf
```
The `from` and `to` arguments specify which nodes for the edge are outgoing and incoming, respectively. Here, the edges are: `1->2` and `1->3`. As stated before, for directed graphs, the order is essential. The `rel` argument allows for the inclusion of text labels in the same manner as the node `type`. This is useful for targeting specific groups of edges during a selection or traversal. Let's refine the `edf` object and include two different `rel` values (`X` and `Y`).
```{r create_edf_rel}
edf <-
create_edge_df(
from = c(1, 1),
to = c(2, 3),
rel = c("X", "Y"))
edf
```
Now that we have an *ndf* and an *edf*, we can combine those into a new graph object by using these specialized data frames within the `create_graph()` function call.
```{r create_graph_ndf_edf}
# Create a graph object using node and
# edge data frames
graph_ndf_edf <-
create_graph(
nodes_df = ndf,
edges_df = edf)
```
What exactly happened? These data frames (`ndf` and `edf`) were placed within the graph object when it was created. They essentially became internal `ndf` and `edf` objects. We can inspect the graph's internal *ndf* and *edf* at any time using the `get_node_df()` and `get_edge_df()` functions:
```{r inspect_internal_ndf}
# Show the graph's internal node data frame
get_node_df(graph_ndf_edf)
```
```{r inspect_internal_edf}
# Show the graph's internal edge data frame
get_edge_df(graph_ndf_edf)
```
Let's view the graph using `render_graph()`. The output will clearly show us how the 3 nodes are connected to each other.
```{r render_graph_ndf_edf}
# Show the graph using `render_graph()`
render_graph(graph = graph_ndf_edf)
```
There is a bit more that you can do with node and edge data frames. Extra columns (or attributes) filled with values can be used for several purposes:
- to associate data values that relate to each node or edge
- to provide styling attributes such as color names or relative node sizes
We can add these extra columns/attrs when making the node or edge data frames. Here is an example where `color` attribute values for nodes and edges is provided along with some `fillcolor` values for nodes.
```{r generate_render_graph_w_colors}
# Create a node data frame
```
## Adding Basic Attributes
Note that whenever we use the default values for `type` or `label` in each `add_node()` call, we don't get values for the `type` attribute and the `label` attribute is assigned the node ID value. In the ideal case, values for `type` and `label` are supplied. Something to keep in mind is that including `label` values that are unique or distinct across all nodes in the graph will make it possible to specify node selections and perform useful actions on specific nodes. Let's create the `graph` object once more with `type` and `label` node attributes included.
```{r create_graph_add_nodes_labels_types}
graph_node_type_label <-
create_graph() %>%
add_node(type = "number", label = "one") %>%
add_node(type = "number", label = "two")
```
View the graph's internal node data frame with the `get_node_df()` function so we can see that these attributes have been included alongside the graph's nodes.
```{r get_node_df_graph_node_type_label}
get_node_df(graph_node_type_label)
```
Now let's add a single, directed edge between nodes `1` and `2` using `add_edge()`. This edge will also be given a value for its `rel` attribute (`to_number`). After adding the edge to the graph, use the `get_edges()` function to show that the edge has been produced.
```{r create_graph_w_ids}
# Add an edge between nodes `1` and `2` and
# set the `rel` attribute as `to_number`
graph_edge_w_ids <-
graph_node_type_label %>%
add_edge(
from = 1, to = 2,
rel = "to_number")
# Display the graph's edges (in the default
# string vector format with node IDs separated
# by arrows in this directed graph case)
graph_edge_w_ids %>% get_edges()
```
Perhaps you don't want to work directly with the node ID values and instead with unique node labels. This is a common practice as node ID values can be considered as less meaningful (they are not assigned by the user) but node labels and other attributes can give each node an identity and make nodes more distinguishable. In such a workflow, it's easier to create edges based on the node `label` values. Supply the node labels as values for the `from` and `to` arguments and set `use_labels` to `TRUE`.
To view the graph's edges after the transformation, use `get_edges()` as before but, this time, use `return_values = "label"` to display the graph's edges in terms of node `label` values.
```{r create_graph_w_labels}
# Add an edge between the nodes with labels
# `one` (node `1`) and `two` (node `2`) and
# set the `rel` attribute as `to_number`
graph_edge_w_ids <-
graph_node_type_label %>%
add_edge(
from = "one", to = "two",
rel = "to_number",
use_labels = TRUE)
# Display the graph's edges (as a string-based
# vector with pairs of node `label` values)
graph_edge_w_ids %>% get_edges(return_values = "label")
```
The `get_edges()` function can output the pairs of nodes in edges either as a character vector (as above, which is the default), as a data frame (with 2 columns: `from` and `to`), or as a list (first component is the `from` vector and the second represents the `to` nodes). Here are examples of the latter two output types:
```{r get_edges_return_type_df}
# Get the graph's edges as a data frame
get_edges(graph_edge_w_ids, return_type = "df")
```
```{r get_edges_return_type_list}
# Get the graph's edges as a list
get_edges(graph_edge_w_ids, return_type = "list")
```
The addition of a node and the creation of edges can also be performed in a single `add_node()` step. You can use either (or both) of the optional `from` and `to` arguments in the `add_node()` function. Let's make various graph objects and see how both nodes and edges can be created with a single call to `add_node()`.
```{r add_node_edge_from_to}
# Add initial node (ID `1`) and then
# add node `2` and edge `1->2`
graph_a <-
create_graph() %>%
add_node() %>%
add_node(from = 1)
# Add initial node (ID `1`) and then
# add node `2` and edge `2->1`
graph_b <-
create_graph() %>%
add_node() %>%
add_node(to = 1)
# Add 2 initial nodes (IDs `1` and
# `2`) and then add node `3` and edges
# `2->3` and `3->1`
graph_c <-
create_graph() %>%
add_node() %>%
add_node() %>%
add_node(from = 2, to = 1)
# Get all of the edges available in
# each of the graphs created
list(graph_a = get_edges(graph_a),
graph_b = get_edges(graph_b),
graph_c = get_edges(graph_c))
```
There are many other ways to generate a node and connect that new node to existing nodes. The `from` and `to` arguments of `add_node()` also accept vectors of length greater than 1. So, a new node can be connected to or from multiple nodes already in the graph. To make an example of this more succinct, we can use the node creation function `add_n_nodes()`. Supplying a number for the `n` argument creates *n* nodes in the graph. The `add_n_nodes()` function has no means to create edges like `add_node()` but it's a great way to simply add a lot of nodes to the graph with one function call. Below, an example of adding one node to many nodes:
```{r add_node_edge_from_to_multiple}
# Create a graph, add 5 nodes, and then
# add a node with edges to nodes `1` to `5`
graph_d <-
create_graph() %>%
add_n_nodes(n = 5) %>%
add_node(to = 1:5)
# View the graph in the RStudio Viewer
render_graph(graph = graph_d)
```
While this works and produces the result that was intended, it's slightly inconvenient. We have to mentally keep track of which node ID values were created and use those directly in the `from` or `to` arguments to create the edges. A better way is to capture a selection of nodes and perform graph transformations with an active selection:
```{r add_node_edge_from_to_multiple_w_selection}
# Create a graph, add 5 nodes, set those nodes
# as a node selection, and then add a new node
# with edges to all nodes in the selection
graph_e <-
create_graph() %>%
add_n_nodes(n = 5) %>%
select_last_nodes_created() %>%
add_node(to = get_selection(.))
# View the graph in the RStudio Viewer
render_graph(graph = graph_e)
```
This produces the same graph as before but, this time, we didn't have to manually supply node ID values. The `select_last_nodes_created()` function simply made a selection of node ID values and we retrieved those IDs using the `get_selection()` function. The dot (`.`) as the sole argument referred to the graph itself, which is needed for the `graph` argument to `get_selection`.
While the graph was constructed to our specification, the nodes and the edges within that graph do not have their basic attributes filled with values. We can check this using `get_node_df()` and `get_edges_df` and we see that values for the `type`, `label`, and `rel` attributes are all `NA` values.
```{r check_internal_node_df}
get_node_df(graph = graph_e)
```
```{r check_internal_edge_df}
get_edge_df(graph = graph_e)
```
These values can be added later. All `type` and `label` values for nodes can be specified using `set_node_attrs()`. Likewise, all `rel` values for edges can be set with the `set_edge_attrs()` function. To do this unconditionally to all nodes and edges in the graph:
```{r set_edge_attrs_f}
graph_f <-
graph_e %>%
set_node_attrs(
node_attr = "label",
values = c("one", "two", "three",
"four", "five", "six")) %>%
set_node_attrs(
node_attr = "type",
value = "a") %>%
set_edge_attrs(
edge_attr = "rel",
values = "to_number")
```
To verify that the changes were applied, use the `get_node_df()` and `get_edge_df()` function to output the graph's internal node and edge data frames.
```{r get_node_df_all_attrs_set}
get_node_df(graph_f)
```
```{r get_edge_df_all_attrs_set}
get_edge_df(graph_f)
```
Alternatively, we can use the `get_node_attrs()` and `get_edge_attrs()` functions to look at individual attribute values for graph nodes and edges. By supplying the graph object and the name of the attribute (e.g., `type` for nodes, `rel` for edges), we get a named vector of node or edge attribute values.
```{r get_node_edge_attr_values_as_vectors}
# Get a vector of values for the `type` node
# attribute; this returns a named vector (where
# the names are the node ID values)
type_node_attr_values <-
get_node_attrs(x = graph_f, node_attr = "type")
# Get a vector of values for the `rel` edge
# attribute; this also returns a named vector
# (where the names are the edge definitions)
rel_edge_attr_values <-
get_edge_attrs(x = graph_f, edge_attr = "rel")
# Place these node and edge attribute vectors in
# a list and display it
list(type_node_attr_values = type_node_attr_values,
rel_edge_attr_values = rel_edge_attr_values)
```
View the graph again to see that all edges are labeled with the `to_number` `rel` edge attribute. We will render the graph using `output = visNetwork` since that rendering method automatically includes node `label` and edge `rel` values (but not the edge `type`).
```{r view_graph_all_rel_set, out.width='80%', fig.asp=.75, fig.align='center'}
render_graph(graph = graph_f, output = "visNetwork")
```
Go ahead, play with the graph by dragging nodes this way and that way. It's fun! Graphs should always be this fun...