---
title: Software systems
layout: default
---
# Writing software systems
At the most basic level, an R program, like any other program, is a sequence
of instructions written to perform a task. Programs consist of data
structures, which hold data, and functions, which define things a program
can do. You are already familiar with the native R data structures: vectors,
lists, data frames, etc. And you have already seen the functions that
access and manipulate these data structures. However, as you design your
own systems on top of R, you will eventually want to create your own
data structures. After these new types are defined, you may want
to create specialized functions that operate on them.
In other cases you may want to extend existing systems to take advantage
of your new functionality. This chapter shows you how to build new
software systems that "plug into" R's existing functionality
and that other users can extend.

Data structures are generally associated with a set of functions that
are created to work with them. The data structures and their functions
can be encapsulated to create classes. Classes help us to compartmentalize
conceptually coherent pieces of software. For example, an R vector is a
class holding a sequence of values that all share one atomic type. We can
create an instance of a vector using one of R's vector creation routines.

```r
x <- 1:10
length(x)
```

The variable `x` is an object of type vector. Whereas a class
describes what a data structure looks like, an object is an actual
instance of that class.

Objects are associated with functions that let us access and
manipulate the data held by an object. In the previous example, the
`length` function is associated with vectors and tells us how many
elements the vector holds.
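
As a quick check of the vector example (a base-R sketch added here, not part of the original text), `is.vector()` confirms that `x` is a plain vector, while `class()` reports the kind of values the vector holds rather than the word "vector":

```r
x <- 1:10
is.vector(x)      # TRUE: x is a plain atomic vector
class(x)          # "integer": the class names the kind of values held
class(c(1.5, 2))  # "numeric"
class(letters)    # "character"
length(x)         # 10
```
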

R provides three different systems for programming with classes, also
called object-oriented (OO) programming: S3, S4, and R5 (also known as
Reference Classes). The first two, S3 and S4, use a style called
generic-function OO. Functions that may be associated with a class are first
defined as being generic. Then methods, which are functions associated with a
specific class, are defined much like any other function. When an object is
passed to the generic function, the call is dispatched to the method associated
with the object's class. R5 uses a style called message-passing OO. In this
style, methods belong directly to classes, and it is the object that
determines which function to call.
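
The contrast between the two styles fits in a few lines. This is a minimal sketch: the `describe` generic, the `dog` class, and the `Dog` reference class are hypothetical names chosen for illustration. In the S3 half, the generic `describe()` picks a method from the object's class; in the R5 half, the method is looked up on the object itself with `$`.

```r
library(methods)

# Generic-function OO (S3): the generic picks the method from the class
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "something"
describe.dog <- function(x) "a dog"

d <- structure(list(), class = "dog")
describe(d)    # "a dog"
describe(42)   # "something": falls back to describe.default

# Message-passing OO (R5): the method belongs to the object itself
Dog <- setRefClass("Dog", methods = list(
  describe = function() "a dog"
))
rd <- Dog$new()
rd$describe()  # "a dog"
```
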
For the rest of this chapter we are going to explore the use of S3, S4, and
R5 to generate sequences. Along with building a general system
for generating sequences we are going to create classes that generate
the Fibonacci numbers, one by one. As you probably already know, the
Fibonacci numbers follow the integer sequence

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...

and are defined by the recurrence

$$F(0) = 0, \quad F(1) = 1, \quad F(k) = F(k-1) + F(k-2) \text{ for } k \ge 2.$$

These numbers can easily be generated in R using the familiar vectors and
functions that you already know. An example of how to do this is
provided below. It's important to realize that the techniques shown in this
chapter will not allow you to express algorithms you couldn't express
with R's native data structures and functions. The techniques do allow you
to organize data structures and functions to create a general system
or framework for generating sequences.

```r
fibonacci <- function(lastTwo = c()) {
  # Take one step: return the (at most) last two Fibonacci numbers
  if (length(lastTwo) == 0) {
    lastTwo <- 1
  } else {
    lastTwo <- c(lastTwo, sum(lastTwo))
    if (length(lastTwo) > 2) {
      lastTwo <- lastTwo[-1]
    }
  }
  return(lastTwo)
}

# Print the first 10 Fibonacci numbers, starting from F(1) = 1
fibs <- fibonacci()
for (i in 1:10) {
  print(tail(fibs, 1))
  fibs <- fibonacci(fibs)
}
```

Creating a general framework for sequences has two advantages. First, it allows
for abstraction. In our example we've defined a vector to hold the last two
values in the Fibonacci sequence, along with a function that gets the next
value in the sequence. Any integer sequence we might like to generate can be
expressed computationally as some data (for the Fibonacci sequence, the last
two values) and a function that computes the next value. Having identified
these essential pieces of generating sequences, we can start thinking about the
types of things we might like to do with any sequence, not just the Fibonacci
numbers. Second, we can make our system extensible. That is, we can write code
for other types of sequences that works within our framework. Extensibility
allows you to create new sequences, like the factorial numbers, based on the
abstract notion of a sequence. It even allows others to define their own
sequences that work within our sequence framework.
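
The idea that a sequence is just some data plus a function that produces the next state can be sketched directly in base R. The `generate()` helper and the step functions below are hypothetical, and they use a slightly different state from the text (the Fibonacci state here starts at `c(0, 1)`); they only illustrate how one driver can serve both the Fibonacci and the factorial sequences.

```r
# Hypothetical driver: run any sequence described by a state and a step
generate <- function(state, step, valueOf, n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- valueOf(state)  # record the current value
    state <- step(state)      # move to the next state
  }
  out
}

# Fibonacci: the state is c(F(k), F(k + 1))
fibStep <- function(s) c(s[2], s[1] + s[2])
generate(c(0, 1), fibStep, function(s) s[1], 10)
# 0 1 1 2 3 5 8 13 21 34

# Factorials: the state is c(k, k!)
factStep <- function(s) c(s[1] + 1, s[2] * (s[1] + 1))
generate(c(0, 1), factStep, function(s) s[2], 6)
# 1 1 2 6 24 120
```

Adding a new sequence means writing only a new step function and value function; `generate()` itself does not change.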
## S3
S3 was R's first class system. It was first described in the
1992 "White Book" (Chambers & Hastie, 1992), and it is the only object
system used by the base and stats packages. In this system, new data types or
classes are built from native types (vector, list, etc.) and given
a `class` attribute. This attribute is a character vector of class names, so a
single object can belong to several classes.
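
A short illustration of the class attribute, using only base R: `child` and `parent` are made-up class names, while the `glm` example uses the built-in `cars` data set. `inherits()` tests membership in any of an object's classes, and `unclass()` strips the attribute to reveal the underlying native type.

```r
x <- structure(1:3, class = c("child", "parent"))
class(x)               # "child" "parent"
inherits(x, "parent")  # TRUE
inherits(x, "other")   # FALSE
unclass(x)             # 1 2 3: the underlying integer vector

# The same idea in base R: a fitted glm is also an lm
fit <- glm(dist ~ speed, data = cars)
class(fit)             # "glm" "lm"
```
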
Recall from the previous section that the data needed to generate the
Fibonacci sequence is a vector of (at most) two values. We can create a new
data type, called `FibonacciData`, to hold these values:

```r
# Create a FibonacciData object using attributes
x <- vector(mode = "integer")
attr(x, "class") <- "FibonacciData"
x

# Using the structure function
x <- structure(vector(mode = "integer"), class = "FibonacciData")
x

# Using the class function
x <- vector(mode = "integer")
class(x) <- "FibonacciData"
class(x)
# [1] "FibonacciData"
```


While it is true that a class is simply an attribute, it is recommended that
you use the `class` function to access and modify class information. It
communicates your intent more clearly, making your code easier to read.
Furthermore, it is often better to write a function that creates
instances of a class rather than attaching attributes ad hoc.
The functions below are called __constructors__. They create an object
of type `SequenceData` and an object of type `FibonacciData`, which is
also of type `SequenceData`.

```r
SequenceData <- function(x = NULL) {
  # Default to an empty integer vector when no data is supplied
  if (is.null(x)) {
    x <- vector(mode = "integer")
  }
  # Always attach the class, whether or not data was supplied
  structure(x, class = "SequenceData")
}

FibonacciData <- function(x = NULL) {
  r <- SequenceData(x)
  # Put the more specific class first so that its methods are found first
  class(r) <- c("FibonacciData", class(r))
  r
}
```

By defining data types, we can create special functions,
called methods, that behave differently depending on the type of the object
passed to them. For example, let's say that we want to handle the generation
of integer sequences with a method called `nextNum`. The `nextNum` function
takes a sequence object, such as a `FibonacciData` object, and returns a new
object of the same type that represents the next position in the sequence; a
second function, `value`, returns the current value held by an object. This is
easily accomplished by creating __generic functions__, which allow us to define
`nextNum` and `value` methods for different types of sequences.

```r
nextNum <- function(x) {
  UseMethod("nextNum", x)
}
value <- function(x) {
  UseMethod("value", x)
}
```

Both of these generic functions take a single parameter `x` and pass the
name of the generic and the parameter to the `UseMethod` function. The
first argument of `UseMethod` is the name of the generic; R combines it with
the class of the object to build the names of the methods to look for, such as
`nextNum.FibonacciData`. The second argument is the object whose class
determines which method is called, or __dispatched__. It is optional and
defaults to the first argument of the enclosing function, so
`UseMethod("nextNum")` would behave the same way here.

Now that the generic functions have been defined, we can define `nextNum`
and `value` methods, each of which takes an object of type
`SequenceData` or `FibonacciData` and performs the appropriate operation.

```r
nextNum.SequenceData <- function(x) {
  stop("You can't call nextNum on an abstract SequenceData type")
}
value.SequenceData <- function(x) {
  stop("You can't call value on an abstract SequenceData type")
}

nextNum.FibonacciData <- function(x) {
  # The class of the return vector needs to be "FibonacciData".
  # We can do this by passing it to the constructor.
  FibonacciData(c(tail(x, 1), ifelse(!length(x), 1, sum(x))))
}
value.FibonacciData <- function(x) {
  ifelse(length(x) == 0, 0, tail(x, 1))
}
```

A method name starts with the name of the corresponding generic function,
followed by a ".", followed by the class of the object the method handles.
The `UseMethod` function uses the class of `x` to figure out which method to
call. If `nextNum` or `value` is called and `x` has more than one class,
as it does in this case, `UseMethod` looks for methods in the
same order that the classes appear in the class attribute. Note that in this
example the `SequenceData` type describes a broad category of things, in this
case sequences. It lets us declare, but not implement, operations that can be
performed on any sequence. The `FibonacciData` type is a specific type
of `SequenceData` and needs to implement its own methods for `nextNum` and
`value`. Once it does, we can use `FibonacciData` objects
much like the familiar data structures and functions.
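
The lookup order can be seen with a small, self-contained example. The `area` generic and the `square`, `blob`, and `shape` classes are hypothetical: an object whose first class has no method falls through to the next class in its class vector, and then to a `default` method if there is one.

```r
# Hypothetical generic with methods for two of three classes
area <- function(shape) UseMethod("area")
area.square <- function(shape) shape$side^2
area.shape <- function(shape) NA  # fallback for any shape
area.default <- function(shape) stop("don't know how to compute this area")

sq <- structure(list(side = 3), class = c("square", "shape"))
blob <- structure(list(), class = c("blob", "shape"))

area(sq)    # 9: area.square matches the first class
area(blob)  # NA: there is no area.blob, so area.shape is used
tryCatch(area(1), error = function(e) conditionMessage(e))
# "don't know how to compute this area": only the default applies
```
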

Technical note: the method is not evaluated in the generic's own environment.
Older versions of R copied local variables created in the generic before the
call to `UseMethod` into the method's environment, but R 4.4.0 and later no
longer do so, so a method should receive everything it needs through its
arguments.

```r
a <- FibonacciData()
fibs <- rep(NA, 10)
for (i in 1:10) {
  fibs[i] <- value(a)
  a <- nextNum(a)
}
print(fibs)
# [1]  0  1  1  2  3  5  8 13 21 34
```

As mentioned before, the base R installation makes heavy use of S3 methods, just
like the ones we've been creating. This means that we can create methods for
standard R functions, allowing our new data types to behave like R's native
types. In the example below we'll create a new method for R's `print` function,
which takes a `SequenceData` object as an argument and prints its current value.

```r
print.SequenceData <- function(x, ...) {
  print(value(x))
  # print methods conventionally return their argument invisibly
  return(invisible(x))
}

fib <- FibonacciData()
print(fib)
```

In this case an object of type `FibonacciData` is created, which also
has type `SequenceData`. The generic function call `print(fib)` dispatches
to the `print.SequenceData` method. In this method, the `value()` generic
is called, which dispatches to `value.FibonacciData` because `FibonacciData`
appears first in the object's vector of classes. This behavior is called
polymorphism, and it allows us to write the `print.SequenceData` method based
on an __abstract__ type, `SequenceData`. The method still works as expected
when it is passed a __concrete__ type, in this case a `FibonacciData` object.
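
To show the extensibility argued for earlier, here is a sketch of a second concrete sequence that reuses `print.SequenceData` without modification. `FactorialData`, which stores `c(k, k!)`, is a hypothetical class not defined elsewhere in this chapter, and generics equivalent to those in the text are repeated so the block runs on its own.

```r
# Generics and the shared print method, equivalent to those in the text
nextNum <- function(x) UseMethod("nextNum")
value <- function(x) UseMethod("value")
print.SequenceData <- function(x, ...) {
  print(value(x))
  invisible(x)
}

# Hypothetical FactorialData: stores c(k, k!), starting at c(0, 0!) = c(0, 1)
FactorialData <- function(x = c(0, 1)) {
  structure(x, class = c("FactorialData", "SequenceData"))
}
nextNum.FactorialData <- function(x) {
  k <- x[[1]] + 1
  FactorialData(c(k, x[[2]] * k))  # (k - 1)! * k = k!
}
value.FactorialData <- function(x) x[[2]]

f <- FactorialData()
for (i in 1:5) f <- nextNum(f)
print(f)  # dispatches to print.SequenceData and prints 5! = 120
```

Only `nextNum` and `value` had to be written for the new type; printing came for free from the abstract `SequenceData` class.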
## S4
S4 was first described in the 1998 "Green Book" (Chambers, 1998). It allows
for more sophisticated handling of method calls, such as dispatch on the
classes of several arguments, and as a result it is better at managing more
complex class hierarchies. Unlike an S3 class, an S4 class has a formal
definition that declares its data members, called __slots__, along with their
types. Returning to our Fibonacci example, S4 `Sequence` and `Fibonacci`
classes are defined as follows.

```r
setClass("Sequence")
setClass("Fibonacci", representation(lastTwo = "numeric"),
         contains = "Sequence")
```

A new class is defined using the `setClass` function. The code above
defines two new classes. The first is called `Sequence`; because it declares
no slots and extends no other class, it is a virtual class, which cannot be
instantiated directly. The second is `Fibonacci`, which holds a numeric vector
named `lastTwo` and inherits from the `Sequence` class. Now that we have two
new S4 classes, we can define their associated methods.
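
A quick way to see what these two definitions give us, as a self-contained sketch that repeats the `setClass` calls: `Sequence` cannot be instantiated, a `Fibonacci` object's slot is read with `@`, and `new()` rejects a slot value of the wrong type.

```r
library(methods)

setClass("Sequence")  # no slots and no superclass, so it is virtual
setClass("Fibonacci", representation(lastTwo = "numeric"),
         contains = "Sequence")

isVirtualClass("Sequence")  # TRUE

# Slots are read with @; new() checks their declared types
f <- new("Fibonacci", lastTwo = c(1, 2))
f@lastTwo                   # 1 2
is(f, "Sequence")           # TRUE: Fibonacci inherits from Sequence

# Both of these calls are rejected with an error
bad1 <- tryCatch(new("Sequence"), error = function(e) "error")
bad2 <- tryCatch(new("Fibonacci", lastTwo = "a"), error = function(e) "error")
c(bad1, bad2)               # "error" "error"
```
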

```r
setGeneric("value", function(x)
  standardGeneric("value"))
setGeneric("nextNum", function(x, n)
  standardGeneric("nextNum"))

# Called with a single argument: the abstract class cannot take a step
setMethod("nextNum", signature(x = "Sequence", n = "missing"),
  function(x, n) {
    stop("You cannot call the nextNum method on an abstract class")
  })

# Called with a step count: advance the sequence n times
setMethod("nextNum", signature(x = "Sequence", n = "numeric"),
  function(x, n) {
    for (i in seq_len(n)) {
      x <- nextNum(x)
    }
    x
  })

setMethod("value", signature(x = "Sequence"),
  function(x) {
    stop("You cannot call the value method on an abstract class")
  })

# Constructor for Fibonacci objects, starting with no values
Fibonacci <- function() {
  new("Fibonacci", lastTwo = vector(mode = "numeric"))
}
```
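
The code above never defines `nextNum` or `value` methods for `Fibonacci`, so calling them on a `Fibonacci` object would reach the abstract `Sequence` methods and stop with an error. One possible way to fill the gap is sketched below; the two `Fibonacci` methods are our assumption, modeled on the S3 `nextNum.FibonacciData` and `value.FibonacciData`, and the other definitions are repeated so the block runs on its own.

```r
library(methods)

# Class definitions, repeated from the text
setClass("Sequence")
setClass("Fibonacci", representation(lastTwo = "numeric"),
         contains = "Sequence")

setGeneric("value", function(x) standardGeneric("value"))
setGeneric("nextNum", function(x, n) standardGeneric("nextNum"))

# n steps for any Sequence: repeatedly take one step
setMethod("nextNum", signature(x = "Sequence", n = "numeric"),
  function(x, n) {
    for (i in seq_len(n)) {
      x <- nextNum(x)
    }
    x
  })

# Assumed one-step method for Fibonacci; returns a modified copy
setMethod("nextNum", signature(x = "Fibonacci", n = "missing"),
  function(x, n) {
    lt <- x@lastTwo
    nxt <- if (length(lt) == 0) 1 else sum(lt)
    x@lastTwo <- c(tail(lt, 1), nxt)
    x
  })

# Assumed value method: 0 before the first step, else the latest number
setMethod("value", signature(x = "Fibonacci"),
  function(x) {
    if (length(x@lastTwo) == 0) 0 else tail(x@lastTwo, 1)
  })

fib <- new("Fibonacci")
fibs <- numeric(10)
for (i in 1:10) {
  fibs[i] <- value(fib)
  fib <- nextNum(fib)
}
fibs                                  # 0 1 1 2 3 5 8 13 21 34
value(nextNum(new("Fibonacci"), 10))  # 55
```

Because `nextNum(x, n)` is defined once for `Sequence`, any other subclass that implements the one-step method gets the `n`-step version without further code.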
## Closures as S3 objects
You may have noticed that, so far in this chapter, whenever we want to
go to the next Fibonacci number we actually calculate the next
number with the `nextNum` method and then overwrite the current object.
Put another way, the `nextNum` methods we have created do not change
their arguments outside of the function's scope: if we pass an argument
to a function, we expect it to have the same value after the function is
called. As a result, each call in our Fibonacci examples could either compute
the next number, which we then assigned over the old object, or retrieve the
current value, but not both.
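
The value semantics described here can be seen with a tiny, hypothetical `bump()` function: it modifies its argument, but only the copy local to the function changes, so the caller must assign the result to keep it.

```r
# Hypothetical helper that modifies its argument
bump <- function(v) {
  v[1] <- v[1] + 1  # changes the function's local copy only
  v
}

a <- c(1, 2)
b <- bump(a)
a  # 1 2: the caller's vector is unchanged
b  # 2 2: the change survives only in the returned value
```
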

While separating access from assignment is conceptually appealing, it does
make our example a little cumbersome. Each call to `nextNum` was
paired with a call to `value`. It would be much more convenient if
`nextNum` calculated the next Fibonacci number and updated the
object holding the current one. This is easily done using a closure,
as in the following code.
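
```r
FibonacciGenerator <- function() {
  # State shared by every call to the returned function
  lastTwo <- c()
  function() {
    # <<- updates lastTwo in the enclosing environment, not a local copy
    lastTwo <<- c(tail(lastTwo, 1),
                  ifelse(!length(lastTwo), 1, sum(lastTwo)))
    tail(lastTwo, 1)
  }
}
```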
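
The key ingredient in `FibonacciGenerator` is `<<-`, which assigns to a variable in the enclosing environment rather than creating a local one. The hypothetical `makeCounter()` below isolates that mechanism: each call to `makeCounter()` creates a new environment, so two counters keep separate state.

```r
# Hypothetical makeCounter(): each call creates a fresh enclosing environment
makeCounter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # modifies count in the enclosing environment
    count
  }
}

counterA <- makeCounter()
counterB <- makeCounter()
counterA()  # 1
counterA()  # 2
counterB()  # 1: counterB has its own, separate count

environment(counterA)$count  # 2
```
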

While `FibonacciGenerator` creates a closure that both updates and
returns the updated value, it suffers from two drawbacks. First, the
overarching goal was to create a software system for generating sequences,
not just Fibonacci numbers. We may want to create other types of sequences,
like random walks, and this simple closure does not further our effort to
create a framework for sequence generation. Second, the closures we've seen so
far were essentially functions with associated data. Each can do only
a single thing, defined by its function. If we want to do more than simply
get the next number, we need to take another approach.
R also allows several closures to share the same associated data, so we can
bundle them into a list of named functions that act as methods. Furthermore,
we can make such a list an S3 object simply by giving it a class attribute.
The following code creates an abstract `Sequence` class with two methods,
`nextNum` and `value`, using closures, along with a `Fibonacci` class built
the same way.

```r
Sequence <- function() {
  nextNum <- function() {
    stop("You cannot call the nextNum method on an abstract class")
  }
  value <- function() {
    stop("You cannot call the value method on an abstract class")
  }
  object <- list(nextNum = nextNum, value = value)
  class(object) <- "Sequence"
  object
}

Fibonacci <- function() {
  # State shared by the methods below
  lastTwo <- c()
  nextNum <- function() {
    # Advance the sequence in place and return the new value
    lastTwo <<- c(tail(lastTwo, 1),
                  ifelse(!length(lastTwo), 1, sum(lastTwo)))
    tail(lastTwo, 1)
  }
  value <- function() {
    ifelse(!length(lastTwo), 0, tail(lastTwo, 1))
  }
  object <- list(nextNum = nextNum, value = value)
  class(object) <- c("Fibonacci", "Sequence")
  object
}
```
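
Because these objects carry their state with them, calling `nextNum()` changes the object without reassigning it, and because they have a class attribute, ordinary S3 generics still dispatch on them. The sketch below shows both points with a hypothetical `Squares` sequence built the same way as `Fibonacci`, plus an assumed `print.Sequence` method; neither appears in the original text.

```r
# Hypothetical Squares sequence: shared state plus named methods
Squares <- function() {
  k <- 0
  nextNum <- function() {
    k <<- k + 1  # update the shared state in place
    k^2
  }
  value <- function() k^2
  object <- list(nextNum = nextNum, value = value)
  class(object) <- c("Squares", "Sequence")
  object
}

# Assumed S3 method: ordinary generics dispatch on the class attribute
print.Sequence <- function(x, ...) {
  cat("Current value: ", x$value(), "\n", sep = "")
  invisible(x)
}

s <- Squares()
s$nextNum()  # 1
s$nextNum()  # 4
s$value()    # 4: the state changed even though s was never reassigned
print(s)     # Current value: 4
```
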
## R5
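R5 classes, also known as Reference Classes, are created with `setRefClass`.
Fields and methods are declared together in the class definition, and a method
modifies a field of its object with `<<-`.

```r
Sequence <- setRefClass("Sequence",
  methods = list(
    nextNum = function(n) {
      stop("You cannot call the nextNum method on an abstract class")
    },
    value = function() {
      stop("You cannot call the value method on an abstract class")
    }
  )
)

Fibonacci <- setRefClass("Fibonacci", contains = "Sequence",
  fields = list(lastTwo = "numeric"),
  methods = list(
    nextNum = function(n = 1) {
      # Advance n steps, updating the lastTwo field in place with <<-
      for (i in seq_len(n)) {
        lastTwo <<- c(tail(lastTwo, 1),
                      ifelse(!length(lastTwo), 1, sum(lastTwo)))
      }
      tail(lastTwo, 1)
    },
    value = function() {
      ifelse(!length(lastTwo), 0, tail(lastTwo, 1))
    }
  )
)
```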
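
R5 objects have reference semantics: assigning an object to a new name does not copy it, which is what lets `nextNum` update `lastTwo` in place. The sketch below uses a hypothetical `Counter` class to show this, along with the `copy()` method that reference class objects provide when an independent copy is needed.

```r
library(methods)

# Hypothetical reference class used only to illustrate reference semantics
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1  # modifies the field of this object
      invisible(.self)
    }
  )
)

a <- Counter$new(count = 0)
b <- a         # b refers to the same object as a
b$increment()
a$count        # 1: changing b also changed a

d <- a$copy()  # copy() makes an independent object
d$increment()
a$count        # still 1
d$count        # 2
```
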