8 S3

You may have noticed that your slot machine results do not look the way I promised they would. I suggested that the slot machine would display its results like this:

play()
## 0 0 DD
## $0

But the current machine displays its results in a less pretty format:

play()
## "0"  "0" "DD" 
## 0

Moreover, the slot machine uses a hack to display symbols (we call print() from within play()). As a result, the symbols do not follow your prize output if you save it:

one_play <- play()
## "B" "0" "B" 

one_play
## 0

You can fix both of these problems with R’s S3 system.

8.1 The S3 System

S3 refers to a class system built into R. The system governs how R handles objects of different classes. Certain R functions will look up an object’s S3 class, and then behave differently in response.

The print() function is like this. When you print a numeric vector, print() will display a number:

num <- 1000000000
print(num)
## 1000000000

But if you give that number the S3 class POSIXct followed by POSIXt, print() will display a time:

class(num) <- c("POSIXct", "POSIXt")
print(num)
## "2001-09-08 19:46:40 CST"

If you use objects with classes—and you do—you will run into R’s S3 system. S3 behavior can seem odd at first, but is easy to predict once you are familiar with it.

R’s S3 system is built around three components: attributes (especially the class attribute), generic functions, and methods.

8.2 Attributes

In Attributes, you learned that many R objects come with attributes, pieces of extra information that are given a name and appended to the object. Attributes do not affect the values of the object, but stick to the object as a type of metadata that R can use to handle the object. For example, a data frame stores its row and column names as attributes. Data frames also store their class, "data.frame", as an attribute.

You can see an object’s attributes with attributes(). If you run attributes() on the deck data frame that you created in Project 2: Playing Cards, you will see:

attributes(deck)
## $names
## [1] "face"  "suit"  "value"
## 
## $class
## [1] "data.frame"
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 
## [20] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52

R comes with many helper functions that let you set and access the most common attributes used in R. You’ve already met the names(), dim(), and class() functions, which each work with an eponymously named attribute. However, R also has row.names(), levels(), and many other attribute-based helper functions. You can use any of these functions to retrieve an attribute’s value:

row.names(deck)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13"
## [14] "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
## [27] "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
## [40] "40" "41" "42" "43" "44" "45" "46" "47" "48" "49" "50" "51" "52"

or to change an attribute’s value:

row.names(deck) <- 101:152

or to give an object a new attribute altogether:

levels(deck) <- c("level 1", "level 2", "level 3")

attributes(deck)
## $names
## [1] "face"  "suit"  "value"
## 
## $class
## [1] "data.frame"
## 
## $row.names
##  [1] 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
## [18] 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
## [35] 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151
## [52] 152
## 
## $levels
## [1] "level 1" "level 2" "level 3"

R is very laissez faire when it comes to attributes. It will let you add any attributes that you like to an object (and then it will usually ignore them). The only time R will complain is when a function needs to find an attribute and it is not there.

You can add any general attribute to an object with attr(); you can also use attr() to look up the value of any attribute of an object. Let’s see how this works with one_play, the result of playing our slot machine one time:

one_play <- play()
one_play
## 0

attributes(one_play)
## NULL

attr() takes two arguments: an R object and the name of an attribute (as a character string). To give the R object an attribute of the specified name, save a value to the output of attr(). Let’s give one_play an attribute named symbols that contains a vector of character strings:

attr(one_play, "symbols") <- c("B", "0", "B")

attributes(one_play)
## $symbols
## [1] "B" "0" "B"

To look up the value of any attribute, give attr() an R object and the name of the attribute you would like to look up:

attr(one_play, "symbols")
## "B" "0" "B"

If you give an attribute to an atomic vector, like one_play, R will usually display the attribute beneath the vector’s values. However, if the attribute changes the vector’s class, R may display all of the information in the vector in a new way (as we saw with POSIXct objects):

one_play
## [1] 0
## attr(,"symbols")
## [1] "B" "0" "B"

R will generally ignore an object’s attributes unless you give them a name that an R function looks for, like names() or class(). For example, R will ignore the symbols attribute of one_play as you manipulate one_play:

one_play + 1
##  1
## attr(,"symbols")
##  "B" "0" "B"

Exercise: Add an Attribute

Modify play() to return a prize that contains the symbols associated with it as an attribute named symbols. Remove the redundant call to print(symbols):

play <- function() {
  symbols <- get_symbols()
  print(symbols)
  score(symbols)
}

You can create a new version of play() by capturing the output of score(symbols) and assigning an attribute to it. play() can then return the enhanced version of the output:

play <- function() {
  symbols <- get_symbols()
  prize <- score(symbols)
  attr(prize, "symbols") <- symbols
  prize
}

Now play() returns both the prize and the symbols associated with the prize. The results may not look pretty, but the symbols stick with the prize when we copy it to a new object. We can work on tidying up the display in a minute:

play()
## [1] 0
## attr(,"symbols")
## [1] "B"  "BB" "0" 
 
two_play <- play()
 
two_play
## [1] 0
## attr(,"symbols")
## [1] "0" "B" "0"

You can also generate a prize and set its attributes in one step with the structure() function. structure() creates an object with a set of attributes. The first argument of structure() should be an R object or set of values, and the remaining arguments should be named attributes for structure() to add to the object. You can give these arguments any argument names you like. structure() will add the attributes to the object under the names that you provide as argument names:

play <- function() {
  symbols <- get_symbols()
  structure(score(symbols), symbols = symbols)
}

three_play <- play()
three_play
##  0
##  attr(,"symbols")
##  "0"  "BB" "B"

Now that your play() output contains a symbols attribute, what can you do with it? You can write your own functions that lookup and use the attribute. For example, the following function will look up one_play’s symbols attribute and use it to display one_play in a pretty manner. We will use this function to display our slot results, so let’s take a moment to study what it does:

slot_display <- function(prize){

  # extract symbols
  symbols <- attr(prize, "symbols")

  # collapse symbols into single string
  symbols <- paste(symbols, collapse = " ")

  # combine symbol with prize as a character string
  # \n is special escape sequence for a new line (i.e. return or enter)
  string <- paste(symbols, prize, sep = "\n$")

  # display character string in console without quotes
  cat(string)
}

slot_display(one_play)
## B 0 B
## $0

The function expects an object like one_play that has both a numerical value and a symbols attribute. The first line of the function will look up the value of the symbols attribute and save it as an object named symbols. Let’s make an example symbols object so we can see what the rest of the function does. We can use one_play’s symbols attribute to do the job. symbols will be a vector of three-character strings:

symbols <- attr(one_play, "symbols")

symbols
## "B" "0" "B"

Next, slot_display() uses paste() to collapse the three strings in symbols into a single-character string. paste() collapses a vector of character strings into a single string when you give it the collapse argument. paste() will use the value of collapse to separate the formerly distinct strings. Hence, symbols becomes B 0 B the three strings separated by a space:

symbols <- paste(symbols, collapse = " ")

symbols
## "B 0 B"

Our function then uses paste() in a new way to combine symbols with the value of prize. paste() combines separate objects into a character string when you give it a sep argument. For example, here paste() will combine the string in symbols, B 0 B, with the number in prize, 0. paste() will use the value of sep argument to separate the inputs in the new string. Here, that value is \n$, so our result will look like "B 0 B\n$0":

prize <- one_play
string <- paste(symbols, prize, sep = "\n$")

string
## "B 0 B\n$0"

The last line of slot_display() calls cat() on the new string. cat() is like print(); it displays its input at the command line. However, cat() does not surround its output with quotation marks. cat() also replaces every \n with a new line or line break. The result is what we see. Notice that it looks just how I suggested that our play() output should look in Programs:

cat(string)
## B 0 B
## $0

You can use slot_display() to manually clean up the output of play():

slot_display(play())
## C B 0
## $2

slot_display(play())
## 7 0 BB
## $0

This method of cleaning the output requires you to manually intervene in your R session (to call slot_display()). There is a function that you can use to automatically clean up the output of play() each time it is displayed. This function is print(), and it is a generic function.

8.3 Generic Functions

R uses print() more often than you may think; R calls print() each time it displays a result in your console window. This call happens in the background, so you do not notice it; but the call explains how output makes it to the console window (recall that print() always prints its argument in the console window). This print() call also explains why the output of print() always matches what you see when you display an object at the command line:

print(pi)
## 3.141593

pi
## 3.141593


print(head(deck))
##  face   suit value
##  king spades    13
## queen spades    12
##  jack spades    11
##   ten spades    10
##  nine spades     9
## eight spades     8

head(deck)
##  face   suit value
##  king spades    13
## queen spades    12
##  jack spades    11
##   ten spades    10
##  nine spades     9
## eight spades     8


print(play())
##  5
## attr(,"symbols")
##  "B"  "BB" "B" 

play()
##  5
## attr(,"symbols")
##  "B"  "BB" "B"

You can change how R displays your slot output by rewriting print() to look like slot_display(). Then R would print the output in our tidy format. However, this method would have negative side effects. You do not want R to call slot_display() when it prints a data frame, a numerical vector, or any other object.

Fortunately, print() is not a normal function; it is a generic function. This means that print() is written in a way that lets it do different things in different cases. You’ve already seen this behavior in action (although you may not have realized it). print() did one thing when we looked at the unclassed version of num:

num <- 1000000000
print(num)
## 1000000000

and a different thing when we gave num a class:

class(num) <- c("POSIXct", "POSIXt")
print(num)
## "2001-09-08 19:46:40 CST"

Take a look at the code inside print() to see how it does this. You may imagine that print looks up the class attribute of its input and then uses an +if+ tree to pick which output to display. If this occurred to you, great job! print() does something very similar, but much more simple.

8.4 Methods

When you call print(), print() calls a special function, UseMethod():

print
## function (x, ...) 
## UseMethod("print")
## <bytecode: 0x7ffee4c62f80>
## <environment: namespace:base>

UseMethod() examines the class of the input that you provide for the first argument of print(), and then passes all of your arguments to a new function designed to handle that class of input. For example, when you give print() a POSIXct object, UseMethod() will pass all of print()’s arguments to print.POSIXct(). R will then run print.POSIXct() and return the results:

print.POSIXct
## function (x, ...) 
## {
##     max.print <- getOption("max.print", 9999L)
##     if (max.print < length(x)) {
##         print(format(x[seq_len(max.print)], usetz = TRUE), ...)
##         cat(" [ reached getOption(\"max.print\") -- omitted", 
##             length(x) - max.print, "entries ]\n")
##     }
##     else print(format(x, usetz = TRUE), ...)
##     invisible(x)
## }
## <bytecode: 0x7fa948f3d008>
## <environment: namespace:base>

If you give print() a factor object, UseMethod() will pass all of print()’s arguments to print.factor(). R will then run print.factor() and return the results:

print.factor
## function (x, quote = FALSE, max.levels = NULL, width = getOption("width"), 
##     ...) 
## {
##     ord <- is.ordered(x)
##     if (length(x) == 0L) 
##         cat(if (ord) 
##             "ordered"
## ...
##         drop <- n > maxl
##         cat(if (drop) 
##             paste(format(n), ""), T0, paste(if (drop) 
##             c(lev[1L:max(1, maxl - 1)], "...", if (maxl > 1) lev[n])
##         else lev, collapse = colsep), "\n", sep = "")
##     }
##     invisible(x)
## }
## <bytecode: 0x7fa94a64d470>
## <environment: namespace:base>

print.POSIXct() and print.factor() are called methods of print(). By themselves, print.POSIXct() and print.factor() work like regular R functions. However, each was written specifically so UseMethod() could call it to handle a specific class of print() input.

Notice that print.POSIXct() and print.factor() do two different things (also notice that I abridged the middle of print.factor()—it is a long function). This is how print() manages to do different things in different cases. print() calls UseMethod(), which calls a specialized method based on the class of print()’s first argument.

You can see which methods exist for a generic function by calling methods() on the function. For example, print() has almost 200 methods (which gives you an idea of how many classes exist in R):

methods(print)
##   [1] print.acf*                                   
##   [2] print.anova                                  
##   [3] print.aov*                                   
##  ...                      
## [176] print.xgettext*                              
## [177] print.xngettext*                             
## [178] print.xtabs*
##
##   Nonvisible functions are asterisked

This system of generic functions, methods, and class-based dispatch is known as S3 because it originated in the third version of S, the programming language that would evolve into S-PLUS and R. Many common R functions are S3 generics that work with a set of class methods. For example, summary() and head() also call UseMethod(). More basic functions, like c(), +, -, < and others also behave like generic functions, although they call .Primitive() instead of UseMethod().

The S3 system allows R functions to behave in different ways for different classes. You can use S3 to format your slot output. First, give your output its own class. Then write a print method for that class. To do this efficiently, you will need to know a little about how UseMethod() selects a method function to use.

8.4.1 Method Dispatch

UseMethod() uses a very simple system to match methods to functions.

Every S3 method has a two-part name. The first part of the name will refer to the function that the method works with. The second part will refer to the class. These two parts will be separated by a period. So for example, the print method that works with functions will be called print.function(). The summary method that works with matrices will be called summary.matrix(). And so on.

When UseMethod() needs to call a method, it searches for an R function with the correct S3-style name. The function does not have to be special in any way; it just needs to have the correct name.

You can participate in this system by writing your own function and giving it a valid S3-style name. For example, let’s give one_play a class of its own. It doesn’t matter what you call the class; R will store any character string in the class attribute:

class(one_play) <- "slots"

Now let’s write an S3 print method for the +slots+ class. The method doesn’t need to do anything special—it doesn’t even need to print one_play. But it does need to be named print.slots(); otherwise UseMethod() will not find it. The method should also take the same arguments as print(); otherwise, R will give an error when it tries to pass the arguments to print.slots():

args(print)
## function (x, ...) 
## NULL

print.slots <- function(x, ...) {
  cat("I'm using the print.slots method")
}

Does our method work? Yes, and not only that; R uses the print method to display the contents of one_play. This method isn’t very useful, so I’m going to remove it. You’ll have a chance to write a better one in a minute:

print(one_play)
## I'm using the print.slots method

one_play
## I'm using the print.slots method

rm(print.slots)

Some R objects have multiple classes. For example, the output of Sys.time() has two classes. Which class will UseMethod() use to find a print method?

now <- Sys.time()
attributes(now)
## $class
## [1] "POSIXct" "POSIXt"

UseMethod() will first look for a method that matches the first class listed in the object’s class vector. If UseMethod() cannot find one, it will then look for the method that matches the second class (and so on if there are more classes in an object’s class vector).

If you give print() an object whose class or classes do not have a print method, UseMethod() will call print.default(), a special method written to handle general cases.

Let’s use this system to write a better print method for the slot machine output.

Exercise: Make a Print Method

Write a new print method for the slots class. The method should call slot_display() to return well-formatted slot-machine output.

What name must you use for this method?

It is surprisingly easy to write a good print.slots() method because we’ve already done all of the hard work when we wrote slot_display(). For example, the following method will work. Just make sure the method is named print.slots() so UseMethod() can find it, and make sure that it takes the same arguments as print() so UseMethod() can pass those arguments to print.slots() without any trouble:

print.slots <- function(x, ...) {
  slot_display(x)
}

Now R will automatically use slot_display() to display objects of class +slots+ (and only objects of class “slots”):

one_play
## B 0 B
## $0

Let’s ensure that every piece of slot machine output has the slots class.

Exercise: Add a Class

Modify the play() function so it assigns slots to the class attribute of its output:

play <- function() {
  symbols <- get_symbols()
  structure(score(symbols), symbols = symbols)
}

You can set the class attribute of the output at the same time that you set the +symbols+ attribute. Just add class = "slots" to the structure() call:

play <- function() {
  symbols <- get_symbols()
  structure(score(symbols), symbols = symbols, class = "slots")
}

Now each of our slot machine plays will have the class slots:

class(play())
## "slots"

As a result, R will display them in the correct slot-machine format:

play()
## BB BB BBB
## $5

play()
## BB 0 0
## $0

8.5 Classes

You can use the S3 system to make a robust new class of objects in R. Then R will treat objects of your class in a consistent, sensible manner. To make a class:

Choose a name for your class.
Assign each instance of your class a +class+ attribute.
Write class methods for any generic function likely to use objects of your class.

Many R packages are based on classes that have been built in a similar manner. While this work is simple, it may not be easy. For example, consider how many methods exist for predefined classes.

You can call methods() on a class with the class argument, which takes a character string. methods() will return every method written for the class. Notice that methods() will not be able to show you methods that come in an unloaded R package:

methods(class = "factor")
##  [1] [.factor             [[.factor           
##  [3] [[<-.factor          [<-.factor          
##  [5] all.equal.factor     as.character.factor 
##  [7] as.data.frame.factor as.Date.factor      
##  [9] as.list.factor       as.logical.factor   
## [11] as.POSIXlt.factor    as.vector.factor    
## [13] droplevels.factor    format.factor       
## [15] is.na<-.factor       length<-.factor     
## [17] levels<-.factor      Math.factor         
## [19] Ops.factor           plot.factor*        
## [21] print.factor         relevel.factor*     
## [23] relist.factor*       rep.factor          
## [25] summary.factor       Summary.factor      
## [27] xtfrm.factor        
## 
##    Nonvisible functions are asterisked

This output indicates how much work is required to create a robust, well-behaved class. You will usually need to write a class method for every basic R operation.

Consider two challenges that you will face right away. First, R drops attributes (like class) when it combines objects into a vector:

play1 <- play()
play1
## B BBB BBB
## $5

play2 <- play()
play2
## 0 B 0
## $0

c(play1, play2)
## [1] 5 0

Here, R stops using print.slots() to display the vector because the vector c(play1, play2) no longer has a “slots” +class+ attribute.

Next, R will drop the attributes of an object (like class) when you subset the object:

play1[1]
## [1] 5

You can avoid this behavior by writing a c.slots() method and a [.slots method, but then difficulties will quickly accrue. How would you combine the symbols attributes of multiple plays into a vector of symbols attributes? How would you change print.slots() to handle vectors of outputs? These challenges are open for you to explore. However, you will usually not have to attempt this type of large-scale programming as a data scientist.

In our case, it is very handy to let slots objects revert to single prize values when we combine groups of them together into a vector.

8.6 S3 and Debugging

S3 can be annoying if you are trying to understand R functions. It is difficult to tell what a function does if its code body contains a call to UseMethod(). Now that you know that UseMethod() calls a class-specific method, you can search for and examine the method directly. It will be a function whose name follows the <function.class> syntax, or possibly <function.default>. You can also use the methods() function to see what methods are associated with a function or a class.

8.7 S4 and R5

R also contains two other systems that create class specific behavior. These are known as S4 and R5 (or reference classes). Each of these systems is much harder to use than S3, and perhaps as a consequence, more rare. However, they offer safeguards that S3 does not. If you’d like to learn more about these systems, including how to write and use your own generic functions, I recommend the book Advanced R Programming by Hadley Wickham.

8.8 Summary

Values are not the only place to store information in R, and functions are not the only way to create unique behavior. You can also do both of these things with R’s S3 system. The S3 system provides a simple way to create object-specific behavior in R. In other words, it is R’s version of object-oriented programming (OOP). The system is implemented by generic functions. These functions examine the class attribute of their input and call a class-specific method to generate output. Many S3 methods will look for and use additional information that is stored in an object’s attributes. Many common R functions are S3 generics.

R’s S3 system is more helpful for the tasks of computer science than the tasks of data science, but understanding S3 can help you troubleshoot your work in R as a data scientist.

You now know quite a bit about how to write R code that performs custom tasks, but how could you repeat these tasks? As a data scientist, you will often repeat tasks, sometimes thousands or even millions of times. Why? Because repetition lets you simulate results and estimate probabilities. Loops will show you how to automate repetition with R’s for and while functions. You’ll use for to simulate various slot machine plays and to calculate the payout rate of your slot machine.