@allenaven
Last active November 7, 2022 16:47
trying to build an s4 class that I need

Guide to using S4 classes in R

What is an "S4 class"?

S4 is one of R's formal systems for object-oriented programming. You define an S4 class based on the data you want it to hold, and you can then create many instances (objects) of that class that hold different data but all adhere to the same specification and can take advantage of the same functions.

The concepts of class and method are crucial to understanding object-oriented programming:

  • class defines the behavior of objects in two ways:
    • Describes their attributes
    • Describes their relationships to other classes
  • methods are the class-specific implementations of a generic function, so the same call behaves differently depending on the class of its input. For example, plot() does different things depending on whether you tell it to plot a raster, a time series, etc.

S4 in particular is a system that features a few formalities compared to other classes in R:

  • Formal class definitions describing representation and inheritance for each class
  • Special helper functions that define generics and methods
  • Multiple dispatch: the method that runs can be chosen based on the classes of more than one argument, not just the first
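
To make multiple dispatch concrete, here is a minimal sketch. The classes Cat and Dog and the generic meet are hypothetical names invented for this illustration; the point is only that the method is picked from the classes of both arguments.

```r
library(methods)  # attached by default in interactive R; explicit for Rscript

# Hypothetical classes, used only for illustration
setClass('Cat', slots = c(name = 'character'))
setClass('Dog', slots = c(name = 'character'))

# A generic with two arguments; both take part in dispatch
setGeneric('meet', function(a, b) standardGeneric('meet'))

setMethod('meet', signature(a = 'Cat', b = 'Dog'),
  function(a, b) paste(a@name, 'hisses at', b@name))
setMethod('meet', signature(a = 'Dog', b = 'Cat'),
  function(a, b) paste(a@name, 'chases', b@name))

tom <- new('Cat', name = 'Tom')
rex <- new('Dog', name = 'Rex')

res_cat_dog <- meet(tom, rex)  # uses the (Cat, Dog) method
res_dog_cat <- meet(rex, tom)  # uses the (Dog, Cat) method
```

Swapping the argument order selects a different method. A call like meet(tom, tom) would fail here, because no method was defined for the (Cat, Cat) combination.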

S4 classes are often overkill for what R programmers need to do, but they are good for "more complicated systems of interrelated objects".

Basics of S4

The core components of an S4 implementation are

  • classes
  • generic functions
  • methods

And those things are "glued" together by method dispatch (of the "simple" and the "multiple" varieties).

Classes and instances

  • Create and define an S4 class using setClass()
    • This ensures consistency by requiring every slot of the new class to be declared with a name and a class
    • Subsequent assignments to slots are checked to make sure the new value's class agrees with the slot's declared class
  • Create objects of an S4 class using the constructor function new(), or by calling the generator function that setClass() returns
  • Inspect an object's structure with str()
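
A minimal sketch of those steps, assuming a small Person class with a name and an age slot (the slot values are made up for illustration):

```r
library(methods)

# setClass() registers the class and invisibly returns a generator function
Person <- setClass('Person', slots = c(name = 'character', age = 'numeric'))

# Two ways to create an instance
p1 <- new('Person', name = 'Ada', age = 36)
p2 <- Person(name = 'Ada', age = 36)

# Slot assignment is type-checked: a character value is rejected for a numeric slot
bad_assign <- tryCatch({ p1@age <- 'thirty-six'; 'accepted' },
                       error = function(e) 'rejected')

str(p1)             # compact view of the object's slots
is_s4 <- isS4(p1)   # TRUE for any S4 object
```

The failed assignment leaves p1 untouched, so the object never ends up holding a slot of the wrong type.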

A class has three key properties:

  1. name: a string that identifies the class
  2. contains: a character vector of classes that your S4 class inherits from. A class that contains another class gets all of the parent's slots, and methods written for the parent also work on the child.
    • To inherit S4 classes from S3 classes you must use setOldClass()
    • Basic types like numeric and character can be used as slot classes without contains. You only need contains = 'numeric' when the object itself should behave like a numeric vector.
  3. slots: a named character vector (or list) giving the slot names and classes; this replaces the deprecated representation argument. For example,
slots = c(name = 'character', age = 'numeric')
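
The following sketch shows both uses of contains described above. The Celsius class, the employer slot, and all of the values are hypothetical and only serve the illustration.

```r
library(methods)

setClass('Person', slots = c(name = 'character', age = 'numeric'))
# Employee inherits the name and age slots from Person and adds one of its own
setClass('Employee', contains = 'Person', slots = c(employer = 'character'))

e <- new('Employee', name = 'Ada', age = 36, employer = 'Analytical Engines Ltd')
inherits_person <- is(e, 'Person')   # an Employee is also a Person

# A class can also extend a basic type; the data then live in the .Data part
setClass('Celsius', contains = 'numeric', slots = c(station = 'character'))
temps <- new('Celsius', c(12.5, 14.0), station = 'hypothetical-site-1')
n_temps <- length(temps)             # behaves like a numeric vector of length 2
raw_values <- temps@.Data            # the underlying numeric data
```

In Person, numeric is only the class of a slot, so no contains is needed. In Celsius, the object itself is meant to be a numeric vector with an extra label attached, which is what contains = 'numeric' expresses.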

Hints for accessing and using slots within a class

  • Use getSlots() to return a description of all slots of a class
  • Use the slot(object, 'slotname') function to access a slot of an S4 object
  • Alternatively, use the @ in place of $ to access S4 object slots, like so:
# First create the classes:
setClass('Person', slots = c(name = 'character', age = 'numeric'))
setClass('Employee', slots = c(boss = 'Person'), contains = 'Person')
# Then create an object:
hadley <- new('Person', name = 'Hadley')
# age was not supplied, so this returns an empty numeric vector
hadley@age
  • But it's considered a poor idea to use @ in place of a properly written accessor
    • This speaks to the need to write proper accessors: it is on the programmer to write methods for every possible user need
    • The developer shouldn't expect the user to directly access slots
    • The user shouldn't expect to have to access slots directly
  • Thus when I write an S4 class I need to have read and write accessor methods for each slot, and the write accessor should also do validity checking
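
As a rough sketch of what such accessors might look like for the Person class, here is a read accessor and a write accessor for the age slot. The generic name age and the specific check (a single non-negative number) are assumptions chosen for the example.

```r
library(methods)

setClass('Person', slots = c(name = 'character', age = 'numeric'))

# Read accessor ("getter")
setGeneric('age', function(x) standardGeneric('age'))
setMethod('age', 'Person', function(x) x@age)

# Write accessor ("setter"), defined as a replacement function
setGeneric('age<-', function(x, value) standardGeneric('age<-'))
setMethod('age<-', 'Person', function(x, value) {
  # Illustrative validity rule: a single non-negative number
  if (!is.numeric(value) || length(value) != 1 || value < 0) {
    stop('age must be a single non-negative number')
  }
  x@age <- value
  x
})

p <- new('Person', name = 'Ada', age = 36)
age(p) <- age(p) + 1   # getter and setter used together
rejected <- tryCatch({ age(p) <- -5; FALSE }, error = function(e) TRUE)
```

With these in place the user never has to touch @ directly, and a bad value is refused before it reaches the slot.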

Hints for accessing and using methods within a class

  • Use showMethods('functionname') or showMethods(class = 'classname') to find out what methods are defined for a function or class, respectively
  • Define a new method for use with your new class:
    • First reserve the name for your method using setGeneric
    • Then use setMethod like so:
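# Reserve the name with a generic first (argument names must match the method)
setGeneric('age_squared', function(the_person) standardGeneric('age_squared'))
setMethod('age_squared', signature(the_person = 'Person'),
	function(the_person) {
		# Person has no agesquared slot, so return the value instead of storing it
		the_person@age * the_person@age
	}
)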
  • Or if you want to use an existing generic (such as show or plot) with your new class, you can skip setGeneric and go straight to the setMethod step
  • "getter" and "setter" methods often share a name, with the setter defined as a replacement function (width<-): width(ir) <- width(ir) - 5
  • A "constructor" function can have the same name as the class and fill in all the slots from its arguments. This makes it easy to document how objects of the class are created. It also explains why something like ?lme4::lmer returns several different suggestions corresponding to the function and the corresponding class(es). So when you think about it, a call to lmer(), which I think of as creating a model fit, is really acting as a constructor for an S4 object.
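
A small sketch tying two of these points together: a constructor function that shares the class name, and a method attached to the existing show generic without any setGeneric call. The argument defaults and the printed layout are illustrative choices, not a convention from any package.

```r
library(methods)

setClass('Person', slots = c(name = 'character', age = 'numeric'))

# Constructor function sharing the class name; arguments and defaults are illustrative
Person <- function(name, age = NA_real_) {
  stopifnot(is.character(name), length(name) == 1)
  new('Person', name = name, age = as.numeric(age))
}

# 'show' already exists as a generic, so setMethod() alone is enough
setMethod('show', 'Person', function(object) {
  cat('<Person>', object@name, 'age', object@age, '\n')
})

ada <- Person('Ada', 36)
shown <- capture.output(show(ada))          # what printing the object produces
has_show <- existsMethod('show', 'Person')  # confirm the method was registered
```

Because users call Person() rather than new(), the help page for Person can document both the class and how to build it in one place.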

Defining and checking validity

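Use the setValidity() function to define validity checks. The default initialize() method calls validObject() whenever slots are supplied to new(), so those checks run automatically when an object is created. Direct slot assignment with @ only checks the slot's type, so call validObject() yourself after modifying an object.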
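
A minimal sketch of a validity check on the Person class. The rule itself (age must not be negative) is an assumption made up for the example.

```r
library(methods)

setClass('Person', slots = c(name = 'character', age = 'numeric'))

# The validity function returns TRUE or a character vector describing the problems
setValidity('Person', function(object) {
  if (length(object@age) == 1 && !is.na(object@age) && object@age < 0) {
    'age must be non-negative'
  } else {
    TRUE
  }
})

# new() runs the validity check when slots are supplied
creation_failed <- tryCatch({ new('Person', name = 'Ada', age = -1); FALSE },
                            error = function(e) TRUE)

# Direct slot assignment checks only the slot's type, not validity
p <- new('Person', name = 'Ada', age = 36)
p@age <- -1
caught_later <- tryCatch({ validObject(p); FALSE }, error = function(e) TRUE)
```

The second half shows why write accessors matter: the invalid age slips in through @ and is only caught once validObject() is called explicitly.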

References

What can an S4 class do for me?

First, why use a "class" in the first place? I feel like object oriented programming is a concept that is taught to beginners in other popular languages like Python and Java, but in R the OO paradigm falls by the wayside. This may be a result of R's user base being less CS-oriented than the average programmer, or maybe R users just don't write big enough scripts/programs to really see a benefit from a full embrace of the OO workflow. Whatever the case, using OO programming (including custom classes for your objects) will allow you to precisely define and encapsulate data representation and behavior of cohesive "things". The canonical example is to define an "employee" class that has attributes "salary", "name", "emailaddress" and possibly methods like "promotion" that changes "salary", that type of thing.

Is it possible to write scripted code to accomplish all of that without invoking a new class? Absolutely, but it's wasteful and error-prone to do that more than once. Using OO methods introduces consistency that is hard to get otherwise. I've tried to do something similar before, in an ignorantly hacky sort of way, by using a list of lists to hold instances of objects that share common metadata categories. For example, I might have a number of "sites", all of which have an associated lon/lat, name, several types of equipment deployed there, dates of deployment, datalogger records, and so forth. This is a ham-handed approach to using a "class" (and should really be in a relational database). Contrast the list of lists that I just described with an S4 class: I can define my own way to print out info for each object, rather than rely on a complex query that I have to retype each time I want to see some info buried deep within the list.

There are a few different kinds of classes available to you to use in R. The most common of those include S3 classes and S4 classes. As to which of those to use, that decision should be based on your intended application. This SO thread speaks to that question. To my layman's understanding, the biggest user-facing differences between those two classes are:

  • Whereas S3 classes are easy to implement (you simply set the "class" attribute of any object to a string of your choosing), S4 classes are more flexible and enforce more rigor than S3 classes in terms of making sure that an object is what it is supposed to be (or says it is)
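  • S3 methods are identified purely by function name (generic.class, as in print.data.frame), which can cause unexpected behavior when an unrelated function happens to follow that naming pattern. S4 methods are registered explicitly with setMethod(), and "multiple dispatch" lets S4 choose a method based on the classes of several arguments rather than only the first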
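
The first difference can be sketched with the "sites" idea from earlier. The class names site and Site and the slot layout are hypothetical; the sketch only contrasts how much checking each system does.

```r
library(methods)

# S3: the class is just an attribute, so nothing stops a malformed object
s3_site <- structure(list(name = 'site A', lat = 'not a number'), class = 'site')
print.site <- function(x, ...) cat('Site:', x$name, '\n')  # method found by its name

# S4: slots are declared up front and checked when the object is built
setClass('Site', slots = c(name = 'character', lat = 'numeric'))
s4_rejected <- tryCatch({ new('Site', name = 'site A', lat = 'not a number'); FALSE },
                        error = function(e) TRUE)

s4_site <- new('Site', name = 'site A', lat = 45.2)
```

The S3 object is accepted even though its latitude is a string, while the equivalent S4 object is refused at creation.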

Those differences are also outlined by Doug Bates in his R News article "Converting Packages to S4" and in this excellent presentation that is a must-read. And Norman Matloff, in The Art of R Programming, pitches the difference between S3 and S4 classes as a convenience vs. safety tradeoff. There are differences of opinion on that.

Bear in mind that some of the specifications for S4 classes changed with R version 3.0 and so some of the older documentation or web references may be out of date.

# Working on an S4 class to hold and manipulate animal movement data
# with the "net squared displacement" method
library(lubridate)
library(dplyr)
library(aar)
data(drifter)
### Define the class
ind_yr_dataset <- setClass(
  # Set the class name:
  Class = 'ind_yr_dataset',
  # Define the slots. Don't use "representation" here, it's deprecated!
  slots = c(
    animalid = 'character',
    yr = 'numeric',
    abs_start_date = 'POSIXct',
    loc_data = 'data.frame'
  ),
  # Set default slot values, if desired:
  prototype = prototype(
    # The loc data should be: date = POSIXct, rel_day = numeric, lon/lat = numeric
    abs_start_date = ymd('1990-01-01', tz = 'UTC'),
    loc_data = data.frame(date = NA, rel_day = NA, lon = NA, lat = NA)
  )
)
# Look at the slots of the class I just created:
getSlots('ind_yr_dataset')
# Create a new object of the class I just created
tma001_1 <- new('ind_yr_dataset')
tma001_1@loc_data <- drifter
# Alternatively, supply slot values directly in the call to new():
tma001_2 <- new('ind_yr_dataset', animalid = 'weenis', abs_start_date = ymd('2002-05-09', tz = 'UTC'),
                loc_data = drifter)
# Set a "plot" method that will work when I call `plot(tma001_1)`
# Note that I have to match the arguments in the definition to the
# arguments taken by the "plot" generic, plot(x, y, ...)! (took me awhile to get that)
setMethod('plot', signature = 'ind_yr_dataset',
  definition = function(x, y, ...) {
    # Column names are assumed to follow the prototype above (lon/lat)
    lon <- x@loc_data$lon
    lat <- x@loc_data$lat
    plot(lon, lat, las = 1, pch = 16, asp = 1, ...)
  })
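# "netdisp" has to exist as a generic before a method can be attached to it
setGeneric('netdisp', function(the_data, ...) standardGeneric('netdisp'))
setMethod('netdisp', signature(the_data = 'ind_yr_dataset'),
  function(the_data, ...) {
    # Placeholder calculation, not the real net squared displacement
    the_data@loc_data$nsd <- the_data@loc_data$lat + 3
    return(the_data)
  }
)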
# Set a "show" method, which is used whenever an object is printed
setMethod('show', 'ind_yr_dataset',
  function(object) {
    cat('ID:', object@animalid, '\n')
    cat('Begin date:', format(object@abs_start_date), '\n')
  })
# Really though, how do I define a way to put appropriate data into the new class?
# And then to do work on it after all the data are in?
# And hold results of that work, like model comparisons?