Appendix B — R Packages

Many of R’s most useful functions do not come preloaded when you start R, but reside in packages that can be installed on top of R. R packages are similar to libraries in C, C++, and Javascript, packages in Python, and gems in Ruby. An R package bundles together useful functions, help files, and data sets. You can use these functions within your own R code once you load the package they live in. Usually the contents of an R package are all related to a single type of task, which the package helps solve. R packages will let you take advantage of R’s most useful features: its large community of package writers (many of whom are active data scientists) and its prewritten routines for handling many common (and exotic) data-science tasks.

Base R

You may hear R users (or me) refer to “base R.” What is base R? It is just the collection of R functions that gets loaded every time you start R. These functions provide the basics of the language, and you don’t have to load a package before you can use them.

B.1 Installing Packages

To use an R package, you must first install it on your computer and then load it in your current R session. The easiest way to install an R package is with the install.packages() R function. Open R and type the following into the command line:

install.packages("<package name>")

This will search for the specified package in the collection of packages hosted on the CRAN site. When R finds the package, it will download it into a libraries folder on your computer. R can access the package here in future R sessions without reinstalling it. Anyone can write an R package and disseminate it as they like; however, almost all R packages are published through the CRAN website. CRAN tests each R package before publishing it. This doesn’t eliminate every bug inside a package, but it does mean that you can trust a package on CRAN to run in the current version of R on your OS.

You can install multiple packages at once by linking their names with R’s concatenate function, c(). For example, to install the ggplot2, reshape2, and dplyr packages, run:

install.packages(c("ggplot2", "reshape2", "dplyr"))

If this is your first time installing a package, R will prompt you to choose an online mirror of to install from. Mirrors are listed by location. Your downloads should be quickest if you select a mirror that is close to you. If you want to download a new package, try the Austria mirror first. This is the main CRAN repository, and new packages can sometimes take a couple of days to make it around to all of the other mirrors.

B.2 Loading Packages

Installing a package doesn’t immediately place its functions at your fingertips. It just places them on your computer. To use an R package, you next have to load it in your R session with the command:

library(<package name>)

Notice that the quotation marks have disappeared. You can use them if you like, but quotation marks are optional for the library() command. (This is not true for the install.packages() command).

library() will make all of the package’s functions, data sets, and help files available to you until you close your current R session. The next time you begin an R session, you’ll have to reload the package with library() if you want to use it, but you won’t have to reinstall it. You only have to install each package once. After that, a copy of the package will live in your R library. To see which packages you currently have in your R library, run:

library()

library() also shows the path to your actual R library, which is the folder that contains your R packages. You may notice many packages that you don’t remember installing. This is because R automatically downloads a set of useful packages when you first install R.

Install packages from (almost) anywhere

The devtools R package makes it easy to install packages from locations other than the CRAN website. devtools provides functions like install_github(), install_gitorious(), install_bitbucket(), and install_url(). These work similar to install.packages(), but they search new locations for R packages. install_github() is especially useful because many R developers provide development versions of their packages on GitHub. The development version of a package will contain a sneak peek of new functions and patches but may not be as stable or as bug free as the CRAN version.

Why does R make you bother with installing and loading packages? You can imagine an R where every package came preloaded, but this would be a very large and slow program. As of May 6, 2014, the CRAN website hosts 5,511 packages. It is simpler to only install and load the packages that you want to use when you want to use them. This keeps your copy of R fast because it has fewer functions and help pages to search through at any one time. The arrangement has other benefits as well. For example, it is possible to update your copy of an R package without updating your entire copy of R.

What’s the best way to learn about R packages?

It is difficult to use an R package if you don’t know that it exists. You could go to the CRAN website and click the Packages link to see a list of available packages, but you’ll have to wade through thousands of them. Moreover, many R packages do the same things.

How do you know which package does them best? The R-packages mailing list is a place to start. It sends out announcements of new packages and maintains an archive of old announcements. Blogs that aggregate posts about R can also provide valuable leads. I recommend R-bloggers. RStudio maintains a list of some of the most useful R packages in the Getting Started section of http://support.rstudio.com. Finally, CRAN groups together some of the most useful—and most respected—packages by subject area. This is an excellent place to learn about the packages designed for your area of work.