WHAT IS R AND WHY SHOULD WE USE IT?

The fossil package is built for use with the R statistical language. R owes its origins to the S Language, a project begun at Bell Labs in the 1970s to implement a computational statistical language (Becker et al. 1988). The S Language has also been the basis for another well-known statistical program, S-PLUS. In 1991 Ross Ihaka and Robert Gentleman at the University of Auckland began developing a statistical language for their teaching laboratory, since no adequate commercial solution existed at the time. Their work mimicked many of the styles and methods of S, and eventually this project evolved into the R Language for Statistical Computing (Ihaka and Gentleman 1996). R is distributed as open-source software under the GNU General Public License, meaning that anyone who chooses to use, redistribute or improve the software is free to do so, provided they allow others the same rights (Stallman 1999). The program was originally written for a Macintosh system, but it has since been ported to virtually every computing architecture, both legacy and modern. This makes it an ideal candidate for a statistical system in many modern laboratories, where researchers have their own computers (often more than one), frequently running different operating systems.

Many other statistical programs encourage their users to select their data and choose the analyses to be run manually, with a mouse cursor. At first glance, this is a much simpler way of interacting with the data, but it suffers from a major drawback: analyses of this type are not truly reproducible (Leisch and Rossini 2003; Green 2003). Although descriptions of the statistical procedures used in refereed papers are a must, recording exact mouse clicks and button selections is virtually impossible. R, on the other hand, encourages users to record every step of the process. Most users of R write out their methods of analysis in a text editor of some kind and then run this code in the R environment, so that every step, from analysis to figure creation, is fully documented.
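As a minimal sketch of such a scripted workflow, the following hypothetical script records data preparation, analysis and figure creation in one place. It uses only base R; the data are simulated, and the variable names (depth, richness) and the output file name are assumptions made for illustration, not part of the fossil package.

```r
# analysis.R: a hypothetical script in which every step is recorded as code.

# Fix the random seed so the simulated data are identical on every run.
set.seed(42)

# Step 1: data. Simulated here; a real script would read a data file instead.
depth <- seq(10, 100, by = 10)
richness <- 5 + 0.2 * depth + rnorm(length(depth), sd = 2)
samples <- data.frame(depth = depth, richness = richness)

# Step 2: analysis. Fit a simple linear regression of richness on depth.
fit <- lm(richness ~ depth, data = samples)
print(summary(fit))

# Step 3: figure. Write the plot to a file from code rather than by hand.
# A temporary directory is used here; a real script would use a project path.
figure_file <- file.path(tempdir(), "richness_vs_depth.pdf")
pdf(figure_file, width = 6, height = 4.5)
plot(richness ~ depth, data = samples, xlab = "Depth", ylab = "Richness")
abline(fit)  # overlay the fitted regression line
dev.off()
```

Running the script again, for instance with source("analysis.R"), repeats all three steps in order, so a correction made to an early step carries through to the final figure.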

The deeper benefits of this method may not be obvious. I have personally experienced situations where mistakes made early in the process of data analysis were not found until much later. In a graphical, mouse-driven environment, repeating all the necessary steps is often time-consuming, whereas well-written R code can be easily modified and re-run with minimal fuss. Further, as the program is consistent across platforms, collaborators can run the code on their platform of choice without having to worry whether their version of a program has the same functions available. This benefit also extends to other scientists, who can take other researchers' code and re-run published findings exactly, without having to purchase software of any kind.

What follows is not an in-depth introduction to R; many books have already been written on the subject. For a good start, the original text by Becker et al. (1988) and a more recent text by Braun and Murdoch (2008) are highly recommended. Rather, the focus of this paper is the use of the functions found within the fossil package.
