R Tools for Paleontology:

Multivariate analyses have become an increasing focus of many palaeontological research programs, especially with the development over the past decade of large datasets (e.g., Paleobiology Database; Carrano 2000; Alroy et al. 2001; Carrasco et al. 2005) and readily available computing power. A variety of statistical programs and software packages have been developed and used by and for palaeontologists, ecologists and evolutionary biologists as these massive datasets have become more commonplace (e.g., Hammer et al. 2001; Colwell 2009; Harrison and Larsson 2008; Maddison and Maddison 2009).

Large databases necessarily involve large numbers of collaborators, which can lead to heterogeneous and incompatible computing platforms and file formats. Despite the large number of freely available programs, few are truly cross-platform. One statistical environment that has gained recognition over the last decade for its ability to perform intensive statistical analyses is the R statistical language (R Development Core Team 2010; Ezard and Purvis 2009). R is cross-platform, freely available (open source) and has an extensive user and contributor base. While the base installation can perform many common statistical procedures, it is easily extended through packages, such as those for phylogenetic analysis (Paradis et al. 2004), time series analysis (Hunt 2008) and palaeobiological phylogenies (Ezard and Purvis 2009). These packages are available through a central repository called the Comprehensive R Archive Network, or CRAN. Additionally, data from virtually any source can be used, from plain text and Microsoft Excel tables to images and GIS shapefiles, and graphs and figures can be output in virtually any format. This flexibility and availability have made R a growing success in the field of statistics and database analysis.

Here I present a new package that adds a selection of ecological and geographic analysis tools to the base R environment. The package was originally developed with palaeontologists in mind and is appropriately named fossil. As of this writing, it is at version 0.3.2, and although additions to the code are planned, the functions already present allow a large number of analyses to be performed.

The reasons for developing fossil are manifold. The underlying impetus was to create a single package for examining large datasets with up-to-date methods for biodiversity estimation and ecological pattern recognition that can be used in conjunction with geographic data over long time scales. Macroecological analysis is a growing area, and palaeontologists now have a real opportunity to answer modern questions about biodiversity distributions, thanks in large part to the deep time of the fossil record. By providing powerful tools that integrate well, we can spend more time on the questions rather than the methods.

A number of the functions implemented in fossil can also be found in the excellent package vegan (Oksanen et al. 2010). Many of the species diversity and species estimator functions are implemented in both packages. However, fossil was written to cover a number of use cases that vegan did not. Initially, the primary need was a way to estimate species diversity using a number of estimators all at once. In addition, the vegan function for creating distance matrices with user-defined measures was at the time more difficult to use, so I have tried to implement a more easily extensible method. The fossil package also implements a number of spatial analysis and export tools that are not found within vegan, such as methods to calculate geographic distances and areas from a set of points.
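To make the idea of a geographic distance between localities concrete, the following is a minimal base R sketch of a great-circle distance using the haversine formula. It is not the code used in fossil: the function name haversine_km, its arguments and the spherical Earth radius of 6371 km are assumptions made for illustration only.

```r
# Great-circle distance in kilometres between two points given in decimal
# degrees, using the haversine formula. The Earth is treated as a sphere of
# radius 6371 km, which is an assumption made for this illustration.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing the value just above 1
  2 * radius_km * asin(sqrt(pmin(1, a)))
}

# Two hypothetical localities on the equator, 90 degrees of longitude apart:
# the result is a quarter of the sphere's circumference.
quarter <- haversine_km(0, 0, 0, 90)
print(quarter)
```

Because the function is vectorised over its arguments, it could be applied to whole columns of locality coordinates at once.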

For example, the fossil record, while accurate, is by no means complete (Benton et al. 2000) yet can still provide important information on biogeographic patterns. Using fossil, we can compare sparse ecological data with a number of ecological similarity indices (e.g., Chao-Jaccard, Chao-Sorenson, Simpson) and then observe the patterns of connectivity using various types of neighbour joining techniques. These patterns can then be visualised in ecological space, using ordinations to group similar sites, and in geographic space, placing localities on a map and observing how this ecological connectivity relates to geography. Combining spatial, ecological and temporal data provides a more complete picture of the evolution of the biosphere than any one factor alone.
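As an illustration of what a similarity index measures, the sketch below computes two of the simpler indices, Jaccard and Simpson, by hand in base R for a pair of presence/absence vectors. The sites, taxa and function names (jaccard_sim, simpson_sim) are hypothetical, and the code is a plain restatement of the standard definitions rather than the implementation used in fossil. The Chao variants, which correct for unseen shared species, are not shown.

```r
# Presence (1) or absence (0) of five hypothetical taxa at two hypothetical sites.
site_a <- c(taxon1 = 1, taxon2 = 1, taxon3 = 1, taxon4 = 0, taxon5 = 0)
site_b <- c(taxon1 = 1, taxon2 = 1, taxon3 = 0, taxon4 = 1, taxon5 = 1)

# Jaccard similarity: shared taxa divided by taxa present at either site.
jaccard_sim <- function(x, y) {
  shared <- sum(x > 0 & y > 0)
  either <- sum(x > 0 | y > 0)
  shared / either
}

# Simpson similarity: shared taxa divided by the richness of the poorer site,
# which makes it less sensitive to differences in sampling between sites.
simpson_sim <- function(x, y) {
  shared <- sum(x > 0 & y > 0)
  shared / min(sum(x > 0), sum(y > 0))
}

print(jaccard_sim(site_a, site_b))  # 2 shared / 5 in total = 0.4
print(simpson_sim(site_a, site_b))  # 2 shared / 3 at the poorer site
```

Here the two sites share two taxa out of five, so the Jaccard value is 0.4, while the Simpson value is higher (two thirds) because it is scaled by the smaller of the two faunas.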

