INTRODUCTION

Phylogenetic analysis is fundamental to many modern evolutionary studies of fossil taxa. Not only does an understanding of the evolutionary relationships among taxa contribute to our understanding of biotic diversity in general, but phylogenetic trees themselves have also become a useful component of the analysis of many other aspects of biotic diversity and evolution (Harvey and Pagel 1991; Huelsenbeck and Rannala 1997), including morphologic evolution (e.g., Wagner 1996; Stockmeyer Lofgren et al. 2003), taxonomic diversification (e.g., Slowinski and Guyer 1993; Sanderson and Donoghue 1996), and biogeography (e.g., Lieberman 2005; Ree et al. 2005). Needless to say, the accuracy of phylogenetic inferences is critical to the success of these analyses (Wagner 1998, 2000).

Traditional maximum-parsimony cladistic analysis (henceforth "cladistics") seeks the set of relationships (summarized using a branching diagram or cladogram) that minimizes the number of ad hoc hypotheses of character evolution (e.g., Farris 1983). These ad hoc explanations are in the form of homoplasy (i.e., convergence, parallelism, or reversal), in which characters exhibit more than the minimum number of state changes required given the distribution of character states among taxa. The total number of these ad hoc hypotheses becomes the currency by which cladograms may be evaluated, and the number of hypotheses of homoplasy beyond the hypothetical minimum (i.e., zero, in the case of no homoplasy) has been referred to as the total parsimony debt (Fisher 1982, 1992). In practice, cladistics employs simple algorithms to calculate the minimum parsimony debt for a given data set of characters on a particular cladogram. This minimum can be compared among competing cladograms for the same set of taxa, and the cladogram (or cladograms) exhibiting the minimum debt is chosen as optimal.
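As an illustration of the kind of simple algorithm referred to above, the following is a minimal sketch of the Fitch procedure for counting the minimum number of state changes of a single unordered character on a fully bifurcating cladogram. The tree encoding (nested tuples of taxon names), the function names, and the example states are assumptions made for this illustration and are not taken from any particular program. The character debt is computed here as the number of changes beyond the minimum possible for the character (its number of observed states minus one).

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a taxon name (leaf) or a pair of subtrees (internal node).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Return (candidate state set at this node, minimum changes below it)."""
    if isinstance(tree, str):
        # Leaf: the observed state, with no changes required.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        # Descendants can agree on a state: no extra change at this node.
        return shared, left_steps + right_steps
    # Descendants disagree: one additional change is required.
    return left_set | right_set, left_steps + right_steps + 1


def character_debt(tree: Tree, states: Dict[str, int]) -> int:
    """Changes beyond the minimum possible for this character.

    Assumes `states` contains exactly the taxa that appear in `tree`.
    """
    _, steps = fitch_steps(tree, states)
    minimum = len(set(states.values())) - 1
    return steps - minimum


# Hypothetical four-taxon example with one binary character.
example_states = {"A": 0, "B": 1, "C": 0, "D": 1}
tree_1 = (("A", "B"), ("C", "D"))  # requires two changes: debt of 1
tree_2 = (("A", "C"), ("B", "D"))  # requires one change: debt of 0
```

Under this sketch, tree_2 would be preferred over tree_1 for this character, because it requires no hypothesis of homoplasy. A full analysis would sum the debt over all characters and would need to handle polytomies, ordered characters, and missing data, none of which are covered here.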

Stratocladistics (Fisher 1992, 1994; Clyde and Fisher 1997; Fox et al. 1999; Bodenbender and Fisher 2001) was developed as an extension of cladistics to make use of the temporal order of taxa in the fossil record as data in phylogenetic analysis. It simultaneously considers both character data and stratigraphic data, the latter in the form of the intervals of first and last appearance of the taxa to be analyzed. Stratocladistics subscribes to the same philosophy as cladistics; both seek the phylogenetic hypothesis requiring the minimum number of ad hoc hypotheses. As in cladistics, ad hoc hypotheses include homoplasy, but stratocladistics also considers instances of nonpreservation of particular taxa during intervals in which other taxa are preserved (henceforth "gaps") to be ad hoc explanations. In this way, the total parsimony debt incurred by a tree is the sum of its character debt and stratigraphic debt.

Phylogenetic hypotheses (i.e., trees) make predictions about the distribution of taxa in the fossil record (e.g., Novacek and Norell 1982; Smith 1988; Norell 1992, 1993). For example, taxa that originate early in a clade's history should be found relatively early in the fossil record. If not, then a hypothesis of nonpreservation must be stated or implied to account for a taxon's absence from the early part of the record. Stratocladistics treats any stratigraphic interval in which a lineage is expected to be present, given the topology of the phylogenetic tree, but is not in fact observed, as evidence against that phylogenetic hypothesis. In such a case, the implied gap is a result of the (perhaps provisional) acceptance of the tree over other trees that may not imply the same gap. The tree (or trees) that simultaneously minimizes ad hoc hypotheses of nonpreservation, as well as those of character homoplasy, is considered optimal (Fisher 1992). See Bodenbender and Fisher (2001) for a more extensive description of stratocladistic analysis.
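The optimality criterion described above can be sketched in a few lines. The example below assumes that the character debt and stratigraphic debt of each candidate tree have already been computed by some other means; the class name, function name, tree labels, and debt values are all hypothetical and serve only to show how the two kinds of debt are summed and compared.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScoredTree:
    """A candidate tree with its two debt components (hypothetical values)."""
    label: str
    character_debt: int      # ad hoc hypotheses of homoplasy
    stratigraphic_debt: int  # ad hoc hypotheses of nonpreservation (gaps)

    @property
    def total_debt(self) -> int:
        # Total parsimony debt is the sum of the two components.
        return self.character_debt + self.stratigraphic_debt


def optimal_trees(candidates: List[ScoredTree]) -> List[ScoredTree]:
    """Return every candidate that ties for the minimum total debt."""
    best = min(tree.total_debt for tree in candidates)
    return [tree for tree in candidates if tree.total_debt == best]


# Hypothetical candidates: tree_b has more homoplasy than tree_a but implies
# fewer gaps, so it has the lower total debt.
candidates = [
    ScoredTree("tree_a", character_debt=4, stratigraphic_debt=5),
    ScoredTree("tree_b", character_debt=5, stratigraphic_debt=2),
    ScoredTree("tree_c", character_debt=6, stratigraphic_debt=3),
]
best = optimal_trees(candidates)
```

In this made-up example a purely cladistic analysis would prefer tree_a, which has the lowest character debt, whereas the combined criterion prefers tree_b. The function returns a list because more than one tree may tie for the minimum.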

Stratocladistics differs from traditional cladistics in another important way: it allows taxa to be designated as ancestral to other taxa if that arrangement reduces the instances of nonpreservation without invoking more instances of homoplasy. As a result, the trees it produces are phylogenetic trees that describe the ancestor-descendant relationships among taxa, whereas cladograms from cladistic analysis are only diagrams that depict recency of common ancestry and sister group relationships (see Hull 1979), or "hierarchies founded on homology hypotheses" (Brochu et al. 2001, p. 174). These evolutionary trees offer more specific hypotheses of the evolutionary relationships among taxa and are therefore more easily refuted with additional data (Fox et al. 1999). Foote (1996) suggests that the incidence of ancestors in the fossil record is not negligible; under reasonable models of evolution and preservation, at least 1-10% of known fossil taxa are likely to be direct ancestors. Therefore, this distinction between cladograms and evolutionary trees that explicitly include ancestral taxa has important implications for those studies that use either type of tree in the analysis of biologic diversity or character evolution. For example, Wagner (2000) demonstrated that failure to recognize ancestral taxa properly can mislead metrics intended to measure the quality of the fossil record from model phylogenies. Lane et al. (2005) have likewise demonstrated how the misidentification of ancestral taxa as sister taxa, as certainly happens with an unknown frequency in cladistic analyses, can lead to overestimates of past taxonomic richness. Stratocladistics is currently one of the few phylogenetic methods that can operationally identify ancestral taxa and thus holds considerable promise for our understanding of the evolutionary history of fossil organisms.

Despite the potential promise of stratocladistics, some issues have been raised that are yet unresolved (e.g., Nelson 1978; Smith 2000; Sumrall and Brochu 2003). Criticisms of the use of stratigraphic data in phylogenetic analysis are typically philosophical or theoretical in nature. Examples include whether stratigraphic data constitute phylogenetic or nonphylogenetic data (Sumrall and Brochu 2003) or positive or negative evidence (Heyning and Thacker 1999). However, others argue that any data about which a hypothesis makes predictions are appropriate for testing that hypothesis, and that we cannot limit analysis to only data that support hypotheses, or only those that contradict them (Wagner 2000).

Despite this debate, the general use and evaluation of stratocladistic methods have been hampered by the lack of an efficient, automated means of performing analyses (Fisher 1992; Fox et al. 1999; Bodenbender and Fisher 2001; Fisher and Bodenbender 2003; Sumrall and Brochu 2003). PAUP* (Swofford 2002) is the most widely used computer program for cladistic analysis, but it does not support stratigraphic data, nor does it search for optimal assignments of ancestors. MacClade (Maddison and Maddison 2005) supports stratigraphic data and can be used to perform searches for optimal assignments of ancestors on single trees, but its branch swapping capabilities are limited and do not include simultaneous searches for optimal assignments of ancestors. To date, published stratocladistic searches (Fox et al. 1999; Bloch et al. 2001; Bodenbender and Fisher 2001; Geisler and Uhen 2005) have been restricted to piecemeal analyses that involve iterations of traditional cladistic analysis using PAUP*, followed by additional manual branch swapping and ancestor assignment in MacClade on the resulting trees (with the stratigraphic character included) in search of more optimal trees (Fisher 1992). In this approach, the critical step of searching for optimal ancestor assignments is separated from that of branch swapping.

Although heuristic search strategies are never guaranteed to consider all possible solutions exhaustively, this manual search strategy is considerably limited because these components of the search are decoupled operationally, and it is quite likely that the optimal topology and assignment of ancestors might not be visited at all. Furthermore, the order in which taxa are assigned as ancestors can affect the debt incurred or saved by assignments of subsequent taxa as ancestors. Therefore, to find the optimal set of taxa assigned as ancestors, more than one sequence of taxon assignments should be performed. The number of possible sequences can be quite large when the ingroup includes enough taxa to be a computationally intensive phylogenetic problem, which is increasingly typical for analyses including fossil taxa. For such problems, if trees are evaluated one ancestral assignment at a time, the number of sequences that can reasonably be searched by hand is necessarily small. The vast number of possible combinations of trees and ancestral assignments renders the manual search inexact and impractical for even small datasets, and an automated search is necessary.
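To give a rough sense of scale, the sketch below counts the ordered sequences in which taxa could be assigned as ancestors on a single fixed topology. It rests on a deliberately loose assumption, made only for illustration: that any taxon may be assigned as an ancestor at any point in a sequence. In practice many assignments would be excluded (for example, by stratigraphic order), so the result is best read as an upper bound. The function name and its parameters are hypothetical.

```python
from math import factorial
from typing import Optional


def ordered_assignment_sequences(n_taxa: int,
                                 max_ancestors: Optional[int] = None) -> int:
    """Count ordered sequences of 1..max_ancestors distinct taxa.

    Upper bound only: assumes every taxon is a candidate ancestor at
    every step, which real data would rarely allow.
    """
    if max_ancestors is None:
        max_ancestors = n_taxa
    longest = min(max_ancestors, n_taxa)
    # The number of ordered sequences of length k drawn from n taxa
    # is n! / (n - k)!.
    return sum(factorial(n_taxa) // factorial(n_taxa - k)
               for k in range(1, longest + 1))


three_taxa = ordered_assignment_sequences(3)    # 3 + 6 + 6 = 15
ten_taxa = ordered_assignment_sequences(10)     # 9,864,100
```

Even with only ten taxa and a single topology, this bound approaches ten million sequences, and the count must be multiplied across every topology visited during branch swapping.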

Here, we present a new computer program entitled StrataPhy to perform full stratocladistic searches. StrataPhy is available from Appendix I or at the author's site. We begin by describing the algorithm used to perform these searches, then describe how StrataPhy can be used in conjunction with other phylogenetic software to produce files for analysis, and finally, discuss analysis parameters in StrataPhy that are modifiable by the user in the current version. Future releases of StrataPhy will include additional features, which will be discussed briefly in the conclusions of this paper.