DISCUSSION

The CLAMP method can be criticized at two levels: data collection and data analysis. This paper is concerned primarily with improving the methods of analysis, but there are a few problems with the analyses that are based in the data collection.

First, the raw (by-species) scores have seldom or never been published, except in doctoral dissertations (Stranks 1996, Kennedy 1998). This means that some of the most important and interesting questions about the phylogenetic distribution of leaf morphological variables and the differences among plants of different growth form (habit) cannot be asked. Some recent studies, such as that by Kennedy et al. (2002), have not even published the CLAMP scores averaged by flora, but have only printed biplots of eigenvector loadings. This form of presentation is so highly processed and incorporates so many assumptions that it makes interpretation of the results difficult and reanalysis of the data impossible.

Second, Wolfe's (1993) selection of characters was explicitly based on preliminary eigenvector analyses, for instance:

During one stage of the study, I expanded the character set to include about 20 character states additional to [the original 29]. Judging from eigenvalues and percent of total variance accounted for, these characters either added nothing or even lowered both eigenvalues and percent variance.
Wolfe 1993, p. 20.

To reject potentially interesting variables from a coding scheme on the basis of low eigenvalues and percent of variance accounted for is to allow the statistical horse to bolt: you proceed quickly, but have little control over the direction you are traveling. Among the character states rejected were those relating to compound leaves, spinose teeth, and inrolled or thickened margins, all of which have clear mechanical adaptive significance. If we hope to obtain ecological or environmental data from leaf physiognomy as well as information about climate, such character states should be retained.

The same criticism goes for lumping together characters like "teeth round" and "teeth appressed" merely because "combining the states produced both higher eigenvalues and percent variance" (Wolfe 1993, p. 24).

Third, the descriptions of some of the character states seem ambiguous. Although it is not possible without a comparative study to say for certain that interpretations of the character descriptions would vary, it is not clear whether "0.25 if the teeth are both regular and irregular and some leaves have teeth" (Wolfe 1993, p. 24) should be interpreted as "0.25 if the teeth are regular and/or irregular and some leaves have teeth" or as "0.25 if the teeth are all regular or all irregular and some leaves have teeth."

In the case of deeply lobed leaves, the leaf size is supposed to be scored from a single lobe, but the aspect ratio and shape still refer to the overall leaf, whereas in the case of a compound leaf the leaflet is what is scored for size, aspect ratio, and shape. This is particularly problematic in genera like Rosa in which a plant can have compound, deeply lobed, and simple leaves on the same branch.

Fourth, the scored variables are divided into sections relating to common topics. Leaf size, for instance, is coded as proportions of leaves falling into nine size classes. The scores in some of these sections, like leaf size, aspect ratio, or shape, must sum to one, while the scores in the section describing teeth and lobation do not have a constant sum. This means that the presence of teeth is implicitly weighted more heavily in the overall description of the flora than, for instance, the leaf size, and it is not clear that any normalization procedure can correct this bias. The restriction of groups of variables to a constant sum introduces dependence and implicit weighting that are hidden by eigenvector analyses but made apparent by a graphical approach (see Figure 5).
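The dependence introduced by a constant-sum constraint can be illustrated with a small simulation. The following is a minimal sketch and not part of the CLAMP procedure: it assumes a hypothetical section of three score classes whose raw values are drawn independently from a uniform distribution, and then rescales each flora's scores so that they sum to one. The function names, the number of classes, and the choice of distribution are all assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_closed_scores(n_floras=2000, n_classes=3, seed=0):
    """Draw independent raw scores, then force each row to sum to one.

    Hypothetical setup: n_classes stands in for one constant-sum section
    (e.g. a set of size classes); the values are synthetic, not CLAMP data.
    """
    rng = random.Random(seed)
    raw = [[rng.uniform(0.5, 1.5) for _ in range(n_classes)]
           for _ in range(n_floras)]
    closed = [[value / sum(row) for value in row] for row in raw]
    return raw, closed


raw, closed = simulate_closed_scores()

# Correlation between the first two classes, before and after closure.
r_raw = pearson([row[0] for row in raw], [row[1] for row in raw])
r_closed = pearson([row[0] for row in closed], [row[1] for row in closed])

print(f"correlation of raw scores:    {r_raw:+.3f}")
print(f"correlation of closed scores: {r_closed:+.3f}")
```

Under these assumptions the raw scores are essentially uncorrelated, whereas the closed scores show a clearly negative correlation (near -1/(k-1) for k interchangeable classes). That correlation arises purely from the constraint, not from any property of the leaves.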

Despite these issues with the process of coding, no morphological coding scheme could be ideal, and these criticisms of CLAMP are offered in a spirit of improving what is the only such system currently available in the published literature. In particular, Wolfe's original article (1993) was much more broadly focused than some subsequent publications: a discussion of axes of variation other than those corresponding to temperature and precipitation made it not only a contribution to paleoclimatology but also ecologically and botanically interesting.

More important than these problems in the coding are the true uncertainties associated with the estimation of paleoclimatic variables. It is generally accepted that the leaf physiognomy of a flora indicates the general climate regime of the area in which it grew: "tropical," "sub-tropical," or "temperate," and "wet" or "dry." These are categories that not only any botanist, but many laymen would recognize from simple leaf silhouettes. Beyond this there remains doubt as to the degree of precision and reliability that leaf physiognomy can provide, but there has been relatively little general discussion of what causes the real uncertainty in the procedure of estimating ancient environmental parameters from leaf morphology.

It is notable that the two doctoral dissertations that have examined CLAMP data in detail are less sanguine about the errors associated with the methodology than most published articles. Stranks (1996) cautions that "the method is still in a developmental stage with many questions remaining unanswered" (Stranks 1996, p. 122) and "that a relationship exists between physiognomy and climate is clear. Whether it can successfully be applied to fossil floras in order to extract climate and altitude, however remains to be resolved" (Stranks 1996, p. 124). Though she does not use the term "spatial autocorrelation," she correctly observes that "the response of southern hemisphere sites in general cannot be compared to those of northern hemisphere sites" (Stranks 1996, p. 124). Greenwood et al. (2004) support this contention. Kennedy (1998) lists several sources of potential error and admits that "qualitative sources of error, such as subjectivity in morphotyping and taphonomic bias, could potentially introduce large amounts of uncertainty into palaeoclimatic interpretations" (Kennedy 1998, p. 20). In contrast to this conservatism, many publications make claims such as "CLAMP...is a powerful paleoclimate proxy with the ability to yield quantitative data on past temperatures, precipitation, growing season length, and humidity, as well as enthalpy" (Spicer et al. 2005, p. 429).

Some of the sources of error that must be dealt with are, in rough, increasing order of relative importance or difficulty of quantification:

1. Binomial sampling error. This is the simple and well-understood error associated with the random selection, with replacement, of n leaves out of a population of which a proportion P have untoothed margins. If this selection is repeated many times, the standard error of the estimate of P should approach $\sqrt{P(1-P)/n}$. This imposes a minimum error on the order of a few degrees with floras of about 30 species. In floras that have many more species (e.g., >100), the binomial error becomes insignificant (Wilf 1997).
2. Repeatability of coding. At this stage, it is not clear what errors may be produced by different people coding the same floras, so this potential source of error is not readily distinguishable from spatial autocorrelation or the study effect discussed above. Future work will investigate this source of error using blind experiments.
3. Spatial autocorrelation and irregular sampling. The current sampling distribution is very poor, but can be improved by collecting more samples where they are lacking, by gridding the available locality data on a raster and applying statistical tools spatially, and by creating spatially distributed artificial floras from species range data as has been done by Traiser et al. (2005). Unfortunately, climate station data are seldom or never available from exactly the same places as floras are collected. Up to a point, this can be addressed by appropriate methods of interpolation, but errors introduced by microclimatic variation and patchy species distributions may remain problematic.
4. Inherent time-averaging. This is not an issue if MAT is the only dependent variable, but MAT is a grossly time-averaged quantity that will be perceived differently if data on, e.g., mean monthly temperatures are compared across studies. It is easy to illustrate how dramatically plants have evolved to respond to the timing of temperature change: CAM plants open their stomata during the night when it is cool and transpiration is reduced. As soon as one calculates average daily temperatures, much less monthly or yearly means, from an hourly record, one loses the ability to explain an entire evolutionary strategy that allows thousands of species of plants to exist. This is an extreme example, but the more general point that different temporal scales will affect the significance of variables like temperature must be taken into consideration.
5. Other sources of noise (elevation, microclimate, disturbance, soil type, systematics, taphonomy, etc.). All of these variables are known to be of importance at particular spatial and temporal scales, and must be considered. Is the sample skewed by collection of more low-altitude floras than high-altitude? Do secondary-growth (recently cleared) forests respond differently than primary forests? In the absence of clear answers to these questions about systematic biases, calculation of a stochastic binomial sampling error becomes nearly irrelevant.
6. Uniformity through time. How far back in time can spatial patterns observed in the modern day be extrapolated? This is a broad question facing all methods of reconstructing ancient climates; a simple criterion that is often implicitly invoked is that a method must work increasingly well as it approaches the present; hence error must increase as we go back in time.

The error figures usually associated with estimates of mean annual temperature (MAT) from leaf morphology are one- or two-standard-deviation analytical errors calculated by assuming only binomial sampling error or normally distributed stochastic variation in the explanatory variables and then propagating this error through a regression line. When the number of species increases much beyond a typical 30, these analytical errors are dramatically reduced, which has led to the appearance in the literature of, for instance, temperature estimates of plus or minus a few degrees (Burnham et al. 2001, Kowalski 2002, Kennedy et al. 2002). Even errors of under a degree have appeared, which, as Miller et al. (in press) point out, is incompatible with a rigorous error analysis of the relationship between P and MAT.
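The way such analytical errors shrink with species count can be sketched numerically. The example below is illustrative only: it takes the binomial standard error of a proportion, sqrt(P(1-P)/n), and multiplies it by the slope of a linear MAT-versus-P regression. The slope of 30.6 °C per unit P is an assumed, illustrative value that does not come from this paper, and the function names are hypothetical. A small Monte Carlo check of the binomial formula is included.

```python
import math
import random

# Assumed, illustrative regression slope (degrees C per unit of P).
# It is NOT taken from this paper; substitute the slope of the
# calibration actually in use.
SLOPE_C_PER_UNIT_P = 30.6


def binomial_se(p, n):
    """Analytical standard error of a proportion p estimated from n draws."""
    return math.sqrt(p * (1.0 - p) / n)


def mat_sampling_error(p, n, slope=SLOPE_C_PER_UNIT_P):
    """Binomial error in P propagated through a linear MAT ~ P regression."""
    return slope * binomial_se(p, n)


def simulated_se(p, n, trials=20000, seed=1):
    """Monte Carlo estimate of the spread of the observed proportion.

    Each trial draws n items with replacement from a population in which
    a fraction p has the trait (e.g. untoothed margins).
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(trials):
        hits = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(hits / n)
    mean = sum(proportions) / trials
    variance = sum((x - mean) ** 2 for x in proportions) / (trials - 1)
    return math.sqrt(variance)


if __name__ == "__main__":
    p = 0.5  # worst case: binomial variance is largest at P = 0.5
    for n in (30, 100, 300):
        print(f"n = {n:3d}: SE(P) = {binomial_se(p, n):.4f}, "
              f"propagated MAT error = {mat_sampling_error(p, n):.2f} C")
    print(f"simulated SE(P) for n = 30: {simulated_se(p, 30):.4f}")
```

With the assumed slope and P = 0.5, the propagated error is roughly 2.8 °C for 30 species and roughly 1.5 °C for 100 species. The figure keeps falling as n grows, yet it captures only the first item in the list of error sources and says nothing about the others.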

Errors 4–6 may ultimately be unquantifiable and uncorrectable, but there is abundant evidence that the issue of spatial autocorrelation can be handled. Work by Thompson et al. (1999) provides graphical tools for plotting floras in ecological space, and Traiser (2004) and Traiser et al. (2005) give spatially distributed leaf physiognomic data from synthetic floras for the whole continent of Europe. In concert with the sort of exploratory data analysis that is presented here, these techniques may make it possible not only to improve estimates of terrestrial paleoclimates, but also to extract additional types of data about how environments and plant ecosystems have changed through time.