The data available for this analysis come from four studies. The first and largest, wolfe173, is an updated version of the data published by Wolfe (1993), sometimes referred to as "CLAMP 3B" (e.g., Jacobs 2002). It is available on the web as two Excel spreadsheets (.xls files) containing, respectively, the morphological and environmental data for 173 floras; points representing these floras are colored black in the figures in this paper. For the 103 floras that were published in 1993, the number of species in each flora and its latitude, longitude, and elevation were typed in from Wolfe (1993). Geographically, this data set is mainly restricted to the continental United States and Japan, though there are a few floras from Alaska and continental east Asia. These data are available in the Appendix, and include (i) the climatic variables and (ii) the morphological leaf scores.
The second data set, jacobs, is from Jacobs (1999, 2002), who gives CLAMP scores and associated environmental data for 30 floras in tropical Africa. This study used the original 29-variable coding scheme, so two variables have all values missing in this data set. Points from jacobs are colored red in the figures in this paper. The third data set, gregory, is from Gregory-Wodzicki (2000), who provides CLAMP scores and environmental data for 12 floras in Bolivia; these are colored green here. The fourth data set, kowalski, is from Kowalski (2002), who provides CLAMP scores and environmental data for 30 floras in tropical South America, which are represented by blue points here.
The data in jacobs, gregory, and kowalski were scanned in from tables in the cited publications, processed for automatic text recognition, and then hand-edited and spot-checked for accuracy. The data were read into the open-source program R (R Development Core Team 2004) from tab-delimited text files, which are available in the supplementary data archive, and were preprocessed so that all studies were in comparable form. The code used is given in the script file in the data archive. The data matrices are not printed because all the data have appeared in print before.
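The preprocessing step can be illustrated with a minimal sketch, written here in Python rather than R and not taken from the script in the data archive. It parses tab-delimited text and aligns it to a common column set, filling any variable a study did not record with a missing value, as happens when a study coded under an older scheme is brought into the current one. The column names (v1, v2, v3), the flora labels, and all numeric values are hypothetical placeholders.

```python
import csv
import io

# Hypothetical column names and values, for illustration only; the real
# files in the data archive use the variables listed in Table 1.
ALL_COLUMNS = ["flora", "v1", "v2", "v3", "MAT"]

# A study coded under an older scheme lacks one variable (here "v3").
OLD_SCHEME_TEXT = (
    "flora\tv1\tv2\tMAT\n"
    "siteA\t10.0\t25.5\t24.1\n"
    "siteB\t0.0\t40.0\t18.3\n"
)


def read_study(text, all_columns):
    """Parse tab-delimited text and return rows with a common column set.

    Columns absent from the source (or empty cells) are stored as None,
    so every study ends up with the same columns in the same order.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        row = {}
        for name in all_columns:
            value = record.get(name)
            if value is None or value == "":
                row[name] = None          # variable not recorded
            elif name == "flora":
                row[name] = value         # keep the label as text
            else:
                row[name] = float(value)  # scores and climate values
        rows.append(row)
    return rows


rows = read_study(OLD_SCHEME_TEXT, ALL_COLUMNS)

# Variables that are missing for every flora in this study.
all_missing = [c for c in ALL_COLUMNS if all(r[c] is None for r in rows)]
print(all_missing)
```

Representing an unrecorded variable as an explicit missing value, rather than dropping the column, keeps the studies in the same shape so that they can later be stacked into one table.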
The completed data set consists of 245 floras and is stored as a series of data frames in R, with the suffix -all for the raw frames containing both morphological and environmental data and the suffix -clamp for the cleaned CLAMP scores. The complete data set is called all, and the supplementary material typed in from Wolfe (1993) is a separate data frame called wolfe1993. Stranks (1996) provides additional data from New Zealand that have not yet been processed.
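As a consistency check on the totals quoted above, the following sketch (in Python, for illustration only) stacks placeholder rows for the four studies, tags each row with its study name and plotting color, and confirms that the combined table has 245 rows. The flora counts and colors are those stated in the text; the row labels are hypothetical stand-ins for the real records.

```python
# (study name, number of floras, plotting color) as stated in the text.
STUDIES = [
    ("wolfe173", 173, "black"),
    ("jacobs", 30, "red"),
    ("gregory", 12, "green"),
    ("kowalski", 30, "blue"),
]


def stack_studies(studies):
    """Row-bind the studies into one list, labeling each row by study."""
    combined = []
    for name, n_floras, color in studies:
        for i in range(n_floras):
            # Placeholder label; the real rows carry scores and climate data.
            combined.append(
                {"study": name, "flora": f"{name}_{i + 1}", "color": color}
            )
    return combined


all_floras = stack_studies(STUDIES)
print(len(all_floras))
```

Carrying the study label on every row makes it straightforward to color points by study in a figure and to subset the combined table back into its sources.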
The 31 physiognomic variables described in Wolfe (1993, 1995) are listed in Table 1. Unless growing season precipitation is taken to be the same as annual precipitation, the only environmental variable that appears in all four data sets is mean annual temperature (MAT), so our comparison of different studies is restricted to a single response variable. This is unfortunate because the main point of applying a multivariate framework is to elicit information about multiple response variables. Because little of the true uncertainty in a temperature estimate comes from analytic error in the explanatory variables (this contention is defended below), it is highly unlikely that a multivariate framework will improve temperature prediction much, however much it can be made to reduce the residual error of the regression.
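The restriction to a single response variable amounts to taking the intersection of the environmental variables reported by each study. A minimal sketch in Python follows. The study labels and variable lists are hypothetical placeholders, not the actual contents of the four data sets, and the abbreviations GSP (growing season precipitation) and MAP (annual precipitation) are introduced here only for illustration.

```python
def shared_variables(variables_by_study, aliases=None):
    """Return the sorted variables reported by every study.

    `aliases` optionally maps one variable name onto another, which
    treats the two as the same quantity before intersecting.
    """
    aliases = aliases or {}
    sets = [
        {aliases.get(v, v) for v in names}
        for names in variables_by_study.values()
    ]
    return sorted(set.intersection(*sets))


# Hypothetical variable lists, for illustration only.
example = {
    "study_1": ["MAT", "GSP"],
    "study_2": ["MAT", "MAP"],
    "study_3": ["MAT", "GSP"],
    "study_4": ["MAT", "MAP"],
}

strict = shared_variables(example)
relaxed = shared_variables(example, aliases={"GSP": "MAP"})
print(strict)   # ['MAT']
print(relaxed)  # ['MAP', 'MAT']
```

In this toy example, equating the two precipitation measures adds a second shared variable, which mirrors the caveat stated above.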