Fitting data to functions


Linear

Typical application: Fitting data to a straight line, or exponential or power function
Assumptions: None
Data needed: Two columns of counted or measured data

Two columns must be selected (x and y values). A straight line y=ax+b is fitted to the data. There are two different algorithms available: Standard regression and Reduced Major Axis (the latter is selected by ticking the box). Standard regression keeps the x values fixed, and finds the line which minimizes the squared errors in the y values. Use this if your x values have very little error associated with them. Reduced Major Axis tries to minimize both the x and the y errors.

Also, both x and y values can be log-transformed, in effect fitting your data to the 'allometric' function y=(10^b)(x^a). An a value around 1 indicates that a straight-line ('isometric') fit may be more applicable.

The values of a and b, their errors, a chi-squared goodness-of-fit value, Pearson's r correlation coefficient, and the probability that the columns are not correlated are given.
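
For readers who want to reproduce these fits outside the program, the following is a minimal sketch in Python with numpy (the function and variable names are hypothetical, and this is not the program's own code). It shows the standard regression and Reduced Major Axis estimates of a and b:

    import numpy as np

    def fit_line(x, y, rma=False):
        """Fit y = a*x + b by standard regression or Reduced Major Axis."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        if rma:
            # RMA slope: ratio of standard deviations, with the sign of the correlation
            a = np.sign(np.corrcoef(x, y)[0, 1]) * (y.std(ddof=1) / x.std(ddof=1))
        else:
            # Standard regression minimizes the squared errors in y only
            a = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
        b = y.mean() - a * x.mean()
        return a, b

Log-transforming both columns before calling such a routine gives the allometric fit described above.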

Exponential functions

Your data can be fitted to an exponential function y=(e^b)(e^(ax)) by first log-transforming just your y column (in the Massage menu) and then performing a straight-line fit.
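
As a sketch of that recipe, assuming the hypothetical fit_line helper above and natural logarithms (matching the formula):

    # Log-transform only the y column, then fit a straight line: log(y) = a*x + b
    a, b = fit_line(x, np.log(y))
    # Back-transformed model: y = exp(b) * exp(a*x)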

Sinusoidal

Typical application: Fitting data to a set of periodic, sinusoidal functions
Assumptions: None
Data needed: Two columns of counted or measured data

Two columns must be selected (x and y values). A sum of up to six sinusoids with periods specified by the user, but with unknown amplitudes and phases, is fitted to the data. This can be useful for modelling periodicities in time series, such as annual growth cycles or climatic cycles, usually in combination with spectral analysis. The algorithm is based on a least-squares criterion and singular value decomposition (Press et al. 1992). By default, the fundamental period is set to the range of the x values, and the other periods to its harmonics (1/2, 1/3, 1/4, 1/5 and 1/6 of the fundamental period). These periods can be changed, and need not be in harmonic proportion.
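
Because the periods are fixed, the least-squares step amounts to solving a linear system whose columns are sines and cosines at those periods. The sketch below (Python with numpy, hypothetical names; the constant term is an assumption, not necessarily what the program includes) illustrates the idea:

    import numpy as np

    def fit_sinusoids(x, y, periods):
        """Least-squares fit of y to a constant plus sinusoids with fixed periods."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        # One sine and one cosine column per period, plus a constant column
        cols = [np.ones_like(x)]
        for T in periods:
            w = 2 * np.pi / T
            cols += [np.sin(w * x), np.cos(w * x)]
        A = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # SVD-based least squares
        # Recover amplitude and phase of each sinusoid from its sine/cosine pair
        amp = [np.hypot(coef[1 + 2*i], coef[2 + 2*i]) for i in range(len(periods))]
        phase = [np.arctan2(coef[2 + 2*i], coef[1 + 2*i]) for i in range(len(periods))]
        return coef[0], amp, phase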

With a little effort, an unknown period can also be estimated by trial and error, adjusting it until the fitted amplitude is maximized (this becomes difficult with more than a single sinusoid).

It is not meaningful to specify periods smaller than twice the typical spacing of the data points (the Nyquist limit).

Logistic

Typical application: Fitting data to a logistic or von Bertalanffy growth model
Assumptions: None
Data needed: Two columns of counted or measured data

Attempts to fit the data to the logistic equation y=a/(1+b*exp(-cx)), where x starts at zero (that is, the data set is shifted so that the leftmost data point lies at x=0). The fitting works in stages: the value of a is first estimated as the maximal value of y, and the values of b and c are then estimated using a straight-line fit to a linearized version of the model.

Though often acceptable, this estimate can optionally be improved by using it as the initial guess for a Levenberg-Marquardt nonlinear optimization (Press et al. 1992). This procedure can sometimes improve the fit, but due to the numerical instability of the logistic model it can fail with an error message.
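
A rough imitation of this two-stage procedure with standard tools is sketched below (Python with scipy, hypothetical names; a sketch under the stated assumptions, not the program's own implementation):

    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(x, a, b, c):
        return a / (1.0 + b * np.exp(-c * x))

    def fit_logistic(x, y):
        x = np.asarray(x, float) - np.min(x)        # shift so the leftmost point is at x = 0
        y = np.asarray(y, float)
        a0 = y.max()                                # crude estimate of the saturation level a
        # Linearize: log(a0/y - 1) = log(b) - c*x, then fit a straight line for b and c
        mask = (y > 0) & (y < a0)
        slope, intercept = np.polyfit(x[mask], np.log(a0 / y[mask] - 1.0), 1)
        b0, c0 = np.exp(intercept), -slope
        # Optional refinement by Levenberg-Marquardt (curve_fit's default for unbounded fits)
        popt, _ = curve_fit(logistic, x, y, p0=[a0, b0, c0])
        return popt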

The logistic equation can model growth with saturation, and was used by Sepkoski (1984) to describe the proposed stabilization of marine diversity in the late Palaeozoic.

Von Bertalanffy

An option in the 'Logistic fit' window. Uses the same algorithm as above, but fits the data to the von Bertalanffy equation y=a*(1-b*exp(-cx)). This equation is used for modelling the growth of multicellular animals (in units of length or width, not volume).
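
Continuing the hypothetical logistic sketch above (same imports and fitting call), only the model function changes:

    def von_bertalanffy(x, a, b, c):
        # Growth toward the asymptote a; b and c control initial size and growth rate
        return a * (1.0 - b * np.exp(-c * x))

    # Fitted exactly as in the logistic sketch, e.g.
    # popt, _ = curve_fit(von_bertalanffy, x - x.min(), y, p0=[y.max(), 1.0, 1.0])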

More information about the logistic and von Bertalanffy growth models can be found in Brown & Rothery (1993).

B-splines

Typical application: Smoothing noisy data
Assumptions: None
Data needed: Two columns of counted or measured data

Two columns must be selected (x and y values). The data are fitted with a least-squares criterion to a B-spline, which is a sequence of third-order polynomials, continuous up to the second derivative. A typical application of this is the construction of a smooth curve going through a noisy data set.

A decimation factor is set by the user, and controls how many data points contribute to each polynomial section. Larger decimation gives a smoother curve.

Note that sharp jumps in your data can give rise to oscillations in the curve, and that you can also get large excursions in regions with few data points.
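
The role of the decimation factor can be imitated with a standard least-squares spline routine. The sketch below (Python with scipy, hypothetical names, not the program's own code) places one interior knot for roughly every 'decimation' data points:

    import numpy as np
    from scipy.interpolate import LSQUnivariateSpline

    def smooth_bspline(x, y, decimation=10):
        """Least-squares cubic B-spline with roughly 'decimation' points per segment."""
        order = np.argsort(x)
        x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
        # Interior knots every 'decimation' points; boundary knots are added by the routine
        knots = x[decimation:-decimation:decimation]
        return LSQUnivariateSpline(x, y, knots, k=3)

The returned spline is callable, so evaluating it on a dense grid of x values produces the smooth curve.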
