Appendix

We show below a short program code, written in R (R Development Core Team 2008), that takes the presence-absence data as an input and outputs the statistically significant positive, negative, and extreme correlations. In this example, for simplicity, the base set is the same for all pairs of species.

# Input:
# n Number of locations.
# m Number of species.
# X nXm matrix such that X[i,j]=1 if species j occurs at locality i,
# X[i,j]=0 otherwise.
# alpha Significance threshold; we use alpha=0.05.
#
# Output:
# Cneg mXm Boolean matrix such that Cneg[i,j]=TRUE if there is a
# significant negative correlation between species i and j,
# Cneg[i,j]=FALSE otherwise.
# Cpos mXm Boolean matrix such that Cpos[i,j]=TRUE if there is a
# significant positive correlation between species i and j,
# Cpos[i,j]=FALSE otherwise.


# We use the R function for hypergeometric distribution to obtain
# the p-values of the Fisher's Exact Test, mid-P-variant.
fisher.test.midp <- function(x) {
# Input:
# x Two-dimensional contingency matrix.
#
# Output:
# One-tailed p-value of the Fisher's Exact Test mid-P.
( dhyper(x[1,1],x[1,1]+x[1,2],x[2,1]+x[2,2],x[1,1]+x[2,1])/2

+(if(x[1,1]>0) phyper(x[1,1]-1,x[1,1]+x[1,2],x[2,1]+x[2,2],x[1,1]+x[2,1]) else 0) )
 
}


# The p-values related to the correlations using the one-tailed Fisher's
# Exact Test mid-P
P <- matrix(0.5,nrow=m,ncol=m)
for(i in 1:(m-1)) {
for(j in (i+1):m) {
baseset <- 1:n # Here you can choose an appropriate base set for each
# pair of species i.e. only localities whose index is included
# in vector baseset are used to compute correlations
P[i,j] <- P[j,i] <- fisher.test.midp(table(factor(X[baseset,i], levels=c(0,1)), factor(X[baseset,j], levels=c(0,1))))
}
}

# One-tailed p-values for negative, positive and extreme correlations, respectively.
Pneg <- P
Ppos <- 1-P
Pext <- 2*pmin(P, 1-P)
# Adjust the p-values using the Benjamini-Hochberg method and pick the p-values
# that are at most alpha.
Cneg <- as.matrix(p.adjust(as.dist(Pneg),method="BH")) <= alpha
Cpos <- as.matrix(p.adjust(as.dist(Ppos),method="BH")) <= alpha
Cext <- as.matrix(p.adjust(as.dist(Pext),method="BH")) <= alpha

 

1

The hypergeometric distribution corresponds to a process where k balls are drawn in random from an urn with m white and n black balls. phyper(q,m,n,k) denotes the probability of drawing at most q white balls.