This folder contains the partial least-squares regression (PLSR) model coefficients to accompany the paper “Reflectance spectroscopy allows rapid, accurate, and non-destructive estimates of functional traits from pressed leaves” by Kothari et al. (2022) Methods in Ecology & Evolution. An open version is on bioRxiv, DOI: 10.1101/2021.04.21.440856.

Each set of models for a given trait comprises a .csv file containing 100 models (rows) \(\times\) coefficients (columns). The 100 models are derived from a jackknife analysis described in the paper. To generate trait estimates using a model set, you can apply them to the data (see below) and use the mean or the full distribution of estimates.

Spectral data preparation

Your spectral data should ideally be processed in the way described by the paper linked above. At a minimum, the data must be resampled to 1 nm resolution, continuously across the 400-2400 nm range, or 1300-2400 nm for the restricted-range pressed-leaf models. The data should be a matrix in the format samples \(\times\) wavelengths.
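As a minimal sketch of the resampling step, the code below linearly interpolates each spectrum onto a 1 nm grid using base R's approx(). Linear interpolation is an assumption made here for illustration and may differ from the processing described in the paper; the helper name resample.to.1nm and the toy spectra are hypothetical.

```r
## hypothetical helper: linearly interpolate each spectrum (row) onto a 1 nm grid
## assumes spec.matrix is samples x wavelengths and wvl gives the wavelength (nm)
## of each column, covering at least the target range
resample.to.1nm<-function(spec.matrix,wvl,new.wvl=400:2400){
    if(ncol(spec.matrix)!=length(wvl)){
        stop("number of columns must match number of wavelengths")
    }
    if(min(wvl)>min(new.wvl) || max(wvl)<max(new.wvl)){
        stop("input spectra do not cover the target range")
    }
    out<-t(apply(spec.matrix,1,function(refl){
        approx(x = wvl,y = refl,xout = new.wvl)$y
    }))
    colnames(out)<-as.character(new.wvl)
    rownames(out)<-rownames(spec.matrix)
    out
}

## toy example: two made-up spectra sampled every 2.5 nm
raw.wvl<-seq(350,2500,by=2.5)
raw.spec<-rbind(0.3+0.0001*raw.wvl,
                0.5-0.0001*raw.wvl)
resampled<-resample.to.1nm(raw.spec,raw.wvl)
dim(resampled) ## 2 samples x 2001 bands
```

For the restricted-range models, the same helper could be called with new.wvl = 1300:2400.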

Kinds of models

Here, I provide six different sets of models:

The file names should indicate which traits correspond to which files. All elements are referred to by their standard symbols. Other abbreviations are as follows:

Code to build the models is available on GitHub or as an archived version on Zenodo (DOI: 10.5281/zenodo.6824407). You can find more details about model performance in the paper. Note that models vary tremendously in their performance depending on the trait and the type of tissue whose spectra are used to predict it. Please exercise caution in using models without some sort of validation using conventional trait measurements.

Units

I use the term ‘[dry] chemical traits’ to refer to all traits except LMA, EWT, and LDMC, which relate to leaf structure or water content. These non-chemical traits are in units of:

Chemical traits are on either a dry mass-basis or an area-basis. Mass-based chemical traits are:

Area-based chemical traits are always in g per cm-squared.

Application

If you have a spectral dataset, you can generate model-based trait estimates by modifying the code below. Call the model trait.model (100 models \(\times\) 2002 coefficients) and the spectral dataset test.spectra (n samples \(\times\) 2001 wavelength bands), both arrays. Our models have intercepts, hence the extra coefficient (2002 = 1 + 2001). (For the restricted-range models, there are 1102 coefficients, and the input data should have 1101 wavelength bands.) We can generate estimates with the following code:

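## define a function using a tiny bit of matrix algebra
## to apply the coefficients
## coef.matrix: models x coefficients (intercept in the first column, if any)
## val.spec: samples x wavelength bands
apply.coefs<-function(coef.matrix,val.spec,intercept=TRUE){
    if(ncol(coef.matrix)!=ncol(val.spec)+intercept){
        stop("spectral matrix has incorrect dimensions")
    }
    coef.matrix<-as.matrix(coef.matrix)
    val.spec<-as.matrix(val.spec)

    if(intercept){
        ## multiply spectra by the slopes, then add each model's intercept
        pred.matrix<-t(t(val.spec %*% t(coef.matrix[,-1,drop=FALSE]))+coef.matrix[,1])
    } else {
        pred.matrix<-val.spec %*% t(coef.matrix)
    }
    pred.matrix ## samples x models
}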

trait.model<-read.csv("PressedModels/LMA.csv") ## for example
trait.estimates<-apply.coefs(coef.matrix = trait.model,
                             val.spec = test.spectra)
mean.estimates<-rowMeans(trait.estimates)

The object trait.estimates is a samples \(\times\) (100) models array whose row means constitute the average trait estimate for each sample.
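If you want the full distribution rather than only the mean, one option is to summarize the spread across the 100 models for each sample. The sketch below uses a made-up matrix (toy.estimates) in place of trait.estimates; the choice of a standard deviation and a 95% interval is an assumption for illustration. The spread reflects variation among the jackknife models and should not be read as a full prediction interval.

```r
## toy stand-in for trait.estimates: 3 samples x 100 models of made-up values
set.seed(1)
toy.estimates<-matrix(rnorm(300,mean=rep(c(50,80,120),times=100),sd=2),
                      nrow=3,ncol=100)

## per-sample summaries across the 100 models
estimate.summary<-data.frame(
    mean=rowMeans(toy.estimates),
    sd=apply(toy.estimates,1,sd),
    lower=apply(toy.estimates,1,quantile,probs=0.025),
    upper=apply(toy.estimates,1,quantile,probs=0.975))
estimate.summary
```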

Make sure to use models appropriate to the kind of data: fresh-leaf models for fresh-leaf spectra, and so on. Otherwise, the models are likely to return very inaccurate estimates.
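Matching the model to the data also applies to the spectral range: the restricted-range models expect 1101 bands (1300-2400 nm). A minimal sketch of trimming a full-range matrix is shown below. It assumes the column names give the wavelength in nm, optionally with the "X" prefix that read.csv adds; the object names full.spectra and restricted.spectra are hypothetical.

```r
## toy stand-in for a full-range spectral matrix: 2 samples x 2001 bands,
## with column names giving the wavelength in nm (an assumption of this sketch)
full.spectra<-matrix(0.5,nrow=2,ncol=2001,
                     dimnames=list(c("s1","s2"),as.character(400:2400)))

## recover numeric wavelengths, dropping a leading "X" if present
band.wvl<-as.numeric(sub("^X","",colnames(full.spectra)))

## keep only the 1300-2400 nm bands
restricted.spectra<-full.spectra[,band.wvl>=1300 & band.wvl<=2400,drop=FALSE]
ncol(restricted.spectra) ## 1101 bands
```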

Associated data products

There are two associated kinds of data products (spectra and leaf traits):

  1. The main fresh-, pressed-, and ground-leaf spectra from the Canadian Airborne Biodiversity Observatory (CABO). These data were used to calibrate the models.
  2. The Cedar Creek pressed-leaf dataset used to externally validate the models.

Even less processed fresh-leaf data can be queried from the CABO Data Portal.

I usually use the CRAN-hosted package spectrolab v. 0.0.10 (working in R 3.6.3) to handle spectral data. In this package, the class spectra allows users to attach and retrieve metadata from spectral data using the function meta(). Below, you can find an example script that reads a .csv file, like our archived data, and turns it into an R spectra object. (Alternatively, the columns in the .csv file corresponding to wavelength bands could be converted into a matrix without the intermediate step of creating a spectra-class object.)

library(spectrolab)

spec_df<-read.csv("mydata.csv")
name_var<-1 ## index for the column that contains sample names
meta_vars<-2:20 ## adjust as needed: indices for columns that contain metadata (including traits)
band_names<-400:2400 ## wavelengths of spectral bands corresponding to remaining columns

## you can also use the as_spectra command, but it's a bit more finicky 
## with data frames because the column names of bands must contain a letter
spec<-spectra(value = as.matrix(spec_df[,-c(name_var,meta_vars)]),
              bands = band_names,
              names = spec_df[,name_var],
              meta = spec_df[,meta_vars])
test.spectra<-as.matrix(spec) ## this matrix can be used in apply.coefs() above
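The alternative mentioned above, going straight from the .csv columns to a matrix without spectrolab, might look like the following sketch. It assumes the band columns are the only ones named "X" followed by digits (read.csv adds the "X" to numeric column names by default); the toy data frame stands in for read.csv("mydata.csv"), and its column names are made up.

```r
## toy stand-in for read.csv("mydata.csv"): one name column, one metadata
## column, and three band columns
spec_df<-data.frame(sample_id=c("a","b"),
                    LMA=c(0.01,0.02),
                    X400=c(0.05,0.06),
                    X401=c(0.05,0.07),
                    X402=c(0.06,0.07))

## find the band columns by name and convert them to a numeric matrix
band_cols<-grep("^X[0-9]+$",colnames(spec_df))
test.spectra<-as.matrix(spec_df[,band_cols])
rownames(test.spectra)<-spec_df$sample_id
colnames(test.spectra)<-sub("^X","",colnames(test.spectra))
```

With real data, check that the resulting matrix has 2001 columns (or 1101 for the restricted-range models) before passing it to apply.coefs().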

You should be able to download the Cedar Creek pressed-leaf data and the pressed-leaf models and replicate Fig. 5 (external validation) from the paper. Try it out!
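To compare model-based estimates against measured trait values, a few summary statistics can be computed as in the sketch below. The helper validation.stats is hypothetical, and its definitions (R-squared as the squared correlation, percent RMSE as RMSE divided by the full range of measured values) are assumptions that may differ from those used in the paper. The numbers are made up.

```r
## hypothetical helper: summary statistics comparing measured and estimated traits
validation.stats<-function(measured,estimated){
    ## drop samples missing either value
    ok<-complete.cases(measured,estimated)
    measured<-measured[ok]
    estimated<-estimated[ok]
    rmse<-sqrt(mean((measured-estimated)^2))
    c(R2=cor(measured,estimated)^2,
      RMSE=rmse,
      percent.RMSE=100*rmse/diff(range(measured)))
}

## toy example with made-up numbers (one sample lacks a measured value)
measured.LMA<-c(0.05,0.08,0.11,0.07,NA)
estimated.LMA<-c(0.06,0.08,0.10,0.08,0.09)
toy.stats<-validation.stats(measured.LMA,estimated.LMA)
toy.stats
```

In practice, estimated would be something like mean.estimates from above, matched sample by sample to the measured trait values.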

Maintenance and questions

Please contact Shan Kothari at shan.kothari [at] umontreal [dot] ca or quercusacerifolia [at] gmail [dot] com with any questions.

Terms of Use

I release these models under a Creative Commons CC-BY 4.0 license. This means that you can use them for any purpose as long as you credit the authors of the paper (Shan Kothari, Rosalie Beauchamp-Rioux, Etienne Laliberté, and Jeannine Cavender-Bares), ideally by citing this data repository and the paper to which it’s linked.