--- title: "Timescale-specfic variance ratio (tsvr) package vignette" author: "Lei Zhao, Shaopeng Wang, Daniel Reuman" date: "" geometry: "left=1cm,right=1cm,top=2.5cm,bottom=2.8cm" output: pdf_document: number_sections: yes keep_tex: yes fig_caption: yes link-citations: True urlcolor: blue bibliography: tsvrvignette_refs.bib vignette: > %\VignetteIndexEntry{"tsvr vignette"} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- \newcommand{\mean}{\operatorname{mean}} \newcommand{\var}{\operatorname{var}} \newcommand{\cov}{\operatorname{cov}} \newcommand{\cor}{\operatorname{cor}} The `tsvr` package provides an implementation of a timescale-specific extension and generalization of the variance ratio of @Peterson_75. The variance ratio is used commonly in community ecology. The extension implemented in the `tsvr` package is described in detail by @Zhao_inprep. The `tsvr` package supports that paper and provides an implemetation of the tools developed there for anyone to use. The mathematical formulas for the variance ratio and extensions are detailed elsewhere [@Peterson_75; @Hallett_14; @Zhao_inprep]. The mathematics are also summarized here, but the main purpose of this vignette is to provide a decription of how to use the `tsvr` package. # Preparing data \noindent A typical dataset for analysis using `tsvr` is an $N \times T$ matrix of nonnegative numeric values where rows correspond to species in a community (so the number of species is $N$) and columns correspond to evenly spaced times during which sampling was conducted (so the number of times sampling was conducted is $T$). Matrix entries may be densities, or percent cover values for plant species within a quadrat, or biomasses, or other measures of abundance of the species. For instance: ```{r JRG_data, echo=TRUE} library(tsvr) class(JRGdat) names(JRGdat) d<-t(as.matrix(JRGdat[,2:dim(JRGdat)[2]])) dim(d) ``` Here `d` is a $26 \times 28$ matrix containing percent cover measurements for each year from $1983$ to $2010$ for 26 species which occurred in a 1 m$^2$ plot in the Jasper Ridge Biological Preserve serpentine grassland site. Species names are in the row names of `JRGdat`, which is an example dataset embedded in the `tsvr` package. Documentation for the dataset can be viewed via `?JRGdat`. See [@Hallett_14] for details of the Jasper Ridge ecosystem and of these and other related data. Standard implementations of Fourier transforms require time series consisting of measurements taken at evenly spaced times, with no missing data. The core functions provided in `tsvr` make these same assumptions and throw an error if data are missing. The user is left to decide on and implement a reasonable way of filling missing data, if data are missing. We have previously used the simple approach of replacing missing values in a time series by the median of the non-missing values in the time series [@Sheppard_16]. This approach, and other related simple procedures [@Sheppard_16], seem unlikely to artefactually produce significant synchrony, or coherence relationships with other variables, but rely on the percentage of missing data being fairly low and may obscure detection of synchrony or significant coherence relationships if too many data are missing. For applications which differ meaningfully from the prior work for which the tools of this package were developed [@Zhao_inprep], different ways of filling missing data may be more appropriate. The timescale-specific variance ratio techniques which are the focus of this package use Fourier methods to decompose by timescale the classic variance ratio and related quantities. Detrending and variance standardization across time series, techniques which are often applied before doing Fourier analysis, may not be approriate except in cases for which it makes sense to calculate the classic variance ratio and related quantities after performing those techniques. # The classic variance ratio and related quantities \noindent Let $x_i(t)$ be a measure of the population of species $i$ at time $t$, for $i=1,\ldots,N$ and $t=1,\ldots,T$. Define $x_{\text{tot}}(t)=\sum_{i=1}^N x_i(t)$ to be the total of all species populaton measures. Define $\text{CV}_{\text{com}}^2$ to be the square of the coefficient of variation of $x_{\text{tot}}(t)$ over time, i.e., $\var_t(x_{\text{tot}}(t))/(\mean_t(x_{\text{tot}}(t)))^2$, where $\var_t$ and $\mean_t$ represent variance and mean computed through time. This equals $\sum_{i,j} \cov_t(x_i(t),x_j(t))/(\mean_t(x_{\text{tot}}(t)))^2$, where $\cov_t$ is covariance through time. The abbreviation "com" in $\text{CV}_{\text{com}}^2$ stands for "community" since $\text{CV}_{\text{com}}^2$ is the squared coefficient of variation of the whole-community population. Define $\text{CV}_{\text{com\_ip}}^2$ to be the value of $\text{CV}_{\text{com}}^2$ that would pertain if the population dynamics of different species were independent, so that $\cov_t(x_i(t),x_j(t))=0$ for all $i \neq j$. The abbreviation "ip" stands for "independent populations". Thus $\text{CV}_{\text{com\_ip}}^2 = \sum_{i} \cov_t(x_i(t),x_i(t))/(\mean_t(x_{\text{tot}}(t)))^2 = \sum_{i} \var_t(x_i(t))/(\mean_t(x_{\text{tot}}(t)))^2$. The classic variance ratio is defined as $\varphi=\text{CV}_{\text{com}}^2/\text{CV}_{\text{com\_ip}}^2$, so that $\text{CV}_{\text{com}}^2=\varphi \text{CV}_{\text{com\_ip}}^2$. A variance ratio greater than $1$ suggests "synchronous" dynamics of the species comprising the community, so that community variability ($\text{CV}_{\text{com}}^2$) is greater than it would be if populations were independent ($\text{CV}_{\text{com\_ip}}^2$). A variance ratio less than $1$ suggests "compensatory" dynamics of the species comprising the community (i.e., increases/decreases in some species are partly compensated for by decreases/increases in others), so that community variability is less than it would be if populations were independent. The quantities $\text{CV}_{\text{com}}^2$, $\text{CV}_{\text{com\_ip}}^2$ and $\varphi$ can be computed using the `vreq_classic` function in the `tsvr` package, we here do so for the dataset `d` of the previous section, which we will continue to use throughout this vignette: ```{r vreq_classic_demo, echo=TRUE} res<-vreq_classic(d) class(res) names(res) summary(res) print(res) all.equal(res$com,res$comnull*res$vr) ``` The `vreq_classic` S3 class, of which `vreq_classic` is the generator function, inherits from the generic S3 class `vreq` (generator function `vreq`) and the `list` class, and has the three slots `com`, `comnull`, and `vr`. These slots correspond to $\text{CV}_{\text{com}}^2$, $\text{CV}_{\text{com\_ip}}^2$ and $\varphi$. Both the `vreq` and `vreq_classic` classes have `print` and `summary` methods and `set_*` and `get_*` methods where `*` represents any of the class slot names. See documentation for `vreq`, `vreq_classic`, `vreq_methods`, `vreq_classic_methods`, `setget_methods` for details. The "classic" in `vreq_classic` references the fact that this version of the variance ratio is the original version used in community ecology [@Peterson_75], and is probably still the most commonly used. Alternative versions have been proposed [@Loreau_08] - see the next section of this vignette. # The Loreau-de Mazancourt variance ratio and related quantities \noindent Define $\text{CV}_{\text{pop}}^2=\left(\sum_i \sqrt{\var_t (x_i(t))}\right)^2/(\mean_t(x_{\text{tot}}(t)))^2$. Another version of the variance ratio has been proposed by Loreau and de Mazancourt: $\varphi_{\text{LdM}} = \text{CV}_{\text{com}}^2/\text{CV}_{\text{pop}}^2$. Thus $\text{CV}_{\text{com}}^2 = \varphi_{\text{LdM}} \text{CV}_{\text{pop}}^2$. See [@Loreau_08] for details. The `tsvr` package implements the Loreau-de Mazancourt approach: ```{r vreq_LdM_demo, echo=TRUE} res<-vreq_LdM(d) class(res) names(res) summary(res) print(res) all.equal(res$com,res$comnull*res$vr) all.equal(res$com,vreq_classic(d)$com) ``` The `vreq_LdM` S3 class, of which `vreq_LdM` is the generator function, inherits from the generic S3 class `vreq` and the `list` class, and has slots `com`, `comnull`, and `vr`. These slots correspond to $\text{CV}_{\text{com}}^2$, $\text{CV}_{\text{pop}}^2$ and $\varphi_{\text{LdM}}$. The class `vreq_LdM` has `print` and `summary` methods and `set_*` and `get_*` methods where `*` represents any of the class slot names. See documentation for `vreq_LdM`, `vreq_LdM_methods`, and `setget_methods` for details. # The timescale-specific classic variance ratio and related quantities \noindent Next we describe a timescale-specific extension of the classic variance ratio, and its related quantities, that uses spectral methods, a set of standard statistical tools in ecology and other fields. The power spectrum of the time series $x_i(t)$, here denoted $s_{ii}(\sigma)$ and defined for timescales $\sigma=T/(T-1),T/(T-2),\ldots,T/2,T$, decomposes $\var_t(x_i(t))$ by timescale in that $s_{ii}(\sigma)$ is nonnegative, will tend to be larger for timescales on which $x_i(t)$ shows greater variation through time, and $\sum_\sigma s_{ii}(\sigma) = \var_t(x_i(t))$. Likewise, the cospectrum $s_{ij}(\sigma)$ decomposes $\cov_t(x_i(t),x_j(t))$ by timescale in a similar way, being larger for timescales on which the two time series predominantly covary. We define a timescale-specific generalization of $\text{CV}_{\text{com}}^2$ as $\text{CV}_{\text{com}}^2(\sigma) = \sum_{i,j} s_{ij}(\sigma)/(\mean_t(x_{\text{tot}}(t)))^2$. It is straightforward to see that $\sum_\sigma \text{CV}_{\text{com}}^2(\sigma) = \text{CV}_{\text{com}}^2$, so $\text{CV}_{\text{com}}^2(\sigma)$ decomposes $\text{CV}_{\text{com}}^2$ by timescale. $\text{CV}_{\text{com}}^2(\sigma)$ reveals to what extent variation on each timescale contributes to community variability. Likewise, $\text{CV}_{\text{com\_ip}}^2 = \sum_{i} \var_t(x_i(t))/(\mean_t(x_{\text{tot}}(t)))^2$ can be decomposed by timescale as $\text{CV}_{\text{com\_ip}}^2(\sigma) = \sum_{i} s_{ii}(\sigma)/(\mean_t(x_{\text{tot}}(t)))^2$. Finally, we define a timescale-specific version of the classic variance ratio as the quotient of these two quantities, i.e., $\varphi_{ts}(\sigma)=\left( \sum_{i,j} s_{ij}(\sigma) \right)/\left( \sum_i s_{ii}(\sigma) \right)$, so that $\text{CV}_{\text{com}}^2(\sigma) = \varphi_{ts}(\sigma) \text{CV}_{\text{com\_ip}}^2(\sigma)$. The timescale-specific variance ratio quantifies the extent to which species' oscillations are synchronous ($>1$) or compensatory ($<1$) on a timescale-by-timescale basis. For further details, see [@Zhao_inprep]. The quantities $\text{CV}_{\text{com}}^2(\sigma)$, $\varphi_{ts}(\sigma)$ and $\text{CV}_{\text{com\_ip}}^2(\sigma)$ can be computed using the `tsvr` package: ```{r tsvreq_classic_demo, echo=TRUE} res<-tsvreq_classic(d) class(res) names(res) summary(res) print(res) all.equal(res$com,res$tsvr*res$comnull) all.equal(sum(res$com),vreq_classic(d)$com) all.equal(sum(res$comnull),vreq_classic(d)$comnull) ``` Here, `ts` is a vector of timescales ($T/(T-1),T/(T-2),\ldots,T/2,T$), and the other elements are vectors of the same length containing timescale-specific information. The element `com` contains $\text{CV}_{\text{com}}^2(\sigma)$, `comnull` contains $\text{CV}_{\text{com\_ip}}^2(\sigma)$, and `tsvr` contains $\varphi_{ts}(\sigma)$. The `tsvreq_classic` S3 class inherits from the generic class `tsvreq` and from the `list` class. The timescale-specific variance ratio, $\varphi_{ts}(\sigma)$, is not a decomposition of $\varphi_{ts}$, i.e., summing $\varphi_{ts}(\sigma)$ across timescales does not yield $\varphi$. However, defining $w(\sigma)=\sum_i s_{ii}(\sigma)/(\sum_i \var_t(x_i(t)))$, one can show $\sum_\sigma w(\sigma)=1$, so the $w(\sigma)$ are weights, and $\sum_\sigma w(\sigma) \varphi_{ts}(\sigma) = \varphi$. So $\varphi$ is a weighted average of the values of $\varphi_{ts}(\sigma)$ across timescales. The `wts` field of a `tsvreq_classic` object contains $w(\sigma)$. ```{r tsvreq_classic_demo_2, echo=TRUE} sum(res$wts) res2<-vreq_classic(d) all.equal(sum(res$tsvr),res2$vr) all.equal(sum(res$wts*res$tsvr),res2$vr) ``` The plot method for the `tsvreq_classic` class displays the various components as functions of timescale: ```{r tsvreq_classic_demo_3, echo=TRUE, out.height="50%"} plot(res,filename="Tsvreq_classic_demo") knitr::include_graphics("Tsvreq_classic_demo.pdf") ``` The plots are symmetric about the middle, because Fourier transforms of real-valued time series have this property. The middle timescale is associated with the Nyquist frequency. The gray shading is a reminder of the symmetry - one should typically interpret the left, un-grayed side of plots. Both symmetric sides are plotted because sums must be computed over all displayed timescales to equal the corresponding frequency-nonspecific analogues. Additionally, non-smoothed Fourier transforms are used so that sums will exactly equal frequency-nonspecific analogues. Unsmoothed Fourier spectra and cospectra are jagged, so peaks of plots should not be given undue interpretive weight unless smoothing or averaging over timescales (see below) or significance testing is performed. # The lack of a timescale-specific Loreau-de Mazancourt variance ratio \noindent It is difficult to envision an analogous timescale-specific version of the Loreau-de Mazancourt approach. The quantity $\text{CV}_{\text{pop}}^2=\left(\sum_i \sqrt{\var_t (x_i(t))}\right)^2/(\mean_t(x_{\text{tot}}(t)))^2$ cannot be decomposed by replacing the variances in the numerator by power spectra because of the square root. # Aggregating the timescale-specific classic variance ratio and related quantities to timescale bands \noindent If $\Omega$ is a set of timescales, and defining $\text{CV}_{\text{com}}^2(\Omega)=\sum_{\sigma \in \Omega} \text{CV}_{\text{com}}^2(\sigma)$, $\text{CV}_{\text{com\_ip}}^2(\Omega)=\sum_{\sigma \in \Omega} \text{CV}_{\text{com\_ip}}^2(\sigma)$, and $\bar{\varphi}_{ts}(\Omega)=\frac{\sum_{\sigma \in \Omega} \varphi_{ts}(\sigma) w(\sigma)}{\sum_{\sigma \in \Omega} w(\sigma)}$, it has been shown [@Zhao_inprep] that $\text{CV}_{\text{com}}^2(\Omega) = \bar{\varphi}_{ts}(\Omega) \text{CV}_{\text{com\_ip}}^2(\Omega)$. Aggregating over timescales mitigates the jaggedness resulting from unsmoothed Fourier transforms (see above). The `tsvr` package provides tools for aggregating over any collection of timescales: ```{r agg_demo, echo=TRUE} res<-tsvreq_classic(d) aggresLong<-aggts(res,res$ts[res$ts>=4]) aggresShort<-aggts(res,res$ts[res$ts<4]) class(aggresLong) names(aggresLong) print(aggresLong) print(aggresShort) ``` The `aggts` function is the generator function for the `vreq_classic_ag` class, which inherits from the `vreq` class and from `list`. Note that the best way to specify the timescales over which to aggregate is by using conditions such as `aggts(res,res$ts[res$ts<4])`. If you type in timescales, e.g., `aggts(res,c(1.03,1.07,1.12))`, and the timescales do not match *exactly* with timescales in `res$ts`, they will be removed. No error or warning will be triggered unless there are no remaining timescales. The reason for this is, the code removes all timescales that are not among the canonical Fourier timescales less than $2$, the Nyquist timescale. Remaining timescales are then reflected about the Nyquist timescale to account for the symmetry of Fourier transforms. This setup makes it possible to specify, say, aggregation over timescales less than 4 by `aggts(res,res$ts[res$ts<4])` instead of `aggts(res,res$ts[res$ts<4 & res$ts>4/3])` (which gives the same results if run, but is an inconvenient format with which to specify timescales). In other words, only timescales greater than or equal to the Nyquist timescale (2) need be specified in the argument `ts`, and the symmetric timescales on the other side of the Nyquist timescale are included automatically. See also the documentation for `aggts`. # Acknowledgements \noindent This material is based upon work supported by the National Science Foundation under grant numbers 1714195 and 1442595, by the James S McDonnell Foundation, and by a working group grant from the Long Term Ecological Research Network Communications Office of the National Center for Ecological Analysis and Synthesis. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of these funders. We thank all users of the package who have reported or will later report ways in which the package could be improved. # References