1 Introduction

The dendroTools R package was developed in 2018 and is still being updated regularly and will continue to do so in the future. It provides dendroclimatological methods for analysing statistical relationships between tree rings and daily climate data. The most commonly used method is the Pearson correlation coefficient, but users can also use non-parametric correlations such as Kendall or Spearman correlations, and in the case of a multiproxy approach, linear regression can also be applied. Artificial neural networks (brnn) are also available for nonlinear analyses.

In this document I describe the basic principles behind the dendroTools R package and give some basic examples. All data included in the examples below is already included in the dendroTools R package. Please note that the examples presented here are less made computationally less intensive to accommodate the policy of CRAN. You are welcome to explore the full potential of my package by using the wide range of possible window widths.

2 Transformation and quick preview of daily data

One of the crucial step before using a dendroTools is to prepare climate data into a proper format. For daily_response() climate data must be in a format of 366 columns and n number of rows, which represent years, which are given as row names. A common format of daily data provided by many online sources is a table with two columns, where one column represents the date and the second is the value of the climate variable. To quickly transform such a format into a data frame with dimensions of 366 x n, dendroTools now offers the function data_transform(). The date can be in different formats, but it must be correctly specified with the argument date_format. For example, if the date is in the format “1988-01-30′′ (”year-month-day”), the argument date_format must be “ymd”.

When your data is in the proper format, the glimpse_daily_data() can be used for visual inspection of daily climate data. The main purpose here is to spot missing values and to evaluate the suitability of using specific daily data.

# Load the dendroTools and ggplot2 R packages
library(dendroTools)
library(ggplot2)

# 1 Load an example data (source: E-OBS)
data("swit272_daily_temperatures")
data("swit272_daily_precipitation")

# 2 Transform data into wide format
swit272_daily_temperatures <- data_transform(swit272_daily_temperatures, format = 'daily', date_format = 'ymd')
swit272_daily_precipitation <- data_transform(swit272_daily_precipitation, format = 'daily', date_format = 'ymd')

# 3 Glimpse daily data
glimpse_daily_data(env_data = swit272_daily_temperatures, na.color = "red") + 
  theme(legend.position = "bottom")
Figure 1: Glimpse of daily temperautres for swit272 site.

Figure 1: Glimpse of daily temperautres for swit272 site.

glimpse_daily_data(env_data = swit272_daily_precipitation, na.color = "red") + 
  theme(legend.position = "bottom")
Figure 2: Glimpse of daily precipitation for swit272 site.

Figure 2: Glimpse of daily precipitation for swit272 site.

3 The daily_response() in action

The daily_response() is the most commonly used function in the R package dendroTools, and works by sliding a moving window through the daily environmental (climate) data and computing statistical metrics using a tree-ring proxy. In dendroclimatology, such an analysis typically involves a site chronology that has been previously detrended, but a multiproxy approach involving multiple individual chronologies can also be used (see examples below). Possible metrics for the single-proxy approach include correlation coefficients, coefficient of determination (r-squared), and adjusted coefficient of determination (adjusted r-squared). In addition to linear regression, it is possible to use a nonlinear artificial neural network with a Bayesian regularisation training algorithm (brnn). In general, the user can use a fixed window or a progressive window to calculate the moving averages. To use a fixed window, choose its width by assigning an integer to the fixed_width argument. To use a so-called variable window, which includes many different windows, define the arguments lower_limit and upper_limit. In this case, all window widths between the lower and upper limits are taken into account. The window width is defined here as the number of days between the start and end days of a calculation. Thus, the window width represents a season of interest used in the calculations. All calculated statistical metrics (correlation coefficients, r-squared or adjusted r-squared) are stored in a matrix that is later used to interpret the results. Such interpretation usually involves identifying the optimal season in relation to tree-ring chronology or analysing temporal patterns of correlations between climate and growth. The so-called optimal season (also called optimal window or time window with the highest correlation value) is later extracted and used to evaluate the temporal stability correlations.

3.1 An example of analysing the relationship between Mean Vessel Area (MVA) tree-ring parameter and daily temperature data with fixed window approach

In this example, I analyse the relationship between the tree ring parameter Mean Early Wood Vessel Area (MVA) and daily temperature data (meteorological station Ljubljana, Slovenia) for a chosen window width of 60 days (argument fixed_width = 60). Note that the fixed_width argument overrides the upper_limit and lower_limit arguments when used.

Here I also demonstrate the usability of the row_names_subset argument, which I highly recommend using. In most real cases, tree-ring chronologies and climate data do not completely overlap - the chronologies are usually longer than the available climate data. So if you use the argument row_names_subset = TRUE , your tree ring chronology (response) and your climate data (env_data) will be automatically subdivided, keeping only the overlapping years. Therefore, it is very important that the years are correctly specified in the row names of both data inputs.

Another option I use here is to remove non-significant correlations. All non-significant correlations are removed by setting the argument remove_insignificant = TRUE while controlling the threshold for significance with the argument alpha.

The results are interpreted with generic summary() and visualized with the generic plot() functions. There are two types of plots available, 1) highlighted results for a window with the highest calculated value (type = 1), and 2) heatmap of all calculated values (type = 2).

# Load the dendroTools R package
library(dendroTools)

# Load data
data(data_MVA)
data(LJ_daily_temperatures)

# Example with fixed width
example_fixed_width <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
                                   method = "cor", fixed_width = 60,
                                   row_names_subset = TRUE, remove_insignificant = TRUE,
                                   alpha = 0.05)
summary(example_fixed_width)
##                      Variable                                        Value
## 1                    approach                                        daily
## 2                      method            Correlation Coefficient (pearson)
## 3                      metric                                         <NA>
## 4              analysed_years                                  1940 - 2012
## 5   maximal_calculated_metric                                        0.767
## 6                    lower_ci                                         <NA>
## 7                    upper_ci                                         <NA>
## 8            reference_window Starting Day of Optimal Window Width: Day 73
## 9      analysed_previous_year                                        FALSE
## 10        optimal_time_window                              Mar 14 - May 12
## 11 optimal_time_window_length                                           60
plot(example_fixed_width, type = 1)
Figure 3: The MVA parameter contains the optimal temperature signal from March 14 (DOY 73) to May 12 (DOY 132).

Figure 3: The MVA parameter contains the optimal temperature signal from March 14 (DOY 73) to May 12 (DOY 132).

3.2 The variable window approach with the arguments subset_years and previous_year

In the exploration of climate-growth relationships, we are often interested in the effect of climate in previous year on tree-ring characteristics in current year. To calculate the correlations (or other statistical metrices) with previous growing seasons, use previous_year = TRUE. In addition to previous year effect, researchers are often interested in how these relationships differ in time, are they stable of vary? The key argument for such analysis is subset_years, which defines the subset of years to be analysed. In the following example, I will analyse the relationship between MVA and daily temperature data for two subperiods, 1940 – 1980 and 1981 – 2010. To make the example computationally less intensive, we will only use window with sizes of 55 and 65. As suggested before, the row_names_subset argument is set to TRUE.

# Load the dendroTools R package
library(dendroTools)

# Load the data
data(data_MVA)
data(LJ_daily_temperatures)

# Example for past and present
example_MVA_early <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
                              method = "cor", lower_limit = 55, upper_limit = 65,
                              row_names_subset = TRUE, previous_year = TRUE,
                              remove_insignificant = TRUE, alpha = 0.05, 
                              subset_years = c(1941, 1980))

example_MVA_late <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
                                   method = "cor", lower_limit = 55, upper_limit = 65,
                                   row_names_subset = TRUE, previous_year = TRUE,
                                   remove_insignificant = TRUE, alpha = 0.05, 
                                   subset_years = c(1981, 2010))
plot(example_MVA_early, type = 2)
Figure 4: The temporal correlations pattern for the period 1941-1980

Figure 4: The temporal correlations pattern for the period 1941-1980

plot(example_MVA_late, type = 2)
Figure 5: The temporal correlations pattern for the period 1981-2010

Figure 5: The temporal correlations pattern for the period 1981-2010

3.3 Analysis of climate-growth correlations only for a subset of selected time interval

If users wish to restrict the time interval of interest for calculating climate-growth correlations, the day_interval argument can be used for daily data and month_interval for monthly data in monthly_response(). For example, if you use daily_response() and want to calculate correlations only from the beginning of previous October to the current end of September, use the argument daily_interval =c(-274, 273). Note that a negative sign indicates previous_year doy, while a positive sign indicates the current year doy. In normal (non-leap ) years, October 1 represents doy 274, while September 30 represents doy 273. If you use monthly_response() and want to calculate correlations only from the beginning of the previous October to the current end of September, use the argument monthly_interval =c(-10, 9). Note that a negative sign indicates the month of the previous year, while a positive sign indicates the month of the current year.

# Load the dendroTools R package
library(dendroTools)

# Load data
data(data_TRW_1)
data(LJ_daily_temperatures)

# Example negative correlations
data(data_TRW_1)
example_restricted_interval <- daily_response(response = data_TRW_1, env_data = LJ_daily_temperatures,
                                   method = "cor", lower_limit = 55, upper_limit = 65,
                                   previous_year = FALSE, row_names_subset = TRUE, remove_insignificant = TRUE,
                                   alpha = 0.05, day_interval = c(150, 210))
## Warning in daily_response(response = data_TRW_1, env_data =
## LJ_daily_temperatures, : The upper_limit is outside your day_interval and
## therefore reduced to the maximum allowed: 61.
plot(example_restricted_interval, type = 2)
Figure 6: Climate-growth correlations calculated only for a subset of time window from DOY 150 to DOY 210

Figure 6: Climate-growth correlations calculated only for a subset of time window from DOY 150 to DOY 210

4 An example with multiple tree-ring proxies

As mentioned in the introduction, daily_response() allows multiproxy analysis where the response data frame consists of more than one tree-ring proxy. In such cases, linear or non-linear (brnn) models are fitted at each step and r-squared or adjusted r-squared is computed. However, users should choose multiple proxies judiciously and with caution, as there is nothing to prevent from including colinear variables. Using multiple proxies will result in higher explained variance, but at the expense of degrees of freedom. In these cases, you should also check the stability of the relationship over time and the result of cross-validation. If the metrics on the validation data are much lower than on the calibration data, there is a problem of overfitting and you should exclude some proxies and repeat the analysis. It is highly recommended that you use the adjusted r-squared metric when using the multiproxy approach.

# Load the dendroTools and brnn R package
library(dendroTools)

# Example of multiproxy analysis
data(example_proxies_1)
data(LJ_daily_temperatures)

# Summary of the example_proxies_1 data frame
summary(example_proxies_1)
##       MVA               O18                TRW        
##  Min.   :0.03812   Min.   :-2.12315   Min.   :0.7851  
##  1st Qu.:0.04204   1st Qu.:-0.54584   1st Qu.:0.8918  
##  Median :0.04482   Median :-0.04609   Median :1.0049  
##  Mean   :0.04532   Mean   : 0.04739   Mean   :1.0197  
##  3rd Qu.:0.04752   3rd Qu.: 0.48215   3rd Qu.:1.1159  
##  Max.   :0.05608   Max.   : 2.66724   Max.   :1.3782

There are three proxy variables: Tree-ring width (TRWi), stable oxygen isotopes (O18) and Mean Vessel Area (MVA).

cor(example_proxies_1)
##           MVA       O18       TRW
## MVA 1.0000000 0.2114934 0.0342514
## O18 0.2114934 1.0000000 0.2702639
## TRW 0.0342514 0.2702639 1.0000000

The correlation matrix shows only low correlations among the three proxies.

example_multiproxy <- daily_response(response = example_proxies_1, 
                                     env_data = LJ_daily_temperatures, 
                                     method = "lm", metric = "adj.r.squared", 
                                     lower_limit = 63, upper_limit = 67, 
                                     row_names_subset = TRUE, previous_year = FALSE, 
                                     remove_insignificant = TRUE, alpha = 0.05)
## Warning in daily_response(response = example_proxies_1, env_data =
## LJ_daily_temperatures, : Your response data frame has more than 1 column! Are
## you doing a multiproxy research? If so, OK. If not, check your response data
## frame!
plot(example_multiproxy, type = 2)
Figure 7: The temporal pattern of adjusted r-squared for the multiproxy example. The temporal pattern shows similar result than for the example with MVA only. Therefore, the highest percentage of (adjusted) explained variance is most likely related to MVA variable.

Figure 7: The temporal pattern of adjusted r-squared for the multiproxy example. The temporal pattern shows similar result than for the example with MVA only. Therefore, the highest percentage of (adjusted) explained variance is most likely related to MVA variable.

5 Principal component analysis in combination with the daily_response()

Principal Component Analysis (PCA) is typically used on tree ring data to reduce the full set of original tree ring chronologies to a more manageable set of transformed variables. These transformed variables, the set of principal component scores, are then used as predictors in climate reconstruction models. PCA is also used to amplify the common climate response at the regional scale within a group of tree-ring chronologies by concentrating the common signal in the components with the largest eigenvalues.

To use PCA regression within the daily_response(), set the PCA_transformation argument to TRUE. All variables in the response data frame are transformed using the PCA transformation. If the log_preprocess parameter is set to TRUE, the variables are transformed using a logarithmic transformation before being used in PCA. With the argument components_selection we specify how to select the PC scores, which will be used as predictors. There are three options: “automatic”, “manual” and “plot_selection”. If the argument is set to “automatic”, all PC scores with eigenvalues above 1 will be selected. This threshold can be changed by changing the eigenvalues_threshold argument. If the argument is set to “manual”, the user should specify the number of components with the N_components argument. If components_selection is set to “plot_selection”, a scree plot is displayed and the user has to manually enter the number of components to use as predictors. The latter seems to me to be the most reasonable choice.

In our example, we use PCA for 10 individual Mean Vessel Area (MVA) chronologies (example_proxies_individual). For the climate data, we use the data from LJ_daily_temperatures. Component selection is set to “manual”, N_components of the components to be used in the later analysis is set to 2. All window widths between 64 and 66 days are considered. It is important that the argument row_names_subset is set to TRUE. This argument automatically subsets both data frames (i.e. env_data and response) and only keeps matching years that are used for the calculations. To use this feature, the years must be specified as row names. All insignificant correlations are removed by setting the remove_insignificant argument to TRUE. The threshold for significance is controlled by the alpha argument.

# Load the dendroTools R package
library(dendroTools)

# Load data
data(example_proxies_individual)
data(LJ_daily_temperatures)

# Example PCA
example_PCA <- daily_response(response = example_proxies_individual, 
                              env_data = LJ_daily_temperatures, method = "lm", 
                              lower_limit = 64, upper_limit = 66, metric = "adj.r.squared",
                              row_names_subset = TRUE, remove_insignificant = TRUE,
                              alpha = 0.001, PCA_transformation = TRUE,
                              components_selection = "manual", N_components = 2)
## Warning in daily_response(response = example_proxies_individual, env_data =
## LJ_daily_temperatures, : Your response data frame has more than 1 column! Are
## you doing a multiproxy research? If so, OK. If not, check your response data
## frame!
# Get the summary statistics for the PCA
summary(example_PCA$PCA_output)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
## Standard deviation     1.8691406 1.5007125 1.1867201 0.87168877 0.79432916
## Proportion of Variance 0.3493686 0.2252138 0.1408305 0.07598413 0.06309588
## Cumulative Proportion  0.3493686 0.5745824 0.7154129 0.79139704 0.85449292
##                            Comp.6     Comp.7     Comp.8     Comp.9     Comp.10
## Standard deviation     0.68818504 0.61467322 0.57233447 0.49611738 0.173060035
## Proportion of Variance 0.04735987 0.03778232 0.03275667 0.02461325 0.002994978
## Cumulative Proportion  0.90185279 0.93963510 0.97239178 0.99700502 1.000000000
plot(example_PCA, type = 2)
Figure 8: The temporal pattern for the r-squared. The highest coefficients of determination were calculated for DOY around 90 with time span of two months.

Figure 8: The temporal pattern for the r-squared. The highest coefficients of determination were calculated for DOY around 90 with time span of two months.

6 An example of climate reconstruction

Reconstructions of past climatic conditions include reconstructions of past temperatures, precipitation, vegetation, water flows, sea surface temperatures, and other climatic or climate-related conditions. To reconstruct climate with the dendroTools, we first use daily_response() to find the optimal sequence of consecutive days that yields the highest metric, e.g. correlation coefficient. In this example, we use data_TRW and the daily temperatures of Kredarica (‘KRE_daily_temperatures’). The temporal stability of the statistical relationship is afterwards analysed with the “progressive” method using 3 splits (k = 3). Progressive testing involves splitting the data into k splits, calculating the metric for the first split, and then incrementally adding 1 split at a time and calculating the selected metric. The cross_validation_type argument is set to “randomised” so that the years are reshuffled before the cross-validation test.

# Load the dendroTools R package
library(dendroTools)

# Load data
data(data_TRW)
data(KRE_daily_temperatures)

example_reconstruction_lin <- daily_response(response = data_TRW, 
                                             env_data = KRE_daily_temperatures, 
                                             method = "lm", metric = "r.squared", 
                                             lower_limit = 38, upper_limit = 42,
                                             row_names_subset = TRUE, 
                                             temporal_stability_check = "progressive",
                                             cross_validation_type = "randomized", k = 3)
plot(example_reconstruction_lin)
Figure 9: The highest r squared was calculated for the period from May 15 to June 27. The aggregated (averaged) daily data for this period is saved in a data frame $optimized_return.

Figure 9: The highest r squared was calculated for the period from May 15 to June 27. The aggregated (averaged) daily data for this period is saved in a data frame $optimized_return.

Before reconstructing the May 15 to June 27 mean temperature, we should check temporal stability, cross validation and transfer function.

example_reconstruction_lin$temporal_stability
##        Period r.squared p value
## 1 1955 - 1963     0.108      NA
## 2 1955 - 1972     0.217      NA
## 3 1955 - 1981     0.360      NA
example_reconstruction_lin$cross_validation
##   CV      Period       cor      RMSE      RRSE         d         RE         CE
## 1  1 Calibration 0.4771488 0.7546864 0.8788225 0.5824031         NA         NA
## 2  1  Validation 0.7160956 0.9028707 0.9435670 0.6864304  0.4760769  0.1096813
## 3  2 Calibration 0.6196679 0.7261046 0.7848641 0.7351945         NA         NA
## 4  2  Validation 0.5995180 0.8720931 0.8395781 0.6176642  0.3203036  0.2951085
## 5  3 Calibration 0.6853016 0.7667255 0.7282593 0.7964675         NA         NA
## 6  3  Validation 0.6138851 0.9228104 1.5372829 0.6057758 -0.1817803 -1.3632386
##            DE
## 1          NA
## 2  0.08767291
## 3          NA
## 4  0.26883964
## 5          NA
## 6 -1.70839694
example_reconstruction_lin$transfer_function
Figure 10: Linear transfer function

Figure 10: Linear transfer function

A sample code for climate reconstruction with the lm method:

linear_model <- lm(Optimized_return ~ TRW, data = example_reconstruction_lin$optimized_return)
reconstruction <- data.frame(predictions = predict(linear_model, newdata = data_TRW))
plot(row.names(data_TRW), reconstruction$predictions, type = "l", xlab = "Year", ylab = "Mean temperature May 15 - Jun 27 [ºC]")
Figure 11: The reconstructed average temperature May 15 - June 27 with linear model

Figure 11: The reconstructed average temperature May 15 - June 27 with linear model

To reconstruct climate using the nonlinear brnn model, use the argument method = “brnn” (see the examples in published paper in Dendrochronologia journal).

7 The calculation of partial correlations with daily_response_seascorr()

A partial correlation coefficient describes the strength of the linear relationship between two variables, holding constant a number of other variables. It is often used in dendroclimatological investigations to analyse the effect of temperature on a tree-ring parameter while at the same time controlling for the precipitation effect, or vice versa. Partial correlations in dendroTools can be calculated using daily_response_seascorr() and monthly_response_seascorr(). To analyse partial correlations, three data frames are needed: 1) a tree-ring proxy, 2) primary climate data and 3) secondary climate data for control. The tree-ring proxy must be organized as a data frame with one column representing proxy values, while years are indicated as row names. Primary climate data is assigned to the env_data_primary argument, while secondary climate data is assigned to env_data_control.

# Example with precipitation and temperatures
partial_cor <- daily_response_seascorr(response = swit272,
                    env_data_primary = swit272_daily_temperatures,
                    env_data_control = swit272_daily_precipitation,
                    row_names_subset = TRUE, fixed_width = 45,
                    remove_insignificant = TRUE, alpha = 0.05, 
                    aggregate_function_env_data_primary = 'mean',
                    aggregate_function_env_data_control = 'sum',
                    pcor_method = "spearman",
                    boot = FALSE, reference_window = "end")
summary(partial_cor)
##                      Variable                                       Value
## 1                    approach                                       daily
## 2                      method  Partial Correlation Coefficient (spearman)
## 3                      metric                                        <NA>
## 4              analysed_years                                 1950 - 2011
## 5   maximal_calculated_metric                                       0.363
## 6                    lower_ci                                        <NA>
## 7                    upper_ci                                        <NA>
## 8            reference_window Ending Day of Optimal Window Width: Day 231
## 9      analysed_previous_year                                       FALSE
## 10        optimal_time_window                             Jul 06 - Aug 19
## 11 optimal_time_window_length                                          45
plot(partial_cor, type = 2)
Figure 12: Temporal partern of partial climate-growth correlations calculated with daily_response_seascorr()

Figure 12: Temporal partern of partial climate-growth correlations calculated with daily_response_seascorr()