We start with half-hourly \(u_*\)-filtered and gap-filled NEE_f values. For simplicity this example uses data provided with the package and omits \(u_*\) threshold detection but rather applies a user-specified threshold.

With option `FillAll = TRUE`

, an uncertainty, specifically the standard deviation, of the flux is estimated for each record during gapfilling and stored in variable `NEE_uStar_fsd`

.

```
library(REddyProc)
library(dplyr)
EddyDataWithPosix <- Example_DETha98 %>%
filterLongRuns("NEE") %>%
fConvertTimeToPosix('YDH',Year = 'Year',Day = 'DoY', Hour = 'Hour')
EProc <- sEddyProc$new(
'DE-Tha', EddyDataWithPosix, c('NEE','Rg','Tair','VPD', 'Ustar'))
EProc$sMDSGapFillAfterUstar('NEE', uStarTh = 0.3, FillAll = TRUE)
results <- EProc$sExportResults()
summary(results$NEE_uStar_fsd)
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.03535 1.69960 2.32796 2.74246 3.44417 24.55782
```

We can inspect, how the uncertainty scales with the flux magnitude.

`plot( NEE_uStar_fsd ~ NEE_uStar_fall, slice(results, sample.int(nrow(results),400)))`

With neglecting correlations among records, the uncertainty of the mean annual flux is computed by adding the variances. The mean is computed by \(m = \sum{x_i}/n\). And hence its standard deviation by \(sd(m) = \sqrt{Var(m)}= \sqrt{\sum{Var(x_i)}/n^2} = \sqrt{n \bar{\sigma^2}/n^2} = \bar{\sigma^2}/\sqrt{n}\). This results in an approximate reduction of the average standard deviation \(\bar{\sigma^2}\) by \(\sqrt{n}\).

```
results %>% filter(NEE_uStar_fqc == 0) %>% summarise(
nRec = sum(is.finite(NEE_uStar_f))
, varSum = sum(NEE_uStar_fsd^2, na.rm = TRUE)
, seMean = sqrt(varSum) / nRec
, seMeanApprox = mean(NEE_uStar_fsd, na.rma = TRUE) / sqrt(nRec)
) %>% select(nRec, seMean, seMeanApprox)
```

```
## nRec seMean seMeanApprox
## 1 10901 0.02988839 0.02650074
```

Due to the large number of records, the estimated uncertainty is very low.

When observations are not independent of each other, the formulas now become \(Var(m) = s^2/n_{eff}\) where \(s^2 = \frac{n_{eff}}{n(n_{eff}-1)} \sum_{i=1}^n \sigma_i^2\), and with the number of effective observations \(n_{eff}\) decreasing with the autocorrelation among records (Bayley 1946, Zieba 2011).

The average standard deviation \(\sqrt{\bar{\sigma^2_i}}\) now approximately decreases only by about \(\sqrt{n_{eff}}\):

\[ Var(m) = \frac{s^2}{n_{eff}} = \frac{\frac{n_{eff}}{n(n_{eff}-1)} \sum_{i=1}^n \sigma_i^2}{n_{eff}} = \frac{1}{n(n_{eff}-1)} \sum_{i=1}^n \sigma_i^2 \\ = \frac{1}{n(n_{eff}-1)} n \bar{\sigma^2_i} = \frac{\bar{\sigma^2_i}}{(n_{eff}-1)} \]

First we need to quantify the error terms, i.e. model-data residuals. For all the records of good quality, we have an original measured value `NEE_uStar_orig`

and modelled value from MDS gapfilling, `NEE_uStar_fall`

. The residual of bad-quality data is set to missing.

```
results <- EProc$sExportResults() %>%
mutate(
resid = ifelse(NEE_uStar_fqc == 0, NEE_uStar_orig - NEE_uStar_fall, NA )
)
```

Now we can inspect the the autocorrelation of the errors.

`acf(results$resid, na.action = na.pass, main = "")`

The empiricical autocorrelation function shows strong positive autocorrelation in residuals up to a lag of 10 records.

Computation of effective number of observations is provided by function `computeEffectiveNumObs`

from package `lognorm`

based on the empirical autocorrelation function for given model-data residuals.

`library(lognorm)`

`## Warning: package 'lognorm' was built under R version 3.5.3`

```
autoCorr <- computeEffectiveAutoCorr(results$resid)
nEff <- computeEffectiveNumObs(results$resid, na.rm = TRUE)
c( nEff = nEff, nObs = sum(is.finite(results$resid)))
```

```
## nEff nObs
## 3870.283 10901.000
```

We see that the effective number of observations is only about a third of the number of observations.

Now we can use the formulas for the sum and the mean of correlated normally distributed variables to compute the uncertainty of the mean.

```
results %>% filter(NEE_uStar_fqc == 0) %>% summarise(
nRec = sum(is.finite(NEE_uStar_fsd))
, varMean = sum(NEE_uStar_fsd^2, na.rm = TRUE) / nRec / (!!nEff - 1)
, seMean = sqrt(varMean)
#, seMean2 = sqrt(mean(NEE_uStar_fsd^2, na.rm = TRUE)) / sqrt(!!nEff - 1)
, seMeanApprox = mean(NEE_uStar_fsd, na.rm = TRUE) / sqrt(!!nEff - 1)
) %>% select(seMean, seMeanApprox)
```

```
## seMean seMeanApprox
## 1 0.05016727 0.04448115
```

When aggregating daily respiration, the same principles hold.

However, when computing the number of effective observations, we recommend using the empirical autocorrelation function estimated on longer time series of residuals (`autoCorr`

computed above) in `computeEffectiveNumObs`

instead of estimating them from the residuals of each day.

```
results <- results %>% mutate(
DateTime = EddyDataWithPosix$DateTime
, DoY = as.POSIXlt(DateTime - 15*60)$yday # midnight belongs to the previous
)
```

```
aggDay <- results %>% group_by(DoY) %>%
summarise(
DateTime = first(DateTime)
, nRec = sum( NEE_uStar_fqc == 0, na.rm = TRUE)
, nEff = computeEffectiveNumObs(
resid, effAcf = !!autoCorr, na.rm = TRUE)
, NEE = mean(NEE_uStar_f, na.rm = TRUE)
, sdNEE = if (nEff == 0) NA_real_ else sqrt(
mean(NEE_uStar_fsd^2, na.rm = TRUE) / (nEff - 1))
, sdNEEuncorr = if (nRec == 0) NA_real_ else sqrt(
mean(NEE_uStar_fsd^2, na.rm = TRUE) / (nRec - 1))
)
aggDay
```

```
## # A tibble: 365 x 7
## DoY DateTime nRec nEff NEE sdNEE sdNEEuncorr
## <int> <dttm> <int> <dbl> <dbl> <dbl> <dbl>
## 1 0 1998-01-01 00:30:00 21 7.87 0.124 0.988 0.579
## 2 1 1998-01-02 00:30:00 7 3.66 0.00610 1.57 1.05
## 3 2 1998-01-03 00:30:00 0 0 0.0484 NA NA
## 4 3 1998-01-04 00:30:00 0 0 0.303 NA NA
## 5 4 1998-01-05 00:30:00 28 10.9 0.195 0.851 0.515
## 6 5 1998-01-06 00:30:00 48 18.0 0.926 0.615 0.370
## 7 6 1998-01-07 00:30:00 48 18.0 -0.337 0.566 0.340
## 8 7 1998-01-08 00:30:00 46 17.2 -0.139 0.541 0.325
## 9 8 1998-01-09 00:30:00 45 16.8 0.614 0.482 0.289
## 10 9 1998-01-10 00:30:00 36 13.5 0.242 0.646 0.386
## # ... with 355 more rows
```

The confidence bounds (+-1.96 stdDev) computed with accounting for correlations in this case are about twice the ones computed with neglecting correlations.