The basic workflow behind the PACE approach for sparse functional data is as follows (see, e.g., [@Yao05; @Liu09] for more information):

- Calculate the smoothed mean \(\hat{\mu}\) (using local linear smoothing) by pooling all the available readings together.
- Calculate each curve's raw covariance separately and then aggregate all these raw covariances to form the sample raw covariance.
- Use the off-diagonal elements of the sample raw covariance to estimate the smooth covariance.
- Perform eigenanalysis on the smoothed covariance to obtain the estimated eigenfunctions \(\hat{\phi}\) and eigenvalues \(\hat{\lambda}\), then project that smoothed covariance on a positive semi-definite surface [@Hall2008].
- Use conditional expectation (the PACE step) to estimate the corresponding scores \(\hat{\xi}\), i.e. \newline \(\hat{\xi}_{ik} = \hat{E}[\xi_{ik}|Y_i] = \hat{\lambda}_k \hat{\phi}_{ik}^T \hat{\Sigma}_{Y_i}^{-1}(Y_i-\hat{\mu}_i)\).

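The PACE step above can be written down directly. The following sketch (not the package's internal implementation; all argument names are hypothetical) computes the conditional-expectation scores for a single subject, given already-estimated components evaluated at that subject's observation times:

```
# Illustrative PACE score computation for one subject (hypothetical inputs,
# not fdapace internals):
#   yi     - vector of subject i's noisy observations Y_i
#   mui    - estimated mean function evaluated at subject i's time points
#   phii   - matrix whose k-th column is eigenfunction k at those time points
#   lambda - vector of estimated eigenvalues
#   sigma2 - estimated measurement-error variance
paceScores <- function(yi, mui, phii, lambda, sigma2) {
  # Sigma_{Y_i} = Phi_i Lambda Phi_i^T + sigma^2 I
  SigmaYi <- phii %*% diag(lambda, nrow = length(lambda)) %*% t(phii) +
    sigma2 * diag(length(yi))
  # xi_hat_{ik} = lambda_k phi_{ik}^T Sigma_{Y_i}^{-1} (Y_i - mu_i)
  as.vector(lambda * (t(phii) %*% solve(SigmaYi, yi - mui)))
}
```

Because of the \(\Sigma_{Y_i}^{-1}\) term the scores are shrunken towards zero relative to a naive projection, which is what keeps the estimator stable under very sparse designs.
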
As a working assumption, a dataset is treated as sparse if it has on average fewer than 20 (potentially irregularly sampled) measurements per subject. A user can manually change the automatically determined `dataType` if necessary.
For densely observed functional data, simplified procedures are available to obtain the eigencomponents and the associated functional principal component scores (see, e.g., [@Castro86] for more information). In particular, in this case we:

- Calculate the cross-sectional mean \(\hat{\mu}\).
- Calculate the cross-sectional covariance surface (which is guaranteed to be positive semi-definite).
- Perform eigenanalysis on the covariance to estimate the eigenfunctions \(\hat{\phi}\) and eigenvalues \(\hat{\lambda}\).
- Use numerical integration to estimate the corresponding scores \(\hat{\xi}\), i.e. \newline \(\hat{\xi}_{ik} = \int_0^T [ y_i(t) - \hat{\mu}(t)] \hat{\phi}_k(t) \, dt\).

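For intuition, the dense steps above can be sketched in a few lines of base R. This is a toy illustration under the assumption of an \(N \times M\) matrix of curves observed on a common equidistant grid (all names below are hypothetical), not the package's actual estimator:

```
# Toy dense-FPCA sketch (illustrative only; use FPCA() in practice).
# yMat: N x M matrix, one fully observed curve per row; s: common equidistant grid.
denseFPCAsketch <- function(yMat, s, K = 2) {
  muHat  <- colMeans(yMat)              # cross-sectional mean
  covHat <- cov(yMat)                   # cross-sectional covariance surface
  eig <- eigen(covHat, symmetric = TRUE)
  ds  <- s[2] - s[1]                    # grid spacing
  lambdaHat <- eig$values[1:K] * ds     # rescale eigenvalues to the L2 geometry
  phiHat <- eig$vectors[, 1:K, drop = FALSE] / sqrt(ds)  # unit-L2-norm eigenfunctions
  # Scores via numerical integration (Riemann sum) of (y_i - mu) * phi_k
  xiHat <- sweep(yMat, 2, muHat) %*% phiHat * ds
  list(mu = muHat, lambda = lambdaHat, phi = phiHat, xi = xiHat)
}
```
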
In the case of sparse FPCA, the most computationally intensive part is the smoothing of the sample's raw covariance function. For this we employ a local weighted bilinear smoother.

A sibling MATLAB package for `fdapace` can be found here.

The simplest scenario is that one has two lists, `yList` and `tList`: `yList` is a list of vectors, each containing the observations made on a single subject's curve, and `tList` is a list of vectors containing the corresponding measurement time points. In this case one uses:

```
FPCAobj <- FPCA(Ly=yList, Lt=tList)
```

The generated `FPCAobj` will contain all the basic information regarding the desired FPCA.

```
library(fdapace)
# Set the number of subjects (N) and the
# number of measurements per subjects (M)
N <- 200
M <- 100
set.seed(123)
# Define the continuum
s <- seq(0,10,length.out = M)
# Define the mean and 2 eigencomponents
meanFunct <- function(s) s + 10*exp(-(s-5)^2)
eigFunct1 <- function(s) +cos(2*s*pi/10) / sqrt(5)
eigFunct2 <- function(s) -sin(2*s*pi/10) / sqrt(5)
# Create FPC scores
Ksi <- matrix(rnorm(N*2), ncol=2)
Ksi <- apply(Ksi, 2, scale)
Ksi <- Ksi %*% diag(c(5,2))
# Create Y_true
yTrue <- Ksi %*% t(matrix(c(eigFunct1(s),eigFunct2(s)), ncol=2)) + t(matrix(rep(meanFunct(s),N), nrow=M))
```

```
L3 <- MakeFPCAInputs(IDs = rep(1:N, each=M), tVec = rep(s, N), yVec = t(yTrue))
FPCAdense <- FPCA(L3$Ly, L3$Lt)
# Plot the FPCA object
plot(FPCAdense)
```

```
# Find the standard deviation associated with each component
sqrt(FPCAdense$lambda)
```

```
## [1] 5.050606 1.999073
```

```
# Create sparse sample
# Each subject has one to five readings (median: 3)
set.seed(123)
ySparse <- Sparsify(yTrue, s, sparsity = c(1:5))
# Give your sample a bit of noise
ySparse$yNoisy <- lapply(ySparse$Ly, function(x) x + 0.5*rnorm(length(x)))
# Do FPCA on this sparse sample
# Notice that sparse FPCA will smooth the data internally (Yao et al., 2005)
# Smoothing is the main computational cost behind sparse FPCA
FPCAsparse <- FPCA(ySparse$yNoisy, ySparse$Lt, list(plot = TRUE))
```

`FPCA` automatically calculates the bandwidth utilized by each smoother using generalised cross-validation or \(k\)-fold cross-validation. Dense data are not smoothed by default; the argument `methodMuCovEst` controls whether the mean and covariance of dense data are estimated by a `smooth` or a `cross-sectional` procedure. The bandwidths used for estimating the smoothed mean and the smoothed covariance are available under `$bwMu` and `$bwCov` respectively. Users can nevertheless provide their own bandwidths, for example for the mean:

```
FPCAsparseMuBW5 <- FPCA(ySparse$yNoisy, ySparse$Lt, optns= list(userBwMu = 5))
```

Visualising the fitted trajectories is a good way to check whether the new bandwidth makes sense:

```
par(mfrow=c(1,2))
CreatePathPlot( FPCAsparse, subset = 1:3, main = "GCV bandwidth", pch = 16)
CreatePathPlot( FPCAsparseMuBW5, subset = 1:3, main = "User-defined bandwidth", pch = 16)
```

`FPCA` uses a Gaussian kernel when smoothing sparse functional data; other kernel types (e.g. Epanechnikov/`epan`) are also available (see `?FPCA`). The kernel used is stored under `$optns$kernel` of the returned object. For instance, one can switch from the default `gauss` kernel to a rectangular kernel (`rect`) as follows:

```
FPCAsparseRect <- FPCA(ySparse$yNoisy, ySparse$Lt, optns = list(kernel = 'rect')) # Use rectangular kernel
```

`FPCA` automatically returns the smallest number of components required to explain 99.99% of a sample's variance. Using the function `SelectK`, one can instead determine the number of relevant components according to a different fraction-of-variance-explained (FVE) threshold or an information criterion such as AIC:

```
SelectK( FPCAsparse, criterion = 'FVE', FVEthreshold = 0.95) # K = 2
```

```
## $K
## [1] 2
##
## $criterion
## [1] 95.68368
```

```
SelectK( FPCAsparse, criterion = 'AIC') # K = 2
```

```
## $K
## [1] 2
##
## $criterion
## [1] 1004.783
```

When working with functional data (usually not very sparse), the estimation of derivatives is often of interest. Using `fitted.FPCA` one can directly obtain numerical derivatives by defining the appropriate order `p`; `fdapace` provides for the first two derivatives (`p = 1` or `2`). The user can also specify derivative-specific smoothing arguments (see `?fitted.FPCA` for more information). In addition, the function `FPCAder` computes the derivatives of an `FPCA` object directly.

```
fittedCurvesP0 <- fitted(FPCAsparse) # equivalent: fitted(FPCAsparse, derOptns=list(p = 0));
# Get first order derivatives of fitted curves, smooth using Epanechnikov kernel
fittedCurvesP1 <- fitted(FPCAsparse, derOptns=list(p = 1, kernelType = 'epan'))
```

```
## Warning in fitted.FPCA(FPCAsparse, derOptns = list(p = 1, kernelType = "epan")): Potentially you use too many components to estimate derivatives.
## Consider using SelectK() to find a more informed estimate for 'K'.
```

We use the `medfly25` dataset that is available with `fdapace` to showcase `FPCA` on real data. `medfly25` is a longitudinal dataset recording the number of eggs laid daily by Mediterranean fruit flies (medflies) during the first 25 days of their lives:

```
# load data
data(medfly25)
# Turn the original data into a list of paired amplitude and timing lists
Flies <- MakeFPCAInputs(medfly25$ID, medfly25$Days, medfly25$nEggs)
fpcaObjFlies <- FPCA(Flies$Ly, Flies$Lt, list(plot = TRUE, methodMuCovEst = 'smooth', userBwCov = 2))
```

Based on the scree-plot we see that the first three components appear to encapsulate most of the relevant variation. The number of eigencomponents needed to reach 99.99% FVE is \(11\), but just \(3\) eigencomponents are enough to reach 95.0%. We can inspect the corresponding fits visually using the `CreatePathPlot` command.

```
par(mfrow=c(1,2))
CreatePathPlot(fpcaObjFlies, subset = c(3,5,135), main = 'K = 11', pch = 4); grid()
CreatePathPlot(fpcaObjFlies, subset = c(3,5,135), K = 3, main = 'K = 3', pch = 4) ; grid()
```

One can perform outlier detection [@Febrero2007] as well as visualise the data using a functional box-plot. To achieve these tasks one can use the functions `CreateOutliersPlot` and `CreateFuncBoxPlot`. Both functions support several variants; below we use the `KDE` variant for `CreateOutliersPlot` and the `bagplot` variant for `CreateFuncBoxPlot`:

```
par(mfrow=c(1,2))
CreateOutliersPlot(fpcaObjFlies, optns = list(K = 3, variant = 'KDE'))
CreateFuncBoxPlot(fpcaObjFlies, xlab = 'Days', ylab = '# of eggs laid', optns = list(K =3, variant='bagplot'))
```

Functional data lend themselves naturally to questions about their rate of change, i.e. their derivatives. As mentioned previously, using `fdapace` one can generate estimates of the sample's derivatives (`fitted.FPCA`) or of the principal modes of variation (`FPCAder`). In both cases the relevant options are passed through the `derOptns` list:

```
par(mfrow=c(1,2))
CreatePathPlot(fpcaObjFlies, subset = c(3,5,135), K = 3, main = 'K = 3', showObs = FALSE) ; grid()
CreatePathPlot(fpcaObjFlies, subset = c(3,5,135), K = 3, main = 'K = 3', showObs = FALSE, derOptns = list(p = 1, bw = 1.01 , kernelType = 'epan') ) ; grid()
```

We note that if finite-support kernel types are used (e.g. `rect` or `epan`), the derivative estimates may contain `NaN` values where too few observations fall within the kernel window; increasing the grid resolution (`nRegGrid`), changing the `kernelType`, or enlarging the bandwidth `bw` usually resolves this. The function `CreateBWPlot` can be used to inspect how different `derOptns` bandwidth (`bw`) choices affect the derivative estimates:

```
fpcaObjFlies79 <- FPCA(Flies$Ly, Flies$Lt, list(nRegGrid = 79, methodMuCovEst = 'smooth', userBwCov = 2)) # Use 79 equidistant points for the support
CreateBWPlot(fpcaObjFlies79 , derOptns = list(p = 1, bw = 2.0 , kernelType = 'rect') )
```

As the `medfly` sample is dense, we can immediately use standard multivariate clustering functionality to identify potential subgroups within it; the function `FClust` is the clustering wrapper provided by `fdapace`. By default `FClust` utilises a Gaussian mixture model approach based on the package `Rmixmod`. For the `medfly` sample, clustering into two groups appears reasonable:

```
A <- FClust(Flies$Ly, Flies$Lt, optnsFPCA = list(methodMuCovEst = 'smooth', userBwCov = 2, FVEthreshold = 0.90), k = 2)
# The Neg-Entropy Criterion can be found as: A$clusterObj@bestResult@criterionValue
CreatePathPlot( fpcaObjFlies, K=2, showObs=FALSE, lty=1, col= A$cluster, xlab = 'Days', ylab = '# of eggs laid')
grid()
```