Here we focus on the specifications of the GSPCR model. Three
arguments of the `cv_gspcr()`

should be specified
carefully:

- Association measure
- Fit measure
- Number of components

In this vignette we consider a simple scenario with a continuous
dependent variable and a set of continuous predictors. First, we load
the required packages and store the **example dataset**
`GSPCRexdata`

(see the helpfile for details
`?GSPCRexdata`

) in two separate objects:

```
# Load R packages
library(gspcr) # this package!
library(superpc) # alternative comparison package
library(patchwork) # combining ggplots
# Comment goal of code
X <- GSPCRexdata$X$cont
y <- GSPCRexdata$y$cont
```

As described in the introduction, `gspcr`

allows for the
specification of **different bivariate association
measures**. We can run `gspcr`

using as a threshold
type:

- the log-likelihoods of simple GLMs;
- the generalized \(R^2\);
- the normalized association measure used in the
`superpc`

R package.

Another important aspect to consider is the **number of
threshold values** that should be considered. This can be
specified with the `nthrs`

argument. Using the following code
we can compare the solution paths obtained by the different association
measures and values for a given number of PCs.

```
# Define a vector of threshold types
threshold_types <- c("LLS", "normalized", "PR2")
# Train the GSPCR model with the different values
out_trhs <- lapply(
X = threshold_types,
FUN = function(i) {
cv_gspcr(
dv = y,
ivs = X,
thrs = i, # threshold type
nthrs = 20, # number of threshold values
npcs_range = 1,
K = 10
)
}
)
# Plot them
plots <- lapply(out_trhs, function(i) {
plot(
x = i,
y = "F",
labels = FALSE, # We are using a single nPC, do not need the label
discretize = FALSE, # Makes X-axis more readable
print = FALSE
)
})
# Patchwork ggplots
plots[[1]] + plots[[2]] + plots[[3]]
```

As you can see, the solution paths are similar, although LLS tended to favor lower threshold values.

We can use **different cross-validation fit measures**.
See the help file for the list options (`?cv_gspcr`

).

```
# Measures
fit_measure_vec <- c("LRT", "PR2", "MSE", "F", "AIC", "BIC")
# Train the GSPCR model with the different values
out_fit_meas <- lapply(fit_measure_vec, function(i) {
cv_gspcr(
dv = y,
ivs = X,
fit_measure = i,
thrs = "normalized",
nthrs = 20,
npcs_range = 1,
K = 10
)
})
# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
# Reverse y?
rev <- grepl("MSE|AIC|BIC", fit_measure_vec[i])
# Make plots
plot(
x = out_fit_meas[[i]],
y = fit_measure_vec[[i]],
labels = FALSE,
y_reverse = rev,
errorBars = FALSE,
discretize = FALSE,
print = FALSE
)
})
# Patchwork ggplots
(plots[[1]] + plots[[2]] + plots[[3]]) / (plots[[4]] + plots[[5]] + plots[[6]])
```

As you can see, the different fit measures return equivalent solution
paths. This is true for **any number of PCs**:

```
# Train the GSPCR model with the different values
out_fit_meas <- lapply(fit_measure_vec, function(i) {
cv_gspcr(
dv = y,
ivs = X,
fit_measure = i,
thrs = "normalized",
nthrs = 20,
npcs_range = 5,
K = 10
)
})
# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
# Reverse y?
rev <- grepl("MSE|AIC|BIC", fit_measure_vec[i])
# Make plots
plot(
x = out_fit_meas[[i]],
y = fit_measure_vec[[i]],
labels = FALSE,
y_reverse = rev,
errorBars = FALSE,
discretize = FALSE,
print = FALSE
)
})
# Patchwork ggplots
(plots[[1]] + plots[[2]] + plots[[3]]) / (plots[[4]] + plots[[5]] + plots[[6]])
```

We can use cross-validation to **select the number of
PCs** as well. We can use the `npcs_range`

argument to
specify the range of the number of PCs to consider.

```
# Train the model
out_npcs <- cv_gspcr(
dv = y,
ivs = X,
npcs_range = c(2, 5, 10)
)
# Plot solution paths
plot(out_npcs)
```

Given the choice of 2, 5, or 10 PCs, we would use 2 PCs with the second threshold value.