**Title**: The T-Rex selector for fast high-dimensional
variable selection with FDR control

**Description**: This package performs fast variable selection in
large-scale high-dimensional settings while controlling the false
discovery rate (FDR) at a user-defined target level. It is
based on the T-Rex selector paper (available at
https://arxiv.org/abs/2110.06048).

**Note**: The T-Rex selector performs terminated-random
experiments (T-Rex) using the T-LARS algorithm (implemented in the
‘tlars’ R package) and fuses the selected active sets of all random
experiments to obtain a final set of selected variables. The T-Rex
selector provably controls the false discovery rate (FDR), i.e., the
expected fraction of false positives among all selected variables, at
the user-defined target level while maximizing the number of selected
variables and, thereby, achieving a high true positive rate (TPR)
(i.e., power). The T-Rex selector can be applied in various fields,
such as genomics, financial engineering, or any other field that
requires a fast and FDR-controlling variable/feature selection method
for large-scale high-dimensional settings.
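
The two quantities mentioned in the note can be written out as follows. The notation is an assumption introduced here for illustration and does not appear in this README: $R$ is the number of selected variables, $V$ the number of selected variables that are not truly active (false positives), $S$ the number of selected true active variables, and $p_1$ the total number of true active variables.

$$
\mathrm{FDR} = \mathbb{E}\left[\frac{V}{\max(1, R)}\right],
\qquad
\mathrm{TPR} = \mathbb{E}\left[\frac{S}{\max(1, p_1)}\right]
$$

The $\max(1, \cdot)$ terms only guard against division by zero when nothing is selected or when there are no true active variables. Controlling the FDR at a target level $\alpha$ then means guaranteeing $\mathrm{FDR} \le \alpha$.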

In the following sections, we show you how to install and use the package.

Before installing the ‘TRexSelector’ package, you need to install the required ‘tlars’ package. You can install the ‘tlars’ package from CRAN or GitHub with:

```
# Install stable version from CRAN
install.packages("tlars")
# Install development version from GitHub
install.packages("devtools")
devtools::install_github("jasinmachkour/tlars")
```

Then, you can install the ‘TRexSelector’ package with:

`devtools::install_github("jasinmachkour/TRexSelector")`

You can open the help pages with:

```
library(TRexSelector)
help(package = "TRexSelector")
?trex
?random_experiments
?lm_dummy
?add_dummies
?add_dummies_GVS
?FDP
?TPP
# etc.
```

To cite the package ‘TRexSelector’ in publications use:

`citation("TRexSelector")`

This section illustrates the basic usage of the ‘TRexSelector’ package to perform FDR-controlled variable selection in large-scale high-dimensional settings based on the T-Rex selector.

**First**, we generate a high-dimensional Gaussian data set with sparse support:

```
library(TRexSelector)
# Setup
n <- 75 # number of observations
p <- 150 # number of variables
num_act <- 3 # number of true active variables
beta <- c(rep(1, times = num_act), rep(0, times = p - num_act)) # coefficient vector
true_actives <- which(beta > 0) # indices of true active variables
num_dummies <- p # number of dummy predictors (also referred to as dummies)
# Generate Gaussian data
set.seed(123)
X <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
y <- X %*% beta + stats::rnorm(n)
```

**Second**, we perform FDR-controlled variable selection using the T-Rex selector for a target FDR of 5%:

```
# Seed
set.seed(1234)
# Numerical zero
eps <- .Machine$double.eps
# Variable selection via T-Rex
res <- trex(X = X, y = y, tFDR = 0.05, verbose = FALSE)
selected_var <- which(res$selected_var > eps)
paste0("True active variables: ", paste(as.character(true_actives), collapse = ", "))
#> [1] "True active variables: 1, 2, 3"
paste0("Selected variables: ", paste(as.character(selected_var), collapse = ", "))
#> [1] "Selected variables: 1, 2, 3"
```

So, for a preset target FDR of 5%, the T-Rex selector has selected all true active variables and there are no false positives in this example.
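
To make this statement quantitative, the false discovery proportion (FDP) and true positive proportion (TPP) of a single run can be computed directly from the two index sets. The following is a minimal base-R sketch rather than the package's own implementation: the index vectors are hard-coded to mirror the output of the example above, and the names `num_false`, `num_true`, `fdp`, and `tpp` are hypothetical.

```r
# Index sets hard-coded to mirror the example above (assumption:
# in practice these come from `true_actives` and `selected_var`)
true_actives <- c(1, 2, 3)
selected_var <- c(1, 2, 3)

# Selected variables that are not truly active (false positives)
num_false <- length(setdiff(selected_var, true_actives))
# Selected variables that are truly active (true positives)
num_true <- length(intersect(selected_var, true_actives))

# max(1, .) avoids division by zero when a set is empty
fdp <- num_false / max(1, length(selected_var))
tpp <- num_true / max(1, length(true_actives))

fdp
tpp
```

Note that the FDP of one run is a realization; the FDR guarantee concerns its expectation over repeated data sets, so an individual run can have an FDP above or below the target level.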

Note that users have to choose the target FDR according to the requirements of their specific applications.

For more information and some examples, please check the GitHub-vignette.

T-Rex paper: https://arxiv.org/abs/2110.06048

‘TRexSelector’ package: GitHub-TRexSelector.

README file: GitHub-readme.

Vignette: GitHub-vignette.

‘tlars’ package: CRAN-tlars and GitHub-tlars.