This package provides functions for preparing data, training models, testing models, and evaluating the outcomes. Its goal is to be an extendable application programming interface (API) to creating and testing predictions made through machine learning.

The package aims create and run tests for two different machine learning problems: classification and regression. Classification predicts a categorical outcome, whereas regression predicts a continuous outcome. Both problems require different approaches to data preparation, modeling and evaluation.

The main package functions fall into one of six categories:

- A function for creating a test. This sets up the test data, splitting it into training and holdout sets.
- Functions for running a test. These call all the functions necessary to execute a test.
- Functions for preparing data for a test. This includes problem-specific & method-specific data preparation
- Functions for training a model. These are wrapper functions to underlying machine learning functions from other packages
- Functions for testing a model. These are wrapper functions to [code] predict, using each machine learning function’s own implementation.
- Functions for evaluating test results. These functions compare observations with the predictions produced by testing the model.
- Functions for running multiple instances of a test. These functions run a test multiple times on different samples of the data

The required parameters to creating a test using `createtest`

are a dataset, a dependent variable, a problem and a method, a name, and finally an index of the rows to use as training set.

`createtest`

checks that its inputs are not missing and are in the right shape. If possible, it will try to fix issues. For example, classification requires the dependent variable to be a factor, but `createtest`

will convert other vector types. Furthermore, `createtest`

tries to provide informative messages when it cannot fix an issue.

It returns a list of the class determined by the *problem* parameter. This class determines, amongst others, which preparation, training, testing and evaluation functions will be called by `runtest`

.

The structure of the return value (the test) is as follows:

- dependent
- The fully specified column name of the dependent variable.
- problem
- The type of problem for the test (classification or regression). This is therefore the same as the test class
- data
- The list of train and holdout data.frames.
- name
- The name of the test. This could be used when printing the test results
- description
- A description of the test.
- method
- The method that should be used for solving the problem: a single-item list of class method. This is used to determine which functions are to be called for creating a model and making predictions with that model
- extra.args
- Extra arguments that should be passed to the runtest method executing the test. These are not used in the package, but exist for extendability.
- call
- The call to the
`createtest`

function. Again, this could be used when printing the test results, so it is obvious at a glance what the test was about.

The package provides the `runtest`

generic function, which allows execution to be redirected based on the class of the test. Although there are perhaps cases where this distinction is meaningful, the package only provides `runtest.default`

.

The default implementation first calls `prepare`

. It calls `train_model`

with the data from `prepare`

. It calls `make_predictions`

with the model from `train_model`

. It calls `evaluate`

with the predictions from `make_predictions`

and the holdout set from `prepare`

and returns the result.

The main preparation function is `prepare`

. It is a generic function, which allows for different implementations for the different class of problems. However, the package itself only provides `prepare.default`

, as it provides generic data preparation that is suitable to both regression and classification.

`prepare.default`

calls a helper function `prepare_data`

, which removes all missing values and provides informative messages of what happened. Furthermore, this function deals with an issue some algorithms can’t deal with: different levels for factors in the train and holdout sets. To deal with this, it calls `apply_levels`

. `apply_levels`

goes through the data column by column, and replaces the levels of the factors in the holdout sets by those of the same column in the training set.

Finally, `prepare.default`

calls `method_prepare`

. This generic functions makes it possible to have machine learning method-specific data preparation. These functions are called with the *method* attribute of the test object. Most methods do not need specific preparation, so `method_prepare.default`

simply returns `identity(data)`

. Random Forests do need method specific preparation, as they cannot deal with categorical predictors with more than 32 categories. Therefore `method_prepare.randomForest`

uses `group_levels`

to group infrequent factor levels to make sure no factor has more than 32 levels.

The `train_model`

functions trains or fits a model to the data through some form of machine learning or statistical technique. There exist two implementations: `train_model.classification`

for creating a classification model and `train_model.regression`

for fitting a regression model.

`train_model.classification`

wraps `classification_model`

, of which `classification_model.default`

calls a machine learning method with parameters `x`

, `y`

and `data`

. `classification_model`

exists so method-specific implementations can be made that need other parameters. For example, `rpart`

requires a `formula`

, so `classification_model.rpart`

makes one from the *dependent* variable of the test object.

`train_model.regression`

is the analogue of `train_model.classification`

for regression. It calls `regression_model`

with parameters `formula`

and `data`

. Again, methods can have their own implementations.

The `make_predictions`

functions serve as a wrapper to `predict`

. `make_predictions.default`

calls `predict`

with the holdout set as `newdata`

and the *model* produced by `train_model`

. If a machine learning methods requires different parameters, the solution is an implementation of `make_predictions`

for that method. For example `predict.rpart`

needs a `type`

attribute, so `make_predictions.rpart`

was developed.

The `evaluate`

functions evaluate the performance of a model. `evaluate.default`

extracts the dependent variable from the holdout set (the `observations`

) and the `predictions`

from `make_predictions`

and calls its helper method `evaluate_problem`

.

`evaluate_problem.classification`

calls `caret::confusionMatrix`

. `evaluate_problem.regression`

returns a `summary`

of `observations - predictions`

.

As a convenience, the package provides a function to run multiple instances of a test: `multitest`

. The parameters to this method are mostly just passed along to `createtest`

via `create_and_run_test`

. The other parameters govern the sampling method used, as the goal of multiple runs of a test usually is to reduce sample bias.

There are two options: cross-fold, which makes sure every row of data is used a holdout once; and random, which randomly selects a percentage of the data for training. These are implemented by `multisample.cross_fold`

and `multisample.random`

respectively.

```
library(crtests)
library(randomForest)
```

`## randomForest 4.6-12`

`## Type rfNews() to see new features/changes/bug fixes.`

`library(caret)`

`## Loading required package: lattice`

`## Loading required package: ggplot2`

```
##
## Attaching package: 'ggplot2'
```

```
## The following object is masked from 'package:randomForest':
##
## margin
```

```
data(iris)
# A classification test
test <- createtest(data = iris,
dependent = "Species",
problem = "classification",
method = "randomForest",
name = "An example classification test",
train_index = sample(150, 100)
)
runtest(test)
```

```
## Classification Test Evaluation: An example classification test
##
## Test attributes:
##
## Method : randomForest
## Dependent variable : Species
## Percentage held out : 33.33333% (50 rows)
## Total rows in data : 150
## Data transformation : None
##
## Performance measures & statistics:
##
## Accuracy : 0.92
## 95% CI : 0.8076572, 0.9777720
## No information rate : 0.42
## P-value (accuracy > NIR) : 1.296044e-13
## McNemar's test P-value : NaN
```

```
# A regression test
test <- createtest(data = iris,
dependent = "Sepal.Width",
problem = "regression",
method = "randomForest",
name = "An example regression test",
train_index = sample(150,100)
)
runtest(test)
```

```
## Regression Test Evaluation: An example regression test
##
## Test attributes:
##
## Method : randomForest
## Dependent variable : Sepal.Width
## Percentage held out : 33.33333% (50 rows)
## Total rows in data : 150
## Data transformation : None
##
## Performance measures & statistics:
##
## Mean error : 0.05295918
## Mean absolute error : 0.2716197
## Mean square error : 0.1139226
## Mean absolute percentage error : 9.173828
## Root mean square error : 0.3375242
```

```
library(crtests)
library(randomForest)
library(rpart)
library(caret)
library(stringr)
# A classification multitest
summary(
multitest(data = iris,
dependent = "Species",
problem = "classification",
method = "randomForest",
name = "An example classification multitest",
iterations = 10,
cross_validation = TRUE,
preserve_distribution = TRUE
)
)
```

```
## Classification Multiple Test Evaluation: An example classification multitest
##
## Test attributes:
##
## General:
##
## Method: randomForest
## Dependent variable: Species
## Data transformation: identity
## Sampling method: 10-fold cross validation with
## preservation of class
## distribution
##
## Summary of attributes per test iteration:
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Rows held out 30 30 30 30 30 30
## Total rows in data 150 150 150 150 150 150
##
## Performance measures:
##
## Min. 1st Qu. Median Mean 3rd Qu.
## Accuracy 9.00e-01 9.33e-01 9.67e-01 9.57e-01 9.92e-01
## Lower bound of 95% CI 7.35e-01 7.79e-01 8.28e-01 8.16e-01 8.70e-01
## Upper bound of 95% CI 9.79e-01 9.92e-01 9.99e-01 9.94e-01 1.00e+00
## No information rate 3.33e-01 3.33e-01 3.33e-01 3.33e-01 3.33e-01
## P-value (accuracy > NIR) 4.86e-15 7.77e-14 2.96e-13 3.51e-11 8.75e-12
## McNemar's test P-value NA NA NA NaN NA
## Max.
## Accuracy 1.00e+00
## Lower bound of 95% CI 8.84e-01
## Upper bound of 95% CI 1.00e+00
## No information rate 3.33e-01
## P-value (accuracy > NIR) 1.66e-10
## McNemar's test P-value NA
```

```
# A regression multitest
summary(
multitest(data = iris,
dependent = "Sepal.Width",
problem = "regression",
method = "rpart",
name = "An example regression multitest",
iterations = 15,
cross_validation = FALSE
)
)
```

```
## Regression Multiple Test Evaluation: An example regression multitest
##
## Test attributes:
##
## General:
##
## Method: rpart
## Dependent variable: Sepal.Width
## Data transformation: identity
## Sampling method: 15 random samples
##
## Summary of attributes per test iteration:
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Rows held out 10 10 10 10 10 10
## Total rows in data 150 150 150 150 150 150
##
## Performance measures:
##
## Min. 1st Qu. Median Mean 3rd Qu.
## Mean error -0.1656 -0.0771 -0.0246 -0.00589 0.0765
## Mean absolute error 0.1281 0.2143 0.2398 0.25130 0.3067
## Mean square error 0.0284 0.0639 0.0882 0.10140 0.1173
## Mean absolute percentage error 4.3920 7.2340 8.5740 8.42900 10.4700
## Root mean square error 0.1685 0.2526 0.2971 0.30900 0.3426
## Max.
## Mean error 0.222
## Mean absolute error 0.370
## Mean square error 0.224
## Mean absolute percentage error 12.810
## Root mean square error 0.473
```