intrinsicFRP: An R Package for Factor Model Asset Pricing

CRAN status License: GPL v3 R-CMD-check codecov

Author: Alberto Quaini

Description

The intrinsicFRP library implements functions designed for comprehensive evaluation and testing of linear asset pricing models, focusing on the estimation and assessment of factor risk premia, factor SDF coefficients, selection of “useful” risk factors (those displaying non-zero population correlation with test asset returns), examination of model misspecification, validation of model identification, and heteroskedasticity and autocorrelation robust covariance matrix estimation.

Linear Asset Pricing Models

Given \(T\) observations on \(N\) test asset excess returns collected in matrix \(R\in\mathbb R^{TxN}\), and \(T\) observations on \(K\) risk factors in \(F\in\mathbb R^{TxK}\), consider the linear asset pricing model: \[R - \iota_T * E[R_t]' = (F - \iota_T * E[F_t]')\beta + \varepsilon,\] where \(\iota_T\in\mathbb R^T\) is a unit vector, \(E[R_t]\in\mathbb R^N\) and \(E[F_t]\in\mathbb R^K\) are the vector of expected returns and factors, respectively, \(\beta\in\mathbb R^{NxK}\) is the matrix of regression coefficients, and \(\varepsilon\in\mathbb R^{TxN}\) is a matrix of residuals.

Factor risk premia

The vector of compensations for risk factor exposures \(\lambda\in\mathbb R^K\), i.e., factor risk premia, are determined by the second-stage regression of expected asset returns on regression coefficients: \[E[R_t] = \beta\lambda + \nu,\] where \(\nu\in\mathbb R^N\) is the vector of model’s pricing errors.

For estimating and testing factor risk premia, our toolkit incorporates the classic two-pass approach outlined by [@fama1973risk], the misspecification-robust methodology proposed by [@kan2013pricing], and the tradable and “Oracle” tradable approaches introduced by [@quaini2023tradable].

The classic two-pass factor risk premia are given by: \[\lambda=(\beta'\beta)^{-1}\beta'E[R_t].\] The misspecification-robust factor risk premia are given by: \[\lambda=(\beta'W\beta)^{-1}\beta'WE[R_t],\] where \(W\) is a symmetric and positive definite weighting matrix. The tradable factor risk premia are given by: \[\lambda=Cov[R_t,F_t]Var[R_t]^{-1}E[R_t].\] Details on the motivations and properties of these notions of factor risk premia are found in the aforementioned references.

Factor SDF coefficients

If the law of one price holds, then there is a Stochastic Discount Factor (SDF) \(M\) pricing all excess returns, i.e., \(E[M_tR_t]=0\). A (linear) factor model for the SDF is of the form \(M_t=1-\gamma'(F_t-E[F_t])\), where the mean is normalized to one since we work with excess returns. This package implements the estimators in [@fama1973risk], which propose to find the candidate factor SDF such that \[\gamma=\arg\min_{g\in\mathbb R^K}E[R_tM_t]'E[R_tM_t],\] which is given by: \[\gamma=(Cov[R_t,F_t]'Cov[R_t,F_t])^{-1}Cov[R_t,F_t]'E[R_t].\] Furthermore, it implements the estimators in [@gospodinov2014misspecification], which propose to find the candidate factor SDF that minimizes the pricing errors, under a weighted \(L_2-\)distance: \[\gamma=\arg\min_{g\in\mathbb R^K}E[R_tM_t]'Var[R]^{-1}E[R_tM_t],\] which is given by: \[\gamma=(Cov[R_t,F_t]'Var[R_t]^{-1}Cov[R_t,F_t])^{-1}Cov[R_t,F_t]'Var[R_t]^{-1}E[R_t].\]

Factor selection

Many problems arise when some factors are useless or weak, that is, when their population correlations with test asset excess returns are zero or near-zero, or more generally when the matrix collecting such population correlations is reduced-rank. In that case, the matrix of regression coefficients \(\beta\) is also reduced rank, thereby the vector of factor risk premia \(\lambda\) is not identified. This lack of identification has serious repercussions in its estimation and inference. It is therefore important to filter out the problematic factors from the model.

For selecting the set of useful risk factors, our toolkit implements the iterative factor screening procedure of [@gospodinov2014misspecification], the one-step Oracle selection procedure of [@quaini2023tradable], and the three-step procedure of [@feng2020taming].

\[\gamma=\arg\min_{g\in\mathbb R^K}E[R_tM_t]'Var[R]^{-1}E[R_tM_t].\] The screening procedure of [@gospodinov2014misspecification] is based on the result that the asymptotic distribution of misspecification-robust t-statistics of the factor SDF coefficients are \(\chi^2(1)\), Chi-squared with one degree of freedom, for the coefficients of useless factors, and they are stochastically dominated by a \(\chi^2(1)\) for the useful factors. With this motivation, they suggest a sequential removal of factors associated with the smallest insignificant t-statistics of a nonzero misspecification-robust SDF coefficient. Note that this procedure gives rise to a conservative factor selection, it requires a correction for the multiple testing problem, and it is only robust to the presence of useless factors, and not to weak factors or more general linear dependence structures in the population correlation matrix between test asset returns and risk factors.

The screening methodology in [@quaini2023tradable] removes the factors associated to a zero Oracle tradable factor risk premia estimates, which arise from the one-step closed-form estimator: \[\check{\lambda}_k = sign(\hat\lambda_k)\max\{|\hat{\lambda}_k|-\tau/||\rho_k||_2^2,0\},\] where \(\hat{\lambda}=\widehat{Cov}[R_t,F_t]\widehat{Var}[R_t]^{-1}\widehat{E}[R_t]\) is the sample tradable factor risk premia estimator, which simply replaces population moments with empirical moments, \(\tau\) is a penalty parameter that can be tuned via, e.g., cross validation, and \(\rho_k=\widehat{Corr}[F_{tk}, R_t]\) is the estimated correlation between factor \(k\) and test asset excess returns. Borrowing the terminology of [@fan2001variable], this procedure achieves the so-called “Oracle” variable selection property, i.e., it consistently selects the useful factors. More precisely, the probability that the factors selected by the estimator are indeed useful factors tends to 1 as the sample size tends to infinity.

[@feng2020taming] propose a three-steps testing procedure that evaluates the contribution to cross-sectional pricing of any new factors on top of a set of control factors. The third step is a OLS regression of average returns on the covariances between asset returns and the new factors, as well as the control factors selected in either one of the first two steps. The first stwo steps consists in (i) a Lasso regression of average returns on the ovariances between asset returns and all control factors and (ii) a Lasso regression of the covariances between asset returns and the new factors on the ovariances between asset returns and all control factors. The second selection aims at correcting for potential omitted variables in the first selection.

Model misspecification distance

For evaluating model misspecification, the toolkit implements point and confidence interval estimation for the HJ model misspecification distance formulated by [@kan2008specification], which is a modification of the prominent Hansen-Jagannathan misspecification distance of [@hansen1997assessing]. Such (squared) distance \(\delta_m^2\) is given by the minimum quadratic form in the model pricing errors, weighted by the inverse variance of test asset excess returns: \[\delta_m^2 = min_{\gamma\in\mathbb R^k} (E[R_t] - Cov[R_t,F_t] * \gamma)' * V[R_t]^{-1} * (E[R_t] - Cov[R_t,F_t] * \gamma).\] Clearly, computation of the confidence interval is obtained by means of an asymptotic analysis under potentially misspecified models, i.e., without assuming correct model specification.

Model identification test

Lastly, the functions for testing model identification are specialized versions of the rank tests proposed by [@kleibergen2006generalized] and [@chen2019improved]. These tests are specifically tailored to assess the regression coefficient matrix of test asset returns on risk factors.

Factor risk premia \(\lambda\), equivalently SDF coefficients \(\gamma\), are identified if the population correlation matrix between test asset excess returns and risk factors, or equivalently the regression coefficient matrix \(\beta\), is full column rank. The beta rank test of [@kleibergen2006generalized] is a procedure that tests the Null hypothesis \[H_0:rank(\beta)=q\] iteratively for \(q=0,\ldots,K-1\). It therefore requires a correction for multiple testing. Instead, the beta rank test of [@chen2019improved] is a bootstrap-based procedure designed to test directly the Null hypothesis \[H_0:rank(\beta)<= K-1.\]

HAC covariance estimation

For heteroskedasticity and autocorrelation robust covariance matrix estimation of a centred time series \(q_t\in\mathbb R^M\), with \(M\ge 1\), the package implements the [@newey1994automatic] estimator. This estimator aims at computing: \[\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^T\sum_{s=1}^TE[q_tq_s'],\] by estimating the cross-moments \(E[q_tq_s']\) up to an appropriate number of lags \(l=|t-s|\). This estimator is frequently used, e.g., for a robust estimation of the covariance matrix of the model’s pricing errors \(\varepsilon_t\).

Citation

To cite intrinsicFRP in publications, please use:

Quaini, A. (2023). intrinsicFRP: An R Package for Factor Model Asset Pricing. R package version 2.0.0. URL: https://CRAN.R-project.org/package=intrinsicFRP.

Installation

Building from CRAN

Package intrinsicFRP is on CRAN (The Comprehensive R Archive Network), hence the latest release can be easily installed from the R command line via

install.packages("intrinsicFRP")

Building from source

To install the latest (possibly unstable) development version from GitHub, you can pull this repository and install it from the R command line via

# if you already have package `devtools` installed, you can skip the next line
install.packages("devtools")
devtools::install_github("a91quaini/intrinsicFRP")

Package intrinsicFRP contains C++ code that needs to be compiled, so you may need to download and install the necessary tools for MacOS or the necessary tools for Windows.

Features

Functions

R package intrinsicFRP implements the following functions:

For usage details, type ?FunctionName in the R console, e.g.:

?TFRP

Data

The intrinsicFRP R package includes a dataset comprising following test asset excess returns and risk factors frequently used in the asset pricing literature:

This dataset was sourced from the Kenneth French data library and processed so that the observations on the test assets and the factors are not expressed in percentage points, and that returns on the test assets are in excess of the risk-free rate.

Examples

Real data example 1: estimation and inference of factor risk premia

Let us ompute various factor risk premia estimates and corresponding 95% confidence intervals for the Fama-French 6 factors and a (simulated) “useless” factor.

# if you already have package `stats` installed you can skip the next line
install.packages("stats")

# import package data on 6 risk factors and 42 test asset excess returns
# remove the first column containing the date
factors = intrinsicFRP::factors[,-1]
returns = intrinsicFRP::returns[,-1]
RF = intrinsicFRP::risk_free[,-1]

# simulate a useless factor and add it to the matrix of factors
set.seed(23)
factors = cbind(
  factors,
  stats::rnorm(n = nrow(factors), sd = stats::sd(factors[,3]))
)
colnames(factors) = c(colnames(intrinsicFRP::factors[,2:7]), "Useless")

# index set of specific factor models
# Fama-French 3 factor model
ff3 = 1:3
# Fama-French 6 factor model
ff6 = 1:6         # "Mkt-RF" "SMB" "HML" "RMW" "CMA" "Mom"
# model comprising the Fama-French 6 factors and the simulated useless factor
ff6usl = 1:7      # "Mkt-RF" "SMB" "HML" "RMW" "CMA" "Mom" "Useless"

# compute the factor SDF coefficients and their standard errors
# for the Fama-MacBeth two-pass procedure
fm_sdf = intrinsicFRP::SDFCoefficients(
  returns,
  factors[,ff6usl],
  misspecification_robust = FALSE,
  include_standard_errors = TRUE
)
# and for the misspecification-robust procedure of Gospodinov Kan and Robottu 
gkr_sdf = intrinsicFRP::SDFCoefficients(
  returns, factors[,ff6usl],
  include_standard_errors = TRUE
)

Visualization of the Fama MacBeth (FM) and the misspecification-robust (GKR) factor SDF coefficients estimators:

# compute tradable factor risk premia and their standard errors
tfrp = intrinsicFRP::TFRP(returns, factors[,ff6usl], include_standard_errors = TRUE)

# compute the GLS factor risk premia of Kan Robotti and Shanken (2013) and their
# standard errors
krs_frp = intrinsicFRP::FRP(returns, factors[,ff6usl], include_standard_errors = TRUE)

# set penalty parameters
penalty_parameters = seq(1e-4, 4e-3, length.out = 1000)

# compute Oracle tradable factor risk premia and their standard errors
# for low factor models, no need for the "one standard deviation" tuning rule
oracle_tfrp = intrinsicFRP::OracleTFRP(
  returns,
  factors[,ff6usl],
  penalty_parameters,
  include_standard_errors = TRUE,
  one_stddev_rule = FALSE
)

Tuning model score of the Oracle TFRP estimator:

Visualization of the Fama MacBeth (FM), misspecification-robust (KRS), tradable (TFRP) and Oracle TFRP (O-TFRP) factor risk premia estimates:

In this plot, we notice a number of features:

Real data example 2: factor screening

Let us now focus on the factor screening procedures under the Oracle TFRP estimator and the approach of [@gospodinov2014misspecification].

# recover the indices of the factors selected by the Oracle TFRP estimator
which(oracle_tfrp$risk_premia != 0)

# compute the GKR factor screening procedure
intrinsicFRP::GKRFactorScreening(returns, factors[,ff6])

The results are:

# factor indices of the factors selected by the Oracle TFRP estimator
[1] 1 2 3 4 5 6

# results of the GKR factor screening procedure
$sdf_coefficients
          [,1]
[1,]  6.401245
[2,]  8.597950
[3,] 12.483170

$standard_errors
         [,1]
[1,] 1.600189
[2,] 2.657617
[3,] 4.173861

$t_statistics
         [,1]
[1,] 4.000306
[2,] 3.235211
[3,] 2.990797

$selected_factor_indices
     [,1]
[1,]    1
[2,]    3
[3,]    6

While the Oracle TFRP only removes the simulated useless factor, the procedure by [@gospodinov2014misspecification] only retains the market factor.

Real data example 3: asset pricing models’ misspecification distance

Let us compute the HJ misspecification distance of the Fama-French 3 and 6 factor models.

# compute the HJ misspecification distance of the Fama-French 3 and 6 factor models
intrinsicFRP::HJMisspecificationDistance(returns, factors[,ff3])
intrinsicFRP::HJMisspecificationDistance(returns, factors[,ff6])

The result of the HJ misspecification distance is:

# HJ misspecification test p-value for the Fama-French 3 factor model
$squared_distance
[1] 0.2488529

$lower_bound
[1] 0.1559376

$upper_bound
[1] 0.3417683

# HJ misspecification test p-value for the Fama-French 6 factor model
$squared_distance
[1] 0.1619658

$lower_bound
[1] 0.08727944

$upper_bound
[1] 0.2366521

Since the p-value of both HJ misspecification tests is below the standard thresholds of \(10\%\), \(5\%\) and \(1\%\), we reject the Null that the Fama-French 3 and 6 factor models are correctly specified.

Real data example 4: testing asset pricing models’ identification

Let us compute the iterative [@kleibergen2006generalized] and the [@chen2019improved] beta rank identification tests for the Fama-French 6 factor model.

# compute identification tests of the Fama-French 6 factor model
intrinsicFRP::IterativeKleibergenPaap2006BetaRankTest(returns, factors[,ff6])
intrinsicFRP::ChenFang2019BetaRankTest(returns, factors[,ff6])

The results of the identification tests for the Fama-French 6 factor model are:

# output of the Iteraive Kleibergen Paap (2006) Beta Rank Test
$rank
[1] 6

$q
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    1    2    3    4    5

$statistics
[1] 108162.2473  10291.4280   4710.5822   1015.9157    376.7892    112.8450

$pvalues
[1]  0.000000e+00  0.000000e+00  0.000000e+00 3.510246e-143  2.062222e-41
[6]  1.340701e-09

# p-value of the Chen Fang (2019) Beta Rank Test
$statistic
[1] 143.1936

$`p-value`
[1] 0

Since the largest p-value of the Iteraive [@kleibergen2006generalized] Beta Rank Test and the p-value of the [@chen2019improved] Beta Rank Test are below the standard thresholds of \(10\%\), \(5\%\) and \(1\%\), we reject the Null that the Fama-French 6 factor model is not identified.

For sanity check, let us compute the same identification tests for the unidentified model comprising the Fama-French 6 factors and a (simulated) “useless” factor.

# compute identification tests of unidentified factor model comprising the
# Fama-French 6 factors and the simulated useless factor
intrinsicFRP::IterativeKleibergenPaap2006BetaRankTest(returns, factors[,ff6usl])
intrinsicFRP::ChenFang2019BetaRankTest(returns, factors[,ff6usl])

The results are:

# output of the Iteraive Kleibergen Paap (2006) Beta Rank Test
$rank
[1] 6

$q
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    1    2    3    4    5    6

$statistics
[1] 128024.51364  11595.52592   5480.53412   1127.71644    475.16366
[6]    190.55705     57.75449

$pvalues
[1]  0.000000e+00  0.000000e+00  0.000000e+00 7.265767e-147  1.346504e-45
[6]  3.134998e-12  1.216120e-02

# p-value of the Chen Fang (2019) Beta Rank Test
$statistic
[1] 41.87344

$`p-value`
[1] 0.188

Since the largest p-value of the Iteraive [@kleibergen2006generalized] Beta Rank Test and the p-value of the [@chen2019improved] Beta Rank Test are above the standard thresholds of \(10\%\), \(5\%\) and \(1\%\), we do not reject the Null that the Fama-French 6 factor model augmented with a simulated useless factor is not identified.

Dependencies

To optimize computational performance, all methods implemented in package intrinsicFRP are written in C++ and make use of the Armadillo [@sanderson2016armadillo] library for efficient linear algebra calculations. However, for user convenience, the interface of package intrinsicFRP is entirely implemented in R, with minimal dependencies, including:

Issues, bug reports, contributions, further help

You can raise issues, report bugs, seek for further help, or submit your contribution to the R package intrinsicFRP at the github repository a91quaini/intrinsicFRP.

For bug reports, you are kindly asked to make a small and self-contained program which exposes the bug.

References