Zé Vinícius and Daniel P. Palomar

The Hong Kong University of Science and Technology (HKUST)

The Hong Kong University of Science and Technology (HKUST)

- Quick Start
- What is a Risk Parity Portfolio?
- Solving the Risk Parity Portfolio (RPP)
- Using the Package
**riskParityPortfolio** - Modern Risk Parity Portfolio
- A pratical example using FAANG price data
- Comparison with other Packages
- Appendix I - Risk concentration formulations
- Appendix II - Numerical algorithms for the risk parity portfolio
- Appendix III - Computational time
- References

This vignette illustrates the design of risk parity portfolios, which are widely used by practitioners in the financial industry, with the package

`riskParityPortfolio`

, giving a description of the algorithms and comparing the performance against existing packages.

Let’s start by loading the package:

The simplest use is for the vanilla risk parity portfolio:

```
rpp_vanilla <- riskParityPortfolio(Sigma)
names(rpp_vanilla)
#> [1] "w" "relative_risk_contribution" "is_feasible"
round(rpp_vanilla$w, digits = 3)
#> AAPL AMD ADI ABBV AEZS A APD AA CF
#> 0.157 0.068 0.125 0.133 0.045 0.129 0.158 0.085 0.101
```

To obtain the naive diagonal solution, also known as inverse volatility portfolio, make use of the `formulation`

argument:

For a more realistic formulation including the expected return in the objective and box constraints: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i,j=1}^{N}\left(\dfrac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{b_i}-\dfrac{w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}}{b_j}\right)^{2} \color{blue}{- \lambda \;\mathbf{w}^T\boldsymbol{\mu}}\\ \textsf{subject to} & \mathbf{w} \ge \mathbf{0}, \quad\mathbf{1}^T\mathbf{w}=1, \quad\color{blue}{\mathbf{l}\le\mathbf{w}\le\mathbf{u}}. \end{array}\]

```
rpp_mu <- riskParityPortfolio(Sigma, formulation = "rc-over-b-double-index",
mu = mu, lmd_mu = 1e-3, # for expected return term
w_ub = 0.16) # for box upper bound constraints
```

To plot and compare the results:

Let us denote by \(\mathbf{r}_{t}\) the vector of the **returns** of \(N\) assets at time \(t\) and suppose they follow an i.i.d. distribution (which is not a totally accurate assumption, but it is widely adopted, nonetheless) with mean \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\).

The portfolio vector \(\mathbf{w}\) denotes the normalized dollar weights allocated to the \(N\) assets, such that \(\mathbf{1}^{T}\mathbf{w}=1\), and the **portfolio return** is then \(r_{t}^{\sf portf} = \mathbf{w}^{T}\mathbf{r}_{t}\), with expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\).

In 1952, Markowitz proposed in his seminal paper [1] to find a trade-off between the portfolio expected return and its risk measured by the variance: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{w} \ge \mathbf{0}, \quad\mathbf{1}^T\mathbf{w}=1, \end{array}\] where \(\lambda\) is a parameter that controls how risk-averse the investor is.

Markowitz’s portfolio has been heavily critized for over half a century and has never been fully embraced by practitioners, among many reasons because:

- it only considers the risk of the portfolio as a whole and ignores the risk diversification (i.e., concentrates too much risk in few assets, which was observed in the 2008 financial crisis)
- it is highly sensitive to the estimation errors in the parameters (i.e., small estimation errors in the parameters may change the designed portfolio drastically).

Although portfolio management did not change much during the 40 years after the seminal works of Markowitz and Sharpe, the development of risk budgeting techniques marked an important milestone in deepening the relationship between risk and asset management.

Since the global financial crisis in 2008, **risk management** has particularly become more important than performance management in portfolio optimization. Indeed, risk parity became a popular financial model after the global financial crisis in 2008 [2], [3].

The alternative **risk parity portfolio design** has been receiving significant attention from both the theoretical and practical sides [4], [5] because it: (1) diversifies the risk, instead of the capital, among the assets and (2) is less sensitive to parameter estimation errors.

Nowadays, pension funds and institutional investors are using this approach in the development of smart indexing and the redefinition of long-term investment policies.

**Risk parity** is an approach to portfolio management that focuses on **allocation of risk** rather than allocation of capital. The risk parity approach asserts that when asset allocations are adjusted to the same risk level, the portfolio can achieve a higher Sharpe ratio and can be more resistant to market downturns.

While the minimum variance portfolio tries to minimize the variance (with the disadvantage that a few assets may be the ones contributing most to the risk), the risk parity portfolio tries to constrain **each asset** (or asset class, such as bonds, stocks, real estate, etc.) to **contribute equally to the portfolio overall volatility**.

The term “risk parity” was coined by Edward Qian from PanAgora Asset Management [2] and was then adopted by the asset management industry. Some of its theoretical components were developed in the 1950s and 1960s, but the first risk parity fund, called the “All Weather” fund, was pioneered by Bridgewater Associates LP in 1996. The interest in the risk parity approach has increased since the financial crisis in the late 2000s as the risk parity approach fared better than portfolios designed in traditional fashions.

Some portfolio managers have expressed skepticism with risk parity, while others point to its performance during the financial crisis of 2007-2008 as an indication of its potential success.

The idea of the risk parity portfolio (RPP), aka equal risk portfolio (ERP), is to “equalize” the risk so that the risk contribution of every asset is equal, rather than simply having an equal capital allocation like the equally weighted portfolio (EWP):

From Euler’s theorem, the volatility of the portfolio \(\sigma\left(\mathbf{w}\right)=\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\) can be decomposed as \[\sigma\left(\mathbf{w}\right)=\sum_{i=1}^{N}w_i\frac{\partial\sigma}{\partial w_i} = \sum_{i=1}^N\frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}.\]

The **risk contribution (RC)** from the \(i\)th asset to the total risk \(\sigma(\mathbf{w})\) is defined as \[{\sf RC}_i =\frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\] which satisfies \(\sum_{i=1}^{N}{\sf RC}_i=\sigma\left(\mathbf{w}\right)\).

The **relative risk contribution (RRC)** is a normalized version: \[{\sf RRC}_i = \frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\] so that \(\sum_{i=1}^{N}{\sf RRC}_i=1\).

The **risk parity portfolio (RPP)** attemps to “equalize” the risk contributions: \[{\sf RC}_i = \frac{1}{N}\sigma(\mathbf{w})\quad\text{or}\quad{\sf RRC}_i = \frac{1}{N}.\]

More generally, the **risk budgeting portfolio (RBP)** attemps to allocate the risk according to the risk profile determined by the weights \(\mathbf{b}\) (with \(\mathbf{1}^T\mathbf{b}=1\) and \(\mathbf{b}\ge \mathbf{0}\)): \[{\sf RC}_i = b_i \sigma(\mathbf{w})\quad\text{or}\quad{\sf RRC}_i = b_i.\]

In practice, one can express the condition \({\sf RC}_i = \frac{1}{N}\sigma(\mathbf{w})\) in different equivalent ways such as \[w_i(\Sigma \mathbf{w})_{i} = w_j(\Sigma \mathbf{w})_{j}, \quad\forall i, j.\] The budget condition \({\sf RC}_i = b_i \sigma(\mathbf{w})\) can also be expressed as \[w_i (\Sigma \mathbf{w})_i = b_i \mathbf{w}^{T}\Sigma\mathbf{w}, \quad\forall i.\]

Assuming that the assets are uncorrelated, i.e., that \(\boldsymbol{\Sigma}\) is diagonal, and simply using the volatilities \(\boldsymbol{\sigma} = \sqrt{{\sf diag(\boldsymbol{\Sigma})}}\), one obtains \[\mathbf{w} = \frac{\boldsymbol{\sigma}^{-1}}{\mathbf{1}^T\boldsymbol{\sigma}^{-1}}\] or, more generally, \[\mathbf{w} = \frac{\sqrt{\mathbf{b}}\odot\boldsymbol{\sigma}^{-1}}{\mathbf{1}^T\left(\sqrt{\mathbf{b}}\odot\boldsymbol{\sigma}^{-1}\right)}.\] However, for non-diagonal \(\boldsymbol{\Sigma}\) or with other additional constraints or objective function terms, a closed-form solution does not exist and some optimization procedures have to be constructed. The previous diagonal solution can always be used and is called *naive risk budgeting portfolio*.

Suppose we only have the constraints \(\mathbf{1}^T\mathbf{w}=1\) and \(\mathbf{w} \ge \mathbf{0}\). Then, after the change of variable \(\mathbf{x}=\mathbf{w}/\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\), the equations \(w_i (\Sigma \mathbf{w})_i = b_i \mathbf{w}^{T}\Sigma\mathbf{w}\) become \(x_i\left(\boldsymbol{\Sigma}\mathbf{x}\right)_i = b_i\) or, more compactly in vector form, as \[\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\] with \(\mathbf{x} \ge \mathbf{0}\) and we can always recover the portfolio by normalizing: \(\mathbf{w} = \mathbf{x}/(\mathbf{1}^T\mathbf{x})\).

At this point, one could use a nonlinear multivariate root finder for \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\). For example, in R we can use the package rootSolve.

With the goal of designing risk budget portfolios, Spinu proposed in [6] to solve the following convex optimization problem: \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \sum_{i=1}^{N}b_i\log(x_i),\] where the portfolio can be recovered as \(\mathbf{w} = \mathbf{x}/(\mathbf{1}^T\mathbf{x})\).

Indeed, Spinu realized in [6] that precisely the risk budgeting equation \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\) corresponds to the gradient of the convex function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\) set to zero: \[\nabla f(\mathbf{x}) = \boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}/\mathbf{x} = \mathbf{0}.\]

Thus, a convenient way to solve the problem is by solving the following convex optimization problem: \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\] which has optimality condition \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\).

Such solution can be computed using a general-purpose convex optimization package, but faster algorithms such as the **Newton method** and the **cyclical coordinate descent method**, employed in [6] and [7], are implemented in this package. (Yet another convex formulation was proposed in [8].)

The previous methods are based on a convex reformulation of the problem so they are guaranteed to converge to the optimal risk budgeting solution. However, they can only be employed for the simplest risk budgeting formulation with a simplex constraint set (i.e., \(\mathbf{1}^T\mathbf{w}=1\) and \(\mathbf{w} \ge \mathbf{0}\)). They cannot be used if

- we have other constraints like allowing shortselling or box constraints: \(l_i \le w_i \le u_i\)
- on top of the risk budgeting constraints \(w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i = b_i \;\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\) we have other objectives like maximizing the expected return \(\mathbf{w}^T\boldsymbol{\mu}\) or minimizing the overall variance \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\).

For those more general cases, we need more sophisticated formulations, which unfortunately are not convex. The idea is to try to achieve equal risk contributions \({\sf RC}_i = \frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\) by penalizing the differences between the terms \(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}\).

There are many reformulations possible. For illustrative purposes, one such formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i,j=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}-w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}\right)^{2} \color{blue}{- \;F(\mathbf{w})}\\ \textsf{subject to} & \mathbf{w} \ge \mathbf{0}, \quad\mathbf{1}^T\mathbf{w}=1, \quad\color{blue}{\mathbf{w}\in\cal{W}} \end{array}\] where \(F(\mathbf{w})\) denotes some additional objective function and \(\cal{W}\) denotes an arbitrary convex set of constraints. More expressions for the risk concentration terms are listed in Appendix I.

The way to solve this general problem is derived in [9], [10] and is based on a powerful optimization framework named successive convex approximation (SCA) [11]. See Appendix II for a general idea of the method.

A simple code example on how to design a risk parity portfolio is as follows:

```
library(riskParityPortfolio)
# generate synthetic data
set.seed(42)
N <- 10
V <- matrix(rnorm(N^2), nrow = N)
Sigma <- cov(V)
# compute risk parity portfolio
rpp <- riskParityPortfolio(Sigma)
# plot
barplotPortfolioRisk(rpp$w, Sigma, col = "#A29BFE")
```

As presented earlier, risk parity portfolios are designed in such a way as to ensure equal risk contribution from the assests, which can be noted in the chart above.

The design of risk parity portfolios as solved by [6] and [7] is of paramount importance both for academia and industry. However, practitioners would like the ability to include additional constraints and objective terms desired in practice, such as the mean return, box constraints, etc. In such cases, the risk-contribution constraints cannot be met exactly due to the trade-off among different objectives or additional constraints.

Let us explore, for instance, the effect of including the expected return as an additional objective in the optimization problem. The problem can be formulated as \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & R(\mathbf{w}) - \lambda_{\mu} \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1, \mathbf{w} \ge \mathbf{0}, \end{array}\] where \(R(\mathbf{w}) = \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i - b_i\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\right)^{2}\) is the risk concentration function or risk parity function, \(\mathbf{w}^{T}\boldsymbol{\mu}\) is the expected return, and \(\lambda_{\mu}\) is a trade-off parameter.

```
library(ggplot2)
library(riskParityPortfolio)
N <- 10
V <- matrix(rnorm(N^2), nrow = N)
Sigma <- cov(V)
mu <- runif(N)
lmd_sweep <- 10^seq(-5, 2, .25)
mean_return <- c()
risk_concentration <- c()
for (lmd_mu in lmd_sweep) {
rpp <- riskParityPortfolio(Sigma, mu = mu, lmd_mu = lmd_mu,
formulation = "rc-over-sd vs b-times-sd")
mean_return <- c(mean_return, rpp$mean_return)
risk_concentration <- c(risk_concentration, rpp$risk_concentration)
}
ggplot(data.frame(risk_concentration, mean_return),
aes(x = risk_concentration, y = mean_return)) +
geom_line() + geom_point() +
labs(title = "Expected Return vs Risk Concentration",
x = "Risk Concentration", y = "Expected Return")
```

Similarly, the `riskParityPortfolio`

package allows us to include the variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\) as an objective term: \[\begin{array}{ll}
\underset{\mathbf{w}}{\textsf{minimize}} &
R(\mathbf{w}) + \lambda_{\sf var} \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\
\textsf{subject to} & \mathbf{1}^T\mathbf{w}=1, \mathbf{w} \ge \mathbf{0},
\end{array}\] where \(\lambda_{\sf var}\) is a trade-off parameter.

In the code, that can be done by passing a positive value to the parameter `lmd_var`

. Let’s check the following illustrative example that depicts the trade-off between volatility and risk parity:

```
library(ggplot2)
library(riskParityPortfolio)
load("Sigma_mu.RData")
lmd_sweep <- 10^seq(-5, 5, .25)
variance <- c()
risk_concentration <- c()
for (lmd_var in lmd_sweep) {
rpp <- riskParityPortfolio(Sigma, lmd_var = lmd_var,
formulation = "rc-over-sd vs b-times-sd")
variance <- c(variance, rpp$variance)
risk_concentration <- c(risk_concentration, rpp$risk_concentration)
}
volatility <- sqrt(variance)
ggplot(data.frame(risk_concentration, volatility),
aes(x = risk_concentration, y = volatility)) +
geom_line() + geom_point() +
labs(title = "Volatility vs Risk Concentration",
x = "Risk Concentration", y = "Volatility")
```

In version 2.0, we added support for general linear constraints, i.e., `riskParityPortfolio`

is now able to solve the following problem: \[\begin{array}{ll}
\underset{\mathbf{w}}{\textsf{minimize}} &
R(\mathbf{w}) + \lambda F(\mathbf{w})\\
\textsf{subject to} & \mathbf{C}\mathbf{w} = \mathbf{c},~~\mathbf{D}\mathbf{w} \leq \mathbf{d}.
\end{array}\]

Users interested in the details of the algorithm used to solve this problems are refered to Section V (*Advanced Solving Approaches*) of [9]. In summary, the algorithm fits well within the SCA framework, while preserving speed and scalability.

It was recently mentioned by [12] that the problem of designing risk parity portfolios with general constraints is harder than it seems. Indeed, [12] shows that, after imposing general linear constraints, the property of equal risk contributions (ERC) is unlikely to be preserved among the assets affected by the constraints.

Let’s check out a numerical example from [12]. Consider we have a universe of eight assets and we would like to design a risk parity portfolio \(\mathbf{w}\) satisfying the following constraints: \[ w_5 + w_6 + w_7 + w_8 \geq 30\%, \] \[ w_2 + w_6 \geq w_1 + w_5 + 5\%, \] and \[ \sum_i w_i = 100\% . \]

```
# compute the covariance matrix
vol <- c(0.05, 0.05, 0.07, 0.1, 0.15, 0.15, 0.15, 0.18)
Corr <- rbind(c(100, 80, 60, -20, -10, -20, -20, -20),
c( 80, 100, 40, -20, -20, -10, -20, -20),
c( 60, 40, 100, 50, 30, 20, 20, 30),
c(-20, -20, 50, 100, 60, 60, 50, 60),
c(-10, -20, 30, 60, 100, 90, 70, 70),
c(-20, -10, 20, 60, 90, 100, 60, 70),
c(-20, -20, 20, 50, 70, 60, 100, 70),
c(-20, -20, 30, 60, 70, 70, 70, 100)) / 100
Sigma <- Corr * (vol %o% vol)
# define linear constraints
Dmat <- matrix(0, 2, 8)
Dmat[1, ] <- c(rep(0, 4), rep(-1, 4))
Dmat[2, ] <- c(1, -1, 0, 0, 1, -1, 0, 0)
dvec <- c(-0.30, -0.05)
# design portfolio
rpp <- riskParityPortfolio(Sigma, Dmat = Dmat, dvec = dvec)
# plot portfolio weights
barplotPortfolioRisk(rpp$w, Sigma)
```

As we can observe, the risk contributions are somewhat clustered according to the relationship among assets defined by the linear constraints, as mentioned by [12].

The linear constraints are obviously satisfied:

```
# equality constraints
print(sum(rpp$w))
#> [1] 1
# inequality constraints
print(Dmat %*% rpp$w)
#> [,1]
#> [1,] -0.30
#> [2,] -0.05
```

The results obtained by our implementation agree with those reported by [12].

In [13], the author showed how to build a risk parity portfolio for FAANG companies (Facebook, Apple, Amazon, Netflix, and Google) using **riskParityPortfolio**. Here, we will attempt to replicate their backtest results, but using the package portfolioBacktest instead.

```
library(xts)
library(portfolioBacktest)
library(riskParityPortfolio)
# download price data
faang_data <- stockDataDownload(c("GOOG", "NFLX", "AAPL", "AMZN", "FB"),
from = "2014-01-01", to = "2019-06-25")
# define portfolios to be backtested
# risk parity portfolio
risk_parity <- function(dataset) {
prices <- dataset$adjusted
log_returns <- diff(log(prices))[-1]
return(riskParityPortfolio(cov(log_returns))$w)
}
# tangency portfolio (maximum sharpe ratio)
library(quadprog)
max_sharpe_ratio <- function(dataset) {
prices <- dataset$adjusted
log_returns <- diff(log(prices))[-1]
N <- ncol(prices)
Sigma <- cov(log_returns)
mu <- colMeans(log_returns)
if (all(mu <= 1e-8))
return(rep(0, N))
Dmat <- 2 * Sigma
Amat <- diag(N)
Amat <- cbind(mu, Amat)
bvec <- c(1, rep(0, N))
dvec <- rep(0, N)
res <- solve.QP(Dmat = Dmat, dvec = dvec, Amat = Amat, bvec = bvec, meq = 1)
w <- res$solution
return(w/sum(w))
}
# call portfolioBacktest and benchmark against the uniform (1/N) portfolio
bt <- portfolioBacktest(list("risk parity portfolio" = risk_parity,
"tangency portfolio" = max_sharpe_ratio),
list(faang_data),
T_rolling_window = 12*20,
optimize_every = 3*20, rebalance_every = 3*20)
# dates of the designed portfolios
index(bt$tangency$data1$w_designed)
#> [1] "2014-12-12" "2015-03-12" "2015-06-08" "2015-09-01" "2015-11-25" "2016-02-24" "2016-05-19" "2016-08-15"
#> [9] "2016-11-08" "2017-02-06" "2017-05-03" "2017-07-28" "2017-10-23" "2018-01-19" "2018-04-17" "2018-07-12"
#> [17] "2018-10-05" "2019-01-03" "2019-04-01"
# check performance summary
backtestSummary(bt)$performance
#> risk parity portfolio tangency portfolio
#> Sharpe ratio 1.3800144 0.8787596
#> max drawdown 0.3062046 0.3516856
#> annual return 0.3117200 0.2324203
#> annual volatility 0.2258817 0.2644868
#> Sterling ratio 1.0180122 0.6608751
#> Omega ratio 1.2710283 1.1793760
#> ROT (bps) 8310.1199557 793.0188434
# plot cumulative returns chart
backtestChartCumReturns(bt)
```

```
# plot assets exposures over time
backtestChartStackedBar(bt, portfolio = "risk parity portfolio", legend = TRUE)
```

Indeed, the charts are quite similar to those reported in [13]. Note that the weights evolution of the Markowitz’s tangency portfolio is quite sensitive to minute changes in the rebalance/optimization dates, whereas no significant difference is noticed for the risk parity portfolio weights.

Others R packages with the goal of designing risk parity portfolios do exist, such as `FinCovRegularization`

, `cccp`

, and `RiskPortfolios`

. Let’s check how do they perform against `riskParityPortfolio`

. (Note that other packages like `FRAPO`

use `cccp`

under the hood.)

```
library(FinCovRegularization)
library(cccp)
library(RiskPortfolios)
# generate synthetic data
set.seed(42)
N <- 10
V <- matrix(rnorm(N*(N+N/5)), N+N/5, N)
Sigma <- cov(V)
# uniform initial guess for the portfolio weights
w0 <- rep(1/N, N)
# compute risk parity portfolios using different methods
rpp <- riskParityPortfolio(Sigma, w0 = w0, formulation = "rc-double-index")
#> Warning in riskParityPortfolio(Sigma, w0 = w0, formulation = "rc-double-index"): The problem is a vanilla risk parity
#> portofolio, but a nonconvex formulation has been chosen. Consider not specifying the formulation argument in order to
#> use the convex formulation and get a guaranteed global solution.
fincov_w <- FinCovRegularization::RiskParity(Sigma)
riskport_w <- optimalPortfolio(Sigma = Sigma,
control = list(type = "erc",
constraint = "lo"))
cccp_w <- c(getx(cccp::rp(w0, Sigma, mrc = w0, optctrl = ctrl(trace = FALSE))))
# plot
w_all <- cbind("riskParityPortfolio" = rpp$w,
"FinCovRegularization" = fincov_w,
"cccp" = cccp_w,
"RiskPortfolios" = riskport_w)
barplotPortfolioRisk(w_all, Sigma)
```

Depending on the condition number of the covariance matrix, we found that the packages `FinCovRegularization`

and `RiskPortfolios`

may fail unexpectedly. Apart from that, all functions perform similarly.

Now, let’s see a comparison, in terms of computational time, of our cyclical coordinate descent implementation against the `rp()`

function from the `cccp`

package and the `optimalPortfolio()`

function from the `RiskPortfolios`

package. (For a fair comparison, instead of calling our function `riskParityPortfolio()`

, we call directly the core internal function `risk_parity_portfolio_ccd_spinu()`

, which only computes the risk parity weights, just like `rp()`

and `optimalPortfolio()`

.)

```
library(microbenchmark)
library(ggplot2)
library(cccp)
library(RiskPortfolios)
library(riskParityPortfolio)
N <- 100
V <- matrix(rnorm(N^2), ncol = N)
Sigma <- cov(V)
b <- rep(1/N, N)
op <- microbenchmark(
cccp = rp(b, Sigma, b, optctrl = ctrl(trace = FALSE)),
riskParityPortfolio =
riskParityPortfolio:::risk_parity_portfolio_ccd_spinu(Sigma, b, 1e-6, 50),
RiskPortfolios =
optimalPortfolio(Sigma = Sigma,
control = list(type = 'erc', constraint = 'lo')),
times = 10L)
print(op)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> cccp 15706.519 15907.386 17402.6694 16089.719 16160.337 29889.532 10
#> riskParityPortfolio 87.226 91.121 99.9547 96.808 108.079 119.291 10
#> RiskPortfolios 765344.494 769834.821 782709.4109 774997.356 790587.159 843189.970 10
autoplot(op)
#> Coordinate system already present. Adding new coordinate system, which will replace the existing one.
```

As it can be observed, our implementation is orders of maginitude faster than the interior-point method used by `cccp`

and the formulation used by `RiskPortfolios`

.

In general, with different constraints and objective functions, exact parity cannot be achieved and one needs to define a risk term to be minimized: \(R(\mathbf{w}) = \sum_{i=1}^{N}\left(g_{i}\left(\mathbf{w}\right)\right)^{2}\), where the \(g_{i}\)’s denote the different risk contribution errors, e.g., \(g_{i} = w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i - b_i\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\). A double-index summation can also be used: \(R(\mathbf{w}) = \sum_{i,j=1}^{N}\left(g_{ij}\left(\mathbf{w}\right)\right)^{2}\).

We consider the risk formulations as presented in [9], [10]. They can be passed through the keyword argument `formulation`

in the function `riskParityPortfolio()`

.

The name of the formulations and their mathematical expressions are presented as follows.

**Formulation “rc-double-index”**: \[R(\mathbf{w}) = \sum_{i,j=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}-w_{j}\left(\boldsymbol{\Sigma}\mathbf{w }\right)_{j}\right)^{2}\]

**Formulation “rc-vs-theta”**: \[
R(\mathbf{w},\theta) = \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i - \theta \right)^{2}
\]

**Formulation “rc-over-var-vs-b”**: \[
R(\mathbf{w}) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}-b_i\right)^{2}
\]

**Formulation “rc-over-b double-index”**: \[
R(\mathbf{w}) = \sum_{i,j=1}^{N}\left(\frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{b_i} - \frac{w_j\left(\boldsymbol{\Sigma}\mathbf{w}\right)_j}{b_j}\right)^{2}
\]

**Formulation “rc-vs-b-times-var”**: \[
R(\mathbf{w}) = \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i - b_i\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\right)^{2}
\]

**Formulation “rc-over-sd vs b-times-sd”**: \[
R(\mathbf{w}) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}}-b_i\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}\right)^{2}
\]

**Formulation “rc-over-b vs theta”**: \[
R(\mathbf{w},\theta) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{b_i} - \theta \right)^{2}
\]

**Formulation “rc-over-var”**: \[
R(\mathbf{w}) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}\right)^{2}
\]

In this appendix we describe the algorithms implemented for both the vanilla risk parity portfolio and the modern risk parity portfolio that may contain additional objective terms and constraints.

We now describe the implementation of the Newton method and the cyclical (coordinate) descent algorithm for the vanilla risk parity formulations presented in [6] and [7].

Consider the risk budgeting equations \[w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i = b_i \;\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}, \qquad i=1,\ldots,N\] with \(\mathbf{1}^T\mathbf{w}=1\) and \(\mathbf{w} \ge \mathbf{0}\).

If we define \(\mathbf{x}=\mathbf{w}/\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\), then we can rewrite the risk budgeting equations compactly as \[\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\] with \(\mathbf{x} \ge \mathbf{0}\) and we can always recover the portfolio by normalizing: \(\mathbf{w} = \mathbf{x}/(\mathbf{1}^T\mathbf{x})\).

Spinu [6] realized that precisely that equation corresponds to the gradient of the function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\) set to zero, which is the optimality condition for its minimization.

So we can finally formulate the risk budgeting problem as the following convex optimization problem: \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x}).\]

Roncalli et al. [7] proposed a slightly different formulation (also convex): \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}^T\log(\mathbf{x}).\]

Unfortunately, even though these two problems are convex, they do not conform with the typical classes that most solvers embrace (i.e., LP, QP, QCQP, SOCP, SDP, GP, etc.).

Nevertheless, there are several simple iterative algorithms that can be used, like the Newton method and the cyclical coordinate descent algorithm.

**Newton method**

The Newton method obtains the iterates based on the gradient \(\nabla f\) and the Hessian \({\sf H}\) of the objective function \(f(\mathbf{x})\) as follows: \[\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - {\sf H}^{-1}(\mathbf{x}^{(k)})\nabla f(\mathbf{x}^{(k)})\]

In practice, one may need to use the backtracking method to properly adjust the step size of each iteration [14].

For the function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\), the gradient and Hessian are given by \[\begin{array}{ll} \nabla f(\mathbf{x}) &= \boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}/\mathbf{x}\\ {\sf H}(\mathbf{x}) &= \boldsymbol{\Sigma} + {\sf Diag}(\mathbf{b}/\mathbf{x}^2). \end{array}\]

For the function \(f(\mathbf{x}) = \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}^T\log(\mathbf{x})\), the gradient and Hessian are given by \[\begin{array}{ll} \nabla f(\mathbf{x}) &= \boldsymbol{\Sigma}\mathbf{x}/\sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}/\mathbf{x}\\ {\sf H}(\mathbf{x}) &= \left(\boldsymbol{\Sigma} - \boldsymbol{\Sigma}\mathbf{x}\mathbf{x}^T\boldsymbol{\Sigma}/\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}\right) / \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} + {\sf Diag}(\mathbf{b}/\mathbf{x}^2). \end{array}\]

**Cyclical coordinate descent algorithm**

This method simply minimizes in a cyclical manner with respect to each element of the variable \(\mathbf{x}\) (denote \(\mathbf{x}_{-i}=[x_1,\ldots,x_{i-1},0,x_{i+1},\ldots,x_N]^T\)), while helding the other elements fixed.

For the function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\), the minimization w.r.t. \(x_i\) is \[\underset{x_i>0}{\textsf{minimize}} \quad \frac{1}{2}x_i^2\boldsymbol{\Sigma}_{ii} + x_i(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i}) - b_i\log{x_i}\] with gradient \(\nabla_i f = x_i\boldsymbol{\Sigma}_{ii} + (\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i}) - b_i/x_i\). Setting the gradient to zero gives us the second order equation \[x_i^2\boldsymbol{\Sigma}_{ii} + x_i(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i}) - b_i = 0\] with positive solution given by \[x_i^\star = \frac{-(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})+\sqrt{(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})^2+ 4\boldsymbol{\Sigma}_{ii} b_i}}{2\boldsymbol{\Sigma}_{ii}}.\]

The derivation for the function \(f(\mathbf{x}) = \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}^T\log(\mathbf{x})\) follows similarly. The update for \(x_i\) is given by \[x_i^\star = \frac{-(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})+\sqrt{(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})^2+ 4\boldsymbol{\Sigma}_{ii} b_i \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}}}}{2\boldsymbol{\Sigma}_{ii}}.\]

Many practical formulations deployed to design risk parity portfolios lead to nonconvex problems, specially when additional objective terms such as mean return or variance, or additional constraints, namely, shortselling, are taken into account. To circumvent the complications that arise in such formulations, Feng & Palomar [9] proposed a method called sucessive convex approximation (SCA). The SCA method works by convexifying the risk concentration term at some pre-defined point, casting the nonconvex problem into a much simpler strongly convex optimization problem. This procedure is then iterated until convergence is achieved. It is important to highlight that the SCA method always converges to a stationary point.

At the \(k\)-th iteration, the SCA method aims to solve \[\begin{align}\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i=1}^{n}\left(g_i(\mathbf{w}^k) + (\nabla g_i(\mathbf{w}^{k}))^{T}(\mathbf{w} - \mathbf{w}^{k})\right)^2 + \tau ||\mathbf{w} - \mathbf{w}^{k}||^{2}_{2} + \lambda F(\mathbf{w})\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1} = 1, \mathbf{w} \in \mathcal{W}, \end{array}\end{align}\] where the first order Taylor expasion of \(g_i(\mathbf{w})\) has been used.

After some mathematical manipulations described in detail in [9], the optimization problem above can be rewritten as \[\begin{align}\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \dfrac{1}{2}\mathbf{w}^{T}\mathbf{Q}^{k}\mathbf{w} + \mathbf{w}^{T}\mathbf{q}^{k} + \lambda F(\mathbf{w})\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1} = 1, \mathbf{w} \in \mathcal{W}, \end{array}\end{align}\] where \[\begin{align} \mathbf{Q}^{k} & \triangleq 2(\mathbf{A}^{k})^{T}\mathbf{A}^{k} + \tau \mathbf{I},\\ \mathbf{q}^{k} & \triangleq 2(\mathbf{A}^{k})^{T}\mathbf{g}(\mathbf{w}^{k}) - \mathbf{Q}^{k}\mathbf{w}^{k}, \end{align}\] and \[\begin{align} \mathbf{A}^{k} & \triangleq \left[\nabla_{\mathbf{w}} g_{1}\left(\mathbf{w}^{k}\right), ..., \nabla_{\mathbf{w}} g_{n}\left(\mathbf{w}^{k}\right)\right]^{T} \\ \mathbf{g}\left(\mathbf{w}^{k}\right) & \triangleq \left[g_{1}\left(\mathbf{w}^{k}\right), ..., g_{n}\left(\mathbf{w}^{k}\right)\right]^{T}. \end{align}\]

The above problem is a quadratic program (QP) which can be efficiently solved by standard R libraries. Furthermore, it is straightforward that adding the mean return or variance terms still keeps the structure of the problem intact.

In order to efficiently design high dimensional portfolios that follows the risk parity criterion, we implement the cyclical coordinate descent algorithm proposed by [7]. In brief, this algorithm optimizes for one portfolio weight at a time while leaving the rest fixed.

The plot below illustrates the computational scaling of both Newton and cyclical algorithms. Note that the cyclical algorithm is implemented for two different formulations used by [6] and [7], respectively. Nonetheless, they output the same solution, as they should.

```
library(microbenchmark)
library(riskParityPortfolio)
library(ggplot2)
sizes <- c(10, 50, 100, 200, 300, 400, 500, 600, 700)
size_seq <- c(1:length(sizes))
times <- matrix(0, 3, length(sizes))
for (i in size_seq) {
V <- matrix(rnorm(1000 * sizes[i]), nrow = sizes[i])
Sigma <- V %*% t(V)
bench <- microbenchmark(
newton = riskParityPortfolio(Sigma, method_init="newton"),
cyclical_spinu = riskParityPortfolio(Sigma, method_init="cyclical-spinu"),
cyclical_roncalli = riskParityPortfolio(Sigma, method_init="cyclical-roncalli"),
times = 10L, unit = "ms", control = list(order = "inorder", warmup = 4))
times[1, i] <- median(bench$time[c(TRUE, FALSE, FALSE)] / 10 ^ 6)
times[2, i] <- median(bench$time[c(FALSE, TRUE, FALSE)] / 10 ^ 6)
times[3, i] <- median(bench$time[c(FALSE, FALSE, TRUE)] / 10 ^ 6)
}
df <- data.frame(sizes, newton = times[1, ], cyclical_spinu = times[2, ], cyclical_roncalli = times[3, ])
molten_df <- reshape2::melt(df, id.vars = "sizes")
ggplot(molten_df, aes(x = sizes, y = value, col = variable)) +
geom_line() + geom_point() +
scale_color_manual(values = c("#2d3436", "#6c5ce7", "#74b9ff")) +
theme(legend.title = ggplot2::element_blank()) +
labs(title = "Speed comparison of methods for the convex formulation",
x = "Portfolio size N", y = "CPU time [ms]")
```

[1] H. Markowitz, “Portfolio selection,” *J. Financ.*, vol. 7, no. 1, pp. 77–91, 1952.

[2] E. Qian, “Risk parity portfolios: Efficient portfolios through true diversification,” *PanAgora Asset Management*, 2005.

[3] C. S. Asness, A. Frazzini, and L. H. Pedersen, “Leverage aversion and risk parity,” *Financial Analysts Journal*, vol. 68, no. 1, pp. 47–59, 2012.

[4] T. Roncalli, *Introduction to risk parity and budgeting*. CRC Press, 2013.

[5] E. E. Qian, *Risk parity fundamentals*. CRC Press, 2016.

[6] F. Spinu, “An algorithm for computing risk parity weights,” *SSRN*, 2013.

[7] T. Griveau-Billion, J.-C. Richard, and T. Roncalli, “A fast algorithm for computing high-dimensional risk parity portfolios,” *SSRN*, 2013.

[8] H. Kaya and W. Lee, “Demystifying risk parity,” *Neuberger Berman*, 2012.

[9] Y. Feng and D. P. Palomar, “SCRIP: Successive convex optimization methods for risk parity portfolios design,” *IEEE Trans. Signal Processing*, vol. 63, no. 19, pp. 5285–5300, 2015.

[10] Y. Feng and D. P. Palomar, *A Signal Processing Perspective on Financial Engineering*. Foundations; Trends in Signal Processing, Now Publishers, 2016.

[11] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang, “Decomposition by partial linearization: Parallel optimization of multi-agent systems,” *IEEE Trans. Signal Processing*, vol. 62, no. 3, pp. 641–656, 2014.

[12] J.-C. Richard and T. Roncalli, “Constrained risk budgeting portfolios: Theory, algorithms, applications, & puzzles,” *SSRN*, 2019.

[13] T. Souza, “DIY ray dalio etf: How to build your own hedge fund strategy with risk parity portfolios,” *Medium: https://towardsdatascience.com/ray-dalio-etf-900edfe64b05*. 2019.

[14] S. P. Boyd and L. Vandenberghe, *Convex optimization*. Cambridge University Press, 2004.