# 1. Introduction

In modern financial markets, volatility measures the degree of dispersion for assets and plays a crucial role in portfolio allocation, performance evaluation, and risk management. Low-frequency and high-frequency stock data are widely adopted to model the dynamic evolution of daily volatilities, while efforts made for volatility estimation and prediction in the past often employ these two types of data independently. Recent attempts to bridge the gap between these two include the realized GARCH model, the heterogeneous autoregressive (HAR) model, as well as the high-frequency based volatility (HEAVY) model. See Shephard and Sheppard (2010), Hansen, Huang, and Shek (2012), Corsi (2009) for more details.

In addition, Kim and Wang (2016) introduced the unified GARCH-Ito model by embedding the standard GARCH volatility structure in the instantaneous volatilities of an Ito diffusion process. The unified GARCH-Ito model is a continuous-time process at the high-frequency timescale and when restricted to the low-frequency timescale, retains the standard GARCH structure. Moreover, Song et al. (2020) introduced the realized GARCH-Ito model by embedding the realized GARCH model structure in the instantaneous volatilities of a jump-diffusion process. Comparing to the unified GARCH-Ito model, its conditional volatility has integrated volatility and jump variation as innovations, which are high-frequency data-based innovations that are more informative.

The RealizedGARCHIto package aims to provide methods for modeling the high-frequency data with unified GARCH-Ito model and realized GARCH-Ito model. It provides methods to estimate model parameters and allows one to estimate and predict conditional volatilities with the two proposed models. It also includes one sample data set that has low-frequency log returns (return) and realized measures such as realized volatility (RV), bi-power realized volatility (BPV) and jump variation (JV) computed and estimated using the CSI 300 index minute data from 2018-01-01 to 2020-06-30.

# 2. Model Specification

## 2.1 Unified GARCH-Ito Model

Definition The log price $$X_t$$, $$t \in \mathbb{R}_+$$ obeys the unified GARCH-Ito model if it satisfies $\begin{equation*} \begin{split} dX_t=& \mu_t dt + \sigma_t dB_t \\ \sigma^2_t = & \sigma^2_{[t]} + (t-[t])\lbrace \omega + (\gamma-1) \sigma^2_{[t]} \rbrace + \beta \left(\int_{[t]}^t \sigma_s dB_s \right)^2 \end{split} \end{equation*}$ where $$\mu_t$$ is a drift, $$[t]$$ denotes the integer part of $$t$$ and when $$t$$ itself is an integer, $$[t]=t-1$$, $$B_t$$ is a Brownian motion with respect to a filtration $$\mathcal{F}_t$$, $$\sigma^2_t$$ is a volatility process adapted to $$\mathcal{F}_t$$, $$\theta=(\omega,\beta,\gamma)$$ are the model parameters.

Proposition The conditional integrated volatility over the low-frequency period $$n$$ retains the following iterative structure $\begin{equation*} E \left[ \int_{n-1}^n \sigma^2_t dt \middle| \mathcal{F}_{n-1} \right] = h_n(\theta) = \omega^g + \gamma h_{n-1}(\theta) + \beta^g Z^2_{n-1} \end{equation*}$ where $$\tau(\theta)=(\omega^g, \beta^g, \gamma)$$ are model parameters in the above low-frequency structure and are functions of the original $$\theta$$, $$Z_{n-1}=X_{n-1}-X_{n-2}$$ is the low-frequency log return. Moreover, $\begin{equation*} E[h_n(\theta)]=\frac{\omega^g}{1-\beta^g-\gamma}. \end{equation*}$ The model parameters can be estimated by maximizing the following quasi-maximum likelihood function: $\begin{equation*} \hat{L}_U(\theta)=-\sum_{i=1}^n \left[ \log (h_i(\theta)) + \frac{RV_i}{h_i(\theta)} \right] \end{equation*}$ where $$RV_i$$’s are the daily realized volatility estimates. Well-performing realized volatility estimators that can handle the market microstructure noise when frequency of the intraday data is ultra-high include the two-time scale realized volatility estimator, the multi-scale realized volatility estimator, the pre-averaging realized volatility estimator, and the kernel realized volatility estimator. See Ait-Sahalia and Yu (2009), Zhang (2006), Barndorff-Nielsen et al. (2008), Jacod et al. (2009), Christensen, Kinnebrock, and Podolskij (2010) for more details.

To maximize $$\hat{L}_U(\theta)$$, we need to specify $$h_1(\theta)$$ and we adopt its unconditional expectation such that $\begin{equation*} h_1(\theta)=\frac{\omega^g}{1-\beta^g-\gamma}. \end{equation*}$ We estimate the model parameter $$\theta$$ by maximizing $$\hat{L}_U(\theta)$$ such that $\begin{equation*} \hat{\theta}=\underset{\theta \in \Theta}{\text{argmax}} \hat{L}_U(\theta) \end{equation*}$ and estimate the model parameter $$\tau(\theta)$$ by $$\tau(\hat{\theta})$$.

require(GARCHIto) 
## Loading required package: GARCHIto
data("sample_data")
model_unified=UnifiedEst(sample_data$BPV, sample_data$return)
##
## Iter: 1 fn: -5311.7549    Pars:  0.000002193 0.055580454 0.856415391
## Iter: 2 fn: -5311.7549    Pars:  0.000002193 0.055580454 0.856415391
## solnp--> Completed in 2 iterations
model_unified$coefficients # estimated model parameters  ## omega_g beta_g gamma ## 2.192970e-06 5.558045e-02 8.564154e-01 ## 2.2 Realized GARCH-Ito Model Definition The log price $$X_t$$, $$t \in \mathbb{R}_+$$ obeys the realized GARCH-Ito model if it satisfies $\begin{equation*} \begin{split} dX_t=& \mu dt + \sigma_t dB_t + L_t d \Lambda_t \\ \sigma^2_t = & \sigma^2_{[t]} + \gamma (t-[t])^2 \lbrace \omega_1 +\sigma^2_{[t]} \rbrace - (t-[t]) \lbrace \omega_2 +\sigma^2_{[t]} \rbrace \\\ & \, + \alpha \int_{[t]}^t \sigma^2_s ds + \beta \int_{[t]}^t L^2_s d \Lambda_s + \nu ([t]+1-t) Z^2_t \end{split} \end{equation*}$ where $$\mu_t$$ is a drift, $$[t]$$ denotes the integer part of $$t$$ and when $$t$$ itself is an integer, $$[t]=t-1$$, $$Z_t=\int_{[t]}^t dW_t$$, $$B_t$$ and $$W_t$$ are standard Brownian motions with respect to filtration $$\mathcal{F}_t$$ with $$dW_t dB_t= \rho dt$$, and $$\sigma^2_t$$ is a volatility process adapted to $$\mathcal{F}_t$$. For the jump part, $$\Lambda_t$$ is the standard Poisson process with constant intensity $$\lambda$$ and $$L_t$$ denotes the i.i.d. jump sizes which are independent of the Poisson and continuous diffusion processes. The i.i.d. assumption on jump sizes can be further rewritten as $$L^2_t =\omega_L + M_t$$ where $$M_t$$’s are i.i.d. random variables with mean zero and constant variance. Proposition The conditional integrated volatility over the low-frequency period $$n$$ retains the following iterative structure $\begin{equation*} E \left[ \int_{n-1}^n \sigma^2_t dt \middle| \mathcal{F}_{n-1} \right] = h_n(\theta) = \omega^g + \gamma h_{n-1}(\theta) + \alpha^g \int_{n-2}^{n-1} \sigma^2_s ds + \beta^g \int_{n-2}^{n-1} L^2_t d \Lambda_t \end{equation*}$ where $$\tau(\theta)=(\omega^g, \alpha^g, \beta^g, \gamma)$$ are model parameters in the above low-frequency structure and are functions of the original $$\theta$$. Moreover, $\begin{equation*} E[h_n(\theta)]=\frac{\omega^g+\beta^g \lambda \omega_L}{1-\alpha^g-\gamma}. \end{equation*}$ The model parameters can be estimated by maximizing the following quasi-maximum likelihood function: $\begin{equation*} \hat{L}_R(\theta)= - \sum_{i=1}^n \left[ \log (\hat{h}_i(\theta)) + \frac{RV_i}{\hat{h}_i(\theta)} \right] \end{equation*}$ where $\begin{equation*} \hat{h}_1(\theta)=\frac{\omega^g+\beta^g \lambda \omega_L}{1-\alpha^g-\gamma} \end{equation*}$ and $\begin{equation*} \hat{h}_i(\theta) = \omega^g + \gamma h_{i-1}(\theta) + \alpha^g RV_{i-1} + \beta^g JV_{i-1}, \quad i=2,\ldots,n, \end{equation*}$ here $$RV_i$$’s are the daily realized volatility estimates and $$JV_i$$’s are the jump variation estimates. Note: in this pacakge, we approximate $$\lambda \omega_L$$ in $$\hat{h}_1(\theta)$$ by the median of all $$JV_i$$’s. We estimate the model parameter $$\theta$$ by maximizing $$\hat{L}_R(\theta)$$ such that $\begin{equation*} \hat{\theta}=\underset{\theta \in \Theta}{\text{argmax}} \hat{L}_R(\theta) \end{equation*}$ and estimate the model parameter $$\tau(\theta)$$ by $$\tau(\hat{\theta})$$. # without the consideration of price jumps model_realized_NJ=RealizedEst(sample_data$BPV)
##
## Iter: 1 fn: -5340.1082    Pars:  0.000004049 0.554107013 0.376603733
## Iter: 2 fn: -5340.1082    Pars:  0.000004049 0.554107013 0.376603733
## solnp--> Completed in 2 iterations
model_realized_NJ$coefficients  ## omega_g alpha_g gamma ## 4.048833e-06 5.541070e-01 3.766037e-01 # with the consideration of price jumps model_realized=RealizedEst(sample_data$BPV, sample_data$JV) ## ## Iter: 1 fn: -5340.1053 Pars: 0.000004144 0.557140538 0.000919110 0.371861977 ## Iter: 2 fn: -5340.1053 Pars: 0.000004144 0.557140538 0.000919110 0.371861977 ## solnp--> Completed in 2 iterations model_realized$coefficients 
##      omega_g      alpha_g       beta_g        gamma
## 4.143592e-06 5.571405e-01 9.191100e-04 3.718620e-01
plot(model_unified$sigma, cex=0.5, type="o", ylim=c(0,0.00035), main="estimated conditional volatilities", ylab="", xlab="") lines(model_realized_NJ$sigma,cex=0.5,type="o",col="blue",lty=2)
lines(model_realized$sigma,cex=0.5,type="o",col="red",lty=3, lwd=0.5) legend("topleft", cex=0.8, legend=c("Unified GARCH-Ito", "Realized GARCH-Ito No Jump","Realized GARCH-Ito with Jump"), col = c("black", "blue", "red"), lty=c(1,2,3)) # 3. Volatility Forecasting The dynamic structure imposed in the unified GARCH-Ito and realized GARCH-Ito model allow us to predict future volatility by estimating the expected conditional integrated volatility, i.e. $$E[h_{n+1}(\theta) | \mathcal{F}_n ]$$, with $$\hat{h}_{n+1}(\hat{\theta})$$. To obtain one-step-ahead estimated value of the conditional volatility, use$pred.

c(model_unified$pred, model_realized_NJ$pred, model_realized$pred)  ## [1] 3.537236e-05 2.841266e-05 2.843882e-05 To carry out rolling forecast with expanding window, we update the sample size for model construction at each rolling. To evaluate the model performance in volatility forecasting task, we compare $$\hat{h}_{n+1}(\hat{\theta})$$ with $$RV_{n+1}$$ and compute the squared prediction error $$\left( \hat{h}_{n+1}(\hat{\theta})-RV_{n+1} \right)^2$$. # conduct out of sample volatility forecasting and compute the mean squared prediction error error_unified=NULL error_realized_NJ=NULL error_realized=NULL for (i in 560:603){ sink("file") model1=UnifiedEst(sample_data$BPV[1:i], sample_data$return[1:i]) error_unified=c(error_unified, (model1$pred-sample_data$BPV[i+1])^2) model2=RealizedEst(sample_data$BPV[1:i])
error_realized_NJ=c(error_realized_NJ, (model2$pred-sample_data$BPV[i+1])^2)
model3=RealizedEst(sample_data$BPV[1:i], sample_data$JV[1:i])
error_realized=c(error_realized, (model3$pred-sample_data$BPV[i+1])^2)
sink()
}

error=c(mean(error_unified), mean(error_realized_NJ), mean(error_realized))
names(error)=c("Unified GARCH-Ito", "Realized GARCH-Ito No Jump", "Realized GARCH-Ito with Jump")
error
##            Unified GARCH-Ito   Realized GARCH-Ito No Jump
##                 4.439021e-10                 2.760417e-10
## Realized GARCH-Ito with Jump
##                 2.752327e-10

# 4. Further Extensions

Besides high- and low-frequency stock data, option data provide one more natural source for the more precise forecast of volatilities and have been investigated thoroughly since the seminal work of Black and Scholes (1973). Thus, Song et al. (2020) also discussed how to incorporate additional option data information in parameter estimation. Let $$NV_i$$’s be the estimated volatility values using option data, assume that $$NV_i$$ and the conditional integrated volatility $$h_i(\theta)$$ have the following linear relationship: $$$\label{eq:option} NV_{i}=b+a \, h_i(\theta)+e_i, \quad i=1,\ldots, n$$$ where $$b$$ and $$a$$ are the intercept and slope coefficients, respectively. Moreover, $$e_i$$’s are martingale differences with mean zero and variance $$\sigma^2_e$$, and they are independent of the price process and the microstructure component.

Let $$\phi=(\omega^g, \alpha^g, \beta^g, \gamma, a,b,\sigma^2_e)$$, the model parameters can be estimated by maximizing the following quasi-likelihood function, $\begin{equation*} \hat{L}_O(\phi)=-\sum_{i=1}^n \left[ \log (\hat{h}_i(\theta) + \frac{RV_i}{\hat{h}_i(\theta)}) \right] - \sum_{i=1}^n \left[\log(\sigma^2_e) + \frac{(NV_{i}-b-a \hat{h}_i(\theta))^2}{\sigma^2_e} \right]. \end{equation*}$ $\begin{equation*} \hat{\phi}=\underset{\phi \in \Phi}{\text{argmax}} \hat{L}_O(\phi). \end{equation*}$

The homogeneous variance in the linear model can be generalized to heterogeneous variance such as replacing $$\sigma^2_e$$ by $$\sigma^2_e h_i^\zeta (\theta)$$, where $$\zeta >0$$ is to adjust the level of heteroscedasticity with $$\zeta=0$$ corresponding to the homogeneous case. One may replace $$\sigma^2_e$$ by $$\sigma^2_e \hat{h}^\zeta_i(\theta)$$ in the quasi-likelihood function $$\hat{L}_O(\phi)$$ and then estimate $$\zeta$$ jointly with the other parameters.

# without the consideration of price jumps
RealizedEst_Option(RV, NV) # homogeneous error
RealizedEst_Option(RV, NV, homogeneous=FALSE ) # heterogeneous error
# with the consideration of price jumps
RealizedEst_Option(RV, JV, NV) # homogeneous error
RealizedEst_Option(RV, JV, NV, homogeneous=FALSE) # heterogeneous error

# References

Ait-Sahalia, Yacine, and Jialin Yu. 2009. “High Frequency Market Microstructure Noise Estimates and Liquidity Measures.” Annals of Applied Statistics 3 (1): 422–57.

Barndorff-Nielsen, Ole E, Peter Reinhard Hansen, Asger Lunde, and Neil Shephard. 2008. “Designing Realized Kernels to Measure the Ex Post Variation of Equity Prices in the Presence of Noise.” Econometrica 76 (6): 1481–1536.

Black, Fischer, and Myron Scholes. 1973. “The Pricing of Options and Corporate Liabilities.” Journal of Political Economy 81 (3): 637–54.

Christensen, Kim, Silja Kinnebrock, and Mark Podolskij. 2010. “Pre-Averaging Estimators of the Ex-Post Covariance Matrix in Noisy Diffusion Models with Non-Synchronous Data.” Journal of Econometrics 159 (1): 116–33.

Corsi, Fulvio. 2009. “A Simple Approximate Long-Memory Model of Realized Volatility.” Journal of Financial Econometrics 7 (2): 174–96.

Hansen, Peter Reinhard, Zhuo Huang, and Howard Howan Shek. 2012. “Realized GARCH: A Joint Model for Returns and Realized Measures of Volatility.” Journal of Applied Econometrics 27 (6): 877–906.

Jacod, Jean, Yingying Li, Per A Mykland, Mark Podolskij, and Mathias Vetter. 2009. “Microstructure Noise in the Continuous Case: The Pre-Averaging Approach.” Stochastic Processes and Their Applications 119 (7): 2249–76.

Kim, Donggyu, and Yazhen Wang. 2016. “Unified Discrete-Time and Continuous-Time Models and Statistical Inferences for Merged Low-Frequency and High-Frequency Financial Data.” Journal of Econometrics 194: 220–30.

Shephard, Neil, and Kevin Sheppard. 2010. “Realising the Future: Forecasting with High-Frequency-Based Volatility (Heavy) Models.” Journal of Applied Econometrics 25 (2): 197–231.

Song, Xinyu, Donggyu Kim, Huiling Yuan, Xiangyu Cui, Zhiping Lu, Yong Zhou, and Yazhen Wang. 2020. “Volatility Analysis with Realized GARCH-Itô Models.” Journal of Econometrics in press.

Zhang, Lan. 2006. “Efficient Estimation of Stochastic Volatility Using Noisy Observations: A Multi-Scale Approach.” Bernoulli 12 (6): 1019–43.