*dfms* provides a user friendly and computationally efficient
approach to estimate linear Gaussian Dynamic Factor Models in R. The
package is not geared at any specific application, and can be used for
dimensionality reduction, forecasting and nowcasting systems of time
series. The use of the package is facilitated by a comprehensive set of
methods to explore/plot models and extract results.

```
library(dfms)
library(xts)
```

This vignette walks through the main features of the package. The
data provided in the package in *xts* format is taken from
Banbura and Modugno (2014)^{1}, henceforth BM14, and covers the Euro Area
from January 1980 through September 2009.

```
# Using the monthly series from BM14
dim(BM14_M)
#> [1] 357 92
range(index(BM14_M))
#> [1] "1980-01-31" "2009-09-30"
head(colnames(BM14_M))
#> [1] "ip_total" "ip_tot_cstr" "ip_tot_cstr_en" "ip_constr"
#> [5] "ip_im_goods" "ip_capital"
plot(scale(BM14_M), lwd = 1)
```

The data frame `BM14_Models`

provides information about
the series^{2}, and the various models estimated by
BM14.

```
head(BM14_Models, 3)
#> series
#> 1 ip_total
#> 2 ip_tot_cstr
#> 3 ip_tot_cstr_en
#> label
#> 1 IP-Total Industry - Working Day and Seasonally Adjusted
#> 2 IP-Total Industry (Excluding Construction) - Working Day and Seasonally Adjusted
#> 3 IP-Total Industry Excluding Construction and MIG Energy - Working Day and Seasonally Adjusted
#> code freq log_trans small medium large
#> 1 sts.m.i5.Y.PROD.NS0010.4.000 M TRUE FALSE FALSE TRUE
#> 2 sts.m.i5.Y.PROD.NS0020.4.000 M TRUE TRUE TRUE TRUE
#> 3 sts.m.i5.Y.PROD.NS0021.4.000 M TRUE FALSE FALSE TRUE
# Using only monthly data
<- subset(BM14_Models, freq == "M") BM14_Models_M
```

Prior to estimation, all data is differenced by BM14, and some series
are log, differenced, as indicated by the `log_trans`

column
in `BM14_Models`

. In general, *dfms* uses a stationary
Kalman Filter with time-invariant system matrices, and therefore expects
data to be stationary. Data is also scaled and centered^{3} in the main
`DFM()`

function, thus this does not need to be done by the
user.

```
library(magrittr)
# log-transforming and first-differencing the data
$log_trans] %<>% log()
BM14_M[, BM14_Models_M= diff(BM14_M)
BM14_M_diff plot(scale(BM14_M_diff), lwd = 1)
```

Before estimating a model, the `ICr()`

function can be
applied to determine the number of factors. It computes 3 information
criteria proposed in Bai and NG (2002)^{4}, whereby the second
criteria generally suggests the most parsimonious model.

```
= ICr(BM14_M_diff)
ic #> Missing values detected: imputing data with tsnarmimp() with default settings
print(ic)
#> Optimal Number of Factors (r) from Bai and Ng (2002) Criteria
#>
#> IC1 IC2 IC3
#> 7 7 13
plot(ic)
```

Another option is to use a Screeplot to gauge the number of factors
by looking for a kink in the plot. A mathematical procedure for finding
the kink was suggested by Onatski (2010)^{5}, but this is currently
not implemented in `ICr()`

.

`screeplot(ic)`

Based on both the information criteria and the Screeplot, I gauge
that a model with 4 factors should be estimated, as factors, 5, 6 and 7
do not add much to the explanatory power of the model. Next to the
number of factors, the lag order of the factor-VAR of the transition
equation should be estimated (the default is 1 lag). This can be done
using the `VARselect()`

function from the *vars*
package, with PCA factor estimates reported by `ICr()`

.

```
# Using vars::VARselect() with 4 principal components to estimate the VAR lag order
::VARselect(ic$F_pca[, 1:4])
vars#> $selection
#> AIC(n) HQ(n) SC(n) FPE(n)
#> 6 3 3 6
#>
#> $criteria
#> 1 2 3 4 5 6
#> AIC(n) 5.810223 5.617282 5.427760 5.389413 5.407765 5.381829
#> HQ(n) 5.898758 5.776646 5.657953 5.690434 5.779614 5.824507
#> SC(n) 6.032560 6.017490 6.005838 6.145361 6.341582 6.493517
#> FPE(n) 333.696100 275.153456 227.671078 219.144228 223.265640 217.639133
#> 7 8 9 10
#> AIC(n) 5.409877 5.394900 5.421375 5.460761
#> HQ(n) 5.923383 5.979235 6.076538 6.186753
#> SC(n) 6.699434 6.862328 7.066673 7.283929
#> FPE(n) 223.956824 220.793226 226.933863 236.331677
```

The selection thus suggests we should estimate a factor model with
`r = 4`

factors and `p = 3`

lags^{6}. Before estimating the
model I note that *dfms* does not deal with seasonality in
series, thus it is recommended to also seasonally adjust data,
e.g.Â using the *seasonal* package before estimation. BM14 only
use seasonally adjusted series, thus this is not necessary with the
example data provided.

Estimation can then simply be done using the `DFM()`

function with parameters `r`

and `p`

^{7}.

```
# Estimating the model with 4 factors and 3 lags using BM14's EM algorithm
= DFM(BM14_M_diff, r = 4, p = 3)
model1 #> Converged after 26 iterations.
print(model1)
#> Dynamic Factor Model: n = 92, T = 356, r = 4, p = 3, %NA = 25.8366
#>
#> Factor Transition Matrix [A]
#> L1.f1 L1.f2 L1.f3 L1.f4 L2.f1 L2.f2 L2.f3 L2.f4 L3.f1
#> f1 0.4720 -0.1297 0.8460 0.2098 -0.0733 -0.1436 -0.0595 0.1565 0.2356
#> f2 -0.1612 0.1699 0.2389 0.1598 0.0641 -0.1341 -0.0542 0.1287 0.1336
#> f3 0.3965 0.3264 0.0213 -0.3033 -0.1542 -0.0467 -0.1484 -0.0150 -0.1172
#> f4 0.1096 0.1601 -0.1578 0.2485 -0.0365 -0.0563 -0.0230 -0.1117 -0.0719
#> L3.f2 L3.f3 L3.f4
#> f1 -0.0803 -0.0386 0.0408
#> f2 0.1347 -0.0024 -0.0342
#> f3 -0.0087 0.1767 0.0249
#> f4 0.0307 0.0662 -0.0035
plot(model1)
```

The model can be investigated using `summary()`

, which
returns an object of class â€˜dfm_summaryâ€™ containing the system matrices
and summary statistics of the factors and the residuals in the
measurement equation, as well as the R-Squared of the factor model for
individual series. The print method automatically adjusts the amount of
information printed to the data size. For large databases with more than
40 series, no series-level statistics are printed.

```
<- summary(model1)
dfm_summary print(dfm_summary) # Large model with > 40 series: defaults to compact = 2
#> Dynamic Factor Model: n = 92, T = 356, r = 4, p = 3, %NA = 25.8366
#>
#> Call: DFM(X = BM14_M_diff, r = 4, p = 3)
#>
#> Summary Statistics of Factors [F]
#> N Mean Median SD Min Max
#> f1 356 -0.0448 0.3455 4.4505 -21.9265 11.0306
#> f2 356 -0.0319 -0.0892 2.68 -9.9549 7.4988
#> f3 356 -0.1032 -0.0593 3.2891 -12.0969 16.2455
#> f4 356 -0.0118 0.089 2.161 -8.2883 10.7219
#>
#> Factor Transition Matrix [A]
#> L1.f1 L1.f2 L1.f3 L1.f4 L2.f1 L2.f2 L2.f3 L2.f4 L3.f1
#> f1 0.4720 -0.1297 0.84605 0.2098 -0.07334 -0.14356 -0.05950 0.15645 0.2356
#> f2 -0.1612 0.1699 0.23889 0.1598 0.06406 -0.13413 -0.05415 0.12869 0.1336
#> f3 0.3965 0.3264 0.02128 -0.3033 -0.15424 -0.04669 -0.14839 -0.01495 -0.1172
#> f4 0.1096 0.1601 -0.15776 0.2485 -0.03655 -0.05626 -0.02304 -0.11169 -0.0719
#> L3.f2 L3.f3 L3.f4
#> f1 -0.080320 -0.038592 0.040812
#> f2 0.134692 -0.002391 -0.034215
#> f3 -0.008694 0.176663 0.024876
#> f4 0.030716 0.066201 -0.003465
#>
#> Factor Covariance Matrix [cov(F)]
#> f1 f2 f3 f4
#> f1 19.8067 2.0846* -3.4700* -2.1094*
#> f2 2.0846* 7.1822 -2.8725* -1.0631*
#> f3 -3.4700* -2.8725* 10.8182 1.9286*
#> f4 -2.1094* -1.0631* 1.9286* 4.6701
#>
#> Factor Transition Error Covariance Matrix [Q]
#> u1 u2 u3 u4
#> u1 9.0178 0.3303 -3.0764 -1.0182
#> u2 0.3303 5.4425 -1.3095 -0.5051
#> u3 -3.0764 -1.3095 7.0230 0.8639
#> u4 -1.0182 -0.5051 0.8639 3.8005
#>
#> Summary of Residual AR(1) Serial Correlations
#> N Mean Median SD Min Max
#> 92 -0.0409 -0.0782 0.2959 -0.5073 0.6858
#>
#> Summary of Individual R-Squared's
#> N Mean Median SD Min Max
#> 92 0.3712 0.299 0.2888 0.0067 0.9978
# Can request more detailed printouts
# print(dfm_summary, compact = 1)
# print(dfm_summary, compact = 0)
```

Apart from the model summary, the *dfm* methods
`residuals()`

and `fitted()`

return observation
residuals and fitted values from the model. The default format is a
plain matrix, but the functions also have an argument to return data in
the original (input) format.

`plot(resid(model1, orig.format = TRUE))`