Getting Started with NNS: Forecasting

Fred Viole

library(NNS)
library(data.table)
require(knitr)
require(rgl)
require(meboot)

Forecasting

The underlying assumptions of traditional autoregressive models are well known. The resulting complexity with these models leads to observations such as,

``We have found that choosing the wrong model or parameters can often yield poor results, and it is unlikely that even experienced analysts can choose the correct model and parameters efficiently given this array of choices.’’

NNS simplifies the forecasting process. Below are some examples demonstrating NNS.ARMA and its assumption free, minimal parameter forecasting method.

Linear Regression

NNS.ARMA has the ability to fit a linear regression to the relevant component series, yielding very fast results. For our running example we will use the AirPassengers dataset loaded in base R.

We will forecast 44 periods h = 44 of AirPassengers using the first 100 observations training.set = 100, returning estimates of the final 44 observations. We will then test this against our validation set of tail(AirPassengers,44).

Since this is monthly data, we will try a seasonal.factor = 12.

Below is the linear fit and associated root mean squared error (RMSE) using method = "lin".

nns_lin = NNS.ARMA(AirPassengers, 
               h = 44, 
               training.set = 100, 
               method = "lin", 
               plot = TRUE, 
               seasonal.factor = 12, 
               seasonal.plot = FALSE)

sqrt(mean((nns_lin - tail(AirPassengers, 44)) ^ 2))
## [1] 35.39965

Nonlinear Regression

Now we can try using a nonlinear regression on the relevant component series using method = "nonlin".

nns_nonlin = NNS.ARMA(AirPassengers, 
               h = 44, 
               training.set = 100, 
               method = "nonlin", 
               plot = FALSE, 
               seasonal.factor = 12, 
               seasonal.plot = FALSE)

sqrt(mean((nns_nonlin - tail(AirPassengers, 44)) ^ 2))
[1] 18.77889

Cross-Validation

We can test a series of seasonal.factors and select the best one to fit. The largest period to consider would be 0.5 * length(variable), since we need more than 2 points for a regression! Remember, we are testing the first 100 observations of AirPassengers, not the full 144 observations.

seas = t(sapply(1 : 25, function(i) c(i, sqrt( mean( (NNS.ARMA(AirPassengers, h = 44, training.set = 100, method = "lin", seasonal.factor = i, plot=FALSE) - tail(AirPassengers, 44)) ^ 2) ) ) ) )

colnames(seas) = c("Period", "RMSE")
seas
##       Period      RMSE
##  [1,]      1  75.67783
##  [2,]      2  75.71250
##  [3,]      3  75.87604
##  [4,]      4  75.16563
##  [5,]      5  76.07418
##  [6,]      6  70.43185
##  [7,]      7  77.98493
##  [8,]      8  75.48997
##  [9,]      9  79.16378
## [10,]     10  81.47260
## [11,]     11 106.56886
## [12,]     12  35.39965
## [13,]     13  90.98265
## [14,]     14  95.64979
## [15,]     15  82.05345
## [16,]     16  74.63052
## [17,]     17  87.54036
## [18,]     18  74.90881
## [19,]     19  96.96011
## [20,]     20  88.75015
## [21,]     21 100.21346
## [22,]     22 108.68674
## [23,]     23  85.06430
## [24,]     24  35.49018
## [25,]     25  75.16192

Now we know seasonal.factor = 12 is our best fit, we can see if there’s any benefit from using a nonlinear regression. Alternatively, we can define our best fit as the corresponding seas$Period entry of the minimum value in our seas$RMSE column.

a = seas[which.min(seas[ , 2]), 1]

Below you will notice the use of seasonal.factor = a generates the same output.

nns = NNS.ARMA(AirPassengers, 
               h = 44, 
               training.set = 100, 
               method = "nonlin", 
               seasonal.factor = a, 
               plot = TRUE, seasonal.plot = FALSE)

sqrt(mean((nns - tail(AirPassengers, 44)) ^ 2))
## [1] 18.77889

Note: You may experience instances with monthly data that report seasonal.factor close to multiples of 3, 4, 6 or 12. For instance, if the reported seasonal.factor = {37, 47, 71, 73} use (seasonal.factor = c(36, 48, 72)) by setting the modulo parameter in NNS.seas(..., modulo = 12). The same suggestion holds for daily data and multiples of 7, or any other time series with logically inferred cyclical patterns. The nearest periods to that modulo will be in the expanded output.

NNS.seas(AirPassengers, modulo = 12, plot = FALSE)
## $all.periods
##    Period Coefficient.of.Variation Variable.Coefficient.of.Variation
## 1:     48                0.4002249                         0.4279947
## 2:     12                0.4059923                         0.4279947
## 3:     60                0.4279947                         0.4279947
## 4:     36                0.4279947                         0.4279947
## 5:     24                0.4279947                         0.4279947
## 
## $best.period
## Period 
##     48 
## 
## $periods
## [1] 48 12 60 36 24

Cross-Validating All Combinations of seasonal.factor

NNS also offers a wrapper function NNS.ARMA.optim() to test a given vector of seasonal.factor and returns the optimized objective function (in this case RMSE written as obj.fn = expression( sqrt(mean((predicted - actual)^2)) )) and the corresponding periods, as well as the NNS.ARMA regression method used. Alternatively, using external package objective functions work as well such as obj.fn = expression(Metrics::rmse(actual, predicted)).

NNS.ARMA.optim() will also test whether to regress the underlying data first, shrink the estimates to their subset mean values, include a bias.shift based on its internal validation errors, and compare different weights of both linear and nonlinear estimates.

Given our monthly dataset, we will try multiple years by setting seasonal.factor = seq(12, 60, 6) every 6 months based on our NNS.seas() insights above.

nns.optimal = NNS.ARMA.optim(AirPassengers, 
                             training.set = 100, 
                             seasonal.factor = seq(12, 60, 6),
                             obj.fn = expression( sqrt(mean((predicted - actual)^2)) ),
                             objective = "min",
                             pred.int = .95, plot = TRUE)

nns.optimal
[1] "CURRNET METHOD: lin"
[1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
[1] "NNS.ARMA(... method =  'lin' , seasonal.factor =  c( 12 ) ...)"
[1] "CURRENT lin OBJECTIVE FUNCTION = 35.3996540135277"
[1] "BEST method = 'lin', seasonal.factor = c( 12 )"
[1] "BEST lin OBJECTIVE FUNCTION = 35.3996540135277"
[1] "CURRNET METHOD: nonlin"
[1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
[1] "NNS.ARMA(... method =  'nonlin' , seasonal.factor =  c( 12 ) ...)"
[1] "CURRENT nonlin OBJECTIVE FUNCTION = 18.7788946994645"
[1] "BEST method = 'nonlin' PATH MEMBER = c( 12 )"
[1] "BEST nonlin OBJECTIVE FUNCTION = 18.7788946994645"
[1] "CURRNET METHOD: both"
[1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
[1] "NNS.ARMA(... method =  'both' , seasonal.factor =  c( 12 ) ...)"
[1] "CURRENT both OBJECTIVE FUNCTION = 26.9404229568573"
[1] "BEST method = 'both' PATH MEMBER = c( 12 )"
[1] "BEST both OBJECTIVE FUNCTION = 26.9404229568573"


$periods
[1] 12

$weights
NULL

$obj.fn
[1] 18.77889

$method
[1] "nonlin"

$shrink
[1] FALSE

$nns.regress
[1] FALSE

$bias.shift
[1] 0

$errors
 [1] -14.47605733 -21.86813452 -19.68680481 -32.74950538 -23.55099666 -17.40139233 -13.70469091  -6.24837561  -2.48949314   2.13237206  16.92258707  23.90974592   4.53920624
[14]  -2.80578226  -8.66505386 -35.95512814   6.15047733  -3.05392579   4.47214630  18.85551669   2.32637841   0.08983381  -1.04028830   2.89582233 -24.22859693  -6.13584252
[27] -27.38320978 -53.70280267 -21.86253259 -23.90679863 -23.77369245 -22.36041401 -29.22056474 -26.31507238  12.93035387 -34.59586321 -47.79031322 -34.85559482 -62.85716413
[40] -64.07451099 -35.67072562 -50.91910417 -27.91240080 -22.72623962

$results
 [1] 338.5400 405.6487 448.7971 437.6360 381.8544 325.4751 288.1899 326.6740 337.1004 319.5942 381.6468 371.7323 373.1716 452.1286 498.6929 485.0008 422.3997 357.9051
[19] 318.5071 360.0295 371.1142 350.8923 419.5388 408.2967 408.8936 499.6842 549.5283 533.5434 464.0692 391.2918 349.2869 394.1562 404.3549 381.6612 456.5031 443.9546
[37] 443.9982 546.2672 599.0085 581.0612 504.6510 423.7409 379.3403 427.2016

$lower.pred.int
 [1] 287.2344 354.3431 397.4915 386.3304 330.5488 274.1695 236.8843 275.3684 285.7949 268.2886 330.3413 320.4267 321.8661 400.8231 447.3873 433.6952 371.0941 306.5995
[19] 267.2016 308.7239 319.8086 299.5867 368.2332 356.9912 357.5880 448.3786 498.2227 482.2379 412.7636 339.9862 297.9813 342.8507 353.0493 330.3556 405.1975 392.6490
[37] 392.6926 494.9616 547.7030 529.7556 453.3454 372.4353 328.0347 375.8961

$upper.pred.int
 [1] 368.1156 435.2243 478.3727 467.2115 411.4300 355.0506 317.7655 356.2495 366.6760 349.1697 411.2224 401.3078 402.7472 481.7042 528.2685 514.5763 451.9753 387.4807
[19] 348.0827 389.6050 400.6897 380.4679 449.1144 437.8723 438.4692 529.2597 579.1039 563.1190 493.6448 420.8674 378.8625 423.7318 433.9305 411.2367 486.0786 473.5301
[37] 473.5738 575.8427 628.5841 610.6368 534.2265 453.3165 408.9159 456.7772

Extension of Estimates

We can forecast another 50 periods out-of-sample (h = 50), by dropping the training.set parameter while generating the 95% prediction intervals.

NNS.ARMA.optim(AirPassengers, 
                seasonal.factor = seq(12, 60, 6),
                obj.fn = expression( sqrt(mean((predicted - actual)^2)) ),
                objective = "min",
                pred.int = .95, h = 50, plot = TRUE)

Brief Notes on Other Parameters

We included the ability to use any number of specified seasonal periods simultaneously, weighted by their strength of seasonality. Computationally expensive when used with nonlinear regressions and large numbers of relevant periods.

Instead of weighting by the seasonal.factor strength of seasonality, we offer the ability to weight each per any defined compatible vector summing to 1.
Equal weighting would be weights = "equal".

Provides the values for the specified prediction intervals within [0,1] for each forecasted point and plots the bootstrapped replicates for the forecasted points.

We also included the ability to use all detected seasonal periods simultaneously, weighted by their strength of seasonality. Computationally expensive when used with nonlinear regressions and large numbers of relevant periods.

This parameter restricts the number of detected seasonal periods to use, again, weighted by their strength. To be used in conjunction with seasonal.factor = FALSE.

To be used in conjunction with seasonal.factor = FALSE. This parameter will ensure logical seasonal patterns (i.e., modulo = 7 for daily data) are included along with the results.

To be used in conjunction with seasonal.factor = FALSE & modulo != NULL. This parameter will ensure empirical patterns are kept along with the logical seasonal patterns.

This setting generates a new seasonal period(s) using the estimated values as continuations of the variable, either with or without a training.set. Also computationally expensive due to the recalculation of seasonal periods for each estimated value.

These are the plotting arguments, easily enabled or disabled with TRUE or FALSE. seasonal.plot = TRUE will not plot without plot = TRUE. If a seasonal analysis is all that is desired, NNS.seas is the function specifically suited for that task.

Multivariate Time Series Forecasting

The extension to a generalized multivariate instance is provided in the following documentation of the NNS.VAR() function:

References

If the user is so motivated, detailed arguments and proofs are provided within the following: