Modeling, Forecasting and Simulating Commodity Prices through Term Structure Estimation using Kalman Filtering: The R Package ‘NFCP’


# Set seed for reproducibility:

1. Introduction

‘NFCP’ (N-Factor Commodity Pricing) provides a framework for the modeling, parameter estimation, probabilistic forecasting, European and American option valuation and simulation of commodity prices through state space and Monte Carlo methods, risk-neutral valuation and Kalman filtering. Commodity pricing models are (systems of) stochastic differential equations that are utilized for the valuation and hedging of commodity contingent claims (i.e. derivative products on the commodity) and other commodity related investments. Parameters of commodity pricing models are estimated through MLE, using available term structure futures data of a commodity. Commodity pricing models that capture market dynamics are of great importance to commodity market participants in order to exercise sound investment and risk-management strategies. The N-factor commodity pricing model framework was first presented in the work of Cortazar and Naranjo (2006) titled “An N-Factor Gaussian Model of Oil Futures Prices”.

The purpose of this vignette is to provide worked examples of the estimation and analysis of commodity pricing models using the N-factor framework of the ‘NFCP’ package. Examples presented within this vignette, as well as throughout the documentation of the ‘NFCP’ package, replicate the prolific work of Schwartz and Smith (2000) titled: “Short-Term Variations and Long-Term Dynamics in Commodity Prices”. The two-factor model presented within this study is replicated, with the primary figures and results replicated as well as discussion on how to interpret and analyze N-factor models. The remainder of this vignette is structured as follows. Section 2 provides a background of modeling commodity prices through term structure estimation. Section 3 introduces the N-factor stochastic modeling framework. Section 4 provides worked examples on the estimation and analysis of commodity pricing models, using the crude oil data first presented within the work of Schwartz and Smith (2000), replicating many of the results of this study. Section 5 provides a discussion of the N-factor commodity pricing model, the ‘NFCP’ package and concludes.

2. Term Structure Estimation

Modeling commodity prices is typically performed through term structure estimation, as futures markets provide valuable information regarding expectations of future price movements and other dynamics. Parameter estimation of commodity pricing models involves expressing the futures curve in terms of unobserved factors and deriving analytic expressions for futures prices under no-arbitrage conditions. Term structure estimation of commodity pricing models was popularized within the works of Gibson and Schwartz (1990), Schwartz (1997, 1998) and Schwartz and Smith (2000). These commodity pricing models make differing assumptions regarding the innovation, mean-reverting behavior, consideration of the cost-of-carry, and the number of correlated factors to explain the term structure of commodities. These commodity pricing models (or their affine transformation equivalents, see Cortazar and Naranjo (2006)) were subsequently unified under the N-factor framework of Cortazar and Naranjo (2006), expressing the log of the commodity spot price processes as the sum of N correlated factors. ‘NFCP’ considers commodity pricing models under this N-factor framework.

Kalman filtering is the process of obtaining unreliable, but measurable, observations and estimating a time series of unobservable measurements in return, with an allowable degree of measurement error. Kalman filtering is used to estimate parameters of commodity pricing models through MLE (Gibson and Schwartz (1990), Schwartz (1997, 1998), Schwartz and Smith (2000), Cortazar and Naranjo (2006), Aspinall et al. (2020). Parameter estimation of N-factor models features many unknown parameters, a large number of observations and an objective function (the log-likelihood) that is discontinuous and nonlinear, with multiple local optima. The complexity of parameter estimation increases substantially according to the number of factors in the model. The number of unknown parameters of an N-factor model (without considering the measurement error of futures contracts) is equal to: \(0.5 (N^2 + 5N)\). MLE is performed using genetic algorithms through the ‘genoud’ function of the ‘rgenoud’ package to maximize the likelihood of obtaining MLE values.

3. The N-Factor Model

The N-factor commodity pricing model was first presented in the work of Cortazar and Naranjo (2006, equations 1-3). Following the work of Cortazar and Naranjo (2006), the N-factor framework describes the spot price process of a commodity as the correlated sum of \(N\) state variables \(x_{t}\).

\[log(S_{t}) = \begin{cases} \sum_{i=1}^N x_{i,t} &\text{GBM = T} \\ E+\sum_{i=1}^N x_{i,t} &\text{GBM = F} \\ \end{cases} \]

where state variables (i.e. factors of the spot price) follow correlated mean-reverting (i.e. Ornstein-Uhlenbeck) processes, with the first state variable further able to be considered to follow a random walk (i.e. Brownian motion process) to induce a unit root in the spot price process when logical argument \(GBM\) is true. The valuation of derivative products and commodity-related investments requires the “risk-neutral” version of the model. Here (and below) asterisks are used to denote the risk-neutral versions of variables and processes rather than the “true” process. Under the assumption of risk-neutrality the state variables follow the process:

\[dx_{1,t} = \begin{cases} \mu^*dt + \sigma_{1} dw_{1}t &\text{GBM = T} \\ - (\lambda_{1} + \kappa_{1}x_{1,t})dt + \sigma_{1} dw_{1}t &\text{GBM = F} \\ \end{cases} \]

\[dx_{i,t} =_{i\neq 1} - (\lambda_{i} + \kappa_{i}x_{i,t})dt + \sigma_{i} dw_{i}t\]

Where: \[E(w_{i})E(w_{j}) = \rho_{i,j}\]

The following constants are defined as:

\(\mu\): long-term growth rate

\(E\): Constant equilibrium level

\(\kappa_{i}\): Reversion rate of state variable \(i\)

\(\sigma_{i}\): the instantaneous volatility of state variable \(i\)

\(\rho_{i,j} \in [-1,1]\): the instantaneous correlation between state variables \(i\) and \(j\)

The risk-neutral variables are defined as:

\(\mu^*=\mu-\lambda_1\): Long-term risk-neutral growth rate

\(\lambda_{i}\): Risk premium of state variable \(i\)

Under the assumption of risk-neutrality, state variables with mean-reverting behaviour revert towards the equilibrium value \(- \frac{\lambda_{i}}{\kappa_{i}}\) at a rate of \(\kappa_{i}\) with a respective half-life of \(\frac{log(2)}{\kappa_{i}}\).

Under the N-factor model, the state variables under the risk-neutral process are jointly distributed with mean vector and covariance matrix:

\[E^*[x_i(t)] = \begin{cases} \ x_i(0) + \mu^* t &i=1 \text{ and GBM=T} \\ \ x_i(0)e^{-(\kappa_i t)} - \frac{(1-e^{-\kappa_it})\lambda_i}{\kappa_i} &\text{otherwise} \\ \end{cases}\] and:

\[Cov^*_{i,j}(x_{i,t},x_{j,t}) = Cov_{i,j}(x_{i,t},x_{j,t}) = \begin{cases} \sigma_1^2(t) &i=1,j=1 \text{ and GBM = T} \\ \sigma_i\sigma_j\rho_{i,j}\frac{1-e^{-(\kappa_i+\kappa_j)t}}{\kappa_i+\kappa_j} &\text{otherwise} \\ \end{cases} \]

filtering the N-factor model through the Kalman filter is described in ‘help(NFCP_Kalman_filter)’.

4. Using the ‘NFCP’ package:

The following section details how to replicate major findings, plots and tables from the prolific work of Schwartz and Smith (2000). The Schwartz and Smith (2000) two-factor model is one of the most popular commodity pricing models. Schwartz and Smith (2000) presented a two-factor model, where the first model followed a Brownian motion, and applied it to WTI Crude Oil data from 1990 - 1995. The approximate crude oil data, as well as estimated parameters and modeling assumptions, used within the original work of Schwartz and Smith (2000) is included within ‘NFCP’ within the ‘SS_oil’ list object. See help(SS_oil) for more details.

The parameters of the Schwartz and Smith two-factor model are:

model_parameters_2F <- NFCP_parameters(N_factors = 2,
                                      GBM = TRUE,
                                      initial_states = FALSE,
                                      N_ME = 5)
#> ----------------------------------------------------------------
#> 2 Factors model: first factor is a GBM
#> Risk Neutral SDE: 
#> log(S_t) = sum(x_t)
#> Where: 
#> d x1_t    = mu_rn * dt  + sigma_1 * dW_1
#> d x2_t    = - (lambda_2 + kappa_2 * x2_t) * dt  + sigma_2 * dW_2
#>  And: 
#> E(dW_1 * dW_2) = rho_1_2 * dt
## Print the vector of parameters of the specified commodity pricing model:
#>  [1] "mu"       "mu_rn"    "lambda_2" "kappa_2"  "sigma_1"  "sigma_2" 
#>  [7] "rho_1_2"  "ME_1"     "ME_2"     "ME_3"     "ME_4"     "ME_5"

The Schwartz and Smith two-factor model assumed that five futures contracts were observed at each time period, and that the measurement error of these observations were independent and unique. There were therefore five unknown measurement error parameters. Specying models with differing numbers of measurement error parameters is discussed in greater detail in section 4.3.

4.1 Stitching Futures Data

Traditional implementations of the Kalman filter to term structure data stitches contracts together according to their ordering by maturity. This results in a dataset where “for all given dates in the estimation sample, prices for the same set of contracts (with the same maturities) (are observed)” (Cortazar and Naranjo, 2006). Contracts have been stitched within existing literature in order to attain a dataset that has no missing observations and a time homogeneous time-to-maturity of observations - resulting in the Kalman filter quickly approaching a steady state distribution. Contracts are rolled over as the closest contract expires. Aggregating futures data creates information loss due to discrepancies between the actual time-to-maturity of observations and the assumed, time homogeneous time-to-maturity. Information loss also occurs as not all available contracts at a given point in time are observed within the model. Studies that have developed commodity pricing models on stitched data include: Longstaff and Schwartz (1990), Schwartz (1997), Schwartz and Smith (2000), Aspinall et al., (2020).

The approximate stitched data used to estimate commodity pricing models in the work of Schwartz and Smith (2000) is provided within the SS_oil list object. Stitching futures contracts to obtain a complete aggregate panel dataset can be developed through the ‘stitch_contracts’ function.

###Method 1 - Stitch Crude Oil Contracts according to maturity matching:
SS_oil_stitched <- stitch_contracts(futures = SS_oil$contracts,
futures_TTM = c(1, 5, 9, 13, 17)/12, maturity_matrix = SS_oil$contract_maturities,
rollover_frequency = 1/12, verbose = TRUE)

##Plot the Stitched Maturities:
matplot(as.Date(rownames(SS_oil_stitched$maturities)), SS_oil_stitched$maturities, 
        type = 'l', main = "Stitched Contract Maturities", 
        ylab = "Time To Maturity (Years)", xlab = "Date", col = 1)

The plot above is highly similar to Figure 1. of Schwartz (1997), although for a different observation period and different fixed maturities of stitched contracts.

Rather than stitching futures contracts to obtain a complete panel dataset, several other studies have instead used incomplete panel data of all available futures contracts to ensure there is no information loss during MLE (Sørensen, 2002; Cortazar and Schwartz, 2003; Cortazar and Naranjo, 2006). For the Schwartz and Smith (2000) oil contract data, there are an average of 21 available contracts at each observation. Both complete and incomplete panel data is compatible throughout the ‘NFCP’ package.

Whilst stitching data does result in information loss, the increase in relative algorithm efficiency compared to contract data can make this data structure appealing, particularly for estimating commodity pricing models with several state variables. Stitched futures data can also consider the white noise of each stitched contract during parameter estimation, which can provide a greater fit to the commodity pricing model. A commodity pricing model that considers all available contracts at a given point in time instead takes the assumption that the white noise of each futures contract is independent and identical (as was assumed in the work of Cortazar and Naranjo (2006)).

4.2 Filtering commodity pricing models:

Original model parameters estimated by Schwartz and Smith (2000) are available within the ‘SS_oil’ list object as object ‘two_factor’. Named vectors of parameters such as the one shown below are used throughout the ‘NFCP’ package for filtering, forecasting and simulating particular estimated models.

#>       mu    mu_rn lambda_2  kappa_2  sigma_1  sigma_2  rho_1_2     ME_1 
#>  -0.0125   0.0115   0.1570   1.4900   0.1450   0.2860   0.3000   0.0420 
#>     ME_2     ME_3     ME_4     ME_5 
#>   0.0060   0.0030   0.0000   0.0040

4.2.1 Complete, Stitched Data:

Stitched data is classifed as ‘complete’ due to the absence of missing values (i.e., NA’s) in the observed futures prices. Estimated parameters of N-factor models can be filtered using the Kalman filter through the ‘NFCP_Kalman_filter’ function to obtain filtered values of the factors, as well as other measures of model fit and robustness.

##Example 1 - Replicating the Schwartz and Smith (2000)
##Two-Factor Crude Oil commodity pricing model:

SS_oil_2F <- NFCP_Kalman_filter(
 parameter_values = SS_oil$two_factor,
 parameter_names = names(SS_oil$two_factor),
 log_futures = log(SS_oil$stitched_futures),
 futures_TTM = SS_oil$stitched_TTM,
 dt = SS_oil$dt,
 verbose = TRUE,
 debugging = TRUE)

It’s important to note that when ‘verbose = TRUE’, the Kalman filter iteration used by the ‘NFCP_Kalman_filter’ function does not use sequential processing and is a slower Kalman filter process not suited for parameter estimation.

4.2.2 All Available Contracts, Incomplete Data:

The Kalman filter can be applied to all available futures contracts by passing the contract data and time to maturity of contracts at each observation into the ‘NFCP_Kalman_filter’ function.

### Assume a constant measurement error in parameters of 1%:
SS_oil_2F_parameters <- SS_oil$two_factor
SS_oil_2F_parameters <- SS_oil_2F_parameters[!grepl("ME", names(SS_oil_2F_parameters))]
SS_oil_2F_parameters["ME_1"] <- 0.01

SS_oil_2F_contracts <- NFCP_Kalman_filter(
 parameter_values = SS_oil_2F_parameters,
 parameter_names = names(SS_oil_2F_parameters),
 log_futures = log(SS_oil$contracts),
 futures_TTM = SS_oil$contract_maturities,
 dt = SS_oil$dt,
 verbose = TRUE,
 debugging = TRUE)

4.3 Measurement Error of N-Factor Commodity Pricing Models:

The measurement error of a commodity pricing model is a highly important characteristic to consider, as it can provide insights into how well a given commodity pricing model can explain observed futures prices. In the work of Schwartz and Smith (2000), the assumption was made that the measurement error of observations was independent and unique (see below):

#>  ME_1  ME_2  ME_3  ME_4  ME_5 
#> 0.042 0.006 0.003 0.000 0.004

This provided insights into the model, such as it has the highest fit to medium-term futures contracts, and that there is a large difference in the ability for the model to explain different sections of the term structure of oil. This assumption is no longer possible when considering the incomplete contract data, as the number of observed futures contracts in the incomplete dataset is 82, which would require 82 separate unknown parameters. Traditionally the assumption for incomplete contract data is that the measurement error of futures contracts are independent and identical (i.e., there is only one measurement error).

The ‘NFCP’ package allows for multiple measurement errors to be considered in the Kalman filter by grouping observations by their respective time-to-maturity at a given observation. This can improve the fit of commodity pricing models, as measurement errors can clearly differ by a wide margin. Maturity grouping is considered in the ‘ME_TTM’ argument, which refers to the maximum maturity to consider for a given number of measurement errors. The ‘ME_TTM’ is only specified when the number of ME parameters is greater than one, and less than the total number of observed futures contracts.

For a given \(n\) number of measurement error parameters,

\(ME_1\) considers observations with a time-to-maturity that is \(< ME_{TTM}[1]\), \(ME_2\) considers observations with time-to-maturity \(< ME_{TTM}[2]\), and so on.

The assumption that measurement errors are independent and identical is clearly the simplest to estimate, but can be a restrictive assumption and not allow the commodity pricing model to fit to certain aspects of the term structure. Allowing multiple measurement errors through the ‘ME_TTM’ argument thus serves to ease this restriction, and allow the user to make the modeling of measurement error as simple or complex as desired for a given set of maturities.

4.4 Estimating N-Factor Commodity Pricing Models:

Parameter estimation of commodity pricing models is performed through the ‘NFCP_MLE’ function. The ‘NFCP_MLE’ function is a wrapper of the ‘NFCP_Kalman_filter’ function to the ‘genoud’ function of the ‘rgenoud’ package. The ‘genoud’ function uses genetic algorithms to solve optimization problems such as MLE (Mebane and Sekhon, 2011). All arguments of the ‘genoud’ function can be passed through the ‘NFCP_MLE’ function and can have a large impact on the estimated parameters of a commodity pricing model. Increasing the population size in particular can have a large impact on maximum likelihood estimates at the cost of increasing the computation time of the estimation process, with population sizes of up to 5-10,000 potentially increasing the performance of MLE for estimating complex commodity pricing models.

The following example presents an estimation of a one-factor geometric Brownian motion model to the Schwartz and Smith (2000) crude oil dataset. There are five futures prices observed at each discrete time point, allowing us to consider a model with between 1-5 measurement errors. When only one measurement error (i.e., ‘N_ME’ = 1) is considered, the measurement error is assumed independent and identical. When five measurement errors are considered (i.e., ‘N_ME’ = 5), the measurement error is assumed independent and unique. In this example, three measurement errors are considered. The first measurement error considers contracts with maturities of less than six months (i.e., ME_TTM[1] = 0.5). This corresponds to the first two observations. The second measurement error considers observations with maturities between 6-12 months (the third observation), and the final measurement error considers observations with maturities between 12-18 months (observations four and five). This function call can take up to one-minute to execute:

# Estimate a GBM model:
SS_oil_1F <- NFCP_MLE(
      ## Arguments
      log_futures = log(SS_oil$stitched_futures),
      dt = SS_oil$dt,
      futures_TTM= SS_oil$stitched_TTM,
      N_ME = 3,
      ME_TTM = c(0.5, 1, 1.5),
      N_factors = 1,
      GBM = TRUE,
      cluster = NULL,
      Richardsons_extrapolation = TRUE,
      ## Genoud arguments:
      pop.size = 1000, optim.method = "L-BFGS-B", print.level = 0, hessian = TRUE,
      max.generations = 100, solution.tolerance = 0.1)
#> ----------------------------------------------------------------
#> 1 Factor model: first factor is a GBM
#> Risk Neutral SDE: 
#> log(S_t) = sum(x_t)
#> Where: 
#> d x1_t    = mu_rn * dt  + sigma_1 * dW_1
#> ----------------------------------------------------------------
#> Term Structure Estimation: 
#> 6 unknown parameters 
#> 268 observations 
#> 5 futures contracts 
#> 3 measurement error maturity groupings

##Print results:
print(round(cbind(`Estimated Parameter` = SS_oil_1F$estimated_parameters, 
                  `Standard Error` = SS_oil_1F$standard_errors),4))
#>         Estimated Parameter Standard Error
#> mu                  -0.0200         0.0799
#> mu_rn               -0.0181         0.0023
#> sigma_1              0.1794         0.0088
#> ME_1                 0.0846         0.0026
#> ME_2                 0.0231         0.0011
#> ME_3                 0.0088         0.0004

4.5 N-Factor Model Comparison:

Log-likelihood scores of different N-factor models can be directly compared (given that they’ve been estimated using the same dataset) in order to test whether additional factors have had a significant improvement to the ability of the model to describe the term structure of the commodity. Consider the crude oil data of Schwartz and Smith (2000):

The log-likelihood score of a GBM model estimated on this term structure data is:

print(c(`Log-Likelihood: One-Factor` = SS_oil_1F$MLE))
#> Log-Likelihood: One-Factor 
#>                    2570.75

The log-likelihood score of a two-factor model on this term structure data is:

print(c(`Log-Likelihood: Two-Factor` = SS_oil_2F$`Log-Likelihood`))
#> Log-Likelihood: Two-Factor 
#>                   4018.632

The GBM model can be obtained from the two-factor model by considering uncertainty in factor 1 only (i.e. \(x_{(2,0)} ,\lambda_2, \sigma_2\) are zero). Futhermore, the one-factor model considered only three measurement errors, whilst the two-factor model utilized five measurement errors. The relevant test statistic for this comparison is a chi-squared test with 5 degrees of freedom (3 factor 2 parameters and 2 ME parameters) with a 99th percentile of this distribution being 15.086. If the two models had been estimated using the same number of ME parameters (i.e., ‘N_ME’ = 5), the test statistic would instead be a chi-squared test with 3 degrees of freedom with a 99th percentile of 11.345 (Schwartz and Smith, 2000). In this particular case, the log-likelihood score of the two-factor model is much greater than that of the one-factor GBM, indicating that an additional factor has greatly improved the fit of the commodity pricing model. Similar statistical testing can be done for different N-factor models.

4.6 N-Factor Model Term Structure Fit:

The errors in the model Fit to the logarithm of futures prices were presented in table 3 of Schwartz and Smith (2000). These errors, along with the root mean squared (RMS) error is returned by the ‘NFCP_Kalman_filter’ function:

##Replicate Table 3 of Schwartz and Smith (2000):
print(round(SS_oil_2F[["Term Structure Fit"]],4))
#>                         F1      F5     F9 F13    F17
#> Mean Error          0.0068 -0.0004 0.0002   0 0.0001
#> Mean Absolute Error 0.0318  0.0034 0.0021   0 0.0029
#> SD Error            0.0424  0.0043 0.0027   0 0.0037
#> RMSE                0.0429  0.0043 0.0027   0 0.0037

The estimated error for contract “F13” in the two-factor model presented by Schwartz and Smith (2000) was estimated to be zero, meaning the commodity pricing model perfectly matched futures prices of the contract at each observation point. The two-factor model is therefore most accurate for forecasting medium-term futures prices. The N-factor model is able to perfectly match up to N futures contracts during parameter estimation if the white noise of observations is not assumed identical (Schwartz and Smith, 2000).

Another measure of robustness is to evaluate the mean squared error (Bias) and root mean squared error (RMSE) of a commodity pricing model to the entire term structure dataset, rather than to individual contracts. The following is similar to table 3 presented by Cortazar and Naranjo (2006):

CN_table3 <- matrix(nrow = 2, ncol = 2, dimnames = list(c("One-Factor","Two-Factor"), c("RMSE", "Bias")))
CN_table3[,"Bias"] <- c(SS_oil_1F$`Filtered Error`["Bias"], SS_oil_2F$`Filtered Error`["Bias"])
CN_table3[,"RMSE"] <- c(SS_oil_1F$`Filtered Error`["RMSE"], SS_oil_2F$`Filtered Error`["RMSE"])

print(round(CN_table3, 4))
#>              RMSE    Bias
#> One-Factor 0.0543 -0.0046
#> Two-Factor 0.0194  0.0013

The introduction of a second factor for this particular commodity pricing model has reduced total RMSE and bias to the observed term structure dataset.

4.7 Plot Contract Observation Error:

The error in contracts at each observation point can be plot to provide an insight into which contracts are estimated with the highest level of error, and how the model fit changed over the observation period. These plots can be directly compared to visualise model fit of different N-factor models.

We begin by considering the contract observation error of the one-factor estimated GBM model:

##One Factor
matplot(as.Date(rownames(SS_oil_1F$V)), SS_oil_1F$V, type = 'l',
xlab = "", ylab = "Observation Error",
main = "Contract Observation Error: One-Factor Model"); legend("bottomright", 

matplot(as.Date(rownames(SS_oil_2F$V)), SS_oil_2F$V, type = 'l',
xlab = "", ylab = "Observation Error", ylim = c(-0.3, 0.2),
main = "Contract Observation Error: Two-Factor Model"); legend("bottomright", 

Observation errors of the two-factor model were lower than the one-factor model in this case. For the two-factor model, observation error for the one month maturity contract is far higher than the other contracts, meaning the two-factor models ability to predict very short-term contracts is lower than middle and long term contracts.

Comparing this with the contract observation error of the estimated one-factor GBM model:

This is also shown by the root mean squared error (RMSE) of each contract:

matplot(cbind(SS_oil_1F$`Term Structure Fit`["RMSE",], SS_oil_2F$`Term Structure Fit`["RMSE",]), 
     type = 'l', main = "Root Mean Squared Error of Futures Contracts", 
     xlab = "Contract", ylab = "RMSE"); legend("right",c("One-Factor", "Two-Factor"),

The one-factor GBM model had the highest fit for the medium maturity contract, with higher errors for the very short- and long-term contracts, whilst the two-factor model had lower RMSE values overall, and less discrepancy between the RMSE values of the different contracts. These findings are consistent with those found by Aspinall et al. (2020) for the prices of emissions allowances in the European Union.

4.8 Plot filtered values:

The two-factor model can be interpreted as the decomposition of spot prices into short-term deviations (factor 2) and long-run equilibrium prices (factor 1). The estimated equilibrium and spot prices were presented in figure 4 of Schwartz and Smith (2000) and are replicated below:

##Replicate Figure 4 of Schwartz and Smith (2000):
SS_figure_4 <- cbind(`Equilibrium Price` =
                    `Spot Price` =
                     SS_oil_2F$Y[,"filtered Spot"])

matplot(as.Date(rownames(SS_figure_4)), SS_figure_4, type = 'l',
xlab = "", ylab = "Oil Price ($/bbl, WTI)", col = 1,
 main = "Estimated Spot and Equilibrium Prices for the Futures Data")

4.9 Plot Standard Deviation of State Variables:

Cortazar and Naranjo (2006) plot the time series of the standard deviation for state variables (Figures 2 & 3) for analysis of the behaviour of the state variables over the observation period. When using stitched, aggregate data (i.e., the time-to-maturity of observations are time-homogeneous) the Kalman filter and covariance of state variables quickly approaches a steady state distribution. Plotting the covariance of state variables of contract data, however, can provide insights and complement analysis into exogenous shocks, structural breaks and the volatility of the factors that influence a commodity over time.


plot(as.Date(rownames(SS_oil$contracts)), sqrt(SS_oil_2F_contracts$P_t[1,1,]), 
     type = 'l', xlab = "Date", ylab = "Std. Dev.", 
     main = "Time Series of the Std. Dev for State Variable 1")

A possible structural break at approximately the beginning of 1991 can be observed within the plot above. In the work of Cortazar and Naranjo (2006), structural breaks in time series data were tested by comparing the time-series covariance matrix of state variables for difference samples, and then computing a Wald test statistic.

4.10 Forecast Spot and Futures Prices:

Expected Spot and futures prices can be forecast under an N-factor model analytically through the ‘spot_price_forecast’ and ‘futures_price_forecast’ functions. ‘help(spot_price_forecast)’ and ‘help(futures_price_forecast)’ presents expressions of expected spot and futures prices respectively.

Forecasted spot and futures prices were presented in the work of Schwartz and Smith (2000) in figures 1 and 2 using parameters developed using Enron data.

##Figure 1 and 2 of Schwartz and Smith (2000) was developed using Enron data
##and an assumption that mu was approximately 3% p.a.:
Enron_values <- c(0.0300875, 0.0161, 0.014, 1.19, 0.115, 0.158, 0.189)
names(Enron_values) <- NFCP_parameters(2, TRUE, FALSE, 0, FALSE)

4.10.1 Spot Prices:

The work of Schwartz and Smith (2000) presented forecasts of spot prices developed using Enron data (figure 1):

## Replicate figure 1 of Schwartz and Smith (2000):

SS_expected_spot <- spot_price_forecast(x_0 = c(2.857, 0.119),
                                           t = seq(0,9,1/12),
                                           percentiles = c(0.1, 0.9))
##Factor one only:
equilibrium_theta <- Enron_values[!names(Enron_values) %in%
                                 c("kappa_2", "lambda_2", "sigma_2", "rho_1_2")]
SS_expected_equilibrium <- spot_price_forecast(x_0 = c(2.857, 0),
                                                  t = seq(0,9,1/12),
                                                  percentiles = c(0.1, 0.9))
SS_figure_1 <- cbind(SS_expected_spot, SS_expected_equilibrium)
matplot(seq(0,9,1/12), SS_figure_1, type = 'l', col = 1, lty = c(rep(1,3), rep(2,3)),
xlab = "Time (Years)", ylab = "Spot Price ($/bbl, WTI)",
main = "Probabilistic Forecasts for Spot and Equilibrium Prices")

An important consideration when forecasting spot prices using parameters estimated through MLE is that the parameter estimation process takes the assumption of risk-neutrality and thus the true process growth rate \(\mu\) is not estimated with a high level of precision. For example, the standard error of parameter \(\mu\) for the one-factor GBM estimated in section 4.3 was approximately 10%. This is further shown such that Schwartz and Smith (2000) did not use the estimated value of \(\mu\) to forecast spot prices, opting to set it at a value “that is more representative of investor expectations” (Schwartz and Smith (2000)).

4.10.2 Futures Prices

Risk-neutral parameters are directly observed in the parameter estimation process and are therefore estimated with a higher level of precision.

Futures prices of Enron data was presented in the work of Schwartz and Smith (2000) in figure 2:

## Replicate Figure 2 of Schwartz and Smith (2000):

#Forecast expected spot prices under the "True" stochastic process:
SS_expected_spot <- spot_price_forecast(x_0 = c(2.857, 0.119),
                                        parameters = Enron_values,
                                        t = seq(0,9,1/12),
                                        percentiles = c(0.1, 0.9))
#Forecast expected futures prices under the Risk-Neutral stochastic process:
SS_futures_curve <- futures_price_forecast(x_0 = c(2.857, 0.119),
                                         parameters = Enron_values,
                                         futures_TTM = seq(0,9,1/12))
SS_figure_2 <- cbind(SS_expected_spot[,2], SS_futures_curve)
matplot(seq(0,9,1/12), log(SS_figure_2), type = 'l', col = 1, 
        xlab = "Time (Years)", ylab = "Log(Price)", 
        main = "Futures Prices and Expected Spot Prices")

4.10.3 Plot the Futures Curve:

The model fit to the observed term structure on given dates can be used as a measure of robustness of N-factor commodity pricing models. This also allows for analysis to how the model fits under circumstances of strong backwardation or contango, exogenous shocks, etc.

The futures curve can be forecast on a particular date using the estimate state vector and compared to the observed futures prices at that time. For example, the fit of the estimated one-factor GBM model and the two-factor model of Schwartz and Smith (2000) can be plotted against observed oil futures prices on the final observation date:

## Maximum Observed Maturity:
max_maturity <- max(tail(SS_oil$contract_maturities,1), na.rm = TRUE)

##Estimated Futures Prices:

### One Factor (GBM):
oil_TS_1F <- futures_price_forecast(x_0 = SS_oil_1F$x_t,
                                         parameters = SS_oil_1F$estimated_parameters,
                                         futures_TTM = seq(0,max_maturity,1/12))

### Two Factor:
oil_TS_2F <- futures_price_forecast(x_0 = SS_oil_2F$x_t,
                                         parameters = SS_oil$two_factor,
                                         futures_TTM = seq(0,max_maturity,1/12))

matplot(seq(0,max_maturity,1/12), cbind(oil_TS_1F, oil_TS_2F), type = 'l', 
        xlab = "Maturity (Years)", ylab = "Futures Price ($)", 
        main = "Estimated and observed oil futures prices on 1995-02-14"); 
points(tail(SS_oil$contract_maturities,1), tail(SS_oil$contracts,1))
legend("bottomleft", c("One-factor", "Two-Factor", "Observed"),

The additional flexibility of the two-factor model better fits the observed term structure at this particular observation date, particularly because the one-factor GBM model is linear in forecasts, which does not describe the observed behavior of futures prices. Comparing the models ability across different observation dates and when the market is in different levels of backwardation, contango, etc. can be used as a measure of robustness and fit of commodity pricing models.

4.11 Plot Volatility Term Structure:

In the work of Cortazar and Naranjo (2006), the ability of an N-factor model to match the volatility term structure of futures returns is presented as a measure of robustness (figure 7). This robustness measure can be attained by calling the ‘TSfit_volatility’ function, but it is also returned by the ‘NFCP_Kalman_filter’ function when ‘verbose = TRUE’.

###Test the Volatility Term Structure Fit of the Schwartz-Smith Two-Factor Oil Model:
V_TSFit <- TSfit_volatility(
 parameters = SS_oil$two_factor,
 futures = SS_oil$stitched_futures,
 futures_TTM = SS_oil$stitched_TTM,
 dt = SS_oil$dt)

##Plot Results:
matplot(V_TSFit["Maturity",], cbind(SS_oil_1F$`Term Structure Volatility Fit`["Theoretical Volatility",], 
                                    V_TSFit["Theoretical Volatility",]), type = 'l',
xlab = "Maturity (Years)", ylab = "Volatility (%)", 
ylim = c(0, 0.5), main = "Volatility Term Structure of Futures Returns"); points(
V_TSFit["Maturity",], V_TSFit["Empirical Volatility",]); legend("bottomleft", 
                  c("Empirical", "One-Factor", "Two-Factor"),col=0:2,cex=0.8,fill=0:2)

This finding shows that the two-factor model tends to slightly underestimate the volatility term structure of futures returns of this dataset. The two-factor model has a better fit to the volatility term structure over the one-factor GBM model, however, as it considers volatility to be a depreciating function with respect to the time to maturity of a futures contract (i.e., the Samuelson’s effect). Additional factors could possibly have a better fit to this robustness measure. A larger number of factors provides the model with more flexibility in adjusting first and second moments simultaneously, possibly explaining the ability for models with more factors to better describe the volatility term structure of futures returns.

4.12 Simulate Spot and Futures Prices:

Monte Carlo simulation can be used to numerically simulate future price paths of spot and futures prices under an N-factor model through the ‘spot_price_simulate’ and ‘futures_price_simulate’ functions. Valuation of American options under an N-factor model generally requires Monte Carlo methods due to the multidimensional structure of N-factor models. The simulation of future price paths is therefore of great interest to those evaluating American options, investment under commodity price uncertainty, and Real Options Analysis. The solutions of simulating spot and futures prices are presented withing ‘help(spot_price_simulate)’ and ‘help(futures_price_simulate)’.

4.12.1 Spot Prices:

Spot prices are simulated using the ‘mvrnorm’ function of the ‘MASS’ package to quickly and efficiently estimate futures shocks of price paths. 100 simulations with a forecasting horizon of 1 year and a monthly (ie. 1/12) discrete time step can be computed through the following:

##100 antithetic simulations of one year of monthly observations
simulated_spot_prices <- spot_price_simulate(
 x_0 = SS_oil_2F$x_t,
 parameters = SS_oil$two_factor,
 t = 1,
 dt = 1/12,
 N_simulations = 1e3,
 antithetic = TRUE,
 verbose = TRUE)

##Plot Price Paths:
matplot(seq(0,1,1/12), simulated_spot_prices$spot_prices, type = 'l', 
        xlab = "Forecasting Horizon (Years)", 
        ylab = "Spot Price ($/bbl, WTI)", main = "Simulated Crude Oil prices")

Spot prices are simulated using antithetic price paths by default, a simple and common variance reduction technique:

##Not Run - Plot Antithetic Price Pairs:
matplot(seq(0,1,1/12), simulated_spot_prices$spot_prices[,1:2], type = 'l', 
        xlab = "Forecasting Horizon (Years)", 
        ylab = "Spot Price ($/bbl, WTI)", 
        main = "Simulated Crude Oil Antithetic Price Pair")

Confidence intervals of simulations can also be visualized:

##Plot 95% Prediction interval:
prediction_interval <-$spot_prices, 1,
                       FUN = function(x) stats::quantile(x, probs = c(0.025, 0.975))),
                       Mean = rowMeans(simulated_spot_prices$spot_prices))
matplot(seq(0,1,1/12), t(prediction_interval), type = 'l', col = c(2,2,1),
lwd = 2, lty = c(2,2,1), xlab = "Forecasting Horizon (Years)", 
ylab = "Spot Price ($/bbl, WTI)", main = "Simulated Crude Oil 95% Confidence Interval")

4.12.2 Futures Prices:

Aggregated, stitched futures data as well as individual contracts are able to be simulated through the ‘futures_price_simulate’ function:

## Simulate Crude Oil Contract Prices under a Two-Factor model

simulated_contracts <- futures_price_simulate(x_0 = c(log(SS_oil$spot[1,1]), 0),
                                            parameters = SS_oil_2F_parameters,
                                            dt = SS_oil$dt,
                                            N_obs = nrow(SS_oil$contracts),
                                            futures_TTM = SS_oil$contract_maturities)

##Not Run - plot Simulated prices:
matplot(as.Date(rownames(SS_oil$contracts)), simulated_contracts$futures_prices, 
        type = 'l', ylab = "Futures Price ($/bbl, WTI)", xlab = "Observations", 
        main = "Simulated Futures Contracts")

Simulating futures prices can be useful for various analytic purposes, including the ability for the parameter estimation procedure to find the ‘true’ parameters of the spot price process that futures prices were simulated from.

4.13 Option Valuation:

The primary purpose of commodity pricing models is for the valuation and hedging of commodity contingent claims (i.e. derivative products on the commodity). Under the N-factor framework, European call and put options can be valued analytically through the Black and Scholes stock-option model applied to the N-factor framework. Schwartz and Smith (2000) first presented analytic European option pricing under a two-factor model. The value of American put options can also be approximated numerically under an N-factor framework through the least-squares Monte-Carlo (LSM) simulation approach. The LSM simulation method is an option valuation method first presented by Longstaff and Schwartz (2001) that approximates the value of American options by using “least squares to (directly) estimate the conditional expected payoff to the option holder from continuation” (Longstaff and Schwartz, 2001).

The following section presents the valuation of European and American put options under the N-factor framework, considering and comparing the value of options under both a one-factor and two-factor commodity pricing model.

4.13.1 European Call and Put Options:

European call and put options can be valued analytically under an N-factor model through the ‘European_option_value’ function. Expressions of the value of European call and puts under an N-factor model are presented in ‘help(European_option_value)’.

The final filtered values of the state variables is generally used to forecast future prices and value European options. These can be obtained by the ‘x_t’ object returned within the list object of ‘NFCP_Kalman_filter’ function when argument ‘verbose = TRUE’.

Consider a European put option on the spot price of crude oil, with option expiry in 1 year and a strike price of $20/bbl. Assume the risk-free interest rate is 5%. The calculated option value under the one-factor model estimated in section 4.4 and the two-factor model of Schwartz and Smith (2000) are:

## One-Factor European put option value: 
European_oil_1F <- European_option_value(x_0 = SS_oil_1F$x_t,
                                         parameters = SS_oil_1F$estimated_parameters,
                                         futures_maturity = 1,
                                         option_maturity  = 1,
                                         K = 20,
                                         r = 0.05,
                                         call = FALSE)

## Two-Factor European put option value: 
European_oil_2F <- European_option_value(x_0 = SS_oil_2F$x_t,
                                         parameters = SS_oil$two_factor,
                                         futures_maturity = 1,
                                         option_maturity = 1,
                                         K = 20,
                                         r = 0.05,
                                         call = FALSE)
## Print results:
print(round(c("One Factor" = European_oil_1F, "Two Factor" = European_oil_2F),3))
#> One Factor Two Factor 
#>      2.604      3.015

Option values under different stochastic modeling assumptions and number of factors can vary greatly. When argument ‘verbose = TRUE’, the ‘European_option_value’ function returns a list object providing details of the sensitivity in option price to the different stochastic parameters of the N-factor model as well as numeric approximations of the most common ‘Greeks’ considered in European option pricing.

4.13.2 American Put Options:

The valuation of American put options under the N-factor framework is available through the ‘American_option_value’ function. ‘American_option_value’ is a wrapper to the LSM simulation implementation ‘LSM.AmericanOption’ of the ‘LSMRealOptions’ package (Aspinall et al., 2021). LSM simulation is one of the most popular methods of approximating the value of American-style options due to its ability to efficiently and easily handle multi-dimensional valuating settings. This property of LSM simulation makes it the ideal solution method to value American-style options under the N-factor framework.

Longstaff and Schwartz (2001) state that as the conditional expectation of the continuation value belongs to a Hilbert space, it can be represented by a combination of orthogonal basis functions, where increasing the number of stochastic state variables (i.e. factors) increases the number of required basis functions exponentially. The convergence properties of the LSM simulation function state that the American option value calculated by the LSM simulation method converges to the true value as the number of simulations and basis functions increase (Longstaff and Schwartz, 2001; Clément, Lamberton and Protter, 2002). LSM simulation can be a computationally expensive process, with processing times increasing as a function of the number of simulations used and the degree of the orthogonal polynomial that is applied.

Similarly to the European option valuation presented in 4.13.1, Consider an American put option on the spot price of crude oil, with option expiry in 1 year and a strike price of $20/bbl. Assume the risk-free interest rate is 5%. Furthermore, consider 100,000 simulations (of which 50% are antithetic) and an exercise opportunity of 50 times per year. The calculated option value under the one-factor model estimated in section 4.4 and the two-factor model of Schwartz and Smith (2000) are:

## One-Factor American put option value: 
American_oil_1F <- American_option_value(x_0 = SS_oil_1F$x_t,
                      parameters = SS_oil_1F$estimated_parameters,
                      N_simulations = 1e5,
                      option_maturity = 1,
                      dt = 1/50,
                      K = 20,
                      r = 0.05)

## Two-Factor American put option value: 
American_oil_2F <- American_option_value(x_0 = SS_oil_2F$x_t,
                      parameters = SS_oil$two_factor,
                      N_simulations = 1e5,
                      option_maturity = 1,
                      dt = 1/50,
                      K = 20,
                      r = 0.05,
                      verbose = FALSE,
                      orthogonal = "Power",
                      degree = 2)

print(round(c("One Factor" = American_oil_1F, "Two Factor" = American_oil_2F),3))
#> One Factor Two Factor 
#>      2.634      3.703

The ability to exercise options at any point before maturity is a premium that American-style options hold over European-style options and therefore the value of American-style options are always greater than European-style counterparts.

4.13.3 Replicate table 1 of Longstaff and Schwartz (2001):

Longstaff and Schwartz (2001) present a table (table 1) of calculated vanilla American put option values under differing levels of strike prices, annual volatility of the underlying asset and option maturity. Corresponding analytic European and simulated American option values can be replicated through the ‘European_option_value’ and ‘American_option_value’ functions respectively.

Following the work of Longstaff and Schwartz (2001), consider an American-style put option on a share of stock that can be exercised at 50 discrete time points per year. The strike price of the put is $40, the short-term interest rate is .06. American options are valued using the LSM simulation solution method, using 100,000 simulations (of which 50% are antithetic) of price paths of the stock-price process, which is assumed to be a GBM stochastic process.

## Exercise opportunities per year:
dt <- 1/50
## strike price :
K <- 40
## short-term interest rate:
rf <- 0.06
## 100,000 simulations (50% antithetic):
N_simulations <- 1e5
## Stock price volatility (variable):
sigma <- rep(c(rep(0.2,2),rep(0.4,2)),5)
## Initial Stock price (variable):
S0 <- sort(rep(seq(36,44,2),4))
## Option maturity (variable):
TTM <- rep(1:2, 10)

LSM_output <- matrix(0, 20, 4, 
                     dimnames = list(NULL, c("LSM American", "(s.e)", 
                                             "Closed form European", 
                                             "Early exercise value")))

## Cycle through the rows of the table:
for(i in 1:20){
## Stochastic model assumption:
stock_parameters <- c(mu_rn = (rf - 0.5 * sigma[i]^2), sigma_1 = sigma[i])
## American option pricing through LSM Simulation
output <- American_option_value(x_0 = log(S0[i]),
                      parameters = stock_parameters,
                      N_simulations = N_simulations,
                      option_maturity = TTM[i],
                      dt = dt,
                      K = K,
                      r = rf,
                      orthogonal = "Laguerre",
                      degree = 3,
                      verbose = TRUE)

LSM_output[i,1] <- output$Value
LSM_output[i,2]  <- output$`Standard Error`

## European option pricing through the BSM PDE
LSM_output[i,3] <- European_option_value(x_0 = log(S0[i]),
                                         parameters = stock_parameters,
                                         futures_maturity = TTM[i],
                                         option_maturity = TTM[i],
                                         K = K,
                                         r = rf,
                                         call = FALSE)

## Early exercise value - the difference between American and European put values:
LSM_output[i,4] <- LSM_output[i,1] - LSM_output[i,3]


The first three degree’s of the “Laguerre” orthogonal polynomial are used in this example, as “three basis functions are sufficient to obtain effective convergence of the algorithm in this example” (Longstaff and Schwartz, 2001).

## Compile and print results:
LnS_table_1 <- = S0, sigma = sigma, T = TTM, LSM_output)
#>     S sigma T LSM American (s.e) Closed form European Early exercise value
#> 1  36   0.2 1        4.467 0.009                3.844                0.623
#> 2  36   0.2 2        4.827 0.011                3.763                1.064
#> 3  36   0.4 1        7.097 0.019                6.711                0.386
#> 4  36   0.4 2        8.503 0.022                7.700                0.803
#> 5  38   0.2 1        3.240 0.009                2.852                0.388
#> 6  38   0.2 2        3.740 0.011                2.991                0.749
#> 7  38   0.4 1        6.120 0.018                5.834                0.286
#> 8  38   0.4 2        7.660 0.022                6.979                0.681
#> 9  40   0.2 1        2.311 0.009                2.066                0.245
#> 10 40   0.2 2        2.878 0.011                2.356                0.522
#> 11 40   0.4 1        5.302 0.018                5.060                0.242
#> 12 40   0.4 2        6.912 0.022                6.326                0.586
#> 13 42   0.2 1        1.615 0.008                1.465                0.151
#> 14 42   0.2 2        2.196 0.010                1.841                0.355
#> 15 42   0.4 1        4.582 0.017                4.379                0.203
#> 16 42   0.4 2        6.233 0.021                5.736                0.497
#> 17 44   0.2 1        1.111 0.007                1.017                0.094
#> 18 44   0.2 2        1.683 0.009                1.429                0.254
#> 19 44   0.4 1        3.929 0.016                3.783                0.146
#> 20 44   0.4 2        5.611 0.020                5.202                0.409

The ‘LSMRealOptions’ provides a framework for the calculation of American-style option values, including the valuation of capital investment projects, through the LSM simulation method. ‘LSMRealOptions’ natively supports the simulated factors output by the ‘spot_price_simulate’ and is appropriate for other derivative pricing of commodity contingent claims.

5. Discussion:

This work has presented the ‘NFCP’ (N-Factor Commodity Pricing) package, providing worked examples of the functionality of the package, replicating the work of Schwartz and Smith (2000) throughout the study. The ‘NFCP’ package is a very versatile package that provides the ability to effectively build, analyse, forecast and simulate commodity pricing models. The ‘NFCP’ package allows for a wide versatility in the structure of commodity pricing models, allowing the spot price process of the commodity to be made up of the exponential sum of N arbitrary factors, with both random walk and mean-reverting characteristics available. The ‘NFCP’ package combines the fastest Kalman filter algorithm available with an effective optimization algorithm to ensure MLE returns optimal parameters for a given commodity pricing model and term structure data. The numeric difficulty of parameter estimation increases sharply with the number of underlying factors, N. The number of unknown parameters of an N-factor model (without considering the white noise of futures contracts) is equal to: \(0.5 (N^2 + 5N)\). Realistically, commodity pricing models with more than four factors are impractical due to the difficulty of parameter estimation through MLE.

The parameter estimation process is a bounded search under which the parameters can only take realistic values (i.e. volatility terms cannot be negative, etc.). Parameters estimated through MLE, however, have no in-built guarantee that they will make explicit economic sense. Output analysis of MLE is a crucial step in ensuring commodity pricing models are appropriate for analysis purposes. Factors with extremely large estimated correlation coefficients (say, greater than 90% or less than -90%) are generally poorly specified, with the commodity pricing model failing to differentiate between the dynamics that drive these two factors. Mean-reverting factors with extremely large risk-premium parameters ,\(\lambda_i\), are considered poorly specified models and generally include factors with very high correlation coefficients. Finally, reversion rates, \(\kappa_i\), that are extremely high or low are a final indicator of poorly specified models. A reversion rate that is too low is representative of a factor that does not revert over time, thus inducing a unit root in the spot price process. If this is encountered when GBM = FALSE, this may be the model naturally inducing a unit root, indicating that a degree of the spot price process is driven by a random walk. If this is encountered when GBM = TRUE, it’s not appropriate to utilize the commodity pricing model under the assumption that spot prices are driven by more than one unit-root process. When mean-reversion rates are extremely high, the state variable approaches a deterministic value given by the equilibrium of the process. Although log-likelihood scores for some commodity pricing models may indicate their increased ability to explain the term structure of a commodity, output analysis of parameters estimated through MLE is a crucial step to evaluating the applicability of estimated models for forecasting and analysis purposes.

The ‘NFCP’ package is a comprehensive and powerful framework for the development and analysis of commodity pricing models. It provides the ability to forecast and value derivative products using existing term structure data, and further provides the tools required to value American option products through simulation methods. The development and analysis of commodity pricing models is highly recommended to those who interact and trade in commodity markets, or those who have investment projects with the returns of these project contingent on the prices of commodities.


Gibson, R., and E. S. Schwartz, (1990). Stochastic Convenience Yield and the Pricing of Oil Contingent Claims. The Journal of Finance, 45(3), 959-976.

Schwartz, E. S., (1997). The Stochastic Behavior of Commodity Prices: Implications for Valuation and Hedging. The Journal of Finance, 52(3), 923-973.

Schwartz, E. S., (1998). Valuing long-term commodity assets. Financial Management, 3(2), 85-99.

Schwartz, E. S., and J. E. Smith, (2000). Short-Term Variations and Long-Term Dynamics in Commodity Prices. Manage. Sci., 46, 893-911.

Longstaff, F. A., and E. S. Schwartz, (2001). Valuing American options by simulation: a simple least-squares approach. The review of financial studies, 14(1), 113-147.

Sørensen, C., (2002). Modeling seasonality in agricultural commodity futures. Journal of Futures Markets: Futures, Options, and Other Derivative Products. 22(5), 393-426.

Clément, E., D. Lamberton, and P. Protter, (2002). An analysis of a least squares regression method for American option pricing. Finance and stochastics. 6.4, 449-471.

Cortazar, G., & E. S. Schwartz, (2003). Implementing a stochastic model for oil futures prices. Energy Economics, 25(3), 215-238.

Cortazar, G., and L. Naranjo, (2006). An N-factor Gaussian model of oil futures prices. Journal of Futures Markets: Futures, Options, and Other Derivative Products, 26(3), 243-268.

Mebane, W. R., and J. S. Sekhon, (2011). Genetic Optimization Using Derivatives: The rgenoud Package for R. Journal of Statistical Software, 42(11), 1-26. URL

Aspinall et al., (2020). Estimation of a term structure model of carbon prices through state space methods: The European Union emissions trading scheme. Accounting and Finance.