The ‘gfiExtremes’ package offers two main functions:
gfigpd2. Each of them generates simulations from the fiducial distribution of a model involving the generalized Pareto distribution. The difference is that the threshold \(\mu\) of the generalized Pareto distribution is assumed to be known by
gfigpd2 estimates this threshold.
The algorithms are implemented in C++. They are translated from the original R code written by the authors of the reference paper.
The package allows to run some MCMC chains in parallel. In the examples below, as well as in the examples of the package documentation, I set
nthreads = 2L because of CRAN restrictions: CRAN does not allow more than two R processes in parallel and then it would not accept the package if a higher number of threads were set.
When the threshold \(\mu\) is known, it is assumed that the data are independent realizations of a random variable \(X\) which follows a generalized Pareto distribution \(GP(\mu,\gamma,\sigma)\) conditionally to \(X \geqslant \mu\). On the event \(X < \mu\), no distributional assumption is made.
Then the algorithm performed by the
gfigpd1 function produces some simulations of the fiducial distributions of \(\gamma\), \(\sigma\) and of the \(100\beta\%\)-quantiles of \(X\) at the requested values of \(\beta\). These are MCMC chains.
For example, assume that \(X\) follows the \(GP(\mu,\gamma,\sigma)\) distribution (so \(X < \mu\) never happens).
The output of
gfigpd1 is a R object ready for analysis with the ‘coda’ package. In particular, it has a
summary(gf1) #> #> Iterations = 2001:61995 #> Thinning interval = 6 #> Number of chains = 4 #> Sample size per chain = 10000 #> #> 1. Empirical mean and standard deviation for each variable, #> plus standard error of the mean: #> #> Mean SD Naive SE Time-series SE #> gamma 0.3132 0.1017 0.0005086 0.0009532 #> sigma 1.9692 0.2351 0.0011757 0.0022424 #> beta1 30.9276 5.4264 0.0271322 0.0467451 #> beta2 38.0272 9.0772 0.0453861 0.0799586 #> beta3 63.5831 27.2337 0.1361684 0.2454226 #> #> 2. Quantiles for each variable: #> #> 2.5% 25% 50% 75% 97.5% #> gamma 0.1347 0.2415 0.3055 0.3769 0.5319 #> sigma 1.5371 1.8051 1.9585 2.1210 2.4602 #> beta1 23.9020 27.2199 29.7925 33.3429 44.5348 #> beta2 27.0268 31.9434 35.9689 41.6726 61.1854 #> beta3 35.3511 46.5221 56.5144 72.0191 134.1137 # compare with the true quantiles: qgpareto(c(0.99, 0.995, 0.999), mu = 10, gamma = 0.3, sigma = 2) #>  29.87381 36.00849 56.28855
Every parameter is very well estimated by the median of its fiducial distribution.
We can get the shortest fiducial confidence intervals with the ‘coda’ function
When the threshold \(\mu\) is unknown, it is also assumed that the data are independent realizations of a random variable \(X\) which follows a generalized Pareto distribution \(GP(\mu,\gamma,\sigma)\) conditionally to \(X \geqslant \mu\), but there are additional assumptions.
These additional assumptions have no meaningful interpretation but this is not important in order to estimate the quantiles of \(X\): it is possible that the parameters \(\gamma\) and \(\sigma\) cannot be estimated (it is always possible if \(X\) strictly follows the unrealistic assumptions of the model) but \(\mu\) can always be estimated and the fiducial distributions of the quantiles are available.
Let’s assume for example that \(X\) follows a log-normal distribution:
set.seed(666L) X <- rlnorm(400L, meanlog = 1) gf2 <- gfigpd2( X, beta = c(0.99, 0.995, 0.999), iter = 10000L, burnin = 10000L, nchains = 4L, nthreads = 2L ) summary(gf2) #> #> Iterations = 10001:69995 #> Thinning interval = 6 #> Number of chains = 4 #> Sample size per chain = 10000 #> #> 1. Empirical mean and standard deviation for each variable, #> plus standard error of the mean: #> #> Mean SD Naive SE Time-series SE #> beta1 25.64 4.025 0.02013 0.1160 #> beta2 32.77 6.471 0.03235 0.1970 #> beta3 55.60 17.167 0.08584 0.5701 #> #> 2. Quantiles for each variable: #> #> 2.5% 25% 50% 75% 97.5% #> beta1 19.73 22.75 24.97 27.77 35.44 #> beta2 23.69 28.19 31.59 35.96 48.99 #> beta3 34.07 43.71 51.84 62.69 100.44 # compare with the true quantiles: qlnorm(c(0.99, 0.995, 0.999), meanlog = 1) #>  27.83649 35.72423 59.75377
As you can see, the inference on the quantiles is good.