Introduction

ppgmmga is an R package implementing a projection pursuit algorithm that uses finite Gaussian mixture models for density estimation and genetic algorithms to maximise a negentropy index (PPGMMGA; Scrucca and Serafini, 2019). The algorithm provides a way to visualise high-dimensional data in a lower-dimensional space, with particular emphasis on revealing clustering structure.
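As a rough illustration of the index being maximised: negentropy is commonly defined as the entropy of a Gaussian with the same variance minus the entropy of the density of interest, so it is zero for a Gaussian and positive otherwise. The sketch below estimates it by Monte Carlo for a one-dimensional, two-component Gaussian mixture using base R only. The mixture parameters and all variable names are made up for illustration and are not part of ppgmmga, and the package's own approximations (such as the UT method reported in the summaries) are not reproduced here.

```r
# Hypothetical 1D two-component Gaussian mixture (illustrative values only)
pro  <- c(0.5, 0.5)   # mixing proportions
mu   <- c(-2, 2)      # component means
sd_k <- c(1, 1)       # component standard deviations

# Draw a Monte Carlo sample from the mixture
set.seed(1)
n <- 1e5
k <- sample(seq_along(pro), n, replace = TRUE, prob = pro)
z <- rnorm(n, mean = mu[k], sd = sd_k[k])

# Mixture density evaluated at a vector of points
dmix <- function(z) {
  rowSums(sapply(seq_along(pro),
                 function(j) pro[j] * dnorm(z, mu[j], sd_k[j])))
}

# Monte Carlo estimate of the mixture entropy: -E[log f(Z)]
H_mix <- -mean(log(dmix(z)))

# Entropy of a Gaussian with the same variance as the mixture
m <- sum(pro * mu)
v <- sum(pro * (sd_k^2 + mu^2)) - m^2
H_gauss <- 0.5 * log(2 * pi * exp(1) * v)

# Negentropy: larger values indicate a stronger departure from Gaussianity
negentropy <- H_gauss - H_mix
negentropy
```

Moving the two component means closer together should shrink the estimate towards zero, which is the sense in which a high-negentropy projection is one that shows clustering structure.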

library(ppgmmga)

Banknote data

library(mclust)
data("banknote")
X <- banknote[,-1]
Class <- banknote$Status
table(Class)
## Class
## counterfeit     genuine
##         100         100
clPairs(X, classification = Class,
        symbols = ppgmmga.options("classPlotSymbols"),
        colors = ppgmmga.options("classPlotColors"))

1-dimensional PPGMMGA

PP1D <- ppgmmga(data = X, d = 1, seed = 1)
PP1D
## Call:
## ppgmmga(data = X, d = 1, seed = 1)
##
## 'ppgmmga' object containing:
## [1] "data"       "d"          "approx"     "GMM"        "GA"
## [6] "Negentropy" "basis"      "Z"
summary(PP1D)
## ── ppgmmga ─────────────────────────────
##
## Data dimensions               = 200 x 6
## Data transformation           = center & scale
## Projection subspace dimension = 1
## GMM density estimate          = (VEE,4)
## Negentropy approximation      = UT
## GA optimal negentropy         = 0.6345935
## GA encoded basis solution:
##            x1       x2       x3       x4       x5
## [1,] 3.268902 2.373044 1.051365 0.313128 0.531718
##
## Estimated projection basis:
##                 PP1
## Length   -0.0119653
## Left     -0.0934775
## Right     0.1602105
## Bottom    0.5740698
## Top       0.3450346
## Diagonal -0.7189203
##
## Monte Carlo Negentropy approximation check:
##                            UT
## Approx Negentropy 0.634593544
## MC Negentropy     0.633614256
## MC se             0.002249545
## Relative accuracy 1.001545559
plot(PP1D)
plot(PP1D, class = Class)

2-dimensional PPGMMGA

PP2D <- ppgmmga(data = X, d = 2, seed = 1)
summary(PP2D)
## ── ppgmmga ─────────────────────────────
##
## Data dimensions               = 200 x 6
## Data transformation           = center & scale
## Projection subspace dimension = 2
## GMM density estimate          = (VEE,4)
## Negentropy approximation      = UT
## GA optimal negentropy         = 1.13624
## GA encoded basis solution:
##            x1       x2       x3       x4      x5      x6      x7      x8
## [1,] 2.268667 2.929821 1.061407 1.084929 0.30443 3.85462 0.98329 1.11377
##            x9      x10
## [1,] 0.167174 1.668403
##
## Estimated projection basis:
##                 PP1        PP2
## Length   -0.0372687 -0.0718319
## Left      0.0312555 -0.1198116
## Right    -0.1548079  0.0630092
## Bottom   -0.0856931  0.8639049
## Top      -0.1024990  0.4603727
## Diagonal  0.9776601  0.1350576
##
## Monte Carlo Negentropy approximation check:
##                            UT
## Approx Negentropy 1.136240194
## MC Negentropy     1.137260367
## MC se             0.003527379
## Relative accuracy 0.999102956
summary(PP2D$GMM)
## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust VEE (ellipsoidal, equal shape and orientation) model with 4 components:
##
##  log-likelihood   n df       BIC       ICL
##       -1191.595 200 51 -2653.405 -2666.898
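The columns of the estimated projection basis printed by summary(PP2D) can be checked for orthonormality directly from the printed values. The snippet below is a small sanity check in base R. The matrix B is typed in by hand from the output above, so it is accurate only to the printed digits, and the names B, G and max_dev are illustrative rather than part of the package.

```r
# 2-dimensional basis copied from the summary(PP2D) output
B <- cbind(
  PP1 = c(-0.0372687,  0.0312555, -0.1548079, -0.0856931, -0.1024990, 0.9776601),
  PP2 = c(-0.0718319, -0.1198116,  0.0630092,  0.8639049,  0.4603727, 0.1350576)
)
rownames(B) <- c("Length", "Left", "Right", "Bottom", "Top", "Diagonal")

# Gram matrix t(B) %*% B: the identity if the columns are orthonormal
G <- crossprod(B)
round(G, 4)

# Largest absolute deviation from the 2 x 2 identity matrix
max_dev <- max(abs(G - diag(2)))
max_dev
```

A deviation at the level of the printed rounding indicates that the two projection directions have unit length and are mutually orthogonal.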
plot(PP2D$GA)
plot(PP2D)
plot(PP2D, class = Class, drawAxis = FALSE)

3-dimensional PPGMMGA

PP3D <- ppgmmga(data = X, d = 3,
                center = TRUE, scale = FALSE,
                gatype = "gaisl",
                options = ppgmmga.options(numIslands = 2),
                seed = 1)
summary(PP3D)
## ── ppgmmga ─────────────────────────────
##
## Data dimensions               = 200 x 6
## Data transformation           = center
## Projection subspace dimension = 3
## GMM density estimate          = (VVE,3)
## Negentropy approximation      = UT
## GA optimal negentropy         = 1.16915
## GA encoded basis solution:
##            x1      x2       x3       x4       x5       x6       x7       x8
## [1,] 4.274545 2.47064 1.055677 1.022896 0.851247 4.924235 1.982288 2.039161
##            x9      x10  ...      x14      x15
## [1,] 1.939208 2.210582      1.548995 2.489197
##
## Estimated projection basis:
##                 PP1        PP2        PP3
## Length   -0.3145939  0.5612330 -0.5201907
## Left     -0.1472768 -0.1498109 -0.3297848
## Right     0.3043823  0.5008715 -0.3739875
## Bottom    0.2818318  0.3353769  0.4238383
## Top       0.3062895  0.4589957  0.3562206
## Diagonal -0.7832300  0.2975690  0.4174266
##
## Monte Carlo Negentropy approximation check:
##                           UT
## Approx Negentropy 1.16914962
## MC Negentropy     1.17493505
## MC se             0.00430878
## Relative accuracy 0.99507596
plot(PP3D$GA)

plot(PP3D)

plot(PP3D, class = Class)

plot(PP3D, dim = c(1,2))

plot(PP3D, dim = c(1,3), class = Class)

# A rotating 3D plot can be obtained using
if(!require("msir")) install.packages("msir")
msir::spinplot(PP3D$Z, markby = Class,
               pch.points = c(20,17),
               col.points = ppgmmga.options("classPlotColors")[1:2])
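The "Relative accuracy" row in each Monte Carlo check above appears to be the ratio of the approximate negentropy to the Monte Carlo estimate. This reading is inferred from the printed numbers, not taken from the package documentation. The base R snippet below recomputes the ratio from the values reported for d = 1, 2 and 3; the data frame and its column names are illustrative.

```r
# Values copied from the three summary() outputs
checks <- data.frame(
  d       = c(1, 2, 3),
  approx  = c(0.634593544, 1.136240194, 1.16914962),  # "Approx Negentropy"
  mc      = c(0.633614256, 1.137260367, 1.17493505),  # "MC Negentropy"
  printed = c(1.001545559, 0.999102956, 0.99507596)   # "Relative accuracy"
)

# Recompute the ratio and compare it with the printed value
checks$ratio    <- checks$approx / checks$mc
checks$abs_diff <- abs(checks$ratio - checks$printed)
print(checks, digits = 9)
```

A ratio close to 1 suggests that the UT approximation used during optimisation agrees well with the more expensive Monte Carlo estimate.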

References

Scrucca L, Serafini A (2019). “Projection pursuit based on Gaussian mixtures and evolutionary algorithms.” Journal of Computational and Graphical Statistics, 28(4), 847–860. https://doi.org/10.1080/10618600.2019.1598871.

