1. Independent Component Analysis (ICA)

Koki Tsuyuzaki

Department of Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University
k.t.the-answer@hotmail.co.jp

2023-04-28

Introduction

In this vignette, we consider approximating a data matrix as a product of multiple low-rank matrices (a.k.a., factor matrices).

Test data is available from toyModel.

library("iTensor")
data1 <- iTensor::toyModel("ICA_Type1")
data2 <- iTensor::toyModel("ICA_Type2")
data3 <- iTensor::toyModel("ICA_Type3")
data4 <- iTensor::toyModel("ICA_Type4")
data5 <- iTensor::toyModel("ICA_Type5")
dim(data1$X_observed)
## [1] 500   3
dim(data2$X_observed)
## [1] 500   3
dim(data3$X_observed)
## [1] 500   3
dim(data4$X_observed)
## [1] 480   3
dim(data5$gene) # N < P systems
## [1]   64 3116

Summary of these data is as follows:

  1. ICA with time-independent sub-Gaussian data
  2. ICA with time-independent super-Gaussian data
  3. ICA with data mixed with signals having no time dependence and different kurtosis
  4. ICA with time-dependent data
  5. IPCA in N < P systems

You will see that the stuctures of these data as follows.

pairs(data1$X_observed, main="data1")

pairs(data2$X_observed, main="data2")

pairs(data3$X_observed, main="data3")

pairs(data4$X_observed, main="data4")

dim(data5$gene)
## [1]   64 3116

To perform and visualize all the ICA at once, here we write some functions as follows.

allICA <- function(X, J){
    # Classic ICA Methods
    out.FastICA <- ICA(X=X, J=J, algorithm="FastICA")
    out.InfoMax <- ICA(X=X, J=J, algorithm="InfoMax")
    out.ExtInfoMax <- ICA(X=X, J=J, algorithm="ExtInfoMax")
    # Modern ICA Method
    out.JADE <- ICA2(X=X, J=J, algorithm="JADE")
    # Output
    list(out.FastICA=out.FastICA, out.InfoMax=out.InfoMax,
      out.ExtInfoMax=out.ExtInfoMax, out.JADE=out.JADE)
}

CorrIndexAllICA <- function(S, out){
    tmp <- lapply(out, function(x, S){CorrIndex(cor(S, Re(x$S)))}, S=S)
    Name <- gsub("out.", "", names(tmp))
    Value <- round(unlist(tmp), 3)
    names(Value) <- NULL
    cbind(Name, Value)
    tmp <- as.data.frame(cbind(Name, Value))
    colnames(tmp) <- c("Method", "CorrIndex")
    knitr::kable(tmp)
}

Note that we select only four ICA algorithms for the demonstration but in iTensor 12 ICA algorithms are available. For the details, check ?ICA and ?ICA2.

1. ICA with time-independent sub-Gaussian data

In this data, according to the independence between estimated source signal, an upright square means that the calculation is successful and other cases such as a rhombus rotated 45 degrees from a square means that the calculation is fail.

set.seed(123456)
out1 <- allICA(X=data1$X_observed, J=3)

Here we use CorrIndex, which is a performance index to evaluate ICA results (Sobhani 2022). CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well (the closer to 0, the better the performance).

CorrIndexAllICA(data1$S_true, out1)
Method CorrIndex
FastICA 0.263
InfoMax 7.439
ExtInfoMax 0.263
JADE 0.221

We can see the details of source signals as follows.

pairs(out1$out.FastICA$S, main="FastICA")

pairs(out1$out.InfoMax$S, main="InfoMax")

pairs(out1$out.ExtInfoMax$S, main="ExtInfoMax")

pairs(Re(out1$out.JADE$S), main="JADE")

2. ICA with time-independent super-Gaussian data

In this data, if the outliers are distributed along the lines of x = 0 and y = 0, the calculation is successful and if they are of the cross-shape, it is a failure.

set.seed(123456)
out2 <- allICA(X=data2$X_observed, J=3)

CorrIndex imply that all the algorithms performed well.

CorrIndexAllICA(data2$S_true, out2)
Method CorrIndex
FastICA 0.268
InfoMax 0.322
ExtInfoMax 0.268
JADE 0.374

We can see the details of source signals as follows.

pairs(out2$out.FastICA$S, main="FastICA")

pairs(out2$out.InfoMax$S, main="InfoMax")

pairs(out2$out.ExtInfoMax$S, main="ExtInfoMax")

pairs(Re(out2$out.JADE$S), main="JADE")

3. ICA with data mixed with signals having no time dependence and different kurtosis

In this data, the uniform distribution should be extracted in such a way that it is not rhombic, and furthermore the normal distribution and the t-distribution with 1 degree of freedom should be extracted independently. Sometimes, it is difficult ot determine visually though.

set.seed(123456)
out3 <- allICA(X=data3$X_observed, J=3)

CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well.

CorrIndexAllICA(data3$S_true, out3)
Method CorrIndex
FastICA 0.23
InfoMax 3.799
ExtInfoMax 0.213
JADE 0.408

We can see the details of source signals as follows.

pairs(out3$out.FastICA$S, main="FastICA")

pairs(out3$out.InfoMax$S, main="InfoMax")

pairs(out3$out.ExtInfoMax$S, main="ExtInfoMax")

pairs(Re(out3$out.JADE$S), main="JADE")

4. ICA with time-dependent data

In this data, if the mesh pattern is reproduced, the calculation is successful.

set.seed(123456)
out4 <- allICA(X=data4$X_observed, J=3)

CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well.

CorrIndexAllICA(data4$S_true, out4)
Method CorrIndex
FastICA 0.081
InfoMax 8.026
ExtInfoMax 0.081
JADE 0.08

We can see the details of source signals as follows.

pairs(out4$out.FastICA$S, main="FastICA")

pairs(out4$out.InfoMax$S, main="InfoMax")

pairs(out4$out.ExtInfoMax$S, main="ExtInfoMax")

pairs(Re(out4$out.JADE$S), main="JADE")

5. IPCA in N < P systems

This kind of data is an thin and long matrix, which is a very common data structure in omics data. In iTensor, we re-implemented the IPCA function of mixOmics package.

library("mixOmics")
## Loading required package: MASS
## Loading required package: lattice
## Loading required package: ggplot2
## 
## Loaded mixOmics 6.24.0
## Thank you for using mixOmics!
## Tutorials: http://mixomics.org
## Bookdown vignette: https://mixomicsteam.github.io/Bookdown
## Questions, issues: Follow the prompts at http://mixomics.org/contact-us
## Cite us:  citation('mixOmics')
set.seed(123456)

# mixOmics's IPCA
res.ipca <- ipca(data5$gene, ncomp=3, mode="deflation")

# iTensor's IPCA
out5 <- ICA2(X=as.matrix(data5$gene), J=3, algorithm="IPCA")

We can see the details of source signals as follows. In this data, we can confirm that the IPCA of mixOmics and iTensor have the same results, except for the order and constant-fold relationships.

pairs(res.ipca$x, main="IPCA (mixOmics)", col=data5$treatment[,"Treatment.Group"], pch=16)

pairs(out5$S, main="IPCA (iTensor)", col=data5$treatment[,"Treatment.Group"], pch=16)

Session Information

## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux bookworm/sid
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so;  LAPACK version 3.11.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] mixOmics_6.24.0 ggplot2_3.4.2   lattice_0.21-8  MASS_7.3-59    
## [5] iTensor_1.0.2  
## 
## loaded via a namespace (and not attached):
##  [1] tidyr_1.3.0         sass_0.4.5          utf8_1.2.3         
##  [4] generics_0.1.3      stringi_1.7.12      digest_0.6.31      
##  [7] magrittr_2.0.3      evaluate_0.20       grid_4.3.0         
## [10] RColorBrewer_1.1-3  fastmap_1.1.1       plyr_1.8.8         
## [13] jsonlite_1.8.4      Matrix_1.5-4        ggrepel_0.9.3      
## [16] RSpectra_0.16-1     gridExtra_2.3       mgcv_1.8-42        
## [19] purrr_1.0.1         fansi_1.0.4         scales_1.2.1       
## [22] codetools_0.2-19    jquerylib_0.1.4     cli_3.6.1          
## [25] rlang_1.1.0         splines_4.3.0       munsell_0.5.0      
## [28] withr_2.5.0         cachem_1.0.7        yaml_2.3.7         
## [31] ellipse_0.4.5       tools_4.3.0         parallel_4.3.0     
## [34] reshape2_1.4.4      BiocParallel_1.34.0 einsum_0.1.0       
## [37] dplyr_1.1.2         colorspace_2.1-0    corpcor_1.6.10     
## [40] groupICA_0.1.1      vctrs_0.6.2         R6_2.5.1           
## [43] matrixStats_0.63.0  lifecycle_1.0.3     stringr_1.5.0      
## [46] pkgconfig_2.0.3     rTensor_1.4.8       geigen_2.3         
## [49] pillar_1.9.0        bslib_0.4.2         gtable_0.3.3       
## [52] jointDiag_0.4       glue_1.6.2          rARPACK_0.11-0     
## [55] Rcpp_1.0.10         highr_0.10          xfun_0.39          
## [58] tibble_3.2.1        tidyselect_1.2.0    knitr_1.42         
## [61] nlme_3.1-162        htmltools_0.5.5     igraph_1.4.2       
## [64] rmarkdown_2.21      compiler_4.3.0

References

Sobhani, E. et al. 2022. “CorrIndex: A Permutation Invariant Performance Index.” Signal Processing 195: 108457.