3. Non-negative Tensor Factorization (NTF and NTD)

Koki Tsuyuzaki

Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research
k.t.the-answer@hotmail.co.jp

Itoshi Nikaido

Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research

2023-06-22

Introduction

In this vignette, we consider approximating a non-negative tensor as the product of multiple non-negative low-rank matrices (a.k.a. factor matrices) and a core tensor.

Test data are available via toyModel. Here, we use two datasets to simulate different situations.

library("nnTensor")
X_CP <- toyModel("CP")
X_Tucker <- toyModel("Tucker")

In the first case, you will see four small blocks along the diagonal of the data tensor.

plotTensor3D(X_CP)

In the second case, you will see that there are six long blocks in the data tensor.

plotTensor3D(X_Tucker)

NTF

To decompose a non-negative tensor (\(\mathcal{X}\)), non-negative CP decomposition (a.k.a. non-negative tensor factorization; NTF (Cichocki 2007, 2009; Phan 2008b)) can be applied. NTF approximates \(\mathcal{X}\) (\(N \times M \times L\)) as the mode products of a core tensor \(\mathcal{S}\) (\(J \times J \times J\)) and factor matrices \(A_1\) (\(J \times N\)), \(A_2\) (\(J \times M\)), and \(A_3\) (\(J \times L\)).

\[ \mathcal{X} \approx \mathcal{S} \times_{1} A_1 \times_{2} A_2 \times_{3} A_3\ \mathrm{s.t.}\ \mathcal{S} \geq 0, A_{k} \geq 0\ (k=1 \ldots 3) \]

Note that \(\times_{k}\) is the mode-\(k\) product (Cichocki 2009) and that the core tensor \(\mathcal{S}\) has non-negative values only in its diagonal elements.
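Since the core tensor of NTF has non-zero values only on its diagonal, the model reduces to a weighted sum of \(J\) rank-1 tensors. The following is a minimal sketch of this reconstruction in base R; cp_reconstruct, s, A1, A2, and A3 are illustrative names matching the text, not part of the nnTensor API.

cp_reconstruct <- function(s, A1, A2, A3) {
  # Weighted sum of rank-1 outer products: sum_j s[j] * a1_j o a2_j o a3_j
  X <- array(0, dim=c(ncol(A1), ncol(A2), ncol(A3)))
  for (j in seq_along(s)) {
    X <- X + s[j] * outer(outer(A1[j, ], A2[j, ]), A3[j, ])
  }
  X
}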

NTF can be performed as follows:

set.seed(123456)
out_NTF_CP <- NTF(X_CP, algorithm="KL", rank=4)
str(out_NTF_CP, 2)
## List of 6
##  $ S            : num [1:4] 5563 8822 5364 5382
##  $ A            :List of 3
##   ..$ : num [1:4, 1:30] 2.22e-16 1.04e-01 2.22e-16 4.41e-01 2.22e-16 ...
##   ..$ : num [1:4, 1:30] 2.22e-16 1.05e-01 2.22e-16 4.51e-01 2.22e-16 ...
##   ..$ : num [1:4, 1:30] 6.17e-19 1.08e-01 1.36e-13 4.50e-01 3.99e-20 ...
##  $ RecError     : Named num [1:58] 1.00e-09 1.54e+06 1.33e+06 1.18e+05 2.50e+04 ...
##   ..- attr(*, "names")= chr [1:58] "offset" "1" "2" "3" ...
##  $ TrainRecError: Named num [1:58] 1.00e-09 1.54e+06 1.33e+06 1.18e+05 2.50e+04 ...
##   ..- attr(*, "names")= chr [1:58] "offset" "1" "2" "3" ...
##  $ TestRecError : Named num [1:58] 1e-09 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
##   ..- attr(*, "names")= chr [1:58] "offset" "1" "2" "3" ...
##  $ RelChange    : Named num [1:58] 1.00e-09 9.99e-01 1.63e-01 1.03e+01 3.71 ...
##   ..- attr(*, "names")= chr [1:58] "offset" "1" "2" "3" ...

As with other matrix factorization methods, the reconstruction error and relative change indicate that the optimization has converged.

layout(t(1:2))
plot(log10(out_NTF_CP$RecError[-1]), type="b", main="Reconstruction Error")
plot(log10(out_NTF_CP$RelChange[-1]), type="b", main="Relative Change")

By using recTensor, users can easily reconstruct the data from the core tensor and factor matrices as follows.

recX_CP <- recTensor(out_NTF_CP$S, out_NTF_CP$A, idx=1:3)
layout(t(1:2))
plotTensor3D(X_CP)
plotTensor3D(recX_CP)
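As a quick quantitative check, the relative Frobenius error between the data and the reconstruction can be computed. This is a sketch assuming the rTensor package is attached, which provides fnorm and arithmetic between Tensor objects.

library("rTensor")
# Relative Frobenius error; a value near zero indicates a good fit
fnorm(X_CP - recX_CP) / fnorm(X_CP)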

When applying NTF to the first dataset, we can see that the four blocks are properly reconstructed. Next, we apply NTF to the second dataset.

set.seed(123456)
out_NTF_Tucker <- NTF(X_Tucker, algorithm="KL", rank=4)
str(out_NTF_Tucker, 2)
## List of 6
##  $ S            : num [1:4] 2.09e+05 1.57e-15 1.57e-15 1.57e-15
##  $ A            :List of 3
##   ..$ : num [1:4, 1:50] 1.27e-01 2.22e-16 6.70e-01 3.65e-08 1.27e-01 ...
##   ..$ : num [1:4, 1:50] 1.91e-01 2.22e-16 2.22e-16 2.22e-16 1.93e-01 ...
##   ..$ : num [1:4, 1:50] 0.104 0.141 0.141 0.141 0.104 ...
##  $ RecError     : Named num [1:82] 1.00e-09 1.50e+06 2.19e+05 3.33e+05 4.63e+05 ...
##   ..- attr(*, "names")= chr [1:82] "offset" "1" "2" "3" ...
##  $ TrainRecError: Named num [1:82] 1.00e-09 1.50e+06 2.19e+05 3.33e+05 4.63e+05 ...
##   ..- attr(*, "names")= chr [1:82] "offset" "1" "2" "3" ...
##  $ TestRecError : Named num [1:82] 1e-09 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
##   ..- attr(*, "names")= chr [1:82] "offset" "1" "2" "3" ...
##  $ RelChange    : Named num [1:82] 1.00e-09 9.94e-01 5.83 3.41e-01 2.80e-01 ...
##   ..- attr(*, "names")= chr [1:82] "offset" "1" "2" "3" ...
layout(t(1:2))
plot(log10(out_NTF_Tucker$RecError[-1]), type="b", main="Reconstruction Error")
plot(log10(out_NTF_Tucker$RelChange[-1]), type="b", main="Relative Change")

Unlike with the first dataset, the reconstruction shows that NTF does not fit the second dataset well.

recX_Tucker <- recTensor(out_NTF_Tucker$S, out_NTF_Tucker$A, idx=1:3)
layout(t(1:2))
plotTensor3D(X_Tucker)
plotTensor3D(recX_Tucker)
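The same error check (again a sketch assuming rTensor is attached) makes the failure quantitative; the relative error here should be much larger than that of the first dataset.

library("rTensor")
# NTF cannot capture the off-diagonal block structure of X_Tucker
fnorm(X_Tucker - recX_Tucker) / fnorm(X_Tucker)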

NTD

Next, we introduce another type of non-negative tensor decomposition, non-negative Tucker decomposition (NTD (Kim 2017, 2008; Phan 2008a, 2011)). The differences from NTF are that different ranks can be specified for the factor matrices, such as \(A_1\) (\(J_1 \times N\)), \(A_2\) (\(J_2 \times M\)), and \(A_3\) (\(J_3 \times L\)), and that the core tensor can have non-negative values in its off-diagonal elements. This additional degree of freedom allows NTD to fit the second dataset well.
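For reference, the Tucker reconstruction can be written directly as three mode products using ttm (tensor-times-matrix) from rTensor. This is only a sketch of what recTensor computes; tucker_reconstruct is an illustrative name, and each factor matrix is transposed because it is stored as \(J_k \times\) dimension, as described above.

library("rTensor")
tucker_reconstruct <- function(S, A) {
  # S: core tensor (J1 x J2 x J3); A: list of factor matrices (J_k x dim)
  ttm(ttm(ttm(S, t(A[[1]]), m=1), t(A[[2]]), m=2), t(A[[3]]), m=3)
}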

set.seed(123456)
out_NTD <- NTD(X_Tucker, rank=c(3,4,5))
str(out_NTD, 2)
## List of 6
##  $ S            :Formal class 'Tensor' [package "rTensor"] with 3 slots
##  $ A            :List of 3
##   ..$ A1: num [1:3, 1:50] 1.10e-08 5.89e-08 3.98e-01 1.47e-08 5.94e-08 ...
##   ..$ A2: num [1:4, 1:50] 9.42e-05 4.74e-05 1.60e-03 4.46e-01 8.38e-06 ...
##   ..$ A3: num [1:5, 1:50] 0.000215 0.023118 0.000276 0.443 0.000275 ...
##  $ RecError     : Named num [1:101] 1.00e-09 5.43e+03 5.24e+03 5.17e+03 5.13e+03 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ TrainRecError: Named num [1:101] 1.00e-09 5.43e+03 5.24e+03 5.17e+03 5.13e+03 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ TestRecError : Named num [1:101] 1e-09 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
##  $ RelChange    : Named num [1:101] 1.00e-09 7.47e-02 3.61e-02 1.43e-02 7.75e-03 ...
##   ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
layout(t(1:2))
plot(out_NTD$RecError[-1], type="b", main="Reconstruction Error")
plot(out_NTD$RelChange[-1], type="b", main="Relative Change")

recX_Tucker2 <- recTensor(out_NTD$S, out_NTD$A, idx=1:3)
layout(t(1:2))
plotTensor3D(X_Tucker)
plotTensor3D(recX_Tucker2)

NTD-2

NTD-2 extracts factor matrices from two of the three modes of a non-negative third-order tensor. By specifying two values for the rank and modes arguments, NTD-2 can be performed as follows. For the mode not specified, an identity matrix is assigned instead (see the dimension check after the plots below).

set.seed(123456)
out_NTD2_1 <- NTD(X_Tucker, rank=c(3,4), modes=1:2)
out_NTD2_2 <- NTD(X_Tucker, rank=c(3,5), modes=c(1,3))
out_NTD2_3 <- NTD(X_Tucker, rank=c(4,5), modes=2:3)
layout(rbind(1:2, 3:4, 5:6))
plot(out_NTD2_1$RecError[-1], type="b", main="Reconstruction Error\nNTD-2 (mode=1:2)")
plot(out_NTD2_1$RelChange[-1], type="b", main="Relative Change\nNTD-2 (mode=1:2)")
plot(out_NTD2_2$RecError[-1], type="b", main="Reconstruction Error\nNTD-2 (mode=c(1,3))")
plot(out_NTD2_2$RelChange[-1], type="b", main="Relative Change\nNTD-2 (mode=c(1,3))")
plot(out_NTD2_3$RecError[-1], type="b", main="Reconstruction Error\nNTD-2 (mode=2:3)")
plot(out_NTD2_3$RelChange[-1], type="b", main="Relative Change\nNTD-2 (mode=2:3)")
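As a sketch of the identity-matrix behavior mentioned above, the core tensor of NTD-2 should keep the original size in the unfactorized mode (here, mode 3 of size 50 when modes=1:2).

dim(out_NTD2_1$S)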

recX_Tucker2_1 <- recTensor(out_NTD2_1$S, out_NTD2_1$A, idx=1:3)
recX_Tucker2_2 <- recTensor(out_NTD2_2$S, out_NTD2_2$A, idx=1:3)
recX_Tucker2_3 <- recTensor(out_NTD2_3$S, out_NTD2_3$A, idx=1:3)
layout(rbind(1:2, 3:4))
plotTensor3D(X_Tucker)
plotTensor3D(recX_Tucker2_1)
plotTensor3D(recX_Tucker2_2)
plotTensor3D(recX_Tucker2_3)

NTD-1

NTD-1 extracts a factor matrix from only one mode of a non-negative third-order tensor. By specifying a single value for the rank and modes arguments, NTD-1 can be performed as follows. For the two modes not specified, identity matrices are assigned instead (see the dimension check after the plots below).

set.seed(123456)
out_NTD1_1 <- NTD(X_Tucker, rank=3, modes=1)
out_NTD1_2 <- NTD(X_Tucker, rank=4, modes=2)
out_NTD1_3 <- NTD(X_Tucker, rank=5, modes=3)
layout(rbind(1:2, 3:4, 5:6))
plot(out_NTD1_1$RecError[-1], type="b", main="Reconstruction Error\nNTD-1 (mode=1)")
plot(out_NTD1_1$RelChange[-1], type="b", main="Relative Change\nNTD-1 (mode=1)")
plot(out_NTD1_2$RecError[-1], type="b", main="Reconstruction Error\nNTD-1 (mode=2)")
plot(out_NTD1_2$RelChange[-1], type="b", main="Relative Change\nNTD-1 (mode=2)")
plot(out_NTD1_3$RecError[-1], type="b", main="Reconstruction Error\nNTD-1 (mode=3)")
plot(out_NTD1_3$RelChange[-1], type="b", main="Relative Change\nNTD-1 (mode=3)")
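Likewise, a sketch of the core dimensions for NTD-1: only the factorized mode is compressed, so the core tensor of out_NTD1_1 should be 3 x 50 x 50.

dim(out_NTD1_1$S)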

recX_Tucker1_1 <- recTensor(out_NTD1_1$S, out_NTD1_1$A, idx=1:3)
recX_Tucker1_2 <- recTensor(out_NTD1_2$S, out_NTD1_2$A, idx=1:3)
recX_Tucker1_3 <- recTensor(out_NTD1_3$S, out_NTD1_3$A, idx=1:3)
layout(rbind(1:2, 3:4))
plotTensor3D(X_Tucker)
plotTensor3D(recX_Tucker1_1)
plotTensor3D(recX_Tucker1_2)
plotTensor3D(recX_Tucker1_3)

Higher-order tensor decomposition

NTF and NTD can also be applied to non-negative higher-order tensors as follows.

library("nnTensor")
library("rTensor")
X_Higher <- as.tensor(array(runif(3*4*5*6*7), dim=c(3,4,5,6,7)))
out_NTF_Higher <- NTF(X_Higher, rank=4)
out_NTD_Higher <- NTD(X_Higher, rank=c(1,2,1,2,3))
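The results can be reconstructed in the same way as the third-order case; this sketch assumes recTensor accepts idx=1:5 for a fifth-order tensor, and the resulting dimensions should match the 3x4x5x6x7 input.

recX_Higher <- recTensor(out_NTD_Higher$S, out_NTD_Higher$A, idx=1:5)
dim(recX_Higher)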

Session Information

## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rTensor_1.4.8  nnTensor_1.2.0
## 
## loaded via a namespace (and not attached):
##  [1] viridis_0.6.3      sass_0.4.6         utf8_1.2.3         generics_0.1.3    
##  [5] tcltk_4.3.0        digest_0.6.31      magrittr_2.0.3     evaluate_0.21     
##  [9] grid_4.3.0         RColorBrewer_1.1-3 fastmap_1.1.1      maps_3.4.1        
## [13] jsonlite_1.8.5     misc3d_0.9-1       gridExtra_2.3      fansi_1.0.4       
## [17] spam_2.9-1         viridisLite_0.4.2  scales_1.2.1       jquerylib_0.1.4   
## [21] cli_3.6.1          rlang_1.1.1        munsell_0.5.0      withr_2.5.0       
## [25] cachem_1.0.8       yaml_2.3.7         tools_4.3.0        dplyr_1.1.2       
## [29] colorspace_2.1-0   ggplot2_3.4.2      vctrs_0.6.2        R6_2.5.1          
## [33] lifecycle_1.0.3    plot3D_1.4         MASS_7.3-60        pkgconfig_2.0.3   
## [37] pillar_1.9.0       bslib_0.5.0        gtable_0.3.3       glue_1.6.2        
## [41] Rcpp_1.0.10        fields_14.1        xfun_0.39          tibble_3.2.1      
## [45] tidyselect_1.2.0   highr_0.10         knitr_1.43         farver_2.1.1      
## [49] htmltools_0.5.5    rmarkdown_2.22     labeling_0.4.2     tagcloud_0.6      
## [53] dotCall64_1.0-2    compiler_4.3.0

References

Cichocki, A. et al. 2009. Nonnegative Matrix and Tensor Factorizations. Wiley.
Cichocki, A. et al. 2007. “Non-Negative Tensor Factorization Using Alpha and Beta Divergence.” ICASSP’07, III-1393-III-1396.
———. 2009. “Fast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations.” IEICE Transactions 92-A: 708–21.
Kim, Y.-D. et al. 2008. “Nonnegative Tucker Decomposition with Alpha-Divergence.”
———. 2017. “Nonnegative Tucker Decomposition.” IEEE CVPR, 1–8.
Phan, A. H. et al. 2008a. “Fast and Efficient Algorithms for Nonnegative Tucker Decomposition.” ISNN 2008, 772–82.
———. 2008b. “Multi-Way Nonnegative Tensor Factorization Using Fast Hierarchical Alternating Least Squares Algorithm (HALS).” NOLTA 2008.
———. 2011. “Extended HALS Algorithm for Nonnegative Tucker Decomposition and Its Applications for Multiway Analysis and Classification.” Neurocomputing 74(11): 1956–69.