A probability mass function can be represented by a multi-dimensional array. However, for high-dimensional distributions where each variable may have a large state space, lack of computer memory can become a problem. For example, an \(80\)-dimensional random vector in which each variable has \(10\) levels will lead to a state space with \(10^{80}\) cells. Such a distribution can not be stored in a computer; in fact, \(10^{80}\) is one of the estimates of the number of atoms in the universe. However, if the array consists of only a few non-zero values, we need only store these values along with information about their location. That is, a sparse representation of a table. Sparta was created for efficient multiplication and marginalization of sparse tables.

Consider two arrays `f`

and `g`

:

```
dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
d <- c(2, 2, 2)
f <- array(c(5, 4, 0, 7, 0, 9, 0, 0), d, dn(c("x", "y", "z")))
g <- array(c(7, 6, 0, 6, 0, 0, 9, 0), d, dn(c("y", "z", "w")))
```

with flat layouts

```
ftable(f, row.vars = "X")
#> Y y1 y2
#> Z z1 z2 z1 z2
#> X
#> x1 5 0 0 0
#> x2 4 9 7 0
ftable(g, row.vars = "W")
#> Y y1 y2
#> Z z1 z2 z1 z2
#> W
#> w1 7 0 6 6
#> w2 0 9 0 0
```

We can convert these to their equivalent **sparta** versions as

Printing the object by the default printing method yields

```
print.default(sf)
#> [,1] [,2] [,3] [,4]
#> X 1 2 2 2
#> Y 1 1 2 1
#> Z 1 1 1 2
#> attr(,"vals")
#> [1] 5 4 7 9
#> attr(,"dim_names")
#> attr(,"dim_names")$X
#> [1] "x1" "x2"
#>
#> attr(,"dim_names")$Y
#> [1] "y1" "y2"
#>
#> attr(,"dim_names")$Z
#> [1] "z1" "z2"
#>
#> attr(,"class")
#> [1] "sparta" "matrix"
```

The columns are the cells in the sparse matrix and the `vals`

attribute are the corresponding values which can be extracted with the `vals`

function. Furthermore, the domain resides in the `dim_names`

attribute, which can also be extracted using the `dim_names`

function. From the output, we see that (`x2`

, `y2`

, `z1`

) has a value of \(2\). Using the **sparta** print method prettifies things:

where row \(i\) corresponds to column \(i\) in the sparse matrix. The product of `sf`

and `sg`

```
mfg <- mult(sf, sg); mfg
#> X Y Z W val
#> 1 2 1 2 2 81
#> 2 2 2 1 1 42
#> 3 1 1 1 1 35
#> 4 2 1 1 1 28
```

Converting `sf`

into a conditional probability table (CPT) with conditioning variable `Z`

:

```
sf_cpt <- as_cpt(sf, y = "Z"); sf_cpt
#> X Y Z val
#> 1 1 1 1 0.312
#> 2 2 1 1 0.250
#> 3 2 2 1 0.438
#> 4 2 1 2 1.000
```

Slicing `sf`

on `X1 = x1`

and dropping the `X`

dimension

reduces `sf`

to a single non-zero element, whereas the equivalent dense case would result in a `(Y,Z)`

table with one non-zero element and three zero-elements.

Marginalizing (or summing) out `Y`

in `sg`

yields

Finally, we mention that a sparse table can be created using the constructor `sparta_struct`

, which can be necessary to use if the corresponding dense table is too large to have in memory.

Function name | Description |
---|---|

`as_<sparta>` |
Convert -like object to a `sparta` |

`as_<array/df/cpt>` |
Convert `sparta` object to an `array/data.frame/CPT` |

`sparta_struct` |
Constructor for `sparta` objects |

`mult, div, marg, slice` |
Multiply/divide/marginalize/slice |

`normalize` |
Normalize (the values of the result sum to one) |

`get_val` |
Extract the value for a specific named cell |

`get_cell_name` |
Extract the named cell |

`get_values` |
Extract the values |

`dim_names` |
Extract the domain |

`names` |
Extract the variable names |

`max/min` |
The maximum/minimum value |

`which_<max/min>_cell` |
The column index referring to the max/min value |

`which_<max/min>_idx` |
The configuration corresponding to the max/min value |

`sum` |
Sum the values |

`equiv` |
Test if two tables are identical up to permutations of the columns |