Control Table Keys

John Mount

2023-08-19

In our cdata R package and training materials we emphasize the record-oriented thinking and how to design a transform control table. We now have an additional exciting new feature: control table keys.

The user can now control which columns of a cdata control table are the keys, including now using composite keys (that is keys that are spread across more than one column). This is easiest to demonstrate with an example.

Consider the simple data frame (the first couple of rows of the famous iris data set).

library("cdata")

d <- wrapr::build_frame(
   "Sepal.Length"  , "Sepal.Width", "Petal.Length", "Petal.Width", "Species" |
     5.1           , 3.5          , 1.4           , 0.2          , "setosa"  |
     4.9           , 3            , 1.4           , 0.2          , "setosa"  )
d$id <- seq_len(nrow(d))

knitr::kable(d)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species id
5.1 3.5 1.4 0.2 setosa 1
4.9 3.0 1.4 0.2 setosa 2

Suppose we wish to land the dimensions of different irises in rows keyed by two columns: Part and Measure. That is we want the data to look like the following.

expect <- wrapr::build_frame(
   "id"  , "Species", "Part" , "Measure", "Value" |
     1L  , "setosa" , "Sepal", "Length" , 5.1     |
     1L  , "setosa" , "Sepal", "Width"  , 3.5     |
     1L  , "setosa" , "Petal", "Length" , 1.4     |
     1L  , "setosa" , "Petal", "Width"  , 0.2     |
     2L  , "setosa" , "Sepal", "Length" , 4.9     |
     2L  , "setosa" , "Sepal", "Width"  , 3       |
     2L  , "setosa" , "Petal", "Length" , 1.4     |
     2L  , "setosa" , "Petal", "Width"  , 0.2     )

knitr::kable(expect)
id Species Part Measure Value
1 setosa Sepal Length 5.1
1 setosa Sepal Width 3.5
1 setosa Petal Length 1.4
1 setosa Petal Width 0.2
2 setosa Sepal Length 4.9
2 setosa Sepal Width 3.0
2 setosa Petal Length 1.4
2 setosa Petal Width 0.2

This may seem like an exotic transform, but it seems to be common when you want to use multiple display aesthetics in a plot (such as x, y, color, and facet).

With multiple control table keys specifying these sorts of transforms can be made easy.

First define the control table specifying what a single row-record looks like after transformation to a multi-row block-record.

control_table <- wrapr::qchar_frame(
   "Part"   , "Measure", "Value"      |
     "Sepal", "Length" , Sepal.Length |
     "Sepal", "Width"  , Sepal.Width  |
     "Petal", "Length" , Petal.Length |
     "Petal", "Width"  , Petal.Width  )

layout <- rowrecs_to_blocks_spec(
  control_table,
  controlTableKeys = c("Part", "Measure"),
  recordKeys = c("id", "Species"))

print(layout)
#> {
#>  row_record <- wrapr::qchar_frame(
#>    "id"  , "Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width" |
#>      .   , .        , Sepal.Length  , Sepal.Width  , Petal.Length  , Petal.Width   )
#>  row_keys <- c('id', 'Species')
#> 
#>  # becomes
#> 
#>  block_record <- wrapr::qchar_frame(
#>    "id"  , "Species", "Part" , "Measure", "Value"      |
#>      .   , .        , "Sepal", "Length" , Sepal.Length |
#>      .   , .        , "Sepal", "Width"  , Sepal.Width  |
#>      .   , .        , "Petal", "Length" , Petal.Length |
#>      .   , .        , "Petal", "Width"  , Petal.Width  )
#>  block_keys <- c('id', 'Species', 'Part', 'Measure')
#> 
#>  # args: c(checkNames = TRUE, checkKeys = FALSE, strict = FALSE, allow_rqdatatable = FALSE)
#> }

Notice we are specifying that the columns Part and Measure are the control table row-keys (identifying rows within the block record) and “id” and “Species” are record keys (specifying which rows are in the same record).

We have designed our desired transform above (as taught here). We are introducing a new convention of putting the key values in quotes (even though wrapr::qcar_frame() does not need this). Notice the quoted fields are column-names of the original row-oriented data, and the quote is indicating that these are names standing in for values.

We can now perform the transform.

res <- d %.>% layout

knitr::kable(res)
id Species Part Measure Value
1 setosa Sepal Length 5.1
1 setosa Sepal Width 3.5
1 setosa Petal Length 1.4
1 setosa Petal Width 0.2
2 setosa Sepal Length 4.9
2 setosa Sepal Width 3.0
2 setosa Petal Length 1.4
2 setosa Petal Width 0.2

And, as one comes to expect with coordinatized/fluid-data transforms, the process is easy to reverse (modulo column and row order).

inv_layout <- t(layout)

print(inv_layout)
#> {
#>  block_record <- wrapr::qchar_frame(
#>    "id"  , "Species", "Part" , "Measure", "Value"      |
#>      .   , .        , "Sepal", "Length" , Sepal.Length |
#>      .   , .        , "Sepal", "Width"  , Sepal.Width  |
#>      .   , .        , "Petal", "Length" , Petal.Length |
#>      .   , .        , "Petal", "Width"  , Petal.Width  )
#>  block_keys <- c('id', 'Species', 'Part', 'Measure')
#> 
#>  # becomes
#> 
#>  row_record <- wrapr::qchar_frame(
#>    "id"  , "Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width" |
#>      .   , .        , Sepal.Length  , Sepal.Width  , Petal.Length  , Petal.Width   )
#>  row_keys <- c('id', 'Species')
#> 
#>  # args: c(checkNames = TRUE, checkKeys = FALSE, strict = FALSE, allow_rqdatatable = FALSE)
#> }

back <- res %.>% inv_layout

knitr::kable(back)
id Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 5.1 3.5 1.4 0.2
2 setosa 4.9 3.0 1.4 0.2

Recle Etino Vibal (who asked for this feature in an issue) has an interesting article https://amateurdatasci.rbind.io/post/table-another-back-again-cdata/ trying some variations on the data shaping concepts.

The cdata unit test include the following variations of the above example:

We think cdata (and the accompanying fluid data methodology, plus extensions) is a very deep and powerful way of wrangling data. Once you take the time to learn the methodology (which is “draw what you want to happen to one record, type that in as your control table, and you are done!”) it is very easy to use.