Building a descriptive analysis

Once a dataset is cleaned and ready for statistical analysis, the first step is typically to summarize it. The univariate_table() function makes it easy to build a custom descriptive analysis while consistently producing clean, presentation-ready output. It is designed to integrate directly into your analysis workflow (e.g., R Markdown), but it can also be called from the console and rendered in a number of formats.

library(cheese)

heart_disease %>%
  univariate_table()
Variable Level Summary
Age 56 (48, 61)
Sex Female 97 (32.01%)
Sex Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
ChestPain Atypical angina 50 (16.5%)
ChestPain Non-anginal pain 86 (28.38%)
ChestPain Asymptomatic 144 (47.52%)
BP 130 (120, 140)
Cholesterol 241 (211, 275)
MaximumHR 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
ExerciseInducedAngina Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
HeartDisease Yes 139 (45.87%)

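By default, the result is an HTML table of descriptive statistics for the numeric and categorical (factor) columns of the dataset: the median (first quartile, third quartile) for each numeric column, and the count (percentage) for each level of each categorical column.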
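To make the two default cell formats concrete, here is a base R sketch of how a "count (percent%)" cell for a factor and a "median (q1, q3)" cell for a numeric column could be computed. The helper names categorical_cells() and numeric_cell() are hypothetical, the toy vectors are made up, and the two-decimal rounding and R's default quantile method (type 7) are assumptions; cheese's internals may differ.

```r
# Hypothetical helper: one "count (percent%)" string per factor level
categorical_cells <- function(x) {
  counts <- table(x)
  percents <- round(100 * prop.table(counts), 2)
  paste0(counts, " (", percents, "%)")
}

# Hypothetical helper: a single "median (q1, q3)" string
numeric_cell <- function(x) {
  # quantile() uses type 7 by default; cheese may use a different method
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  paste0(median(x), " (", q[1], ", ", q[2], ")")
}

categorical_cells(factor(c("a", "b", "b", "b")))
numeric_cell(c(1, 2, 3, 4, 10))
```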

Custom string templates

In the table above, the summary statistics in each cell follow a particular format that depends on the type of data. The _summary arguments let you customize not only how the results are presented but also which values go into them.

Suppose instead of the "median (q1, q3)" being displayed for numeric data, you want the "mean [sd] / median", in that exact format:

heart_disease %>%
  univariate_table(
    numeric_summary = 
      c(
        Summary = "mean [sd] / median"
      )
  )
Variable Level Summary
Age 54.44 [9.04] / 56
Sex Female 97 (32.01%)
Sex Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
ChestPain Atypical angina 50 (16.5%)
ChestPain Non-anginal pain 86 (28.38%)
ChestPain Asymptomatic 144 (47.52%)
BP 131.69 [17.6] / 130
Cholesterol 246.69 [51.78] / 241
MaximumHR 149.61 [22.88] / 153
ExerciseInducedAngina No 204 (67.33%)
ExerciseInducedAngina Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
HeartDisease Yes 139 (45.87%)

The name Summary was used so that the numeric results would be bound into the same column as the results for the other data types. If you choose a different name, you get a new column containing those summaries:

heart_disease %>%
  univariate_table(
    numeric_summary = 
      c(
        NewSummary = "mean [sd] / median"
      )
  )
Variable Level NewSummary Summary
Age 54.44 [9.04] / 56
Sex Female 97 (32.01%)
Sex Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
ChestPain Atypical angina 50 (16.5%)
ChestPain Non-anginal pain 86 (28.38%)
ChestPain Asymptomatic 144 (47.52%)
BP 131.69 [17.6] / 130
Cholesterol 246.69 [51.78] / 241
MaximumHR 149.61 [22.88] / 153
ExerciseInducedAngina No 204 (67.33%)
ExerciseInducedAngina Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
HeartDisease Yes 139 (45.87%)

You can add as many summary columns as you like, separately for each data type:

heart_disease %>%
  univariate_table(
    numeric_summary = 
      c(
        `Numeric only` = "mean [sd] / median",
        Summary = "median (q1, q3)"
      ),
    categorical_summary = 
      c(
        Summary = "count",
        `Categorical only` = "percent = 100 * proportion"
      )
  )
Variable Level Numeric only Summary Categorical only
Age 54.44 [9.04] / 56 56 (48, 61)
Sex Female 97 32.01 = 100 * 0.32
Sex Male 206 67.99 = 100 * 0.68
ChestPain Typical angina 23 7.59 = 100 * 0.08
ChestPain Atypical angina 50 16.5 = 100 * 0.17
ChestPain Non-anginal pain 86 28.38 = 100 * 0.28
ChestPain Asymptomatic 144 47.52 = 100 * 0.48
BP 131.69 [17.6] / 130 130 (120, 140)
Cholesterol 246.69 [51.78] / 241 241 (211, 275)
MaximumHR 149.61 [22.88] / 153 153 (133.5, 166)
ExerciseInducedAngina No 204 67.33 = 100 * 0.67
ExerciseInducedAngina Yes 99 32.67 = 100 * 0.33
HeartDisease No 164 54.13 = 100 * 0.54
HeartDisease Yes 139 45.87 = 100 * 0.46

Multiple summaries are probably most visually appealing when all of the data are the same type:

heart_disease %>%
  univariate_table(
    categorical_types = NULL, # Disable summaries of categorical data
    numeric_summary =
      c(
        `Median (Q1, Q3)` = "median (q1, q3)",
        `Min-Max` = "min - max",
        `Mean (SD)` = "mean (sd)"
      )
  )
Variable Median (Q1, Q3) Min-Max Mean (SD)
Age 56 (48, 61) 29 - 77 54.44 (9.04)
BP 130 (120, 140) 94 - 200 131.69 (17.6)
Cholesterol 241 (211, 275) 126 - 564 246.69 (51.78)
MaximumHR 153 (133.5, 166) 71 - 202 149.61 (22.88)

Or when adding a summary that applies to all columns:

heart_disease %>%
  univariate_table(
    all_summary = 
      c(
        `# obs. non-missing` = "available of length"
      )
  )
Variable Level Summary # obs. non-missing
Age 56 (48, 61) 303 of 303
Sex 303 of 303
Sex Female 97 (32.01%)
Sex Male 206 (67.99%)
ChestPain 303 of 303
ChestPain Typical angina 23 (7.59%)
ChestPain Atypical angina 50 (16.5%)
ChestPain Non-anginal pain 86 (28.38%)
ChestPain Asymptomatic 144 (47.52%)
BP 130 (120, 140) 303 of 303
Cholesterol 241 (211, 275) 303 of 303
BloodSugar 303 of 303
MaximumHR 153 (133.5, 166) 303 of 303
ExerciseInducedAngina 303 of 303
ExerciseInducedAngina No 204 (67.33%)
ExerciseInducedAngina Yes 99 (32.67%)
HeartDisease 303 of 303
HeartDisease No 164 (54.13%)
HeartDisease Yes 139 (45.87%)

The all_summary results add an extra row for each categorical variable. You may also have noticed that the BloodSugar column did not appear in the table until the all_summary argument was used. This is because it is a logical column, which is classified as neither numeric nor categorical and is therefore not evaluated by default. See the “Backend functionality” section to learn more.
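Judging from the "303 of 303" output, the "available of length" template appears to pair the number of non-missing values with the total number of values. Under that assumption, a base R sketch that computes the same pair for every column of a data frame, regardless of its type, could look like this (column_availability() is a hypothetical name and the toy data are made up):

```r
# Hypothetical helper: "available of length" for every column, whatever its type
column_availability <- function(data) {
  vapply(
    data,
    function(x) paste(sum(!is.na(x)), "of", length(x)),
    character(1)
  )
}

toy <- data.frame(
  age = c(50, NA, 61),
  sex = factor(c("F", "M", NA)),
  flag = c(TRUE, FALSE, TRUE)
)
column_availability(toy)
```

Because a function like this is applied to every column, it also reaches logical columns such as flag, which a numeric-or-factor filter would skip.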

Stratification variables

The strata argument takes a formula() that stratifies the analysis by any number of variables. Variables on the left-hand side are laid out down the rows, and variables on the right-hand side spread across the columns. Use + on either side to specify more than one variable. Let’s start by stratifying by sex across the columns:

heart_disease %>%
  univariate_table(
    strata = ~ Sex
  )
Variable Level Female Male
Age 57 (50, 63) 54.5 (47, 59.75)
ChestPain Typical angina 4 (4.12%) 19 (9.22%)
ChestPain Atypical angina 18 (18.56%) 32 (15.53%)
ChestPain Non-anginal pain 35 (36.08%) 51 (24.76%)
ChestPain Asymptomatic 40 (41.24%) 104 (50.49%)
BP 132 (120, 140) 130 (120, 140)
Cholesterol 254 (215, 302) 235 (208.75, 268.5)
MaximumHR 157 (142, 165) 150.5 (132, 167.5)
ExerciseInducedAngina No 75 (77.32%) 129 (62.62%)
ExerciseInducedAngina Yes 22 (22.68%) 77 (37.38%)
HeartDisease No 72 (74.23%) 92 (44.66%)
HeartDisease Yes 25 (25.77%) 114 (55.34%)

You can do the same thing down the rows, using 1 as a placeholder on the right-hand side:

heart_disease %>%
  univariate_table(
    strata = Sex ~ 1
  )
Sex Variable Level Summary
Female Age 57 (50, 63)
Female ChestPain Typical angina 4 (4.12%)
Female ChestPain Atypical angina 18 (18.56%)
Female ChestPain Non-anginal pain 35 (36.08%)
Female ChestPain Asymptomatic 40 (41.24%)
Female BP 132 (120, 140)
Female Cholesterol 254 (215, 302)
Female MaximumHR 157 (142, 165)
Female ExerciseInducedAngina No 75 (77.32%)
Female ExerciseInducedAngina Yes 22 (22.68%)
Female HeartDisease No 72 (74.23%)
Female HeartDisease Yes 25 (25.77%)
Male Age 54.5 (47, 59.75)
Male ChestPain Typical angina 19 (9.22%)
Male ChestPain Atypical angina 32 (15.53%)
Male ChestPain Non-anginal pain 51 (24.76%)
Male ChestPain Asymptomatic 104 (50.49%)
Male BP 130 (120, 140)
Male Cholesterol 235 (208.75, 268.5)
Male MaximumHR 150.5 (132, 167.5)
Male ExerciseInducedAngina No 129 (62.62%)
Male ExerciseInducedAngina Yes 77 (37.38%)
Male HeartDisease No 92 (44.66%)
Male HeartDisease Yes 114 (55.34%)

Or even both:

heart_disease %>%
  univariate_table(
    strata = Sex ~ HeartDisease
  )
Sex Variable Level No Yes
Female Age 54 (46, 63.25) 60 (57, 62)
Female ChestPain Typical angina 4 (5.56%) 0 (0%)
Female ChestPain Atypical angina 16 (22.22%) 2 (8%)
Female ChestPain Non-anginal pain 34 (47.22%) 1 (4%)
Female ChestPain Asymptomatic 18 (25%) 22 (88%)
Female BP 130 (119.5, 140) 140 (130, 158)
Female Cholesterol 249 (210.75, 289.5) 268 (236, 307)
Female MaximumHR 159 (146.75, 167.25) 146 (133, 157)
Female ExerciseInducedAngina No 64 (88.89%) 11 (44%)
Female ExerciseInducedAngina Yes 8 (11.11%) 14 (56%)
Male Age 52 (44, 57) 57.5 (51, 61)
Male ChestPain Typical angina 12 (13.04%) 7 (6.14%)
Male ChestPain Atypical angina 25 (27.17%) 7 (6.14%)
Male ChestPain Non-anginal pain 34 (36.96%) 17 (14.91%)
Male ChestPain Asymptomatic 21 (22.83%) 83 (72.81%)
Male BP 130 (120, 140) 130 (120, 140)
Male Cholesterol 229.5 (206.5, 250.75) 247.5 (212, 282)
Male MaximumHR 163 (150, 175.75) 141 (125, 156)
Male ExerciseInducedAngina No 77 (83.7%) 52 (45.61%)
Male ExerciseInducedAngina Yes 15 (16.3%) 62 (54.39%)
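The left-hand and right-hand sides of a formula can be pulled apart with base R's all.vars(). The sketch below shows how a strata formula could be split into row and column variables; split_strata() is a hypothetical helper for illustration, not part of cheese.

```r
# Hypothetical helper: split a strata formula into row (LHS) and column (RHS) variables
split_strata <- function(f) {
  if (length(f) == 2) {
    # One-sided formula such as ~ Sex: everything spreads across the columns
    list(rows = character(0), cols = all.vars(f[[2]]))
  } else {
    # Two-sided formula such as Sex ~ HeartDisease; a 1 contributes no variables
    list(rows = all.vars(f[[2]]), cols = all.vars(f[[3]]))
  }
}

split_strata(~ Sex + HeartDisease)
split_strata(Sex ~ 1)
split_strata(Sex ~ HeartDisease)
```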

Now suppose you want both stratification variables across the columns:

heart_disease %>%
  univariate_table(
    strata = ~ Sex + HeartDisease
  )
Variable Level Female/No Female/Yes Male/No Male/Yes
Age 54 (46, 63.25) 60 (57, 62) 52 (44, 57) 57.5 (51, 61)
ChestPain Typical angina 4 (5.56%) 0 (0%) 12 (13.04%) 7 (6.14%)
ChestPain Atypical angina 16 (22.22%) 2 (8%) 25 (27.17%) 7 (6.14%)
ChestPain Non-anginal pain 34 (47.22%) 1 (4%) 34 (36.96%) 17 (14.91%)
ChestPain Asymptomatic 18 (25%) 22 (88%) 21 (22.83%) 83 (72.81%)
BP 130 (119.5, 140) 140 (130, 158) 130 (120, 140) 130 (120, 140)
Cholesterol 249 (210.75, 289.5) 268 (236, 307) 229.5 (206.5, 250.75) 247.5 (212, 282)
MaximumHR 159 (146.75, 167.25) 146 (133, 157) 163 (150, 175.75) 141 (125, 156)
ExerciseInducedAngina No 64 (88.89%) 11 (44%) 77 (83.7%) 52 (45.61%)
ExerciseInducedAngina Yes 8 (11.11%) 14 (56%) 15 (16.3%) 62 (54.39%)

The levels will span the columns in a hierarchical fashion depending on their order in the formula:

heart_disease %>%
  univariate_table(
    strata = ~ HeartDisease + Sex
  )
Variable Level No/Female No/Male Yes/Female Yes/Male
Age 54 (46, 63.25) 52 (44, 57) 60 (57, 62) 57.5 (51, 61)
ChestPain Typical angina 4 (5.56%) 12 (13.04%) 0 (0%) 7 (6.14%)
ChestPain Atypical angina 16 (22.22%) 25 (27.17%) 2 (8%) 7 (6.14%)
ChestPain Non-anginal pain 34 (47.22%) 34 (36.96%) 1 (4%) 17 (14.91%)
ChestPain Asymptomatic 18 (25%) 21 (22.83%) 22 (88%) 83 (72.81%)
BP 130 (119.5, 140) 130 (120, 140) 140 (130, 158) 130 (120, 140)
Cholesterol 249 (210.75, 289.5) 229.5 (206.5, 250.75) 268 (236, 307) 247.5 (212, 282)
MaximumHR 159 (146.75, 167.25) 163 (150, 175.75) 146 (133, 157) 141 (125, 156)
ExerciseInducedAngina No 64 (88.89%) 77 (83.7%) 11 (44%) 52 (45.61%)
ExerciseInducedAngina Yes 8 (11.11%) 15 (16.3%) 14 (56%) 62 (54.39%)

Similarly, the row strata are nested hierarchically:

heart_disease %>%
  univariate_table(
    strata = HeartDisease + Sex ~ 1
  )
HeartDisease Sex Variable Level Summary
No Female Age 54 (46, 63.25)
No Female ChestPain Typical angina 4 (5.56%)
No Female ChestPain Atypical angina 16 (22.22%)
No Female ChestPain Non-anginal pain 34 (47.22%)
No Female ChestPain Asymptomatic 18 (25%)
No Female BP 130 (119.5, 140)
No Female Cholesterol 249 (210.75, 289.5)
No Female MaximumHR 159 (146.75, 167.25)
No Female ExerciseInducedAngina No 64 (88.89%)
No Female ExerciseInducedAngina Yes 8 (11.11%)
No Male Age 52 (44, 57)
No Male ChestPain Typical angina 12 (13.04%)
No Male ChestPain Atypical angina 25 (27.17%)
No Male ChestPain Non-anginal pain 34 (36.96%)
No Male ChestPain Asymptomatic 21 (22.83%)
No Male BP 130 (120, 140)
No Male Cholesterol 229.5 (206.5, 250.75)
No Male MaximumHR 163 (150, 175.75)
No Male ExerciseInducedAngina No 77 (83.7%)
No Male ExerciseInducedAngina Yes 15 (16.3%)
Yes Female Age 60 (57, 62)
Yes Female ChestPain Typical angina 0 (0%)
Yes Female ChestPain Atypical angina 2 (8%)
Yes Female ChestPain Non-anginal pain 1 (4%)
Yes Female ChestPain Asymptomatic 22 (88%)
Yes Female BP 140 (130, 158)
Yes Female Cholesterol 268 (236, 307)
Yes Female MaximumHR 146 (133, 157)
Yes Female ExerciseInducedAngina No 11 (44%)
Yes Female ExerciseInducedAngina Yes 14 (56%)
Yes Male Age 57.5 (51, 61)
Yes Male ChestPain Typical angina 7 (6.14%)
Yes Male ChestPain Atypical angina 7 (6.14%)
Yes Male ChestPain Non-anginal pain 17 (14.91%)
Yes Male ChestPain Asymptomatic 83 (72.81%)
Yes Male BP 130 (120, 140)
Yes Male Cholesterol 247.5 (212, 282)
Yes Male MaximumHR 141 (125, 156)
Yes Male ExerciseInducedAngina No 52 (45.61%)
Yes Male ExerciseInducedAngina Yes 62 (54.39%)
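The nesting order in both directions follows the same rule as base R's interaction() with lex.order = TRUE: the first variable forms the outer level, and later variables vary fastest within it. This is only an analogy to illustrate the ordering rule, not how cheese builds its headers or rows.

```r
sex <- factor(c("Female", "Male"))
disease <- factor(c("No", "Yes"))

# Like ~ Sex + HeartDisease: Sex is the outer level
levels(interaction(sex, disease, lex.order = TRUE))

# Like ~ HeartDisease + Sex: HeartDisease is the outer level
levels(interaction(disease, sex, lex.order = TRUE))
```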

You can use any of the functionality described in the previous section with stratification variables as well:

heart_disease %>%
  univariate_table(
    strata = ~ Sex + HeartDisease,
    numeric_summary = 
      c(
        `Mean (SD)` = "mean (sd)"
      ),
    categorical_summary = 
      c(
        `Count (%)` = "count (percent%)"
      )
  )
Variable Level Female/No/Mean (SD) Female/No/Count (%) Female/Yes/Mean (SD) Female/Yes/Count (%) Male/No/Mean (SD) Male/No/Count (%) Male/Yes/Mean (SD) Male/Yes/Count (%)
Age 54.56 (10.27) 59.08 (4.86) 51.04 (8.62) 56.09 (8.39)
ChestPain Typical angina 4 (5.56%) 0 (0%) 12 (13.04%) 7 (6.14%)
ChestPain Atypical angina 16 (22.22%) 2 (8%) 25 (27.17%) 7 (6.14%)
ChestPain Non-anginal pain 34 (47.22%) 1 (4%) 34 (36.96%) 17 (14.91%)
ChestPain Asymptomatic 18 (25%) 22 (88%) 21 (22.83%) 83 (72.81%)
BP 128.74 (16.54) 146.6 (21.12) 129.65 (16.02) 131.93 (17.22)
Cholesterol 256.75 (66.22) 276.16 (59.88) 231.6 (37.64) 246.06 (45.44)
MaximumHR 154.03 (19.25) 143.16 (20.18) 161.78 (18.56) 138.4 (23.08)
ExerciseInducedAngina No 64 (88.89%) 11 (44%) 77 (83.7%) 52 (45.61%)
ExerciseInducedAngina Yes 8 (11.11%) 14 (56%) 15 (16.3%) 62 (54.39%)

The summary columns simply get added to the column-spanning hierarchy.

Adding sample size

The add_n argument adds the sample size to the label of each stratification group:

heart_disease %>%
  univariate_table(
    strata = ~ Sex,
    add_n = TRUE
  )
Variable Level Female (N=97) Male (N=206)
Age 57 (50, 63) 54.5 (47, 59.75)
ChestPain Typical angina 4 (4.12%) 19 (9.22%)
ChestPain Atypical angina 18 (18.56%) 32 (15.53%)
ChestPain Non-anginal pain 35 (36.08%) 51 (24.76%)
ChestPain Asymptomatic 40 (41.24%) 104 (50.49%)
BP 132 (120, 140) 130 (120, 140)
Cholesterol 254 (215, 302) 235 (208.75, 268.5)
MaximumHR 157 (142, 165) 150.5 (132, 167.5)
ExerciseInducedAngina No 75 (77.32%) 129 (62.62%)
ExerciseInducedAngina Yes 22 (22.68%) 77 (37.38%)
HeartDisease No 72 (74.23%) 92 (44.66%)
HeartDisease Yes 25 (25.77%) 114 (55.34%)

When multiple stratification variables are added on one side of the formula, the sample size will show up on the lowest level of the hierarchy, excluding summary columns:

heart_disease %>%
  univariate_table(
    strata = ~ Sex + HeartDisease,
    add_n = TRUE
  )
Variable Level Female/No (N=72) Female/Yes (N=25) Male/No (N=92) Male/Yes (N=114)
Age 54 (46, 63.25) 60 (57, 62) 52 (44, 57) 57.5 (51, 61)
ChestPain Typical angina 4 (5.56%) 0 (0%) 12 (13.04%) 7 (6.14%)
ChestPain Atypical angina 16 (22.22%) 2 (8%) 25 (27.17%) 7 (6.14%)
ChestPain Non-anginal pain 34 (47.22%) 1 (4%) 34 (36.96%) 17 (14.91%)
ChestPain Asymptomatic 18 (25%) 22 (88%) 21 (22.83%) 83 (72.81%)
BP 130 (119.5, 140) 140 (130, 158) 130 (120, 140) 130 (120, 140)
Cholesterol 249 (210.75, 289.5) 268 (236, 307) 229.5 (206.5, 250.75) 247.5 (212, 282)
MaximumHR 159 (146.75, 167.25) 146 (133, 157) 163 (150, 175.75) 141 (125, 156)
ExerciseInducedAngina No 64 (88.89%) 11 (44%) 77 (83.7%) 52 (45.61%)
ExerciseInducedAngina Yes 8 (11.11%) 14 (56%) 15 (16.3%) 62 (54.39%)

A limitation is that when sample size is added in the presence of row and column strata, it is displayed for the marginal groups only:

heart_disease %>%
  univariate_table(
    strata = Sex ~ HeartDisease,
    add_n = TRUE
  )
Sex Variable Level No (N=164) Yes (N=139)
Female (N=97) Age 54 (46, 63.25) 60 (57, 62)
Female (N=97) ChestPain Typical angina 4 (5.56%) 0 (0%)
Female (N=97) ChestPain Atypical angina 16 (22.22%) 2 (8%)
Female (N=97) ChestPain Non-anginal pain 34 (47.22%) 1 (4%)
Female (N=97) ChestPain Asymptomatic 18 (25%) 22 (88%)
Female (N=97) BP 130 (119.5, 140) 140 (130, 158)
Female (N=97) Cholesterol 249 (210.75, 289.5) 268 (236, 307)
Female (N=97) MaximumHR 159 (146.75, 167.25) 146 (133, 157)
Female (N=97) ExerciseInducedAngina No 64 (88.89%) 11 (44%)
Female (N=97) ExerciseInducedAngina Yes 8 (11.11%) 14 (56%)
Male (N=206) Age 52 (44, 57) 57.5 (51, 61)
Male (N=206) ChestPain Typical angina 12 (13.04%) 7 (6.14%)
Male (N=206) ChestPain Atypical angina 25 (27.17%) 7 (6.14%)
Male (N=206) ChestPain Non-anginal pain 34 (36.96%) 17 (14.91%)
Male (N=206) ChestPain Asymptomatic 21 (22.83%) 83 (72.81%)
Male (N=206) BP 130 (120, 140) 130 (120, 140)
Male (N=206) Cholesterol 229.5 (206.5, 250.75) 247.5 (212, 282)
Male (N=206) MaximumHR 163 (150, 175.75) 141 (125, 156)
Male (N=206) ExerciseInducedAngina No 77 (83.7%) 52 (45.61%)
Male (N=206) ExerciseInducedAngina Yes 15 (16.3%) 62 (54.39%)
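The "(N=...)" labels amount to group counts pasted onto the level names. A base R sketch with a hypothetical label_with_n() helper and made-up data:

```r
# Hypothetical helper: append group sizes to factor levels
label_with_n <- function(g) {
  counts <- table(g)
  paste0(names(counts), " (N=", counts, ")")
}

label_with_n(factor(c("Female", "Male", "Male")))
```

For nested column strata, the same counting would presumably be applied to each combination of levels, which matches the N appearing on the lowest level of the hierarchy.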

Association metrics

When a descriptive analysis is stratified by one or more variables, it is often also of interest to add statistics that compare each variable across the groups. The associations argument accepts a list of any number of functions, each of which returns a scalar value to be placed in the table. First, let’s define a function:

# Function for a p-value
# y: the column being summarized; x: the stratification variable
pval <-
  function(y, x) {
    
    # For categorical data, use Fisher's exact test
    if(some_type(y, "factor")) {
      
      p <- fisher.test(factor(y), factor(x), simulate.p.value = TRUE)$p.value
    
    # Otherwise, use the Kruskal-Wallis test
    } else {
      
      p <- kruskal.test(y, factor(x))$p.value
      
    }
    
    ifelse(p < 0.001, "<0.001", as.character(round(p, 2)))
    
  }

The stratification variable will be placed in the second argument of the function(s) provided. Now you can add it to the function call:

heart_disease %>%
  univariate_table(
    strata = ~ HeartDisease,
    associations = list(`P-value` = pval)
  )
Variable Level No Yes P-value
Age 52 (44.75, 59) 58 (52, 62) 0.12
Sex <0.001
Sex Female 72 (43.9%) 25 (17.99%)
Sex Male 92 (56.1%) 114 (82.01%)
ChestPain <0.001
ChestPain Typical angina 16 (9.76%) 7 (5.04%)
ChestPain Atypical angina 41 (25%) 9 (6.47%)
ChestPain Non-anginal pain 68 (41.46%) 18 (12.95%)
ChestPain Asymptomatic 39 (23.78%) 105 (75.54%)
BP 130 (120, 140) 130 (120, 145) 0.51
Cholesterol 234.5 (208.75, 267.25) 249 (217.5, 283.5) 0.11
MaximumHR 161 (148.75, 172) 142 (125, 156.5) 0.08
ExerciseInducedAngina <0.001
ExerciseInducedAngina No 141 (85.98%) 63 (45.32%)
ExerciseInducedAngina Yes 23 (14.02%) 76 (54.68%)

The name of each function in the list becomes its column label.
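Because each association function simply receives two vectors, you can try it outside univariate_table() before adding it to the table. The sketch below is a base R variant of pval(), named pval_base() for illustration, that substitutes is.factor() for cheese's some_type() so that it runs without cheese. It keeps the column being summarized in the first argument and the stratification variable in the second; the toy data are made up.

```r
# Base R variant of pval(): y is the summarized column, x the stratification variable
pval_base <- function(y, x) {
  if (is.factor(y)) {
    # Categorical column: Fisher's exact test (simulated p-value for larger tables)
    p <- fisher.test(y, factor(x), simulate.p.value = TRUE)$p.value
  } else {
    # Numeric column: Kruskal-Wallis rank sum test across the strata
    p <- kruskal.test(y, factor(x))$p.value
  }
  if (p < 0.001) "<0.001" else as.character(round(p, 2))
}

group <- factor(rep(c("No", "Yes"), each = 3))
pval_base(c(1, 2, 3, 10, 11, 12), group)
pval_base(factor(c("a", "a", "b", "b", "b", "a")), group)
```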

The comparison is made across all of the subgroups formed by the column stratification:

heart_disease %>%
  univariate_table(
    strata = ~ Sex + HeartDisease,
    associations = list(`P-value` = pval)
  )
Variable Level Female/No Female/Yes Male/No Male/Yes P-value
Age 54 (46, 63.25) 60 (57, 62) 52 (44, 57) 57.5 (51, 61) 0.53
ChestPain <0.001
ChestPain Typical angina 4 (5.56%) 0 (0%) 12 (13.04%) 7 (6.14%)
ChestPain Atypical angina 16 (22.22%) 2 (8%) 25 (27.17%) 7 (6.14%)
ChestPain Non-anginal pain 34 (47.22%) 1 (4%) 34 (36.96%) 17 (14.91%)
ChestPain Asymptomatic 18 (25%) 22 (88%) 21 (22.83%) 83 (72.81%)
BP 130 (119.5, 140) 140 (130, 158) 130 (120, 140) 130 (120, 140) 0.55
Cholesterol 249 (210.75, 289.5) 268 (236, 307) 229.5 (206.5, 250.75) 247.5 (212, 282) 0.11
MaximumHR 159 (146.75, 167.25) 146 (133, 157) 163 (150, 175.75) 141 (125, 156) 0.01
ExerciseInducedAngina <0.001
ExerciseInducedAngina No 64 (88.89%) 11 (44%) 77 (83.7%) 52 (45.61%)
ExerciseInducedAngina Yes 8 (11.11%) 14 (56%) 15 (16.3%) 62 (54.39%)

However, adding a row stratification variable makes the comparisons take place within each of its groups:

heart_disease %>%
  univariate_table(
    strata = Sex ~ HeartDisease,
    associations = list(`P-value` = pval)
  )
Sex Variable Level No Yes P-value
Female Age 54 (46, 63.25) 60 (57, 62) 0.17
Female ChestPain <0.001
Female ChestPain Typical angina 4 (5.56%) 0 (0%)
Female ChestPain Atypical angina 16 (22.22%) 2 (8%)
Female ChestPain Non-anginal pain 34 (47.22%) 1 (4%)
Female ChestPain Asymptomatic 18 (25%) 22 (88%)
Female BP 130 (119.5, 140) 140 (130, 158) 0.37
Female Cholesterol 249 (210.75, 289.5) 268 (236, 307) 0.58
Female MaximumHR 159 (146.75, 167.25) 146 (133, 157) 0.15
Female ExerciseInducedAngina <0.001
Female ExerciseInducedAngina No 64 (88.89%) 11 (44%)
Female ExerciseInducedAngina Yes 8 (11.11%) 14 (56%)
Male Age 52 (44, 57) 57.5 (51, 61) 0.29
Male ChestPain <0.001
Male ChestPain Typical angina 12 (13.04%) 7 (6.14%)
Male ChestPain Atypical angina 25 (27.17%) 7 (6.14%)
Male ChestPain Non-anginal pain 34 (36.96%) 17 (14.91%)
Male ChestPain Asymptomatic 21 (22.83%) 83 (72.81%)
Male BP 130 (120, 140) 130 (120, 140) 0.71
Male Cholesterol 229.5 (206.5, 250.75) 247.5 (212, 282) 0.11
Male MaximumHR 163 (150, 175.75) 141 (125, 156) 0.26
Male ExerciseInducedAngina <0.001
Male ExerciseInducedAngina No 77 (83.7%) 52 (45.61%)
Male ExerciseInducedAngina Yes 15 (16.3%) 62 (54.39%)

In general, there must be at least one column stratification variable in order to use association metrics. See univariate_associations(), the workhorse behind this functionality, for more details.

Backend functionality

descriptives() is the function that drives the computation behind the statistics for the columns of the input dataset. Any of its arguments can be passed from univariate_table() to add further customization.

Specifying data types

As noted above, one of the columns (BloodSugar) did not appear in the table by default because it is a logical() type. By default, only factor() and numeric() types are included in the result, but there are (at least) three ways to include it:

Change column type prior to function call

You can simply convert the column to a conformable type before the call:

heart_disease %>%
  dplyr::mutate(
    BloodSugar = factor(BloodSugar)
  ) %>%
  univariate_table()
Variable Level Summary
Age 56 (48, 61)
Sex Female 97 (32.01%)
Sex Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
ChestPain Atypical angina 50 (16.5%)
ChestPain Non-anginal pain 86 (28.38%)
ChestPain Asymptomatic 144 (47.52%)
BP 130 (120, 140)
Cholesterol 241 (211, 275)
BloodSugar FALSE 258 (85.15%)
BloodSugar TRUE 45 (14.85%)
MaximumHR 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
ExerciseInducedAngina Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
HeartDisease Yes 139 (45.87%)

Change which column types are evaluated by each set of functions

The _types arguments specify which data types are handled by each set of functions. Let’s allow logical() types to be treated as categorical variables:

heart_disease %>%
  univariate_table(
    categorical_types = c("factor", "logical")
  )
Variable Level Summary
Age 56 (48, 61)
Sex Female 97 (32.01%)
Sex Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
ChestPain Atypical angina 50 (16.5%)
ChestPain Non-anginal pain 86 (28.38%)
ChestPain Asymptomatic 144 (47.52%)
BP 130 (120, 140)
Cholesterol 241 (211, 275)
BloodSugar FALSE 258 (85.15%)
BloodSugar TRUE 45 (14.85%)
MaximumHR 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
ExerciseInducedAngina Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
HeartDisease Yes 139 (45.87%)
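The routing of columns into numeric, categorical, and "other" sets can be sketched in base R. The helpers has_type() and classify_columns() are hypothetical, and checking types with inherits() plus is.numeric() is an assumption about how such a check might work, not a description of cheese's internals.

```r
# Hypothetical helper: does x match any of the given type names?
has_type <- function(x, types) {
  if (length(types) == 0) return(FALSE)
  # is.numeric() also covers integer columns, whose class is "integer"
  ("numeric" %in% types && is.numeric(x)) || inherits(x, types)
}

# Hypothetical helper: label each column as categorical, numeric, or other
classify_columns <- function(data,
                             numeric_types = "numeric",
                             categorical_types = "factor") {
  vapply(data, function(x) {
    if (has_type(x, categorical_types)) {
      "categorical"
    } else if (has_type(x, numeric_types)) {
      "numeric"
    } else {
      "other"
    }
  }, character(1))
}

toy <- data.frame(
  bp = c(120, 140),
  sex = factor(c("Female", "Male")),
  blood_sugar = c(TRUE, FALSE)
)
classify_columns(toy)
classify_columns(toy, categorical_types = c("factor", "logical"))
classify_columns(toy, categorical_types = NULL)
```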

Evaluate the column with its own set of functions

The most flexible approach is to give the column its own set of functions. By default, any column that is not interpreted as categorical or numeric is considered “other”, and the infrastructure is in place to supply functions and summaries for these columns in the same manner:

heart_disease %>%
  univariate_table(
    f_other = list(count = function(x) table(x)),
    other_summary = 
      c(
        Summary = "count"
      )
  )
Variable Level Summary
Age 56 (48, 61)
Sex Female 97 (32.01%)
Sex Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
ChestPain Atypical angina 50 (16.5%)
ChestPain Non-anginal pain 86 (28.38%)
ChestPain Asymptomatic 144 (47.52%)
BP 130 (120, 140)
Cholesterol 241 (211, 275)
BloodSugar FALSE 258
BloodSugar TRUE 45
MaximumHR 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
ExerciseInducedAngina Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
HeartDisease Yes 139 (45.87%)

To match the other examples exactly, you would also need to define functions for the percentages, proportions, etc.
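Following the pattern of the f_other example above, a fuller set of "other" functions might add proportions and percentages alongside the counts. The list below is a sketch under the assumption that, like table(), each function may return one named value per level; it is evaluated directly in base R on a made-up logical vector rather than through cheese.

```r
# Functions in the style of the f_other example, one value per level
other_functions <- list(
  count = function(x) table(x),
  proportion = function(x) prop.table(table(x)),
  percent = function(x) 100 * prop.table(table(x))
)

blood_sugar <- c(TRUE, FALSE, FALSE, FALSE)
res <- lapply(other_functions, function(f) f(blood_sugar))
res
```

A list like this could presumably then be paired with a template such as other_summary = c(Summary = "count (percent%)").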

Adding user-specified functions

You can also add custom functions for numeric or categorical columns:

heart_disease %>%
  univariate_table(
    categorical_types = NULL,
    f_numeric =
      list(
        cv = ~sd(.x) / mean(.x)
      ),
    numeric_summary = 
      c(
        `Coef. of variation` = "sd / mean = cv"
      )
  )
Variable Coef. of variation
Age 9.04 / 54.44 = 0.17
BP 17.6 / 131.69 = 0.13
Cholesterol 51.78 / 246.69 = 0.21
MaximumHR 22.88 / 149.61 = 0.15

The function names become the patterns that are searched for in the string templates.
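The formula ~sd(.x) / mean(.x) is shorthand for a one-argument function in the purrr-style lambda syntax (treating it that way is an assumption about how cheese interprets it). Written as an ordinary base R function on made-up data, it computes the same coefficient of variation:

```r
# Plain-function equivalent of the lambda ~sd(.x) / mean(.x)
cv <- function(x) sd(x) / mean(x)

x <- c(1, 2, 3, 4, 10)
# The three pieces used by the "sd / mean = cv" template, each rounded on its own
c(sd = round(sd(x), 2), mean = round(mean(x), 2), cv = round(cv(x), 2))
```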

Additional preferences

Finally, we’ll look at a few of the appearance-related arguments. These can be applied with any combination of other arguments.

Rendering format

As mentioned above, the default format for the table is HTML, but you can choose an alternative with the format argument:

heart_disease %>%
  univariate_table(
    format = "none"
  )
## # A tibble: 14 × 3
##    Variable                Level              Summary         
##    <chr>                   <chr>              <chr>           
##  1 "Age"                   ""                 56 (48, 61)     
##  2 "Sex"                   "Female"           97 (32.01%)     
##  3 ""                      "Male"             206 (67.99%)    
##  4 "ChestPain"             "Typical angina"   23 (7.59%)      
##  5 ""                      "Atypical angina"  50 (16.5%)      
##  6 ""                      "Non-anginal pain" 86 (28.38%)     
##  7 ""                      "Asymptomatic"     144 (47.52%)    
##  8 "BP"                    ""                 130 (120, 140)  
##  9 "Cholesterol"           ""                 241 (211, 275)  
## 10 "MaximumHR"             ""                 153 (133.5, 166)
## 11 "ExerciseInducedAngina" "No"               204 (67.33%)    
## 12 ""                      "Yes"              99 (32.67%)     
## 13 "HeartDisease"          "No"               164 (54.13%)    
## 14 ""                      "Yes"              139 (45.87%)

Other options include "latex", "pandoc", and "markdown".

Relabeling, releveling and reordering

Use the labels and levels arguments to replace variable names or categorical level names with cleaner text, and the order argument to change the position of the variables in the result:

heart_disease %>%
  univariate_table(
    labels = 
      c(
        Age = "Age (years)",
        ChestPain = "Chest pain"
      ),
    levels = 
      list(
        Sex =
          c(
            Male = "M"
          )
      ),
    order = 
      c(
        "BP",
        "Age",
        "Cholesterol"
      )
  )
Variable Level Summary
BP 130 (120, 140)
Age (years) 56 (48, 61)
Cholesterol 241 (211, 275)
Chest pain Typical angina 23 (7.59%)
Chest pain Atypical angina 50 (16.5%)
Chest pain Non-anginal pain 86 (28.38%)
Chest pain Asymptomatic 144 (47.52%)
ExerciseInducedAngina No 204 (67.33%)
ExerciseInducedAngina Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
HeartDisease Yes 139 (45.87%)
MaximumHR 153 (133.5, 166)
Sex Female 97 (32.01%)
Sex M 206 (67.99%)

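Notice that you only need to specify the values that should change. Also, ordering is done with the original names even when the variables are relabeled, and in this example the variables not listed in order follow the listed ones in alphabetical order.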
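The "only specify what changes" lookup can be sketched with named vectors: names that appear in the lookup are replaced, and everything else passes through. The helpers relabel() and reorder_vars() are hypothetical base R illustrations, and placing unlisted variables alphabetically encodes the behavior observed in the example above rather than a documented rule.

```r
# Hypothetical helper: replace only the names that have a label
relabel <- function(x, labels) {
  hit <- x %in% names(labels)
  x[hit] <- unname(labels[x[hit]])
  x
}

# Hypothetical helper: listed variables first, the rest in alphabetical order
reorder_vars <- function(vars, first) {
  c(first, sort(setdiff(vars, first)))
}

vars <- c("Age", "Sex", "ChestPain", "BP", "Cholesterol")
ordered <- reorder_vars(vars, c("BP", "Age", "Cholesterol"))

# Ordering uses the original names; relabeling happens afterwards
relabel(ordered, c(Age = "Age (years)", ChestPain = "Chest pain"))
```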

Headers, fill values, and captions

The variableName and levelName arguments set the headers of the variable and level columns, while fill_blanks determines what goes in empty cells. Finally, the caption argument adds a caption to the entire table:

heart_disease %>%
  univariate_table(
    variableName = "THESE ARE VARIABLES",
    levelName = "THESE ARE LEVELS",
    fill_blanks = "BLANK",
    caption = "HERE IS MY CAPTION"
  )
HERE IS MY CAPTION
THESE ARE VARIABLES THESE ARE LEVELS Summary
Age BLANK 56 (48, 61)
Sex Female 97 (32.01%)
Sex Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
ChestPain Atypical angina 50 (16.5%)
ChestPain Non-anginal pain 86 (28.38%)
ChestPain Asymptomatic 144 (47.52%)
BP BLANK 130 (120, 140)
Cholesterol BLANK 241 (211, 275)
MaximumHR BLANK 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
ExerciseInducedAngina Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
HeartDisease Yes 139 (45.87%)