GRIMMER

library(scrutiny)
library(dplyr)  # for mutate(), used in the examples below

Granularity-related inconsistency of means mapped to error repeats, or GRIMMER, is a test for the mathematical consistency of reported means or proportions with the corresponding standard deviations (SDs) and sample sizes (Anaya 2016; Allard 2018).

GRIMMER builds on GRIM (Brown and Heathers 2017). Indeed, the elegant Analytic-GRIMMER algorithm (Allard 2018) implemented here tests for GRIM-consistency before conducting its own unique tests.

This vignette covers scrutiny’s implementation of the GRIMMER test. It’s an adapted version of the GRIM vignette because both the tests themselves and their implementations in scrutiny are very similar. If you are familiar with scrutiny’s grim_*() functions, much of the present vignette will seem quite natural to you.

The vignette has the following sections — to get started, though, you only need the first one:

  1. The basic grimmer() function and a specialized mapping function, grimmer_map().

  2. The audit() method for summarizing grimmer_map()’s results.

  3. The visualization function grim_plot(), which also works for GRIMMER.

  4. Testing numeric sequences with grimmer_map_seq().

  5. Handling unknown group sizes with grimmer_map_total_n().

Basic GRIMMER testing

Few cases: grimmer()

To test if a reported mean of 7.3 on a granular scale is GRIMMER-consistent with an SD of 2.51 and a sample size of 12, run this:

grimmer(x = "7.3", sd = "2.51", n = 12)
#>   7.3 
#> FALSE

Note that x and sd, the reported mean and standard deviation, need to be strings. The reason is that strings preserve trailing zeros, which can be crucial for GRIMMER-testing. Numeric values don’t, and even converting them to strings won’t help. A workaround for larger numbers of such values, restore_zeros(), is discussed in vignette("wrangling").
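Here is a minimal base R illustration of the trailing-zero problem. The value 7.30 is only an example, and the regular expression for counting decimal places is an ad-hoc choice made for this sketch, not something taken from scrutiny:

```r
# Numeric input drops the trailing zero, even after conversion to string
as.character(7.30)
#> [1] "7.3"

# String input keeps it, so the number of reported decimal places is known
reported <- "7.30"
decimals <- nchar(sub("^[^.]*\\.?", "", reported))
decimals
#> [1] 2
```

A mean reported as 7.30 is a stronger claim than one reported as 7.3, which is why the distinction matters for the test.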

grimmer() has some further parameters, but all of them can be used from within grimmer_map(). The other parameters will be discussed in that context because grimmer_map() is often the more useful function in practice. Furthermore, although grimmer() is vectorized, grimmer_map() is safer and more convenient for testing multiple combinations of means, SDs, and sample sizes.

Many cases: grimmer_map()

If you want to GRIMMER-test more than a handful of cases, the recommended way is to enter them into a data frame and to run grimmer_map() on the data frame. Two different ways to do that are discussed in vignette("wrangling"), but here, I will only describe an easily accessible solution for a single table.

Copy summary data from a PDF file and paste them into tibble::tribble(), which is available via scrutiny:

flying_pigs1 <- tribble(
  ~x,   ~sd,
"8.9",  "2.81",
"2.6",  "2.05",
"7.2",  "2.89",
"3.6",  "3.11",
"9.2",  "7.13",
"10.4", "2.53",
"7.3",  "3.14"
) %>% 
  mutate(n = 25)

Use RStudio’s multiple cursors to draw quotation marks around all the x and sd values, and to set commas at the end. See vignette("wrangling"), section With copy and paste, if you are not sure how to do that.

Now, simply run grimmer_map() on that data frame:

grimmer_map(flying_pigs1)
#> # A tibble: 7 × 5
#>   x     sd        n consistency reason                       
#>   <chr> <chr> <dbl> <lgl>       <chr>                        
#> 1 8.9   2.81     25 FALSE       GRIMMER inconsistent (test 3)
#> 2 2.6   2.05     25 TRUE        Passed all                   
#> 3 7.2   2.89     25 TRUE        Passed all                   
#> 4 3.6   3.11     25 FALSE       GRIMMER inconsistent (test 3)
#> 5 9.2   7.13     25 TRUE        Passed all                   
#> 6 10.4  2.53     25 TRUE        Passed all                   
#> 7 7.3   3.14     25 TRUE        Passed all

The x, sd, and n columns are the same as in the input. By default, the number of items composing the mean is assumed to be 1. The main result, consistency, is the GRIMMER consistency of the first three columns.

The reason column says why a set of values was inconsistent. To be GRIMMER-consistent, a value set needs to pass four separate tests: the three GRIMMER tests by Allard (2018) and the more basic GRIM test. Here, the two inconsistent value sets passed GRIM as well as the first two GRIMMER tests, but failed the third one. All consistent value sets are marked as "Passed all" in the reason column.

Scale items

NOTE: Don’t use the items argument. It currently contains a bug that will be fixed in scrutiny’s next CRAN release.

If a mean is composed of multiple items, set the items parameter to that number. Below are hypothetical means of a three-item scale. With the single-item default, half of these are wrongly flagged as inconsistent:

jpap_1 <- tribble(
   ~x,    ~sd,
  "5.90", "2.19",
  "5.71", "1.42",
  "3.50", "1.81",
  "3.82", "2.43",
  "4.61", "1.92",
  "5.24", "2.51",
) %>% 
  mutate(n = 40)

jpap_1 %>% 
  grimmer_map()  # default is wrong here!
#> # A tibble: 6 × 5
#>   x     sd        n consistency reason           
#>   <chr> <chr> <dbl> <lgl>       <chr>            
#> 1 5.90  2.19     40 TRUE        Passed all       
#> 2 5.71  1.42     40 FALSE       GRIM inconsistent
#> 3 3.50  1.81     40 TRUE        Passed all       
#> 4 3.82  2.43     40 TRUE        Passed all       
#> 5 4.61  1.92     40 FALSE       GRIM inconsistent
#> 6 5.24  2.51     40 FALSE       GRIM inconsistent

Yet, all of them are consistent if the correct number of items is stated:

jpap_1 %>% 
  grimmer_map(items = 3)
#> Warning: The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> # A tibble: 6 × 5
#>   x     sd        n consistency reason    
#>   <chr> <chr> <dbl> <lgl>       <chr>     
#> 1 5.90  2.19    120 TRUE        Passed all
#> 2 5.71  1.42    120 TRUE        Passed all
#> 3 3.50  1.81    120 TRUE        Passed all
#> 4 3.82  2.43    120 TRUE        Passed all
#> 5 4.61  1.92    120 TRUE        Passed all
#> 6 5.24  2.51    120 TRUE        Passed all

It is also possible to include an items column in the data frame instead:

jpap_2 <- tribble(
   ~x,     ~sd,    ~items,
  "6.92",  "2.19",  1,
  "3.48",  "1.42",  1,
  "1.59",  "1.81",  2,
  "2.61",  "2.43",  2,
  "4.04",  "1.92",  3,
  "4.50",  "2.51",  3,
) %>% 
  mutate(n = 30)

jpap_2 %>% 
  grimmer_map()
#> Warning: The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> The `items` argument in GRIMMER functions doesn't currently work the way it
#> should.
#> # A tibble: 6 × 5
#>   x     sd        n consistency reason           
#>   <chr> <chr> <dbl> <lgl>       <chr>            
#> 1 6.92  2.19     30 FALSE       GRIM inconsistent
#> 2 3.48  1.42     30 FALSE       GRIM inconsistent
#> 3 1.59  1.81     60 FALSE       GRIM inconsistent
#> 4 2.61  2.43     60 FALSE       GRIM inconsistent
#> 5 4.04  1.92     90 TRUE        Passed all       
#> 6 4.50  2.51     90 TRUE        Passed all
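To see why the item count matters for the GRIM part of the test, here is a bare-bones sketch in base R. It is not scrutiny's implementation: grim_sketch() is a hypothetical helper written for this illustration, it only covers rounding up or down from 5, and it leaves out the three GRIMMER tests proper.

```r
# Hypothetical helper, not part of scrutiny: a bare-bones GRIM check.
# `x` is the reported mean as a string, `n` the sample size,
# and `items` the number of scale items composing the mean.
grim_sketch <- function(x, n, items = 1) {
  digits <- nchar(sub("^[^.]*\\.?", "", x))  # reported decimal places
  x_num <- as.numeric(x)
  n_eff <- n * items                          # granularity is 1 / (n * items)
  # Integer sums of scores closest to the reported mean
  sums <- c(floor(x_num * n_eff), ceiling(x_num * n_eff))
  scaled <- sums / n_eff * 10^digits
  eps <- 1e-9                                 # guards against floating-point noise
  # Round each candidate mean both up and down from 5
  rounded <- c(floor(scaled + 0.5 + eps), ceiling(scaled - 0.5 - eps)) / 10^digits
  any(abs(rounded - x_num) < eps)
}

grim_sketch("5.71", n = 40)             # FALSE
grim_sketch("5.71", n = 40, items = 3)  # TRUE
```

With 40 participants and a single item, the closest possible means are 228 / 40 = 5.70 and 229 / 40 = 5.725, so 5.71 cannot occur. With three items, the sum is divided by 120, and 685 / 120 rounds to 5.71.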

Rounding

The scrutiny package provides infrastructure for reconstructing rounded numbers. All of that can be commanded from within grimmer() and grimmer_map(). Several parameters allow for stating the precise way in which the original numbers have supposedly been rounded.

First and foremost is rounding. It takes a string with the rounding procedure’s name, which leads to the number being rounded in either of these ways:

  1. Rounded "up" or "down" from 5. Note that SAS, SPSS, Stata, Matlab, and Excel round "up" from 5, whereas Python used to round "down" from 5.
  2. Rounded to "even" using base R’s own round().
  3. Rounded "up_from" or "down_from" some number, which then needs to be specified via the threshold parameter.
  4. Given a "ceiling" or "floor" at the respective decimal place.
  5. Rounded towards zero with "trunc" or away from zero with "anti_trunc".

The default, "up_or_down", allows for numbers rounded either "up" or "down" from 5 when GRIMMER-testing; and likewise for "up_from_or_down_from" and "ceiling_or_floor". For more about these procedures, see documentation for round(), round_up(), and round_ceiling(). These include all of the above ways of rounding.

Points 3 to 5 above list some quite obscure options that were only included to cover a wide spectrum of possible rounding procedures. The same is true for the threshold and symmetric parameters, so these aren’t discussed here any further. Learn more about scrutiny’s infrastructure for rounding at vignette("rounding").

If you have reason to impose stricter assumptions on the way x and sd were rounded, specify rounding accordingly.

Even so, it can be important to account for the different ways in which numbers might have been rounded, if only to demonstrate that some given results are robust to those variable decisions. This is why the default for rounding is the permissive "up_or_down", which errs on the side of caution.
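The difference between the three most common procedures can be shown in a few lines of base R. The two helper functions below are hypothetical and only meant for illustration with non-negative numbers at zero decimal places; scrutiny has its own, more general rounding functions.

```r
# Hypothetical helpers for illustration (non-negative x, zero decimal places)
round_half_up   <- function(x) floor(x + 0.5)
round_half_down <- function(x) ceiling(x - 0.5)

x <- c(0.5, 1.5, 2.5)

round_half_up(x)    # "up" from 5
#> [1] 1 2 3
round_half_down(x)  # "down" from 5
#> [1] 0 1 2
round(x)            # "even": base R rounds ties to the nearest even number
#> [1] 0 2 2
```

In scrutiny itself, the choice is made through the rounding argument, for example rounding = "even" in grimmer() or grimmer_map().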

Summarizing results with audit()

Following up on a call to grimmer_map(), the generic function audit() summarizes GRIMMER test results:

flying_pigs1 %>% 
  grimmer_map() %>% 
  audit()
#> # A tibble: 1 × 7
#>   incons_cases all_cases incons_rate fail_grim fail_test1 fail_test2 fail_test3
#>          <int>     <int>       <dbl>     <int>      <int>      <int>      <int>
#> 1            2         7       0.286         0          0          0          2

These columns are:

  1. incons_cases: number of GRIMMER-inconsistent value sets.

  2. all_cases: total number of value sets.

  3. incons_rate: proportion of GRIMMER-inconsistent value sets.

  4. fail_grim, fail_test1, fail_test2, fail_test3: number of value sets failing the GRIM test or one of the three GRIMMER tests, respectively.
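The first three columns can be reproduced by hand. This base R sketch copies the consistency values from the grimmer_map() output for flying_pigs1 shown earlier; it is only meant to make the arithmetic explicit, not to mirror how audit() works internally.

```r
# Consistency results for the seven value sets in flying_pigs1,
# copied from the grimmer_map() output shown earlier
consistency <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

incons_cases <- sum(!consistency)         # 2
all_cases    <- length(consistency)       # 7
incons_rate  <- incons_cases / all_cases  # 2 / 7

round(incons_rate, 3)
#> [1] 0.286
```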

Visualizing results with grim_plot()

GRIMMER does not currently have a dedicated visualization function in scrutiny. However, grim_plot() will accept the output of grimmer_map() just as well as that from grim_map():

jpap_5 <- tribble(
  ~x,      ~sd,    ~n,
  "7.19",  "1.19",  54,
  "4.56",  "2.56",  66,
  "0.42",  "1.29",  59,
  "1.31",  "3.50",  57,
  "3.48",  "3.65",  66,
  "4.27",  "2.86",  61,
  "6.21",  "2.15",  62,
  "3.11",  "3.17",  50,
  "5.39",  "2.37",  68,
  "5.66",  "1.11",  44,
)


jpap_5 %>% 
  grimmer_map() %>% 
  grim_plot()
#> → Also visualizing 3 GRIMMER inconsistencies.

However, grim_plot() will fail with any object not returned by either of these two functions:

grim_plot(mtcars)
#> Error in `grim_plot()`:
#> ! `grim_plot()` needs GRIM or GRIMMER test results.
#> ✖ `data` is not `grim_map()` or `grimmer_map()` output.
#> ℹ The only exception is an "empty" plot that shows the background raster but no
#>   empirical test results. Create such a plot by setting `show_data` to `FALSE`.

See the GRIM vignette section on grim_plot() for more information.

Testing numeric sequences with grimmer_map_seq()

GRIMMER analysts might be interested in a mean or percentage value’s numeric neighborhood. Suppose you found multiple GRIMMER inconsistencies as in our example pigs5 data. You might wonder whether they are due to small reporting or computing errors.

Use grimmer_map_seq() to GRIMMER-test the values surrounding the reported means, SDs, and sample sizes:

out_seq1 <- grimmer_map_seq(pigs5)
out_seq1
#> # A tibble: 180 × 7
#>    x     sd        n consistency reason             case var  
#>    <chr> <chr> <dbl> <lgl>       <chr>             <int> <chr>
#>  1 7.17  5.30     38 FALSE       GRIM inconsistent     1 x    
#>  2 7.18  5.30     38 TRUE        Passed all            1 x    
#>  3 7.19  5.30     38 FALSE       GRIM inconsistent     1 x    
#>  4 7.20  5.30     38 FALSE       GRIM inconsistent     1 x    
#>  5 7.21  5.30     38 TRUE        Passed all            1 x    
#>  6 7.23  5.30     38 FALSE       GRIM inconsistent     1 x    
#>  7 7.24  5.30     38 TRUE        Passed all            1 x    
#>  8 7.25  5.30     38 FALSE       GRIM inconsistent     1 x    
#>  9 7.26  5.30     38 TRUE        Passed all            1 x    
#> 10 7.27  5.30     38 FALSE       GRIM inconsistent     1 x    
#> # ℹ 170 more rows
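The first ten rows above vary the reported mean 7.22 by up to five steps at its last decimal place. Such a neighborhood could be generated along the following lines. This is a base R sketch with a hypothetical helper, neighbors(), and not the way scrutiny builds its sequences internally:

```r
# Hypothetical helper, not scrutiny's implementation:
# list the values `dispersion` steps below and above a reported number
neighbors <- function(x, dispersion = 1:5) {
  digits <- nchar(sub("^[^.]*\\.?", "", x))  # reported decimal places
  step <- 10^(-digits)                        # one unit at the last decimal place
  values <- as.numeric(x) + c(-rev(dispersion), dispersion) * step
  formatC(values, format = "f", digits = digits)  # keep trailing zeros
}

neighbors("7.22")
#>  [1] "7.17" "7.18" "7.19" "7.20" "7.21" "7.23" "7.24" "7.25" "7.26" "7.27"
```

Each of these values is then tested together with the reported sd and n. The same is done for the other variables in turn.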

Summaries with audit_seq()

As this output is a little unwieldy, run audit_seq() on the results:

audit_seq(out_seq1)
#> # A tibble: 6 × 17
#>   x     sd        n consistency hits_total hits_x hits_sd hits_n diff_x
#>   <chr> <chr> <dbl> <lgl>            <int>  <int>   <int>  <int>  <dbl>
#> 1 7.22  5.30     38 FALSE                8      4       0      4      1
#> 2 5.23  2.55     35 FALSE               16      2      10      4      3
#> 3 2.57  2.57     30 FALSE               11      1       8      2      3
#> 4 6.77  2.18     33 FALSE                6      4       0      2      1
#> 5 7.01  6.68     35 FALSE                4      4       0      0      1
#> 6 3.14  5.32     33 FALSE                9      4       0      5      1
#> # ℹ 8 more variables: diff_x_up <dbl>, diff_x_down <dbl>, diff_sd <dbl>,
#> #   diff_sd_up <dbl>, diff_sd_down <dbl>, diff_n <dbl>, diff_n_up <dbl>,
#> #   diff_n_down <dbl>

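Here is what the output columns mean:

  1. x, sd, and n are the original inputs, tested for consistency here.

  2. hits_total is the number of GRIMMER-consistent value sets found within the specified dispersion range. hits_x, hits_sd, and hits_n break this count down by the variable that was varied.

  3. diff_x reports the distance between x and the nearest consistent dispersed value, counted in dispersion steps rather than as the actual numeric difference. diff_x_up and diff_x_down report the distance to the nearest higher or lower consistent value, respectively.

  4. diff_sd and diff_n, together with their _up and _down variants, do the same for sd and n.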

The default for dispersion is 1:5, for five steps up and down. When the dispersion sequence gets longer, the number of hits tends to increase:

out_seq2 <- grimmer_map_seq(pigs5, dispersion = 1:10)
audit_seq(out_seq2)
#> # A tibble: 6 × 17
#>   x     sd        n consistency hits_total hits_x hits_sd hits_n diff_x
#>   <chr> <chr> <dbl> <lgl>            <int>  <int>   <int>  <int>  <dbl>
#> 1 7.22  5.30     38 FALSE               15      8       0      7      1
#> 2 5.23  2.55     35 FALSE               32      6      19      7      3
#> 3 2.57  2.57     30 FALSE               24      3      16      5      3
#> 4 6.77  2.18     33 FALSE               11      7       0      4      1
#> 5 7.01  6.68     35 FALSE                8      8       0      0      1
#> 6 3.14  5.32     33 FALSE               14      7       0      7      1
#> # ℹ 8 more variables: diff_x_up <dbl>, diff_x_down <dbl>, diff_sd <dbl>,
#> #   diff_sd_up <dbl>, diff_sd_down <dbl>, diff_n <dbl>, diff_n_up <dbl>,
#> #   diff_n_down <dbl>

Visualizing GRIMMER-tested sequences

It’s curious what happens when we plot the output of grimmer_map_seq(). Like regular GRIM or GRIMMER plots, however, it does give us a sense of how many tested values are consistent:

grim_plot(out_seq1)
#> → Also visualizing 4 GRIMMER inconsistencies.

The crosses appear because grimmer_map_seq() creates sequences around both x and n. Restrict this process to any one of these with the var argument:

out_seq1_only_x <- grimmer_map_seq(pigs5, var = "x")
out_seq1_only_n <- grimmer_map_seq(pigs5, var = "n")

grim_plot(out_seq1_only_x)
#> → Also visualizing 1 GRIMMER inconsistency.

grim_plot(out_seq1_only_n)
#> → Also visualizing 1 GRIMMER inconsistency.

Handling unknown group sizes with grimmer_map_total_n()

Problems from underreporting

Unfortunately, some studies that report group averages don’t report the corresponding group sizes, only a total sample size. This makes any direct GRIMMER-testing impossible because only the x and sd values are known, not the n values. All that is feasible here in terms of GRIMMER is to take a number around half the total sample size, go up and down from it, and check which hypothetical group sizes are consistent with the reported group means and SDs. grimmer_map_total_n() semi-automates this process, motivated by a recent GRIM analysis (Bauer and Francis 2021).

Here is an example:

jpap_6 <- tibble::tribble(
    ~x1,    ~x2,    ~sd1,   ~sd2,   ~n,
    "3.43", "5.28", "1.09", "2.12", 70,
    "2.97", "4.42", "0.43", "1.65", 65
)

out_total_n <- grimmer_map_total_n(jpap_6)
out_total_n
#> # A tibble: 48 × 9
#>    x     sd        n n_change consistency both_consistent reason      case dir  
#>    <chr> <chr> <dbl>    <dbl> <lgl>       <lgl>           <chr>      <int> <chr>
#>  1 3.43  1.09     35        0 FALSE       FALSE           GRIMMER i…     1 forth
#>  2 5.28  2.12     35        0 FALSE       FALSE           GRIM inco…     1 forth
#>  3 3.43  1.09     34       -1 FALSE       FALSE           GRIM inco…     1 forth
#>  4 5.28  2.12     36        1 FALSE       FALSE           GRIMMER i…     1 forth
#>  5 3.43  1.09     33       -2 FALSE       FALSE           GRIM inco…     1 forth
#>  6 5.28  2.12     37        2 FALSE       FALSE           GRIM inco…     1 forth
#>  7 3.43  1.09     32       -3 FALSE       FALSE           GRIM inco…     1 forth
#>  8 5.28  2.12     38        3 FALSE       FALSE           GRIM inco…     1 forth
#>  9 3.43  1.09     31       -4 FALSE       FALSE           GRIM inco…     1 forth
#> 10 5.28  2.12     39        4 FALSE       FALSE           GRIMMER i…     1 forth
#> # ℹ 38 more rows

audit_total_n(out_total_n)
#> # A tibble: 2 × 10
#>   x1    x2    sd1   sd2       n hits_total hits_forth hits_back scenarios_total
#>   <chr> <chr> <chr> <chr> <dbl>      <dbl>      <dbl>     <dbl>           <dbl>
#> 1 3.43  5.28  1.09  2.12     70          1          1         0              12
#> 2 2.97  4.42  0.43  1.65     65          1          0         1              12
#> # ℹ 1 more variable: hit_rate <dbl>
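The hypothetical group sizes tested above can be reconstructed with a few lines of base R. This is a sketch under assumptions, not scrutiny's implementation: split_total_n() is a hypothetical helper, and the 0:5 range is inferred from the n_change values in the output.

```r
# Hypothetical helper, not scrutiny's implementation:
# pairs of group sizes that add up to a reported total sample size
split_total_n <- function(n_total, dispersion = 0:5) {
  data.frame(
    n_change = dispersion,
    n_group1 = floor(n_total / 2) - dispersion,
    n_group2 = ceiling(n_total / 2) + dispersion
  )
}

split_total_n(70)
#>   n_change n_group1 n_group2
#> 1        0       35       35
#> 2        1       34       36
#> 3        2       33       37
#> 4        3       32       38
#> 5        4       31       39
#> 6        5       30       40
```

The dir column and the hits_forth and hits_back counts in the output above suggest that each pair is tested in both assignments of means to groups, which would account for the 12 scenarios per case.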

See the GRIM vignette, section Handling unknown group sizes with grim_map_total_n(), for a more comprehensive case study. It uses grim_map_total_n(), which is the same as grimmer_map_total_n() but only for GRIM.

References

Allard, Aurélien. 2018. “Analytic-GRIMMER: A New Way of Testing the Possibility of Standard Deviations.” https://aurelienallard.netlify.app/post/anaytic-grimmer-possibility-standard-deviations/.
Anaya, Jordan. 2016. “The GRIMMER Test: A Method for Testing the Validity of Reported Measures of Variability.”
Bauer, Patricia J., and Gregory Francis. 2021. “Expression of Concern: Is It Light or Dark? Recalling Moral Behavior Changes Perception of Brightness.” Psychological Science 32 (12): 2042–43.
Brown, Nicholas J. L., and James A. J. Heathers. 2017. “The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology.” Social Psychological and Personality Science 8 (4): 363–69.