autoFC: An R Package for Automatic Item Pairing in Forced-Choice Test Construction

Tutorial

Step 1: Input data

In this tutorial, we suppose that 60 5-point Likert items measuring the Big Five traits, each with a certain item location, are used to build an FC scale with block size 3. We also simulate ratings from 1,000 participants on the social desirability of each of the 60 items.

set.seed(2021)
# Simulate 1,000 respondents on 60 items. A more realistic simulation would
# generate responses from specific IRT parameters.
s1 <- sample(1:5, 500 * 60, replace = TRUE,
             prob = c(0.10, 0.15, 0.20, 0.25, 0.30))
s2 <- sample(1:5, 500 * 60, replace = TRUE,
             prob = c(0.50, 0.10, 0.10, 0.15, 0.15))

item_responses <- matrix(c(s1, s2), ncol = 60)

item_dims <- sample(c("Openness","Conscientiousness","Neuroticism",
                      "Extraversion","Agreeableness"), 60, replace = TRUE)
item_mean <- colMeans(item_responses)
item_difficulty <- runif(60, -1, 1)

# Then we build a data frame with item characteristics
item_chars <- data.frame(DIM = item_dims, SD_Mean = item_mean, DIFF = item_difficulty)

# Weights for DIM, SD_Mean, and DIFF, in the column order of item_chars
char_weights <- c(1, -1, -3)
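
The comment in the code above notes that a more realistic simulation would generate responses from IRT parameters. A minimal sketch of that idea, assuming a graded response model, could look like the following. The sample sizes, the parameter distributions, and the helper name sim_grm_item are illustrative assumptions and are not part of autoFC.

```r
set.seed(1)

n_person <- 200   # illustrative sizes, smaller than in the tutorial
n_item   <- 6
n_cat    <- 5     # 5-point Likert scale

theta <- rnorm(n_person)                 # latent trait scores
a     <- runif(n_item, 0.8, 2)           # assumed discrimination parameters
# One row of ordered thresholds per item (n_item x (n_cat - 1))
b     <- t(replicate(n_item, sort(rnorm(n_cat - 1))))

# Hypothetical helper: draw graded responses for one item.
sim_grm_item <- function(theta, a, b) {
  # Column k holds P(X >= k + 1 | theta) under the graded response model
  p_ge <- sapply(b, function(bk) plogis(a * (theta - bk)))
  u <- runif(length(theta))
  # The response is 1 plus the number of thresholds that were passed
  1 + rowSums(u < p_ge)
}

grm_responses <- sapply(seq_len(n_item),
                        function(j) sim_grm_item(theta, a[j], b[j, ]))
dim(grm_responses)
```

A matrix produced this way could take the place of item_responses, with rows as respondents and columns as items.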

Step 2: Construct an initial solution

Next, we build a random FC scale using the 60 items with block size 3. You can see from initial_FC that now all 60 items are divided into 20 triplets.

initial_FC <- make_random_block(total_items = 60, item_per_block = 3)
knitr::kable(initial_FC)

59	14	43
42	16	34
41	60	22
26	19	2
28	10	45
12	52	8
35	1	51
23	49	33
40	7	57
55	54	18
38	5	4
31	11	39
13	3	6
32	58	48
15	29	47
27	20	56
24	9	30
44	21	46
53	17	25
50	36	37

Let’s also see what the item characteristics look like for each of the 20 triplets.

First, the underlying latent traits. We see that there are some cases where two items measuring the same trait appear in the same block, which is something we want to avoid.

knitr::kable(matrix(item_chars$DIM[t(initial_FC)], ncol = 3, byrow = TRUE))

Conscientiousness	Conscientiousness	Neuroticism
Extraversion	Openness	Agreeableness
Openness	Conscientiousness	Openness
Conscientiousness	Extraversion	Openness
Neuroticism	Extraversion	Extraversion
Openness	Openness	Conscientiousness
Neuroticism	Openness	Agreeableness
Openness	Extraversion	Extraversion
Extraversion	Conscientiousness	Extraversion
Openness	Agreeableness	Extraversion
Neuroticism	Neuroticism	Neuroticism
Extraversion	Conscientiousness	Openness
Neuroticism	Conscientiousness	Neuroticism
Extraversion	Extraversion	Neuroticism
Openness	Openness	Neuroticism
Neuroticism	Conscientiousness	Conscientiousness
Neuroticism	Openness	Neuroticism
Agreeableness	Neuroticism	Openness
Conscientiousness	Conscientiousness	Agreeableness
Neuroticism	Extraversion	Openness
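
Scanning the table by eye gets tedious for longer scales. As a rough sketch, blocks with a repeated trait can be flagged with base R. The matrix below is the first five rows of the table above, typed in by hand so that the example runs on its own.

```r
# First five rows of the trait table printed above, entered by hand.
traits <- matrix(c(
  "Conscientiousness", "Conscientiousness", "Neuroticism",
  "Extraversion",      "Openness",          "Agreeableness",
  "Openness",          "Conscientiousness", "Openness",
  "Conscientiousness", "Extraversion",      "Openness",
  "Neuroticism",       "Extraversion",      "Extraversion"
), ncol = 3, byrow = TRUE)

# anyDuplicated() returns 0 when all entries in a row are distinct.
has_repeat <- apply(traits, 1, anyDuplicated) > 0

which(has_repeat)   # row numbers of blocks with a repeated trait
sum(has_repeat)     # how many such blocks there are
```

With the tutorial objects, the same check can be applied to matrix(item_chars$DIM[t(initial_FC)], ncol = 3, byrow = TRUE).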

Then, scores on social desirability. In many blocks, items differ in social desirability by more than 1 point on the 5-point scale. That’s not good.

sd_initial <- matrix(item_chars$SD_Mean[t(initial_FC)], ncol = 3, byrow = TRUE)
knitr::kable(sd_initial)

2.327	3.549	2.257
2.269	3.417	2.410
2.331	2.386	3.549
3.575	3.481	3.492
3.628	3.469	2.430
3.470	2.326	3.418
2.306	3.518	2.299
3.455	2.367	2.382
2.372	3.429	2.291
2.383	2.275	3.501
2.346	3.499	3.493
2.371	3.476	2.381
3.429	3.540	3.478
2.375	2.383	2.283
3.551	3.484	2.367
3.458	3.521	2.296
3.515	3.474	3.524
2.275	3.501	2.408
2.354	3.526	3.449
2.314	2.294	2.344

Lastly, item difficulty. Item difficulties within a block are inconsistent as well.

diff_initial <- matrix(item_chars$DIFF[t(initial_FC)], ncol = 3, byrow = TRUE)
knitr::kable(diff_initial)

0.5768038	-0.1649317	-0.1609612
0.7562551	0.3665809	-0.8802425
-0.3477869	0.4033506	0.9447096
0.8602120	-0.5168748	-0.1734361
0.3712718	-0.6267511	0.0917744
-0.9688906	0.9139843	0.3986206
0.2507761	-0.0490111	-0.5887666
-0.1364442	0.7372855	0.2383125
0.6598719	0.5910397	0.3847467
0.1906729	-0.1072381	-0.2764722
0.8652686	0.3189794	-0.7635046
-0.8770999	0.5814682	0.7113918
0.9814553	-0.0455446	0.6105311
-0.7874764	-0.1775934	-0.2743858
0.5481408	-0.1896666	-0.7339903
-0.7614264	0.6233055	0.2106074
-0.4487723	0.9388348	-0.7846000
0.8782261	-0.9898177	0.0105288
0.4304538	0.3800486	-0.4484543
0.9194196	0.8647261	-0.0951617

Step 3: Calculate the energy for the initial FC scale, without and with IIAs

Next we calculate the energy for initial_FC, with FUN left at its default. The weights are set to -1 for social desirability and -3 for item difficulty because we want the discrepancy in these characteristics to be as low as possible within a block.

The weight for item difficulty is larger in magnitude to compensate for its smaller range relative to social desirability. Be aware of scaling differences among item characteristics and choose weights accordingly.

cal_block_energy(block = initial_FC, item_chars = item_chars, weights = char_weights)
#> [1] -25.28441
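
To make the energy value less of a black box, here is a minimal base-R sketch of one way such a quantity can be computed: for each block, apply a function to each item characteristic (variance for numeric columns, a 0/1 "all distinct" indicator for trait labels), multiply by the weight, and sum over blocks. The helper names all_distinct and energy_sketch and the toy data are hypothetical. This is a reading consistent with the numbers printed in this tutorial, not the package's actual implementation.

```r
# Hypothetical helper: 1 if all labels in a block differ, 0 otherwise.
all_distinct <- function(x) as.numeric(length(unique(x)) == length(x))

# Hypothetical sketch of a weighted block energy (not the autoFC source).
energy_sketch <- function(block, chars, weights) {
  total <- 0
  for (b in seq_len(nrow(block))) {
    items <- block[b, ]
    for (j in seq_along(weights)) {
      values <- chars[[j]][items]
      score <- if (is.numeric(values)) var(values) else all_distinct(values)
      total <- total + weights[j] * score
    }
  }
  total
}

# Toy data: 6 items arranged in 2 triplets.
toy_chars <- data.frame(
  DIM     = c("A", "B", "C", "A", "A", "B"),
  SD_Mean = c(2, 3, 4, 1, 1, 1),
  DIFF    = c(0, 0, 0, -1, 0, 1)
)
toy_block <- matrix(1:6, ncol = 3, byrow = TRUE)

# Block 1: 1 * 1 + (-1) * 1 + (-3) * 0 =  0
# Block 2: 1 * 0 + (-1) * 0 + (-3) * 1 = -3
toy_energy <- energy_sketch(toy_block, toy_chars, weights = c(1, -1, -3))
toy_energy
```

The same arithmetic appears consistent with the value printed above: 7 of the 20 initial blocks contain three distinct traits, and with the average within-block variances reported in Step 5 (0.3317092 for social desirability and 0.4275037 for item difficulty), 7 - 20 × 0.3317092 - 3 × 20 × 0.4275037 comes to about -25.28441.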

If IIAs are included, the energy value is lower. This is because these randomly generated responses are unlikely to be consistent with each other, so the IIAs are very low or even negative:

cal_block_energy_with_iia(block = initial_FC, item_chars = item_chars, 
                          weights = char_weights,
                          rater_chars = item_responses)
#>           [,1]
#> [1,] -33.31589

Notice that if we give zero weights to all IIAs we will get the same energy value as cal_block_energy:

cal_block_energy_with_iia(block = initial_FC, item_chars = item_chars, 
                          weights = char_weights,
                          rater_chars = item_responses, 
                          iia_weights = c(0, 0, 0, 0))
#>           [,1]
#> [1,] -25.28441

Also, if you want to see the inter-item agreement metrics for each block, you can use get_iia(). The values here are unimpressive and are shown for demonstration purposes only; users are encouraged to use real-world response data to examine the IIA within each block.

knitr::kable(get_iia(block = initial_FC, data = item_responses))

BPlin	BPquad	AClin	ACquad
-0.17500	-0.38133	-0.07767	-0.17000
-0.11958	-0.27900	-0.03922	-0.10768
-0.17125	-0.35733	-0.09333	-0.18799
0.09583	0.14883	0.13905	0.22648
-0.10458	-0.20183	-0.08040	-0.15034
-0.09542	-0.18250	-0.08069	-0.15112
-0.15000	-0.31867	-0.06953	-0.14620
-0.14583	-0.30900	-0.08054	-0.16786
-0.13167	-0.29967	-0.04673	-0.11820
-0.14208	-0.33417	-0.05416	-0.14343
-0.11500	-0.21550	-0.09354	-0.16960
-0.18083	-0.36500	-0.09805	-0.18615
0.07375	0.10000	0.11266	0.17256
-0.03125	-0.19833	0.14874	0.15781
-0.12875	-0.26000	-0.09955	-0.19646
-0.12708	-0.22517	-0.11082	-0.19031
0.11208	0.16633	0.15302	0.23982
-0.15458	-0.32067	-0.07820	-0.15677
-0.09208	-0.18267	-0.07283	-0.14170
-0.05833	-0.26133	0.14720	0.14891

Step 4: Automatic pairing

To produce an optimized paired FC scale, our objectives are:
* Keeping items in the same block from different latent traits;
* Minimizing variance of social desirability within each block;
* Minimizing variance of item difficulty within each block.

For IIAs, we also want to maximize the mean of the four IIAs within each block.

Explanation for arguments

Below is an example run of producing an automatically paired FC. Arguments that may be of interest for users include:

block: The initial paired FC scale, which can be produced in Step 2. If left empty, an FC scale with block size 2 and items presented sequentially will be produced, with the total number of items equal to the number of rows in item_chars.

total_items: Defaults to the number of unique values in block. Can be set to a larger value, which represents cases where only some items in the item pool are used to build the FC scale.

Temperature: The initial temperature value of the automatic pairing method. Higher temperature is associated with higher probability of accepting a worse solution. It is recommended to leave this value blank and let it be scaled on the energy of block by specifying eta_Temperature.

r: Determines the decrease rate of Temperature. Should be a value between 0 and 1. Larger r values allow more iterations in the optimization process but will slow down the program.

end_criteria: Determines the end criteria for the automatic pairing process. A proportion value scaled on Temperature. Should be a value between 0 and 1. Smaller values allow more iterations in the optimization process but will slow down the program.

item_chars: A data frame with item characteristics for all items. It is recommended that information irrelevant to pairing be discarded beforehand, but users can also set the corresponding position in weights to 0 to bypass these irrelevant item characteristics (such as item ID).

FUN: A vector of function names for optimizing each item characteristic within each block. For example: FUN = c('mean', 'var', 'sum'). Also supports customized functions. Defaults to var for numeric variables and facfun for factor/character variables.

n_exchange: Determines how many blocks are exchanged in order to produce a new solution for each iteration. Should be a value less than nrow(block).

weights: A vector of integer values indicating the relative weight of each item characteristic after it has been computed by FUN. Defaults to a vector of all 1s.

prob_newitem: Probability of choosing the strategy of picking a new item, when not all candidate items are used to build the FC scale.
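
The roles of Temperature, r, and end_criteria are easier to see in a small simulated-annealing sketch. The code below uses the standard Metropolis acceptance rule and geometric cooling; whether sa_pairing_generalized() follows exactly this rule and loop is an assumption here, and the helper names accept_prob and count_steps are hypothetical.

```r
# Probability of accepting a candidate solution when energy is maximized.
# Better solutions are always accepted; worse ones are accepted more often
# at high temperature.
accept_prob <- function(energy_new, energy_old, temperature) {
  if (energy_new >= energy_old) 1
  else exp((energy_new - energy_old) / temperature)
}

accept_prob(-25, -24, temperature = 2)     # worse solution, high temperature
accept_prob(-25, -24, temperature = 0.1)   # same move, low temperature

# Count temperature updates under geometric cooling: the temperature is
# multiplied by r until it falls below end_criteria times its initial value.
count_steps <- function(r, end_criteria, t0 = 1) {
  temperature <- t0
  steps <- 0
  while (temperature > end_criteria * t0) {
    temperature <- temperature * r
    steps <- steps + 1
  }
  steps
}

count_steps(r = 0.995, end_criteria = 1e-6)
count_steps(r = 0.99,  end_criteria = 1e-6)   # faster cooling, fewer steps
```

Under this scheme the number of temperature updates depends only on r and end_criteria, which is why a larger r or a smaller end_criteria lengthens the run.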

Explanation for arguments (If IIAs are of concern)

If you wish to use IIAs as pairing criterion, here are some arguments that might be useful. Note that rater_chars and iia_weights are ignored when use_IIA is FALSE.

use_IIA: Logical. Indicates whether IIA metrics are used as matching criteria.

rater_chars: Item responses for all items by a certain number of participants.

iia_weights: A vector of length 4 indicating weights given for the 4 IIA metrics, including linearly and quadratically weighted AC (Gwet, 2008; 2014) and Brennan-Prediger Index (Brennan & Prediger, 1981; Gwet, 2014). Defaults to a vector of all 1s.

# Note that this will take some time to run! (~ 1-2 minutes with this setting)

# Weights for social desirability score and item difficulty are negative (-1 and -3),
# because we want the within-block variance of these characteristics to be small.
result <- sa_pairing_generalized(block = initial_FC, eta_Temperature = 0.01,
                                 r = 0.995, end_criteria = 10^(-6), 
                                 weights = char_weights,
                                 item_chars = item_chars, use_IIA = TRUE,
                                 rater_chars = item_responses)

Step 5: See how this improves over the initial one

Finally, let’s see how this pairing method improves from the initial solution!

Let’s first compare the total energy with the initial one. First is the initial energy, which is identical to what we calculated in Step 3.

# Initial energy with IIA
cal_block_energy_with_iia(block = result$block_initial, item_chars = item_chars, 
                          weights = char_weights, rater_chars = item_responses)
#>           [,1]
#> [1,] -33.31589

# Alternative way to calculate initial energy
print(result$energy_initial)
#> [1] -33.31589

And the final result:

# Final energy with IIA
cal_block_energy_with_iia(block = result$block_final, item_chars = item_chars, 
                          weights = char_weights, rater_chars = item_responses)
#>          [,1]
#> [1,] 21.69018

# Alternative way to calculate final energy
print(result$energy_final)
#> [1] 21.69018

Let’s take a look at how items are matched within each block. First are underlying latent traits.

This time, within each block, the three items come from three distinct latent traits!

(Note: The method does not guarantee that items will ALWAYS come from different latent traits after pairing. If you want to increase the likelihood of such a result, you can increase the weight corresponding to item dimension.)

knitr::kable(matrix(item_chars$DIM[t(result$block_final)], ncol = 3, byrow = TRUE))

Neuroticism	Openness	Extraversion
Extraversion	Neuroticism	Agreeableness
Openness	Conscientiousness	Neuroticism
Neuroticism	Extraversion	Conscientiousness
Openness	Conscientiousness	Neuroticism
Extraversion	Conscientiousness	Openness
Openness	Extraversion	Neuroticism
Conscientiousness	Openness	Neuroticism
Openness	Agreeableness	Neuroticism
Agreeableness	Extraversion	Neuroticism
Neuroticism	Openness	Extraversion
Conscientiousness	Neuroticism	Openness
Extraversion	Openness	Neuroticism
Conscientiousness	Extraversion	Openness
Agreeableness	Extraversion	Conscientiousness
Neuroticism	Agreeableness	Extraversion
Conscientiousness	Extraversion	Openness
Neuroticism	Openness	Conscientiousness
Extraversion	Openness	Conscientiousness
Neuroticism	Openness	Conscientiousness

Next let’s look at the differences in social desirability within each block. Item social desirability scores are much closer to each other within each block, but there are still big discrepancies in two blocks.

sd_final <- matrix(item_chars$SD_Mean[t(result$block_final)], ncol = 3, byrow = TRUE)
knitr::kable(sd_final)

3.493	3.518	3.481
3.469	3.501	3.449
3.549	3.429	3.478
2.306	2.382	2.354
3.417	3.418	3.499
2.372	2.327	2.383
3.470	2.375	3.458
3.575	3.474	3.429
2.344	2.299	2.283
2.275	2.269	2.314
2.346	2.326	2.294
3.540	3.524	3.492
2.383	2.331	2.257
2.386	2.291	2.408
2.275	2.430	2.296
2.367	2.410	2.371
3.476	2.367	2.381
3.515	3.455	3.526
3.501	3.484	3.549
3.628	3.551	3.521
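
The two discrepant blocks can also be located programmatically. The sketch below copies the table above into a matrix by hand so that it runs on its own, then flags blocks whose within-block range exceeds 1 point. The name sd_table and the 1-point threshold are illustrative choices.

```r
# Values copied from the sd_final table printed above.
sd_table <- matrix(c(
  3.493, 3.518, 3.481,
  3.469, 3.501, 3.449,
  3.549, 3.429, 3.478,
  2.306, 2.382, 2.354,
  3.417, 3.418, 3.499,
  2.372, 2.327, 2.383,
  3.470, 2.375, 3.458,
  3.575, 3.474, 3.429,
  2.344, 2.299, 2.283,
  2.275, 2.269, 2.314,
  2.346, 2.326, 2.294,
  3.540, 3.524, 3.492,
  2.383, 2.331, 2.257,
  2.386, 2.291, 2.408,
  2.275, 2.430, 2.296,
  2.367, 2.410, 2.371,
  3.476, 2.367, 2.381,
  3.515, 3.455, 3.526,
  3.501, 3.484, 3.549,
  3.628, 3.551, 3.521
), ncol = 3, byrow = TRUE)

# Within-block range (max minus min) of social desirability.
block_range <- apply(sd_table, 1, function(x) diff(range(x)))

# Blocks where items differ by more than 1 point.
flagged <- which(block_range > 1)
flagged
```

On these values the flagged rows are blocks 7 and 17, each of which mixes items from the high and low social desirability groups.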

A more intuitive way to present this: how much have we improved on the average variance across all blocks? It is much lower, although there is still room for improvement.

# Initial
print(mean(apply(sd_initial, 1, var)))   
#> [1] 0.3317092

# Final
print(mean(apply(sd_final, 1, var)))     
#> [1] 0.04195382

Finally we look at item difficulty. Good improvement is observed here as well: differences in item difficulty within a block also decrease.

diff_final <- matrix(item_chars$DIFF[t(result$block_final)], ncol = 3, byrow = TRUE)
knitr::kable(diff_final)

-0.7635046	-0.0490111	-0.5168748
-0.6267511	-0.9898177	-0.4484543
0.9447096	0.5910397	0.6105311
0.2507761	0.2383125	0.4304538
0.3665809	0.3986206	0.3189794
0.6598719	0.5768038	0.1906729
-0.9688906	-0.7874764	-0.7614264
0.8602120	0.9388348	0.9814553
-0.0951617	-0.5887666	-0.2743858
0.8782261	0.7562551	0.9194196
0.8652686	0.9139843	0.8647261
-0.0455446	-0.7846000	-0.1734361
-0.1775934	-0.3477869	-0.1609612
0.4033506	0.3847467	0.0105288
-0.1072381	0.0917744	0.2106074
-0.7339903	-0.8802425	-0.8770999
0.5814682	0.7372855	0.7113918
-0.4487723	-0.1364442	0.3800486
-0.2764722	-0.1896666	-0.1649317
0.3712718	0.5481408	0.6233055

How much have we improved on the average variance for all blocks in this case?

print(mean(apply(diff_initial, 1, var)))   
#> [1] 0.4275037
print(mean(apply(diff_final, 1, var)))     
#> [1] 0.04305659

We also list the IIAs for demonstration purposes; they improve as well.

colMeans(get_iia(result$block_final, data = item_responses))
#>      BPlin     BPquad      AClin     ACquad 
#>  0.0195425 -0.0498910  0.1263995  0.1595815

Step 6: Automatic pairing with a multi-step optimization process

In some cases, users may want to optimize item characteristics sequentially, rather than simultaneously. This makes sense because simultaneous optimization may favor improvement in one characteristic at the cost of a worse fit for another, as we observed in Step 5.

Two approaches can address this problem:

1. Pay careful attention to the distribution of each item characteristic and try out different weights for characteristics with different scales. Alternatively, try a smaller end_criteria or larger r and n_exchange values to allow more iterations to be run;
2. Use a multi-step optimization process, where some item characteristics are optimized first, then others. This involves running sa_pairing_generalized() several times, each time optimizing more item characteristics. Characteristics that have already been optimized keep their weights in later stages, while those not yet optimized have weights of 0.

With the previous example, we show how method 2 will work, starting from initial_FC. First we perform optimization on latent traits:

FC_1 <- sa_pairing_generalized(initial_FC, eta_Temperature = 0.01,
                               r = 0.995, end_criteria = 10^(-6), 
                               weights = c(1, 0, 0),
                               item_chars = item_chars, use_IIA = TRUE,
                               rater_chars = item_responses)

Then, we optimize based on minimizing variance in social desirability within a block.

FC_2 <- sa_pairing_generalized(FC_1$block_final, eta_Temperature = 0.01,
                               r = 0.995, end_criteria = 10^(-6), 
                               weights = c(1, -1, 0),
                               item_chars = item_chars, use_IIA = TRUE,
                               rater_chars = item_responses)

Finally, we optimize based on minimizing variance in item difficulty.

FC_3 <- sa_pairing_generalized(FC_2$block_final, eta_Temperature = 0.01,
                               r = 0.995, end_criteria = 10^(-6), 
                               weights = c(1, -1, -3),
                               item_chars = item_chars, use_IIA = TRUE,
                               rater_chars = item_responses)
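
The three weight vectors used above follow a simple pattern: stage k keeps the first k weights and sets the rest to 0. As a small sketch, the schedule can be generated from the full weight vector instead of being typed by hand. The names full_weights and stage_weights are illustrative.

```r
full_weights <- c(1, -1, -3)

# Stage k keeps the first k weights and zeroes out the remaining ones.
stage_weights <- lapply(seq_along(full_weights), function(k) {
  replace(full_weights, seq_along(full_weights) > k, 0)
})

stage_weights

# Each stage would then be passed to sa_pairing_generalized() in turn,
# starting from the block_final of the previous stage, as in the calls above.
```

Generating the schedule this way keeps the stages consistent if the full weights are changed later.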

Step 7: See how a multi-step iteration improves

First, underlying latent traits. The result looks nearly as good as in Step 5, although one block (row 17 below) still contains two Conscientiousness items.

knitr::kable(matrix(item_chars$DIM[t(FC_3$block_final)], ncol = 3, byrow = TRUE))

Extraversion	Openness	Neuroticism
Agreeableness	Extraversion	Neuroticism
Conscientiousness	Neuroticism	Openness
Extraversion	Openness	Conscientiousness
Agreeableness	Openness	Extraversion
Neuroticism	Openness	Conscientiousness
Neuroticism	Agreeableness	Extraversion
Neuroticism	Openness	Extraversion
Openness	Neuroticism	Extraversion
Openness	Neuroticism	Conscientiousness
Neuroticism	Openness	Agreeableness
Neuroticism	Openness	Conscientiousness
Conscientiousness	Extraversion	Neuroticism
Conscientiousness	Neuroticism	Openness
Extraversion	Conscientiousness	Openness
Extraversion	Agreeableness	Neuroticism
Conscientiousness	Conscientiousness	Openness
Conscientiousness	Openness	Extraversion
Openness	Neuroticism	Extraversion
Conscientiousness	Neuroticism	Extraversion

Next let’s look at the differences in social desirability within each block. The result is better than in Step 5: the large within-block discrepancies are gone!

sd_FC3 <- matrix(item_chars$SD_Mean[t(FC_3$block_final)], ncol = 3, byrow = TRUE)
knitr::kable(sd_FC3)

2.430	2.408	2.257
2.299	2.371	2.283
3.429	3.478	3.549
2.291	2.383	2.296
2.275	2.331	2.383
3.458	3.484	3.549
2.367	2.410	2.375
3.501	3.470	3.469
2.326	2.346	2.367
3.518	3.499	3.526
3.493	3.455	3.449
3.429	3.474	3.575
3.540	3.501	3.515
3.418	3.628	3.417
2.294	2.354	2.381
2.269	2.275	2.314
3.521	3.476	3.551
2.386	2.344	2.382
3.492	3.524	3.481
2.327	2.306	2.372

As before, let’s see how much we have improved on the average variance across all blocks. The improvement in social desirability is confirmed by the decrease in variance.

# Initial solution
print(mean(apply(sd_initial, 1, var)))   
#> [1] 0.3317092

# Simultaneous optimization
print(mean(apply(sd_final, 1, var)))    
#> [1] 0.04195382

# Sequential optimization
print(mean(apply(sd_FC3, 1, var)))       
#> [1] 0.002573133

Finally we look at item difficulty.

diff_fc3 <- matrix(item_chars$DIFF[t(FC_3$block_final)], ncol = 3, byrow = TRUE)
# Print diff_fc3 (not diff_final) to show the sequential solution
knitr::kable(diff_fc3)

Average variance of item difficulty also decreases!

# Initial solution
print(mean(apply(diff_initial, 1, var))) 
#> [1] 0.4275037

# Simultaneous optimization
print(mean(apply(diff_final, 1, var)))   
#> [1] 0.04305659

# Sequential optimization
print(mean(apply(diff_fc3, 1, var)))      
#> [1] 0.04012038
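
To compare the three solutions side by side, the printed averages can be gathered into one small table. The sketch below copies the values printed above by hand and adds the percentage reduction relative to the initial solution. The data frame name avg_var and its column names are illustrative.

```r
# Average within-block variances, copied from the output above.
avg_var <- data.frame(
  solution = c("initial", "simultaneous", "sequential"),
  SD_Mean  = c(0.3317092, 0.04195382, 0.002573133),
  DIFF     = c(0.4275037, 0.04305659, 0.04012038)
)

# Percentage reduction relative to the initial solution.
avg_var$SD_reduction_pct <-
  round(100 * (1 - avg_var$SD_Mean / avg_var$SD_Mean[1]), 1)
avg_var$DIFF_reduction_pct <-
  round(100 * (1 - avg_var$DIFF / avg_var$DIFF[1]), 1)

print(avg_var)
```

On these numbers the sequential run reduces both variances further than the simultaneous run, with by far the larger gain in social desirability.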

How about IIAs?

colMeans(get_iia(FC_3$block_final, data = item_responses))
#>      BPlin     BPquad      AClin     ACquad 
#>  0.0335635 -0.0304330  0.1459820  0.1892760