This vignette shows you how to use Unique Variable Analysis (UVA) in the EGAnet package (Golino & Christensen, 2020). The contents of this vignette are taken directly from Christensen et al. (2020a). Following Christensen et al. (2020b),
UVA provides two approaches for reducing redundancy in data: removing all but one redundant variable or creating latent variables from redundant variables. For the former approach, researchers select one variable from the set of variables determined to be redundant and remove the others from the dataset. As a general heuristic, researchers can compute corrected item-test correlations for the variables in the redundant response set; the variable with the largest correlation is likely the one that best captures the overall essence of the redundant variables (DeVellis, 2017; McDonald, 1999). Other rules of thumb for this approach are to select variables that have the most variance (DeVellis, 2017) and variables that are more general (e.g., “I often express my opinions” is better than “I often express my opinions in meetings” because it does not imply a specific context). For the latter approach, redundant variables can be combined into a reflective latent variable, and latent scores can be estimated to replace the redundant variables. Following recent suggestions, for ordinal data with fewer than six categories, the Weighted Least Squares Mean- and Variance-adjusted (WLSMV) estimator is used; otherwise, if all variables have six or more categories, Maximum Likelihood with Robust standard errors (MLR) is used (Rhemtulla, Brosseau-Liard, & Savalei, 2012). We strongly recommend the latent variable approach because it minimizes measurement error and retains all possible information available in the data.
Before digging into UVA, the data should be loaded from the psychTools package (Revelle, 2019) in R.
```r
# Download latest EGAnet package
devtools::install_github("hfgolino/EGAnet", dependencies = c("Imports", "Suggests"))

# Load packages
library(psychTools)
library(EGAnet)

# Set seed for reproducibility
set.seed(6724)

# Load SAPA data
# Select Five Factor Model personality items only
idx <- na.omit(match(gsub("-", "", unlist(spi.keys[1:5])), colnames(spi)))
items <- spi[, idx]

# Obtain item descriptions for UVA
key.ind <- match(colnames(items), as.character(spi.dictionary$item_id))
key <- as.character(spi.dictionary$item[key.ind])
```
The code above installs the latest EGAnet package, loads EGAnet and psychTools, sets a seed for random number generation, and obtains the 70 SAPA items that correspond to the five-factor model of personality, as well as their respective item descriptions. The item descriptions are optional but make it more convenient to decide which items are redundant (see Figure 2).
Moving forward with the application of UVA, we start by evaluating the dimensional structure of the SAPA inventory without reducing redundancy. The following code can be run:
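A call along the following lines performs that step; this is a minimal sketch in which the object name `ega.orig` is our own placeholder (the `items` object comes from the setup code above, and the `EGA` arguments mirror those used later in this vignette).

```r
# EGA on the full 70 items, without any redundancy reduction;
# "ega.orig" is a placeholder name for this sketch
ega.orig <- EGA(items, algorithm = "louvain")

# Inspect the estimated number of dimensions
ega.orig$n.dim
```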
Without performing UVA, EGA estimates that there are seven factors. Notably, EGA identified a couple of small factors: Factor 2 and Factor 7 (see Figure 1). Investigating the item descriptions for these two factors, it seems likely that they represent minor factors composed of redundant variables: Factor 2 (“Believe that people are basically moral,” “Believe that others have good intentions,” “Trust people to mainly tell the truth,” “Trust what people say,” and “Feel that most people can’t be trusted”) and Factor 7 (“Enjoy being thought of as a normal mainstream person,” “Rebel against authority,” “Believe that laws should be strictly enforced,” and “Try to follow the rules”). The divergence from the traditional five-factor structure is likely due to these (and other) redundancies.1
To handle the redundancy in the scale, we can now use the UVA function.
There are a few arguments worth noting. First, method changes the association method used; by default, the weighted topological overlap method ("wTO") is applied. Second, type changes the significance type used; by default, adaptive alpha is used. Third, the key argument accepts item descriptions that map to the variables in the data argument. The reduce argument, which defaults to TRUE, controls whether the reduction process should occur. The reduce.method argument determines whether the reduction process creates latent variables from redundant items ("latent") or removes all but one of the redundant items ("remove"); reduce.method defaults to "latent" (to continue with the "remove" tutorial, skip to the next section). Finally, adhoc performs an ad hoc redundancy check using the weighted topological overlap method with a fixed threshold. This check determines whether redundancies might still exist in the data.
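Putting the arguments just described together, a UVA call might look as follows. This is a sketch based on the description above: the object name sapa.ra matches the code later in this vignette, items and key come from the setup code, and any arguments not shown are left at their defaults.

```r
# UVA with the latent variable approach
# (argument values follow the description above; unspecified
#  arguments are left at their defaults)
sapa.ra <- UVA(data = items, method = "wTO", key = key,
               reduce = TRUE, reduce.method = "latent",
               adhoc = TRUE)
```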
Next, we’ll walk through the reduction process. After running the code above, the R console will output a target variable with a list of potential redundant variables (Figure 2) and an associated “redundancy chain” plot (see Figure 3).
In Figure 2, the potential redundant variables are listed below the target variable. Some of the potential redundant variables were directly identified as redundant with the target variable, while others were indirectly redundant, meaning that they were redundant with one (or more) of the variables directly identified as redundant with the target variable but were not themselves identified as redundant with the target variable. In this way, there is a so-called “redundancy chain.” Figure 3 provides a more intuitive depiction of this notion.
In the redundancy chain plot, each node represents a variable, with label and color denoting the target variable (“Target” and blue, respectively) and potential redundancies (corresponding numbers and red, respectively). The connections between the nodes represent regularized partial correlations, with the thickness of an edge denoting its magnitude. The presence of an edge indicates that two variables were identified as redundant, rather than representing an actual network of associations. The interpretation of this plot is that the target variable was identified with potential redundancy variables 1, 2, 3, and 4. Potential redundancy variable 5 was not redundant with the target variable, but it was redundant with potential redundancy variable 4 (hence the “chain” of redundancy). When consulting the redundancy chain plot, researchers should pay particular attention to cliques, or fully connected sets of nodes. In Figure 3, there are two 3-cliques (or triangles) involving the target variable (i.e., Target – 1 – 2 and Target – 1 – 3).
In a typical psychometric network, these triangles contribute to a measure known as the clustering coefficient or the extent to which a node’s neighbors are connected to each other. Based on this statistical definition, the clustering coefficient has recently been considered as a measure of redundancy in psychological networks (Costantini et al., 2019; Dinić, Wertag, Tomašević, & Sokolovska, 2020). In this same sense, these triangles suggest that these variables are likely redundant. Therefore, triangles in these redundancy chain plots can be used as a heuristic to identify redundancies.
In our example, we selected these variables as redundant by inputting their numbers into the R console with commas separating them (i.e.,
1, 2, 3). After pressing
ENTER, a new latent variable is created from these variables and a prompt appears to label it with a new name (e.g.,
'Original ideation'). Finally, a message will appear confirming the creation of a latent variable and removal of the redundant variables from the dataset.
For the second target variable (Figure 4), “Trust what people say,” we combined it with all the other possible redundant items (i.e.,
1, 2, 3, 4). Notably, there was one item that was reverse keyed, “Feel that most people cant be trusted,” which was negatively correlated with the latent variable. Because there was an item negatively correlated with the latent variable, a secondary prompt appears asking to reverse code the latent variable so that the label can go in the desired direction. In review of the correlations of the variables with the latent variable, we can see that the latent variable is positively keyed already; therefore, we entered
n and labeled the component. If, however, the signs of the correlations were the inverse, then
y could be entered, which would reverse the meaning of the latent variable towards a positively keyed orientation. The function will proceed through the rest of the redundant variables until all have been handled (see Appendix for our handling).
After completing the UVA, an optional adhoc check of redundant variables can be performed using
adhoc = TRUE.
UVA performs this by default and will check if any redundancies remain using the weighted topological overlap method (
method = "wTO") and threshold (
type = "threshold"). For our example, there were no longer any redundant variables. Our UVA reduced the dataset from 70 items down to 25 personality components, that is, items or sets of items that share a common cause (Christensen et al., 2020b). These components largely correspond to the 27 components identified by Condon (2018), suggesting that our approach was effective.
```r
# EGA (with redundant variables combined)
ega <- EGA(sapa.ra$reduced$data, algorithm = "louvain", plot.EGA = FALSE)

plot(ega, plot.args = list(
  vsize = 8, edge.alpha = 0.2, label.size = 4,
  legend.names = c("Conscientiousness", "Neuroticism", "Extraversion",
                   "Openness to Experience", "Agreeableness")
))
```
With these components, we then re-estimated the dimensionality of the SAPA inventory using EGA. This time, five components resembling the five-factor model were estimated (Figure 5). These five factors align with the expected factor structure of the SAPA inventory, corroborating the effectiveness of UVA. In sum, our example demonstrates that redundancy can lead to minor factors, which may bias dimensionality estimates towards overfactoring (as shown in Figure 1). When this redundancy is handled, dimensionality estimates can be expected to be more accurate and in line with theoretical expectations (as shown in Figure 5).2 Similar results were achieved using the remove all but one variable approach (see next section).
In going through UVA with the remove all but one variable option (see code below), we selected the same variables as redundant as shown in the Appendix but, rather than creating a latent variable, we removed all but one variable from each redundant set.
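The corresponding call is sketched below; it mirrors the latent variable call described earlier, with only reduce.method changed. The object name sapa.rm matches the code later in this section, and items and key come from the setup code.

```r
# UVA with the remove all-but-one approach
# (only reduce.method differs from the latent variable call)
sapa.rm <- UVA(data = items, method = "wTO", key = key,
               reduce = TRUE, reduce.method = "remove",
               adhoc = TRUE)
```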
The UVA interface is mostly the same, with one minor detail changed (Figure 6).
After selecting which variables are redundant, a variable is selected to be kept. To make this decision, the corrected item-test (or redundant set) correlations, means, standard deviations, and ranges of the variables are provided in R’s plot window (Figure 7).
The row names of the table denote the redundancy options, which are reprinted. For the entire analysis, we selected the variables that had the largest item-test correlations (i.e., “Item-Total r” in Figure 7) and, when these were equivalent, the largest standard deviation. After UVA finished and the adhoc check confirmed there were no more redundancies, we re-estimated the dimensionality of the dataset (Figure 8).
```r
# EGA (with redundant variables removed)
ega.rm <- EGA(sapa.rm$reduced$data, algorithm = "louvain", plot.EGA = FALSE)

plot(ega.rm, plot.args = list(
  vsize = 8, edge.alpha = 0.2, label.size = 4, layout.exp = 0.5,
  legend.names = c("Conscientiousness", "Neuroticism", "Extraversion",
                   "Openness to Experience", "Agreeableness")
))
```
Consistent with the results presented in the manuscript, five factors roughly resembling the five-factor model were found. The placement of all items is appropriate for their dimensions as well. Similarly, parallel analysis identified five and six dimensions for principal component analysis and principal axis factoring, respectively. In all, the results largely align with one another, demonstrating that removing variables can be an effective approach to reducing redundancy in data.
Christensen, A. P., Garrido, L. E., & Golino, H. (2020a). Unique variable analysis: A novel approach for detecting redundant variables in multivariate data. PsyArXiv. https://doi.org/10.31234/osf.io/4kra2
Christensen, A. P., Golino, H., & Silvia, P. J. (2020b). A psychometric network perspective on the validity and validation of personality trait questionnaires. European Journal of Personality, 34, 1095–1108. https://doi.org/10.1002/per.2265
Condon, D. M. (2018). The SAPA personality inventory: An empirically-derived, hierarchically-organized self-report personality assessment model. PsyArXiv. https://doi.org/10.31234/osf.io/sc4p9
Costantini, G., Richetin, J., Preti, E., Casini, E., Epskamp, S., & Perugini, M. (2019). Stability and variability of personality networks. A tutorial on recent developments in network psychometrics. Personality and Individual Differences, 136, 68–78. https://doi.org/10.1016/j.paid.2017.06.011
DeVellis, R. F. (2017). Scale development: Theory and applications (4th ed.). Thousand Oaks, CA: SAGE Publications.
Dinić, B. M., Wertag, A., Tomašević, A., & Sokolovska, V. (2020). Centrality and redundancy of the Dark Tetrad traits. Personality and Individual Differences, 155, 109621. https://doi.org/10.1016/j.paid.2019.109621
Golino, H., & Christensen, A. P. (2020). EGAnet: Exploratory Graph Analysis – A framework for estimating the number of dimensions in multivariate data using network psychometrics. Retrieved from https://CRAN.R-project.org/package=EGAnet
McDonald, R. P. (1999). Test theory: A unified treatment. https://doi.org/10.4324/9781410601087
Revelle, W. (2019). psychTools: Tools to accompany the ’psych’ package for psychological research. Retrieved from https://CRAN.R-project.org/package=psychTools
Rhemtulla, M., Brosseau-Liard, P. É., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17, 354–373. https://doi.org/10.1037/a0029315
| Latent variable | Redundant items |
| --- | --- |
| Original ideation | Am full of ideas. Am able to come up with new and different ideas. Am an original thinker. Love to think up new ways of doing things. |
| Sees good in people | Trust what people say. Believe that people are basically moral. Trust people to mainly tell the truth. Believe that others have good intentions. Feel that most people cant be trusted. |
| Sympathetic | Am sensitive to the needs of others. Feel sympathy for those who are worse off than myself. Think of others first. Am concerned about others. Sympathize with others feelings. |
| Motivated | Find it difficult to get down to work. Need a push to get started. Start tasks right away. |
| Attention-seeking | Hate being the center of attention. Like to attract attention. Dislike being the center of attention. Make myself the center of attention. |
| Organized | Keep things tidy. Often forget to put things back in their proper place. Leave a mess in my room. Like order. |
| People person | Usually like to spend my free time with people. Like going out a lot. Avoid company. Want to be left alone. Dont like crowded events. |
| Anxious | Worry about things. Would call myself a nervous person. Fear for the worst. Am a worrier. Panic easily. |
| Emotional stability | Experience very few emotional highs and lows. Get overwhelmed by emotions. Experience my emotions intensely. Think that my moods dont change more than most peoples do. |
| Introspective | Love to reflect on things. Try to understand myself. Spend time reflecting on things. |
| Irritable (R) | Rarely get irritated. Am not easily annoyed. Seldom get mad. |
| Rule-follower | Rebel against authority. Try to follow the rules. Believe that laws should be strictly enforced. |
| Self-assessed intelligence | Think quickly. Am quick to understand things. Can handle a lot of information. |
| Manipulative | Use others for my own ends. Cheat to get ahead. Tell a lot of lies. |
| Perfectionist | Want every detail taken care of. Continue until everything is perfect. |
| Low self-esteem | Feel a sense of worthlessness or hopelessness. Dislike myself. |
| Social-efficacy | Am skilled in handling social situations. Find it difficult to approach others. |
| Laughter | Laugh a lot. Laugh aloud. |
| Fantasy | Have a vivid imagination. Like to get lost in thought. |
1. For a comparison, we estimated dimensions using parallel analysis with polychoric correlations, with both principal component analysis (PCA) and principal axis factoring (PAF) extraction. These methods identified 13 and 14 dimensions, respectively.
2. For a comparison, we estimated dimensions using parallel analysis with polychoric correlations, with both principal component analysis (PCA) and principal axis factoring (PAF) extraction. These methods identified 5 and 6 dimensions, respectively.