1 Exploratory Factor Analysis (EFA)

  • A multivariate method for identifying the relationships and patterns among variables
  • EFA differs from Confirmatory Factor Analysis (CFA), which tests a prespecified model of how the items should relate to one another, typically as a structural equation model (SEM)
  • EFA can be used in many ways (Yong & Pearce, 2013):
  1. Develop scales: such as the Big Five personality traits
  2. Item analysis: determine which test items go together (to measure specific constructs)
  3. Dimensionality reduction: identify which items/questions/features matter most (how strongly they bind together) and create “factor scores” representing the underlying constructs for use in other analyses.
  • EFA examines the pattern of correlations/covariances among the items and tries to determine whether they are influenced by the same or by different underlying factors.
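The idea that a shared factor produces the correlations can be made concrete with the common factor model, R = Lambda Lambda' + Psi, where Lambda holds the loadings and Psi the unique variances. This is a minimal sketch with made-up loadings for four hypothetical items on one factor: it computes the model-implied correlation matrix and compares it with data simulated from the same model.

```r
set.seed(1)
# Hypothetical loadings of 4 items on a single factor (made-up values)
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.5), ncol = 1,
                 dimnames = list(paste0("item", 1:4), "F1"))
communality <- rowSums(Lambda^2)   # variance each item shares with the factor
Psi <- diag(1 - communality)       # unique variance of each item

# Model-implied correlation matrix: R = Lambda Lambda' + Psi
R_implied <- Lambda %*% t(Lambda) + Psi
print(round(R_implied, 2))

# Simulate standardized data from the same model and compare
n <- 2000
f <- rnorm(n)                                                  # factor scores
E <- matrix(rnorm(n * 4), n) %*% diag(sqrt(1 - communality))   # unique parts
X <- f %o% Lambda[, 1] + E
print(round(cor(X), 2))
```

Each off-diagonal entry of the implied matrix is the product of two loadings (e.g., 0.8 × 0.7 = 0.56), which is why items that load on the same factor correlate with one another.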

1.1 Collect Items measured in the same units

  • You might need to reverse-score some items or recode/transform them so that all items are on the same scale
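On a 1 to 5 Likert scale, reverse-scoring maps 1 to 5, 2 to 4 and so on, i.e., (lowest + highest) - x. This is a minimal sketch; the helper `reverse_score` and the toy item are hypothetical. Note that the R-suffixed items in BFI228 (e.g., quietR, relaxedR) seem to be reverse-keyed but stored unreversed, since they load negatively next to their positively keyed counterparts in the rotated solutions later on.

```r
# Reverse-score a Likert item: (lowest + highest) - x, so 1 <-> 5 and 3 stays 3
reverse_score <- function(x, lowest = 1, highest = 5) {
  (lowest + highest) - x
}

# Hypothetical responses to a negatively keyed item ("is quiet")
responses <- data.frame(quiet = c(1, 2, 3, 4, 5, NA))
responses$quietR <- reverse_score(responses$quiet)   # NA stays NA
print(responses)
```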

1.1.1 Visualize the patterns in the data

  • It is helpful to look at the patterns before you go further

  • We use the BFI228 data set from the EFAutilities package, described as a “study on personality and relationship satisfaction (Luo, 2005). The participants were 228 undergraduate students at a large public university in the US. The data were participants’ self-ratings on the 44 items of the Big Five Inventory (John, Donahue, & Kentle, 1991). These items are Likert variables: disagree strongly (1), disagree a little (2), neither agree nor disagree (3), agree a little (4), and agree strongly (5)”.

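library(psych)        # KMO()
library(corrplot)     # corrplot()
library(EFAutilities) # BFI228 data
data(BFI228)
DataSet <- BFI228
# Correlation heat map, items ordered by hierarchical clustering
corrplot(cor(DataSet), order = "hclust", tl.col = "black", tl.cex = .75)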

1.2 Do I have enough data to proceed?

  • Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy (MSA) (Howard, 2016)
  • Consider removing items with an MSA below .6; if the overall MSA is below .6, the data are not suitable for an EFA
KMO           Interpretation  Use?
0.00 to 0.50  Unacceptable    No
0.50 to 0.60  Miserable       No
0.60 to 0.70  Mediocre        Yes
0.70 to 0.80  Middling        Yes
0.80 to 0.90  Meritorious     Yes
0.90 to 1.00  Marvelous       Yes
KMO(BFI228)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = BFI228)
## Overall MSA =  0.83
## MSA for each item = 
##     talkative    researvedR    fullenergy  enthusiastic        quietR 
##          0.81          0.81          0.81          0.85          0.86 
##     assertive          shyR      outgoing    findfaultR       helpful 
##          0.91          0.78          0.84          0.87          0.83 
##     quarrelsR     forgiving      trusting         coldR   considerate 
##          0.84          0.82          0.81          0.83          0.83 
##         rudeR   cooperative      thorough     carelessR      reliable 
##          0.84          0.82          0.85          0.88          0.81 
## disorganizedR         lazyR     persevere     efficient         plans 
##          0.78          0.81          0.78          0.85          0.84 
##   distractedR          blue      relaxedR         tense       worries 
##          0.85          0.90          0.83          0.87          0.87 
##    emostableR         moody         calmR       nervous         ideas 
##          0.84          0.83          0.86          0.87          0.87 
##       curious     ingenious   imagination     inventive      artistic 
##          0.84          0.85          0.77          0.80          0.72 
##      routineR       reflect  nonartisticR sophisticated 
##          0.56          0.84          0.75          0.67
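The MSA values above compare ordinary correlations with partial (anti-image) correlations: KMO = sum(r_ij^2) / (sum(r_ij^2) + sum(p_ij^2)) over i != j, where the partial correlations p_ij come from the inverse of the correlation matrix. Here is a base-R sketch of that textbook formula; `kmo_by_hand` is our own helper, not part of psych, and we assume (without having checked psych's source) that it matches what KMO() reports.

```r
# KMO / MSA from a correlation matrix (textbook formula, base R only)
kmo_by_hand <- function(R) {
  Rinv <- solve(R)
  # partial correlation of items i and j, controlling for all other items
  P <- -Rinv / sqrt(diag(Rinv) %o% diag(Rinv))
  diag(R) <- 0                     # keep off-diagonal elements only
  diag(P) <- 0
  r2 <- R^2
  p2 <- P^2
  list(overall = sum(r2) / (sum(r2) + sum(p2)),
       item    = colSums(r2) / (colSums(r2) + colSums(p2)))
}

# Two items: the partial correlation equals the raw correlation, so KMO = 0.5
kmo_two <- kmo_by_hand(matrix(c(1, 0.6, 0.6, 1), 2))

# Six simulated items driven by one common factor: partial correlations are small
set.seed(42)
f <- rnorm(300)
X <- sapply(1:6, function(j) 0.7 * f + rnorm(300, sd = 0.7))
kmo_six <- kmo_by_hand(cor(X))
print(c(two_items = kmo_two$overall, six_items = round(kmo_six$overall, 2)))
```

If the formulas agree, kmo_by_hand(cor(DataSet))$overall should land close to the Overall MSA of 0.83 reported above.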

1.3 Get the correlation/covariance matrix

  • The type of correlation is important:
  • If you have ordinal or binary data, you might want to switch from Pearson to polychoric (ordinal) or tetrachoric (binary) correlations
  • In practice, many psychologists ignore this and use Pearson correlations
CM<-cor(DataSet, use="complete.obs")

1.4 Select the number of factors

  • You might have a specific number in mind, or it can be estimated from the data:
  1. The Kaiser criterion: retain as many factors as there are eigenvalues of the correlation matrix greater than one.
  2. The “scree test”: a plot of the eigenvalues of the correlation matrix in descending order
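The Kaiser criterion is a one-line count. This sketch simulates 8 items driven by two uncorrelated factors (an assumed design with loadings of 0.8), so exactly two eigenvalues should stand well above one. On the real data the same count would be sum(eigen(CM)$values > 1).

```r
set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
# Items 1-4 load on f1, items 5-8 on f2 (loading 0.8, unique SD 0.6)
X_sim <- cbind(sapply(1:4, function(j) 0.8 * f1 + rnorm(n, sd = 0.6)),
               sapply(1:4, function(j) 0.8 * f2 + rnorm(n, sd = 0.6)))

ev_sim <- eigen(cor(X_sim), symmetric = TRUE, only.values = TRUE)$values
n_kaiser <- sum(ev_sim > 1)   # Kaiser criterion: eigenvalues greater than one
print(round(ev_sim, 2))
print(n_kaiser)
```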

Selecting from the scree plot can be accomplished in a few different ways:

  • See http://www.empowerstats.com/manuals/paper/scree.pdf#1
  • Visually: where does the plot level off?
  • Parallel analysis: compare the observed eigenvalues with those that random, uncorrelated data give for the same number of subjects and items, and keep the factors whose eigenvalues exceed the random ones
  • Optimal coordinates: compare each eigenvalue with the value extrapolated from a line through the next eigenvalue and the last eigenvalue
  • Acceleration factor: where the slope changes most abruptly (the elbow)
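library(nFactors)

ev <- eigen(CM) # get eigenvalues
# eigenvalues from 100 random data sets with the same number of subjects and items
ap <- parallel(subject = nrow(DataSet), var = ncol(DataSet),
               rep = 100, cent = .05)
# Kaiser, parallel analysis, optimal coordinates and acceleration factor rules
nS <- nScree(x = ev$values, aparallel = ap$eigen$qevpea)
plotnScree(nS)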

  • All of these criteria suggest 6 factors, but we will force a 5-factor solution to match the Big Five
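What the parallel analysis step does can be sketched in base R: simulate random normal data with the same number of subjects and items many times, compute the eigenvalues of each correlation matrix, and keep the factors whose observed eigenvalues exceed the 95th percentile of the random ones. This is our own Horn-style sketch, not the nFactors implementation, and it runs on an assumed two-factor simulated design so the expected answer is known.

```r
# Horn-style parallel analysis on correlation-matrix eigenvalues (sketch)
parallel_by_hand <- function(X, reps = 100, q = 0.95) {
  n <- nrow(X)
  p <- ncol(X)
  eig <- function(M) eigen(cor(M), symmetric = TRUE, only.values = TRUE)$values
  observed <- eig(X)
  # p x reps matrix: eigenvalues of random, uncorrelated data of the same size
  random <- replicate(reps, eig(matrix(rnorm(n * p), n, p)))
  benchmark <- apply(random, 1, quantile, probs = q)
  below <- which(observed <= benchmark)
  # retain factors until the first observed eigenvalue falls below the benchmark
  n_factors <- if (length(below) > 0) below[1] - 1 else p
  list(observed = observed, benchmark = benchmark, n_factors = n_factors)
}

set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
X_sim <- cbind(sapply(1:4, function(j) 0.8 * f1 + rnorm(n, sd = 0.6)),
               sapply(1:4, function(j) 0.8 * f2 + rnorm(n, sd = 0.6)))
pa <- parallel_by_hand(X_sim)
print(round(cbind(observed = pa$observed, random95 = pa$benchmark), 2))
print(pa$n_factors)
```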

1.4.1 Extract your initial solution

  • Common extraction methods: maximum likelihood (ML), principal components analysis (PCA), and principal axis factoring (PAF).
  • PCA is not strictly a factor analysis method (it analyzes total rather than shared variance), but it is often used; it puts as much of the variance as possible into the first component
  • ML is often used, but it is sensitive to violations of normality
  • PAF is less sensitive to violations of normality, but less generalizable
  • There are many other methods, such as Bayesian methods and methods for specific types of data (it is best to research which suits your problem)
# Unrotated 5-factor ML solution from the correlation matrix CM (Section 1.3)
mle.Initial <- factanal(covmat = CM, factors = 5, rotation = "none")
print(mle.Initial$loadings, cutoff = 0.3) # hide loadings with |value| < 0.3
## 
## Loadings:
##               Factor1 Factor2 Factor3 Factor4 Factor5
## talkative      0.386   0.481  -0.414                 
## researvedR    -0.415           0.349                 
## fullenergy     0.676                                 
## enthusiastic   0.658   0.334                         
## quietR        -0.481  -0.379   0.480                 
## assertive      0.507   0.351                         
## shyR          -0.442           0.440                 
## outgoing       0.551          -0.472                 
## findfaultR    -0.413                                 
## helpful        0.321                   0.306         
## quarrelsR     -0.401   0.383                         
## forgiving      0.320                   0.350         
## trusting       0.378                   0.374         
## coldR         -0.436                  -0.378   0.346 
## considerate    0.339           0.301   0.489         
## rudeR         -0.473   0.321                         
## cooperative    0.463                   0.465         
## thorough       0.464                           0.471 
## carelessR              0.435                         
## reliable       0.396                   0.310   0.313 
## disorganizedR          0.410                  -0.433 
## lazyR         -0.417                                 
## persevere      0.363                           0.393 
## efficient      0.480                           0.363 
## plans          0.392                           0.439 
## distractedR   -0.455   0.305                         
## blue          -0.566   0.347                         
## relaxedR       0.505  -0.329          -0.323         
## tense         -0.474   0.351           0.352         
## worries       -0.567                   0.487         
## emostableR     0.540                                 
## moody         -0.497   0.317           0.311         
## calmR          0.519                                 
## nervous       -0.500                   0.517         
## ideas          0.458   0.432   0.374                 
## curious        0.405                                 
## ingenious      0.334   0.321   0.360                 
## imagination    0.330   0.496   0.375                 
## inventive      0.438   0.404   0.519                 
## artistic                       0.446                 
## routineR                                             
## reflect        0.353   0.507   0.311                 
## nonartisticR                                         
## sophisticated                  0.501                 
## 
##                Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings      8.263   3.706   2.793   2.594   1.986
## Proportion Var   0.188   0.084   0.063   0.059   0.045
## Cumulative Var   0.188   0.272   0.336   0.394   0.440
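For comparison with the unrotated ML solution above, principal axis factoring can be spelled out in a few lines: replace the 1s on the diagonal of R with communality estimates, take the leading eigenvectors, update the communalities, and repeat until they stabilize. psych's fa() with fm = "pa" provides PAF; the version below is our own simplified sketch (no safeguard against Heywood cases), checked on an assumed two-factor simulated design against factanal's ML communalities.

```r
# Iterated principal axis factoring (simplified sketch, no Heywood-case handling)
paf_by_hand <- function(R, nfactors, max_iter = 100, tol = 1e-6) {
  h2 <- 1 - 1 / diag(solve(R))          # start: squared multiple correlations
  for (iter in seq_len(max_iter)) {
    R_reduced <- R
    diag(R_reduced) <- h2                # communalities replace the 1s
    e <- eigen(R_reduced, symmetric = TRUE)
    L <- e$vectors[, 1:nfactors, drop = FALSE] %*%
      diag(sqrt(pmax(e$values[1:nfactors], 0)), nfactors)
    h2_new <- rowSums(L^2)
    converged <- max(abs(h2_new - h2)) < tol
    h2 <- h2_new
    if (converged) break
  }
  dimnames(L) <- list(colnames(R), paste0("PA", 1:nfactors))
  list(loadings = L, communalities = rowSums(L^2), iterations = iter)
}

set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
X_sim <- cbind(sapply(1:4, function(j) 0.8 * f1 + rnorm(n, sd = 0.6)),
               sapply(1:4, function(j) 0.8 * f2 + rnorm(n, sd = 0.6)))
colnames(X_sim) <- paste0("item", 1:8)

paf <- paf_by_hand(cor(X_sim), nfactors = 2)
ml <- factanal(X_sim, factors = 2, rotation = "none")
print(round(cbind(PAF = paf$communalities, ML = 1 - ml$uniquenesses), 3))
```

On the BFI data, paf_by_hand(CM, 5) would give an unrotated PAF solution to set beside the ML loadings; the two usually agree closely when the data are well behaved.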

1.5 Rotate your factors to a final solution

  • Think of each factor as a dimension; the problem is that there are infinitely many sets of dimensions (rotations of the factors) that explain the data equally well
  • We need a way to pick a solution that is more interpretable
  • By rotating the factors, you look for a solution that fits the data exactly as well as the initial extraction but has the simplest interpretation
  • Thurstone (1947) characterized this “simple structure” with 5 features (summarized in Abdi, 2003):
  1. each row contains at least one zero;
  2. for each column, there are at least as many zeros as there are columns
  3. for any pair of factors, there are some variables with zero loadings on one factor and large loadings on the other factor;
  4. for any pair of factors, there is a sizable proportion of zero loadings;
  5. for any pair of factors, there is only a small number of large loadings
  • There are many different types of rotation, but they all try to concentrate each factor’s large loadings on a small subset of items.

There are two families of rotations:

  • Orthogonal rotations: produce uncorrelated factors (e.g., varimax)
  • Oblique rotations: allow the factors to be correlated (e.g., promax)

  • Varimax is the most popular orthogonal rotation: “simple solution means that each factor has a small number of large loadings and a large number of zero (or small) loadings” (Abdi, 2003)
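The claim that rotation leaves the fit unchanged can be checked directly: an orthogonal rotation multiplies the loading matrix by an orthogonal matrix, so each item’s communality (the row sum of squared loadings) stays the same while the loading pattern gets simpler. This sketch applies stats::varimax to an unrotated factanal solution on an assumed two-factor simulated design.

```r
set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
X_sim <- cbind(sapply(1:4, function(j) 0.8 * f1 + rnorm(n, sd = 0.6)),
               sapply(1:4, function(j) 0.8 * f2 + rnorm(n, sd = 0.6)))
colnames(X_sim) <- paste0("item", 1:8)

fit0 <- factanal(X_sim, factors = 2, rotation = "none")
L0 <- unclass(fit0$loadings)         # unrotated loadings as a plain matrix
vm <- varimax(L0)                    # returns rotated loadings and rotmat
L1 <- unclass(vm$loadings)

# Communalities before and after rotation are identical
print(round(cbind(unrotated = rowSums(L0^2), varimax = rowSums(L1^2)), 3))
# Side by side: unrotated loadings, then the (simpler) varimax loadings
print(round(cbind(L0, L1), 2))
```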

1.5.1 Orthogonal Rotation

mle.VM <- factanal(DataSet,factors=5, rotation="varimax")
print(mle.VM$loadings, cutoff=0.3)
## 
## Loadings:
##               Factor1 Factor2 Factor3 Factor4 Factor5
## talkative              0.736                         
## researvedR            -0.542                         
## fullenergy             0.563           0.390         
## enthusiastic           0.625           0.418         
## quietR                -0.780                         
## assertive              0.557                         
## shyR                  -0.639                         
## outgoing               0.752                         
## findfaultR                            -0.415         
## helpful                                0.446         
## quarrelsR                             -0.591         
## forgiving                              0.551         
## trusting                               0.538         
## coldR                                 -0.683         
## considerate                            0.656         
## rudeR          0.334                  -0.552         
## cooperative                            0.579         
## thorough                                       0.710 
## carelessR      0.301                          -0.449 
## reliable                                       0.550 
## disorganizedR                                 -0.631 
## lazyR                                         -0.516 
## persevere                                      0.559 
## efficient                                      0.608 
## plans                                          0.628 
## distractedR    0.520                          -0.364 
## blue           0.559                                 
## relaxedR      -0.668                                 
## tense          0.702                                 
## worries        0.749                                 
## emostableR    -0.599                                 
## moody          0.666                                 
## calmR         -0.549                                 
## nervous        0.649                                 
## ideas                          0.698                 
## curious                        0.433                 
## ingenious                      0.571                 
## imagination                    0.683                 
## inventive                      0.785                 
## artistic                       0.568                 
## routineR                                             
## reflect                        0.650                 
## nonartisticR                  -0.374                 
## sophisticated                  0.546                 
## 
##                Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings      4.382   3.968   3.925   3.711   3.357
## Proportion Var   0.100   0.090   0.089   0.084   0.076
## Cumulative Var   0.100   0.190   0.279   0.363   0.440

1.5.1.1 Useful to examine first two factors visually

  • We need to extract the loadings on the first two factors
VM.load = mle.VM$loadings[,1:2]
plot(VM.load, type="n")
text(VM.load,labels=colnames(DataSet),cex=.75) # add variable names

1.5.2 Oblique Rotation

  • With promax (an oblique rotation), we also need to examine the correlations between the factors
mle.PM <- factanal(DataSet, factors=5, 
                   rotation="promax")
mle.PM
## 
## Call:
## factanal(x = DataSet, factors = 5, rotation = "promax")
## 
## Uniquenesses:
##     talkative    researvedR    fullenergy  enthusiastic        quietR 
##         0.434         0.658         0.443         0.347         0.377 
##     assertive          shyR      outgoing    findfaultR       helpful 
##         0.545         0.520         0.376         0.732         0.759 
##     quarrelsR     forgiving      trusting         coldR   considerate 
##         0.544         0.686         0.665         0.472         0.483 
##         rudeR   cooperative      thorough     carelessR      reliable 
##         0.549         0.552         0.441         0.664         0.613 
## disorganizedR         lazyR     persevere     efficient         plans 
##         0.559         0.653         0.610         0.543         0.568 
##   distractedR          blue      relaxedR         tense       worries 
##         0.588         0.527         0.512         0.459         0.383 
##    emostableR         moody         calmR       nervous         ideas 
##         0.549         0.510         0.619         0.473         0.447 
##       curious     ingenious   imagination     inventive      artistic 
##         0.721         0.628         0.500         0.347         0.655 
##      routineR       reflect  nonartisticR sophisticated 
##         0.914         0.520         0.842         0.669 
## 
## Loadings:
##               Factor1 Factor2 Factor3 Factor4 Factor5
## talkative      0.187           0.778                 
## researvedR     0.175          -0.538                 
## fullenergy                     0.528   0.327         
## enthusiastic           0.190   0.598   0.379         
## quietR                        -0.805                 
## assertive     -0.115   0.262   0.518  -0.191   0.105 
## shyR           0.177          -0.643   0.189         
## outgoing                       0.775   0.158         
## findfaultR     0.232                  -0.403         
## helpful                                0.449         
## quarrelsR      0.206           0.155  -0.603         
## forgiving                              0.588         
## trusting              -0.101   0.145   0.552         
## coldR                  0.195  -0.112  -0.721         
## considerate    0.166   0.194  -0.140   0.697         
## rudeR          0.270           0.146  -0.544         
## cooperative    0.164           0.194   0.571   0.183 
## thorough                                       0.752 
## carelessR      0.225   0.228                  -0.444 
## reliable       0.167                   0.174   0.572 
## disorganizedR          0.171                  -0.685 
## lazyR                                         -0.515 
## persevere              0.238  -0.106           0.606 
## efficient              0.119                   0.627 
## plans          0.122                           0.674 
## distractedR    0.492                   0.153  -0.320 
## blue           0.515   0.126          -0.135  -0.159 
## relaxedR      -0.696                                 
## tense          0.750   0.148                   0.141 
## worries        0.798  -0.140           0.111         
## emostableR    -0.619   0.156           0.144         
## moody          0.704           0.106  -0.122         
## calmR         -0.540                                 
## nervous        0.707  -0.130  -0.156   0.220         
## ideas                  0.699   0.106                 
## curious                0.406   0.168   0.137         
## ingenious              0.580          -0.108   0.196 
## imagination            0.689                         
## inventive     -0.153   0.799                         
## artistic               0.577  -0.125   0.124         
## routineR       0.139  -0.243                   0.191 
## reflect                0.648   0.153                 
## nonartisticR          -0.380                         
## sophisticated          0.567  -0.239                 
## 
##                Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings      4.384   3.869   3.847   3.635   3.408
## Proportion Var   0.100   0.088   0.087   0.083   0.077
## Cumulative Var   0.100   0.188   0.275   0.358   0.435
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3 Factor4 Factor5
## Factor1   1.000   0.309   0.129   0.326  -0.426
## Factor2   0.309   1.000   0.242   0.209  -0.211
## Factor3   0.129   0.242   1.000   0.207  -0.114
## Factor4   0.326   0.209   0.207   1.000  -0.392
## Factor5  -0.426  -0.211  -0.114  -0.392   1.000
## 
## Test of the hypothesis that 5 factors are sufficient.
## The chi square statistic is 1329.56 on 736 degrees of freedom.
## The p-value is 1.08e-36
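The 736 degrees of freedom in this test follow from df = ((p - k)^2 - (p + k)) / 2 for p items and k factors; with 44 items and 5 factors, (39^2 - 49) / 2 = 736. The tiny p-value says that 5 factors do not reproduce the correlations exactly; this test tends to reject easily as the sample grows, so it is usually read alongside the other criteria. A quick check:

```r
# Degrees of freedom of the ML test that k factors are sufficient for p items
ml_df <- function(p, k) ((p - k)^2 - (p + k)) / 2

df_bfi <- ml_df(p = 44, k = 5)
print(df_bfi)
# Upper-tail p-value for the (rounded) statistic printed above
p_bfi <- pchisq(1329.56, df = df_bfi, lower.tail = FALSE)
print(p_bfi)
```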

1.5.2.1 Useful to examine first two factors visually

  • We need to extract the loadings on the first two factors
PM.load = mle.PM$loadings[,1:2]
plot(PM.load, type="n")
text(PM.load,labels=colnames(DataSet),cex=.75) # add variable names
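The Factor Correlations table printed for the promax solution can be reproduced from the rotation matrix T: if the rotated loadings are Lambda %*% T, the factor correlations are Phi = solve(t(T) %*% T). This sketch applies stats::promax to an unrotated solution on an assumed simulated design in which the two factors correlate about 0.5. Recent R versions also store factanal’s rotation matrix as `rotmat`, in which case solve(crossprod(mle.PM$rotmat)) should give the table above; treat that as something to verify on your R version.

```r
set.seed(7)
n <- 500
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)     # true factor correlation 0.5
X_sim <- cbind(sapply(1:4, function(j) 0.8 * f1 + rnorm(n, sd = 0.6)),
               sapply(1:4, function(j) 0.8 * f2 + rnorm(n, sd = 0.6)))
colnames(X_sim) <- paste0("item", 1:8)

fit0 <- factanal(X_sim, factors = 2, rotation = "none")
L0 <- unclass(fit0$loadings)
pm <- promax(L0)                            # oblique rotation: loadings + rotmat
Phi <- solve(crossprod(pm$rotmat))          # (T'T)^{-1}: factor correlations
print(round(Phi, 3))
print(round(unclass(pm$loadings), 2))
```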

1.6 Interpret the Loadings

  • In an orthogonal solution, a loading is the correlation between an item and a factor; in an oblique solution, the loadings behave more like standardized regression weights (the other factors are controlled for)
  • You have to name each construct based on its pattern of positive and negative loadings
  • You could reverse the signs of a factor’s loadings to make it easier to interpret
  • Let’s name our factors from the varimax and promax solutions and see what our 5 personality constructs might be
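Naming is easier with a short list of each factor’s salient items. The helpers below are our own (hypothetical, not from psych): `top_items` lists the items whose absolute loading is at least a cutoff, largest first, and `reflect_factor` flips a factor’s signs. The 4 × 2 loading matrix is made up for illustration; on the real data you would pass mle.VM$loadings or mle.PM$loadings.

```r
# List salient items per factor, sorted by absolute loading (hypothetical helper)
top_items <- function(L, cutoff = 0.3) {
  L <- unclass(L)
  lapply(setNames(seq_len(ncol(L)), colnames(L)), function(j) {
    x <- L[, j]                      # named vector of loadings for factor j
    x <- x[abs(x) >= cutoff]
    round(x[order(-abs(x))], 2)
  })
}

# Flip the sign of factor j (index or name) so its main items load positively
reflect_factor <- function(L, j) {
  L <- unclass(L)
  L[, j] <- -L[, j]
  L
}

# Made-up loadings for illustration only
L_demo <- matrix(c(0.74, -0.78, 0.05, -0.10,
                   0.02,  0.10, 0.75, -0.55),
                 ncol = 2,
                 dimnames = list(c("talkative", "quietR", "worries", "calmR"),
                                 c("Factor1", "Factor2")))
colnames(L_demo) <- c("Extraversion", "Neuroticism")   # named from the salient items
print(top_items(L_demo))
print(reflect_factor(L_demo, "Neuroticism"))
```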

1.7 Extract factor scores

  • You can construct factor scores for each subject and conduct additional analysis
  • For example, having reduced these items to 5 factors, you can use the 5 factor scores in regressions (as predictors or as outcomes)
  • There are many ways to estimate each person’s score (how strongly each person expresses the construct); the most common are the regression and Bartlett methods
  • Regression: uses the estimated parameters from the factor analysis to define linear combinations of the observed variables that generate the factor scores. The scores may be correlated even when the factors are orthogonal
  • Bartlett: produces (conditionally) unbiased factor scores
  • Other methods, such as Anderson-Rubin, try to correct the correlation issue
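The two scoring rules can be written out directly. With standardized data Z, loadings Lambda, unique variances Psi and item correlation matrix R, regression scores use the weights R^-1 Lambda, and Bartlett scores use Psi^-1 Lambda (Lambda' Psi^-1 Lambda)^-1. This sketch computes both on an assumed two-factor simulated design; we expect them to closely match factanal(..., scores = "regression") and scores = "Bartlett", but treat that as something to check. On the real data you would use unclass(mle.VM$loadings), mle.VM$uniquenesses and scale(DataSet).

```r
set.seed(2024)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
X_sim <- cbind(sapply(1:4, function(j) 0.8 * f1 + rnorm(n, sd = 0.6)),
               sapply(1:4, function(j) 0.8 * f2 + rnorm(n, sd = 0.6)))
colnames(X_sim) <- paste0("item", 1:8)

fit <- factanal(X_sim, factors = 2, rotation = "varimax")
Lambda <- unclass(fit$loadings)   # p x k loadings
psi <- fit$uniquenesses           # unique variances (diagonal of Psi)
Z <- scale(X_sim)                 # standardized items
R <- cor(X_sim)

# Regression weights: R^{-1} Lambda
W_reg <- solve(R, Lambda)
# Bartlett weights: Psi^{-1} Lambda (Lambda' Psi^{-1} Lambda)^{-1}
PsiInv_Lambda <- Lambda / psi     # divides row i of Lambda by psi[i]
W_bart <- PsiInv_Lambda %*% solve(crossprod(Lambda, PsiInv_Lambda))

scores_reg <- Z %*% W_reg
scores_bart <- Z %*% W_bart
print(round(apply(scores_reg, 2, sd), 3))
print(round(apply(scores_bart, 2, sd), 3))
```

As in the BFI output above, the regression scores come out with SDs a little below 1 (they are shrunk toward the mean), while the Bartlett scores have SDs a little above 1.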
mle.VM.Score.R <- factanal(DataSet,factors=5, rotation="varimax",  scores = "regression")$scores
#Cor
round(cor(mle.VM.Score.R),3)
##         Factor1 Factor2 Factor3 Factor4 Factor5
## Factor1   1.000  -0.023  -0.011  -0.027  -0.045
## Factor2  -0.023   1.000   0.030   0.023   0.007
## Factor3  -0.011   0.030   1.000   0.013   0.008
## Factor4  -0.027   0.023   0.013   1.000   0.051
## Factor5  -0.045   0.007   0.008   0.051   1.000
# Mean
round(apply(mle.VM.Score.R,2, mean),3)
## Factor1 Factor2 Factor3 Factor4 Factor5 
##       0       0       0       0       0
# SD
round(apply(mle.VM.Score.R,2, sd),3)
## Factor1 Factor2 Factor3 Factor4 Factor5 
##   0.934   0.939   0.935   0.918   0.907
mle.VM.Score.B <- factanal(DataSet,factors=5, rotation="varimax",  scores = "Bartlett")$scores
#Cor
round(cor(mle.VM.Score.B),3)
##         Factor1 Factor2 Factor3 Factor4 Factor5
## Factor1   1.000   0.022   0.009   0.024   0.044
## Factor2   0.022   1.000  -0.029  -0.022  -0.004
## Factor3   0.009  -0.029   1.000  -0.011  -0.007
## Factor4   0.024  -0.022  -0.011   1.000  -0.050
## Factor5   0.044  -0.004  -0.007  -0.050   1.000
# Mean
round(apply(mle.VM.Score.B,2, mean),3)
## Factor1 Factor2 Factor3 Factor4 Factor5 
##       0       0       0       0       0
# SD
round(apply(mle.VM.Score.B,2, sd),3)
## Factor1 Factor2 Factor3 Factor4 Factor5 
##   1.072   1.066   1.070   1.091   1.104

2 References

Abdi, H. (2003). Factor rotations in factor analyses. Encyclopedia for Research Methods for the Social Sciences (pp. 792-795). Thousand Oaks, CA: Sage.

Howard, M. C. (2016). A Review of Exploratory Factor Analysis Decisions and Overview of Current Practices: What We Are Doing and How Can We Improve? International Journal of Human-Computer Interaction, 32(1), 51-62.

Yong, A. G., & Pearce, S. (2013). A beginner’s guide to factor analysis: Focusing on exploratory factor analysis. Tutorials in Quantitative Methods for Psychology, 9(2), 79-94.

