1 Correlations with more than 2 variables

  • Correlations with more than 2 variables present a new challenge
  • What if a third variable (X2) actually explains the relationship between X1 and Y?
  • We need to find a way to figure out how X2 might relate to X1 and Y!

1.1 Ice cream example, part 2

  • Ice cream scoops (X1)
  • Brownie squares (X2)
  • Happiness score (Y)
  • Question: how much do ice cream and brownies predict/explain happiness scores?
  • Let's rebuild our dataset with the new variable.
#packages we will need to create and graph our data
library(MASS) #create data
library(car) #graph data
py1 =.6 #Cor between X1 (ice cream) and happiness
py2 =.4 #Cor between X2 (Brownies) and happiness
p12= .2 #Cor between X1 (ice cream) and X2 (Brownies)

Means.X1X2Y<- c(10,10,10) #set the means of X and Y variables
CovMatrix.X1X2Y <- matrix(c(1,p12,py1,
                            p12,1,py2,
                            py1,py2,1),3,3) # creates the covariance matrix (variances are 1, so it is also the correlation matrix)

CovMatrix.X1X2Y #nice and simple 3x3 matrix
##      [,1] [,2] [,3]
## [1,]  1.0  0.2  0.6
## [2,]  0.2  1.0  0.4
## [3,]  0.6  0.4  1.0
#build the correlated variables. Note: empirical=TRUE makes the sample correlations EXACTLY the values we set. 
# if we say empirical=FALSE, the sample correlations would vary randomly around those values
set.seed(42)
CorrDataT<-mvrnorm(n=100, mu=Means.X1X2Y,Sigma=CovMatrix.X1X2Y, empirical=TRUE)

#Convert them to a "Data.Frame", which is like the SPSS data window
CorrDataT<-as.data.frame(CorrDataT)

#let's add our labels to the columns we created
colnames(CorrDataT) <- c("IceCream","Brownies","Happiness")

#Let's view the first few subjects
head(CorrDataT)
##    IceCream  Brownies Happiness
## 1 10.929755 11.094210 12.569467
## 2  8.840701  9.715172  9.878706
## 3  9.963873 10.388029  8.870038
## 4  7.692480 11.354929  9.252980
## 5 11.365400  9.962197 11.040250
## 6 10.541688  9.668536 10.923514
#make the scatter plots (car >= 3.0 uses smooth=FALSE; older versions used smoother=FALSE)
scatterplot(Happiness~IceCream,CorrDataT, smooth=FALSE)

scatterplot(Happiness~Brownies,CorrDataT, smooth=FALSE)

scatterplot(Brownies~IceCream,CorrDataT, smooth=FALSE)

#zero-order correlations between each pair of variables
ry1<-cor(CorrDataT$Happiness,CorrDataT$IceCream)
ry2<-cor(CorrDataT$Happiness,CorrDataT$Brownies)
r12<-cor(CorrDataT$Brownies,CorrDataT$IceCream)
ry1
## [1] 0.6
ry2
## [1] 0.4
r12
## [1] 0.2
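
The comment about `empirical=TRUE` in the code above can be checked directly. The following is a minimal sketch, not part of the original analysis: the object names `Sigma`, `exact.data`, and `sample.data` are illustrative, and only the ice cream and happiness correlation is compared.

```r
library(MASS)  # mvrnorm()

# Same 3x3 matrix as above, in the order IceCream, Brownies, Happiness
Sigma <- matrix(c(1.0, 0.2, 0.6,
                  0.2, 1.0, 0.4,
                  0.6, 0.4, 1.0), 3, 3)

set.seed(42)
exact.data  <- mvrnorm(n = 100, mu = c(10, 10, 10), Sigma = Sigma, empirical = TRUE)
sample.data <- mvrnorm(n = 100, mu = c(10, 10, 10), Sigma = Sigma, empirical = FALSE)

# Correlation between ice cream (column 1) and happiness (column 3)
r.exact  <- cor(exact.data[, 1],  exact.data[, 3])   # matches 0.6 up to rounding error
r.sample <- cor(sample.data[, 1], sample.data[, 3])  # near 0.6, but not exactly 0.6
c(empirical.TRUE = r.exact, empirical.FALSE = r.sample)
```

With `empirical=FALSE` the result depends on the seed and the sample size, which is why these notes use `empirical=TRUE` to keep the arithmetic clean.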

1.1.1 What's the problem?

  • Ice cream explains 0.36 of the variance in happiness (\(r_{Y1}^2\))
  • Brownies explain 0.16 of the variance in happiness (\(r_{Y2}^2\))
  • But how do we know whether ice cream and brownies are explaining the same variance?
  • At least brownies and ice cream do not explain each other too much: 0.04 (\(r_{12}^2\))
  • In other words, how many brownies people eat tells us little about how many scoops of ice cream they eat
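
The proportions in the list above are just the squared zero-order correlations. A small sketch, with the correlations typed in by hand from the earlier output (the vector names `r.zero` and `r.squared` are illustrative):

```r
# Zero-order correlations copied from the output above (illustrative vector name)
r.zero <- c(ry1 = 0.6, ry2 = 0.4, r12 = 0.2)

# Squaring a correlation gives the proportion of variance two variables share
r.squared <- r.zero^2
r.squared

# Adding the two r^2 values for happiness would double count any overlap
naive.total <- sum(r.squared[c("ry1", "ry2")])
naive.total
```

The naive total (0.36 + 0.16 = 0.52) is only correct if ice cream and brownies are uncorrelated, which is not the case here.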

1.1.2 Multiple R

  • We use the capital letter, \(R\), now because we have multiple X variables
  • \(R_{Y.12} = \sqrt{\frac{r_{Y1}^2 + r_{Y2}^2 - 2r_{Y1} r_{Y2} r_{12}} {1 - r_{12}^2}}\)
  • \(R_{Y.12} =\) 0.6645801
  • if we square that value, we get the Multiple \(R^2\), 0.4416667
  • or the total variance in happiness explained by these variables
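
As a worked check, substituting the three zero-order correlations from the output above (0.6, 0.4, and 0.2) into the formula gives:

\[
R_{Y.12} = \sqrt{\frac{0.6^2 + 0.4^2 - 2(0.6)(0.4)(0.2)}{1 - 0.2^2}} = \sqrt{\frac{0.36 + 0.16 - 0.096}{0.96}} = \sqrt{\frac{0.424}{0.96}} \approx 0.6646
\]

\[
R_{Y.12}^2 = \frac{0.424}{0.96} \approx 0.4417
\]

This is smaller than 0.36 + 0.16 = 0.52 because the two predictors overlap.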

2 Semipartial (part) correlation

  • We need to define the contribution of each X variable to Y
  • Semipartial (also called part) is one of two methods; the other is called partial
  • It is called semi because it removes the effect of one IV from the other IV without removing it from Y
  • Semipartial correlations indicate the “unique” contribution of an independent variable.
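Figure: Ballantine (Venn diagram) for X1, X2, and Y. Area a is the part of Y overlapped only by X1, b the part overlapped only by X2, c the part overlapped by both, and e the part of Y overlapped by neither.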

  • \(R_{Y.12}^2 = a + b + c\)
  • \(sr_1^2: a = R_{Y.12}^2 - r_{Y2}^2\)
  • \(sr_2^2: b = R_{Y.12}^2 - r_{Y1}^2\)

2.1 Calculation

#Multiple R^2 from the zero-order correlations (the Multiple R formula, squared)
R2y.12<-(ry1^2+ry2^2 - (2*ry1*ry2*r12))/(1-r12^2)
R2y.12
## [1] 0.4416667
a <- R2y.12 - ry2^2 #unique contribution of ice cream (sr1^2)
b <- R2y.12 - ry1^2 #unique contribution of brownies (sr2^2)

In other words:

  • In total we explained 0.4416667 of the variance in happiness ratings
  • ice cream uniquely explained 0.2816667 of the variance in happiness ratings
  • brownies uniquely explained 0.0816667 of the variance in happiness ratings
  • We should not solve for c because it can be negative in some cases, so it cannot always be read as a proportion of variance
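
The same unique contributions can be seen as the change in \(R^2\) between regression models. The following is a minimal sketch under the assumption that the data are rebuilt with the same settings as above so the block runs on its own; the names `dat`, `R2.both`, `a.lm`, and `b.lm` are illustrative.

```r
library(MASS)  # mvrnorm()

# Rebuild the simulated data (same settings as above)
Sigma <- matrix(c(1.0, 0.2, 0.6,
                  0.2, 1.0, 0.4,
                  0.6, 0.4, 1.0), 3, 3)
set.seed(42)
dat <- as.data.frame(mvrnorm(n = 100, mu = c(10, 10, 10), Sigma = Sigma, empirical = TRUE))
colnames(dat) <- c("IceCream", "Brownies", "Happiness")

# R^2 from regressions with both predictors and with each predictor alone
R2.both     <- summary(lm(Happiness ~ IceCream + Brownies, data = dat))$r.squared
R2.brownies <- summary(lm(Happiness ~ Brownies, data = dat))$r.squared
R2.icecream <- summary(lm(Happiness ~ IceCream, data = dat))$r.squared

# Unique contributions are the drop in R^2 when a predictor is left out
a.lm <- R2.both - R2.brownies  # ice cream's unique share
b.lm <- R2.both - R2.icecream  # brownies' unique share
c(R2 = R2.both, a = a.lm, b = b.lm)
```

Because the single-predictor \(R^2\) equals the squared zero-order correlation, these differences should agree with \(a\) and \(b\) from the formula.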

2.1.1 Seeing control in action

Another way to understand it:

  • What if you want to know how ice cream uniquely explains happiness? [controlling for the effect of brownies on ice cream]
  • We can remove the effect of brownies on ice cream by extracting the residuals of lm(X1~X2)
  • Remember the residuals are what is left over (after extracting what was explainable)
  • Next we can correlate happiness with the residualized ice cream.
#control for brownies
CorrDataT$Ice.control.Brownies<-residuals(lm(IceCream~Brownies, CorrDataT))
plot(CorrDataT$Happiness,CorrDataT$Ice.control.Brownies)

Sr1.alt<-cor(CorrDataT$Ice.control.Brownies,CorrDataT$Happiness)
Sr1.alt
## [1] 0.5307228

If we square the correlation value we got, 0.5307228, it becomes 0.2816667, which matches our \(a\) from the analysis above.

We can repeat this analysis, changing our control variable, to find \(b\).

#control for ice cream
CorrDataT$Brownies.control.Ice<-residuals(lm(Brownies~IceCream, CorrDataT))
plot(CorrDataT$Brownies.control.Ice,CorrDataT$Happiness)

Sr2.alt<-cor(CorrDataT$Happiness,CorrDataT$Brownies.control.Ice)
Sr2.alt^2
## [1] 0.08166667

2.2 Semipartial notes:

  • note, it can be written as \(sr\) or, more specifically, \(sr_1\) for X1 (with X2 removed) and \(sr_2\) for X2 (with X1 removed)
  • correlations with no control variables are called zero-order correlations
  • in R you can calculate the \(sr\) rather quickly using the ppcor package
library(ppcor)
#last variable is the control variable!

Sr1<-spcor.test(CorrDataT$Happiness, CorrDataT$IceCream, CorrDataT$Brownies)
Sr1
##    estimate      p.value statistic   n gp  Method
## 1 0.5307228 1.598901e-08  6.167236 100  1 pearson
Sr2<-spcor.test(CorrDataT$Happiness, CorrDataT$Brownies, CorrDataT$IceCream)
Sr2
##    estimate     p.value statistic   n gp  Method
## 1 0.2857738 0.004139206  2.937028 100  1 pearson
# Note to convert them to R2 values, you just need to square the correlations
Sr1$estimate^2
## [1] 0.2816667
Sr2$estimate^2
## [1] 0.08166667
#notice they match our a and b values exactly. 
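
The semipartial correlations can also be computed from the three zero-order correlations alone, without residuals or ppcor. This is a minimal sketch assuming the standard two-predictor formula \(sr_1 = (r_{Y1} - r_{Y2} r_{12}) / \sqrt{1 - r_{12}^2}\); `semipartial_r` is a hypothetical helper name, not a ppcor function.

```r
# Semipartial correlation of Y with a predictor, with the control variable
# removed from the predictor only (hypothetical helper, not part of ppcor)
semipartial_r <- function(r_y_pred, r_y_ctrl, r_pred_ctrl) {
  (r_y_pred - r_y_ctrl * r_pred_ctrl) / sqrt(1 - r_pred_ctrl^2)
}

# Zero-order correlations typed in from the output above
sr1.formula <- semipartial_r(0.6, 0.4, 0.2)  # ice cream, brownies removed
sr2.formula <- semipartial_r(0.4, 0.6, 0.2)  # brownies, ice cream removed

c(sr1 = sr1.formula, sr2 = sr2.formula,
  sr1.squared = sr1.formula^2, sr2.squared = sr2.formula^2)
```

This route is handy when only a published correlation table is available and the raw data are not.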

3 Partial correlation

  • Partial correlation asks how much of the Y variance, which is not estimated by the other IVs, is estimated by this variable.
  • It removes the shared variance of the control variable (say, X2) from both Y and X1.

  • \(pr_1^2: \frac{a}{a+e} = \frac{R_{Y.12}^2 - r_{Y2}^2}{1-r_{Y2}^2}\)
  • \(pr_2^2: \frac{b}{b+e} = \frac{R_{Y.12}^2 - r_{Y1}^2}{1-r_{Y1}^2}\)
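
As a worked check using the values computed earlier (\(a\) = 0.2816667, \(b\) = 0.0816667, \(r_{Y2}^2\) = 0.16, \(r_{Y1}^2\) = 0.36):

\[
pr_1^2 = \frac{0.2816667}{1 - 0.16} = \frac{0.2816667}{0.84} \approx 0.3353, \qquad
pr_2^2 = \frac{0.0816667}{1 - 0.36} = \frac{0.0816667}{0.64} \approx 0.1276
\]

Each partial value is larger than the matching semipartial value because the denominator shrinks from all of the Y variance (1) to only the Y variance left over after the control variable.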

3.1 Seeing control in action

Another way to understand it:

  • What if you want to know about happiness and ice cream while controlling for brownies (because brownies relate to both happiness and ice cream)?
  • We take the residuals of lm(Y~X2) and correlate them with the residuals of lm(X1~X2)
  • Remember the residuals are what is left over (after extracting what was explainable)
  • if you want to control for ice cream, you would correlate the residuals of lm(Y~X1) with the residuals of lm(X2~X1)
#control for brownies
CorrDataT$Happy.control.Brownies<-residuals(lm(Happiness~Brownies, CorrDataT))
CorrDataT$Ice.control.Brownies<-residuals(lm(IceCream~Brownies, CorrDataT))
plot(CorrDataT$Ice.control.Brownies,CorrDataT$Happy.control.Brownies)

cor(CorrDataT$Ice.control.Brownies,CorrDataT$Happy.control.Brownies)
## [1] 0.579066
#control for ice cream
CorrDataT$Happy.control.ice<-residuals(lm(Happiness~IceCream, CorrDataT))
CorrDataT$Brownies.control.ice<-residuals(lm(Brownies~IceCream, CorrDataT))

plot(CorrDataT$Brownies.control.ice,CorrDataT$Happy.control.ice)

cor(CorrDataT$Brownies.control.ice,CorrDataT$Happy.control.ice)
## [1] 0.3572173
  • in R you can calculate the \(pr\) directly using pcor.test() from the ppcor package
pr1<-pcor.test(CorrDataT$Happiness, CorrDataT$IceCream, CorrDataT$Brownies)
pr1
##   estimate      p.value statistic   n gp  Method
## 1 0.579066 3.413442e-10  6.995308 100  1 pearson
pr2<-pcor.test(CorrDataT$Happiness, CorrDataT$Brownies, CorrDataT$IceCream)
pr2
##    estimate      p.value statistic   n gp  Method
## 1 0.3572173 0.0002838052  3.766704 100  1 pearson
# Note to convert them to R2 values, you just need to square the correlations
pr1$estimate^2
## [1] 0.3353175
pr2$estimate^2
## [1] 0.1276042
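
The partial correlations can likewise be obtained from the zero-order correlations alone. A minimal sketch, assuming the standard first-order partial correlation formula; `partial_r` is a hypothetical helper name, not a ppcor function.

```r
# Partial correlation of Y with a predictor, with the control variable
# removed from BOTH Y and the predictor (hypothetical helper, not part of ppcor)
partial_r <- function(r_y_pred, r_y_ctrl, r_pred_ctrl) {
  (r_y_pred - r_y_ctrl * r_pred_ctrl) /
    sqrt((1 - r_y_ctrl^2) * (1 - r_pred_ctrl^2))
}

# Zero-order correlations typed in from the output above
pr1.formula <- partial_r(0.6, 0.4, 0.2)  # ice cream, controlling for brownies
pr2.formula <- partial_r(0.4, 0.6, 0.2)  # brownies, controlling for ice cream

c(pr1 = pr1.formula, pr2 = pr2.formula,
  pr1.squared = pr1.formula^2, pr2.squared = pr2.formula^2)
```

The only difference from the semipartial formula is the extra \((1 - r_{Y2}^2)\) term (or \((1 - r_{Y1}^2)\) for \(pr_2\)) in the denominator, which is the part of Y left after the control variable is removed.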
