1 Correlations with 2 or more variables

  • Correlations with more than 2 variables present a new challenge
  • What if a third variable (X2) actually explains the relationship between X1 and Y?
  • We need to find a way to figure out how X2 might relate to X1 and Y!

1.1 Ice cream example, part 2

  • Ice cream scoops (X1)
  • Brownie squares (X2)
  • Happiness score (Y)
  • Question: how much do ice cream and brownies predict/explain happiness scores?
  • Let's rebuild our dataset with the new variable.
#packages we will need to create and graph our data
library(MASS) #create data
library(car) #graph data
py1 =.6 #Cor between X1 (ice cream) and happiness
py2 =.4 #Cor between X2 (Brownies) and happiness
p12= .2 #Cor between X1 (ice cream) and X2 (Brownies)

Means.X1X2Y<- c(10,10,10) #set the means of X and Y variables
CovMatrix.X1X2Y <- matrix(c(1,p12,py1,
                            p12,1,py2,
                            py1,py2,1),3,3) # creates the covariance matrix (variances are 1, so it is also the correlation matrix)

CovMatrix.X1X2Y #nice and simple 3x3 matrix
##      [,1] [,2] [,3]
## [1,]  1.0  0.2  0.6
## [2,]  0.2  1.0  0.4
## [3,]  0.6  0.4  1.0
#build the correlated variables. Note: empirical=TRUE makes the sample correlations EXACTLY r.
# with empirical=FALSE, the sample correlations would vary randomly around r
set.seed(42)
CorrDataT<-mvrnorm(n=100, mu=Means.X1X2Y,Sigma=CovMatrix.X1X2Y, empirical=TRUE)

#Convert them to a data.frame, which is like the SPSS data window
CorrDataT<-as.data.frame(CorrDataT)

#let's add our labels to the columns we created
colnames(CorrDataT) <- c("IceCream","Brownies","Happiness")

#Let's view the first few subjects
head(CorrDataT)
##    IceCream  Brownies Happiness
## 1 10.929755 11.094210 12.569467
## 2  8.840701  9.715172  9.878706
## 3  9.963873 10.388029  8.870038
## 4  7.692480 11.354929  9.252980
## 5 11.365400  9.962197 11.040250
## 6 10.541688  9.668536 10.923514
#make the scatter plot
scatterplot(Happiness~IceCream,CorrDataT, smoother=FALSE)

scatterplot(Happiness~Brownies,CorrDataT, smoother=FALSE)

scatterplot(Brownies~IceCream,CorrDataT, smoother=FALSE)

#zero-order correlations among the three variables
ry1<-cor(CorrDataT$Happiness,CorrDataT$IceCream)
ry2<-cor(CorrDataT$Happiness,CorrDataT$Brownies)
r12<-cor(CorrDataT$Brownies,CorrDataT$IceCream)
ry1
## [1] 0.6
ry2
## [1] 0.4
r12
## [1] 0.2

1.1.1 What's the problem?

  • Ice cream explains 0.36 of the variance in happiness (\(r_{Y1}^2\))
  • Brownies explain 0.16 of the variance in happiness (\(r_{Y2}^2\))
  • But how do we know whether ice cream and brownies are explaining the same variance?
  • At least brownies and ice cream do not explain each other too much: 0.04 (\(r_{12}^2\))
  • In other words, how many brownies people eat tells us little about how many scoops of ice cream they eat

1.1.2 Multiple R

  • We use the capital letter, \(R\), now because we have multiple X variables
  • \(R_{Y.12} = \sqrt{\frac{r_{Y1}^2 + r_{Y2}^2 - 2r_{Y1} r_{Y2} r_{12}} {1 - r_{12}^2}}\)
  • \(R_{Y.12} =\) 0.6645801
  • If we square that value we get the multiple \(R^2\), 0.4416667
  • This is the total proportion of variance in happiness explained by these variables together

2 Semipartial (part) correlation

  • We need to define the contribution of each X variable to Y
  • Semipartial (also called part) correlation is one of two methods; the other is partial correlation
  • It is called "semi" because it removes the effect of the other IV from this IV only, not from Y
  • Semipartial correlations indicate the “unique” contribution of an independent variable.
[Figure: Ballantine for X1, X2, and Y]

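  • \(R_{Y.12}^2 = a + b + c\), where \(a\) is the part of Y overlapped only by X1, \(b\) the part overlapped only by X2, and \(c\) the part overlapped by both
  • \(sr_1^2 = a = R_{Y.12}^2 - r_{Y2}^2\)
  • \(sr_2^2 = b = R_{Y.12}^2 - r_{Y1}^2\)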

2.1 Calculation

R2y.12<-sqrt((ry1^2+ry2^2 - (2*ry1*ry2*r12))/(1-r12^2))^2
R2y.12
## [1] 0.4416667
a = R2y.12 -ry2^2 
b = R2y.12 -ry1^2 

In other words,

  • In total we explained 0.4416667 of the variance in happiness ratings
  • Ice cream uniquely explained 0.2816667 of the variance in happiness ratings
  • Brownies uniquely explained 0.0816667 of the variance in happiness ratings
  • We do not solve for \(c\) because it can be negative in some cases, so it is not a true proportion of variance

2.1.1 Seeing control in action

Another way to understand it:

  • What if you want to know how ice cream uniquely explains happiness? [controlling for the effect of brownies on ice cream]
  • We can remove the effect of brownies on ice cream by extracting the residuals of lm(X1~X2)
  • Remember the residuals are what is left over (after extracting what was explainable)
  • Next we can correlate happiness with the residualized ice cream.
#control for brownies
CorrDataT$Ice.control.Brownies<-residuals(lm(IceCream~Brownies, CorrDataT))
plot(CorrDataT$Happiness,CorrDataT$Ice.control.Brownies)

Sr1.alt<-cor(CorrDataT$Ice.control.Brownies,CorrDataT$Happiness)
Sr1.alt
## [1] 0.5307228

If we square the correlation we got, 0.5307228, it becomes 0.2816667, which matches our \(a\) from the analysis above.

We can repeat this analysis, changing the control variable, to find \(b\).

#control for ice cream
CorrDataT$Brownies.control.Ice<-residuals(lm(Brownies~IceCream, CorrDataT))
plot(CorrDataT$Brownies.control.Ice,CorrDataT$Happiness)

Sr2.alt<-cor(CorrDataT$Happiness,CorrDataT$Brownies.control.Ice)
Sr2.alt^2
## [1] 0.08166667

2.2 Semipartial notes

  • It can be written as \(sr\) or, more specifically, \(sr_1\) for X1 (with X2 removed) and \(sr_2\) for X2 (with X1 removed)
  • Correlations with no control variables are called zero-order correlations
  • In R you can calculate \(sr\) quickly using spcor.test() from the ppcor library
library(ppcor)
#last variable is the control variable!

Sr1<-spcor.test(CorrDataT$Happiness, CorrDataT$IceCream, CorrDataT$Brownies)
Sr1
##    estimate      p.value statistic   n gp  Method
## 1 0.5307228 1.598901e-08  6.167236 100  1 pearson
Sr2<-spcor.test(CorrDataT$Happiness, CorrDataT$Brownies, CorrDataT$IceCream)
Sr2
##    estimate     p.value statistic   n gp  Method
## 1 0.2857738 0.004139206  2.937028 100  1 pearson
# Note: to convert them to proportions of variance (sr^2), just square the correlations
Sr1$estimate^2
## [1] 0.2816667
Sr2$estimate^2
## [1] 0.08166667
#notice they match our a and b values exactly. 

3 Partial correlation

  • Partial correlation asks how much of the Y variance that is not explained by the other IVs is explained by this variable.
  • It removes the shared variance of the control variable (say, X2) from both Y and X1.

  • \(pr_1^2 = \frac{a}{a+e} = \frac{R_{Y.12}^2 - r_{Y2}^2}{1-r_{Y2}^2}\)
  • \(pr_2^2 = \frac{b}{b+e} = \frac{R_{Y.12}^2 - r_{Y1}^2}{1-r_{Y1}^2}\)
  • Here \(e = 1 - R_{Y.12}^2\) is the variance in Y that neither X1 nor X2 explains

3.1 Seeing control in action

Another way to understand it:

  • What if you want to know about happiness and ice cream while controlling for brownies (because brownies relate to both happiness and ice cream)?
  • We take the residuals of lm(Y~X2) and correlate them with the residuals of lm(X1~X2)
  • Remember the residuals are what is left over (after extracting what was explainable)
  • If you want to control for ice cream instead, correlate the residuals of lm(Y~X1) with the residuals of lm(X2~X1)
#control for brownies
CorrDataT$Happy.control.Brownies<-residuals(lm(Happiness~Brownies, CorrDataT))
CorrDataT$Ice.control.Brownies<-residuals(lm(IceCream~Brownies, CorrDataT))
plot(CorrDataT$Ice.control.Brownies,CorrDataT$Happy.control.Brownies)

cor(CorrDataT$Ice.control.Brownies,CorrDataT$Happy.control.Brownies)
## [1] 0.579066
#control for ice cream
CorrDataT$Happy.control.ice<-residuals(lm(Happiness~IceCream, CorrDataT))
CorrDataT$Brownies.control.ice<-residuals(lm(Brownies~IceCream, CorrDataT))

plot(CorrDataT$Brownies.control.ice,CorrDataT$Happy.control.ice)

cor(CorrDataT$Brownies.control.ice,CorrDataT$Happy.control.ice)
## [1] 0.3572173
  • In R you can calculate \(pr\) directly with pcor.test() from the ppcor library
pr1<-pcor.test(CorrDataT$Happiness, CorrDataT$IceCream, CorrDataT$Brownies)
pr1
##   estimate      p.value statistic   n gp  Method
## 1 0.579066 3.413442e-10  6.995308 100  1 pearson
pr2<-pcor.test(CorrDataT$Happiness, CorrDataT$Brownies, CorrDataT$IceCream)
pr2
##    estimate      p.value statistic   n gp  Method
## 1 0.3572173 0.0002838052  3.766704 100  1 pearson
# Note: to convert them to proportions of variance (pr^2), just square the correlations
pr1$estimate^2
## [1] 0.3353175
pr2$estimate^2
## [1] 0.1276042
---
title: "Partial and Semipartial (part) Correlation"
output:
  html_document:
    code_download: yes
    fontsize: 8pt
    highlight: textmate
    number_sections: yes
    theme: flatly
    toc: yes
    toc_float:
      collapsed: no
---

# Correlations with 2 or more variables
- Correlations with more than 2 variables present a new challenge 
- What if a third variable (X2) actually explains the relationship between X1 and Y?
- We need to find a way to figure out how X2 might relate to X1 and Y! 

## Ice cream example, part 2
- Ice cream scoops (X1)
- Brownie squares (X2)
- Happiness score (Y)
- Question: how much do ice cream and brownies predict/explain happiness scores?
- Let's rebuild our dataset with the new variable.

```{r, echo=TRUE, warning=FALSE}
#packages we will need to create and graph our data
library(MASS) #create data
library(car) #graph data
```


```{r, echo=TRUE, warning=FALSE}
py1 =.6 #Cor between X1 (ice cream) and happiness
py2 =.4 #Cor between X2 (Brownies) and happiness
p12= .2 #Cor between X1 (ice cream) and X2 (Brownies)

Means.X1X2Y<- c(10,10,10) #set the means of X and Y variables
CovMatrix.X1X2Y <- matrix(c(1,p12,py1,
                            p12,1,py2,
                            py1,py2,1),3,3) # creates the covariance matrix (variances are 1, so it is also the correlation matrix)

CovMatrix.X1X2Y #nice and simple 3x3 matrix

#build the correlated variables. Note: empirical=TRUE makes the sample correlations EXACTLY r.
# with empirical=FALSE, the sample correlations would vary randomly around r
set.seed(42)
CorrDataT<-mvrnorm(n=100, mu=Means.X1X2Y,Sigma=CovMatrix.X1X2Y, empirical=TRUE)

#Convert them to a data.frame, which is like the SPSS data window
CorrDataT<-as.data.frame(CorrDataT)

#let's add our labels to the columns we created
colnames(CorrDataT) <- c("IceCream","Brownies","Happiness")

#Let's view the first few subjects
head(CorrDataT)

#make the scatter plot
scatterplot(Happiness~IceCream,CorrDataT, smoother=FALSE)
scatterplot(Happiness~Brownies,CorrDataT, smoother=FALSE)
scatterplot(Brownies~IceCream,CorrDataT, smoother=FALSE)

#zero-order correlations among the three variables
ry1<-cor(CorrDataT$Happiness,CorrDataT$IceCream)
ry2<-cor(CorrDataT$Happiness,CorrDataT$Brownies)
r12<-cor(CorrDataT$Brownies,CorrDataT$IceCream)
ry1
ry2
r12
```
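
The comment about `empirical=TRUE` can be checked directly. The sketch below assumes only the MASS package and reuses the correlations from this example; the object names `exact` and `drawn` are made up for illustration. It draws one sample each way and compares the sample correlation between the first and third columns with the target of .6.

```r
library(MASS) # for mvrnorm()

# Target correlation matrix (same values as the example above)
Sigma <- matrix(c(1.0, 0.2, 0.6,
                  0.2, 1.0, 0.4,
                  0.6, 0.4, 1.0), 3, 3)

set.seed(42)
# empirical=TRUE: the sample reproduces Sigma exactly
exact <- mvrnorm(n = 100, mu = c(10, 10, 10), Sigma = Sigma, empirical = TRUE)
# empirical=FALSE: Sigma describes the population; the sample varies around it
drawn <- mvrnorm(n = 100, mu = c(10, 10, 10), Sigma = Sigma, empirical = FALSE)

r_exact <- cor(exact)[1, 3] # equals .6 up to rounding error
r_drawn <- cor(drawn)[1, 3] # near .6, but generally not exactly .6
c(exact = r_exact, drawn = r_drawn)
```

Using `empirical=TRUE` keeps the teaching example tidy, since every later number follows exactly from .6, .4, and .2.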


### What's the problem?
- Ice cream explains `r ry1^2` of the variance in happiness ($r_{Y1}^2$)
- Brownies explain `r ry2^2` of the variance in happiness ($r_{Y2}^2$)
- But how do we know whether ice cream and brownies are explaining the same variance?
- At least brownies and ice cream do not explain each other too much: `r r12^2` ($r_{12}^2$)
- In other words, how many brownies people eat tells us little about how many scoops of ice cream they eat

### Multiple R 
- We use the capital letter, $R$, now because we have multiple X variables
- $R_{Y.12} = \sqrt{\frac{r_{Y1}^2 + r_{Y2}^2 - 2r_{Y1} r_{Y2} r_{12}} {1 - r_{12}^2}}$ 
- $R_{Y.12} =$ `r sqrt((ry1^2+ry2^2 - 2*ry1*ry2*r12)/(1-r12^2))`
- If we square that value we get the multiple $R^2$, `r sqrt((ry1^2+ry2^2 - 2*ry1*ry2*r12)/(1-r12^2))^2`
- This is the total proportion of variance in happiness explained by these variables together
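
One way to check the formula is to compare it with the $R^2$ from a regression that uses both predictors. The sketch below rebuilds the example data so that it runs on its own; the names `d`, `R2_formula`, and `R2_lm` are chosen here for illustration.

```r
library(MASS) # for mvrnorm()

# Rebuild the example data (correlations .6, .4, .2 reproduced exactly)
Sigma <- matrix(c(1.0, 0.2, 0.6,
                  0.2, 1.0, 0.4,
                  0.6, 0.4, 1.0), 3, 3)
set.seed(42)
d <- as.data.frame(mvrnorm(n = 100, mu = c(10, 10, 10),
                           Sigma = Sigma, empirical = TRUE))
colnames(d) <- c("IceCream", "Brownies", "Happiness")

ry1 <- cor(d$Happiness, d$IceCream)
ry2 <- cor(d$Happiness, d$Brownies)
r12 <- cor(d$Brownies, d$IceCream)

# Multiple R^2 from the formula (no sqrt needed when we want R^2 itself)
R2_formula <- (ry1^2 + ry2^2 - 2 * ry1 * ry2 * r12) / (1 - r12^2)

# Multiple R^2 from a regression with both predictors
R2_lm <- summary(lm(Happiness ~ IceCream + Brownies, data = d))$r.squared

c(formula = R2_formula, lm = R2_lm)
```

The two values should agree, because the two-predictor formula is just the regression $R^2$ written in terms of the three zero-order correlations.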

# Semipartial (part) correlation
- We need to define the contribution of each X variable to Y
- Semipartial (also called part) correlation is one of two methods; the other is partial correlation
- It is called "semi" because it removes the effect of the other IV from this IV only, not from Y
- Semipartial correlations indicate the "unique" contribution of an independent variable. 

![Ballantine for X1, X2, and Y](RegressionClass/L2_PartCorr/Ballantine.jpg)
 
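- $R_{Y.12}^2 = a + b + c$, where $a$ is the part of Y overlapped only by X1, $b$ the part overlapped only by X2, and $c$ the part overlapped by both
- $sr_1^2 = a = R_{Y.12}^2 - r_{Y2}^2$
- $sr_2^2 = b = R_{Y.12}^2 - r_{Y1}^2$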
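
As a worked check with the correlations from this example ($r_{Y1} = .6$, $r_{Y2} = .4$, $r_{12} = .2$), the semipartial correlations can also be computed straight from the zero-order correlations. The direct formula used here is the standard textbook expression and is not derived in these notes:

$$
\begin{aligned}
sr_1 &= \frac{r_{Y1} - r_{Y2}\, r_{12}}{\sqrt{1 - r_{12}^2}} = \frac{.6 - (.4)(.2)}{\sqrt{1 - .04}} = \frac{.52}{.9798} \approx .5307, & sr_1^2 &\approx .2817 = .4417 - .16 \\
sr_2 &= \frac{r_{Y2} - r_{Y1}\, r_{12}}{\sqrt{1 - r_{12}^2}} = \frac{.4 - (.6)(.2)}{\sqrt{1 - .04}} = \frac{.28}{.9798} \approx .2858, & sr_2^2 &\approx .0817 = .4417 - .36
\end{aligned}
$$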

## Calculation

```{r, echo=TRUE, warning=FALSE}
R2y.12<-sqrt((ry1^2+ry2^2 - (2*ry1*ry2*r12))/(1-r12^2))^2
R2y.12

a = R2y.12 -ry2^2 
b = R2y.12 -ry1^2 


```

In other words,

+ In total we explained `r R2y.12` of the variance in happiness ratings
+ Ice cream uniquely explained `r a` of the variance in happiness ratings
+ Brownies uniquely explained `r b` of the variance in happiness ratings 
+ We do not solve for $c$ because it can be negative in some cases, so it is not a true proportion of variance
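
To see how $c$ can turn negative, the sketch below computes $a$, $b$, and $c$ from three zero-order correlations. `ballantine_areas()` is a hypothetical helper written for this illustration, and the second set of correlations is made up: X2 is unrelated to Y but correlated with X1, a pattern usually described as suppression.

```r
# Split R^2 into the Ballantine areas a, b, and c from three zero-order correlations.
# ballantine_areas() is a hypothetical helper, not part of any package.
ballantine_areas <- function(ry1, ry2, r12) {
  R2 <- (ry1^2 + ry2^2 - 2 * ry1 * ry2 * r12) / (1 - r12^2)
  a <- R2 - ry2^2 # unique to X1 (sr1^2)
  b <- R2 - ry1^2 # unique to X2 (sr2^2)
  c(R2 = R2, a = a, b = b, c = R2 - a - b)
}

# The ice cream example: c comes out positive
icecream <- ballantine_areas(ry1 = .6, ry2 = .4, r12 = .2)

# Made-up correlations: X2 is unrelated to Y but correlated with X1
suppress <- ballantine_areas(ry1 = .5, ry2 = 0, r12 = .5)

round(rbind(icecream, suppress), 4)
```

In the second case $a + b$ is larger than $R^2$, so the "shared" area $c$ has to be negative, which is why it cannot be read as a proportion of variance.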

### Seeing control in action
Another way to understand it: 

- What if you want to know how ice cream uniquely explains happiness? [controlling for the effect of brownies on ice cream]
- We can remove the effect of brownies on ice cream by extracting the residuals of lm(X1~X2)
- Remember the residuals are what is left over (after extracting what was explainable)
- Next we can correlate happiness with the residualized ice cream. 


```{r, echo=TRUE, warning=FALSE}
#control for brownies
CorrDataT$Ice.control.Brownies<-residuals(lm(IceCream~Brownies, CorrDataT))
plot(CorrDataT$Happiness,CorrDataT$Ice.control.Brownies)
Sr1.alt<-cor(CorrDataT$Ice.control.Brownies,CorrDataT$Happiness)
Sr1.alt
```

If we square the correlation we got, `r Sr1.alt`, it becomes `r Sr1.alt^2`, which matches our $a$ from the analysis above. 

We can repeat this analysis, changing the control variable, to find $b$.


```{r, echo=TRUE, warning=FALSE}
#control for ice cream
CorrDataT$Brownies.control.Ice<-residuals(lm(Brownies~IceCream, CorrDataT))
plot(CorrDataT$Brownies.control.Ice,CorrDataT$Happiness)

Sr2.alt<-cor(CorrDataT$Happiness,CorrDataT$Brownies.control.Ice)
Sr2.alt^2
```


## Semipartial notes
- It can be written as $sr$ or, more specifically, $sr_1$ for X1 (with X2 removed) and $sr_2$ for X2 (with X1 removed) 
- Correlations with no control variables are called zero-order correlations
- In R you can calculate $sr$ quickly using spcor.test() from the ppcor library


```{r, echo=TRUE, warning=FALSE}
library(ppcor)
#last variable is the control variable!

Sr1<-spcor.test(CorrDataT$Happiness, CorrDataT$IceCream, CorrDataT$Brownies)
Sr1
Sr2<-spcor.test(CorrDataT$Happiness, CorrDataT$Brownies, CorrDataT$IceCream)
Sr2

# Note: to convert them to proportions of variance (sr^2), just square the correlations
Sr1$estimate^2
Sr2$estimate^2

#notice they match our a and b values exactly. 

```
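
Because $sr_1^2 = R_{Y.12}^2 - r_{Y2}^2$, a squared semipartial can also be read as the gain in $R^2$ when a predictor is added last to a regression. The sketch below illustrates this with nested `lm()` fits; it rebuilds the example data so that it runs on its own, and the object names are chosen here for illustration.

```r
library(MASS) # for mvrnorm()

# Rebuild the example data (correlations .6, .4, .2 reproduced exactly)
Sigma <- matrix(c(1.0, 0.2, 0.6,
                  0.2, 1.0, 0.4,
                  0.6, 0.4, 1.0), 3, 3)
set.seed(42)
d <- as.data.frame(mvrnorm(n = 100, mu = c(10, 10, 10),
                           Sigma = Sigma, empirical = TRUE))
colnames(d) <- c("IceCream", "Brownies", "Happiness")

# R^2 with both predictors, and with each predictor alone
R2_full     <- summary(lm(Happiness ~ IceCream + Brownies, data = d))$r.squared
R2_brownies <- summary(lm(Happiness ~ Brownies, data = d))$r.squared
R2_icecream <- summary(lm(Happiness ~ IceCream, data = d))$r.squared

# Squared semipartials as the gain in R^2 from adding each predictor last
sr1_sq <- R2_full - R2_brownies # ice cream added after brownies
sr2_sq <- R2_full - R2_icecream # brownies added after ice cream
c(sr1_sq = sr1_sq, sr2_sq = sr2_sq)
```

This model-comparison view extends to more than two predictors, where the closed-form correlation formulas become unwieldy.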


# Partial correlation
- Partial correlation asks how much of the Y variance that is not explained by the other IVs is explained by this variable.
- It removes the shared variance of the control variable (say, X2) from both Y and X1. 

- $pr_1^2 = \frac{a}{a+e} = \frac{R_{Y.12}^2 - r_{Y2}^2}{1-r_{Y2}^2}$
- $pr_2^2 = \frac{b}{b+e} = \frac{R_{Y.12}^2 - r_{Y1}^2}{1-r_{Y1}^2}$
- Here $e = 1 - R_{Y.12}^2$ is the variance in Y that neither X1 nor X2 explains
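
Plugging the example values into these formulas ($R_{Y.12}^2 \approx .4417$, $r_{Y1}^2 = .36$, $r_{Y2}^2 = .16$) gives a quick worked check:

$$
\begin{aligned}
pr_1^2 &= \frac{.4417 - .16}{1 - .16} = \frac{.2817}{.84} \approx .3353, & pr_1 &\approx .5791 \\
pr_2^2 &= \frac{.4417 - .36}{1 - .36} = \frac{.0817}{.64} \approx .1276, & pr_2 &\approx .3572
\end{aligned}
$$

The numerators are the same $a$ and $b$ as in the semipartial case; only the denominator changes, from all of the Y variance (1) to the Y variance that the control variable leaves unexplained.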

## Seeing control in action
Another way to understand it: 

- What if you want to know about happiness and ice cream while controlling for brownies (because brownies relate to both happiness and ice cream)?
- We take the residuals of lm(Y~X2) and correlate them with the residuals of lm(X1~X2)
- Remember the residuals are what is left over (after extracting what was explainable)
- If you want to control for ice cream instead, correlate the residuals of lm(Y~X1) with the residuals of lm(X2~X1)

```{r, echo=TRUE, warning=FALSE}
#control for brownies
CorrDataT$Happy.control.Brownies<-residuals(lm(Happiness~Brownies, CorrDataT))
CorrDataT$Ice.control.Brownies<-residuals(lm(IceCream~Brownies, CorrDataT))
plot(CorrDataT$Ice.control.Brownies,CorrDataT$Happy.control.Brownies)
cor(CorrDataT$Ice.control.Brownies,CorrDataT$Happy.control.Brownies)

#control for ice cream
CorrDataT$Happy.control.ice<-residuals(lm(Happiness~IceCream, CorrDataT))
CorrDataT$Brownies.control.ice<-residuals(lm(Brownies~IceCream, CorrDataT))

plot(CorrDataT$Brownies.control.ice,CorrDataT$Happy.control.ice)
cor(CorrDataT$Brownies.control.ice,CorrDataT$Happy.control.ice)
```

- In R you can calculate $pr$ directly with pcor.test() from the ppcor library

```{r, echo=TRUE, warning=FALSE}
pr1<-pcor.test(CorrDataT$Happiness, CorrDataT$IceCream, CorrDataT$Brownies)
pr1
pr2<-pcor.test(CorrDataT$Happiness, CorrDataT$Brownies, CorrDataT$IceCream)
pr2

# Note: to convert them to proportions of variance (pr^2), just square the correlations
pr1$estimate^2
pr2$estimate^2


```
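
The semipartial and partial correlations differ only in whether the control variable is also removed from Y. The sketch below shows that relationship in base R using the zero-order correlations alone. `sr_pr()` is a hypothetical helper written for this illustration, and the formulas are the standard two-predictor expressions rather than anything specific to ppcor.

```r
# Semipartial and partial correlation of X1 with Y, controlling for X2,
# computed from zero-order correlations. sr_pr() is a hypothetical helper.
sr_pr <- function(ry1, ry2, r12) {
  sr <- (ry1 - ry2 * r12) / sqrt(1 - r12^2) # X2 removed from X1 only
  pr <- sr / sqrt(1 - ry2^2)                # X2 also removed from Y
  c(sr = sr, pr = pr)
}

ice   <- sr_pr(ry1 = .6, ry2 = .4, r12 = .2) # ice cream, controlling for brownies
brown <- sr_pr(ry1 = .4, ry2 = .6, r12 = .2) # brownies, controlling for ice cream
round(rbind(ice, brown), 4)
```

Since $\sqrt{1 - r_{Y2}^2}$ is at most 1, the partial correlation is never smaller in absolute value than the corresponding semipartial correlation.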


