1 Nominal Variables

  • Things like gender or color (terms we would have as factors in ANOVA)
  • Decisions have to be made on how to treat these variables in a regression
  • We will assume (for now) that a subject can be a member of only one level of a factor (i.e., you can be blue or red, but not both)
  • We will cover three basic coding methods and how to interpret their effects and interactions
  • Key to remember: each coefficient is basically a t-test, so you are always just comparing two things, but what those two things are will change as you change your coding (think of the concept of contrasts in ANOVA from last semester)

2 Models with no Interactions (1 Nominal variable)

2.1 Dummy Coding

  • The most common and basic type used (default in R if it senses a categorical variable)
  • For each variable you must assign a reference group (a baseline)
  • Each group is compared to the reference group
  • We ask the question, how much does each group deviate from the reference
  • Here are two levels of one factor
Variable C1
Female 0
Male 1
  • Here are Three levels, but now you have to make 2 new variables
  • Female is the reference group
Variable C1 C2
Female 0 0
Male 0 1
Trans 1 0
  • Male is the reference group
Variable C1 C2
Female 0 1
Male 0 0
Trans 1 0
  • Here are Four levels, but now you have to make 3 new variables
  • Female is the reference group
Variable C1 C2 C3
Cis Female 0 0 0
Cis Male 0 0 1
Trans Female 0 1 0
Trans Male 1 0 0

2.1.1 Regression equation

  • Two Levels: \(Y = B_1C1 + B_0\)
  • Three Levels: \(Y = B_1C1 +B_2C2+ B_0\)
  • Four Levels: \(Y = B_1C1 + B_2C2 + B_3C3 + B_0\)
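
To see where the C1, C2 columns in these equations come from, here is a minimal sketch using a made-up three-level factor (the names `g`, `A`, `B`, and `C` are placeholders, not part of the example data). It prints the design matrix that `lm` would build under the default dummy coding:

```r
# Hypothetical three-level factor; "A" sorts first, so it becomes the reference
g <- factor(c("A", "B", "C", "A", "B", "C"))

# model.matrix() returns the intercept column (B_0) plus one 0/1 dummy
# column for every non-reference level
X <- model.matrix(~ g)
print(X)
```

Each row has at most one 1 among the dummy columns, and the reference level is the row pattern with all zeros.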

2.1.2 Creating Dummy Variables

  • In SPSS you would have to HAND create dummy variables using Recode, but in R it's a little easier
  • In R, we simply have to convert our variable into a factor
  • R orders factor levels alphabetically by default, so it is easiest to work with numbers (not words)
  • Best practice in R: Convert all your words into numbers (start with 0)
  • Next, convert your variable into a factor (as.factor)
  • First, let's simulate a simple factor to work with
  • Emotion rating ~ Expressive intentions of the actor (0 = flat, 1 = Normal, 2 = Exaggerated )
  • Three Levels: \(Emotion Rating = 0*Flat + 5*Normal +10*Exaggerated + 50 + \epsilon\)
Variable C1 C2
Flat 0 0
Normal 1 0
Exaggerated 0 1
library(car)
#Set up simulation
set.seed(42)
N <- 200
X<- sample(rep(c(0,1,2),N),N,replace = FALSE)
# Our equation to create Y
Y <- 5*X + 50 + rnorm(N, sd=10)
# Build our data frame
Emotion.Data<-data.frame(Emotion=Y,Style=X)

scatterplot(Emotion~Style, data= Emotion.Data, reg.line=FALSE, smoother=loessLine)

  • If the data are already coded as 0, 1, 2, all we have to do is convert the variable with as.factor (also note R will not plot a factor the same way as a numeric variable)
Emotion.Data$StyleD<-as.factor(Emotion.Data$Style)
scatterplot(Emotion~StyleD, data= Emotion.Data, reg.line=FALSE, smoother=loessLine)

## [1] "169" "18"  "187"
  • If the data were coded as words (Flat, Normal, Exaggerated), R would by default put Exaggerated as baseline, because it comes first alphabetically
  • We can fix that manually by creating the factor ourselves and setting the level order
# Here I will convert the data into words first (cause I had to simulate numbers)
Emotion.Data$StyleF <- factor(Emotion.Data$Style,
                              level=c(0,1,2),
                              labels=c("Flat", "Normal", "Exaggerated"))

Emotion.Data$StyleN<- relevel(Emotion.Data$StyleF, ref = "Normal")

scatterplot(Emotion~StyleF, data= Emotion.Data, reg.line=FALSE, smoother=loessLine)

## [1] "169" "18"  "187"
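
The alphabetical default and the two ways of overriding it can be checked on a tiny vector. This is a sketch with made-up data (the `words` vector is hypothetical); it uses only base R's `factor`, `levels`, and `relevel`:

```r
# Hypothetical word-coded data
words <- c("Flat", "Normal", "Exaggerated", "Flat")

# Default: levels are sorted alphabetically, so Exaggerated would be the baseline
default_levels <- levels(factor(words))
default_levels

# Option 1: state the level order explicitly (first level = baseline)
f <- factor(words, levels = c("Flat", "Normal", "Exaggerated"))
levels(f)

# Option 2: keep the factor and move one level to the front
f_normal <- relevel(f, ref = "Normal")
levels(f_normal)
```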

2.1.3 Interpret Regression

  • Order of the factor levels matters (the first level is the baseline)
  • Remember that each level of the variable (except the baseline) becomes its own term in the equation, because a new dummy variable was made for it
  • R makes these dummy codes automatically, so it's easy to forget this
  • For 3 levels we will have 3 terms: the intercept, StyleD1, and StyleD2
library(stargazer)
Model.1<-lm(Emotion ~ StyleD, data = Emotion.Data)
stargazer(Model.1,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Constant 49.726*** (1.256)
StyleD1 5.099** (1.735)
StyleD2 9.349*** (1.674)
Observations 200
R2 0.137
Adjusted R2 0.128
F Statistic 15.597*** (df = 2; 197)
Note: *p<0.05; **p<0.01; ***p<0.001
  • If we use our factor coded as words, the table is easier to read
Model.1.F<-lm(Emotion ~ StyleF, data = Emotion.Data)
stargazer(Model.1.F,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Constant 49.726*** (1.256)
StyleFNormal 5.099** (1.735)
StyleFExaggerated 9.349*** (1.674)
Observations 200
R2 0.137
Adjusted R2 0.128
F Statistic 15.597*** (df = 2; 197)
Note: *p<0.05; **p<0.01; ***p<0.001

2.1.3.1 Coefficients

  • Intercept = intercept of the equation (which is the mean of Flat)
  • StyleFNormal = mean of Normal minus mean of the baseline (Flat)
  • StyleFExaggerated = mean of Exaggerated minus mean of the baseline (Flat)

2.1.3.2 Means per Condition

  • Flat = Intercept
  • Normal = Intercept + StyleFNormal
  • Exaggerated = Intercept + StyleFExaggerated
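
The bookkeeping above can be verified on simulated data. The sketch below uses freshly simulated values (the data frame `d` and its columns `g` and `y` are hypothetical stand-ins, not the Emotion.Data object) and compares the means rebuilt from the coefficients with the observed group means:

```r
set.seed(1)
# Hypothetical data: three groups of 20 with Flat as the first (reference) level
d <- data.frame(g = factor(rep(c("Flat", "Normal", "Exaggerated"), each = 20),
                           levels = c("Flat", "Normal", "Exaggerated")))
d$y <- 50 + 5 * (as.integer(d$g) - 1) + rnorm(nrow(d), sd = 10)

fit <- lm(y ~ g, data = d)
b <- coef(fit)

# Means rebuilt from the dummy-coded coefficients
from_model <- c(Flat        = b[["(Intercept)"]],
                Normal      = b[["(Intercept)"]] + b[["gNormal"]],
                Exaggerated = b[["(Intercept)"]] + b[["gExaggerated"]])

# Observed means per group, in the same level order
observed <- sapply(split(d$y, d$g), mean)

all.equal(from_model, observed)
```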

2.1.3.3 Pvalues

  • The pvalue on the intercept asks if the baseline (Flat) condition is different from zero
  • The pvalue on the slopes asks if each of the other levels is different from the baseline (Flat)

2.1.4 Rotate the matrix

  • What if we want to know if Normal is different from Exaggerated?
  • We need to relevel (aka change what is zero)
Variable C1 C2
Flat 1 0
Normal 0 0
Exaggerated 0 1
# Make Normal the reference level
Emotion.Data$StyleN<- relevel(Emotion.Data$StyleF, ref = "Normal")
scatterplot(Emotion~StyleN, data= Emotion.Data, reg.line=FALSE, smoother=loessLine)

## [1] "169" "18"  "187"
  • Notice in the table the labels and values have changed
Model.1.N<-lm(Emotion ~ StyleN, data = Emotion.Data)
stargazer(Model.1.N,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Constant 54.825*** (1.197)
StyleNFlat -5.099** (1.735)
StyleNExaggerated 4.249** (1.630)
Observations 200
R2 0.137
Adjusted R2 0.128
F Statistic 15.597*** (df = 2; 197)
Note: *p<0.05; **p<0.01; ***p<0.001

2.2 Contrast Coding

2.2.1 Deviation Coding

  • Note: R calls these sum contrasts (contr.sum)
  • Let's make Flat the reference again
Variable C1 C2
Normal 1 0
Exaggerated 0 1
Flat -1 -1
  • For deviation coding, R uses the LAST level as the reference, so we reorder the levels by hand to put Flat last and match the table above
Emotion.Data$Style.C.S <- factor(Emotion.Data$Style,
                              level=c(1,2,0),
                              labels=c("Normal", "Exaggerated", "Flat"))
contrasts(Emotion.Data$Style.C.S) = contr.sum
attributes(Emotion.Data$Style.C.S)$contrasts
##             [,1] [,2]
## Normal         1    0
## Exaggerated    0    1
## Flat          -1   -1
  • Lets look at how things differ from the mean of all conditions (grand mean)
Model.1.CS<-lm(Emotion ~ Style.C.S, data = Emotion.Data)
stargazer(Model.1.CS,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Constant 54.542*** (0.686)
Style.C.S1 0.283 (0.974)
Style.C.S2 4.533*** (0.937)
Observations 200
R2 0.137
Adjusted R2 0.128
F Statistic 15.597*** (df = 2; 197)
Note: *p<0.05; **p<0.01; ***p<0.001
  • Intercept = Mean of means (grand mean)
library(knitr)
#Mean of each condition
Mean.Interaction<-aggregate(Emotion~StyleF,data = Emotion.Data, FUN=mean)
kable(Mean.Interaction)
StyleF Emotion
Flat 49.72607
Normal 54.82525
Exaggerated 59.07459
#grand mean = 
sum(aggregate(Emotion~StyleF,data = Emotion.Data, FUN=mean)[2])/3
## [1] 54.54197

2.2.1.1 Coefficients

  • Style.C.S1 = mean of Normal minus the grand mean
  • Style.C.S2 = mean of Exaggerated minus the grand mean
  • The means for each condition can be rebuilt from these coefficients

2.2.1.2 Means per Condition

  • Flat = Intercept - Style.C.S1 - Style.C.S2 (Flat is coded -1 on both contrasts)
  • Normal = Intercept + Style.C.S1
  • Exaggerated = Intercept + Style.C.S2

2.2.1.3 Pvalues

  • The pvalue on the intercept asks if the grand mean is different from zero
  • The pvalue on the slopes asks if each of the other levels is different from the grand mean
  • Note: This model gives no direct test of Flat against the grand mean; to get one we would have to rotate the reference level and fit again
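
Under sum-to-zero coding the deviations of all levels add up to zero, so the level coded -1 is still recoverable even though it has no coefficient of its own. A minimal sketch on simulated data (the objects `g`, `y`, and `cell` are hypothetical, not the Emotion.Data columns):

```r
set.seed(2)
# Hypothetical data with Flat placed last so that contr.sum codes it -1, -1
g <- factor(rep(c("Normal", "Exaggerated", "Flat"), each = 15),
            levels = c("Normal", "Exaggerated", "Flat"))
y <- rnorm(45, mean = c(55, 60, 50)[as.integer(g)], sd = 10)

contrasts(g) <- contr.sum(3)
b <- coef(lm(y ~ g))

cell  <- sapply(split(y, g), mean)   # observed mean per level
grand <- mean(cell)                  # mean of means

# Intercept is the grand mean; the Flat mean uses the -1, -1 row of the codes
flat_from_model <- b[["(Intercept)"]] - b[["g1"]] - b[["g2"]]

c(intercept = b[["(Intercept)"]], grand_mean = grand)
c(flat_from_model = flat_from_model, flat_observed = cell[["Flat"]])
```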

2.2.2 Simple Coding

  • Each contrast column must sum to zero, and within a column the coded level sits exactly 1 unit above the other levels (with 2 levels this gives -1/2 and 1/2)
  • Like in ANOVA you can design these to ask specific questions by merging conditions (or not)
  • We will start from contr.treatment (the dummy coding R already applies to your categorical variables) and subtract 1/(number of levels) from every entry
  • Just like dummy coding, but we change the meaning of the intercept
  • Reference level is first again
Variable C1 C2
Flat -1/3 -1/3
Normal 2/3 -1/3
Exaggerated -1/3 2/3
Emotion.Data$Style.Simple <- factor(Emotion.Data$Style,
                              level=c(0,1,2),
                              labels=c("Flat", "Normal", "Exaggerated"))

Levels<-3
Simple.1<-contr.treatment(Levels)
#Dummy code
Simple.1
##   2 3
## 1 0 0
## 2 1 0
## 3 0 1
#Make your custom codes
my.coding<-matrix(rep(1/Levels, Levels*(Levels-1)), ncol=Levels-1)
my.simple<-Simple.1-my.coding
my.simple
##            2          3
## 1 -0.3333333 -0.3333333
## 2  0.6666667 -0.3333333
## 3 -0.3333333  0.6666667
contrasts(Emotion.Data$Style.Simple)<-my.simple
Model.1.Simple<-lm(Emotion ~ Style.Simple, data = Emotion.Data)
stargazer(Model.1.Simple,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Constant 54.542*** (0.686)
Style.Simple2 5.099** (1.735)
Style.Simple3 9.349*** (1.674)
Observations 200
R2 0.137
Adjusted R2 0.128
F Statistic 15.597*** (df = 2; 197)
Note: *p<0.05; **p<0.01; ***p<0.001

2.2.2.1 Coefficients

  • This merges the dummy and deviation codings, thus:
  • Intercept = Mean of means (grand mean)
  • Style.Simple2 = mean of Normal minus mean of the baseline (Flat)
  • Style.Simple3 = mean of Exaggerated minus mean of the baseline (Flat)

2.2.2.2 Means per Condition

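  • The means cannot be read off directly, but they can be rebuilt with the contrast codes, e.g., Flat = Intercept - (Style.Simple2 + Style.Simple3)/3 and Normal = Intercept + (2/3)Style.Simple2 - (1/3)Style.Simple3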

2.2.2.3 Pvalues

  • The pvalue on the intercept asks if the grand mean is different from zero
  • The pvalue on the slopes asks if each of the other levels is different from the baseline (Flat)
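
The recipe for simple coding generalizes to any number of levels. The sketch below wraps it in a small helper (`simple_coding` is a hypothetical name introduced here, not a base R function) and checks, on simulated data, that the intercept is the grand mean while the slopes stay equal to the dummy-coded differences from the reference:

```r
# Hypothetical helper: simple coding = dummy coding shifted down by 1/k
simple_coding <- function(k) contr.treatment(k) - 1 / k

set.seed(3)
g <- factor(rep(c("Flat", "Normal", "Exaggerated"), each = 20),
            levels = c("Flat", "Normal", "Exaggerated"))
y <- rnorm(60, mean = c(50, 55, 60)[as.integer(g)], sd = 10)

contrasts(g) <- simple_coding(3)
b <- coef(lm(y ~ g))                 # coefficient names: (Intercept), g2, g3
cell <- sapply(split(y, g), mean)    # observed mean per level

# Intercept vs grand mean, slope vs difference from the reference level
c(intercept = b[["(Intercept)"]], grand_mean = mean(cell))
c(g2 = b[["g2"]], normal_minus_flat = cell[["Normal"]] - cell[["Flat"]])

# Rebuilding the reference mean from the -1/3, -1/3 row of the codes
flat_from_model <- b[["(Intercept)"]] - (b[["g2"]] + b[["g3"]]) / 3
```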

2.2.3 Other types of contrast coding

  • Helmert: compares each level to the mean of the subsequent levels [can be reversed]
  • Forward Difference coding: one level is compared to the next (adjacent) level. (level 1 vs 2. level 2 vs 3) [can be reversed]
  • Custom: Merge and compare levels at will
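
As a rough illustration of these options, the sketch below prints R's built-in Helmert matrix and then builds a forward difference coding by inverting the matrix of comparisons we want. The objects `wanted` and `forward_diff` and the level names L1 to L3 are made up for this example. Note that R's `contr.helmert` compares each level with the mean of the levels before it, which is the reversed direction of the description above.

```r
# Built-in Helmert codes for 3 levels
contr.helmert(3)

# Forward difference coding from the comparisons we want:
# row 1 = grand mean, row 2 = level 1 - level 2, row 3 = level 2 - level 3
wanted <- rbind(c(1/3, 1/3, 1/3),
                c(1,   -1,   0),
                c(0,    1,  -1))
forward_diff <- solve(wanted)[, -1]   # invert, then drop the intercept column
forward_diff

set.seed(4)
g <- factor(rep(c("L1", "L2", "L3"), each = 20))
y <- rnorm(60, mean = c(50, 55, 65)[as.integer(g)], sd = 10)

contrasts(g) <- forward_diff
b <- coef(lm(y ~ g))                  # coefficient names: (Intercept), g1, g2
cell <- sapply(split(y, g), mean)

# Each slope should match an adjacent-level difference in observed means
c(g1 = b[["g1"]], L1_minus_L2 = cell[["L1"]] - cell[["L2"]])
c(g2 = b[["g2"]], L2_minus_L3 = cell[["L2"]] - cell[["L3"]])
```

The same inversion trick works for any custom set of comparisons, as long as the rows of `wanted` are linearly independent.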

2.2.3.1 Center variables and test slopes

  • There is no reason why you cannot treat the variables as continuous if they are ordinal
  • Just center them and treat them like a continuous variable
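
A minimal sketch of this idea, assuming the three styles can be treated as equally spaced steps (that spacing is an assumption, and `style`, `style_c`, and `emotion` are hypothetical variables simulated here):

```r
set.seed(5)
# Hypothetical ordinal predictor coded 0, 1, 2
style   <- rep(0:2, each = 30)
emotion <- 50 + 5 * style + rnorm(90, sd = 10)

# Center the codes, then fit a single slope instead of two dummy terms
style_c <- style - mean(style)
fit_c   <- lm(emotion ~ style_c)
coef(fit_c)

# Centering moves the intercept to the overall mean but leaves the slope alone
fit_raw <- lm(emotion ~ style)
```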

3 Models with Interactions (2 Nominal variables)

  • Interactions can be tricky as the interpretation depends on how you code IVs
  • First lets try some categorical vs categorical interactions
  • We will add a variable to our experiment from above, we will add the location where the actors work (Movie vs Theatre)
  • Emotion rating ~ Expressive intentions*location of the actor
  • \(Emotion Rating = -3*Flat + 0*Normal +3*Exaggerated +0*Movie - 1.5*Theatre - 0*Flat*Movie + 0*Normal*Movie +0*Exaggerated*Movie - 10*Flat*Theatre +0*Normal*Theatre +10*Exaggerated*Theatre + 50 + \epsilon\)
#Set up simulation

set.seed(42)
N <- 200
X <- sample(rep(c(-1,0,1),N),N,replace = FALSE)
Z <- sample(rep(c(0,1),N*3/2),N,replace = FALSE)

# Our equation to create Y
Y <- 3*X -1.5*Z+10*X*Z+ 50 + rnorm(N, sd=10)
# Build our data frame
Emotion.Data.2<-data.frame(Emotion=Y,Style=X,Location=Z)
Emotion.Data.2$Style<-Emotion.Data.2$Style+1

3.1 Dummy coding

  • Let's dummy code all of them: convert them to labeled factors in the order we want
# Convert all our factors
Emotion.Data.2$StyleF <- factor(Emotion.Data.2$Style,
                              level=c(0,1,2),
                              labels=c("Flat", "Normal", "Exaggerated"))

Emotion.Data.2$LocationF <- factor(Emotion.Data.2$Location,
                              level=c(0,1),
                              labels=c("Movie Set", "Theatre"))

  • Let's look at the collapsed means and the means per cell

Mean.Style<-aggregate(Emotion~StyleF,data = Emotion.Data.2, FUN=mean)
kable(Mean.Style)
StyleF Emotion
Flat 42.31459
Normal 47.71393
Exaggerated 57.61318
Mean.Location<-aggregate(Emotion~LocationF,data = Emotion.Data.2, FUN=mean)
kable(Mean.Location)
LocationF Emotion
Movie Set 50.51464
Theatre 49.22526
Mean.Interaction<-aggregate(Emotion~StyleF*LocationF,data = Emotion.Data.2, FUN=mean)
kable(Mean.Interaction)
StyleF LocationF Emotion
Flat Movie Set 48.09662
Normal Movie Set 49.25575
Exaggerated Movie Set 53.42796
Flat Theatre 37.75905
Normal Theatre 45.54397
Exaggerated Theatre 61.79840
  • Let's examine our regression; we need to fit both a main effects model and an interaction model
Model.I1.Dummy<-lm(Emotion ~ StyleF+LocationF, data = Emotion.Data.2)
Model.I2.Dummy<-lm(Emotion ~ StyleF*LocationF, data = Emotion.Data.2)

stargazer(Model.I1.Dummy,Model.I2.Dummy,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Main Effects Interaction
(1) (2)
Constant 42.880*** (1.558) 48.097*** (1.857)
StyleFNormal 5.254** (1.848) 1.159 (2.409)
StyleFExaggerated 15.239*** (1.774) 5.331* (2.409)
LocationFTheatre -1.011 (1.454) -10.338*** (2.482)
StyleFNormal:LocationFTheatre 6.626 (3.441)
StyleFExaggerated:LocationFTheatre 18.708*** (3.298)
Observations 200 200
R2 0.288 0.395
Adjusted R2 0.277 0.379
F Statistic 26.460*** (df = 3; 196) 25.285*** (df = 5; 194)
Note: *p<0.05; **p<0.01; ***p<0.001
  • You will notice large differences between the main effects and interaction models
  • This is because the coefficients in the interaction model mean something different from those in the main effects model
  • The main effects were explained above, but they change meaning a little because there is a second variable

3.1.1 Main Effects

3.1.1.1 Coefficients

  • These do not connect back to the individual cells as you are assuming the two factors have independent effects
  • Thus these are hypothetical values (not connected to the individual cells)
  • Intercept = Estimate of the mean, assuming the two factors have independent effects at their 0 points [baseline]
  • StyleFNormal = Normal - Flat (assuming the two factors are independent)
  • StyleFExaggerated = Exaggerated - Flat (assuming the two factors are independent)
  • LocationFTheatre = Theatre - Movie(assuming the two factors are independent)

3.1.1.2 Means per Condition

  • You cannot recover the observed cell means from this model (it only gives additive predictions)

3.1.1.3 Pvalues

  • The pvalue on the intercept asks if the baseline is different from zero
  • The pvalue on the differences asks if they are different from baseline

3.1.2 Interactions

3.1.2.1 Coefficients

  • These coefficients connect back to the experimental cells
  • Intercept = Location(0) @ Style(0) aka Movie @ Flat [real value]
  • StyleFNormal = Difference of [Location(0) @ Style(1)] - [Location(0) @ Style(0)]
  • StyleFExaggerated = Difference of [Location(0) @ Style(2)] - [Location(0) @ Style(0)]
  • LocationFTheatre = Difference of [Location(1) @ Style(0)] - [Location(0) @ Style(0)]
  • StyleFNormal:LocationFTheatre = Difference of differences: ([Location(1) @ Style(1)] - [Location(0) @ Style(1)]) - ([Location(1) @ Style(0)] - [Location(0) @ Style(0)])
  • StyleFExaggerated:LocationFTheatre = Difference of differences: ([Location(1) @ Style(2)] - [Location(0) @ Style(2)]) - ([Location(1) @ Style(0)] - [Location(0) @ Style(0)])

3.1.2.2 Means per Condition

  • You can find all means for all cells!
  • Flat @ Movie = Intercept
  • Normal @ Movie = Intercept + StyleFNormal
  • Exaggerated @ Movie = Intercept + StyleFExaggerated
  • Flat @ Theatre = Intercept + LocationFTheatre
  • Normal @ Theatre = Intercept + LocationFTheatre+StyleFNormal+StyleFNormal:LocationFTheatre
  • Exaggerated @ Theatre = Intercept + LocationFTheatre + StyleFExaggerated + StyleFExaggerated:LocationFTheatre
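
These sums can be checked against observed cell means. The sketch below simulates a small balanced 3 x 2 design (the data frame `d` and its columns are hypothetical, not Emotion.Data.2), rebuilds one cell by hand, and confirms that the interaction coefficient is a difference of differences:

```r
set.seed(6)
# Hypothetical balanced design: 3 styles x 2 locations x 10 replicates
d <- expand.grid(style = c("Flat", "Normal", "Exaggerated"),
                 loc   = c("Movie", "Theatre"),
                 rep   = 1:10)
d$style <- factor(d$style, levels = c("Flat", "Normal", "Exaggerated"))
d$loc   <- factor(d$loc, levels = c("Movie", "Theatre"))
d$y     <- 50 + rnorm(nrow(d), sd = 10)

b <- coef(lm(y ~ style * loc, data = d))

# Observed cell means as a style-by-location table
cells <- tapply(d$y, list(d$style, d$loc), mean)

# Normal @ Theatre rebuilt from the dummy-coded coefficients
normal_theatre <- b[["(Intercept)"]] + b[["locTheatre"]] +
  b[["styleNormal"]] + b[["styleNormal:locTheatre"]]

# Interaction term = (Theatre - Movie at Normal) - (Theatre - Movie at Flat)
diff_of_diffs <- (cells["Normal", "Theatre"] - cells["Normal", "Movie"]) -
                 (cells["Flat", "Theatre"]   - cells["Flat", "Movie"])

c(rebuilt = normal_theatre, observed = cells["Normal", "Theatre"])
c(coefficient = b[["styleNormal:locTheatre"]], diff_of_diffs = diff_of_diffs)
```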

3.1.2.3 Pvalues

  • The pvalue on the intercept asks if the baseline is different from zero
  • The pvalue on the differences asks if they are different from baseline
  • But these are now simple effects

3.1.3 Main effect model vs Interaction model

  • Since the results are clearly not additive, the main effects model is useful only as a reference for understanding how the terms have changed
  • Gives some insight into the interaction

3.1.4 Graph Model

library(effects)
Main.1<-effect("LocationF", Model.I2.Dummy)
plot(Main.1, multiline = TRUE)

Main.2<-effect("StyleF", Model.I2.Dummy)
plot(Main.2, multiline = TRUE)

Inter.1<-effect("StyleF*LocationF", Model.I2.Dummy)
plot(Inter.1, multiline = TRUE)

3.1.5 Connection to between-subject ANOVA

  • Regression with only categorical variables can be converted into an ANOVA very easily
  • Thus it can be treated just like ANOVA
  • But the regression makes the separate ANOVA step largely moot, because you can code the regression so that it directly tests the follow-up comparisons you want
Anova(Model.I2.Dummy, type="III")
## Anova Table (Type III tests)
## 
## Response: Emotion
##                  Sum Sq  Df  F value    Pr(>F)    
## (Intercept)       60145   1 671.1619 < 2.2e-16 ***
## StyleF              535   2   2.9837   0.05293 .  
## LocationF          1554   1  17.3419 4.695e-05 ***
## StyleF:LocationF   3052   2  17.0304 1.534e-07 ***
## Residuals         17385 194                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
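  • Note: we call for Type III sums of squares to match SPSS output; base R's anova() gives Type I (sequential) tests and car's Anova() defaults to Type II
  • Caution: Type III tests depend on the coding. With dummy coding, as here, the StyleF and LocationF rows test each factor at the other factor's reference level; use sum-to-zero contrasts (contr.sum) to get SPSS-style main effects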

3.2 Contrast Coding

3.2.1 Deviation Coding

  • Let's code each variable as we did above
Emotion.Data.2$Style.C.S <- factor(Emotion.Data.2$Style,
                              level=c(1,2,0),
                              labels=c("Normal", "Exaggerated", "Flat"))
Emotion.Data.2$Location.C.S <- factor(Emotion.Data.2$Location,
                              level=c(1,0),
                              labels=c("Theatre", "Movie Set"))

contrasts(Emotion.Data.2$Style.C.S) = contr.sum
contrasts(Emotion.Data.2$Location.C.S) = contr.sum
attributes(Emotion.Data.2$Style.C.S)$contrasts
##             [,1] [,2]
## Normal         1    0
## Exaggerated    0    1
## Flat          -1   -1
attributes(Emotion.Data.2$Location.C.S)$contrasts
##           [,1]
## Theatre      1
## Movie Set   -1
  • Lets look at how things differ from the mean of all conditions (grand mean)
Model.I1.C.S<-lm(Emotion ~ Style.C.S+Location.C.S, data = Emotion.Data.2)
Model.I2.C.S<-lm(Emotion ~ Style.C.S*Location.C.S, data = Emotion.Data.2)

stargazer(Model.I1.C.S,Model.I2.C.S,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Main Effects Interaction
(1) (2)
Constant 49.205*** (0.726) 49.314*** (0.678)
Style.C.S1 -1.577 (1.036) -1.914* (0.966)
Style.C.S2 8.408*** (0.992) 8.300*** (0.924)
Location.C.S1 -0.506 (0.727) -0.946 (0.678)
Style.C.S1:Location.C.S1 -0.909 (0.966)
Style.C.S2:Location.C.S1 5.132*** (0.924)
Observations 200 200
R2 0.288 0.395
Adjusted R2 0.277 0.379
F Statistic 26.460*** (df = 3; 196) 25.285*** (df = 5; 194)
Note: *p<0.05; **p<0.01; ***p<0.001
  • Again we are testing between the grand mean and the individual terms

3.2.1.1 Main Effects

3.2.1.1.1 Coefficients
  • These do not connect back to the individual cells as you are assuming the two factors have independent effects
  • Thus these are hypothetical Grand mean values
  • Intercept = Grand Mean, but assuming the two factors have independent effects
  • Style.C.S1 = Normal (averaged over location) - Grand Mean (assuming the two factors are independent)
  • Style.C.S2 = Exaggerated (averaged over location) - Grand Mean (assuming the two factors are independent)
  • Location.C.S1 = Theatre (averaged over Style) - Grand Mean (assuming the two factors are independent)
3.2.1.1.2 Means per Condition
  • Cannot get from this model
3.2.1.1.3 Pvalues
  • The pvalue on the intercept asks if the baseline (Grand Mean) is different from zero
  • The pvalue on the slopes asks if each of the other levels is different from the baseline (Grand Mean)
  • Note we are missing tests on the Flat and Movie Set levels; we would need to rotate the matrix and run again

3.2.1.2 Interactions

3.2.1.2.1 Coefficients
  • Intercept = Grand Mean (all 6 cells mean averaged)
  • Style.C.S1 = Normal (averaged over location) from Grand Mean
  • Style.C.S2 = Exaggerated (averaged over location) from Grand Mean
  • Location.C.S1 = Theatre (averaged over Style) from Grand Mean
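  • Style.C.S1:Location.C.S1 = how far the Normal @ Theatre cell mean sits from the additive prediction (Grand Mean + Style.C.S1 + Location.C.S1)
  • Style.C.S2:Location.C.S1 = how far the Exaggerated @ Theatre cell mean sits from the additive prediction (Grand Mean + Style.C.S2 + Location.C.S1)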
3.2.1.2.2 Means per Condition
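  • Every cell mean can be rebuilt by weighting each coefficient by the cell's contrast codes, e.g., Normal @ Theatre = Intercept + Style.C.S1 + Location.C.S1 + Style.C.S1:Location.C.S1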
3.2.1.2.3 Pvalues
  • The pvalue on the intercept asks if the baseline (Grand Mean) is different from zero
  • The pvalue on the differences asks if each of the other levels is different from the Grand Mean
  • Note we are missing tests on the Flat and Movie Set levels; we would need to rotate the matrix and run again
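
Because the interaction model has one parameter per cell, the cell means remain recoverable under deviation coding: each coefficient is weighted by the cell's contrast codes. A sketch on simulated data (the data frame `d` is hypothetical, with the levels ordered as in the contrast matrices above):

```r
set.seed(7)
# Hypothetical balanced design with the reference levels placed last
d <- expand.grid(style = c("Normal", "Exaggerated", "Flat"),
                 loc   = c("Theatre", "Movie Set"),
                 rep   = 1:10)
d$style <- factor(d$style, levels = c("Normal", "Exaggerated", "Flat"))
d$loc   <- factor(d$loc, levels = c("Theatre", "Movie Set"))
d$y     <- 50 + rnorm(nrow(d), sd = 10)

contrasts(d$style) <- contr.sum(3)
contrasts(d$loc)   <- contr.sum(2)
b <- coef(lm(y ~ style * loc, data = d))

cells <- tapply(d$y, list(d$style, d$loc), mean)

# Normal (codes 1, 0) at Theatre (code 1)
normal_theatre <- b[["(Intercept)"]] + b[["style1"]] + b[["loc1"]] +
  b[["style1:loc1"]]

# Flat (codes -1, -1) at Movie Set (code -1): the interaction codes are
# products of the main-effect codes, so both interaction terms enter with +1
flat_movie <- b[["(Intercept)"]] - b[["style1"]] - b[["style2"]] - b[["loc1"]] +
  b[["style1:loc1"]] + b[["style2:loc1"]]

c(rebuilt = normal_theatre, observed = cells["Normal", "Theatre"])
c(rebuilt = flat_movie, observed = cells["Flat", "Movie Set"])
```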

4 Models with Interactions (1 Nominal and 1 Continuous Variable)

  • We will change our experiment from above: we will add a Likert scale on which the actor rates how “good” they think the performance was [0 terrible to 7 excellent] (simulated below on a -3 to 3 scale)
  • Emotion rating ~ Quality Rating*location of the actor
  • \(Emotion Rating = 4*Quality + 2*Theatre + 8*Quality*Theatre + 50 + \epsilon\)
#Set up simulation
set.seed(42)
N <- 200
X <- runif(N,-3,3)
Z <- sample(rep(c(0,1),N),N,replace = FALSE)

# Our equation to create Y
Y <- 4*X +2*Z+8*X*Z+ 50 + rnorm(N, sd=10)
# Build our data frame
Emotion.Data.3<-data.frame(Emotion=Y,Quality=X,Location=Z)

4.1 Center the continuous variable

  • Best to center, as it will help us interpret the coefficients
# Center Quality (subtract the mean, keep the original scale)
Emotion.Data.3$Quality.C <- scale(Emotion.Data.3$Quality, scale = FALSE)[,]

4.2 Dummy coding of our nominal variable

  • Dummy coding will allow us to interpret the interaction as simple slopes analysis
  • Lets dummy code Location
# Convert all our factors
Emotion.Data.3$LocationF <- factor(Emotion.Data.3$Location,
                                   level=c(0,1),
                                   labels=c("Movie Set","Theatre"))

4.2.1 Interpret Regression

Model.E3.1<-lm(Emotion ~ Quality.C+LocationF, data = Emotion.Data.3)
Model.E3.2<-lm(Emotion ~ Quality.C*LocationF, data = Emotion.Data.3)

stargazer(Model.E3.1,Model.E3.2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Main Effects Interaction
(1) (2)
Constant 49.344*** (1.243) 50.207*** (0.959)
Quality.C 7.970*** (0.502) 3.573*** (0.539)
LocationFTheatre 4.106* (1.755) 4.137** (1.350)
Quality.C:LocationFTheatre 9.044*** (0.772)
Observations 200 200
R2 0.562 0.742
Adjusted R2 0.557 0.738
F Statistic 126.183*** (df = 2; 197) 187.927*** (df = 3; 196)
Note: *p<0.05; **p<0.01; ***p<0.001

4.2.2 Plotting Interaction

  • Often you will need to use the effects package and plot by hand
  • The rockchalk package will work for simple models like this one
library(rockchalk)
plotSlopes(Model.E3.2, plotx = "Quality.C", modx = "LocationF")

4.2.3 Main Effects

4.2.3.1 Coefficients & Pvalues

  • Intercept = predicted Emotion @ Movie Set at the mean of Quality [p = Is the intercept different from 0]
  • Quality.C = Quality.C slope [p = Is slope different from 0]
  • LocationFTheatre = Theatre difference from intercept [p = Is Theatre different from Movie Set @ mean of quality]

4.2.4 Interactions

4.2.4.1 Coefficients & Pvalues

  • Intercept = predicted Emotion @ Movie Set at the mean of Quality [p = Is the intercept different from 0]
  • Quality.C = Quality.C slope @ Movie Set [p = Is the simple slope at Movie Set different from 0]
  • LocationFTheatre = Theatre difference from intercept [p = Is Theatre different from Movie Set @ mean of quality]
  • Quality.C:LocationFTheatre = Quality.C slope @ Theatre minus the slope @ Movie Set [p = Do the two simple slopes differ]; the simple slope @ Theatre is Quality.C + Quality.C:LocationFTheatre
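
The simple-slopes reading can be sketched on simulated data. All objects below (`quality_c`, `loc`, `y`) are hypothetical stand-ins generated for the example. The slope for the reference group is the Quality coefficient, the slope for the other group adds the interaction coefficient, and both match what separate per-group regressions give:

```r
set.seed(8)
n <- 100
quality <- runif(n, -3, 3)
loc     <- factor(rep(c("Movie", "Theatre"), each = n / 2))
theatre <- as.numeric(loc == "Theatre")
y <- 50 + 4 * quality + 2 * theatre + 8 * quality * theatre + rnorm(n, sd = 10)

quality_c <- quality - mean(quality)          # centered predictor
b <- coef(lm(y ~ quality_c * loc))

slope_movie   <- b[["quality_c"]]                                # reference group
slope_theatre <- b[["quality_c"]] + b[["quality_c:locTheatre"]]  # add interaction

# Cross-check with one regression per group
slope_movie_sep   <- coef(lm(y ~ quality_c, subset = loc == "Movie"))[["quality_c"]]
slope_theatre_sep <- coef(lm(y ~ quality_c, subset = loc == "Theatre"))[["quality_c"]]

c(model = slope_movie, separate = slope_movie_sep)
c(model = slope_theatre, separate = slope_theatre_sep)
```

Note that the model reports a p-value for the difference in slopes, not for the Theatre slope itself; testing the Theatre slope directly requires releveling so that Theatre is the reference.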

4.3 Deviation coding of our nominal variable

  • We will still center quality
  • This creates a more ANOVA like interpretation of the interaction
Emotion.Data.3$LocationD <- factor(Emotion.Data.3$Location,
                                   level=c(1,0),
                                   labels=c("Theatre","Movie Set"))
contrasts(Emotion.Data.3$LocationD) = contr.sum
attributes(Emotion.Data.3$LocationD)$contrasts
##           [,1]
## Theatre      1
## Movie Set   -1

4.3.1 Interpret Regression

Model.E3.D1<-lm(Emotion ~ Quality.C+LocationD, data = Emotion.Data.3)
Model.E3.D2<-lm(Emotion ~ Quality.C*LocationD, data = Emotion.Data.3)

stargazer(Model.E3.D1,Model.E3.D2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Main Effects Interaction
(1) (2)
Constant 51.397*** (0.872) 52.275*** (0.675)
Quality.C 7.970*** (0.502) 8.095*** (0.386)
LocationD1 2.053* (0.877) 2.068** (0.675)
Quality.C:LocationD1 4.522*** (0.386)
Observations 200 200
R2 0.562 0.742
Adjusted R2 0.557 0.738
F Statistic 126.183*** (df = 2; 197) 187.927*** (df = 3; 196)
Note: *p<0.05; **p<0.01; ***p<0.001

4.3.2 Main Effects

4.3.2.1 Coefficients & Pvalues

  • Intercept = predicted Emotion at the mean of Quality, midway between Movie Set & Theatre (imaginary thing) [p = Is the intercept different from 0]
  • Quality.C = Quality.C slope [p = Is slope different from 0]
  • LocationD1 = Theatre difference from the midpoint (coded 0), not from Movie Set (coded -1), thus it is 1/2 the size of the dummy-coded coefficient [p = Is Theatre different from Movie Set @ mean of quality]

4.3.3 Interactions

4.3.3.1 Coefficients & Pvalues

  • Intercept = predicted Emotion at the mean of Quality, midway between Movie Set & Theatre (imaginary thing) [p = Is the intercept different from 0]
  • Quality.C = Quality.C slope averaged over Movie Set & Theatre (imaginary thing) [p = Is there a main effect of Quality]
  • LocationD1 = Theatre difference from intercept [p = Is there a main effect of Location @ mean of quality]
  • Quality.C:LocationD1 = half the Quality.C slope difference between Theatre and Movie Set [p = Is there an interaction]
  • Note: LocationD1 and Quality.C:LocationD1 are 1/2 the size of the Theatre minus Movie Set differences, because the two codes (-1 and 1) are 2 units apart
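
The halving can be demonstrated by fitting the same simulated data twice. In this sketch the location codes are entered as plain numeric columns rather than factors (an equivalent shortcut for a two-level variable), and all object names are hypothetical:

```r
set.seed(9)
n <- 100
quality_c <- scale(runif(n, -3, 3), scale = FALSE)[, 1]   # centered predictor
loc <- rep(c(0, 1), each = n / 2)                         # 0 = Movie Set, 1 = Theatre
y <- 50 + 4 * quality_c + 2 * loc + 8 * quality_c * loc + rnorm(n, sd = 10)

loc_dummy <- loc                       # codes 0 / 1
loc_dev   <- ifelse(loc == 1, 1, -1)   # codes -1 / 1

b_dummy <- coef(lm(y ~ quality_c * loc_dummy))
b_dev   <- coef(lm(y ~ quality_c * loc_dev))

# Location and interaction coefficients are halved under deviation coding
c(dummy = b_dummy[["loc_dummy"]], twice_dev = 2 * b_dev[["loc_dev"]])
c(dummy = b_dummy[["quality_c:loc_dummy"]],
  twice_dev = 2 * b_dev[["quality_c:loc_dev"]])

# The Quality slope is not halved: it moves to the average of the two slopes
avg_slope <- b_dummy[["quality_c"]] + b_dummy[["quality_c:loc_dummy"]] / 2
```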

4.4 Simple coding of our nominal variable

  • We will still center quality
  • This creates a more ANOVA-like interpretation of the interaction
  • Most importantly, it does not shrink our coefficients: they match the dummy-coded values (which make the most sense)
  • This time I will hand code it as a numeric variable (easy with only 2 levels)
Emotion.Data.3$LocationS<-as.numeric(Emotion.Data.3$Location)-.5

4.4.1 Interpret Regression

Model.E3.S1<-lm(Emotion ~ Quality.C+LocationS, data = Emotion.Data.3)
Model.E3.S2<-lm(Emotion ~ Quality.C*LocationS, data = Emotion.Data.3)

stargazer(Model.E3.S1,Model.E3.S2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
Dependent variable:
Emotion
Main Effects Interaction
(1) (2)
Constant 51.397*** (0.872) 52.275*** (0.675)
Quality.C 7.970*** (0.502) 8.095*** (0.386)
LocationS 4.106* (1.755) 4.137** (1.350)
Quality.C:LocationS 9.044*** (0.772)
Observations 200 200
R2 0.562 0.742
Adjusted R2 0.557 0.738
F Statistic 126.183*** (df = 2; 197) 187.927*** (df = 3; 196)
Note: *p<0.05; **p<0.01; ***p<0.001

4.4.2 Main Effects

4.4.2.1 Coefficients & Pvalues

  • Intercept = predicted Emotion at the mean of Quality, midway between Movie Set & Theatre (imaginary thing) [p = Is the intercept different from 0]
  • Quality.C = Quality.C slope [p = Is slope different from 0]
  • LocationS = Theatre minus Movie Set (the full difference, since the codes -0.5 and 0.5 are 1 unit apart) [p = Is Theatre different from Movie Set @ mean of quality]

4.4.3 Interactions

4.4.3.1 Coefficients & Pvalues

  • Intercept = predicted Emotion at the mean of Quality, midway between Movie Set & Theatre (imaginary thing) [p = Is the intercept different from 0]
  • Quality.C = Quality.C slope averaged over Movie Set & Theatre (imaginary thing) [p = Is there a main effect of Quality]
  • LocationS = Theatre minus Movie Set [p = Is there a main effect of Location @ mean of quality]
  • Quality.C:LocationS = Quality.C slope difference between Theatre and Movie Set [p = Is there an interaction]
  • Note all coefficients are their proper size now (to match true differences)
  • Best option is often dummy or simple, deviation here is weird
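
As a closing check, the sketch below fits simulated data (hypothetical objects, numeric location codes) with dummy coding and with the -0.5 / 0.5 simple coding. The difference coefficients agree, while the intercept and the Quality slope shift to the midpoint between the two locations:

```r
set.seed(10)
n <- 100
quality_c <- scale(runif(n, -3, 3), scale = FALSE)[, 1]   # centered predictor
loc <- rep(c(0, 1), each = n / 2)                         # 0 = Movie Set, 1 = Theatre
y <- 50 + 4 * quality_c + 2 * loc + 8 * quality_c * loc + rnorm(n, sd = 10)

loc_simple <- loc - 0.5                                   # codes -0.5 / 0.5

b_dummy  <- coef(lm(y ~ quality_c * loc))
b_simple <- coef(lm(y ~ quality_c * loc_simple))

# Same Theatre - Movie Set difference and same slope difference
c(dummy = b_dummy[["loc"]], simple = b_simple[["loc_simple"]])
c(dummy = b_dummy[["quality_c:loc"]], simple = b_simple[["quality_c:loc_simple"]])

# Intercept and Quality slope now sit halfway between the two locations
mid_intercept <- b_dummy[["(Intercept)"]] + b_dummy[["loc"]] / 2
mid_slope     <- b_dummy[["quality_c"]] + b_dummy[["quality_c:loc"]] / 2
```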
---
title: 'Categorical Variables'
output:
  html_document:
    code_download: yes
    fontsize: 8pt
    highlight: textmate
    number_sections: yes
    theme: flatly
    toc: yes
    toc_float:
      collapsed: no
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.width=5)
knitr::opts_chunk$set(fig.height=3.75)
knitr::opts_chunk$set(fig.align='center') 
```

# Nominal Variables 
- Things like gender or color (terms we would have as factors in ANOVA)
- Decisions have to be made on how to treat these variables in a regression
- We will assume (for now) that a subject can be a member of *only one* level of a factor (i.e., you can be blue or red, but not both)
- We will cover three basic coding methods and how to interpret their effects and interactions
- Key to remember: each coefficient is basically a t-test, so you are always just comparing two things, but what those two things are will change as you change your coding (think of the concept of contrasts in ANOVA from last semester)

# Models with no Interactions (1 Nominal variable)

## Dummy Coding
- The most common and basic type used (default in R if it senses a categorical variable)
- For each variable you must assign a reference group (a baseline)
- Each group is compared to the reference group
- We ask the question, *how much does each group deviate from the reference*
- Here are two levels of one factor

Variable | C1 
---------| ---
Female   | 0
Male     | 1

- Here are Three levels, but now you have to make 2 new variables 
- Female is the reference group

Variable| C1| C2
--------| --| --
Female  | 0 | 0
Male    | 0 | 1
Trans   | 1 | 0

- Male is the reference group

Variable| C1| C2
--------| --| ---
Female  | 0 | 1
Male    | 0 | 0
Trans   | 1 | 0

- Here are Four levels, but now you have to make 3 new variables 
- Cis Female is the reference group

Variable     | C1 | C2 | C3
-------------| -- | -- | -- 
Cis Female   | 0  | 0  | 0
Cis Male     | 0  | 0  | 1
Trans Female | 0  | 1  | 0
Trans Male   | 1  | 0  | 0


### Regression equation
- Two Levels: $Y = B_1C1 + B_0$
- Three Levels: $Y = B_1C1 +B_2C2+ B_0$
- Four Levels: $Y = B_1C1 + B_2C2 + B_3C3 + B_0$
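
To see the dummy variables that R builds behind the scenes, here is a minimal sketch. The factor `gender.demo` is a made-up illustration (it is not part of the simulated data used in these notes), and the column order R picks may differ from the C1/C2 labels in the tables.

```{r, echo=TRUE}
# Hypothetical three-level factor; Female is listed first, so it is the reference
gender.demo <- factor(c("Female", "Male", "Trans", "Male"),
                      levels = c("Female", "Male", "Trans"))

# Dummy (treatment) codes: one column per non-reference level
contrasts(gender.demo)

# The design matrix lm() would use: intercept plus the two dummy columns
dummy.demo <- model.matrix(~ gender.demo)
dummy.demo
```

Each non-reference level has a 1 in exactly one column, and the reference level is 0 everywhere.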

### Creating Dummy Variables
- In SPSS you would have to create dummy variables BY HAND using Recode, but in R it's a little easier
- In R, we simply have to convert our variable into a factor
- R orders factor levels alphabetically by default, so it is easiest to work with numbers (not words)
- Best practice in R: Convert all your words into numbers (start with 0), then convert your variable into a factor (as.factor)
- First, let's simulate a simple factor to work with
- Emotion rating ~ Expressive intentions of the actor (0 = flat, 1 = Normal, 2 = Exaggerated )
- Three Levels: $Emotion Rating = 0*Flat + 5*Normal +10*Exaggerated + 50 + \epsilon$

Variable    | C1| C2
------------| --| ---
Flat        | 0 | 0
Normal      | 1 | 0
Exaggerated | 0 | 1


```{r, echo=TRUE, warning=FALSE}
library(car)
#Set up simulation
set.seed(42)
N <- 200
X<- sample(rep(c(0,1,2),N),N,replace = FALSE)
# Our equation to create Y
Y <- 5*X + 50 + rnorm(N, sd=10)
# Build our data frame
Emotion.Data<-data.frame(Emotion=Y,Style=X)

scatterplot(Emotion~Style, data= Emotion.Data, reg.line=FALSE, smoother=loessLine)
```

- If the data are already coded as 0, 1, 2, all we have to do is convert the variable to a factor (also note that R will not plot a factor the same way as a number)
```{r, echo=TRUE, warning=FALSE}
Emotion.Data$StyleD<-as.factor(Emotion.Data$Style)
scatterplot(Emotion~StyleD, data= Emotion.Data, reg.line=FALSE, smoother=loessLine)
```
- If the data were coded as words (Flat, Normal, Exaggerated), R would have made Exaggerated the baseline, because it comes first alphabetically
- We can fix that manually by creating the factor ourselves and setting the order of the levels
```{r, echo=TRUE, warning=FALSE}
# Here I will convert the data into words first (cause I had to simulate numbers)
Emotion.Data$StyleF <- factor(Emotion.Data$Style,
                              level=c(0,1,2),
                              labels=c("Flat", "Normal", "Exaggerated"))

Emotion.Data$StyleN<- relevel(Emotion.Data$StyleF, ref = "Normal")

scatterplot(Emotion~StyleF, data= Emotion.Data, reg.line=FALSE, smoother=loessLine)
```

### Interpret Regression
- The order of the factor levels matters
- Remember that each level of the variable (except the baseline) becomes a term in the equation, because you have made a new variable for each of those levels
- *R makes these dummy codes automatically, so it's easy to forget this*
- For 3 levels we will have 3 terms: the intercept, Style 1, and Style 2

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
library(stargazer)
Model.1<-lm(Emotion ~ StyleD, data = Emotion.Data)
stargazer(Model.1,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```
- If we use our factor coded as words, the output is easier to read

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.1.F<-lm(Emotion ~ StyleF, data = Emotion.Data)
stargazer(Model.1.F,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```

#### Coefficients
- Intercept = Intercept of the equation (Which is the mean of Flat)
- StyleFNormal = coefficient for Normal: the difference between the Normal mean and the baseline (Flat) mean
- StyleFExaggerated = coefficient for Exaggerated: the difference between the Exaggerated mean and the baseline (Flat) mean

#### Means per Condition

- *Flat = Intercept*
- *Normal = Intercept + StyleFNormal*
- *Exaggerated = Intercept + StyleFExaggerated*

#### Pvalues
- The pvalue on the intercept asks if the baseline (*Flat*) condition is different from zero
- The pvalue on each slope asks if that level is different from baseline (*Flat*)
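
A minimal sketch to check this reading of the coefficients, using a small made-up data set (`demo`, `fit.demo`, and `means.demo` are hypothetical names, and the effect sizes are arbitrary). The intercept should match the Flat mean, and each slope should match a difference from Flat.

```{r, echo=TRUE}
# Small simulated example (hypothetical names and values, for illustration only)
set.seed(1)
demo <- data.frame(Style = factor(rep(c("Flat", "Normal", "Exaggerated"), each = 20),
                                  levels = c("Flat", "Normal", "Exaggerated")))
demo$Emotion <- 50 + 5 * (as.integer(demo$Style) - 1) + rnorm(60, sd = 10)

fit.demo   <- lm(Emotion ~ Style, data = demo)
means.demo <- tapply(demo$Emotion, demo$Style, mean)

# Coefficients from the dummy-coded model
coef(fit.demo)

# The same three numbers built from the condition means
by.hand <- c(means.demo["Flat"],
             means.demo["Normal"] - means.demo["Flat"],
             means.demo["Exaggerated"] - means.demo["Flat"])
by.hand
```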


### Rotate the matrix
- What if we want to know if Normal is different from Exaggerated?
- We need to relevel (aka change what is zero)

Variable    | C1| C2
------------| --| ---
Flat        | 1 | 0
Normal      | 0 | 0
Exaggerated | 0 | 1


```{r, echo=TRUE, warning=FALSE}
# Make Normal the reference level (the row coded 0 in both columns)
Emotion.Data$StyleN<- relevel(Emotion.Data$StyleF, ref = "Normal")
scatterplot(Emotion~StyleN, data= Emotion.Data, reg.line=FALSE, smoother=loessLine)
```

- Notice in the table the labels and values have changed
```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.1.N<-lm(Emotion ~ StyleN, data = Emotion.Data)
stargazer(Model.1.N,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```
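
As a quick check that releveling changes only which comparisons are reported and not the model itself, here is a minimal sketch on made-up data (all `*.demo` and `fit.*` names here are hypothetical).

```{r, echo=TRUE}
set.seed(2)
style.demo <- factor(rep(c("Flat", "Normal", "Exaggerated"), each = 15),
                     levels = c("Flat", "Normal", "Exaggerated"))
y.demo <- 50 + 5 * (as.integer(style.demo) - 1) + rnorm(45, sd = 10)

fit.flat   <- lm(y.demo ~ style.demo)                           # Flat is baseline
fit.normal <- lm(y.demo ~ relevel(style.demo, ref = "Normal"))  # Normal is baseline

# Same fitted values and R-squared; only the meaning of the coefficients changes
all.equal(unname(fitted(fit.flat)), unname(fitted(fit.normal)))
c(flat = summary(fit.flat)$r.squared, normal = summary(fit.normal)$r.squared)
```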

## Contrast Coding
- These are the same contrasts as in our ANOVAs, but there are two versions (unweighted and weighted), and in both the codes need to sum to zero
- We now ask a different question: *how much does each group deviate from the mean of all groups*
- I will review only the most common; see the following for the others:
- http://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/
- I will not show you the weighted version, as it is not often used
- Note R calls dummy coding contr.treatment

### Deviation Coding 
- Note: R calls these sum contrasts (contr.sum)
- Let's make Flat the reference again

Variable    | C1| C2
------------| --| ---
Normal      | 1 | 0
Exaggerated | 0 | 1
Flat        |-1 | -1

- For deviation coding, R treats the LAST level as the reference (the one coded -1), so we need to reorder the levels by hand so that Flat comes last and the codes match the table above

```{r, echo=TRUE, warning=FALSE}
Emotion.Data$Style.C.S <- factor(Emotion.Data$Style,
                              level=c(1,2,0),
                              labels=c("Normal", "Exaggerated", "Flat"))
contrasts(Emotion.Data$Style.C.S) = contr.sum
attributes(Emotion.Data$Style.C.S)$contrasts
```

- Lets look at how things differ from the mean of all conditions (grand mean)

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.1.CS<-lm(Emotion ~ Style.C.S, data = Emotion.Data)
stargazer(Model.1.CS,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```

- Intercept = Mean of means (grand mean)

```{r, echo=TRUE, warning=FALSE,message=FALSE, results='asis'}
library(knitr)
#Mean of each condition
Mean.Interaction<-aggregate(Emotion~StyleF,data = Emotion.Data, FUN=mean)
kable(Mean.Interaction)
```

```{r, echo=TRUE, warning=FALSE,message=FALSE}
#grand mean = 
sum(aggregate(Emotion~StyleF,data = Emotion.Data, FUN=mean)[2])/3
```

#### Coefficients
- Style.C.S1 = coefficient on Normal: how far the Normal mean is from the grand mean
- Style.C.S2 = coefficient on Exaggerated: how far the Exaggerated mean is from the grand mean
- Flat has no coefficient of its own, but because the deviations sum to zero, its deviation is -(Style.C.S1 + Style.C.S2)

#### Means per Condition
- *Flat = Intercept - Style.C.S1 - Style.C.S2*
- *Normal = Intercept + Style.C.S1*
- *Exaggerated = Intercept + Style.C.S2*

#### Pvalues
- The pvalue on the intercept asks if the *grand mean* is different from zero
- The pvalue on each slope asks if that level is different from the *grand mean*
- Note: We have lost the direct test of the Flat condition against the grand mean; to get it back we would have to rotate the reference level and run the model again
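
A minimal sketch of how the intercept and the condition means relate under deviation coding, on made-up data (`dev.demo`, `fit.dev`, and the effect sizes are hypothetical). Because the codes sum to zero, the mean of the last level can be rebuilt from the intercept and the two reported coefficients.

```{r, echo=TRUE}
set.seed(3)
dev.demo <- data.frame(Style = factor(rep(c("Normal", "Exaggerated", "Flat"), each = 20),
                                      levels = c("Normal", "Exaggerated", "Flat")))
dev.demo$Emotion <- 50 + c(5, 10, 0)[as.integer(dev.demo$Style)] + rnorm(60, sd = 10)
contrasts(dev.demo$Style) <- contr.sum(3)   # Flat (last level) is coded -1, -1

fit.dev   <- lm(Emotion ~ Style, data = dev.demo)
b.dev     <- unname(coef(fit.dev))
means.dev <- tapply(dev.demo$Emotion, dev.demo$Style, mean)

# Intercept is the unweighted mean of the three condition means
c(intercept = b.dev[1], grand.mean = mean(means.dev))

# Flat's mean is the intercept minus the two reported deviations
c(from.model = b.dev[1] - b.dev[2] - b.dev[3], observed = unname(means.dev["Flat"]))
```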

###  Simple Coding
- The codes in each column must sum to zero, and the level being compared must differ from the other levels by exactly 1 (so with 2 levels the codes are -1/2 and 1/2)
- Like in ANOVA you can design these to ask specific questions by merging conditions (or not)
- We will start from contr.treatment (which R is already using to automatically convert your categorical variables) and adjust it
- **Just like dummy coding but we change the meaning of the intercept**
- The reference level is first again

Variable    | C1   | C2
------------| ---- | ---
Flat        | -1/3 | -1/3
Normal      |  2/3 | -1/3
Exaggerated | -1/3 |  2/3

```{r, echo=TRUE, warning=FALSE,message=FALSE}
Emotion.Data$Style.Simple <- factor(Emotion.Data$Style,
                              level=c(0,1,2),
                              labels=c("Flat", "Normal", "Exaggerated"))

Levels<-3
Simple.1<-contr.treatment(Levels)
#Dummy code
Simple.1
#Make your custom codes
my.coding<-matrix(rep(1/Levels, Levels*(Levels-1)), ncol=Levels-1)
my.simple<-Simple.1-my.coding
my.simple
contrasts(Emotion.Data$Style.Simple)<-my.simple

```


```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.1.Simple<-lm(Emotion ~ Style.Simple, data = Emotion.Data)
stargazer(Model.1.Simple,type="html",
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```

#### Coefficients
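- This merges the dummy and deviation codings, thus: 
- Intercept = Mean of means (grand mean)
- Style.Simple2 = difference between Normal and baseline (Flat)
- Style.Simple3 = difference between Exaggerated and baseline (Flat)
- The coefficients are labeled 2 and 3 because the custom contrast matrix keeps the numeric column names from contr.treatment

#### Means per Condition
- *You cannot read the condition means straight off the coefficients; each mean is the intercept plus that level's contrast codes times the slopes*
- *Flat = Intercept - (1/3)Style.Simple2 - (1/3)Style.Simple3*
- *Normal = Intercept + (2/3)Style.Simple2 - (1/3)Style.Simple3*
- *Exaggerated = Intercept - (1/3)Style.Simple2 + (2/3)Style.Simple3*

#### Pvalues
- The pvalue on the intercept asks if the *grand mean* is different from zero
- The pvalue on each slope asks if that level is different from baseline (*Flat*)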
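
A minimal sketch of how condition means can be rebuilt from a simple-coded model, on made-up data (`simple.demo`, `codes.demo`, and the effect sizes are hypothetical). The idea is that each mean equals the intercept plus that level's row of contrast codes multiplied by the slopes.

```{r, echo=TRUE}
set.seed(4)
simple.demo <- data.frame(Style = factor(rep(c("Flat", "Normal", "Exaggerated"), each = 20),
                                         levels = c("Flat", "Normal", "Exaggerated")))
simple.demo$Emotion <- 50 + c(0, 5, 10)[as.integer(simple.demo$Style)] + rnorm(60, sd = 10)

# Simple codes = dummy codes minus 1/k (here k = 3 levels)
codes.demo <- contr.treatment(3) - 1/3
contrasts(simple.demo$Style) <- codes.demo
fit.simple <- lm(Emotion ~ Style, data = simple.demo)
b.simple   <- unname(coef(fit.simple))

# Condition means = intercept + contrast codes %*% slopes
recovered  <- drop(b.simple[1] + codes.demo %*% b.simple[-1])
observed.s <- tapply(simple.demo$Emotion, simple.demo$Style, mean)
cbind(recovered = unname(recovered), observed = unname(observed.s))
```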


### Other types of contrast coding
- Helmert: compares each level to the mean of the subsequent levels [can be reversed]
- Forward Difference coding: one level is compared to the next (adjacent) level. (level 1 vs 2. level 2 vs 3) [can be reversed]
- Custom: Merge and compare levels at will
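
For reference, a short sketch that prints two of these coding schemes for a 3-level factor. It assumes the MASS package is installed (it normally ships with R). Note that base R's `contr.helmert` runs in the reversed direction: it compares each level to the mean of the levels *before* it. `MASS::contr.sdif` gives adjacent-level differences (later level minus earlier level).

```{r, echo=TRUE}
# Helmert codes as built into base R: each level vs. the mean of the levels before it
helmert.demo <- contr.helmert(3)
helmert.demo

# Successive (adjacent-level) difference codes from the MASS package:
# column "2-1" compares level 2 with level 1, "3-2" compares level 3 with level 2
sdif.demo <- MASS::contr.sdif(3)
sdif.demo
```

Both sets of codes sum to zero within each column, so the intercept stays at the mean of the condition means.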

#### Center variables and test slopes
- There is no reason why you cannot treat the variables as continuous if they are *ordinal*
- Just center them and treat them like a continuous variable

***************

# Models with Interactions (2 Nominal variables)
- Interactions can be tricky, as the interpretation depends on how you code the IVs
- First, let's try a categorical by categorical interaction
- We will add a variable to our experiment from above: the location where the actors work (Movie vs Theatre)
- Emotion rating ~ Expressive intentions*location of the actor 
- $Emotion Rating = -3*Flat + 0*Normal +3*Exaggerated +0*Movie - 1.5*Theatre + 0*Flat*Movie + 0*Normal*Movie +0*Exaggerated*Movie - 10*Flat*Theatre +0*Normal*Theatre +10*Exaggerated*Theatre + 50 + \epsilon$


```{r, echo=TRUE, warning=FALSE}
#Set up simulation

set.seed(42)
N <- 200
X <- sample(rep(c(-1,0,1),N),N,replace = FALSE)
Z <- sample(rep(c(0,1),N*3/2),N,replace = FALSE)

# Our equation to create Y
Y <- 3*X -1.5*Z+10*X*Z+ 50 + rnorm(N, sd=10)
# Build our data frame
Emotion.Data.2<-data.frame(Emotion=Y,Style=X,Location=Z)
Emotion.Data.2$Style<-Emotion.Data.2$Style+1
```


## Dummy coding
- Let's dummy code all of them: convert them to labeled factors with the levels in the order we want

```{r, echo=TRUE, warning=FALSE}
# Convert all our factors
Emotion.Data.2$StyleF <- factor(Emotion.Data.2$Style,
                              level=c(0,1,2),
                              labels=c("Flat", "Normal", "Exaggerated"))

Emotion.Data.2$LocationF <- factor(Emotion.Data.2$Location,
                              level=c(0,1),
                              labels=c("Movie Set", "Theatre"))

```

- Let's look at the collapsed means and the means per cell

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Mean.Style<-aggregate(Emotion~StyleF,data = Emotion.Data.2, FUN=mean)
kable(Mean.Style)
Mean.Location<-aggregate(Emotion~LocationF,data = Emotion.Data.2, FUN=mean)
kable(Mean.Location)
Mean.Interaction<-aggregate(Emotion~StyleF*LocationF,data = Emotion.Data.2, FUN=mean)
kable(Mean.Interaction)
```

- Let's examine our regression, but we need to examine both a main effects model and an interaction model

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.I1.Dummy<-lm(Emotion ~ StyleF+LocationF, data = Emotion.Data.2)
Model.I2.Dummy<-lm(Emotion ~ StyleF*LocationF, data = Emotion.Data.2)

stargazer(Model.I1.Dummy,Model.I2.Dummy,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```

- You will notice large differences between the main effects model and the interaction model
- This is because the meaning of the coefficients in the interaction model differs from the main effects model
- The main effects were explained above, but they change meaning a little because there is a second variable

### Main Effects

#### Coefficients
- These do not connect back to the individual cells, as you are assuming the two factors have independent (additive) effects
- Thus these are hypothetical values (not connected to the individual cells)
- Intercept = Estimate of the mean at the 0 points [baseline] of both factors, assuming the two factors have independent effects
- StyleFNormal = Normal - Flat (assuming the two factors are independent)
- StyleFExaggerated = Exaggerated - Flat (assuming the two factors are independent)
- LocationFTheatre = Theatre - Movie (assuming the two factors are independent)

#### Means per Condition 
- You cannot recover the observed cell means; adding the coefficients gives only the model's additive predictions

#### Pvalues
- The pvalue on the intercept asks if the baseline is different from zero
- The pvalue on each difference asks if that level is different from baseline 

### Interactions

#### Coefficients
- These coefficients connect back to the experimental cells
- Intercept = Location(0) @ Style(0) aka Movie @ Flat [real value]
- StyleFNormal = Difference of [Location(0) @ Style(1)] - [Location(0) @ Style(0)]
- StyleFExaggerated = Difference of [Location(0) @ Style(2)] - [Location(0) @ Style(0)]
- LocationFTheatre = Difference of [Location(1) @ Style(0)] - [Location(0) @ Style(0)]
- StyleFNormal:LocationFTheatre = Difference of differences: ([Location(1) @ Style(1)] - [Location(0) @ Style(1)]) - ([Location(1) @ Style(0)] - [Location(0) @ Style(0)])
- StyleFExaggerated:LocationFTheatre = Difference of differences: ([Location(1) @ Style(2)] - [Location(0) @ Style(2)]) - ([Location(1) @ Style(0)] - [Location(0) @ Style(0)])


#### Means per Condition
- You can find all means for all cells! 
- *Flat @ Movie = Intercept*
- *Normal @ Movie = Intercept + StyleFNormal* 
- *Exaggerated @ Movie = Intercept + StyleFExaggerated* 
- *Flat @ Theatre = Intercept + LocationFTheatre*
- *Normal @ Theatre = Intercept + LocationFTheatre+StyleFNormal+StyleFNormal:LocationFTheatre* 
- *Exaggerated @ Theatre = Intercept + LocationFTheatre + StyleFExaggerated + StyleFExaggerated:LocationFTheatre*
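
A minimal sketch that checks one of these sums against the observed cell mean, using a small made-up balanced design (`cells.demo`, `fit.cells`, and the level label "Movie" are hypothetical, so the coefficient names differ slightly from the table in these notes).

```{r, echo=TRUE}
set.seed(5)
cells.demo <- expand.grid(Style = c("Flat", "Normal", "Exaggerated"),
                          Location = c("Movie", "Theatre"),
                          rep = 1:10)
cells.demo$Style    <- factor(cells.demo$Style, levels = c("Flat", "Normal", "Exaggerated"))
cells.demo$Location <- factor(cells.demo$Location, levels = c("Movie", "Theatre"))
cells.demo$Emotion  <- 50 + rnorm(nrow(cells.demo), sd = 10)

fit.cells <- lm(Emotion ~ Style * Location, data = cells.demo)
b.cells   <- coef(fit.cells)

# Normal @ Theatre = Intercept + Location + Style + Style:Location
from.coefs <- unname(b.cells["(Intercept)"] + b.cells["LocationTheatre"] +
                     b.cells["StyleNormal"] + b.cells["StyleNormal:LocationTheatre"])
observed.c <- mean(cells.demo$Emotion[cells.demo$Style == "Normal" &
                                      cells.demo$Location == "Theatre"])
c(from.coefs = from.coefs, observed = observed.c)
```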

#### Pvalues
- The pvalue on the intercept asks if the baseline cell is different from zero
- The pvalue on each difference asks if it is different from baseline 
- But these are now simple effects (each is tested at the baseline level of the other factor)


### Main effect model vs Interaction model
- Since the results are clearly not additive, the main effects model is useful only as a way to understand how the terms have changed
- It gives some insight into the interaction

### Graph Model
```{r, echo=TRUE, warning=FALSE,message=FALSE}
library(effects)
Main.1<-effect("LocationF", Model.I2.Dummy)
plot(Main.1, multiline = TRUE)
Main.2<-effect("StyleF", Model.I2.Dummy)
plot(Main.2, multiline = TRUE)
Inter.1<-effect("StyleF*LocationF", Model.I2.Dummy)
plot(Inter.1, multiline = TRUE)
```

### Connection to between-subject ANOVA
- Regression with only categorical variables can be converted into an ANOVA very easily 
- Thus it can be treated just like ANOVA
- But the regression makes the ANOVA process moot, as the regression can be coded so that you can run the follow-up tests exactly the way you want to test them

```{r, echo=TRUE, warning=FALSE}
Anova(Model.I2.Dummy, type="III")
```

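- Note: base R's anova() gives Type I (sequential) sums of squares and car's Anova() defaults to Type II, so we have to ask for Type III to match SPSS output. Type III tests of the main effects only match SPSS when the factors use sum-to-zero contrasts (e.g., contr.sum); with dummy coding those rows test simple effects at the reference level of the other factor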
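
A minimal sketch of a Type III table with sum-to-zero codes, on a made-up balanced design (`aov.demo` and `fit.aov` are hypothetical names). It assumes the car package is installed and passes the contrasts through the `contrasts` argument of `lm()` rather than changing the factors themselves.

```{r, echo=TRUE, warning=FALSE, message=FALSE}
library(car)
set.seed(6)
aov.demo <- expand.grid(Style = c("Flat", "Normal", "Exaggerated"),
                        Location = c("Movie", "Theatre"),
                        rep = 1:10)
aov.demo$Style    <- factor(aov.demo$Style, levels = c("Flat", "Normal", "Exaggerated"))
aov.demo$Location <- factor(aov.demo$Location, levels = c("Movie", "Theatre"))
aov.demo$Emotion  <- 50 + rnorm(nrow(aov.demo), sd = 10)

# Sum-to-zero codes so each main effect row is averaged over the other factor
fit.aov <- lm(Emotion ~ Style * Location, data = aov.demo,
              contrasts = list(Style = contr.sum, Location = contr.sum))
type3.demo <- Anova(fit.aov, type = "III")
type3.demo
```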


## Contrast Coding

### Deviation Coding
- Lets code each variable as we did above

```{r, echo=TRUE, warning=FALSE}
Emotion.Data.2$Style.C.S <- factor(Emotion.Data.2$Style,
                              level=c(1,2,0),
                              labels=c("Normal", "Exaggerated", "Flat"))
Emotion.Data.2$Location.C.S <- factor(Emotion.Data.2$Location,
                              level=c(1,0),
                              labels=c("Theatre", "Movie Set"))

contrasts(Emotion.Data.2$Style.C.S) = contr.sum
contrasts(Emotion.Data.2$Location.C.S) = contr.sum
attributes(Emotion.Data.2$Style.C.S)$contrasts
attributes(Emotion.Data.2$Location.C.S)$contrasts
```

- Lets look at how things differ from the mean of all conditions (grand mean)

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.I1.C.S<-lm(Emotion ~ Style.C.S+Location.C.S, data = Emotion.Data.2)
Model.I2.C.S<-lm(Emotion ~ Style.C.S*Location.C.S, data = Emotion.Data.2)

stargazer(Model.I1.C.S,Model.I2.C.S,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```

- Again we are testing between the grand mean and the individual terms

#### Main Effects

##### Coefficients
- These do not connect back to the individual cells as you are assuming the two factors have independent effects
- Thus these are hypothetical *Grand mean* values 
- Intercept = *Grand Mean*, but assuming the two factors have independent effects
- Style.C.S1  = Normal (averaged over location) - *Grand Mean* (assuming the two factors are independent)
- Style.C.S2 = Exaggerated (averaged over location) - *Grand Mean* (assuming the two factors are independent)
- Location.C.S1 = Theatre (averaged over Style) -  *Grand Mean* (assuming the two factors are independent)

##### Means per Condition
- *Cannot get from this model*

##### Pvalues
- The pvalue on the intercept asks if the baseline (*Grand Mean*) is different from zero
- The pvalue on each slope asks if that level is different from the baseline (*Grand Mean*)
- Note we are missing tests on the Flat and Movie levels; we would need to rotate the matrix and run again

#### Interactions
##### Coefficients
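- Intercept =  *Grand Mean* (the mean of all 6 cell means)
- Style.C.S1 = Normal (averaged over location) from *Grand Mean*
- Style.C.S2 = Exaggerated (averaged over location) from *Grand Mean*
- Location.C.S1 = Theatre (averaged over Style) from *Grand Mean*
- Style.C.S1:Location.C.S1 = how far Normal @ Theatre is from what the *Grand Mean* plus the two main effects (Style.C.S1 + Location.C.S1) would predict
- Style.C.S2:Location.C.S1 = how far Exaggerated @ Theatre is from what the *Grand Mean* plus the two main effects (Style.C.S2 + Location.C.S1) would predict
 
##### Means per Condition
- *Not read directly off the table, but every cell can be rebuilt by multiplying each coefficient by that cell's codes (1, 0, or -1)*
- *Normal @ Theatre = Intercept + Style.C.S1 + Location.C.S1 + Style.C.S1:Location.C.S1*
- *Flat @ Movie = Intercept - Style.C.S1 - Style.C.S2 - Location.C.S1 + Style.C.S1:Location.C.S1 + Style.C.S2:Location.C.S1*

##### Pvalues
- The pvalue on the intercept asks if the baseline (*Grand Mean*) is different from zero
- The pvalue on each difference asks if that term is different from the (*Grand Mean*)
- Note we are missing tests on the Flat and Movie levels; we would need to rotate the matrix and run again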
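
A minimal sketch of rebuilding a cell that has no coefficient of its own under deviation coding, on made-up data (`sum.demo`, `fit.sum`, and the label "Movie" are hypothetical). Flat is coded (-1, -1) and Movie is coded -1, so the signs on the coefficients flip accordingly.

```{r, echo=TRUE}
set.seed(7)
sum.demo <- expand.grid(Style = c("Normal", "Exaggerated", "Flat"),
                        Location = c("Theatre", "Movie"),
                        rep = 1:10)
sum.demo$Style    <- factor(sum.demo$Style, levels = c("Normal", "Exaggerated", "Flat"))
sum.demo$Location <- factor(sum.demo$Location, levels = c("Theatre", "Movie"))
sum.demo$Emotion  <- 50 + rnorm(nrow(sum.demo), sd = 10)
contrasts(sum.demo$Style)    <- contr.sum(3)
contrasts(sum.demo$Location) <- contr.sum(2)

fit.sum <- lm(Emotion ~ Style * Location, data = sum.demo)
b.sum   <- unname(coef(fit.sum))  # order: Intercept, S1, S2, L1, S1:L1, S2:L1

# Flat @ Movie: main-effect codes are -1, so the interaction codes become +1
flat.movie <- b.sum[1] - b.sum[2] - b.sum[3] - b.sum[4] + b.sum[5] + b.sum[6]
observed.fm <- mean(sum.demo$Emotion[sum.demo$Style == "Flat" &
                                     sum.demo$Location == "Movie"])
c(from.coefs = flat.movie, observed = observed.fm)
```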


# Models with Interactions (1 Nominal and 1 Continuous Variable)
- We will change our experiment from above: we will add a Likert scale on which the actor rates how "good" they think the performance was [0 terrible to 7 excellent] (the simulation below draws it on a -3 to 3 scale)
- Emotion rating ~ Quality Rating*location of the actor 
- $Emotion Rating = 4*Quality + 0*Movie + 2*Theatre + 0*Quality*Movie + 8*Quality*Theatre + 50 + \epsilon$


```{r, echo=TRUE, warning=FALSE}
#Set up simulation
set.seed(42)
N <- 200
X <- runif(N,-3,3)
Z <- sample(rep(c(0,1),N),N,replace = FALSE)

# Our equation to create Y
Y <- 4*X +2*Z+8*X*Z+ 50 + rnorm(N, sd=10)
# Build our data frame
Emotion.Data.3<-data.frame(Emotion=Y,Quality=X,Location=Z)


```



## Center the continuous variable
- Best to center, as it will help us interpret the intercept and the Location effect
```{r, echo=TRUE, warning=FALSE}
# Mean-center Quality (scale = FALSE keeps the original units)
Emotion.Data.3$Quality.C <- scale(Emotion.Data.3$Quality, scale = FALSE)[,]

```


## Dummy coding of our nominal variable
- Dummy coding will allow us to interpret the interaction as simple slopes analysis
- Lets dummy code Location

```{r, echo=TRUE, warning=FALSE}
# Convert all our factors
Emotion.Data.3$LocationF <- factor(Emotion.Data.3$Location,
                                   level=c(0,1),
                                   labels=c("Movie Set","Theatre"))

```

### Interpret Regression

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.E3.1<-lm(Emotion ~ Quality.C+LocationF, data = Emotion.Data.3)
Model.E3.2<-lm(Emotion ~ Quality.C*LocationF, data = Emotion.Data.3)

stargazer(Model.E3.1,Model.E3.2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```

### Plotting Interaction
- Oftentimes you will need to use the effects package and plot by hand
- The rockchalk package will work for simple models like this one

```{r, echo=TRUE, warning=FALSE,message=FALSE}
library(rockchalk)
plotSlopes(Model.E3.2, plotx = "Quality.C", modx = "LocationF")
```

### Main Effects

#### Coefficients & Pvalues
- Intercept = Predicted Emotion for Movies @ mean of Quality [p = Is the intercept different from 0]
- Quality.C  = Quality.C slope [p = Is slope different from 0]
- LocationFTheatre = Theatre difference from intercept [p = Is mean of Theatre different from movies @ mean of quality]

### Interactions
#### Coefficients & Pvalues
- Intercept = Predicted Emotion for Movies @ mean of Quality [p = Is the intercept different from 0]
- Quality.C  = Quality.C slope @ Movies [p = Is the simple slope at movies different from 0]
- LocationFTheatre = Theatre difference from intercept [p = Is mean of Theatre different from movies @ mean of quality]
- Quality.C:LocationFTheatre = Difference between the Quality.C slope @ Theatre and the slope @ Movies [p = Is the Theatre slope different from the Movies slope, i.e., is there an interaction]; the simple slope @ Theatre is Quality.C + Quality.C:LocationFTheatre
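
A minimal sketch of pulling both simple slopes out of a dummy-coded interaction model, on made-up data (`slope.demo`, `fit.slope`, and the label "Movie" are hypothetical; the effect sizes loosely mirror the simulation above). Releveling so that Theatre is the baseline is one way to get a direct test of the Theatre slope.

```{r, echo=TRUE}
set.seed(8)
n.demo <- 100
slope.demo <- data.frame(Quality.C = runif(n.demo, -3, 3),
                         Location  = factor(rep(c("Movie", "Theatre"), each = n.demo / 2),
                                            levels = c("Movie", "Theatre")))
slope.demo$Quality.C <- slope.demo$Quality.C - mean(slope.demo$Quality.C)
slope.demo$Emotion <- 50 + 4 * slope.demo$Quality.C +
  ifelse(slope.demo$Location == "Theatre", 2 + 8 * slope.demo$Quality.C, 0) +
  rnorm(n.demo, sd = 10)

fit.slope <- lm(Emotion ~ Quality.C * Location, data = slope.demo)
b.slope   <- coef(fit.slope)

# Simple slope at Movie is the Quality.C coefficient itself;
# simple slope at Theatre adds the interaction coefficient
slopes.demo <- c(Movie   = unname(b.slope["Quality.C"]),
                 Theatre = unname(b.slope["Quality.C"] + b.slope["Quality.C:LocationTheatre"]))
slopes.demo

# With Theatre as baseline, the Quality.C row is the Theatre simple slope and its test
fit.theatre <- lm(Emotion ~ Quality.C * relevel(Location, ref = "Theatre"), data = slope.demo)
summary(fit.theatre)$coefficients["Quality.C", ]
```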
 

## Deviation coding of our nominal variable
- We will still center quality
- This creates a more ANOVA like interpretation of the interaction 
```{r, echo=TRUE, warning=FALSE}
Emotion.Data.3$LocationD <- factor(Emotion.Data.3$Location,
                                   level=c(1,0),
                                   labels=c("Theatre","Movie Set"))
contrasts(Emotion.Data.3$LocationD) = contr.sum
attributes(Emotion.Data.3$LocationD)$contrasts
```

### Interpret Regression

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.E3.D1<-lm(Emotion ~ Quality.C+LocationD, data = Emotion.Data.3)
Model.E3.D2<-lm(Emotion ~ Quality.C*LocationD, data = Emotion.Data.3)

stargazer(Model.E3.D1,Model.E3.D2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```


### Main Effects

#### Coefficients & Pvalues
- Intercept = Predicted Emotion @ mean of Quality, averaged over Movie & Theatre (imaginary thing) [p = Is the intercept different from 0]
- Quality.C  = Quality.C slope [p = Is slope different from 0]
- LocationD1 = Theatre difference from the midpoint of Movie & Theatre (Theatre is coded 1 and Movie -1, not 1 and 0), thus it is 1/2 the slope of the dummy code [p = Is mean of Theatre different from movies @ mean of quality]

### Interactions
#### Coefficients & Pvalues 
- Intercept = Predicted Emotion @ mean of Quality, averaged over Movie & Theatre (imaginary thing) [p = Is the intercept different from 0]
- Quality.C  = Quality.C slope @ mean of Movie & Theatre (imaginary thing) [p = Is there a main effect of Quality, averaged over Movie & Theatre]
- LocationD1 = Theatre difference from intercept [p = Is there a main effect of Location @ mean of quality]
- Quality.C:LocationD1 = Difference between the Quality.C slope @ Theatre and the average slope [p = Is there an interaction]
- Note the LocationD1 and Quality.C:LocationD1 coefficients are 1/2 the size of the Theatre vs Movie differences from the dummy-coded model! 
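
A minimal sketch comparing dummy and deviation coding for the same 2-level factor, on made-up data (`half.demo`, `fit.dummy`, `fit.dev2`, and the label "Movie" are hypothetical). The Location and interaction estimates under deviation coding should come out at half the dummy-coded values.

```{r, echo=TRUE}
set.seed(9)
half.demo <- data.frame(Quality.C = rep(seq(-3, 3, length.out = 50), times = 2),
                        Location  = factor(rep(c("Theatre", "Movie"), each = 50),
                                           levels = c("Theatre", "Movie")))
half.demo$Emotion <- 50 + 4 * half.demo$Quality.C +
  ifelse(half.demo$Location == "Theatre", 2 + 8 * half.demo$Quality.C, 0) +
  rnorm(100, sd = 10)

# Dummy coding with Movie as baseline (Theatre = 1, Movie = 0)
fit.dummy <- lm(Emotion ~ Quality.C * relevel(Location, ref = "Movie"), data = half.demo)
# Deviation coding (Theatre = 1, Movie = -1)
fit.dev2  <- lm(Emotion ~ Quality.C * Location, data = half.demo,
                contrasts = list(Location = contr.sum))

# Third and fourth coefficients are the Location term and the interaction
compare.demo <- rbind(dummy     = unname(coef(fit.dummy)[3:4]),
                      deviation = unname(coef(fit.dev2)[3:4]))
colnames(compare.demo) <- c("Location", "Quality.C:Location")
compare.demo
```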
 
 
 
## Simple coding of our nominal variable
- We will still center quality
- This creates a more ANOVA like interpretation of the interaction
- Most importantly, it does not screw up our coefficients, as they match the dummy code values (which make the most sense)
- This time I will hand code it as a numeric variable (Movie Set = -.5, Theatre = .5), which is easy with only 2 levels
```{r, echo=TRUE, warning=FALSE}
Emotion.Data.3$LocationS<-as.numeric(Emotion.Data.3$Location)-.5

```

### Interpret Regression

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.E3.S1<-lm(Emotion ~ Quality.C+LocationS, data = Emotion.Data.3)
Model.E3.S2<-lm(Emotion ~ Quality.C*LocationS, data = Emotion.Data.3)

stargazer(Model.E3.S1,Model.E3.S2,type="html",
          column.labels = c("Main Effects", "Interaction"),
          intercept.bottom = FALSE,
          single.row=TRUE, 
          notes.append = FALSE,
          omit.stat=c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          header=FALSE)
```


### Main Effects

#### Coefficients & Pvalues
- Intercept = Predicted Emotion @ mean of Quality, averaged over Movie & Theatre (imaginary thing) [p = Is the intercept different from 0]
- Quality.C  = Quality.C slope [p = Is slope different from 0]
- LocationS = Theatre difference from Movie (a one-unit change in the code, from -.5 to .5) [p = Is mean of Theatre different from movies @ mean of quality]

### Interactions

#### Coefficients & Pvalues 
- Intercept = Predicted Emotion @ mean of Quality, averaged over Movie & Theatre (imaginary thing) [p = Is the intercept different from 0]
- Quality.C  = Quality.C slope @ mean of Movie & Theatre (imaginary thing) [p = Is there a main effect of Quality, averaged over Movie & Theatre]
- LocationS = Theatre difference from Movie [p = Is there a main effect of Location @ mean of quality]
- Quality.C:LocationS = Quality.C slope difference between Movie and Theatre [p = Is there an interaction]
- Note all coefficients are their proper size now (to match true differences)
- Best option is often dummy or simple; deviation here is weird 


