Mixed Models

  • Multiple regressions only have fixed-effects [continuous] or fixed-factors [groups]: variables of interest (often your manipulation)
  • The only random factor assumed is your subject’s response
    • What if you have other effects or factors that might add random variation?
  • Random-effects/factors: Levels randomly sampled from a much larger population: Something you want to generalize across: Subjects, Words, Pictures, etc

Type of Mixed Models

  • We will explore crossed designs

Nested designs

  • Problem: Clustering means each student’s scores within each classroom might be correlated because they all have the same teacher (for example)
    • Multilevel/Hierarchical designs: Students nested in classrooms [Clustered data]

Crossed designs

  • Problem: Each person’s response is going to inter-correlated with their other responses
    • Repeated measures: everyone does all the conditions
    • Longitudinal data: People are tracked over time
      • Growth Curve Models: Fitting functions (like polynomials) to behavior over time

Crossed & Nested designs

  • Now each person’s response is going to inter-correlated with their other responses, but that correlation might differ by which group they belong too
    • Longitudinal Multilevel Designs: Students nested in classrooms measured over many months
    • Partially Nested design: Different groups are measures over time

Inter-Correlations

  • All these correlations inflate your type I error if you don’t treat the data correctly

Practice data to make sense of random effects

  • Hypothetical Experiment, n = 6
  • DV: % of how happy they are
  • IV1: Time of measurement: 5 time windows
  • IV2: Social Setting: Alone vs With a group

As 2x5 RM ANOVA

  • Treats Time-Steps as a factor
  • Are there any differences between any time-steps?
  • Not slopes (predictable changes in time)
  • Treats Social as a factor
  • Is there a difference between Alone and With Group
  • Questions: Is there an effect of time and Social?
library(lme4)     #mixed model package by Douglas Bates
library(afex)     #easy ANOVA package 
library(ggplot2)  #GGplot package for visualizing data
source("HelperFunctions.R") # some custom functions
###Read in Data file
HappyData <- read.csv("Mixed/HappyStudy.csv", header=T)
### Label variables
HappyData$Social <- factor(HappyData$SocialDummy,
                        levels = c(0,1),
                        labels = c("Alone", "Group"))
HappyData$Subject<-as.factor(HappyData$Subject)
  • Let’s plot each person’s data (spaghetti plot)
theme_set(theme_bw(base_size = 12, base_family = "")) 

pab <-ggplot(data = HappyData, aes(x = TimeStep, y=HappyPercent, group=Subject))+
  facet_grid( ~ Social, labeller=label_both)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  geom_line(aes(color=Subject))+
  xlab("Time Step")+ylab("Happiness")
pab

  • Let’s plot each person’s data with the means plotted
##custom function to generate means and SE form any long format dataset. 
HappyDataPlot2<-summarySEwithin(HappyData,measurevar="HappyPercent", 
                               withinvars=c("Social","TimeStep"),
                               idvar="Subject", na.rm=FALSE, conf.interval=.95)
pab2 <-ggplot()+
  geom_line(data = HappyData, aes(x = as.factor(TimeStep), y =HappyPercent, group=Subject, color=Subject))+
  facet_grid(. ~ Social, labeller=label_both)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  geom_point(data = HappyDataPlot2, aes(x = as.factor(TimeStep), y =HappyPercent, group=Social))+
  geom_errorbar(data = HappyDataPlot2,aes(x = as.factor(TimeStep), y =HappyPercent, group=Social, ymin=HappyPercent-se, ymax=HappyPercent+se),colour="#6E6E6E",size=.25, width=.25)+
  xlab("Time Step")+ylab("Happiness")
pab2

  • Let’s examine an RM ANOVA results of this data
  • Remember time is treated like a factor (not a slope)
  • [Note: I ran the ANOVA like a GLM just so we can extract an overall \(R^2\) and get a fitted result for each subject]
#######Anova (using type 3 sum of squares same as default in SPSS/SAS)
Anova.Model1 <-aov_car(HappyPercent ~ Social*TimeStep + Error(Subject/(Social*TimeStep)), 
                               data = HappyData, anova_table = list(correction = "none", MSE = FALSE))
Anova.Model1
###### General Linear Model (same as ANOVA, but uses type 1 sum of squares) 
GLM.model<-glm(HappyPercent ~ Social*TimeStep, data=HappyData)
#fit data
HappyData$GLMfitted<-fitted(GLM.model)
#manually extract R2 to compare to mixed models
summary(lm(HappyPercent ~ GLMfitted, data=HappyData))$r.squared
## Anova Table (Type 3 tests)
## 
## Response: HappyPercent
##            Effect    df       F  ges p.value
## 1          Social  1, 5    2.36 .183    .185
## 2        TimeStep 4, 20 6.91 ** .195    .001
## 3 Social:TimeStep 4, 20    1.11 .018    .381
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
## [1] 0.325839
  • We can fit the GLM/ANOVA estimate for every single subject
  • How is our ANOVA doing in predicting each subject?
  • ANOVA is not predicting subject level; it is testing/predicting at the average level of the data
  • However, if we have longitudinal-like data (like in this study) why not make predictions at the subject-level?
FittedGlmPlot1 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =GLMfitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject),size=3)+
  xlab("Time Step")+ylab("Happiness")
FittedGlmPlot1

Random Effect Modeling: Fitting at different levels

  • Linear regression assumes one measurement per subject, but what is you have multiple measurements per subject?
  • If we have longitudinal data per subject we could estimate a slope per subject!
  • We could then extract the slopes estimates per subject [and do statistics on the slopes per subject]
  • But we lose the variances around those slopes! Such as below.
FittedlmPlot <-ggplot(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject))+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_point(aes(colour = Subject), size=3)+
  geom_smooth(method = "lm",se=FALSE, color ="black", size=.5) +
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot

Random intercepts

  • So we need to get a regression model to try to fit at the subject level
    • (1|Subject) means each subject can have their own intercept
    • REML=FALSE means fit the with maximum likelihood [like we did with GLM]
  • You can see the random, fixed terms, and correlations between the fixed terms
Model.1<-lmer(HappyPercent ~Social*TimeStep
                   +(1|Subject),  
                   data=HappyData, REML=FALSE)
summary(Model.1, correlations=FALSE)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: HappyPercent ~ Social * TimeStep + (1 | Subject)
##    Data: HappyData
## 
##      AIC      BIC   logLik deviance df.resid 
##    -48.5    -36.0     30.3    -60.5       54 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.01279 -0.74343  0.06896  0.56894  2.52851 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 0.004231 0.06505 
##  Residual             0.018986 0.13779 
## Number of obs: 60, groups:  Subject, 6
## 
## Fixed effects:
##                      Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)           0.30250    0.05103 30.77293   5.928 1.55e-06 ***
## SocialGroup          -0.08583    0.06162 54.00000  -1.393 0.169358    
## TimeStep              0.06750    0.01779 54.00000   3.795 0.000376 ***
## SocialGroup:TimeStep -0.02917    0.02516 54.00000  -1.159 0.251401    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) SclGrp TimStp
## SocialGroup -0.604              
## TimeStep    -0.697  0.577       
## SclGrp:TmSt  0.493 -0.816 -0.707
  • Next, we examine the intercepts for each person
  • The var and sd of these values are what was seen in the table above
ranef(Model.1)
## $Subject
##   (Intercept)
## 1 -0.01064173
## 2  0.07909391
## 3 -0.06586366
## 4  0.06528843
## 5 -0.04515543
## 6 -0.02272152
## 
## with conditional variances for "Subject"
  • We can examine an \(R^2\) for the models, but we have a problem
  • Is the \(R^2\) coming from the fixed or random effect? We can parse that:
  • The marginal is the \(R^2\) for the fixed effects
  • The conditional \(R^2\) includes random + fixed effects
  • There is no standard as to which to report \(R^2\) and most people never report these at all as if the random effects are complex (nested, or crossed) the meaning of them changes
library(performance)
r2_nakagawa(Model.1)
## # R2 for Mixed Models
## 
##   Conditional R2: 0.452
##      Marginal R2: 0.330
  • Let’s plot the model fit
HappyData$Model.1.fitted<-predict(Model.1)
FittedlmPlot1 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.1.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot1

Random intercept and slopes for the social condition

  • Model 2 treats that Social condition as free to vary between Subjects, but accounts for the correlation within Subjects between the Social condition)
    • (1+Social|Subject) means each subject can have their intercept and social condition can vary as a function of the subject
  • The correlations of the random terms are the correlations of each subjects’ intercept and slope of social condition
Model.2<-lmer(HappyPercent ~Social*TimeStep
                       +(1+Social|Subject),  
                       data=HappyData, REML=FALSE)
summary(Model.2, correlations=FALSE)
ranef(Model.2)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: HappyPercent ~ Social * TimeStep + (1 + Social | Subject)
##    Data: HappyData
## 
##      AIC      BIC   logLik deviance df.resid 
##    -80.6    -63.9     48.3    -96.6       52 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.74976 -0.65978 -0.08771  0.53103  2.52126 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  Subject  (Intercept) 0.025365 0.15926       
##           SocialGroup 0.041054 0.20262  -0.94
##  Residual             0.007582 0.08708       
## Number of obs: 60, groups:  Subject, 6
## 
## Fixed effects:
##                      Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)           0.30250    0.07061  7.41852   4.284  0.00318 ** 
## SocialGroup          -0.08583    0.09143  7.74635  -0.939  0.37617    
## TimeStep              0.06750    0.01124 48.00001   6.004 2.47e-07 ***
## SocialGroup:TimeStep -0.02917    0.01590 48.00001  -1.835  0.07277 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) SclGrp TimStp
## SocialGroup -0.899              
## TimeStep    -0.318  0.246       
## SclGrp:TmSt  0.225 -0.348 -0.707
## $Subject
##   (Intercept) SocialGroup
## 1 -0.13319227  0.22090410
## 2  0.20795560 -0.21223660
## 3 -0.01883687 -0.10541513
## 4  0.19986403 -0.22675775
## 5 -0.19743771  0.26507163
## 6 -0.05835279  0.05843375
## 
## with conditional variances for "Subject"
  • The \(R^2\) for fixed and random
r2_nakagawa(Model.2)
## # R2 for Mixed Models
## 
##   Conditional R2: 0.781
##      Marginal R2: 0.330
  • Plot the model results
HappyData$Model.2.fitted<-predict(Model.2)
FittedlmPlot2 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y=Model.2.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot2

Random intercepts and slopes for time

  • Model 3 treats that timesteps as free to vary between Subjects, but accounts for the correlation within Subjects across time
    • (1+TimeStep|Subject) means each subject can have their intercept and time slope can vary as a function of the subject
  • The correlations of the random terms are the correlations of each subjects’ intercept and slope of time
Model.3<-lmer(HappyPercent ~Social*TimeStep
                       +(1+TimeStep|Subject),  
                       data=HappyData, REML=FALSE)
summary(Model.3, correlations=FALSE)
ranef(Model.3)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: HappyPercent ~ Social * TimeStep + (1 + TimeStep | Subject)
##    Data: HappyData
## 
##      AIC      BIC   logLik deviance df.resid 
##    -50.7    -34.0     33.4    -66.7       52 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.94803 -0.77296  0.08334  0.68945  2.02418 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 0.000000 0.00000      
##           TimeStep    0.001151 0.03393   NaN
##  Residual             0.016312 0.12772      
## Number of obs: 60, groups:  Subject, 6
## 
## Fixed effects:
##                      Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)           0.30250    0.04039 54.00000   7.490  6.6e-10 ***
## SocialGroup          -0.08583    0.05712 54.00000  -1.503  0.13873    
## TimeStep              0.06750    0.02153 20.23059   3.135  0.00517 ** 
## SocialGroup:TimeStep -0.02917    0.02332 54.00000  -1.251  0.21639    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) SclGrp TimStp
## SocialGroup -0.707              
## TimeStep    -0.625  0.442       
## SclGrp:TmSt  0.577 -0.816 -0.541
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
## 
## $Subject
##   (Intercept)    TimeStep
## 1           0 -0.01842559
## 2           0  0.04224404
## 3           0 -0.02786309
## 4           0  0.04359225
## 5           0 -0.01842559
## 6           0 -0.02112202
## 
## with conditional variances for "Subject"
  • The \(R^2\) for fixed and random
r2_nakagawa(Model.3)
## Random effect variances not available. Returned R2 does not account for random effects.
## # R2 for Mixed Models
## 
##   Conditional R2: NA
##      Marginal R2: 0.412
  • Plot the model results
HappyData$Model.3.fitted<-predict(Model.3)
FittedlmPlot3 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.3.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot3

Random intercepts and slopes for time and social (but independent)

  • Model 4 treats that timesteps and Social as free to vary between Subjects but accounts for the correlation within Subjects across times and social
    • (1+Social|Subject)+(0+TimeStep|Subject) means each subject can have their intercept and can have their slope of social condition
  • Also, time slope can vary as a function of the subject, but it treats the variance between social and time as independent
Model.4<-lmer(HappyPercent ~Social*TimeStep
                   +(1+Social|Subject)+(0+TimeStep|Subject),  
                   data=HappyData, REML=FALSE)
summary(Model.4, correlations=FALSE)
ranef(Model.4)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: HappyPercent ~ Social * TimeStep + (1 + Social | Subject) + (0 +  
##     TimeStep | Subject)
##    Data: HappyData
## 
##      AIC      BIC   logLik deviance df.resid 
##   -104.5    -85.7     61.3   -122.5       51 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.87709 -0.58050  0.03998  0.52475  2.22036 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev. Corr 
##  Subject   (Intercept) 0.010881 0.10431       
##            SocialGroup 0.042748 0.20676  -0.88
##  Subject.1 TimeStep    0.001580 0.03975       
##  Residual              0.003363 0.05799       
## Number of obs: 60, groups:  Subject, 6
## 
## Fixed effects:
##                      Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)           0.30250    0.04637  6.68620   6.524 0.000396 ***
## SocialGroup          -0.08583    0.08830  6.74963  -0.972 0.364565    
## TimeStep              0.06750    0.01787  7.30670   3.777 0.006390 ** 
## SocialGroup:TimeStep -0.02917    0.01059 41.24318  -2.755 0.008699 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) SclGrp TimStp
## SocialGroup -0.854              
## TimeStep    -0.135  0.071       
## SclGrp:TmSt  0.228 -0.240 -0.296
## $Subject
##    (Intercept) SocialGroup     TimeStep
## 1 -0.057375747  0.23815690 -0.041860305
## 2  0.128676074 -0.20816844  0.042516614
## 3  0.005939829 -0.13327397 -0.013725144
## 4  0.091619257 -0.22893624  0.059397968
## 5 -0.176138708  0.27387216 -0.009377556
## 6  0.007279295  0.05834959 -0.036951577
## 
## with conditional variances for "Subject"
  • The \(R^2\) for fixed and random
r2_nakagawa(Model.4)
## # R2 for Mixed Models
## 
##   Conditional R2: 0.880
##      Marginal R2: 0.407
  • Plot the model results
HappyData$Model.4.fitted<-predict(Model.4)
FittedlmPlot4 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.4.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot4

Random intercepts and slopes for time and social (but not independent)

  • Model 5 treats that timesteps and Social as free to vary between Subjects but accounts for the correlation within Subjects across time and Social
    • (1+Social+ TimeStep|Subject) means each subject can have their intercept and can have their slope of social condition
  • Also, time slope can vary as a function of the subject, but it treats the variance between social and time as correlated
Model.5<-lmer(HappyPercent ~Social*TimeStep
                       +(1+Social+TimeStep|Subject),  
                       data=HappyData, REML=FALSE)
summary(Model.5, correlations=FALSE)
ranef(Model.5)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: HappyPercent ~ Social * TimeStep + (1 + Social + TimeStep | Subject)
##    Data: HappyData
## 
##      AIC      BIC   logLik deviance df.resid 
##   -106.1    -83.1     64.1   -128.1       49 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.75667 -0.64187  0.01123  0.46691  2.32345 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr       
##  Subject  (Intercept) 0.009981 0.09991             
##           SocialGroup 0.042755 0.20677  -0.86      
##           TimeStep    0.001701 0.04124   0.57 -0.79
##  Residual             0.003330 0.05771             
## Number of obs: 60, groups:  Subject, 6
## 
## Fixed effects:
##                      Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)           0.30250    0.04468  6.72430   6.770 0.000311 ***
## SocialGroup          -0.08583    0.08827  6.74363  -0.972 0.364436    
## TimeStep              0.06750    0.01841  7.10947   3.666 0.007791 ** 
## SocialGroup:TimeStep -0.02917    0.01054 41.99980  -2.768 0.008346 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) SclGrp TimStp
## SocialGroup -0.832              
## TimeStep     0.343 -0.620       
## SclGrp:TmSt  0.236 -0.239 -0.286
## $Subject
##    (Intercept) SocialGroup     TimeStep
## 1 -0.048404623  0.23920407 -0.046420455
## 2  0.127197739 -0.21291500  0.043738980
## 3 -0.007510835 -0.12257436 -0.008039099
## 4  0.088893308 -0.23639018  0.061465980
## 5 -0.166304142  0.26705599 -0.013706674
## 6  0.006128553  0.06561948 -0.037038731
## 
## with conditional variances for "Subject"
  • The \(R^2\) for fixed and random
r2_nakagawa(Model.5)
## # R2 for Mixed Models
## 
##   Conditional R2: 0.904
##      Marginal R2: 0.330
  • Plot the model results
HappyData$Model.5.fitted<-predict(Model.5)
FittedlmPlot5 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.5.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot5

Random intercepts and slopes for time x social

  • Model 6 treats that timesteps, Social and timesteps:Social (interaction) as free to vary between Subjects but accounts for the correlation within Subjects across time and Social
    • (1+Social*TimeStep|Subject) means (1+Social+TimeStep+Social:TimeStep|Subject) each subject can have their intercept, their slope of social condition, their time slope, and their interaction between social and time
  • Also, all random slopes and intercepts can be correlated
  • This is called the maximum model Barr et al., 2013 (these do not always fit best see Bates et al., 2015)
Model.6<-lmer(HappyPercent ~Social*TimeStep
                   +(1+Social*TimeStep|Subject),  
                   data=HappyData, REML=FALSE)
summary(Model.6, correlations=FALSE)
ranef(Model.6)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: HappyPercent ~ Social * TimeStep + (1 + Social * TimeStep | Subject)
##    Data: HappyData
## 
##      AIC      BIC   logLik deviance df.resid 
##   -142.2   -110.8     86.1   -172.2       45 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.92979 -0.57518  0.06785  0.52187  1.63727 
## 
## Random effects:
##  Groups   Name                 Variance Std.Dev. Corr             
##  Subject  (Intercept)          0.003024 0.05499                   
##           SocialGroup          0.011462 0.10706  -0.38            
##           TimeStep             0.004246 0.06516   0.46 -0.84      
##           SocialGroup:TimeStep 0.003061 0.05533  -0.79  0.84 -0.89
##  Residual                      0.001137 0.03372                   
## Number of obs: 60, groups:  Subject, 6
## 
## Fixed effects:
##                      Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)           0.30250    0.02485  6.10702  12.172 1.64e-05 ***
## SocialGroup          -0.08583    0.04624  6.05010  -1.856   0.1124    
## TimeStep              0.06750    0.02696  6.00466   2.504   0.0462 *  
## SocialGroup:TimeStep -0.02917    0.02341  6.02192  -1.246   0.2591    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) SclGrp TimStp
## SocialGroup -0.424              
## TimeStep     0.357 -0.753       
## SclGrp:TmSt -0.620  0.697 -0.875
## $Subject
##   (Intercept) SocialGroup    TimeStep SocialGroup:TimeStep
## 1  0.01347014  0.14914062 -0.07346568           0.04327375
## 2  0.06811478 -0.07643351  0.07762432          -0.07145605
## 3 -0.01596821 -0.11402999 -0.00324552          -0.01100070
## 4  0.01793845 -0.11310785  0.09555100          -0.05939808
## 5 -0.10556486  0.10825514 -0.04924558           0.08583846
## 6  0.02200970  0.04617559 -0.04721854           0.01274262
## 
## with conditional variances for "Subject"
  • The \(R^2\) for fixed and random
r2_nakagawa(Model.6)
## # R2 for Mixed Models
## 
##   Conditional R2: 0.967
##      Marginal R2: 0.330
  • Plot the model results
HappyData$Model.6.fitted<-predict(Model.6)
FittedlmPlot6 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.6.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot6

How to select a random structure

  • First, these models often fail to converge
  • Convergence mean that the optimizer (numerical method to find the parameters) has failed to reach an optimal or stable solution
  • Failures might not just be a numerical optimization problem, but a problem with your specification of the random effects
  • You could over-specify your random effects in a way that is not parsimonious with the true random effects in the population
    • You could set them up in such a way that do not reflect the experiment

Practical Solution to random structure

  • This is a battlefield in LMM
  • The current best practice is to try to match your design as best as you can
    • If that fails, try different optimizer
    • If that fails, reduce the complexity of the random effect (remove random correlations one at a time)
    • If that fails, follow the set of procedures here: http://www.alexanderdemos.org/Mixed8.html
  • If you under-specify, you inflate your type I error, so it a balancing act

Fixed effects

  • Just like your linear regression you can run these models hierarchically, once the random effects are in place
  • you just test between model fits using a deviance test (like we did with GLM) anova(model.1, model.2)

Plot the final model correctly

  • The method I used was to predict the data, but that works well if you have no covariates or a balanced design
  • If you have covariates, its best to use the effects package just like we did with linear regression
    • Also you have to decide to plot both fixed + random or just the fixed effects
  • Most people only plot the fixed effect, as they want to control for the random effects
  • However, if someone asks you to plot the data and the model fits together, you will need to tell them it complicated as we saw last week.
  • Below is how to plot fixed only
library(effects)
Final.Fixed<-effect(c("Social*TimeStep"), Model.6,
                           xlevels=list(TimeStep=seq(0,4,1)))

# You have to convert the output to a dataframe
Final.Fixed<-as.data.frame(Final.Fixed)

Final.Fixed.Plot <-ggplot(data = Final.Fixed, aes(x = TimeStep, y =fit, group=Social))+
  coord_cartesian(xlim=c(0,4),ylim = c(0,.7))+ 
  geom_line(aes(color=Social), size=2)+
  geom_ribbon(aes(ymin=fit-se, ymax=fit+se,fill=Social),alpha=.2)+
  xlab("Time")+
  ylab("Happiness\nscore")+
  scale_color_manual(values=c("blue", "red"))+
  scale_fill_manual(values=c("blue", "red"))+
  theme_bw()+
  theme(text=element_text(face="bold", size=12),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.border = element_rect(fill = NA, colour = "NA"),
        axis.line = element_line(size = 1, colour = "grey80"),
        legend.title=element_blank(),
        legend.position = c(.2, .92))
Final.Fixed.Plot

Problem with P-values

  • What gives mixed-effects models their ability to handle complex designs (e.g., nested, crossed, nested & crossed, partially nested and crossed) is in part what does not allow them to calculate p-values
  • Remember, \(t = \frac{M-\mu}{SE}\), where, \(SE = \sqrt\frac{\sigma^2}{n}\), \(df = n-1\)
  • GLM uses standard error to determine significance: Required degrees of freedom to correct z-distribution for “sampling error” to produce \(t\) and \(F\) tables.
  • But mixed models do not actually produce \(t\) or \(t^2=F\) values as they are not calculating the \(SE\) using the equation above, which uses observed - expected [mean square = \(\sigma^2 = \frac{SS}{N-1}\), where \(SS = (M-\mu)^2\)
  • Instead, they estimate the \(SE\) using ML or REML function
  • ML is a way of finding the value of one or more parameters in mixed model for a given statistic which makes the known likelihood distribution a maximum
    • likelihood is “a hypothetical probability that an event that has already occurred would yield a specific outcome” (http://mathworld.wolfram.com/Likelihood.html)
    • maximum is the largest value the function can produce
    • We will iterate through parameters until we maximize our likelihood
    • There are different ML (or just L) functions that can be used and they can apply to most distributions
  • Problem 1: SE is not the one used in \(t\) and \(F\) (it does not use N)
  • Problem 2: The estimate we get is not via OLS either, it’s via ML or REML (so what is the proper distribution to test against?)
  • Problem 3: What is our \(DF\) even if we want to assume \(t\) and \(F\)?

Problem 3: DF Conceptual Issues

  • Sample 500, 11the graders take the SAT from 3 schools partially nested in 2 economic districts

  • Random sampling of 500 total from 3 schools [Note: school 3 is twice as large school 1 and 2]
  • ANOVA Design: 1 factor: School OR district [they provide an unbalanced design so they can only be analyzed one at time]
  • Assume homogeneity of variance of SAT score across schools and districts
  • ANOVA can only ask 1 basic question: Are the means between the schools or districts different

ANOVA problem?

  • Why is ANOVA bad here?
    • School 1 is predominately poor vs School 2, which is rich, so school 1 has less funding, larger classrooms sizes, more social issues at home and in school, fewer teachers, etc.
  • This could impact both mean test scores and variance of test scores
    • Students from the poor district might have more diverse scores [high variance]
    • Student from rich district might have more similar scores, but higher average scores [low variance]
    • Poor students in the Rich school might also have changes in variance or mean test scores, which look very different from the poor or mixed school.

Mixed vs. ANOVA Solution

ANOVA Logic

  • If we assume ANOVA that would mean we calculate mean/variance relative to their classification based on school and district [at the level we are examining only]
  • In the case when all the variables are categorical, the DF “seems” logical, in that the proper MSerror terms could be calculated. However, in the mixed models, the random factors have been nested (kids in schools in districts).
    • So we cannot know how many of the people went into parsing the variance for each level of the random error term.
      • We could make assumptions, but some statisticians say over their dead bodies.
    • ANOVA is assumed to be robust against many violations, but that’s mostly only true in well-controlled balanced designs.
    • When ANOVA is stretched into the mixed model territory, it can result in unreasonable levels of type 1 error.

Mixed Logic

    1. Fixed factors: School and District [student belongs to]
    • SAT score MEANS could be different depending on the school a kid goes to and what district they are from.
    1. Random Factors: District, School, [Student]
    • SAT score VARIANCES could be different depending on the school a kid goes to and what district they are from.
    • So are the kids the DF, or the School or the district? What do with kids who are mismatched (poor but in richer schools or vice versa?)
    1. Random Slopes: The impact of SES could be different depending on the school a kid goes to and what district they are from,
    • In the rich school, the SES of the student might not matter as much as the poor or mixed school
      • When we add random slopes, how many measurements of the effect of SES are we making - is it based on the number of students, schools or districts?
  • Which level do we count the degrees of freedom for each fixed and random factor? Students, schools, or districts? How do we count DF for random factors?

Getting P-values

Path A: Don’t push your assumptions on me!

  1. Modeling Fitting testing
  • Likelihood ratio test [chi-square distribution]
  • D = 2 - ln (Likelihood of null model / Likelihood of alternative model]
  • Add/Remove fixed factors and test to see if the model “fits the data better”
  • Suggested by Bates and Bolker
  • FAST and well-accepted [a bit anti-conservative some argue]
    • To do this, you simply have to anova(model.1, model.2)
  1. Bootstrapping
  • Non-parametric: Re-sampling method where you resample with replacement from your own data
    • Common method in modern stats: A much better method then assuming normality and t-distributions
    • Example: You build your final model and you then take the subjects, resample them with replacement, and refit your model
      • You do this over and over again to build a distribution for each parameter in your model
        • You than calculate the 95th percentile and see if contains zero, if not you have a significant effect
  • Semi-parametric: Adds noise to the Non-parametric (assumes your measurements are randomly sampled from all possible measurements)
    • So you sample from your data, add parametric noise, rinse and repeat like above
  • Parametric: You assume your data follows a specific distribution, you use your model to estimate the parameters, and then you simulate the study based on those parameters to estimate the CI. You than calculate the 95th percentile and see if it contains zero.
  • So far we can only do Semi-parametric & Parametric because of the random effects
    • However, Semi-parametric is not fully implemented yet
  • VERY, VERY SLOW.
  • This procedure can fail because your model is unstable (probably means your model is bad anyway)
  • You can also bootstrap Model fit testing (95th percentile method)
# parametric
Model.Final.CI.boot <- confint(Model.6, method="boot",nsim=100,boot.type = c("perc")) 
Model.Final.CI.boot
##                            2.5 %      97.5 %
## .sig01                0.01043755  0.08564062
## .sig02               -1.00000000  0.49578085
## .sig03               -0.56243568  0.95713822
## .sig04               -0.99609592  0.12063845
## .sig05                0.02587011  0.16581080
## .sig06               -0.99497337 -0.01520546
## .sig07                0.27728402  0.99418700
## .sig08                0.02181994  0.09209837
## .sig09               -0.99712452 -0.42267082
## .sig10                0.01789321  0.08145823
## .sigma                0.02395331  0.03927647
## (Intercept)           0.25820491  0.35155433
## SocialGroup          -0.18889284 -0.01249034
## TimeStep              0.01166325  0.11785430
## SocialGroup:TimeStep -0.07574074  0.01093074

Path B: Let’s hold hands, put our fingers in our ears, and assume!

  1. Assume z distributions to get p-values
  • Very, very common approach
  • If you number of observations is high, just assume DF -> inf [so \(t > |2|\), is significant]
    • This is possible because you assume your random effects have controlled for the random variations in the populations and your values good estimates of the population
  • Simulation studies by Barr et al. show this to be a reasonable assumption IF and ONLY OF random structure is MAXIMAL
    • This is HOTLY debated by Bates
  1. Estimate degrees of freedom
  • SAS/SPSS uses Satterthwaite approximations
    • However, there are also Kenward-Roger approximations [see Westfall, Kenny, & Judd, 2014]
    • Either is acceptable, but KR is recommended (but does not always work)
  • Makes a number of assumptions, will mirror results of traditional ANOVA fairly closely [as you would expect when you make similar assumptions about DF]
    • This method can also fail to work because your model is unstable/convergence problems

[Note: You have to refit the model with the lmerTest package]

library(lmerTest) #mixed model add-on package to 
Model.Final<-lmer(HappyPercent ~Social*TimeStep
                  +(1+Social*TimeStep|Subject),  
                  data=HappyData, REML=FALSE)
summary(Model.Final, ddf="Satterthwaite") ## SPSS/SAS method

Model.Final.REML<-lmer(HappyPercent ~Social*TimeStep
                  +(1+Social*TimeStep|Subject),  
                  data=HappyData, REML=TRUE)
summary(Model.Final.REML,ddf = "Kenward-Roger") # Kenney et al suggested method
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: HappyPercent ~ Social * TimeStep + (1 + Social * TimeStep | Subject)
##    Data: HappyData
## 
##      AIC      BIC   logLik deviance df.resid 
##   -142.2   -110.8     86.1   -172.2       45 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.92979 -0.57518  0.06785  0.52187  1.63727 
## 
## Random effects:
##  Groups   Name                 Variance Std.Dev. Corr             
##  Subject  (Intercept)          0.003024 0.05499                   
##           SocialGroup          0.011462 0.10706  -0.38            
##           TimeStep             0.004246 0.06516   0.46 -0.84      
##           SocialGroup:TimeStep 0.003061 0.05533  -0.79  0.84 -0.89
##  Residual                      0.001137 0.03372                   
## Number of obs: 60, groups:  Subject, 6
## 
## Fixed effects:
##                      Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)           0.30250    0.02485  6.10702  12.172 1.64e-05 ***
## SocialGroup          -0.08583    0.04624  6.05010  -1.856   0.1124    
## TimeStep              0.06750    0.02696  6.00466   2.504   0.0462 *  
## SocialGroup:TimeStep -0.02917    0.02341  6.02192  -1.246   0.2591    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) SclGrp TimStp
## SocialGroup -0.424              
## TimeStep     0.357 -0.753       
## SclGrp:TmSt -0.620  0.697 -0.875
## Linear mixed model fit by REML. t-tests use Kenward-Roger's method [
## lmerModLmerTest]
## Formula: HappyPercent ~ Social * TimeStep + (1 + Social * TimeStep | Subject)
##    Data: HappyData
## 
## REML criterion at convergence: -148.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.91088 -0.57147  0.06476  0.51788  1.60624 
## 
## Random effects:
##  Groups   Name                 Variance Std.Dev. Corr             
##  Subject  (Intercept)          0.003666 0.06054                   
##           SocialGroup          0.013791 0.11743  -0.38            
##           TimeStep             0.005104 0.07144   0.46 -0.84      
##           SocialGroup:TimeStep 0.003682 0.06068  -0.78  0.84 -0.89
##  Residual                      0.001165 0.03413                   
## Number of obs: 60, groups:  Subject, 6
## 
## Fixed effects:
##                      Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)           0.30250    0.02697  5.00000  11.216 9.84e-05 ***
## SocialGroup          -0.08583    0.05031  5.00000  -1.706   0.1487    
## TimeStep              0.06750    0.02950  5.00000   2.288   0.0708 .  
## SocialGroup:TimeStep -0.02917    0.02554  5.00000  -1.142   0.3052    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) SclGrp TimStp
## SocialGroup -0.415              
## TimeStep     0.367 -0.764       
## SclGrp:TmSt -0.641  0.715 -0.876
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Path C: Go Bayesian

  • There use to be a way to get pvalues based on Markov-chain Monte-Carlo methods (MCMC)
    • Currently does not work for lme4 package we are using
  • You have to refit the model as Bayesian using rstanarm and get CIs (see Gelman & Hill textbook)

Practical solution

  • Practically, when you add factors in stepwise and model fit shows a significant improvement (Path 1A) you will see the new factors has a \(t > |2|\). (Path 2A)
  • That does not always agree with bootstrapping (Path 1B) or ANOVA-like degrees of freedom (Path 2B)
    • This is because LME4 package we are using does not allow us to control for the heterogeneous variance problem. The older package LME will allow you to control for it.
      • Note LME uses Satterthwaite approximations and will always give you DF and pvalues
    • Also, there could be other problems with your assumptions that bootstrapping is picking up on the other paths will not pick up
  • Recommendation: Based on current trends:
    1. Always do Path 1A to say the model is better
    2. To test the parameters, add Path 1B, Path 2A or 2B.
      • I usually pick 1B or 2A
      • But some fields might demand DF, so you have to do 2B

Mixed to ANOVA

  • If you wanted to report the main effects and interactions, instead of slopes, you might convert your mixed model into an ANOVA
  • But remember, your slopes of time revert to being factors and not continuous variables
anova(Model.Final.REML,ddf = "Kenward-Roger") 
## Type III Analysis of Variance Table with Kenward-Roger's method
##                    Sum Sq   Mean Sq NumDF DenDF F value  Pr(>F)  
## Social          0.0033897 0.0033897     1     5  2.9104 0.14873  
## TimeStep        0.0087322 0.0087322     1     5  7.4975 0.04088 *
## Social:TimeStep 0.0015187 0.0015187     1     5  1.3039 0.30522  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---
title: 'Linear Mixed Models'
output:
  html_document:
    code_download: yes
    fontsize: 8pt
    highlight: textmate
    number_sections: no
    theme: flatly
    toc: yes
    toc_float:
      collapsed: no
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warning =  FALSE)
knitr::opts_chunk$set(fig.width=5.25)
knitr::opts_chunk$set(fig.height=5.0)
knitr::opts_chunk$set(fig.align='center') #Set default figure
knitr::opts_chunk$set(fig.show = "hold") #Set default figure
knitr::opts_chunk$set(results = "hold") 
```

```{r, echo=FALSE, warning=FALSE}
setwd("C:/Users/apd21/Dropbox/AlexFiles/Website Dec/Mixed")
```

# Mixed Models
- Multiple regressions only have fixed-effects [continuous] or fixed-factors [groups]: variables of interest (often your manipulation)
- The only random factor assumed is your subject's response
    - What if you have other effects or factors that might add random variation? 
- Random-effects/factors: Levels randomly sampled from a much larger population: Something you want to generalize across: Subjects, Words, Pictures, etc

## Type of Mixed Models
- We will explore crossed designs

### Nested designs
- Problem: Clustering means each student's scores within each classroom might be correlated because they all have the same teacher (for example)
    - Multilevel/Hierarchical designs: Students nested in classrooms [Clustered data]

### Crossed designs
- Problem: Each person's response is going to inter-correlated with their other responses 
    - Repeated measures: everyone does all the conditions
    - Longitudinal data: People are tracked over time
        - Growth Curve Models: Fitting functions (like polynomials) to behavior over time 

### Crossed & Nested designs
- Now each person's response is going to inter-correlated with their other responses, but that correlation might differ by which group they belong too 
    - Longitudinal Multilevel Designs: Students nested in classrooms measured over many months
    - Partially Nested design: Different groups are measures over time
 
#### Inter-Correlations 
- All these correlations inflate your type I error if you don’t treat the data correctly

# Practice data to make sense of random effects 
- Hypothetical Experiment, *n* = 6
- DV: % of how happy they are 
- IV1: Time of measurement: 5 time windows
- IV2: Social Setting: Alone vs With a group

## As 2x5 RM ANOVA
- Treats Time-Steps as a factor 
- Are there any differences between any time-steps?
- Not slopes (predictable changes in time)
- Treats Social as a factor
- Is there a difference between Alone and With Group
- Questions: Is there an effect of time and Social? 

```{r}
library(lme4)     #mixed model package by Douglas Bates
library(afex)     #easy ANOVA package 
library(ggplot2)  #GGplot package for visualizing data
source("HelperFunctions.R") # some custom functions
###Read in Data file
HappyData <- read.csv("Mixed/HappyStudy.csv", header=T)
### Label variables
HappyData$Social <- factor(HappyData$SocialDummy,
                        levels = c(0,1),
                        labels = c("Alone", "Group"))
HappyData$Subject<-as.factor(HappyData$Subject)
```

- Let's plot each person's data (spaghetti plot)

```{r}
theme_set(theme_bw(base_size = 12, base_family = "")) 

pab <-ggplot(data = HappyData, aes(x = TimeStep, y=HappyPercent, group=Subject))+
  facet_grid( ~ Social, labeller=label_both)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  geom_line(aes(color=Subject))+
  xlab("Time Step")+ylab("Happiness")
pab
```

- Let's plot each person's data with the means plotted

```{r}
##custom function to generate means and SE form any long format dataset. 
HappyDataPlot2<-summarySEwithin(HappyData,measurevar="HappyPercent", 
                               withinvars=c("Social","TimeStep"),
                               idvar="Subject", na.rm=FALSE, conf.interval=.95)
pab2 <-ggplot()+
  geom_line(data = HappyData, aes(x = as.factor(TimeStep), y =HappyPercent, group=Subject, color=Subject))+
  facet_grid(. ~ Social, labeller=label_both)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  geom_point(data = HappyDataPlot2, aes(x = as.factor(TimeStep), y =HappyPercent, group=Social))+
  geom_errorbar(data = HappyDataPlot2,aes(x = as.factor(TimeStep), y =HappyPercent, group=Social, ymin=HappyPercent-se, ymax=HappyPercent+se),colour="#6E6E6E",size=.25, width=.25)+
  xlab("Time Step")+ylab("Happiness")
pab2
```

- Let's examine an RM ANOVA results of this data
- Remember time is treated like a factor (not a slope)
- [Note: I ran the ANOVA like a GLM just so we can extract an overall $R^2$ and get a fitted result for each subject]

```{r}
#######Anova (using type 3 sum of squares same as default in SPSS/SAS)
Anova.Model1 <-aov_car(HappyPercent ~ Social*TimeStep + Error(Subject/(Social*TimeStep)), 
                               data = HappyData, anova_table = list(correction = "none", MSE = FALSE))
Anova.Model1
###### General Linear Model (same as ANOVA, but uses type 1 sum of squares) 
GLM.model<-glm(HappyPercent ~ Social*TimeStep, data=HappyData)
#fit data
HappyData$GLMfitted<-fitted(GLM.model)
#manually extract R2 to compare to mixed models
summary(lm(HappyPercent ~ GLMfitted, data=HappyData))$r.squared
```

- We can fit the GLM/ANOVA estimate for every single subject
- How is our ANOVA doing in predicting each subject? 
- ANOVA is not predicting subject level; it is testing/predicting at the average level of the data
- However, if we have longitudinal-like data (like in this study) why not make predictions at the subject-level? 

```{r}
FittedGlmPlot1 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =GLMfitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject),size=3)+
  xlab("Time Step")+ylab("Happiness")
FittedGlmPlot1
```

# Random Effect Modeling: Fitting at different levels
- Linear regression assumes one measurement per subject, but what is you have multiple measurements per subject? 
- If we have longitudinal data per subject we could estimate a slope per subject!
- We could then extract the slopes estimates per subject [and do statistics on the slopes per subject]
- But we lose the variances around those slopes! Such as below.

```{r}
FittedlmPlot <-ggplot(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject))+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_point(aes(colour = Subject), size=3)+
  geom_smooth(method = "lm",se=FALSE, color ="black", size=.5) +
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot
```

## Random intercepts
- So we need to get a regression model to try to fit at the subject level
    - `(1|Subject)` means each subject can have their own intercept
    - `REML=FALSE` means fit the with maximum likelihood [like we did with GLM]
- You can see the random, fixed terms, and correlations between the fixed terms 

```{r}
Model.1<-lmer(HappyPercent ~Social*TimeStep
                   +(1|Subject),  
                   data=HappyData, REML=FALSE)
summary(Model.1, correlations=FALSE)
```

- Next, we examine the intercepts for each person
- The var and sd of these values are what was seen in the table above

```{r}
ranef(Model.1)
```

- We can examine an $R^2$ for the models, but we have a problem
- Is the $R^2$ coming from the fixed or random effect? We can parse that:
- The marginal is the $R^2$ for the fixed effects
- The conditional $R^2$ includes random + fixed effects
- There is no standard as to which to report $R^2$ and most people never report these at all as if the random effects are complex (nested, or crossed) the meaning of them changes

```{r}
library(performance)
r2_nakagawa(Model.1)
```

- Let's plot the model fit

```{r}
HappyData$Model.1.fitted<-predict(Model.1)
FittedlmPlot1 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.1.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot1
```

## Random intercept and slopes for the social condition 
- Model 2 treats that Social condition as free to vary between Subjects, but accounts for the correlation within Subjects between the Social condition)
    - `(1+Social|Subject)` means each subject can have their intercept and social condition can vary as a function of the subject
- The correlations of the random terms are the correlations of each subjects’ intercept and slope of social condition

```{r}
Model.2<-lmer(HappyPercent ~Social*TimeStep
                       +(1+Social|Subject),  
                       data=HappyData, REML=FALSE)
summary(Model.2, correlations=FALSE)
ranef(Model.2)
```

- The $R^2$ for fixed and random

```{r}
r2_nakagawa(Model.2)
```

- Plot the model results

```{r}
HappyData$Model.2.fitted<-predict(Model.2)
FittedlmPlot2 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y=Model.2.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot2
```


## Random intercepts and slopes for time  
- Model 3 treats that timesteps as free to vary between Subjects, but accounts for the correlation within Subjects across time
    - `(1+TimeStep|Subject)` means each subject can have their intercept and time slope can vary as a function of the subject
- The correlations of the random terms are the correlations of each subjects’ intercept and slope of time 

```{r}
Model.3<-lmer(HappyPercent ~Social*TimeStep
                       +(1+TimeStep|Subject),  
                       data=HappyData, REML=FALSE)
summary(Model.3, correlations=FALSE)
ranef(Model.3)
```

- The $R^2$ for fixed and random

```{r}
r2_nakagawa(Model.3)
```

- Plot the model results

```{r}
HappyData$Model.3.fitted<-predict(Model.3)
FittedlmPlot3 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.3.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot3
```


## Random intercepts and slopes for time and social (but independent)  
- Model 4 treats that timesteps and Social as free to vary between Subjects but accounts for the correlation within Subjects across times and social
    - `(1+Social|Subject)+(0+TimeStep|Subject)` means each subject can have their intercept and can have their slope of social condition
- Also, time slope can vary as a function of the subject, but it treats the variance between social and time as **independent**

```{r}
Model.4<-lmer(HappyPercent ~Social*TimeStep
                   +(1+Social|Subject)+(0+TimeStep|Subject),  
                   data=HappyData, REML=FALSE)
summary(Model.4, correlations=FALSE)
ranef(Model.4)
```

- The $R^2$ for fixed and random

```{r}
r2_nakagawa(Model.4)
```

- Plot the model results

```{r}
HappyData$Model.4.fitted<-predict(Model.4)
FittedlmPlot4 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.4.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot4
```

## Random intercepts and slopes for time and social (but not independent) 
- Model 5 treats that timesteps and Social as free to vary between Subjects but accounts for the correlation within Subjects across time and Social
    - `(1+Social+ TimeStep|Subject)` means each subject can have their intercept and can have their slope of social condition
- Also, time slope can vary as a function of the subject, but it treats the variance between social and time as **correlated**

```{r}
Model.5<-lmer(HappyPercent ~Social*TimeStep
                       +(1+Social+TimeStep|Subject),  
                       data=HappyData, REML=FALSE)
summary(Model.5, correlations=FALSE)
ranef(Model.5)
```

- The $R^2$ for fixed and random

```{r}
r2_nakagawa(Model.5)
```

- Plot the model results

```{r}
HappyData$Model.5.fitted<-predict(Model.5)
FittedlmPlot5 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.5.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot5
```


## Random intercepts and slopes for time x social
- Model 6 treats that timesteps, Social and timesteps:Social (interaction) as free to vary between Subjects but accounts for the correlation within Subjects across time and Social
    - `(1+Social*TimeStep|Subject)` means `(1+Social+TimeStep+Social:TimeStep|Subject)` each subject can have their intercept, their slope of social condition, their time slope, and their interaction between social and time
- Also, all random slopes and intercepts can be **correlated**
- This is called the *maximum model* Barr et al., 2013 (these do not always fit best see Bates et al., 2015)

```{r}
Model.6<-lmer(HappyPercent ~Social*TimeStep
                   +(1+Social*TimeStep|Subject),  
                   data=HappyData, REML=FALSE)
summary(Model.6, correlations=FALSE)
ranef(Model.6)
```

- The $R^2$ for fixed and random
```{r}
r2_nakagawa(Model.6)
```

- Plot the model results

```{r}
HappyData$Model.6.fitted<-predict(Model.6)
FittedlmPlot6 <-ggplot()+
  facet_grid(Subject ~ Social, labeller=label_both)+
  geom_line(data = HappyData, aes(x = TimeStep, y =Model.6.fitted))+
  geom_point(data = HappyData, aes(x = TimeStep, y =HappyPercent, group=Subject,colour = Subject), size=3)+
  #  coord_cartesian(ylim = c(.03,.074))+ 
  xlab("Time Step")+ylab("Happiness")
FittedlmPlot6
```

## How to select a random structure
- First, these models often fail to converge
- Convergence mean that the **optimizer** (numerical method to find the parameters) has failed to reach an optimal or stable solution
    - There are many optimizers to select from in R if you need to change it (https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html)
- Failures might not just be a numerical optimization problem, but a problem with your specification of the random effects 
- You could over-specify your random effects in a way that is not *parsimonious* with the true random effects in the population
    - You could set them up in such a way that do not reflect the experiment 

### Practical Solution to random structure
- This is a battlefield in LMM
- The current best practice is to try to match your design as best as you can
    - If that fails, try different optimizer
    - If that fails, reduce the complexity of the random effect (remove random correlations one at a time)
    - If that fails, follow the set of procedures here: http://www.alexanderdemos.org/Mixed8.html
- If you under-specify, you inflate your type I error, so it a balancing act


# Fixed effects
- Just like your linear regression you can run these models hierarchically, once the random effects are in place
- you just test between model fits using a deviance test (like we did with GLM) `anova(model.1, model.2)`

## Plot the final model correctly
- The method I used was to predict the data, but that works well if you have no covariates or a balanced design
- If you have covariates, its best to use the effects package just like we did with linear regression
    - Also you have to decide to plot both fixed + random or just the fixed effects
- Most people only plot the fixed effect, as they want to control for the random effects 
- However, if someone asks you to plot the data and the model fits together, you will need to tell them it complicated as we saw last week. 
- Below is how to plot fixed only 

```{r,fig.height=4,fig.width=4}
library(effects)
Final.Fixed<-effect(c("Social*TimeStep"), Model.6,
                           xlevels=list(TimeStep=seq(0,4,1)))

# You have to convert the output to a dataframe
Final.Fixed<-as.data.frame(Final.Fixed)

Final.Fixed.Plot <-ggplot(data = Final.Fixed, aes(x = TimeStep, y =fit, group=Social))+
  coord_cartesian(xlim=c(0,4),ylim = c(0,.7))+ 
  geom_line(aes(color=Social), size=2)+
  geom_ribbon(aes(ymin=fit-se, ymax=fit+se,fill=Social),alpha=.2)+
  xlab("Time")+
  ylab("Happiness\nscore")+
  scale_color_manual(values=c("blue", "red"))+
  scale_fill_manual(values=c("blue", "red"))+
  theme_bw()+
  theme(text=element_text(face="bold", size=12),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.border = element_rect(fill = NA, colour = "NA"),
        axis.line = element_line(size = 1, colour = "grey80"),
        legend.title=element_blank(),
        legend.position = c(.2, .92))
Final.Fixed.Plot
```

# Problem with P-values
- What gives mixed-effects models their ability to handle complex designs (e.g., nested, crossed, nested & crossed, partially nested and crossed) is in part what does not allow them to calculate p-values
- Remember, $t = \frac{M-\mu}{SE}$, where, $SE = \sqrt\frac{\sigma^2}{n}$, $df = n-1$
- GLM uses standard error to determine significance: Required degrees of freedom to correct z-distribution for "sampling error" to produce $t$ and $F$ tables.  
- But mixed models do not actually produce $t$ or $t^2=F$ values as they are not calculating the $SE$ using the equation above, which uses observed - expected [mean square = $\sigma^2 = \frac{SS}{N-1}$, where $SS = (M-\mu)^2$
- Instead, they estimate the $SE$ using ML or REML function
- ML is a way of finding the value of one or more parameters in mixed model for a given statistic which makes the known *likelihood* distribution a *maximum*
    - *likelihood* is "a hypothetical probability that an event that has **already** occurred would yield a specific outcome" (http://mathworld.wolfram.com/Likelihood.html)
    - *maximum* is the largest value the function can produce
    - We will iterate through parameters until we maximize our likelihood
    - There are different ML (or just L) functions that can be used and they can apply to most distributions
- **Problem 1:** SE is not the one used in $t$ and $F$ (it does not use N)
- **Problem 2:** The estimate we get is not via OLS either, it's via ML or REML (so what is the proper distribution to test against?)
- **Problem 3:** What is our $DF$ even if we want to assume $t$ and $F$?

## Problem 3: DF Conceptual Issues
- Sample 500, 11the graders take the SAT from 3 schools partially nested in 2 economic districts  

```{r, echo = FALSE, out.width = "300px"}
knitr::include_graphics("Mixed/VennSchools.jpg")
```


- Random sampling of 500 total from 3 schools [Note: school 3 is twice as large school 1 and 2] 
- ANOVA Design: 1 factor: School OR district [they provide an unbalanced design so they can only be analyzed one at time]
- Assume homogeneity of variance of SAT score across schools and districts
- ANOVA can only ask 1 basic question: Are the means between the schools or districts different

### ANOVA problem?
- Why is ANOVA bad here? 
    - School 1 is predominately poor vs School 2, which is rich, so school 1 has less funding, larger classrooms sizes, more social issues at home and in school, fewer teachers, etc. 
- This could impact both mean test scores and variance of test scores  
    - Students from the poor district might have more diverse scores [high variance]
    - Student from rich district might have more similar scores, but higher average scores [low variance]
    - Poor students in the Rich school might also have changes in variance or mean test scores, which look very different from the poor or mixed school.  

### Mixed vs. ANOVA Solution

#### ANOVA Logic

- If we assume ANOVA that would mean we calculate mean/variance relative to their classification based on school and district [at the level we are examining only] 
- In the case when all the variables are categorical, the DF "seems" logical, in that the proper MSerror terms could be calculated. However, in the mixed models, the random factors have been nested (kids in schools in districts). 
    - So we cannot know how many of the people went into parsing the variance for each level of the random error term.  
        - We could make assumptions, but some statisticians say over their dead bodies. 
    - ANOVA is assumed to be robust against many violations, but that's mostly only true in well-controlled balanced designs.
    - When ANOVA is stretched into the mixed model territory, it can result in unreasonable levels of type 1 error.  
    
#### Mixed Logic

- 1. Fixed factors:  School and District [student belongs to]
    - SAT score MEANS could be different depending on the school a kid goes to and what district they are from.   
- 2. Random Factors:  District, School, [Student]
    - SAT score VARIANCES could be different depending on the school a kid goes to and what district they are from.   
    - So are the kids the DF, or the School or the district? What do with kids who are mismatched (poor but in richer schools or vice versa?)
- 3. Random Slopes: The impact of SES could be different depending on the school a kid goes to and what district they are from,
    - In the rich school, the SES of the student might not matter as much as the poor or mixed school 
        - When we add random slopes, how many measurements of the effect of SES are we making - is it based on the number of students, schools or districts?
- Which level do we count the degrees of freedom for each fixed and random factor? Students, schools, or districts? How do we count DF for random factors?  

# Getting P-values
## Path A: Don't push your assumptions on me!

1.  Modeling Fitting testing 
- Likelihood ratio test [chi-square distribution]
- D = 2 - ln (Likelihood of null model / Likelihood of alternative model] 
- Add/Remove fixed factors and test to see if the model "fits the data better" 
- Suggested by Bates and Bolker
- FAST and well-accepted [a bit anti-conservative some argue]
    - To do this, you simply have to `anova(model.1, model.2)`
    

2. Bootstrapping 
- Non-parametric: Re-sampling method where you resample with replacement from your own data     
    - Common method in modern stats: A much better method then assuming normality and t-distributions 
    - Example: You build your final model and you then take the subjects, resample them with replacement, and refit your model
        - You do this over and over again to build a distribution for each parameter in your model
            - You than calculate the 95th percentile and see if contains zero, if not you have a *significant* effect 
- Semi-parametric: Adds noise to the Non-parametric (assumes your measurements are randomly sampled from all possible measurements)
    - So you sample from your data, add parametric noise, rinse and repeat like above
- Parametric: You assume your data follows a specific distribution, you use your model to estimate the parameters, and then you simulate the study based on those parameters to estimate the CI. You than calculate the 95th percentile and see if it contains zero.
- So far we can only do Semi-parametric & Parametric because of the random effects
    - However,  Semi-parametric is not fully implemented yet 
- VERY, VERY SLOW.
- This procedure can fail because your model is unstable (probably means your model is bad anyway)
- You can also bootstrap Model fit testing (95th percentile method)

```{r}
# parametric
Model.Final.CI.boot <- confint(Model.6, method="boot",nsim=100,boot.type = c("perc")) 
Model.Final.CI.boot

```

## Path B:  Let's hold hands, put our fingers in our ears, and assume! 

3. Assume z distributions to get p-values
- Very, very common approach
- If you number of observations is high, just assume DF -> inf [so $t > |2|$, is significant]
    - This is possible because you assume your random effects have controlled for the random variations in the populations and your values good estimates of the population
- Simulation studies by Barr et al. show this to be a reasonable assumption IF and ONLY OF random structure is MAXIMAL
    - This is HOTLY debated by Bates

4. Estimate degrees of freedom
- SAS/SPSS uses Satterthwaite approximations 
    - However, there are also Kenward-Roger approximations [see Westfall, Kenny, & Judd, 2014]
    - Either is acceptable, but KR is recommended (but does not always work) 
- Makes a number of assumptions, will mirror results of traditional ANOVA fairly closely [as you would expect when you make similar assumptions about DF]
    - This method can also fail to work because your model is unstable/convergence problems

[Note: You have to refit the model with the lmerTest package]

```{r}
library(lmerTest) #mixed model add-on package to 
Model.Final<-lmer(HappyPercent ~Social*TimeStep
                  +(1+Social*TimeStep|Subject),  
                  data=HappyData, REML=FALSE)
summary(Model.Final, ddf="Satterthwaite") ## SPSS/SAS method

Model.Final.REML<-lmer(HappyPercent ~Social*TimeStep
                  +(1+Social*TimeStep|Subject),  
                  data=HappyData, REML=TRUE)
summary(Model.Final.REML,ddf = "Kenward-Roger") # Kenney et al suggested method
```

## Path C: Go Bayesian
- There use to be a way to get pvalues based on Markov-chain Monte-Carlo methods (MCMC)
    - Currently does not work for lme4 package we are using
- You have to refit the model as Bayesian using `rstanarm` and get CIs (see Gelman & Hill textbook)
    
### Practical solution
- Practically, when you add factors in stepwise and model fit shows a significant improvement (Path 1A) you will see the new factors has a $t > |2|$. (Path 2A)
- That does not always agree with bootstrapping (Path 1B) or ANOVA-like degrees of freedom (Path 2B)
    - This is because LME4 package we are using does not allow us to control for the heterogeneous variance problem. The older package LME will allow you to control for it.
        - Note LME uses Satterthwaite approximations and will always give you DF and pvalues
    - Also, there could be other problems with your assumptions that bootstrapping is picking up on the other paths will not pick up
- *Recommendation:* Based on current trends:
    1. Always do Path 1A to say the model is better 
    2. To test the parameters, add Path 1B, Path 2A or 2B.
        - I usually pick 1B or 2A 
        - But some fields might demand DF, so you have to do 2B

# Mixed to ANOVA
- If you wanted to report the main effects and interactions, instead of slopes, you might convert your mixed model into an ANOVA
- But remember, your slopes of time revert to being factors and not continuous variables

```{r}
anova(Model.Final.REML,ddf = "Kenward-Roger") 
```

<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-90415160-1', 'auto');
  ga('send', 'pageview');

</script>
