You want to show the effectiveness of CBT against no therapy in reducing depression scores. You give clients (and controls) the Beck Depression Inventory (BDI) at baseline and every two weeks afterward for up to 6 weeks. You assign 30 people to each between-subjects group (control or CBT) with varying levels of depression (from mild to severe; BDI of 20 to 50), and in total you will collect 4 BDI scores for each person (within subject). You expect BDI scores to drop by the second BDI measurement and to continue to decrease over sessions, relative to the control group, which should show no change. Keep in mind there is no intervention for the control group, and BDI scores tend to be unstable, so you are worried about assumption violations.
Each line represents a person’s score across conditions
Just like in the RM ANOVA, we will use Cousineau-Morey corrected error bars, which remove individual differences. The function adjusts the correction for the mixed design; just pass the between-subjects variables as shown below. Again, you can choose between SE and CI (I plotted the CI below).
source('HelperFunctions.R')
MeansforGGPlot <- summarySEwithin(data=DataSim2, "BDI",
withinvars=c("Time"),
betweenvars=c("Group"),
idvar="ID", na.rm=FALSE, conf.interval=.95)
Means.Within.CI <-ggplot(data = MeansforGGPlot, aes(x = Time, y=BDI, group=Group))+
scale_y_continuous(breaks = seq(0,40,5))+
geom_line(aes(colour = Group), size=2)+
geom_ribbon(aes(ymin=BDI-ci, ymax=BDI+ci,fill = Group), alpha=.5)+
ylab("Mean BDI Score")+xlab("Time")+
theme(legend.position = "right")
Means.Within.CI
With corrected within-subject error bars you can eyeball significance (Cumming & Finch, 2005)
95% CIs:
Notice also that you can see problems with the variance
Two-way mixed ANOVA table with the formulas (conceptually simplified)
Source | SS | DF | MS | F |
---|---|---|---|---|
\(Between\,Subject\) | \(CSS_{row\,means}\) | \(nK-1\) | \(\frac{SS_{BS}}{df_{BS}}\) | |
\(Group\) | \(CnSS_{Group\,means}\) | \(K-1\) | \(\frac{SS_{G}}{df_{G}}\) | \(\frac{MS_{G}}{MS_{W_{G}}}\) |
\(W_{G}\) | \(SS_{BS} - SS_{G}\) | \(K(n-1)\) | \(\frac{SS_{W_{G}}}{df_{W_{G}}}\) | |
\(Within\,Subject\) | \(SS_T-SS_{BS}\) | \(nK(C-1)\) | \(\frac{SS_{WS}}{df_{WS}}\) | |
\(\,RM\) | \(KnSS_{RM\,means}\) | \(C-1\) | \(\frac{SS_{RM}}{df_{RM}}\) | \(\frac{MS_{RM}}{MS_{SXRM}}\) |
\(\,GXRM\) | \(nSS_{cell\,means}-SS_G-SS_{RM}\) | \((K-1)(C-1)\) | \(\frac{SS_{GXRM}}{df_{GXRM}}\) | \(\frac{MS_{GXRM}}{MS_{SXRM}}\) |
\(\,Residual\,[Sub\,x\,RM]\) | \(SS_{WS}-SS_{RM}-SS_{GXRM}\) | \(K(n-1)(C-1)\) | \(\frac{SS_{SXRM}}{df_{SXRM}}\) | |
Total | \(SS_{scores}\) | \(N-1\) |
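To make the table concrete, here is a sketch of computing these sums of squares by hand in R. DataSim2 itself is not rebuilt here, so the sketch uses a small simulated dataset with the same layout (columns ID, Group, Time, BDI; the names are assumptions); the point is the partitioning, not the numbers.

```r
set.seed(42)
n <- 5; K <- 2; C <- 4                    # subjects per group, groups, time points
toy <- expand.grid(ID = factor(1:(n * K)), Time = factor(1:C))
toy$Group <- factor(rep(rep(c("CBT", "Control"), each = n), times = C))
toy$BDI <- rnorm(nrow(toy), mean = 30, sd = 5)

grand  <- mean(toy$BDI)
SS_T   <- sum((toy$BDI - grand)^2)                      # Total
sub_m  <- tapply(toy$BDI, toy$ID, mean)
SS_BS  <- C * sum((sub_m - grand)^2)                    # Between subjects
grp_m  <- tapply(toy$BDI, toy$Group, mean)
SS_G   <- C * n * sum((grp_m - grand)^2)                # Group
SS_WG  <- SS_BS - SS_G                                  # Subjects within groups
SS_WS  <- SS_T - SS_BS                                  # Within subjects
rm_m   <- tapply(toy$BDI, toy$Time, mean)
SS_RM  <- K * n * sum((rm_m - grand)^2)                 # RM (Time)
cell_m <- tapply(toy$BDI, list(toy$Group, toy$Time), mean)
SS_GxRM <- n * sum((cell_m - grand)^2) - SS_G - SS_RM   # Group x RM
SS_res  <- SS_WS - SS_RM - SS_GxRM                      # Subject x RM residual

# Cross-check the between-subject partition against base R's aov()
bs <- summary(aov(BDI ~ Group * Time + Error(ID/Time), data = toy))[["Error: ID"]][[1]]
stopifnot(abs(bs[1, "Sum Sq"] - SS_G) < 1e-6,           # Group row
          abs(bs[2, "Sum Sq"] - SS_WG) < 1e-6)          # Residuals row
```

Each F in the table is then a ratio of the appropriate MS terms, e.g. \(MS_G / MS_{W_G}\) for Group.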
Notes: afex calculates the mixed ANOVA for us, and it will automatically correct for sphericity violations.

library(afex)
Mixed.1<-aov_car(BDI~ Time*Group + Error(ID/Time),
data = DataSim2)
Mixed.1
## Anova Table (Type 3 tests)
##
## Response: BDI
## Effect df MSE F ges p.value
## 1 Group 1, 38 115.22 7.52 ** .062 .009
## 2 Time 2.38, 90.46 95.35 2.52 + .042 .076
## 3 Group:Time 2.38, 90.46 95.35 4.45 * .072 .010
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
##
## Sphericity correction method: GG
Same as the RM ANOVA, except we should also examine Mauchly's test for sphericity on each within-subject term that has more than two levels.
summary(Mixed.1, return="univariate")
##
## Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
##
## Sum Sq num Df Error SS den Df F value Pr(>F)
## (Intercept) 150794 1 4378.3 38 1308.7751 < 2.2e-16 ***
## Group 866 1 4378.3 38 7.5159 0.009271 **
## Time 573 3 8625.1 114 2.5239 0.061195 .
## Group:Time 1011 3 8625.1 114 4.4531 0.005363 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Mauchly Tests for Sphericity
##
## Test statistic p-value
## Time 0.58306 0.0013626
## Group:Time 0.58306 0.0013626
##
##
## Greenhouse-Geisser and Huynh-Feldt Corrections
## for Departure from Sphericity
##
## GG eps Pr(>F[GG])
## Time 0.79352 0.07603 .
## Group:Time 0.79352 0.01005 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## HF eps Pr(>F[HF])
## Time 0.8501145 0.071636760
## Group:Time 0.8501145 0.008454026
Violations of Sphericity are more common in mixed designs, so be cautious
We assume Compound Symmetry for each group:
\[ \mathbf{Cov} = \sigma^2\left[\begin{array} {rrrr} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1\\ \end{array}\right] \]
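For illustration only (the values of sigma-squared and rho below are arbitrary, not estimates from DataSim2), such a matrix is easy to construct in R:

```r
# Build a C x C compound-symmetry covariance matrix:
# sigma^2 on the diagonal, sigma^2 * rho everywhere else.
cs_matrix <- function(C, sigma2, rho) {
  sigma2 * (matrix(rho, C, C) + diag(1 - rho, C))
}
cs_matrix(4, sigma2 = 25, rho = .5)  # diagonal = 25, off-diagonal = 12.5
```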
However, in the study we examined today we have two covariance matrices (one for each group), and we assume they are equal (\(COV_1 = COV_2\)). So we must test whether they are homogeneous. We can use Box's M.
Box's M tests whether the covariance matrices are equal between groups (HOCV). The R function requires that we spread the DV into columns based on the within factor; we will use the spread function from the tidyr package: dataset %>% spread(Within factor, DV).
library(tidyr); library(dplyr)
#Box's M
data_wide <- DataSim2 %>% spread(Time, BDI)
Now that the data are wide, we index the columns holding the DV (columns 3 to 6) and send the result to the boxM function in the heplots package, split by group. We can view the covariance matrix for each group and also view the results of the test.
library(heplots)
BoxResult<-boxM(data_wide[,3:6],data_wide$Group)
BoxResult$cov
BoxResult
## $CBT
## Baseline 2 Weeks 4 Weeks 6 Weeks
## Baseline 35.830484 10.082930 39.039303 4.564906
## 2 Weeks 10.082930 74.984873 2.848143 -14.474673
## 4 Weeks 39.039303 2.848143 52.237993 7.832589
## 6 Weeks 4.564906 -14.474673 7.832589 46.560581
##
## $Control
## Baseline 2 Weeks 4 Weeks 6 Weeks
## Baseline 63.826004 5.222681 43.836602 -41.487667
## 2 Weeks 5.222681 124.377804 28.686090 40.546881
## 4 Weeks 43.836602 28.686090 120.724050 -8.021894
## 6 Weeks -41.487667 40.546881 -8.021894 165.847124
##
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: data_wide[, 3:6]
## Chi-Sq (approx.) = 32.502, df = 10, p-value = 0.0003301
Box's M does not work well with small samples and becomes overly sensitive in large samples, so we can set alpha here to \(\alpha = .001\). If the result is significant, the covariance matrices are not equal, which means we have a problem. This increases the rate at which we might commit a Type I error when examining the interaction: the treatment affected the covariance between trials by group, not just the means as we expected. So is the interaction due to mean differences or covariance differences?
Box's M is not a great test, so it is recommended that you also check Levene's test (center the data via the mean) or the Brown-Forsythe test (center the data via the median) in the car package. The Brown-Forsythe test is more robust to violations of normality (note: the output will still say Levene's test when you center via the median). Both tests are sensitive to unequal sample sizes per group. We can test HOV for the groups and for the group x time cells. If the HOV of the interaction cells is violated, it supports the conclusion of Box's M.
library(car)
# Brown-Forsythe test on Group Only
leveneTest(BDI~ Group, data=DataSim2,center=median)
# Brown-Forsythe test on Group X Time cells
leveneTest(BDI~ Time*Group, data=DataSim2,center=median)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 7.7604 0.005994 **
## 158
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 7 2.766 0.009832 **
## 152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the case of this study, Box's M and the Brown-Forsythe test showed HOCV and HOV problems. So we are in trouble; what can we do?
Solution? Cohen (2008) recommends that you not run the mixed ANOVA at all, meaning you will be unable to test the interaction. You can only run one-way RM ANOVAs within each group, plus a one-way ANOVA or independent t-tests on the groups (collapsing over the RM term). Is this extremely conservative approach justified?
We have the same options as we did with the RM ANOVA, but we have one extra assumption that will affect both the modern approach (using emmeans with the error term from the ANOVA and Satterthwaite's df) and the multivariate approach.
I ran a simulation (see MonteCarloMixed.R) on the Type I error rates given these violations. The simulation (1,000 runs) used a 2 (between) x 4 (within) design with a sample size of 30 per group. All the cells had a mean of 0, and I let the covariance matrix for each group be random but different from the other group's.
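A minimal sketch of this kind of simulation is below. It is not the author's MonteCarloMixed.R: it uses base aov with no sphericity correction and fewer runs, so treat it as illustrative of the setup only. Both groups have zero means, but each draws its own random covariance matrix, so only HOCV is violated.

```r
library(MASS)                       # for mvrnorm
set.seed(1)
nsim <- 200; n <- 30; C <- 4
# A random positive-definite covariance matrix
rand_cov <- function(C) { A <- matrix(rnorm(C * C), C); crossprod(A) + diag(C) }
one_run <- function() {
  y <- rbind(mvrnorm(n, rep(0, C), rand_cov(C)),   # group A scores
             mvrnorm(n, rep(0, C), rand_cov(C)))   # group B scores
  d <- data.frame(ID    = factor(rep(1:(2 * n), each = C)),
                  Group = factor(rep(c("A", "B"), each = n * C)),
                  Time  = factor(rep(1:C, times = 2 * n)),
                  BDI   = as.vector(t(y)))
  fit <- aov(BDI ~ Time * Group + Error(ID/Time), data = d)
  summary(fit)[["Error: ID:Time"]][[1]][2, "Pr(>F)"]  # Time:Group p-value
}
pvals <- replicate(nsim, one_run())
mean(pvals < .05)   # empirical Type I rate for the interaction
```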
Type I Levels:
The Type I error rate is extremely high for the follow-up tests, and worse yet when you follow up in the between-subjects direction. This is the reason for Cohen's suggestion not to test the interaction. In the wild, almost no one does this in practice, because the interaction is usually the whole point of the study.
Example using simple effects followed up by polynomial contrast
library(emmeans)
Time.by.Group<-emmeans(Mixed.1,~Time|Group)
test(pairs(Time.by.Group), joint=TRUE)
## Group df1 df2 F.ratio p.value note
## CBT 3 38 13.063 <.0001 d
## Control 3 38 0.766 0.5199 d
##
## d: df1 reduced due to linear dependence
So there are changes over time in the CBT group, which we can follow up with contrasts: polynomial, consecutive, or any custom contrast you want to test. [Note: we can index 1:3 to see only the CBT results.]
contrast(Time.by.Group, 'poly')[1:3]
## contrast Group estimate SE df t.ratio p.value
## linear CBT -34.14 8.97 38 -3.805 0.0005
## quadratic CBT 6.72 3.26 38 2.061 0.0462
## cubic CBT -8.65 9.66 38 -0.896 0.3759
Cousineau, D., & O’Brien, F. (2014). Error bars in within-subject designs: a comment on Baguley (2012). Behavior Research Methods, 46(4), 1149-1151.