1 Making the intercept and slopes make sense!

  • Centering
  • Zscore
  • POMP
  • Which to use depends on your questions. However, centering is the safest choice (and is often recommended)
  • You need to decide whether it makes sense to transform both the DV and IVs, or only one or the other.
  • Let's make a practice dataset to explore
  • We will transform just the IVs for now:
library(car) #graph data
# IQ scores of 5 people
Y<-c(85, 90, 100, 120, 140)
# Likert scale rating of liking of reading books (1 hate to 7 love)
X1<-c(1,2,4,6,7)

scatterplot(Y~X1, smooth=FALSE)

summary(lm(Y~X1))
## 
## Call:
## lm(formula = Y ~ X1)
## 
## Residuals:
##       1       2       3       4       5 
##  3.9615  0.3077 -7.0000 -4.3077  7.0385 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   72.385      6.010   12.04  0.00123 **
## X1             8.654      1.305    6.63  0.00699 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared:  0.9361, Adjusted R-squared:  0.9148 
## F-statistic: 43.96 on 1 and 3 DF,  p-value: 0.006989

1.1 Center

  • \(Center = X - M\)
  • Intercept is at the MEAN of the IV (not at 0 of the original IV)
  • Does NOT change the meaning of the slope
  • R: scale(Data, scale=FALSE)
# Center: Likert scale rating of liking of reading books (1 hate to 7 love)
X1.C<-scale(X1,scale=FALSE)
X1.C<-X1.C[,1]
# Plot the DV against the centered IV (only the IV was centered)
scatterplot(Y~X1.C, smooth=FALSE)

summary(lm(Y~X1.C))
## 
## Call:
## lm(formula = Y ~ X1.C)
## 
## Residuals:
##       1       2       3       4       5 
##  3.9615  0.3077 -7.0000 -4.3077  7.0385 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  107.000      2.976   35.95 4.73e-05 ***
## X1.C           8.654      1.305    6.63  0.00699 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared:  0.9361, Adjusted R-squared:  0.9148 
## F-statistic: 43.96 on 1 and 3 DF,  p-value: 0.006989

1.2 Zscore

  • \(Z = \frac{X - M}{s}\)
  • Intercept is now at the MEAN of the IV (where the z-scored IV = 0)
  • Slope changes meaning: no longer per unit of the original IV, now per SD of the IV
  • R: scale(data) (see the chunk below)
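  • For example, a minimal sketch that reproduces the output printed below (same pattern as the centering chunk):
# Z-score: Likert scale rating of liking of reading books (1 hate to 7 love)
X1.Z<-scale(X1)
X1.Z #a matrix with scaled:center and scaled:scale attributes
X1.Z<-X1.Z[,1]
X1.Z #now a plain numeric vector

summary(lm(Y~X1.Z))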
##            [,1]
## [1,] -1.1766968
## [2,] -0.7844645
## [3,]  0.0000000
## [4,]  0.7844645
## [5,]  1.1766968
## attr(,"scaled:center")
## [1] 4
## attr(,"scaled:scale")
## [1] 2.54951
## [1] -1.1766968 -0.7844645  0.0000000  0.7844645  1.1766968

## 
## Call:
## lm(formula = Y ~ X1.Z)
## 
## Residuals:
##       1       2       3       4       5 
##  3.9615  0.3077 -7.0000 -4.3077  7.0385 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  107.000      2.976   35.95 4.73e-05 ***
## X1.Z          22.063      3.328    6.63  0.00699 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared:  0.9361, Adjusted R-squared:  0.9148 
## F-statistic: 43.96 on 1 and 3 DF,  p-value: 0.006989

1.3 POMP

  • \(POMP = \frac{X - Min_X}{Max_X - Min_X} \times 100\)
  • Note: I like to multiply by 100 because I find it easier to think in percent (not proportion)
  • Useful when data are bounded (or scaled funny)
  • Intercept is at 0 of the POMP-scaled IV, which is the MINIMUM of the original IV [and because the slope is rescaled, the intercept changes a bit]
  • DOES change the meaning of the slope: it is now the change in the DV per 1% of the IV's range
#POMP of X1
X1_POMP = (X1 - min(X1)) / (max(X1) - min(X1))*100
X1_POMP
## [1]   0.00000  16.66667  50.00000  83.33333 100.00000
scatterplot(Y~X1_POMP, smooth=FALSE)

summary(lm(Y~X1_POMP))
## 
## Call:
## lm(formula = Y ~ X1_POMP)
## 
## Residuals:
##       1       2       3       4       5 
##  3.9615  0.3077 -7.0000 -4.3077  7.0385 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 81.03846    4.91852   16.48 0.000487 ***
## X1_POMP      0.51923    0.07831    6.63 0.006989 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared:  0.9361, Adjusted R-squared:  0.9148 
## F-statistic: 43.96 on 1 and 3 DF,  p-value: 0.006989

2 Simultaneous Regression (standard approach)

  • Put all your variables in and see what the effect of each term is
  • Very conservative approach
  • Does not allow you to understand additive effects very easily
  • You noticed this problem when we were trying to explain Health ~ Years married + Age
  • Had you only looked at this final model, you might never have understood that Years married acted as a good predictor on its own.
  • Also, what if you are testing a theory? You need to see the additive effects (a sketch of the simultaneous model follows this list).
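  • For example, a minimal sketch of a simultaneous regression. The data here are made up on the spot; the variable names only echo the Health ~ Years married + Age example from a previous lesson:
# Hypothetical data for illustration only (values are invented)
set.seed(42)
Age<-rnorm(100, 50, 10)
YearsMarried<-Age - runif(100, 20, 30)
Health<-100 - .5*Age + .2*YearsMarried + rnorm(100, 0, 5)

# Simultaneous (standard) regression: every predictor entered in one call
summary(lm(Health~ YearsMarried + Age))
# each slope is the unique effect of that term, controlling for the other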

3 Hierarchical Modeling

  • Is the change in \(R^2\) meaningful (Model 2 \(R^2\) - Model 1 \(R^2\))?
  • The order in which the models are run is meaningful
  • Terms do not need to be entered one at a time, but can be entered as ‘sets’
  • A set of variables is theoretically or experimentally driven
  • So, is Model 2 \(R^2\) - Model 1 \(R^2\) meaningful?

3.1 Hierarchical Modeling driven by the researcher

  • Forward selection: Start with simple models and get more complex nested models
  • Backwards selection: Start with complex nested models and get more simple
  • Stepwise selection: can be viewed as a variation of the forward selection method (one predictor at a time), but predictors are deleted in subsequent steps if they no longer contribute appreciable unique prediction
  • Which you choose can depend on how you like to ask questions

3.1.1 Forward Selection of nested models

  • A common approach to “model building”
  • Again, let's make up our dummy data
#packages we will need to create and graph our data
library(MASS) #create data
library(car) #graph data
py1 =.6 #Cor between X1 (ice cream) and happiness
py2 =.4 #Cor between X2 (Brownies) and happiness
p12= .2 #Cor between X1 (ice cream) and X2 (Brownies)

Means.X1X2Y<- c(10,10,10) #set the means of X and Y variables
CovMatrix.X1X2Y <- matrix(c(1,p12,py1, p12,1,py2, py1,py2,1),3,3) # creates the covariance matrix

set.seed(42)
CorrDataT<-mvrnorm(n=100, mu=Means.X1X2Y,Sigma=CovMatrix.X1X2Y, empirical=TRUE)
CorrDataT<-as.data.frame(CorrDataT)
colnames(CorrDataT) <- c("IceCream","Brownies","Happiness")

3.1.1.1 First, a little side track…

  • Remember that the model \(R^2\) is tested with an F value, right?
  • This means you can actually get an ANOVA-like table for the model
  • for example:
###############Model 1 
Ice.Model<-lm(Happiness~ IceCream, data = CorrDataT)
anova(Ice.Model)
## Analysis of Variance Table
## 
## Response: Happiness
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## IceCream   1  35.64  35.640  55.125 4.193e-11 ***
## Residuals 98  63.36   0.647                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • The \(R^2\) is the explained variance relative to the total variance (like in our ANOVA) \[R^2 = \frac{SS_{explained}}{SS_{explained}+SS_{residual}}\]
  • Just to check, using the sums of squares from anova(Ice.Model): \(35.64 / (35.64 + 63.36) = 0.36\)
  • which matches the \(R^2\) that R gives us (0.36); a quick check in code follows this list
  • When we check to see which model is best, we actually test the difference between the models
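  • A quick way to verify this in R, a minimal sketch that pulls the sums of squares straight out of the ANOVA table:
# R^2 by hand: explained SS over total SS
SS<-anova(Ice.Model)$"Sum Sq" #a vector: SS for IceCream, then the residual SS
SS[1]/sum(SS) #0.36, which matches summary(Ice.Model)$r.squared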

3.1.2 Let's forward-fit our models

###############Model 1 
Ice.Model<-lm(Happiness~ IceCream, data = CorrDataT)
R2.Model.1<-summary(Ice.Model)$r.squared

###############Model 2
Ice.Brown.Model<-lm(Happiness~ IceCream+Brownies, data = CorrDataT)
R2.Model.2<-summary(Ice.Brown.Model)$r.squared
library(stargazer)
#http://jakeruss.com/cheatsheets/stargazer.html

stargazer(Ice.Model,Ice.Brown.Model,type="html",
          column.labels = c("Model 1", "Model 2"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          star.cutoffs = c(0.1, 0.05, 0.01, 0.001),
          star.char = c("@", "*", "**", "***"), 
          notes= c("@p < .1 *p < .05 **p < .01 ***p < .001"),
          notes.append = FALSE)
Dependent variable: Happiness
                        Model 1                  Model 2
                        (1)                      (2)
Constant                4.000*** (0.812)         1.667@ (0.982)
IceCream                0.600*** (0.081)         0.542*** (0.077)
Brownies                                         0.292*** (0.077)
Observations            100                      100
R2                      0.360                    0.442
Adjusted R2             0.353                    0.430
Residual Std. Error     0.804 (df = 98)          0.755 (df = 97)
F Statistic             55.125*** (df = 1; 98)   38.366*** (df = 2; 97)
Note: @p < .1 *p < .05 **p < .01 ***p < .001
  • Let's test the difference in \(R^2\)
  • In R, we call the function anova()
R2.Change<-R2.Model.2-R2.Model.1
R2.Change
## [1] 0.08166667
anova(Ice.Model,Ice.Brown.Model)
## Analysis of Variance Table
## 
## Model 1: Happiness ~ IceCream
## Model 2: Happiness ~ IceCream + Brownies
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     98 63.360                                  
## 2     97 55.275  1     8.085 14.188 0.0002838 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • So, in other words, model 2 fits the data better than model 1 and the change in \(R^2\) is significant.

3.1.3 Backward Selection of nested models

  • You ask: does taking away variables reduce my \(R^2\) significantly?
  • A less common approach, but sometimes used to validate that you have a parsimonious model
  • You might forward fit a set of variables and backwards fit critical ones to test a specific hypothesis
  • Using the same data as above, we will get the same values (just negative)
###############Model 1.B 
Ice.Brown.Model<-lm(Happiness~ IceCream+Brownies, data = CorrDataT)
R2.Model.1.B<-summary(Ice.Brown.Model)$r.squared

###############Model 2.B
Ice.Model<-lm(Happiness~ IceCream, data = CorrDataT)
R2.Model.2.B<-summary(Ice.Model)$r.squared

R2.Change.B<-R2.Model.2.B-R2.Model.1.B
R2.Change.B
## [1] -0.08166667
anova(Ice.Brown.Model,Ice.Model)
## Analysis of Variance Table
## 
## Model 1: Happiness ~ IceCream + Brownies
## Model 2: Happiness ~ IceCream
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     97 55.275                                  
## 2     98 63.360 -1    -8.085 14.188 0.0002838 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.2 Hierarchical modeling (often driven by the computer)

  • Stepwise selection with many predictors is often done by the computer, and it does not always assume nested models (variables can be added and removed at the same time)
  • Exploratory: you have too many predictors and have no idea where to start
  • You give the computer a larger number of predictors and the computer decides the best fit model
  • Sounds good right? No, as the results can be unstable
  • Change one variable in the set and the final model can change
  • High chance of Type I and Type II errors
  • The computer makes decisions based on the Akaike information criterion (AIC), not on the change in \(R^2\), because the models are not nested (see the sketch after this list)
  • Also, the computer makes decisions purely on fit values, which has nothing to do with theory
  • Solutions are often unique to that particular dataset
  • The best model is often the one that parses a theory, and only a human can do that at present
  • Not really publishable because of these problems
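  • For completeness, here is a sketch of what computer-driven selection looks like on our dummy data, using stepAIC() from the MASS package we loaded earlier (illustration only, given the caveats above):
# Stepwise selection by AIC, allowing terms to be added and removed ("both" directions)
Full.Model<-lm(Happiness~ IceCream+Brownies, data = CorrDataT)
Step.Model<-stepAIC(Full.Model, direction = "both", trace = TRUE)
summary(Step.Model) #the model AIC prefers; fit-driven, not theory-driven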

4 Parsing influence

  • As models get bigger and bigger, it becomes a challenge to figure out the unique contribution of each variable to \(R^2\)
  • There are many computational solutions that you can select from, but we will use one called lmg
  • you can read about all the different ones here: https://core.ac.uk/download/pdf/6305006.pdf
  • these methods are not well known in psychology, but can be very useful when people ask you what the relative importance of each variable is
  • two approaches: show absolute \(R^2\) for each term or the relative % of \(R^2\) for each term
library(relaimpo)
# In terms of R2
calc.relimp(Ice.Brown.Model) 
## Response variable: Happiness 
## Total response variance: 1 
## Analysis based on 100 observations 
## 
## 2 Regressors: 
## IceCream Brownies 
## Proportion of variance explained by model: 44.17%
## Metrics are not normalized (rela=FALSE). 
## 
## Relative importance metrics: 
## 
##                lmg
## IceCream 0.3208333
## Brownies 0.1208333
## 
## Average coefficients for different model sizes: 
## 
##           1X       2Xs
## IceCream 0.6 0.5416667
## Brownies 0.4 0.2916667
# as % of R2
calc.relimp(Ice.Brown.Model,rela = TRUE) 
## Response variable: Happiness 
## Total response variance: 1 
## Analysis based on 100 observations 
## 
## 2 Regressors: 
## IceCream Brownies 
## Proportion of variance explained by model: 44.17%
## Metrics are normalized to sum to 100% (rela=TRUE). 
## 
## Relative importance metrics: 
## 
##                lmg
## IceCream 0.7264151
## Brownies 0.2735849
## 
## Average coefficients for different model sizes: 
## 
##           1X       2Xs
## IceCream 0.6 0.5416667
## Brownies 0.4 0.2916667

5 Final notes:

  • If you play with lots of predictors and do lots of models, something will be significant
  • Type I error is a big problem because of the ‘researcher degrees of freedom’ problem
  • Type II error increases as a function of the number of predictors: a) you slice the pie too thinly, b) each variable might try to eat someone else’s slice
  • Less is more: ask targeted questions with as orthogonal a set of variables as you can