# 1 Making the intercept and slopes make sense!

• Centering
• Zscore
• POMP
• When to use each depends on your questions. However, centering is the safest choice (and is often recommended)
• You need to decide whether it makes sense to transform both the DV and the IVs, or only one or the other
• Let's make a practice dataset to explore
• We will transform just the IVs for now:
library(car) #graph data
# IQ scores of 5 people
Y<-c(85, 90, 100, 120, 140)
# Likert scale rating of liking of reading books (1 hate to 7 love)
X1<-c(1,2,4,6,7)

scatterplot(Y~X1, smooth=FALSE)

summary(lm(Y~X1))
##
## Call:
## lm(formula = Y ~ X1)
##
## Residuals:
##       1       2       3       4       5
##  3.9615  0.3077 -7.0000 -4.3077  7.0385
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   72.385      6.010   12.04  0.00123 **
## X1             8.654      1.305    6.63  0.00699 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared:  0.9361, Adjusted R-squared:  0.9148
## F-statistic: 43.96 on 1 and 3 DF,  p-value: 0.006989

## 1.1 Center

• $$Center = X - M$$
• Intercept is at the MEAN of the IV (not at 0 of the original IV)
• Does NOT change the meaning of the slope
• R: scale(Data, scale=FALSE)
# Center: Likert scale rating of liking of reading books (1 hate to 7 love)
X1.C<-scale(X1,scale=FALSE)
X1.C<-X1.C[,1]
#Plot the DV against the centered IV
scatterplot(Y~X1.C, smooth=FALSE)

summary(lm(Y~X1.C))
##
## Call:
## lm(formula = Y ~ X1.C)
##
## Residuals:
##       1       2       3       4       5
##  3.9615  0.3077 -7.0000 -4.3077  7.0385
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  107.000      2.976   35.95 4.73e-05 ***
## X1.C           8.654      1.305    6.63  0.00699 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared:  0.9361, Adjusted R-squared:  0.9148
## F-statistic: 43.96 on 1 and 3 DF,  p-value: 0.006989
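The intercept moving to 107 is no accident: with a centered predictor, the fitted line passes through the point of means, so the intercept equals the mean of the DV. A quick check with the same numbers:

```r
# With a centered IV, the fitted intercept is just mean(Y)
Y  <- c(85, 90, 100, 120, 140)
X1 <- c(1, 2, 4, 6, 7)
X1.C <- X1 - mean(X1)      # same result as scale(X1, scale = FALSE)
fit  <- lm(Y ~ X1.C)
unname(coef(fit)[1])       # 107
mean(Y)                    # 107
```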

## 1.2 Zscore

• $$Z = \frac{X - M}{s}$$
• Intercept is at the MEAN of the IV (0 on the z-scored IV is the mean of the original IV)
• Slope changes meaning: no longer per unit of the original IV, but per SD of the IV
• R: scale(Data)
# Zscore: Likert scale rating of liking of reading books (1 hate to 7 love)
X1.Z<-scale(X1)
X1.Z
##            [,1]
## [1,] -1.1766968
## [2,] -0.7844645
## [3,]  0.0000000
## [4,]  0.7844645
## [5,]  1.1766968
## attr(,"scaled:center")
## [1] 4
## attr(,"scaled:scale")
## [1] 2.54951
X1.Z<-X1.Z[,1]
X1.Z
## [1] -1.1766968 -0.7844645  0.0000000  0.7844645  1.1766968

summary(lm(Y~X1.Z))

##
## Call:
## lm(formula = Y ~ X1.Z)
##
## Residuals:
##       1       2       3       4       5
##  3.9615  0.3077 -7.0000 -4.3077  7.0385
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  107.000      2.976   35.95 4.73e-05 ***
## X1.Z          22.063      3.328    6.63  0.00699 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared:  0.9361, Adjusted R-squared:  0.9148
## F-statistic: 43.96 on 1 and 3 DF,  p-value: 0.006989
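Note how the two slopes are linked: the z-score slope is just the raw slope times the SD of the IV (8.654 × 2.5495 ≈ 22.06). A quick check:

```r
# z-score slope = raw slope * sd(X1)
Y  <- c(85, 90, 100, 120, 140)
X1 <- c(1, 2, 4, 6, 7)
b.raw <- unname(coef(lm(Y ~ X1))[2])             # 8.654: IQ points per Likert point
b.z   <- unname(coef(lm(Y ~ scale(X1)[, 1]))[2]) # 22.06: IQ points per SD of liking
b.raw * sd(X1)                                   # equals b.z
```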

## 1.3 POMP

• $$POMP = \frac{X - Min_X}{Max_X - Min_X} \times 100$$
• Note: I like to multiply by 100 because I find it easier to think in percent (not proportion)
• Useful when data are bounded (or scaled in odd ways)
• Intercept is at the MINIMUM of the IV (POMP = 0 is the IV's minimum), so it differs from the raw-score intercept
• Does change the meaning of the slope: it is now the change in DV per percentage-point change in the IV
#POMP of X1
X1_POMP = (X1 - min(X1)) / (max(X1) - min(X1))*100
X1_POMP
## [1]   0.00000  16.66667  50.00000  83.33333 100.00000
scatterplot(Y~X1_POMP, smooth=FALSE)

summary(lm(Y~X1_POMP))
##
## Call:
## lm(formula = Y ~ X1_POMP)
##
## Residuals:
##       1       2       3       4       5
##  3.9615  0.3077 -7.0000 -4.3077  7.0385
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.03846    4.91852   16.48 0.000487 ***
## X1_POMP      0.51923    0.07831    6.63 0.006989 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared:  0.9361, Adjusted R-squared:  0.9148
## F-statistic: 43.96 on 1 and 3 DF,  p-value: 0.006989
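Likewise, the POMP slope is the raw slope rescaled by the IV's range: moving 1% of the range is (max − min)/100 raw units, so 8.654 × 6 / 100 ≈ 0.519. A quick check:

```r
# POMP slope = raw slope * range(X1) / 100
Y  <- c(85, 90, 100, 120, 140)
X1 <- c(1, 2, 4, 6, 7)
X1_POMP <- (X1 - min(X1)) / (max(X1) - min(X1)) * 100
b.raw  <- unname(coef(lm(Y ~ X1))[2])        # 8.654
b.pomp <- unname(coef(lm(Y ~ X1_POMP))[2])   # 0.519
b.raw * (max(X1) - min(X1)) / 100            # equals b.pomp
```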

# 2 Simultaneous Regression (standard approach)

• Put all your variables in and see what the effect of each term is
• A very conservative approach
• Does not let you see additive effects very easily
• You noticed this problem when we were trying to explain Health ~ Years married + Age
• Had you only looked at that final model, you might never have understood that Years married was a good predictor on its own
• Also, what if you are testing a theory? You need to see the additive effects.

# 3 Hierarchical Modeling

• Is the change in $$R^2$$ meaningful (Model 2 $$R^2$$ - Model 1 $$R^2$$)?
• The order in which models are run is meaningful
• Terms do not need to be analysed one at a time; they can be entered as ‘sets’
• A set of variables is theoretically or experimentally driven

## 3.1 Hierarchical Modeling driven by the researcher

• Forward selection: Start with simple models and get more complex nested models
• Backwards selection: Start with complex nested models and get more simple
• Stepwise selection: can be viewed as a variation of the forward selection method (one predictor at a time), but predictors are deleted in subsequent steps if they no longer contribute appreciable unique prediction
• Which you choose can depend on how you like to ask questions

### 3.1.1 Forward Selection of nested models

• A common approach to “model building”
• Again, let's make up our dummy data
#packages we will need to create and graph our data
library(MASS) #create data
library(car) #graph data
py1 =.6 #Cor between X1 (ice cream) and happiness
py2 =.4 #Cor between X2 (Brownies) and happiness
p12= .2 #Cor between X1 (ice cream) and X2 (Brownies)

Means.X1X2Y<- c(10,10,10) #set the means of X and Y variables
CovMatrix.X1X2Y <- matrix(c(1,p12,py1, p12,1,py2, py1,py2,1),3,3) # creates the covariance matrix

set.seed(42)
CorrDataT<-mvrnorm(n=100, mu=Means.X1X2Y,Sigma=CovMatrix.X1X2Y, empirical=TRUE)
CorrDataT<-as.data.frame(CorrDataT)
colnames(CorrDataT) <- c("IceCream","Brownies","Happiness")

#### 3.1.1.1 First, a little side track…

• Remember that $$R^2$$ values are tested with F-values, right?
• This means you can actually get an ANOVA like table for the model
• for example:
###############Model 1
Ice.Model<-lm(Happiness~ IceCream, data = CorrDataT)
anova(Ice.Model)
## Analysis of Variance Table
##
## Response: Happiness
##           Df Sum Sq Mean Sq F value    Pr(>F)
## IceCream   1  35.64  35.640  55.125 4.193e-11 ***
## Residuals 98  63.36   0.647
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
• The $$R^2$$ is the ratio of explained to total variance (like in our ANOVA): $R^2 = \frac{SS_{explained}}{SS_{explained}+SS_{residual}}$
• just to check, from anova(Ice.Model): 35.64 / (35.64 + 63.36) = 0.36
• which matches the $$R^2$$ that R gives us: 0.36
• When we check to see which model is best, we actually test the difference in these sums of squares

### 3.1.2 Let's forward-fit our models

###############Model 1
Ice.Model<-lm(Happiness~ IceCream, data = CorrDataT)
R2.Model.1<-summary(Ice.Model)$r.squared

###############Model 2
Ice.Brown.Model<-lm(Happiness~ IceCream+Brownies, data = CorrDataT)
R2.Model.2<-summary(Ice.Brown.Model)$r.squared
library(stargazer)
#http://jakeruss.com/cheatsheets/stargazer.html

stargazer(Ice.Model,Ice.Brown.Model,type="html",
column.labels = c("Model 1", "Model 2"),
intercept.bottom = FALSE,
single.row=FALSE,
star.cutoffs = c(0.1, 0.05, 0.01, 0.001),
star.char = c("@", "*", "**", "***"),
notes= c("@p < .1 *p < .05 **p < .01 ***p < .001"),
notes.append = FALSE)
Dependent variable: Happiness

|                     | Model 1 (1)            | Model 2 (2)            |
|---------------------|------------------------|------------------------|
| Constant            | 4.000*** (0.812)       | 1.667@ (0.982)         |
| IceCream            | 0.600*** (0.081)       | 0.542*** (0.077)       |
| Brownies            |                        | 0.292*** (0.077)       |
| Observations        | 100                    | 100                    |
| R2                  | 0.360                  | 0.442                  |
| Adjusted R2         | 0.353                  | 0.430                  |
| Residual Std. Error | 0.804 (df = 98)        | 0.755 (df = 97)        |
| F Statistic         | 55.125*** (df = 1; 98) | 38.366*** (df = 2; 97) |

Note: @p < .1 *p < .05 **p < .01 ***p < .001
• Let's test the difference in $$R^2$$
• In R, we call the anova function
R2.Change<-R2.Model.2-R2.Model.1
R2.Change
## [1] 0.08166667
anova(Ice.Model,Ice.Brown.Model)
## Analysis of Variance Table
##
## Model 1: Happiness ~ IceCream
## Model 2: Happiness ~ IceCream + Brownies
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
## 1     98 63.360
## 2     97 55.275  1     8.085 14.188 0.0002838 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
• So, in other words, model 2 fits the data better than model 1 and the change in $$R^2$$ is significant.
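The same F that anova() reports can be computed by hand from the change in $$R^2$$, via $$F_{change} = \frac{\Delta R^2 / df_{change}}{(1 - R^2_{full}) / df_{residual}}$$ (here $$df_{change} = 1$$ and $$df_{residual} = 97$$):

```r
# F for the R2 change, computed by hand from the summary() values above
R2.Model.1 <- 0.36        # Happiness ~ IceCream
R2.Model.2 <- 0.4416667   # Happiness ~ IceCream + Brownies
F.Change <- (R2.Model.2 - R2.Model.1) / ((1 - R2.Model.2) / 97)
F.Change                  # ~14.19, matching the anova() comparison
```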

### 3.1.3 Backward Selection of nested models

• You ask: does taking away variables reduce my $$R^2$$ significantly?
• A less common approach, but sometimes used to validate you have a parsimonious model
• You might forward fit a set of variables and backwards fit critical ones to test a specific hypothesis
• Using the same data as above, we will get the same values (just negative)
###############Model 1.B
Ice.Brown.Model<-lm(Happiness~ IceCream+Brownies, data = CorrDataT)
R2.Model.1.B<-summary(Ice.Brown.Model)$r.squared

###############Model 2.B
Ice.Model<-lm(Happiness~ IceCream, data = CorrDataT)
R2.Model.2.B<-summary(Ice.Model)$r.squared

R2.Change.B<-R2.Model.2.B-R2.Model.1.B
R2.Change.B
## [1] -0.08166667
anova(Ice.Brown.Model,Ice.Model)
## Analysis of Variance Table
##
## Model 1: Happiness ~ IceCream + Brownies
## Model 2: Happiness ~ IceCream
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
## 1     97 55.275
## 2     98 63.360 -1    -8.085 14.188 0.0002838 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 3.2 Hierarchical modeling (often driven by the computer)

• Stepwise selection with many predictors is often done by computer, and it does not always assume nested models (variables can be added and removed at the same time)
• Exploratory: you have too many predictors and no idea where to start
• You give the computer a large number of predictors and the computer decides the best-fitting model
• Sounds good, right? No, because the results can be unstable
• Change one variable in the set and the final model can change
• High chance of both type I and type II errors
• The computer makes decisions based on the Akaike information criterion (AIC), not the change in $$R^2$$, because the models are not nested
• Also, the computer makes decisions purely on fit values, with no regard for theory
• Solutions are often unique to that particular dataset
• The best model is often the one that speaks to a theory, and only a human can judge that at present
• Not really publishable because of these problems
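If you do want to see what the computer picks, stepAIC() from the MASS package (loaded earlier) runs this kind of search; direction = "both" lets it add and drop terms by AIC at each step. A minimal sketch on the ice cream data (the data generation is repeated so the chunk stands alone):

```r
library(MASS) # mvrnorm() to simulate data, stepAIC() to search

# Recreate the practice data from above
set.seed(42)
CovMatrix <- matrix(c(1, .2, .6,  .2, 1, .4,  .6, .4, 1), 3, 3)
CorrDataT <- as.data.frame(mvrnorm(n = 100, mu = c(10, 10, 10),
                                   Sigma = CovMatrix, empirical = TRUE))
colnames(CorrDataT) <- c("IceCream", "Brownies", "Happiness")

# Let the computer add/drop terms by AIC, starting from the full model
Full.Model <- lm(Happiness ~ IceCream + Brownies, data = CorrDataT)
Step.Model <- stepAIC(Full.Model, direction = "both", trace = FALSE)
formula(Step.Model) # here both predictors survive, but only because they fit
```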

# 4 Parsing influence

• As models get bigger and bigger, it becomes a challenge to figure out each variable's unique contribution to $$R^2$$
• There are many computational solutions you can select from, but we will use one called lmg
• These methods are not well known in psychology, but they can be very useful when people ask you what the relative importance of each variable is
• Two approaches: show the absolute $$R^2$$ for each term, or the relative % of $$R^2$$ for each term
library(relaimpo)
# In terms of R2
calc.relimp(Ice.Brown.Model) 
## Response variable: Happiness
## Total response variance: 1
## Analysis based on 100 observations
##
## 2 Regressors:
## IceCream Brownies
## Proportion of variance explained by model: 44.17%
## Metrics are not normalized (rela=FALSE).
##
## Relative importance metrics:
##
##                lmg
## IceCream 0.3208333
## Brownies 0.1208333
##
## Average coefficients for different model sizes:
##
##           1X       2Xs
## IceCream 0.6 0.5416667
## Brownies 0.4 0.2916667
# as % of R2
calc.relimp(Ice.Brown.Model,rela = TRUE) 
## Response variable: Happiness
## Total response variance: 1
## Analysis based on 100 observations
##
## 2 Regressors:
## IceCream Brownies
## Proportion of variance explained by model: 44.17%
## Metrics are normalized to sum to 100% (rela=TRUE).
##
## Relative importance metrics:
##
##                lmg
## IceCream 0.7264151
## Brownies 0.2735849
##
## Average coefficients for different model sizes:
##
##           1X       2Xs
## IceCream 0.6 0.5416667
## Brownies 0.4 0.2916667
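A handy sanity check on lmg: the unnormalized shares partition the model's $$R^2$$ exactly, and dividing each by their sum gives the rela = TRUE percentages:

```r
# lmg shares sum to the model R2 (values copied from calc.relimp() above)
lmg <- c(IceCream = 0.3208333, Brownies = 0.1208333)
sum(lmg)        # 0.4416667, the model R2 (44.17%)
lmg / sum(lmg)  # 0.726 and 0.274, the normalized (rela = TRUE) values
```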

# 5 Final notes:

• If you play with lots of predictors and run lots of models, something will be significant
• Type I error is a big problem because of the ‘researcher degrees of freedom’ problem
• Type II error increases as a function of the number of predictors: a) you slice the pie too thin, b) each variable might try to eat another variable's slice
• Less is more: ask targeted questions with as orthogonal a set of variables as you can