Making the intercept and slopes make sense!
- Centering
- Zscore
- POMP
- Which to use depends on your questions; however, centering is the safest choice (and is often recommended)
- You need to decide whether it makes sense to transform both the DV and IVs, or only one or the other
- Let's make a practice dataset to explore
- We will transform just the IVs for now:
library(car) # for scatterplot()
# IQ scores of 5 people
Y<-c(85, 90, 100, 120, 140)
# Likert scale rating of liking of reading books (1 hate to 7 love)
X1<-c(1,2,4,6,7)
scatterplot(Y~X1, smooth=FALSE)
summary(lm(Y~X1))
##
## Call:
## lm(formula = Y ~ X1)
##
## Residuals:
## 1 2 3 4 5
## 3.9615 0.3077 -7.0000 -4.3077 7.0385
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.385 6.010 12.04 0.00123 **
## X1 8.654 1.305 6.63 0.00699 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared: 0.9361, Adjusted R-squared: 0.9148
## F-statistic: 43.96 on 1 and 3 DF, p-value: 0.006989
Center
- \(Center = {X - M}\)
- Intercept is at the MEAN of IV (not 0 of original IV)
- Does NOT change the meaning of the slope
- R: scale(data, scale=FALSE)
# Center: Likert scale rating of liking of reading books (1 hate to 7 love)
X1.C<-scale(X1,scale=FALSE)
X1.C<-X1.C[,1]
# Plot and refit with the centered IV
scatterplot(Y~X1.C, smooth=FALSE)
summary(lm(Y~X1.C))
##
## Call:
## lm(formula = Y ~ X1.C)
##
## Residuals:
## 1 2 3 4 5
## 3.9615 0.3077 -7.0000 -4.3077 7.0385
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 107.000 2.976 35.95 4.73e-05 ***
## X1.C 8.654 1.305 6.63 0.00699 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared: 0.9361, Adjusted R-squared: 0.9148
## F-statistic: 43.96 on 1 and 3 DF, p-value: 0.006989
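A quick check on why only the intercept moved: with a centered predictor, the intercept is the predicted DV at the mean of the IV, which in simple regression equals the mean of the DV. A minimal sketch using the same vectors as above:

```r
Y  <- c(85, 90, 100, 120, 140)
X1 <- c(1, 2, 4, 6, 7)

raw      <- lm(Y ~ X1)
centered <- lm(Y ~ scale(X1, scale = FALSE))

# Centering leaves the slope alone; only the intercept moves
all.equal(unname(coef(raw)[2]), unname(coef(centered)[2]))   # TRUE
# The centered intercept is the mean of the DV (107 here)
all.equal(unname(coef(centered)[1]), mean(Y))                # TRUE
```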
Zscore
- \(Z = \frac{X - M}{s}\)
- Intercept is still at the MEAN of the IV (z = 0 corresponds to the mean of the original IV)
- Slope changes meaning: no longer per unit of the original IV; it is now the change in the DV per 1 SD of the IV
- R: scale(data)
# Z-score: Likert scale rating of liking of reading books (1 hate to 7 love)
X1.Z<-scale(X1)
X1.Z
## [,1]
## [1,] -1.1766968
## [2,] -0.7844645
## [3,] 0.0000000
## [4,] 0.7844645
## [5,] 1.1766968
## attr(,"scaled:center")
## [1] 4
## attr(,"scaled:scale")
## [1] 2.54951
X1.Z<-X1.Z[,1]
X1.Z
## [1] -1.1766968 -0.7844645 0.0000000 0.7844645 1.1766968
scatterplot(Y~X1.Z, smooth=FALSE)
summary(lm(Y~X1.Z))
##
## Call:
## lm(formula = Y ~ X1.Z)
##
## Residuals:
## 1 2 3 4 5
## 3.9615 0.3077 -7.0000 -4.3077 7.0385
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 107.000 2.976 35.95 4.73e-05 ***
## X1.Z 22.063 3.328 6.63 0.00699 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared: 0.9361, Adjusted R-squared: 0.9148
## F-statistic: 43.96 on 1 and 3 DF, p-value: 0.006989
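The standardized slope is just the raw slope rescaled: multiplying the raw slope by the SD of the IV gives the change in the DV per 1 SD of the IV (8.654 × 2.54951 ≈ 22.063 above). A quick sanity check on those numbers:

```r
Y  <- c(85, 90, 100, 120, 140)
X1 <- c(1, 2, 4, 6, 7)

b_raw <- coef(lm(Y ~ X1))[2]         # ~8.65 IQ points per Likert unit
b_z   <- coef(lm(Y ~ scale(X1)))[2]  # ~22.06 IQ points per SD of X1

# The z-scored slope equals the raw slope times sd(X1)
all.equal(unname(b_z), unname(b_raw * sd(X1)))  # TRUE
```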
POMP
- \(POMP = \frac{X - Min_X}{Max_X - Min_X} \times 100\)
- Note: I like to multiply by 100 because I find it easier to think in percent (not proportion)
- Useful when data are bounded (or scaled funny)
- Intercept is at 0 of the transformed IV, which is the MINIMUM of the original IV [the slope is rescaled, so the intercept differs from the raw-score model]
- DOES change the meaning of the slope: it is now the change in the DV per 1% of the IV's range
#POMP of X1
X1_POMP = (X1 - min(X1)) / (max(X1) - min(X1))*100
X1_POMP
## [1] 0.00000 16.66667 50.00000 83.33333 100.00000
scatterplot(Y~X1_POMP, smooth=FALSE)
summary(lm(Y~X1_POMP))
##
## Call:
## lm(formula = Y ~ X1_POMP)
##
## Residuals:
## 1 2 3 4 5
## 3.9615 0.3077 -7.0000 -4.3077 7.0385
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.03846 4.91852 16.48 0.000487 ***
## X1_POMP 0.51923 0.07831 6.63 0.006989 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.655 on 3 degrees of freedom
## Multiple R-squared: 0.9361, Adjusted R-squared: 0.9148
## F-statistic: 43.96 on 1 and 3 DF, p-value: 0.006989
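The POMP slope is likewise a rescaling of the raw slope: one POMP unit is 1% of the IV's range, so the raw slope times (range / 100) recovers it (8.654 × 6/100 ≈ 0.519 above). A quick check, assuming the same X1 and Y as above:

```r
Y  <- c(85, 90, 100, 120, 140)
X1 <- c(1, 2, 4, 6, 7)

b_raw   <- coef(lm(Y ~ X1))[2]
X1_POMP <- (X1 - min(X1)) / (max(X1) - min(X1)) * 100
b_pomp  <- coef(lm(Y ~ X1_POMP))[2]

# 1 POMP unit = 1% of the range = (7 - 1)/100 Likert units
all.equal(unname(b_pomp), unname(b_raw * (max(X1) - min(X1)) / 100))  # TRUE
```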
Parsing influence
- As models get bigger and bigger, it becomes a challenge to figure out the unique contribution of each variable to \(R^2\)
- There are many computational solutions to select from, but we will use one called lmg
- You can read about all the different ones here: https://core.ac.uk/download/pdf/6305006.pdf
- These methods are not well known in psychology, but they can be very useful when people ask you about the relative importance of each variable
- Two approaches: show the absolute \(R^2\) for each term, or the relative % of \(R^2\) for each term
library(relaimpo)
# In terms of R2
calc.relimp(Ice.Brown.Model)
## Response variable: Happiness
## Total response variance: 1
## Analysis based on 100 observations
##
## 2 Regressors:
## IceCream Brownies
## Proportion of variance explained by model: 44.17%
## Metrics are not normalized (rela=FALSE).
##
## Relative importance metrics:
##
## lmg
## IceCream 0.3208333
## Brownies 0.1208333
##
## Average coefficients for different model sizes:
##
## 1X 2Xs
## IceCream 0.6 0.5416667
## Brownies 0.4 0.2916667
# as % of R2
calc.relimp(Ice.Brown.Model,rela = TRUE)
## Response variable: Happiness
## Total response variance: 1
## Analysis based on 100 observations
##
## 2 Regressors:
## IceCream Brownies
## Proportion of variance explained by model: 44.17%
## Metrics are normalized to sum to 100% (rela=TRUE).
##
## Relative importance metrics:
##
## lmg
## IceCream 0.7264151
## Brownies 0.2735849
##
## Average coefficients for different model sizes:
##
## 1X 2Xs
## IceCream 0.6 0.5416667
## Brownies 0.4 0.2916667
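Ice.Brown.Model comes from an earlier section; for a self-contained illustration (simulated data, hypothetical variable names, not the original dataset), note that the unnormalized lmg shares always sum to the model's \(R^2\):

```r
library(relaimpo)

# Simulated stand-in for the earlier Ice.Brown.Model data
set.seed(42)
n <- 100
IceCream  <- rnorm(n)
Brownies  <- rnorm(n)
Happiness <- 0.6 * IceCream + 0.3 * Brownies + rnorm(n)
fit <- lm(Happiness ~ IceCream + Brownies)

ri <- calc.relimp(fit)  # absolute R^2 share per predictor (lmg)
# The lmg shares partition the model R^2
all.equal(sum(ri@lmg), summary(fit)$r.squared)  # TRUE
```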