1 What does Linearity & Non-Linearity mean?

  • The majority of psychologists often use these terms to describe the relationship between Y~X
  • Linearity = mathematical relationship between variables can be fit with a straight line
  • Non-Linear (Curvilinear) = mathematical relationship between variables can be fit with non-straight (Curvy) lines
  • However, be careful as for other fields or specialty areas of psychology these terms can mean different things: When people say non-linear, as them what they mean
  • We will use non-linear to describe the mathematical relationship between variables

2 Non-Linear Fit Types that can be tranformed for linear regression (basic types)

  • Power & Orthogonal Polynomials
  • Growth: Exponential, Power, logarithmic
  • Rates: Reciprocal
  • Correlations

2.1 Power Polynomials

  • Models that simply curves, such as quadratic or cubic effects
  • Linear: \(Y =B_{1}X + B_0\)
  • Quadratic: \(Y = B_{1.2}X + B_{2.1}X^2 + B_0\)
  • Cubic: \(Y = B_{1.23}X + B_{2.13}X^2 + B_{3.12}X^3 + B_0\)
  • Let simulate a relationship ExamScore ~ Anxiety level
  • Our relationship will be defined as: \(Y = 17.5X -2.5X^2 + 40 + \epsilon\)
  • Error term, $= $ Random Normal with Mean[0] & SD[10]
library(car) #graph data
set.seed(42)
# 250 people
n <- 250
# Uniform distrobution of likert scores between 0 to 7 (0 = No anxiety)
x <- runif(n, 0, 7)
# Our equation to  create Y
y <- -2.5*x^2 + 17.5*x+40+rnorm(n, sd=10)
#Built our data frame
Quad.Data<-data.frame(Anxiety=x,ExamScore=y)

# Plot: Notes: reg.line=FALSE means remove best fit line. 
# smoother=loessLine means add smoothed line called the loess
scatterplot(ExamScore~Anxiety, data= Quad.Data, reg.line=FALSE, smoother=loessLine)

2.1.1 The loess Line

  • Locally Weighted Scatterplot Smoothing
  • Does lots of regressions on small sections of the scatter plot to get a best fit (curvy) line
  • Lets you visualize complex relationships, but it can overfit the relationship (so be careful)
  • Good diagnostic tool, but rarely used in papers

2.2 Process of fitting Power Polynomials

  • Forward Selection Approach
  • Model 1: Linear fit
  • Model 2: Quadratic fit
  • Model i: keep going up until you are satisfied
  • Check the change in \(R^2\) and select the best fit models
  • Note: Higher order terms ALWAYS increase \(R^2\), you need to make sure it is both a significant improvement and meaningful improvement
  • in R there are two ways to add Power Polynomials: I(Term^2) or using the poly command
#Linear model
Model.1<-lm(ExamScore~Anxiety, data= Quad.Data)
Model.2<-lm(ExamScore~Anxiety+I(Anxiety^2), data= Quad.Data)

anova(Model.1,Model.2)
## Analysis of Variance Table
## 
## Model 1: ExamScore ~ Anxiety
## Model 2: ExamScore ~ Anxiety + I(Anxiety^2)
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1    248 43354                                  
## 2    247 21319  1     22036 255.31 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(stargazer)
stargazer(Model.1,Model.2,type="html",
          column.labels = c("Linear", "Quadratic"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)
Dependent variable:
ExamScore
Linear Quadratic
(1) (2)
Constant 60.333*** 39.681***
(1.686) (1.753)
Anxiety -0.137 17.753***
(0.411) (1.156)
I(Anxiety2) -2.559***
(0.160)
Observations 250 250
R2 0.0004 0.508
Adjusted R2 -0.004 0.505
Residual Std. Error 13.222 (df = 248) 9.290 (df = 247)
F Statistic 0.111 (df = 1; 248) 127.767*** (df = 2; 247)
Note: p<0.1; p<0.05; p<0.01
  • The change in \(R^2\) = 0.5080438 and the we can see that the change was significant
  • Would the third order term be any better?
#Linear model
Model.3<-lm(ExamScore~Anxiety+I(Anxiety^2)+I(Anxiety^3), data= Quad.Data)

anova(Model.1,Model.2,Model.3)
## Analysis of Variance Table
## 
## Model 1: ExamScore ~ Anxiety
## Model 2: ExamScore ~ Anxiety + I(Anxiety^2)
## Model 3: ExamScore ~ Anxiety + I(Anxiety^2) + I(Anxiety^3)
##   Res.Df   RSS Df Sum of Sq        F Pr(>F)    
## 1    248 43354                                 
## 2    247 21319  1   22035.8 255.0382 <2e-16 ***
## 3    246 21255  1      63.8   0.7378 0.3912    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(stargazer)
stargazer(Model.1,Model.2,Model.3,type="html",
          column.labels = c("Linear", "Quadratic","Cubic"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)
Dependent variable:
ExamScore
Linear Quadratic Cubic
(1) (2) (3)
Constant 60.333*** 39.681*** 38.386***
(1.686) (1.753) (2.312)
Anxiety -0.137 17.753*** 20.023***
(0.411) (1.156) (2.885)
I(Anxiety2) -2.559*** -3.371***
(0.160) (0.959)
I(Anxiety3) 0.077
(0.090)
Observations 250 250 250
R2 0.0004 0.508 0.510
Adjusted R2 -0.004 0.505 0.504
Residual Std. Error 13.222 (df = 248) 9.290 (df = 247) 9.295 (df = 246)
F Statistic 0.111 (df = 1; 248) 127.767*** (df = 2; 247) 85.333*** (df = 3; 246)
Note: p<0.1; p<0.05; p<0.01
  • So second order power polynomial was our best fit

2.2.1 Lets examine Model 2 in detail

  • Since we simulated our model we know what the answers should be
  • Simulation: \(Y = 17.5X -2.5X^2 + 40 + \epsilon\)
  • Lets get our terms from our regression:
LT<-round(Model.2$coefficients[2],2)
QT<-round(Model.2$coefficients[3],2)
IT<-round(Model.2$coefficients[1],2)
  • Regression: \(Y =\) 17.75\(X\) -2.56\(X^2 +\) 39.68 \(+ \epsilon\)
  • Our regression did a good job, but what if our sample was smaller?
  • Let redo our analysis with n = 5 to 60 and see how the terms change [see simulation in separate R file]
  • the gray band = 95% CI

  • Its important to know that in small samples polynomial estimates can be very unstable
  • if you flip the quadratic term the line will flip direction

2.2.2 How to plot this function (and not the loess line)

  • Literally you can solve the regression equation for different values of x
  • in R you can call the effects package to do work for you
library(effects)
plot(effect("Anxiety", Model.2,
            xlevels=list(Anxiety=0:7)))

2.3 Orthogonal Polynomials

  • Problem with power polynomials is interpreting them independently
  • The higher order terms depend on the lower order terms
  • Thus linear term can be significant cause the higher order term is significant, but actually there is no linear trend in the data (just like in today’s example)
  • This is because power polynomials correlate with each other (they are are not unique predictors)
  • For Example:
LinearTerms<-c(1,2,3,4,5,6,7)
QuadTerms<-LinearTerms^2
round(cor(LinearTerms,QuadTerms),3)
## [1] 0.977
  • Thus they are basically multicollinear
  • lets run our VIF to verify
vif(Model.2)
##      Anxiety I(Anxiety^2) 
##     16.05275     16.05275
  • Technically, its OK that they are multicollinear but you cannot interpret the Pvalues on each term
  • The solution to make the linear and quadratic terms correlate with each at zero (just as we did with ANOVA)
  • we can use the poly code in R.
  • Here is an example of what R is going to do:
LinearTerms<-c(1,2,3,4,5,6,7)

#make them Orthogonal
O.Poly<-poly(LinearTerms,2,simple = TRUE)
round(O.Poly,5)
##             1        2
## [1,] -0.56695  0.54554
## [2,] -0.37796  0.00000
## [3,] -0.18898 -0.32733
## [4,]  0.00000 -0.43644
## [5,]  0.18898 -0.32733
## [6,]  0.37796  0.00000
## [7,]  0.56695  0.54554
#run our correlation
round(cor(O.Poly[,1],O.Poly[,2]),3)
## [1] 0
#lets make a new poly vector
Quad.Data$Anxiety.poly.1<-poly(Quad.Data$Anxiety,1,simple = TRUE)[,1]
Quad.Data$Anxiety.poly.2<-poly(Quad.Data$Anxiety,2,simple = TRUE)[,2]

Model.1.O<-lm(ExamScore~Anxiety.poly.1, data= Quad.Data)
Model.2.O<-lm(ExamScore~Anxiety.poly.1+Anxiety.poly.2, data= Quad.Data)

anova(Model.1.O,Model.2.O)
## Analysis of Variance Table
## 
## Model 1: ExamScore ~ Anxiety.poly.1
## Model 2: ExamScore ~ Anxiety.poly.1 + Anxiety.poly.2
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1    248 43354                                  
## 2    247 21319  1     22036 255.31 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
stargazer(Model.1.O,Model.2.O,type="html",
          column.labels = c("Linear.Ortho", "Quadratic.Ortho"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)
Dependent variable:
ExamScore
Linear.Ortho Quadratic.Ortho
(1) (2)
Constant 59.845*** 59.845***
(0.836) (0.588)
Anxiety.poly.1 -4.403 -4.403
(13.222) (9.290)
Anxiety.poly.2 -148.444***
(9.290)
Observations 250 250
R2 0.0004 0.508
Adjusted R2 -0.004 0.505
Residual Std. Error 13.222 (df = 248) 9.290 (df = 247)
F Statistic 0.111 (df = 1; 248) 127.767*** (df = 2; 247)
Note: p<0.1; p<0.05; p<0.01
  • Notice that the linear term is not significant, but the quadratic is
  • BUT the actual estimate of the slopes are nonsensical
  • this is because you have rescaled the xvalues in a funny way
  • But no more multicollinearity
  • lets run our VIF to verify
vif(Model.2.O)
## Anxiety.poly.1 Anxiety.poly.2 
##              1              1
  • Also the \(R^2\) do not differ between the power or orthogonal polynomials
  • Other pros of using orthogonal polynomials will come next week when we start looking at interactions

2.3.1 Coding short cut (Use this for your analysis)

  • You don’t need to premake the vector
  • You can put the poly code right into the model
  • but the labeling will change
  • Also much easier to makes plots of the final model this way
Model.2.O.S<-lm(ExamScore~poly(Anxiety,2), data= Quad.Data)
stargazer(Model.2.O.S,type="html",
          column.labels = c("Quadratic.Ortho"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)
Dependent variable:
ExamScore
Quadratic.Ortho
Constant 59.845***
(0.588)
poly(Anxiety, 2)1 -4.403
(9.290)
poly(Anxiety, 2)2 -148.444***
(9.290)
Observations 250
R2 0.508
Adjusted R2 0.505
Residual Std. Error 9.290 (df = 247)
F Statistic 127.767*** (df = 2; 247)
Note: p<0.1; p<0.05; p<0.01
plot(effect("poly(Anxiety,2)", Model.2.O.S,
           xlevels=list(Anxiety=0:7)))

  • Also, you could have said poly(Anxiety,2,raw=TRUE) to get power polynomial on the fly

2.4 Diagnotics

  • When you have the wrong number of polynomials you will get funny looking residuals
  • Here is when we use 1, when we need 2
plot(Model.1, which=1)

  • When you have the right number of polynomials you will get correct looking residuals
  • Here is when we use 2, when we need 2
plot(Model.2.O.S, which=1)

3 Dealing Growth models

  • These are models that cannot be fit with polynomials
  • Exponential growth: \(Y = ce^{dx}\)
  • Severity of panic attack (0 to 500) predicted by anxiety level
  • Exponential Example: \(Y = .09e^{1.2x+\epsilon}\)
  • Note: Euler’s number, \(e =\) 2.7182818, is an irrational math constant and the base value of the natural log
set.seed(42)
n <- 250
x <- runif(n, 0, 7)
#lets make our e value (not the error term)
e<-exp(1)
y <-.09*e^(1.20*(x+rnorm(n, sd=.25)))
Exp.Data<-data.frame(Anxiety=x,PanicAttack=y)
scatterplot(PanicAttack~Anxiety, data= Exp.Data, reg.line=FALSE, smoother=loessLine)

3.1 Do Polynomials fail us?

  • Lets try a second order fit since it one curve
Model.Panic.Poly<-lm(PanicAttack~poly(Anxiety,2), data= Exp.Data)
stargazer(Model.Panic.Poly,type="html",
          column.labels = c("Quadratic.Ortho"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)
Dependent variable:
PanicAttack
Quadratic.Ortho
Constant 50.000***
(2.515)
poly(Anxiety, 2)1 1,006.757***
(39.762)
poly(Anxiety, 2)2 772.384***
(39.762)
Observations 250
R2 0.805
Adjusted R2 0.803
Residual Std. Error 39.762 (df = 247)
F Statistic 509.199*** (df = 2; 247)
Note: p<0.1; p<0.05; p<0.01
  • Using the poly right into the package can be useful for the effects package because we can now view independent impact of the linear and quadratic terms
plot(effect("poly(Anxiety,2)", Model.Panic.Poly,
           xlevels=list(Anxiety=0:7)))

- Yikes, based on this you might tell people having zero anxiety is worse then having lower anxiety

  • Let check the residuals
plot(Model.Panic.Poly, which=1)

  • Also that looks pretty odd

3.2 Transformations of Exponential Model

  • The solution to this problem is often to transform the offending variable
  • The type of transform depends on the type of growth model you have
  • Exponential model: Dependent variable = log(y) [in r, log = natural log]
  • Thus: \(log(Y) = B_0 + BX\)
  • And to convert from log into meaningful values:
  • \(Y = e^{B_0 + BX}\)
  • Lets look at Log(Y) by Anxiety
scatterplot(log(PanicAttack)~Anxiety, data= Exp.Data)

  • Looks like a line now
  • Lets look at the model fit
Model.Panic.Log<-lm(log(PanicAttack)~Anxiety, data= Exp.Data)
stargazer(Model.Panic.Log,type="html",
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)
Dependent variable:
log(PanicAttack)
Constant -2.403***
(0.035)
Anxiety 1.195***
(0.009)
Observations 250
R2 0.987
Adjusted R2 0.987
Residual Std. Error 0.278 (df = 248)
F Statistic 19,129.080*** (df = 1; 248)
Note: p<0.1; p<0.05; p<0.01
  • Let plot the fit
plot(effect("Anxiety", Model.Panic.Log,
           xlevels=list(Anxiety=0:7)))

  • or you plot it non-log units
  • You have to tell the package to covert it over
plot(effect("Anxiety", Model.Panic.Log,
          transformation=list(link=log, inverse=exp)))

  • Let plot the fit
  • Let check the residuals
plot(Model.Panic.Log, which=1)

  • Those look as they should

3.2.1 Challange of Transforms

  • Problem is our scores are in log (not raw score)
  • to convert from log into meaningful values:
  • \(Y = e^{B_0 + BX}\)
#intercept
IT.L<-round(Model.Panic.Log$coefficients[1],2)
#slope
LT.L<-round(Model.Panic.Log$coefficients[2],2)
  • \(Y = e\)^(-2.4 + 1.2X)
  • For Anxiety score of X = 0, Y = 0.090718
  • For Anxiety score of X = 4, Y = 11.0231764
  • For Anxiety score of X = 7, Y = 403.4287935
  • [Note: Be careful about which log you use (natural or base 10, you can use either but make sure to specify)]

3.3 Power law

  • Common function in motor control and perception experiments
  • \(Y = cX^d\)
  • Severity of sweating (0 to 100) predicted by anxiety level
  • Exponential Example: \(Y = .07(X)^{1.7}+\epsilon\)
  • Note: Natural log of 0 = -inf, so if you have zeros you need to always add 1
set.seed(42)
n <- 250
x <- runif(n, 0, 7)+1
y <-.07*(x)^1.7
y = y * (.05* (5+rnorm(n, sd=.25)))
Power.Data<-data.frame(Anxiety=x,Sweating=y)
scatterplot(Sweating~Anxiety, data= Power.Data, reg.line=FALSE, smoother=loessLine)

3.3.1 Transformations of Power Law

  • The solution to this problem is often to transform the offending variable
  • Power law model: DV and IV get logged: log(y) & log(x)
  • Thus: \(log(Y) = B_0 + B(logX)\)
  • And to convert from log into meaningful values:
  • \(Y = e^{B_0 + B(logX)}\)
  • Lets look at the log-log scatterplot
scatterplot(log(Sweating)~log(Anxiety), data= Power.Data)

Model.Sweat.LL<-lm(log(Sweating)~log(Anxiety), data= Power.Data)

-Looks like a line now -Lets look at the model fit

stargazer(Model.Sweat.LL,type="html",
          column.labels = c("Log-Log Transform"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)
Dependent variable:
log(Sweating)
Log-Log Transform
Constant -4.046***
(0.008)
log(Anxiety) 1.698***
(0.005)
Observations 250
R2 0.998
Adjusted R2 0.998
Residual Std. Error 0.046 (df = 248)
F Statistic 100,505.500*** (df = 1; 248)
Note: p<0.1; p<0.05; p<0.01
  • Let plot the fit and tell the package that you did a transform
plot(effect("Anxiety", Model.Sweat.LL,
           transformation=list(link=log, inverse=exp)))

  • Let check the residuals
plot(Model.Sweat.LL, which=1)

  • They look OK

3.4 Logarithmic

  • These functions are often physical (sound intensity, earthquakes, etc)
  • \(c^Y = dX\)
  • Breathing rate (in ms between breaths) predicted by anxiety level
  • Logarithmic Example: \(Y = .05(X*\epsilon)^{3.8}\)
set.seed(42)
n <- 250
x <- runif(n, 0, 7)+1
y <-200*log(x*(5+rnorm(n, sd=1)))
Logarithmic.Data<-data.frame(Anxiety=x,Breathing=y)
scatterplot(Breathing~Anxiety, data= Logarithmic.Data, reg.line=FALSE, smoother=loessLine)

scatterplot(Breathing~log(Anxiety), data= Logarithmic.Data)

Model.Breath.Log<-lm(Breathing~log(Anxiety), data= Logarithmic.Data)

-Looks like a line now -Lets look at the model fit

stargazer(Model.Breath.Log,type="html",
          column.labels = c("Log Transform"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)
Dependent variable:
Breathing
Log Transform
Constant 319.641***
(6.656)
log(Anxiety) 197.847***
(4.456)
Observations 250
R2 0.888
Adjusted R2 0.888
Residual Std. Error 38.680 (df = 248)
F Statistic 1,971.251*** (df = 1; 248)
Note: p<0.1; p<0.05; p<0.01
  • Let plot the fit
plot(effect("Anxiety", Model.Breath.Log))

  • Let check the residuals
plot(Model.Breath.Log, which=1)

3.5 Reciprocals

  • Common fix for rate data (things involving time)
  • \(1/Y\)
  • Adrenaline level in the blood predicted by mins after a panic attack (mins)
  • Reciprocal Example: \(Y = (1/X)*\epsilon\)
set.seed(42)
n <- 250
# Length of panic attack
x <- 15*(runif(n, 1, 4))
y <- 1/(x)*rnorm(n,5,sd=.1)
Reciprocal.Data<-data.frame(PostPanic=x,Adrenaline=y)
scatterplot(Adrenaline~PostPanic, data= Reciprocal.Data, reg.line=FALSE, smoother=loessLine)

scatterplot(1/Adrenaline~PostPanic, data= Reciprocal.Data)

  • Looks like a line now
  • Lets look at the model fit
Model.Calm.Rep<-lm(1/Adrenaline~PostPanic, data= Reciprocal.Data)
stargazer(Model.Calm.Rep,type="latex",
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE,          
          header=FALSE)
  • Let plot the fit
plot(effect("PostPanic", Model.Calm.Rep))

  • Let check the residuals
plot(Model.Calm.Rep, which=1)

  • A little off (but I add in noise in a simple way)

3.6 Wierd Bulges in the data (not full on curves)

  • Data do not always fit our nice predefined functions
  • Sometimes they clearly have bulges, but are we sure those are real?
  • Bulges can be not real in small samples or can results from some latent factor you are not accounting for
  • Lets assume that if the data has a bulge that is real, meaningful, and we need to account for it
  • We can use the Box-Cox Transform (fits different polynomials which we call \(\lambda\))
  • \(Y = Y^\lambda - 1 / \lambda\), where \(\lambda \neq 0\) OR \(Y = lnY\), where \(\lambda = 0\)
  • In this case, we will allow \(\lambda\) to be selected based on which makes the data most normal
  • Lets make a case with a weird power polynomial of 1.3 (something you would not notice in your raw data)
set.seed(42)
# 250 people
n <- 250
x <- runif(n, 0, 7)
y <- -4*x^1.3+ 20*x+rnorm(n, sd=2)+20
SortaQuad.Data<-data.frame(Anxiety=x,ExamScore=y)
scatterplot(ExamScore~Anxiety, data= SortaQuad.Data, reg.line=FALSE, smoother=loessLine)

  • Lets try a linear model
Model.SortaQuad<-lm(ExamScore~Anxiety, data= SortaQuad.Data)
plot(Model.SortaQuad, which=1)

  • Seems there is a slight curve in our data
library(MASS)
bc<-boxcox(ExamScore ~Anxiety, data = SortaQuad.Data,
       lambda = seq(-2, 2, len = 20))

#we can extract the lambda
(lambda <- bc$x[which.max(bc$y)])
## [1] 1.272727
  • Lets plot the scatter and residuals from our transformed DV
scatterplot(ExamScore^lambda~Anxiety, data= SortaQuad.Data, reg.line=FALSE, smoother=loessLine)

Model.SortaQuad.BC<-lm(ExamScore^lambda ~ Anxiety, data= SortaQuad.Data)
plot(Model.SortaQuad.BC, which=1)

3.6.1 Some issues

  • Box-Cox is guess and requires some fine turning (like adjusting the range of lambdas)
  • You have to keep track of your zeros in the DV (so you can add constants to fix it)
  • Most importantly, you have make it harder to make sense of the DV (but your fits will be far better using it)

4 Doing regression with correlation values as the DV

  • You should not just use the correlation values straight away
  • Correlations are NOT normal, but luckily Fisher give us a fix
  • Fisher’s r to Z
  • You convert your correlation values to Z scores, do your analysis, predict your values and convert them back to r values for graphing
  • see Demos, et al (2012) for an example.
  • Functions below you can use to transform
FisherRtoZ <-function (z) {
  (exp(2 * z) - 1)/(1 + exp(2 * z))
}

FisherZtoR <-function (z) {
  tanh(z)
}
---
title: 'Non-Linear Models'
output:
  html_document:
    code_download: yes
    fontsize: 8pt
    highlight: textmate
    number_sections: yes
    theme: flatly
    toc: yes
    toc_float:
      collapsed: no
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.width=5)
knitr::opts_chunk$set(fig.height=3.75)
knitr::opts_chunk$set(fig.align='center') 
```

# What does Linearity & Non-Linearity mean?
- The majority of psychologists often use these terms to describe the relationship between Y~X
- Linearity = mathematical relationship between variables can be fit with a *straight* line
- Non-Linear (Curvilinear) = mathematical relationship between variables can be fit with *non-straight* (**Curvy**) lines
- However, be careful as for other fields or specialty areas of psychology these terms can mean different things: When people say non-linear, as them what they mean
- We will use non-linear to describe the mathematical relationship between variables 

# Non-Linear Fit Types that can be tranformed for linear regression (basic types)
- Power & Orthogonal Polynomials
- Growth:  Exponential, Power, logarithmic  
- Rates: Reciprocal
- Correlations


## Power Polynomials 
- Models that simply curves, such as quadratic or cubic effects  
- Linear: $Y =B_{1}X + B_0$
- Quadratic: $Y = B_{1.2}X + B_{2.1}X^2 + B_0$
- Cubic: $Y = B_{1.23}X + B_{2.13}X^2 + B_{3.12}X^3 + B_0$
- Let simulate a relationship ExamScore ~ Anxiety level 
- Our relationship will be defined as: $Y = 17.5X -2.5X^2 + 40 + \epsilon$
- Error term, $\epsilon = $ Random Normal with Mean[0] & SD[10]


```{r, echo=TRUE, warning=FALSE}
library(car) #graph data
set.seed(42)
# 250 people
n <- 250
# Uniform distrobution of likert scores between 0 to 7 (0 = No anxiety)
x <- runif(n, 0, 7)
# Our equation to  create Y
y <- -2.5*x^2 + 17.5*x+40+rnorm(n, sd=10)
#Built our data frame
Quad.Data<-data.frame(Anxiety=x,ExamScore=y)

# Plot: Notes: reg.line=FALSE means remove best fit line. 
# smoother=loessLine means add smoothed line called the loess
scatterplot(ExamScore~Anxiety, data= Quad.Data, reg.line=FALSE, smoother=loessLine)

```

### The loess Line
- **Lo**cally W**e**ighted **S**catterplot **S**moothing
- Does lots of regressions on small sections of the scatter plot to get a best fit (curvy) line
- Lets you visualize complex relationships, but it can *overfit* the relationship (so be careful)
- Good diagnostic tool, but rarely used in papers 

## Process of fitting Power Polynomials 
- Forward Selection Approach
- Model 1: Linear fit
- Model 2: Quadratic fit
- Model i: keep going up until you are satisfied 
- Check the change in $R^2$ and select the best fit models
- Note: Higher order terms **ALWAYS** increase $R^2$, you need to make sure it is both a significant improvement and meaningful improvement 
- in R there are two ways to add Power Polynomials: *I(Term^2)* or using the *poly* command
```{r, echo=TRUE, warning=FALSE}
#Linear model
Model.1<-lm(ExamScore~Anxiety, data= Quad.Data)
Model.2<-lm(ExamScore~Anxiety+I(Anxiety^2), data= Quad.Data)

anova(Model.1,Model.2)
```


```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
library(stargazer)
stargazer(Model.1,Model.2,type="html",
          column.labels = c("Linear", "Quadratic"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)

```

- The change in $R^2$ = `r summary(Model.2)$r.square - summary(Model.1)$r.square` and the we can see that the change was significant
- Would the third order term be any better? 

```{r, echo=TRUE, warning=FALSE}
#Linear model
Model.3<-lm(ExamScore~Anxiety+I(Anxiety^2)+I(Anxiety^3), data= Quad.Data)

anova(Model.1,Model.2,Model.3)
```

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
library(stargazer)
stargazer(Model.1,Model.2,Model.3,type="html",
          column.labels = c("Linear", "Quadratic","Cubic"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)
```

- So second order power polynomial was our best fit

### Lets examine Model 2 in detail
- Since we simulated our model we know what the answers **should** be
- Simulation: $Y = 17.5X -2.5X^2 + 40 + \epsilon$
- Lets get our terms from our regression:
```{r, echo=TRUE}
LT<-round(Model.2$coefficients[2],2)
QT<-round(Model.2$coefficients[3],2)
IT<-round(Model.2$coefficients[1],2)
```
- Regression: $Y =$ `r LT`$X$ `r QT`$X^2 +$ `r IT` $+ \epsilon$
- Our regression did a good job, but what if our sample was smaller?
- Let redo our analysis with n = 5 to 60 and see how the terms change [see simulation in separate R file]
- the gray band = 95% CI
![](RegressionClass/L5_NLM/Simplot1.jpeg)

- Its important to know that in small samples polynomial estimates can be very unstable
- if you flip the quadratic term the line will flip direction

### How to plot this function (and not the loess line)
- Literally you can solve the regression equation for different values of x
- in R you can call the effects package to do work for you


```{r, echo=TRUE,message=FALSE, warning=FALSE}
library(effects)
plot(effect("Anxiety", Model.2,
            xlevels=list(Anxiety=0:7)))

```

## Orthogonal Polynomials
- Problem with power polynomials is interpreting them independently
- The higher order terms depend on the lower order terms
- Thus linear term can be significant cause the higher order term is significant, but actually there is no linear trend in the data (just like in today's example)
- This is because power polynomials correlate with each other (they are are not unique predictors)
- For Example:

```{r, echo=TRUE, warning=FALSE}
LinearTerms<-c(1,2,3,4,5,6,7)
QuadTerms<-LinearTerms^2
round(cor(LinearTerms,QuadTerms),3)

```

- Thus they are basically multicollinear
- lets run our VIF to verify

```{r, echo=TRUE, warning=FALSE}
vif(Model.2)
```

- Technically, its OK that they are multicollinear but you cannot interpret the Pvalues on each term
- The solution to make the linear and quadratic terms correlate with each at zero (just as we did with ANOVA)
- we can use the poly code in R. 
- Here is an example of what R is going to do: 


```{r, echo=TRUE, warning=FALSE}
LinearTerms<-c(1,2,3,4,5,6,7)

#make them Orthogonal
O.Poly<-poly(LinearTerms,2,simple = TRUE)
round(O.Poly,5)

#run our correlation
round(cor(O.Poly[,1],O.Poly[,2]),3)
```


```{r, echo=TRUE, warning=FALSE}
#lets make a new poly vector
Quad.Data$Anxiety.poly.1<-poly(Quad.Data$Anxiety,1,simple = TRUE)[,1]
Quad.Data$Anxiety.poly.2<-poly(Quad.Data$Anxiety,2,simple = TRUE)[,2]

Model.1.O<-lm(ExamScore~Anxiety.poly.1, data= Quad.Data)
Model.2.O<-lm(ExamScore~Anxiety.poly.1+Anxiety.poly.2, data= Quad.Data)

anova(Model.1.O,Model.2.O)
```

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
stargazer(Model.1.O,Model.2.O,type="html",
          column.labels = c("Linear.Ortho", "Quadratic.Ortho"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)

```

- Notice that the linear term is not significant, but the quadratic is
- BUT the actual estimate of the slopes are nonsensical
- this is because you have **rescaled** the xvalues in a funny way
- But no more multicollinearity
- lets run our VIF to verify

```{r, echo=TRUE, warning=FALSE}
vif(Model.2.O)
```

- Also the $R^2$ do not differ between the power or orthogonal polynomials
- Other pros of using orthogonal polynomials will come next week when we start looking at interactions

### Coding short cut (Use this for your analysis)
- You don't need to premake the vector
- You can put the poly code right into the model
- but the labeling will change
- Also much easier to makes plots of the final model this way

```{r, echo=TRUE, warning=FALSE}
Model.2.O.S<-lm(ExamScore~poly(Anxiety,2), data= Quad.Data)

```

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
stargazer(Model.2.O.S,type="html",
          column.labels = c("Quadratic.Ortho"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)

```



```{r, echo=TRUE,message=FALSE, warning=FALSE}

plot(effect("poly(Anxiety,2)", Model.2.O.S,
           xlevels=list(Anxiety=0:7)))

```

- Also, you could have said poly(Anxiety,2,raw=TRUE) to get power polynomial on the fly


## Diagnotics

- When you have the **wrong** number of polynomials you will get funny looking residuals
- Here is when we use 1, when we need 2
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(Model.1, which=1)

```

- When you have the **right** number of polynomials you will get correct looking residuals
- Here is when we use 2, when we need 2
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(Model.2.O.S, which=1)
```

# Dealing Growth models 
- These are models that cannot be fit with polynomials
- Exponential growth: $Y = ce^{dx}$
- Severity of panic attack (0 to 500) predicted by anxiety level
- Exponential Example: $Y = .09e^{1.2x+\epsilon}$
- Note: Euler's number, $e =$ `r  exp(1)`, is an irrational math constant and the base value of the natural log
```{r, echo=TRUE, warning=FALSE}
set.seed(42)
n <- 250
x <- runif(n, 0, 7)
#lets make our e value (not the error term)
e<-exp(1)
y <-.09*e^(1.20*(x+rnorm(n, sd=.25)))
Exp.Data<-data.frame(Anxiety=x,PanicAttack=y)
scatterplot(PanicAttack~Anxiety, data= Exp.Data, reg.line=FALSE, smoother=loessLine)

```

## Do Polynomials fail us?
- Lets try a second order fit since it one curve
```{r, echo=TRUE, warning=FALSE}
Model.Panic.Poly<-lm(PanicAttack~poly(Anxiety,2), data= Exp.Data)

```

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
stargazer(Model.Panic.Poly,type="html",
          column.labels = c("Quadratic.Ortho"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)

```

- Using the poly right into the package can be useful for the effects package because we can now view independent impact of the linear and quadratic terms

```{r, echo=TRUE,message=FALSE, warning=FALSE}

plot(effect("poly(Anxiety,2)", Model.Panic.Poly,
           xlevels=list(Anxiety=0:7)))

```
- Yikes, based on this you might tell people having zero anxiety is worse then having lower anxiety

- Let check the residuals
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(Model.Panic.Poly, which=1)
```

- Also that looks pretty odd

## Transformations of Exponential Model
- The solution to this problem is often to transform the offending variable
- The type of transform depends on the type of growth model you have
- Exponential model: Dependent variable = log(y) [in r, log = natural log]
- Thus: $log(Y) = B_0 + BX$
- And to convert from log into meaningful values:
- $Y = e^{B_0 + BX}$
- Lets look at Log(Y) by Anxiety

```{r, echo=TRUE, warning=FALSE}
scatterplot(log(PanicAttack)~Anxiety, data= Exp.Data)
```

- Looks like a line now
- Lets look at the model fit
```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.Panic.Log<-lm(log(PanicAttack)~Anxiety, data= Exp.Data)
stargazer(Model.Panic.Log,type="html",
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)

```

- Let plot the fit
```{r, echo=TRUE,message=FALSE, warning=FALSE}

plot(effect("Anxiety", Model.Panic.Log,
           xlevels=list(Anxiety=0:7)))

```

- or you plot it non-log units
- You have to tell the package to covert it over
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(effect("Anxiety", Model.Panic.Log,
          transformation=list(link=log, inverse=exp)))

```


- Let plot the fit
- Let check the residuals

```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(Model.Panic.Log, which=1)
```

- Those look as they should

### Challange of Transforms
- Problem is our scores are in log (not raw score)
- to convert from log into meaningful values:
- $Y = e^{B_0 + BX}$
```{r, echo=TRUE}
#intercept
IT.L<-round(Model.Panic.Log$coefficients[1],2)
#slope
LT.L<-round(Model.Panic.Log$coefficients[2],2)
```
- $Y = e$^(`r IT.L` + `r LT.L`X)
- For Anxiety score of X = 0, Y = `r exp(1)^(IT.L + LT.L*0)`
- For Anxiety score of X = 4, Y = `r exp(1)^(IT.L + LT.L*4)`
- For Anxiety score of X = 7, Y = `r exp(1)^(IT.L + LT.L*7)`
- [Note: Be careful about which log you use (natural or base 10, you can use either but make sure to specify)]

## Power law
- Common function in motor control and perception experiments
- $Y = cX^d$
- Severity of sweating (0 to 100) predicted by anxiety level
- Exponential Example: $Y = .07(X)^{1.7}+\epsilon$
- Note: Natural log of 0 =  -inf, so if you have zeros you need to always add 1
```{r, echo=TRUE, warning=FALSE}
set.seed(42)
n <- 250
x <- runif(n, 0, 7)+1
y <-.07*(x)^1.7
y = y * (.05* (5+rnorm(n, sd=.25)))
Power.Data<-data.frame(Anxiety=x,Sweating=y)
scatterplot(Sweating~Anxiety, data= Power.Data, reg.line=FALSE, smoother=loessLine)

```

### Transformations of Power Law
- The solution to this problem is often to transform the offending variable
- Power law model: DV and IV get logged: log(y) & log(x) 
- Thus: $log(Y) = B_0 + B(logX)$
- And to convert from log into meaningful values:
- $Y = e^{B_0 + B(logX)}$
- Lets look at the log-log scatterplot
```{r, echo=TRUE, warning=FALSE}
scatterplot(log(Sweating)~log(Anxiety), data= Power.Data)
Model.Sweat.LL<-lm(log(Sweating)~log(Anxiety), data= Power.Data)

```
-Looks like a line now
-Lets look at the model fit
```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
stargazer(Model.Sweat.LL,type="html",
          column.labels = c("Log-Log Transform"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)

```

- Let plot the fit and tell the package that you did a transform
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(effect("Anxiety", Model.Sweat.LL,
           transformation=list(link=log, inverse=exp)))
```

- Let check the residuals
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(Model.Sweat.LL, which=1)
```


- They look OK


## Logarithmic 
- These functions are often physical (sound intensity, earthquakes, etc)
- $c^Y = dX$
- Breathing rate (in ms between breaths) predicted by anxiety level
- Logarithmic Example: $Y = .05(X*\epsilon)^{3.8}$


```{r, echo=TRUE, warning=FALSE}
set.seed(42)
n <- 250
x <- runif(n, 0, 7)+1
y <-200*log(x*(5+rnorm(n, sd=1)))
Logarithmic.Data<-data.frame(Anxiety=x,Breathing=y)
scatterplot(Breathing~Anxiety, data= Logarithmic.Data, reg.line=FALSE, smoother=loessLine)

```
```{r, echo=TRUE, warning=FALSE}
scatterplot(Breathing~log(Anxiety), data= Logarithmic.Data)
Model.Breath.Log<-lm(Breathing~log(Anxiety), data= Logarithmic.Data)

```

-Looks like a line now
-Lets look at the model fit

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
stargazer(Model.Breath.Log,type="html",
          column.labels = c("Log Transform"),
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE)

```

- Let plot the fit 
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(effect("Anxiety", Model.Breath.Log))
```

- Let check the residuals
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(Model.Breath.Log, which=1)
```


## Reciprocals
- Common fix for rate data (things involving time)
- $1/Y$
- Adrenaline level in the blood predicted by mins after a panic attack (mins)
- Reciprocal Example: $Y = (1/X)*\epsilon$

```{r, echo=TRUE, warning=FALSE}
set.seed(42)
n <- 250
# Length of panic attack
x <- 15*(runif(n, 1, 4))
y <- 1/(x)*rnorm(n,5,sd=.1)
Reciprocal.Data<-data.frame(PostPanic=x,Adrenaline=y)
scatterplot(Adrenaline~PostPanic, data= Reciprocal.Data, reg.line=FALSE, smoother=loessLine)

```
```{r, echo=TRUE, warning=FALSE}
scatterplot(1/Adrenaline~PostPanic, data= Reciprocal.Data)
```

- Looks like a line now
- Lets look at the model fit

```{r, echo=TRUE, warning=FALSE,message=FALSE,results='asis'}
Model.Calm.Rep<-lm(1/Adrenaline~PostPanic, data= Reciprocal.Data)
stargazer(Model.Calm.Rep,type="latex",
          intercept.bottom = FALSE,
          single.row=FALSE, 
          notes.append = FALSE,          
          header=FALSE)

```

- Let plot the fit 
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(effect("PostPanic", Model.Calm.Rep))
```

- Let check the residuals
```{r, echo=TRUE,message=FALSE, warning=FALSE}
plot(Model.Calm.Rep, which=1)
```

- A little off (but I add in noise in a simple way)

## Wierd Bulges in the data (not full on curves)
- Data do not always fit our nice predefined functions
- Sometimes they clearly have **bulges**, but are we sure those are real?
- Bulges can be not real in small samples or can results from some latent factor you are not accounting for
- Lets assume that if the data has a bulge that is real, meaningful, and we need to account for it
- We can use the Box-Cox Transform (fits different polynomials which we call $\lambda$)
- $Y = Y^\lambda - 1 / \lambda$, where $\lambda \neq 0$ OR $Y = lnY$, where $\lambda = 0$
- In this case, we will allow $\lambda$ to be selected based on which makes the data most normal
- Lets make a case with a weird power polynomial of 1.3 (something you would not notice in your raw data)
```{r, echo=TRUE,message=FALSE, warning=FALSE}


set.seed(42)
# 250 people
n <- 250
x <- runif(n, 0, 7)
y <- -4*x^1.3+ 20*x+rnorm(n, sd=2)+20
SortaQuad.Data<-data.frame(Anxiety=x,ExamScore=y)
scatterplot(ExamScore~Anxiety, data= SortaQuad.Data, reg.line=FALSE, smoother=loessLine)

```

- Lets try a linear model

```{r, echo=TRUE,message=FALSE, warning=FALSE}
Model.SortaQuad<-lm(ExamScore~Anxiety, data= SortaQuad.Data)
plot(Model.SortaQuad, which=1)

```

- Seems there is a slight curve in our data

```{r, echo=TRUE,message=FALSE, warning=FALSE}
library(MASS)
bc<-boxcox(ExamScore ~Anxiety, data = SortaQuad.Data,
       lambda = seq(-2, 2, len = 20))
#we can extract the lambda
(lambda <- bc$x[which.max(bc$y)])
```

- Lets plot the scatter and residuals from our transformed DV
```{r, echo=TRUE,message=FALSE, warning=FALSE}
scatterplot(ExamScore^lambda~Anxiety, data= SortaQuad.Data, reg.line=FALSE, smoother=loessLine)
Model.SortaQuad.BC<-lm(ExamScore^lambda ~ Anxiety, data= SortaQuad.Data)
plot(Model.SortaQuad.BC, which=1)

```

### Some issues
- Box-Cox is guess and requires some fine turning (like adjusting the range of lambdas)
- You have to keep track of your zeros in the DV (so you can add constants to fix it)
- Most importantly, you have make it harder to make sense of the DV (but your fits will be far better using it)


# Doing regression with correlation values as the DV
- You should not just use the correlation values straight away
- Correlations are NOT normal, but luckily Fisher give us a fix
- Fisher's *r* to Z
- You convert your correlation values to Z scores, do your analysis, predict your values and convert them back to r values for graphing
- see Demos, et al (2012) for an example.
- Functions below you can use to transform

```{r, echo=TRUE,message=FALSE, warning=FALSE}
FisherRtoZ <-function (z) {
  (exp(2 * z) - 1)/(1 + exp(2 * z))
}

FisherZtoR <-function (z) {
  tanh(z)
}

```



<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-90415160-1', 'auto');
  ga('send', 'pageview');

</script>