EDUC152, week 9 homework

Grade: /10

Overview

The purpose of this short exercise is to give you some practice with the basics of multivariate regression. In this exercise you will:

Run a regression model in R (code provided)

Write out the population linear regression model

Write out the OLS prediction line

with estimates

without estimates

Interpret value of regression coefficients

Calculate predicted values

Load libraries and dataset

# remove scientific notation
options(scipen=999)

##########
########## Libraries
##########

library(tidyverse)
library(labelled)
library(haven)

##########
########## ELS:2002 data
##########

# RUN SCRIPT THAT CREATES STUDENT-LEVEL DATA FRAME CONTAINING ALL VARIABLES AND CREATES DATA FRAME WITH A SUBSET OF VARIABLES
#NOTE: this script will take 30 seconds to a minute to run because loading a dataset w/ about 16,000 observations and 4,000 variables from a website
source(file = url('https://github.com/anyone-can-cook/educ152/raw/main/scripts/els/read_els_by_pets.R'))
#source(file = file.path('.','..','..','scripts','els','read_els_by_pets.R'))
#list.files(path = file.path('.','..','..','scripts','els'))

# Create a dataframe df_els_stu_fac that has categorical variables as factor class variables rather than labelled class variables
df_els_stu_fac <- as_factor(df_els_stu, only_labelled = TRUE) %>%
  # create a version of parent income that is in $1000s
  mutate(parent_income000 = parent_income/1000)

# convert continuous variables we know we want numeric back to numeric
for (v in c('bytxmstd','bytxrstd','f1txmstd','f3stloanamt','f3stloanpay','f3ern2011','f3tzrectrans','f3tzreqtrans','f3tzschtotal')) {
  df_els_stu_fac[[v]] <- df_els_stu[[v]]
}

Run descriptive statistics and regression

Variables

The dependent variable is high school reading test score, bytxrstd

variable label: Reading test standardized score

The continuous independent variable is parent_income000

variable label: continuous measure of base year parental household income, calculated from categorical variable byincome

Note. This is parent income in $thousands (e.g., value of 62.5 refers to $62,500)

so a “one-unit” increase in this variable would be a $1,000 increase in parent income

The categorical independent variable is school control (i.e., public school, Catholic private school, or non-Catholic private school), bysctrl

variable label: School control

Your job in this section is just to run the provided code

Descriptive statistics about the variables in the model

df_els_stu_fac %>% select(bytxrstd,parent_income000,bysctrl) %>% glimpse()
#> Rows: 8,910
#> Columns: 3
#> $ bytxrstd         <dbl+lbl> 56.70, 64.46, 48.69, 33.53, 40.80, 41.05, 56.33,…
#> $ parent_income000 <dbl> 87.5, 62.5, 0.5, 17.5, 62.5, 3.0, 30.0, 30.0, 150.0,…
#> $ bysctrl          <fct> Public, Public, Public, Public, Public, Public, Publ…

# dependent variable
df_els_stu_fac %>% summarize(
  mean_read_score = mean(bytxrstd, na.rm = TRUE),
  sd_read_score = sd(bytxrstd, na.rm = TRUE)
)
#> # A tibble: 1 x 2
#>   mean_read_score sd_read_score
#>             <dbl>         <dbl>
#> 1            53.3          9.30

# continuous independent variable
df_els_stu_fac %>% summarize(
  mean_parent_income = mean(parent_income000, na.rm = TRUE),
  sd_parent_income = sd(parent_income000, na.rm = TRUE)
)
#> # A tibble: 1 x 2
#>   mean_parent_income sd_parent_income
#>                <dbl>            <dbl>
#> 1               71.3             58.2

# categorical independent variable
df_els_stu_fac %>% count(bysctrl)
#> # A tibble: 3 x 2
#>   bysctrl           n
#>   <fct>         <int>
#> 1 Public         6486
#> 2 Catholic       1465
#> 3 Other private   959

# categorical independent variable, showing value of underlying integer values
df_els_stu_fac %>% count(as.integer(bysctrl))
#> # A tibble: 3 x 2
#>   `as.integer(bysctrl)`     n
#>                   <int> <int>
#> 1                     1  6486
#> 2                     2  1465
#> 3                     3   959

Run regression model

mod1 <- lm(formula = bytxrstd ~ parent_income000 + bysctrl, data = df_els_stu_fac %>% filter(f2enroll0506=='yes'))
summary(mod1)
#> 
#> Call:
#> lm(formula = bytxrstd ~ parent_income000 + bysctrl, data = df_els_stu_fac %>% 
#>     filter(f2enroll0506 == "yes"))
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -30.2574  -5.6841   0.1432   6.0089  25.5002 
#> 
#> Coefficients:
#>                       Estimate Std. Error t value             Pr(>|t|)    
#> (Intercept)          51.167236   0.167530 305.421 < 0.0000000000000002 ***
#> parent_income000      0.033481   0.001762  19.004 < 0.0000000000000002 ***
#> bysctrlCatholic       1.980870   0.270856   7.313    0.000000000000288 ***
#> bysctrlOther private  2.300603   0.329044   6.992    0.000000000002954 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 8.69 on 7315 degrees of freedom
#> Multiple R-squared:  0.07423, Adjusted R-squared:  0.07385 
#> F-statistic: 195.5 on 3 and 7315 DF,  p-value: < 0.00000000000000022

Questions for you to answer

/3

1. Write out the population linear regression model (make sure to define which variable (e.g., “parental income”) is associated with which $X_{ki}$ in the model; and define unit of analysis if relevant).

YOUR ANSWER HERE:

Population linear regression model (multivariate regression)

$Y_i = \beta_0 + \beta_1 X_{1i}+ \beta_2 X_{2i}+ \beta_3 X_{3i}+ u_i$, where:

$Y_i$: is high school reading test score for student $i$

$X_{1i}$: (continuous) is parental income in $1,000s

$X_{2i}$: (dichotomous) student $i$ attends a Catholic private school

$X_{3i}$: (dichotomous) student $i$ attends a non-Catholic private school

$u_i$ is all variables that affect $Y_i$ but were not included in our model
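The two dichotomous variables $X_{2i}$ and $X_{3i}$ come from the single three-category factor bysctrl, with "Public" (the first factor level) serving as the reference group. The sketch below illustrates this with a small made-up data frame (the `toy` values are hypothetical, not ELS:2002 data); `model.matrix()` displays the columns R would construct from the same formula.

```r
# Toy data (hypothetical values, NOT from ELS:2002) with the same three
# school-control categories, in the same level order as bysctrl
toy <- data.frame(
  parent_income000 = c(30, 62.5, 150),
  bysctrl = factor(c("Public", "Catholic", "Other private"),
                   levels = c("Public", "Catholic", "Other private"))
)

# model.matrix() shows the design matrix built from the right-hand side
# of the regression formula: an intercept column, the continuous X1,
# and one 0/1 indicator column for each non-reference category
X <- model.matrix(~ parent_income000 + bysctrl, data = toy)
print(X)
```

The public-school row has 0 in both indicator columns, which is why no separate coefficient for public schools appears in the regression output.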

/2

2. Write out the OLS prediction line without estimate values and write out the OLS prediction line with estimate values.

YOUR ANSWER HERE:

OLS prediction line without estimates

$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1}X_{1i} + \hat{\beta_2} X_{2i}+ \hat{\beta_3} X_{3i}$

OLS prediction line with estimates

$\hat{Y_i} =$ 51.17 + 0.03 $\times X_{1i}$ + 1.98 $\times X_{2i}$ + 2.3 $\times X_{3i}$

/3

3. Interpret the value of regression coefficients $\hat{\beta_1}$, $\hat{\beta_2}$, and $\hat{\beta_3}$ in words.

YOUR ANSWER HERE:

$X_{1i}$ (parental income in $ thousands); $\hat{\beta_1}=$ 0.03

On average, a one thousand dollar increase in parental income is associated with a 0.03 point increase in high school reading test score, holding the value of all other $X$ variables constant.

$X_{2i}$ (attends a Catholic private school); $\hat{\beta_2}=$ 1.98

Attending a Catholic private school as opposed to a public school is, on average, associated with a 1.98 point increase in high school reading test score, holding the value of all other $X$ variables constant.

$X_{3i}$ (attends a non-Catholic private school); $\hat{\beta_3}=$ 2.3

Attending a non-Catholic private school as opposed to a public school is, on average, associated with a 2.3 point increase in high school reading test score, holding the value of all other $X$ variables constant.
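One way to check these interpretations is to compare predicted values for two students who differ in only one $X$ variable. The sketch below hard-codes the estimates from the regression output shown earlier; the helper `predict_read()` and the vector `b` are hypothetical names introduced only for this illustration.

```r
# Coefficient estimates copied from the summary(mod1) output
b <- c(intercept = 51.167236, income = 0.033481,
       catholic = 1.980870, other_private = 2.300603)

# Hypothetical helper: predicted reading score for given values of X1, X2, X3
predict_read <- function(income000, catholic = 0, other_private = 0) {
  unname(b["intercept"] + b["income"] * income000 +
           b["catholic"] * catholic + b["other_private"] * other_private)
}

# Same parental income, Catholic vs. public school: the gap is beta2-hat
gap_catholic <- predict_read(62.5, catholic = 1) - predict_read(62.5)

# Same school type, parental incomes $1,000 apart: the gap is beta1-hat
gap_income <- predict_read(63.5) - predict_read(62.5)

print(c(gap_catholic = gap_catholic, gap_income = gap_income))
```

The income value 62.5 is arbitrary. Because the model is linear with no interaction terms, the same gaps result at any income level.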

/1

4. Interpret the value of the regression coefficient $\hat{\beta_0}$ in words.

YOUR ANSWER HERE:

$\hat{\beta_0}=$ 51.17

The estimated average high school reading test score for a student with parental income of 0 who attended a public high school is 51.17.

/1

5. Calculate the predicted high school reading test score for a student who attended a non-Catholic private school and who has parent_income000 = 150 (i.e., $150,000); show your work

YOUR ANSWER HERE:

$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1}\times 150 + \hat{\beta_2} \times 0 + \hat{\beta_3} \times 1$

$\hat{Y_i} =$ 51.17 + 0.03 $\times 150$ + 2.3 $\times 1=$ 57.97
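The calculation above uses coefficients rounded to two decimal places. As a rough check, the sketch below repeats the arithmetic with both the rounded values and the full-precision estimates from the regression output. The vector names are hypothetical and introduced only for this illustration.

```r
# Order of elements: intercept, parent_income000, bysctrlCatholic, bysctrlOther private
b_rounded <- c(51.17, 0.03, 1.98, 2.3)                   # rounded, as in the answer above
b_full    <- c(51.167236, 0.033481, 1.980870, 2.300603)  # copied from summary(mod1)

# X values for this student: 1 for the intercept, income of 150 (in $1000s),
# not Catholic (0), non-Catholic private (1)
x <- c(1, 150, 0, 1)

yhat_rounded <- sum(b_rounded * x)  # 57.97, matching the hand calculation
yhat_full    <- sum(b_full * x)     # about 58.49

print(c(rounded = yhat_rounded, full = yhat_full))
```

The two results differ by roughly half a point because rounding $\hat{\beta_1}$ from 0.033481 to 0.03 is multiplied by 150. With the fitted model object available, `predict(mod1, newdata = data.frame(parent_income000 = 150, bysctrl = "Other private"))` should give the full-precision value.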

Knit to html and submit exercise

Knit to html by clicking the “Knit” button near the top of your RStudio window (icon with blue yarn ball) or drop down and select “Knit to HTML”

Go to the class website and under the “Readings & Assignments” >> “Week 9” tab, click on the “Short exercise submission link”

Submit both your html and .Rmd files

Use this naming convention “lastname_firstname_se” for your .Rmd (e.g. martin_patricia_se.Rmd)
