Grade: /10

Overview

The purpose of this short exercise is to give you some practice with the basics of multivariate regression. In this exercise you will:

  • Run a regression model in R (code provided)
  • Write out the population linear regression model
  • Write out the OLS prediction line
    • with estimates
    • without estimates
  • Interpret the values of the regression coefficients
  • Calculate predicted values

Load libraries and dataset

# remove scientific notation
options(scipen=999)

##########
########## Libraries
##########

  library(tidyverse)
  library(labelled)
  library(haven)

##########
########## ELS:2002 data
##########

# RUN SCRIPT THAT CREATES STUDENT-LEVEL DATA FRAME CONTAINING ALL VARIABLES AND CREATES DATA FRAME WITH A SUBSET OF VARIABLES

  #NOTE: this script takes 30 seconds to a minute to run because it loads a dataset with about 16,000 observations and 4,000 variables from a website

  source(file = url('https://github.com/anyone-can-cook/educ152/raw/main/scripts/els/read_els_by_pets.R'))
    #source(file = file.path('.','..','..','scripts','els','read_els_by_pets.R'))
      #list.files(path = file.path('.','..','..','scripts','els'))

# Create a dataframe df_els_stu_fac that has categorical variables as factor class variables rather than labelled class variables
  df_els_stu_fac <- as_factor(df_els_stu, only_labelled = TRUE) %>%
    # create a version of parent income that is in $1000s
    mutate(parent_income000 = parent_income/1000)
  # convert continuous variables we know we want numeric back to numeric
  for (v in c('bytxmstd','bytxrstd','f1txmstd','f3stloanamt','f3stloanpay','f3ern2011','f3tzrectrans','f3tzreqtrans','f3tzschtotal')) {
    df_els_stu_fac[[v]] <- df_els_stu[[v]]  
  }
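
Why the loop above is needed can be seen in a minimal sketch. It uses a hypothetical toy tibble (the names toy, toy_fac, sector, and score are made up and are not part of the ELS data) and assumes the haven and tibble packages are installed. as_factor() with only_labelled = TRUE converts every labelled column to a factor, including a test score that carries a label only for a missing-value code, so the numeric version has to be copied back from the original data frame.

```r
library(haven)

# Hypothetical toy data standing in for df_els_stu (not the real ELS file)
toy <- tibble::tibble(
  sector = labelled(c(1, 2, 1), labels = c(Public = 1, Catholic = 2)),
  score  = labelled(c(56.70, 64.46, -8), labels = c(Missing = -8))
)

# Every labelled column becomes a factor, including the continuous score
toy_fac <- as_factor(toy, only_labelled = TRUE)
print(class(toy_fac$score))

# Copy the original labelled-numeric column back, as the loop above does
for (v in c("score")) {
  toy_fac[[v]] <- toy[[v]]
}
print(class(toy_fac$sector))
print(typeof(toy_fac$score))
```

After the loop, sector remains a factor while score is stored as a double again, which is what allows functions such as mean() and lm() to treat it as a number.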

Run descriptive statistics and regression

Variables

  • The dependent variable is high school reading test score, bytxrstd
    • variable label: Reading test standardized score
  • The continuous independent variable is parent_income000
    • variable label: continuous measure of base year parental household income, calculated from categorical variable byincome
    • Note. This is parent income in $thousands (e.g., value of 62.5 refers to $62,500)
      • so a “one-unit” increase in this variable would be a $1,000 increase in parent income
  • The categorical independent variable is school control, bysctrl (public school, Catholic private school, or non-Catholic private school)
    • variable label: School control
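
The note on units for parent_income000 can be illustrated with a short sketch. It uses hypothetical toy data (the names toy, income_dollars, income000, and score, and all values, are made up for illustration and are not from ELS). Under that assumption, it shows that measuring income in $1,000s rather than dollars multiplies the slope by 1,000 and leaves the intercept unchanged.

```r
# Hypothetical toy data; values are made up for illustration only
toy <- data.frame(
  income_dollars = c(17500, 30000, 62500, 87500, 150000),
  score          = c(45, 50, 52, 55, 60)
)

# Same variable, rescaled to $1,000s (as parent_income000 is)
toy$income000 <- toy$income_dollars / 1000

fit_dollars <- lm(score ~ income_dollars, data = toy)
fit_000     <- lm(score ~ income000, data = toy)

slope_dollars <- unname(coef(fit_dollars)["income_dollars"])
slope_000     <- unname(coef(fit_000)["income000"])

# The slope per $1,000 is 1,000 times the slope per $1
print(c(per_dollar = slope_dollars, per_thousand = slope_000))
```

The fitted values are identical in both versions; only the unit attached to "a one-unit increase" changes.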

Your job in this section is just to run the provided code

  • Descriptive statistics about the variables in the model
df_els_stu_fac %>% select(bytxrstd,parent_income000,bysctrl) %>% glimpse()
#> Rows: 8,910
#> Columns: 3
#> $ bytxrstd         <dbl+lbl> 56.70, 64.46, 48.69, 33.53, 40.80, 41.05, 56.33,…
#> $ parent_income000 <dbl> 87.5, 62.5, 0.5, 17.5, 62.5, 3.0, 30.0, 30.0, 150.0,…
#> $ bysctrl          <fct> Public, Public, Public, Public, Public, Public, Publ…

# dependent variable
df_els_stu_fac %>% summarize(
  mean_read_score = mean(bytxrstd, na.rm = TRUE),
  sd_read_score = sd(bytxrstd, na.rm = TRUE)
)
#> # A tibble: 1 x 2
#>   mean_read_score sd_read_score
#>             <dbl>         <dbl>
#> 1            53.3          9.30

# continuous independent variable
df_els_stu_fac %>% summarize(
  mean_parent_income = mean(parent_income000, na.rm = TRUE),
  sd_parent_income = sd(parent_income000, na.rm = TRUE)
)
#> # A tibble: 1 x 2
#>   mean_parent_income sd_parent_income
#>                <dbl>            <dbl>
#> 1               71.3             58.2

# categorical independent variable
df_els_stu_fac %>% count(bysctrl)
#> # A tibble: 3 x 2
#>   bysctrl           n
#>   <fct>         <int>
#> 1 Public         6486
#> 2 Catholic       1465
#> 3 Other private   959

# categorical independent variable, showing the underlying integer codes
df_els_stu_fac %>% count(as.integer(bysctrl))
#> # A tibble: 3 x 2
#>   `as.integer(bysctrl)`     n
#>                   <int> <int>
#> 1                     1  6486
#> 2                     2  1465
#> 3                     3   959
  • Run regression model
mod1 <- lm(formula = bytxrstd ~ parent_income000 + bysctrl, data = df_els_stu_fac %>% filter(f2enroll0506=='yes'))

summary(mod1)
#> 
#> Call:
#> lm(formula = bytxrstd ~ parent_income000 + bysctrl, data = df_els_stu_fac %>% 
#>     filter(f2enroll0506 == "yes"))
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -30.2574  -5.6841   0.1432   6.0089  25.5002 
#> 
#> Coefficients:
#>                       Estimate Std. Error t value             Pr(>|t|)    
#> (Intercept)          51.167236   0.167530 305.421 < 0.0000000000000002 ***
#> parent_income000      0.033481   0.001762  19.004 < 0.0000000000000002 ***
#> bysctrlCatholic       1.980870   0.270856   7.313    0.000000000000288 ***
#> bysctrlOther private  2.300603   0.329044   6.992    0.000000000002954 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 8.69 on 7315 degrees of freedom
#> Multiple R-squared:  0.07423,    Adjusted R-squared:  0.07385 
#> F-statistic: 195.5 on 3 and 7315 DF,  p-value: < 0.00000000000000022
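
The coefficient names bysctrlCatholic and bysctrlOther private in the output above are built from the variable name plus a factor level. A minimal sketch of where they come from, using hypothetical toy data (the names toy, income, and sector and all values are made up) and assuming R's default treatment contrasts: model.matrix() displays the 0/1 indicator columns that lm() creates from a factor, with the first level (here Public) serving as the reference group that gets no column of its own.

```r
# Hypothetical toy data; values are made up for illustration only
toy <- data.frame(
  income = c(30, 62.5, 87.5, 150, 17.5, 62.5),
  sector = factor(
    c("Public", "Catholic", "Other private",
      "Public", "Catholic", "Other private"),
    levels = c("Public", "Catholic", "Other private")
  )
)

# model.matrix() shows the design matrix that lm() would build
# from the right-hand side of the formula
X <- model.matrix(~ income + sector, data = toy)
print(colnames(X))
print(X)
```

Rows for the reference level have 0 in both indicator columns, so each indicator coefficient is read as a difference relative to that reference group, holding the other variables constant.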

Questions for you to answer

/3

1. Write out the population linear regression model. Define which variable (e.g., “parental income”) corresponds to each \(X_{ki}\) in the model, and define the unit of analysis if relevant.

  • YOUR ANSWER HERE:

/2

2. Write out the OLS prediction line without estimate values and write out the OLS prediction line with estimate values.

YOUR ANSWER HERE:

/3

3. Interpret the value of regression coefficients \(\hat{\beta}_1\), \(\hat{\beta}_2\), and \(\hat{\beta}_3\) in words.

YOUR ANSWER HERE:

/1

4. Interpret the value of the regression coefficient \(\hat{\beta}_0\) in words.

YOUR ANSWER HERE:

/1

5. Calculate the predicted high school reading test score for a student who attended a non-Catholic private school and who has parent_income000 = 150 (i.e., $150,000). Show your work.

YOUR ANSWER HERE:
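
One possible way to check a hand-calculated predicted value is to compare it with predict(). The sketch below is only an illustration of the technique: it fits a model to hypothetical toy data (the names toy, fit, new_student, score, income, and sector and all values are made up), not to mod1, so it does not produce the answer to this question.

```r
# Hypothetical toy data; values are made up for illustration only
sector_levels <- c("Public", "Catholic", "Other private")
toy <- data.frame(
  score  = c(48, 52, 55, 57, 60, 63),
  income = c(20, 40, 60, 80, 100, 120),
  sector = factor(
    c("Public", "Public", "Catholic", "Catholic",
      "Other private", "Other private"),
    levels = sector_levels
  )
)

fit <- lm(score ~ income + sector, data = toy)
b <- coef(fit)

# Hypothetical student: income = 90 (i.e., $90,000), Catholic school
new_student <- data.frame(
  income = 90,
  sector = factor("Catholic", levels = sector_levels)
)

# By hand: intercept + slope * income + indicator coefficients * (0 or 1)
by_hand <- b["(Intercept)"] + b["income"] * 90 +
  b["sectorCatholic"] * 1 + b["sectorOther private"] * 0

# Same prediction computed by R
by_predict <- predict(fit, newdata = new_student)

print(c(by_hand = unname(by_hand), by_predict = unname(by_predict)))
```

The two numbers should agree up to rounding. The same pattern could be applied to mod1 by supplying parent_income000 and bysctrl in newdata, after the hand calculation has been written out.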

Knit to html and submit exercise

Knit to HTML by clicking the “Knit” button near the top of your RStudio window (the icon with a blue yarn ball), or open the drop-down menu next to it and select “Knit to HTML”.

  • Go to the class website and under the “Readings & Assignments” >> “Week 9” tab, click on the “Short exercise submission link”
  • Submit both your html and .Rmd files
  • Use the naming convention “lastname_firstname_se” for your .Rmd file (e.g., martin_patricia_se.Rmd)