Grade: /10
The purpose of this short exercise is to give you some practice with the basics of multivariate regression. In this exercise you will:
# remove scientific notation
options(scipen=999)
##########
########## Libraries
##########
library(tidyverse)
library(labelled)
library(haven)
##########
########## ELS:2002 data
##########
# Run the script that creates a student-level data frame containing all variables and a second data frame with a subset of variables
# NOTE: this script takes 30 seconds to a minute to run because it loads a dataset with about 16,000 observations and 4,000 variables from a website
source(file = url('https://github.com/anyone-can-cook/educ152/raw/main/scripts/els/read_els_by_pets.R'))
#source(file = file.path('.','..','..','scripts','els','read_els_by_pets.R'))
#list.files(path = file.path('.','..','..','scripts','els'))
# Create a dataframe df_els_stu_fac that has categorical variables as factor class variables rather than labelled class variables
df_els_stu_fac <- as_factor(df_els_stu, only_labelled = TRUE) %>%
  # create a version of parent income that is in $1000s
  mutate(parent_income000 = parent_income/1000)
# convert continuous variables we know we want numeric back to numeric
for (v in c('bytxmstd','bytxrstd','f1txmstd','f3stloanamt','f3stloanpay','f3ern2011','f3tzrectrans','f3tzreqtrans','f3tzschtotal')) {
  df_els_stu_fac[[v]] <- df_els_stu[[v]]
}
Variables
bytxrstd
parent_income000
bysctrl
Your job in this section is just to run the provided code
df_els_stu_fac %>% select(bytxrstd,parent_income000,bysctrl) %>% glimpse()
#> Rows: 8,910
#> Columns: 3
#> $ bytxrstd <dbl+lbl> 56.70, 64.46, 48.69, 33.53, 40.80, 41.05, 56.33,…
#> $ parent_income000 <dbl> 87.5, 62.5, 0.5, 17.5, 62.5, 3.0, 30.0, 30.0, 150.0,…
#> $ bysctrl <fct> Public, Public, Public, Public, Public, Public, Publ…
# dependent variable
df_els_stu_fac %>% summarize(
  mean_read_score = mean(bytxrstd, na.rm = TRUE),
  sd_read_score = sd(bytxrstd, na.rm = TRUE)
)
#> # A tibble: 1 x 2
#> mean_read_score sd_read_score
#> <dbl> <dbl>
#> 1 53.3 9.30
# continuous independent variable
df_els_stu_fac %>% summarize(
  mean_parent_income = mean(parent_income000, na.rm = TRUE),
  sd_parent_income = sd(parent_income000, na.rm = TRUE)
)
#> # A tibble: 1 x 2
#> mean_parent_income sd_parent_income
#> <dbl> <dbl>
#> 1 71.3 58.2
# categorical independent variable
df_els_stu_fac %>% count(bysctrl)
#> # A tibble: 3 x 2
#> bysctrl n
#> <fct> <int>
#> 1 Public 6486
#> 2 Catholic 1465
#> 3 Other private 959
# categorical independent variable, showing value of underlying integer values
df_els_stu_fac %>% count(as.integer(bysctrl))
#> # A tibble: 3 x 2
#> `as.integer(bysctrl)` n
#> <int> <int>
#> 1 1 6486
#> 2 2 1465
#> 3 3 959
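The integer codes above show that Public is the first level of the factor, which is why it acts as the reference category when the factor enters a regression. As a minimal sketch of that idea (using a made-up toy factor named `school`, not the ELS data), `model.matrix()` displays the 0/1 indicator columns that `lm()` builds from a factor:

```r
# Toy factor with the same three levels as bysctrl
# (the four values are made up purely for illustration)
school <- factor(
  c("Public", "Catholic", "Other private", "Public"),
  levels = c("Public", "Catholic", "Other private")
)

# The first level is the reference category
levels(school)[1]

# model.matrix() builds the design matrix lm() would use:
# an intercept column plus one indicator per non-reference level
X <- model.matrix(~ school)
colnames(X)
X
```

The column names are the variable name pasted onto the level name, which matches the pattern of the `bysctrlCatholic` and `bysctrlOther private` rows in regression output.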
mod1 <- lm(formula = bytxrstd ~ parent_income000 + bysctrl, data = df_els_stu_fac %>% filter(f2enroll0506=='yes'))
summary(mod1)
#>
#> Call:
#> lm(formula = bytxrstd ~ parent_income000 + bysctrl, data = df_els_stu_fac %>%
#> filter(f2enroll0506 == "yes"))
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -30.2574 -5.6841 0.1432 6.0089 25.5002
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 51.167236 0.167530 305.421 < 0.0000000000000002 ***
#> parent_income000 0.033481 0.001762 19.004 < 0.0000000000000002 ***
#> bysctrlCatholic 1.980870 0.270856 7.313 0.000000000000288 ***
#> bysctrlOther private 2.300603 0.329044 6.992 0.000000000002954 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 8.69 on 7315 degrees of freedom
#> Multiple R-squared: 0.07423, Adjusted R-squared: 0.07385
#> F-statistic: 195.5 on 3 and 7315 DF, p-value: < 0.00000000000000022
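To see how the coefficient table turns into a predicted value, here is a rough sketch that hard-codes the four estimates printed above and adds them up by hand. The vector `b` and the helper `predict_read()` are hypothetical names introduced only for this illustration; they are not part of the assignment code.

```r
# Coefficient estimates copied from the summary(mod1) output
b <- c(
  intercept     = 51.167236,
  income000     = 0.033481,
  catholic      = 1.980870,
  other_private = 2.300603
)

# Hypothetical helper: predicted reading score for a given parental
# income (in $1000s) and school sector
predict_read <- function(income000,
                         sector = c("Public", "Catholic", "Other private")) {
  sector <- match.arg(sector)
  unname(
    b["intercept"] +
      b["income000"] * income000 +
      # each indicator is 1 (TRUE) only for its own sector, otherwise 0
      b["catholic"] * (sector == "Catholic") +
      b["other_private"] * (sector == "Other private")
  )
}

# Reference category with income of 0: just the intercept
predict_read(0, "Public")

# Catholic school, income of $100,000: 51.167236 + 3.3481 + 1.98087
predict_read(100, "Catholic")
```

With the fitted model object available, `predict(mod1, newdata = ...)` should give the same values up to rounding of the printed coefficients.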
/3
YOUR ANSWER HERE:
Population linear regression model (multivariate regression)
/2
YOUR ANSWER HERE:
/3
YOUR ANSWER HERE:
/1
YOUR ANSWER HERE:
0 who attended a public high school is 51.17
/1
parent_income000 = 150 (i.e., $150,000); show your work
YOUR ANSWER HERE:
Knit to HTML by clicking the “Knit” button near the top of your RStudio window (the icon with a blue yarn ball), or open its drop-down menu and select “Knit to HTML”.