Grade: /50

Overview

In this problem set, you will work with data from the Tennessee Student Teacher Achievement Ratio (STAR) project. Tennessee STAR was a massive experiment that sought to identify the effect of class size on student learning. Elementary school children were randomly assigned to one of three kinds of classrooms: small class size; regular class size; and regular class size with a teacher aide. In the lecture on causal inference and comparing two groups, we deleted student observations assigned to the “regular class size with a teacher aide” condition so that you could compare “small class size” (treatment group) to “regular class size” (control group). In this problem set, we will delete observations assigned to the “regular class size” condition, and you will compare “small class size” (treatment group) to “regular class size plus teacher aide” (control group). In addition to variables about random assignment to classroom, the Tennessee STAR data contains categorical and continuous variables about the characteristics of students and their teachers. We will use the continuous variable “years of teacher experience” to run a regression that examines the relationship between years of teacher experience (\(X\)) and kindergarten reading score (\(Y\)).

The problem set is divided into three parts:

  • In part I, you will answer questions about fundamental concepts of causal inference.
  • In part II, you will answer questions about experiments and you will test a hypothesis about whether average reading scores differ between the “small class size” vs. the “regular class size plus teacher aide” groups.
  • In part III, you will answer questions about concepts in bivariate regression and you will run two regression models. The first bivariate regression model will examine the relationship between years of teacher experience (\(X\)) and reading score (\(Y\)). The second bivariate regression model will examine the relationship between classroom assignment (small vs. regular with teacher aide) (\(X\)) and reading score (\(Y\)).

If you have any questions about the problem set, please also post them on the #problemsets slack channel.

Tips on notation

Some questions will ask you to write out notation and/or equations.

You can write out notation/equations one of two ways: (1) using “inline equations,” which begin with a dollar sign $ and end with a dollar sign $; OR (2) in plain text without inline equations. We encourage you to try inline equations, but it is fine if you do not.

Tips on writing notation/equations using “inline equations”:

  • Make sure there are no spaces after the dollar sign $ that begins the equation and no spaces before the dollar sign that ends the equation.
    • For example, you would write out the notation for treated potential outcome like this: \(Y_i(1)\)
    • But this wouldn’t work: $ Y_i(1)$
    • And this wouldn’t work: $Y_i(1) $
  • Special characters, like Greek letters, within inline equations are referred to using special symbols that start with a backslash
    • e.g., “Beta” is \beta: \(\beta\)
    • “Mu” (symbol for population mean) is \mu: \(\mu\)
  • Subscripts after a character or symbol are specified like this:
    • e.g., “Beta subscript 1” is beta_1: \(\beta_1\)
    • e.g., “Mu subscript Y” (referring to population mean of variable Y) is \mu_Y: \(\mu_Y\)
  • “hats” are specified by wrapping the character/symbol within curly brackets \hat{} like this:
    • e.g., “Beta hat” is \hat{\beta}: \(\hat{\beta}\)
    • e.g., “Beta hat subscript 1” is \hat{\beta}_1 (note that the subscript is not within the “hat”): \(\hat{\beta}_1\)
  • “bars” are specified by wrapping the character/symbol within curly brackets \bar{} like this:
    • e.g., “sample mean of Y” is \bar{Y}: \(\bar{Y}\)
  • Don’t worry about getting it perfect, and don’t spend too much time trying; if you are trying, that is a great start! It is fine to use inline equations for some notation/equations and plain text for others that you can’t figure out.


Tips on writing notation/equations in plain text

  • Instead of writing \(Y_i(1)\), you could write this: Y_i(1)
  • Instead of writing \(Y_i = \beta_0 + \beta_1X_i + u_i\), you could write this: Y_i = beta_0 + beta_1*X_i + u_i
  • Instead of writing \(\hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1X_i\), you could write something like this: Y_hat_i = beta_hat_0 + beta_hat_1*X_i
  • Don’t worry if it doesn’t look pretty!

Load libraries and data

Please run the code in the following chunk, which does the following:

  • Loads libraries
  • Loads and creates data frame from Tennessee STAR

Note: code chunk omitted from html document using include = FALSE


Run basic frequency tabulations on the variables star and treatment

# frequency tabulation of the original classroom assignment variable named star
df_stark %>% count(star)
#>   star    n
#> 1    2 1739
#> 2    3 2044
df_stark %>% count(star) %>% as_factor()
#>           star    n
#> 1        small 1739
#> 2 regular+aide 2044

# frequency tabulation of the variable named treatment, which we created from star
df_stark %>% count(treatment)
#>   treatment    n
#> 1         0 2044
#> 2         1 1739

# two-way frequency tabulation of star and treatment
  # basically, we run this to make sure that we created the variable treatment correctly
df_stark %>% group_by(treatment) %>% count(star)
#> # A tibble: 2 x 3
#> # Groups:   treatment [2]
#>   treatment             star     n
#>       <dbl>        <dbl+lbl> <int>
#> 1         0 3 [regular+aide]  2044
#> 2         1 2 [small]         1739
  #df_stark %>% group_by(star) %>% count(treatment)

# compare mean reading score by treatment status
df_stark %>% group_by(treatment) %>% summarize(
  n = n(),
  n_nonmiss_read = sum(!is.na(read)),
  read_mean = mean(read, na.rm = TRUE)
)
#> # A tibble: 2 x 4
#>   treatment     n n_nonmiss_read read_mean
#>       <dbl> <int>          <int>     <dbl>
#> 1         0  2044           2044      435.
#> 2         1  1739           1739      441.

Part 1: Fundamentals of causal inference and experiments


We introduce the following notation:

  • \(i=1,\ldots,n\) refers to “units” or “subjects”
    • in our data frame df_stark, each observation \(i\) represents a kindergarten student
  • \(Y_i\): actual observed outcome \(Y\) (reading score) for unit \(i\) (kindergarten student)
    • in df_stark, the variable read
  • \(T_i\): actual observed treatment condition for unit \(i\) (kindergarten student). \(T_i=1\) is “treated” (small class size); \(T_i=0\) is “untreated” or “control” (regular class size+teacher aide)
    • in df_stark, the variable treatment


/1

1. Our analysis goal is to examine the causal effect of being in a small class size vs. a regular class size+teacher aide on the reading achievement of kindergarten students. Write out a one-sentence research question associated with this analysis goal. This research question will help guide our analyses.


  • YOUR ANSWER HERE: What is the effect of being in a small class (\(T_i = 1\)) versus being in a regular class with a teacher aide (\(T_i = 0\)) on the reading achievement scores (\(Y_i\)) of kindergarten students?

/2

2. In general, what is the “treated potential outcome” \(Y_i(1)\) for a unit \(i\) and what is the “untreated potential outcome” \(Y_i(0)\) for a unit \(i\)? Write out the treated potential outcome and the untreated potential outcome for a unit \(i\) for our research question.

  • YOUR ANSWER HERE: The treated potential outcome \(Y_i(1)\) for a unit \(i\) is the reading score of kindergarten student \(i\) if assigned to a small class (\(T_i = 1\)). The untreated potential outcome \(Y_i(0)\) for a unit \(i\) is the reading score of kindergarten student \(i\) if assigned to a regular class with a teacher aide (\(T_i = 0\)).

/1

3. In general, how does the value of the treatment assignment variable \(T_i\) determine which potential outcome (\(Y_i(1)\) or \(Y_i(0)\)) is the “observed outcome” (\(Y_i\))?

  • YOUR ANSWER HERE: We imagine that for every kindergarten student \(i\), there exist two potential outcomes: the treated potential outcome \(Y_i(1)\) and the untreated potential outcome \(Y_i(0)\). The observed value of the treatment variable \(T_i\) determines which of the two potential outcomes we actually observe: if \(T_i = 1\), then \(Y_i = Y_i(1)\); if \(T_i = 0\), then \(Y_i = Y_i(0)\).
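The rule above is often written as a single “switching equation,” \(Y_i = T_i Y_i(1) + (1 - T_i) Y_i(0)\). A small base-R sketch, using made-up potential outcomes for three hypothetical students, shows how \(T_i\) picks out one potential outcome per student:

```r
# Hypothetical potential outcomes for three students (values made up for illustration)
y1    <- c(450, 430, 470)  # treated potential outcomes Y_i(1)
y0    <- c(440, 435, 460)  # untreated potential outcomes Y_i(0)
treat <- c(1, 0, 1)        # observed treatment assignment T_i

# Switching equation: Y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0)
y_obs <- treat * y1 + (1 - treat) * y0
y_obs  # 450 435 470
```

In real data only y_obs and treat exist; the other potential outcome for each student is never recorded.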

/2

4. If student \(i\) is assigned to small class size (\(T_i\)=1), which potential outcome do we observe as \(Y_i\) and which potential outcome do we not observe? Explain why we observe the one potential outcome and not the other.

  • YOUR ANSWER HERE: If student \(i\) is assigned to a small class (\(T_i = 1\)), then we observe the treated potential outcome \(Y_i(1)\), and we do not observe the untreated potential outcome \(Y_i(0)\). The value of \(T_i\) determines which potential outcome we observe: a student assigned to a small class (treatment) experiences only the treatment condition, so only their treated potential outcome is realized. We cannot observe the untreated potential outcome \(Y_i(0)\) for someone who received the treatment, because the same student cannot also attend a regular class with a teacher aide during the same kindergarten year.

/2

5. If student \(i\) is assigned to regular class size + teacher aide (\(T_i\)=0), which potential outcome do we observe as \(Y_i\) and which potential outcome do we not observe? Explain why we observe the one potential outcome and not the other.

  • YOUR ANSWER HERE: If student \(i\) is assigned to a regular class with a teacher aide (\(T_i = 0\)), then we observe the untreated potential outcome \(Y_i(0)\), and we do not observe the treated potential outcome \(Y_i(1)\). The value of \(T_i\) determines which potential outcome we observe: a student assigned to a regular class with a teacher aide (control) experiences only the control condition, so only their untreated potential outcome is realized. We cannot observe the treated potential outcome \(Y_i(1)\) for someone who did not receive the treatment, because the same student cannot also attend a small class during the same kindergarten year.

/2

6. Write out the formula for the “unit causal effect” \(\tau_i\) and state what this formula means in words

  • YOUR ANSWER HERE: \(\tau_i = Y_i(1) - Y_i(0)\). The causal effect for unit \(i\) is the treated potential outcome \(Y_i(1)\) (the outcome had the unit received the treatment) minus the untreated potential outcome \(Y_i(0)\) (the outcome had the unit received the control).

/1

7. Explain why it is not possible to calculate the unit causal effect in real life.

  • YOUR ANSWER HERE: It is not possible to calculate the unit causal effect in real life because we never observe both the treated potential outcome \(Y_i(1)\) and the untreated potential outcome \(Y_i(0)\) for the same unit \(i\); each student is observed under only one condition.

/2

8. Imagine that we did know both the treated potential outcome and the untreated potential outcome for students \(i=1\) through \(i=5\) (see table below). Calculate the unit causal effect for each student.

Assume we know treated \(Y_i(1)\) and untreated \(Y_i(0)\) potential outcomes for all \(i\). You can fill in your answer by replacing the ? mark.

| \(i\) | \(Y_i(1)\) Treated | \(Y_i(0)\) Untreated | \(\tau_i\) Unit effect |
|---|---|---|---|
| 1 | 65 | 60 | 5 |
| 2 | 30 | 35 | -5 |
| 3 | 25 | 30 | -5 |
| 4 | 80 | 70 | 10 |
| 5 | 45 | 45 | 0 |

/2

9. For these same five students, calculate the average treatment effect two different ways: First, as the mean value of the treated potential outcome minus the mean value of the untreated potential outcome; Second, as the mean value of the unit causal effect.

  • YOUR ANSWER HERE:
  1. Average treatment effect = 49 - 48 = 1
  2. Average treatment effect = Mean of \(\tau_i\) =1
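A short base-R check of questions 8 and 9, using the potential outcomes from the table above. The two ATE calculations agree because the mean of a difference equals the difference of the means.

```r
# Potential outcomes for students i = 1, ..., 5 (from the table in question 8)
y1 <- c(65, 30, 25, 80, 45)  # treated potential outcomes Y_i(1)
y0 <- c(60, 35, 30, 70, 45)  # untreated potential outcomes Y_i(0)

# Unit causal effects: tau_i = Y_i(1) - Y_i(0)
tau <- y1 - y0
tau  # 5 -5 -5 10 0

# Average treatment effect computed two ways
ate_diff_means <- mean(y1) - mean(y0)  # 49 - 48 = 1
ate_mean_tau   <- mean(tau)            # (5 - 5 - 5 + 10 + 0) / 5 = 1
c(ate_diff_means, ate_mean_tau)
```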

Part 2. Experiments

/2

1. Write out the formula for \(\hat{ATE}\), the “difference in means” estimator of the average treatment effect (ATE). Explain what this formula means in words

  • YOUR ANSWER HERE: \(\hat{ATE}\) = \(\hat\tau\) = \(\bar{Y}_{treatment} - \bar{Y}_{control}\)

The estimator of the average treatment effect (ATE) is equal to the sample mean for treated units minus the sample mean for untreated units.
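The estimator can be written as a small reusable R function. This is a sketch: diff_in_means() is a hypothetical helper name (not part of any package), and the toy vectors are made up.

```r
# Difference-in-means estimator of the ATE (hypothetical helper)
# y: observed outcomes; treat: 0/1 treatment indicator of the same length
diff_in_means <- function(y, treat) {
  y_bar_treat   <- mean(y[treat == 1], na.rm = TRUE)  # sample mean, treated units
  y_bar_control <- mean(y[treat == 0], na.rm = TRUE)  # sample mean, control units
  y_bar_treat - y_bar_control
}

# Hypothetical toy data (not from df_stark)
y_toy     <- c(10, 12, 7, 9)
treat_toy <- c(1, 1, 0, 0)
diff_in_means(y_toy, treat_toy)  # 11 - 8 = 3
```

Applied to df_stark, the call would be diff_in_means(df_stark$read, df_stark$treatment).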

/3

2. Below, we show the value for treatment assignment variable \(T_i\) and the observed outcome reading score \(Y_i\) for the first 10 observations in df_stark. Calculate the value of \(\hat{ATE}\) using the “difference in means” estimator for these 10 observations.

df_stark %>% select(id, treatment,read) %>% head(10)
#>      id treatment read
#> 1   943         1  447
#> 2   986         1  450
#> 3  1263         0  439
#> 4  2020         1  447
#> 5  2241         0  395
#> 6  3219         1  478
#> 7  3455         1  455
#> 8  3884         0  437
#> 9  4273         1  474
#> 10 4377         1  424
  • YOUR ANSWER HERE:

\(\hat{ATE} = \bar{Y}_{treatment} - \bar{Y}_{control} = 453.5714286 - 423.6666667\)

\(\hat{ATE}\) = \(\hat\tau\) = 29.9047619


first10 <- head(df_stark, 10) # keep the first 10 rows so read and treatment stay aligned

Y_treat <- mean(first10$read[first10$treatment == 1], na.rm = TRUE) # sample mean for treated units among the first 10 observations

Y_control <- mean(first10$read[first10$treatment == 0], na.rm = TRUE) # sample mean for untreated units among the first 10 observations

ate <- Y_treat - Y_control # difference-in-means estimate

ate
#> [1] 29.90476

/1

3. In data frame df_stark, the variable lunch identifies whether the student qualifies for free lunch (variable coded as: 1 = non-free; 2 = free). This variable was used as an indicator of household income because low-income students were eligible for free lunch at school. Below, we give a frequency distribution of lunch. We also show mean reading score by lunch, which shows that students in the “non-free” lunch group have higher average reading scores than students in the “free” lunch group. Now, consider our treatment variable (1 = small class; 0 = regular class + teacher aide). Imagine that, instead of being randomly assigned, students/parents self-selected into values of the treatment. Why might we be concerned that our estimator \(\bar{Y}_{treatment} - \bar{Y}_{control}\) does not capture the true average treatment effect?

# frequency count of lunch
df_stark %>% count(lunch)
#>   lunch    n
#> 1     1 1932
#> 2     2 1838
#> 3    NA   13
df_stark %>% count(lunch) %>% as_factor()
#>      lunch    n
#> 1 non-free 1932
#> 2     free 1838
#> 3     <NA>   13

# mean reading score by lunch
df_stark %>% group_by(lunch) %>% summarize(
  mean_read = mean(read, na.rm = TRUE)
)
#> # A tibble: 3 x 2
#>           lunch mean_read
#>       <dbl+lbl>     <dbl>
#> 1  1 [non-free]      446.
#> 2  2 [free]          429.
#> 3 NA                 434.
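  • YOUR ANSWER HERE: If students/parents self-selected into values of the treatment (e.g., small class size), then we would be concerned that our estimator \(\bar{Y}_{treatment} - \bar{Y}_{control}\) does not capture the true average treatment effect. Students with certain characteristics (e.g., the non-free lunch group, a proxy for higher household income) might be more likely to select into the treatment, and those same characteristics are related to reading scores (the non-free lunch group averages 446 vs. 429 for the free lunch group). The treatment and control groups would then differ in their potential outcomes even without any treatment, so the difference in means would mix the effect of small classes with the effect of household income (selection bias).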

/1

4. Students in the Tennessee STAR experiment were randomly assigned to values of the treatment variable (instead of self-selection into the treatment). Imagine that prior research tells us that being from a high-income household has a positive causal effect on reading achievement. Given that students in Tennessee STAR were randomly assigned into the treatment, why are we not concerned that household income (measured by the variable lunch) affects our ability to estimate the average treatment effect?

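  • YOUR ANSWER HERE: We would not be concerned because students were randomly assigned, so treatment assignment is independent of household income and of the potential outcomes. On average, the treatment and control groups have the same distribution of household income (and of every other characteristic, observed or unobserved), so the difference in means \(\bar{Y}_{treatment} - \bar{Y}_{control}\) is an unbiased estimator of the average treatment effect rather than a reflection of income differences between the groups.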
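A small simulation can make questions 3 and 4 concrete. Everything here is assumed for illustration: the 50/50 free-lunch split, a 17-point reading gap (roughly the 446 vs. 429 difference in the lunch table above), a true effect of exactly 5 points for every student, and the 0.7/0.3 self-selection probabilities. It compares the difference in means under self-selection and under random assignment.

```r
set.seed(123)  # reproducible simulation
n <- 10000

# Hypothetical income proxy: 1 = free lunch (lower income), 0 = non-free lunch
free_lunch <- rbinom(n, size = 1, prob = 0.5)

# Assumed potential outcomes: free-lunch students score about 17 points lower,
# and the true effect of a small class is exactly 5 points for everyone
y0 <- 445 - 17 * free_lunch + rnorm(n, mean = 0, sd = 30)
y1 <- y0 + 5

# Difference in means of the observed outcome for a 0/1 assignment vector
diff_means <- function(treat) {
  y <- treat * y1 + (1 - treat) * y0  # switching equation
  mean(y[treat == 1]) - mean(y[treat == 0])
}

# (a) Self-selection: non-free-lunch families choose small classes more often
t_self <- rbinom(n, size = 1, prob = ifelse(free_lunch == 1, 0.3, 0.7))
# (b) Random assignment: a coin flip unrelated to income
t_rand <- rbinom(n, size = 1, prob = 0.5)

# Balance check: share on free lunch, treated minus control
balance_self <- mean(free_lunch[t_self == 1]) - mean(free_lunch[t_self == 0])
balance_rand <- mean(free_lunch[t_rand == 1]) - mean(free_lunch[t_rand == 0])

est_self <- diff_means(t_self)  # biased upward; the true ATE is 5
est_rand <- diff_means(t_rand)  # close to 5

round(c(est_self = est_self, est_rand = est_rand,
        balance_self = balance_self, balance_rand = balance_rand), 2)
```

Under these assumptions the self-selected treatment group has about 40 percentage points fewer free-lunch students, so est_self should land roughly 17 × 0.4 ≈ 6.8 points above the true effect, while the randomized groups are balanced on income and est_rand stays near 5.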

/5

5. Below, we list the steps in hypothesis testing. Restate the research question and conduct a hypothesis test about whether the population mean reading score for students in the treatment group is equal to the population mean reading score for students in the control group, using an alpha level of 0.05.

  1. Hypothesis
    • formally state your “null” and “alternative” hypothesis
    • Null hypothesis, \(H_0\)
      • \(H_0: \mu_{Y_{{treated}}} = \mu_{Y_{{control}}}\)
      • \(H_0:\) the population mean reading score for kindergarten students assigned to the treatment, \(\mu_{Y_{{treated}}}\), is the same as the population mean reading score for kindergarten students assigned to the control \(\mu_{Y_{{control}}}\).
    • Alternative hypothesis, \(H_a\)
      • \(H_a: \mu_{Y_{{treated}}} \ne \mu_{Y_{{control}}}\)
      • \(H_a:\) the population mean reading score for kindergarten students assigned to the treatment, \(\mu_{Y_{{treated}}}\), is different than the population mean reading score for kindergarten students assigned to the control \(\mu_{Y_{{control}}}\).
  2. Assumptions [YOU CAN SKIP THIS STEP]
    • state assumptions that are relied upon by the statistical test you are using to test your hypothesis
  3. Test statistic
    • Using some appropriate statistical analysis, calculate the “test statistic” necessary to test your hypothesis

    • Test statistic, two population means \(t = \frac{(\bar{Y}_2 - \bar{Y}_1)- 0}{\hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1}} = \frac{sample\_estimate - value\_associated\_with\_H_0}{sample\_standard\_error}\)

#> # A tibble: 2 x 5
#>   treatment     n n_non_miss  mean    sd
#>       <dbl> <int>      <int> <dbl> <dbl>
#> 1         0  2044       2044  435.  31.5
#> 2         1  1739       1739  441.  32.5
#> [1] 5.1179
#> [1] 1.045377
#> [1] 4.895747
#> 
#>  Welch Two Sample t-test
#> 
#> data:  read by treatment
#> t = -4.8957, df = 3645.6, p-value = 0.000001022
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -7.167473 -3.068310
#> sample estimates:
#> mean in group 0 mean in group 1 
#>        435.4295        440.5474
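  4. p-value (probability value)
    • calculate the probability, assuming \(H_0\) is true, of observing a test statistic at least as large in absolute value as the one you calculated

    • p-value, from the t.test() output above

      • 0.000001 (reported as 0.000001022)
  5. Alpha level/rejection region and conclusion
    • compare the p-value you observed to the alpha level and make a conclusion about your hypothesis test

    • alpha level = 0.05

      • The p-value of about 0.000001 is less than the alpha level of 0.05, so we reject \(H_0\) in favor of \(H_a\). The sample mean reading score was 440.55 for the small class group and 435.43 for the regular class plus aide group, a difference of about 5.12 points.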
  • YOUR ANSWER HERE:
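The test statistic from step 3 can be reproduced by hand from the group summaries printed above. This base-R sketch uses the Welch (unequal variances) standard error and degrees of freedom, which is what t.test() uses by default; because the printed standard deviations are rounded, the results match the t.test() output only approximately. The sign differs from t.test() because t.test() subtracts the group 1 mean from the group 0 mean.

```r
# Group summaries from the output above (sd values are rounded in that output)
n_c <- 2044; mean_c <- 435.4295; sd_c <- 31.5  # control: regular + aide (treatment = 0)
n_t <- 1739; mean_t <- 440.5474; sd_t <- 32.5  # treated: small class (treatment = 1)

diff_hat <- mean_t - mean_c      # sample estimate, about 5.12
var_c    <- sd_c^2 / n_c         # squared standard error of the control mean
var_t    <- sd_t^2 / n_t         # squared standard error of the treated mean
se_hat   <- sqrt(var_c + var_t)  # standard error of the difference, about 1.045
t_stat   <- (diff_hat - 0) / se_hat  # value associated with H0 is 0

# Welch-Satterthwaite degrees of freedom
df_welch <- (var_c + var_t)^2 / (var_c^2 / (n_c - 1) + var_t^2 / (n_t - 1))

p_value <- 2 * pt(-abs(t_stat), df = df_welch)  # two-sided p-value
t_crit  <- qt(0.975, df = df_welch)             # critical value for alpha = 0.05

c(t = t_stat, df = df_welch, p = p_value, t_crit = t_crit)
```

Since |t_stat| is far beyond t_crit (about 1.96), the p-value is far below 0.05, matching the conclusion above.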

/3

6. Below is the plot – created by the user-defined function plot_t_distribution() – of the sampling distribution assuming that \(H_0: \mu_{treatment} = \mu_{control}\) is true. Explain in your own words what is happening in the below plot and explain what the different statistics, dotted lines, and shaded areas mean.

#t.test(formula = read ~ treatment, data = df_stark)
plot_t_distribution(data_df = df_stark, data_var = 'read',group_var = 'treatment', group_cat = c(1, 0), shade_pval = TRUE)

  • YOUR ANSWER HERE: The plot shows the sampling distribution of the t-statistic assuming \(H_0\) is true, centered at 0. The blue dotted line shows our observed t-statistic (about 4.9 in absolute value), and the red line shows the critical value associated with an alpha level of 0.05, the threshold beyond which we reject \(H_0\). The shaded tail area (shade_pval = TRUE) represents the p-value: the probability, under \(H_0\), of a t-statistic at least as extreme as the one we observed. Our p-value (about 0.000001) is less than the alpha level of 0.05, so we reject \(H_0\). This tells us that the population mean reading score for students assigned to the treatment (small class) is different from the population mean for students assigned to the control (regular class with aide).

Part 3: Bivariate regression

Consider the research question, what is the relationship between teacher years of teaching experience (\(X\)) and kindergarten reading score (\(Y\))?

In the data frame df_stark, teacher years of experience is measured by the variable experience.

Below is a scatterplot of the relationship between teacher years of experience (\(X\)) and kindergarten reading score (\(Y\)). We have also added a linear ordinary least squares (OLS) prediction line.

df_stark %>%  ggplot(aes(x=experience, y=read)) + geom_point() + stat_smooth(method = 'lm')

/3

1. Using the lm() function, create an object named mod1 that contains results from the bivariate regression of the relationship between years of teaching experience (\(X\)) and kindergarten reading score (\(Y\)). Apply the summary() function to the object mod1 to print a summary of these regression results.

mod1 <- lm(formula = read ~ experience, data = df_stark)

summary(mod1)
#> 
#> Call:
#> lm(formula = read ~ experience, data = df_stark)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -70.890 -21.931  -4.584  15.722 192.722 
#> 
#> Coefficients:
#>              Estimate Std. Error t value             Pr(>|t|)    
#> (Intercept) 432.62463    0.99656 434.117 < 0.0000000000000002 ***
#> experience    0.55101    0.09005   6.119        0.00000000104 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 31.96 on 3761 degrees of freedom
#>   (20 observations deleted due to missingness)
#> Multiple R-squared:  0.009857,   Adjusted R-squared:  0.009594 
#> F-statistic: 37.44 on 1 and 3761 DF,  p-value: 0.000000001038
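The estimates reported by lm() come from the OLS formulas \(\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). A base-R sketch on small hypothetical data (not df_stark) checks that the hand calculation and lm() agree:

```r
# Hypothetical toy data (not df_stark): teacher experience x and reading score y
x <- c(0, 2, 5, 8, 12, 20)
y <- c(430, 428, 437, 436, 440, 445)

# OLS slope and intercept from the formulas
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Same model fit with lm()
fit <- lm(y ~ x)
rbind(by_hand = c(b0, b1), lm = unname(coef(fit)))
```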

/6

2. For your analysis of the relationship between teacher years of experience (\(X\)) and kindergarten reading score (\(Y\)), do the following: write out the population linear regression model (label the symbols and write out what the variables \(X\) and \(Y\) actually represent); write out the OLS prediction line (without estimate values); write out the OLS prediction line (with estimate values); interpret the point estimate value of \(\hat{\beta}_0\) in words; and interpret the point estimate value of \(\hat{\beta}_1\) in words.


Note: you can always use this general approach for interpreting \(\hat{\beta}_1\) in words:

  • “On average, a one-unit increase in \(X\) is associated with a \(\hat{\beta}_1\) increase (or decrease if \(\hat{\beta}_1\) is negative) in the value of \(Y\).”

  • When interpreting \(\hat{\beta}_1\) in words, replace the generic “one-unit increase in \(X\)” with text that is specific to your analysis (e.g., “a one-year increase in years of teacher experience \(X\)”), and do the same for “the value of \(Y\).”

  • YOUR ANSWER HERE:

  • Population linear regression model: \(Y_i = \beta_0 + \beta_1X_i + u_i\)

  • where:

    • subscript \(i\) refers to kindergarten students
    • \(Y_i\) = reading scores for kindergarten student \(i\)
    • \(X_i\) = the number of years the teacher of student \(i\) has been teaching
    • \(\beta_0\) = population intercept, which represents the average reading score for kindergarten students whose teacher has no prior teaching experience (\(X = 0\))
    • \(\beta_1\) = population regression coefficient (slope), which represents the average change in kindergarten reading score \(Y\) associated with a one-year increase in teacher experience \(X\) (teacher experience was not randomly assigned, so this is an association rather than a causal effect)
  • OLS prediction line (without estimates): \(\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1}X_i\)

  • Where:

    • \(\hat{Y_i}\) = predicted value of kindergarten reading score for student \(i\)
    • \(\hat{\beta_0}\) = predicted average kindergarten reading score when \(X_i = 0\) (teacher has no prior teaching experience)
    • \(\hat{\beta_1}\) = estimated change in \(Y_i\) (kindergarten reading score) associated with a one-unit change in \(X_i\) (one additional year of teacher experience)
  • OLS prediction line (with estimates): \(\hat{Y_i} =\) 432.62 + 0.55 \(\times X_i\)

  • Interpret point estimate value of \(\hat{\beta}_0\) & \(\hat{\beta}_1\):

    • interpretation of \(\hat{\beta_0}=\) 432.62: the predicted kindergarten reading score for a student whose teacher has no prior teaching experience (\(X_i = 0\)) is 432.62
    • interpretation of \(\hat{\beta_1}=\) 0.55: on average, a one-year increase in years of teacher experience is associated with a 0.55-point increase in a student’s reading score.

/2

3. What is the predicted reading score for a student taught by a teacher who has: 5 years of experience? 20 years of experience? Show your work.

  • YOUR ANSWER HERE:
    • \(X_i = 5\): \(\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} \times 5 = 432.62 + 0.55 \times 5 = 435.37\) (about 435 points)
    • \(X_i = 20\): \(\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} \times 20 = 432.62 + 0.55 \times 20 = 443.62\) (about 444 points)

/7

4. Returning to our original research question (what is the effect of being in a small class (\(X_i = 1\)) versus being in a regular class with a teacher aide (\(X_i = 0\)) on the reading achievement scores (\(Y_i\)) of kindergarten students?), do the following: run the regression in R (using lm() and summary()); write out the population linear regression model; write out the OLS prediction line (without estimate values); write out the OLS prediction line (with estimate values); and interpret the point estimate value of \(\hat{\beta}_1\) in words.


Note: you can always use this general approach for interpreting \(\hat{\beta}_1\) in words:

  • “On average, a one-unit increase in \(X\) is associated with a \(\hat{\beta}_1\) increase (or decrease if \(\hat{\beta}_1\) is negative) in the value of \(Y\).”
  • This general approach is written with \(X\) as a continuous variable (e.g., teacher years of experience), but it also works when \(X\) is a dichotomous variable, because going from \(X_i=0\) (control) to \(X_i=1\) (treatment) is still a one-unit increase in \(X\)
  • Modifying the general approach when \(X\) is a dichotomous variable:
    • instead of this: “a one-unit increase in \(X\) is associated with a …”
    • you can write this: “being assigned to \(X_i=1\) as opposed to \(X_i=0\) is associated with a …”


mod2 <- lm(formula = read ~ treatment, data = df_stark)

summary(mod2)
#> 
#> Call:
#> lm(formula = read ~ treatment, data = df_stark)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -70.547 -21.547  -4.547  15.570 191.570 
#> 
#> Coefficients:
#>             Estimate Std. Error t value             Pr(>|t|)    
#> (Intercept)  435.430      0.707 615.888 < 0.0000000000000002 ***
#> treatment      5.118      1.043   4.908          0.000000959 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 31.96 on 3781 degrees of freedom
#> Multiple R-squared:  0.006331,   Adjusted R-squared:  0.006068 
#> F-statistic: 24.09 on 1 and 3781 DF,  p-value: 0.0000009588
  • YOUR ANSWER HERE:

  • Population linear regression model: \(Y_i = \beta_0 + \beta_1X_i + u_i\)

  • where:

    • subscript \(i\) refers to kindergarten students
    • \(Y_i\) = reading scores for kindergarten student \(i\)
    • \(X_i\) = treatment assignment of student \(i\)
    • \(\beta_0\) = population intercept, which represents the average reading score for kindergarten students assigned to the control condition (\(X_i = 0\), regular class size + teacher aide)
    • \(\beta_1\) = population regression coefficient, which represents the average effect of being assigned to a small class (\(X_i = 1\)) rather than a regular class with a teacher aide (\(X_i = 0\)) on kindergarten student reading scores \(Y_i\)
  • OLS prediction line (without estimates): \(\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1}X_i\)

  • Where:

    • \(\hat{Y_i}\) = predicted value of kindergarten reading score for student \(i\)
    • \(\hat{\beta_0}\) = predicted average kindergarten reading score when \(X_i = 0\) (students assigned to a regular class with a teacher aide)
    • \(\hat{\beta_1}\) = estimated average effect of treatment (\(X_i = 1\)) on kindergarten student reading scores \(Y_i\)
  • OLS prediction line (with estimates): \(\hat{Y_i} =\) 435.43 + 5.12 \(\times X_i\)

  • Interpret point estimate value of \(\hat{\beta}_0\) & \(\hat{\beta}_1\):

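    • interpretation of \(\hat{\beta_0}=\) 435.43: the predicted kindergarten reading score for a student assigned to a regular-sized class with a teacher aide (\(X_i = 0\)) is 435.43
    • interpretation of \(\hat{\beta_1}=\) 5.12: on average, being assigned to \(X_i=1\) (small class) as opposed to \(X_i=0\) (regular class with aide) is associated with a 5.12-point increase in a student’s reading score. Because classroom assignment was random, this can be read as the estimated average treatment effect; it equals the difference in means (5.1179) from Part 2.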

Part 4: Post a comment/question

/2

  • Go to the class #problemsets channel and create a new post.
  • You can either:
    • Share something you learned or a question from this problem set. Make sure to mention the instructors (@ozanj, @Patricia Martín).
    • Respond to a post made by another student.

Knit to html and submit problem set

Knit to html by clicking the “Knit” button near the top of your RStudio window (icon with blue yarn ball), or use the drop-down menu and select “Knit to HTML”.

  • Go to the class website and under the “Readings & Assignments” >> “Week 5” tab, click on the “Problem set 2 submission link”
  • Submit both your html and .Rmd files
  • Use this naming convention “lastname_firstname_ps#” for your .Rmd (e.g. martin_patricia_ps2.Rmd)