Grade: /50

Overview

In this problem set, you will work with data from the Tennessee Student Teacher Achievement Ratio (STAR) project. Tennessee STAR was a massive experiment that sought to identify the effect of class size on student learning. Elementary school children were randomly assigned to one of three kinds of classrooms: small class size; regular class size; and regular class size with a teacher aide. In the lecture on causal inference and comparing two groups, we deleted student observations assigned to the “regular class size with a teacher aide” condition so that you could compare “small class size” (treatment group) to “regular class size” (control group). In this problem set, we will delete observations assigned to the “regular class size” condition and you will compare “small class size” (treatment group) to “regular class size plus teacher aide” (control group). In addition to variables about random assignment to classroom, the Tennessee STAR data contains categorical and continuous variables about the characteristics of students and their teachers. We will use the continuous variable “years of teacher experience” to run a regression that examines the relationship between years of teacher experience (\(X\)) and Kindergarten reading score (\(Y\)).

The problem set is divided into three parts:

  • In part I, you will answer questions about fundamental concepts of causal inference.
  • In part II, you will answer questions about experiments and you will test a hypothesis about whether average reading scores differ between the “small class size” vs. the “regular class size plus teacher aide” groups.
  • In part III, you will answer questions about concepts in bivariate regression and you will run two regression models. The first bivariate regression model will examine the relationship between years of teacher experience (\(X\)) and reading score (\(Y\)). The second bivariate regression model will examine the relationship between classroom assignment (small vs. regular with teacher aide) (\(X\)) and reading score (\(Y\)).

If you have any questions about the problem set, please post them on the #problemsets Slack channel.

Tips on notation

Some questions will ask you to write out notation and/or equations.

You can write out notation/equations in one of two ways: (1) using “inline equations,” which begin with a dollar sign $ and end with a dollar sign $; OR (2) in plain text, without dollar signs. We encourage you to try inline equations, but it is fine if you do not.

Tips on writing notation/equations using “inline equations”:

  • Make sure there are no spaces after the dollar sign $ that begins the equation and no spaces before the dollar sign that ends the equation.
    • For example, you would write out the notation for treated potential outcome like this: $Y_i(1)$, which renders as \(Y_i(1)\)
    • But this wouldn’t work: $ Y_i(1)$
    • And this wouldn’t work: $Y_i(1) $
  • Special characters, such as Greek letters, are written within inline equations using special symbols that start with a backslash
    • e.g., “Beta” is \beta: \(\beta\)
    • “Mu” (symbol for population mean) is \mu: \(\mu\)
  • Subscripts after a character or symbol are specified like this:
    • e.g., “Beta subscript 1” is \beta_1: \(\beta_1\)
    • e.g., “Mu subscript Y” (referring to population mean of variable Y) is \mu_Y: \(\mu_Y\)
  • “hats” are specified by wrapping the character/symbol within curly brackets \hat{} like this:
    • e.g., “Beta hat” is \hat{\beta}: \(\hat{\beta}\)
    • e.g., “Beta hat subscript 1” is \hat{\beta}_1 (note that the subscript is not within the “hat”): \(\hat{\beta}_1\)
  • “bars” are specified by wrapping the character/symbol within curly brackets \bar{} like this:
    • e.g., “sample mean of Y” is \bar{Y}: \(\bar{Y}\)
  • Don’t worry about getting it perfect, and don’t spend too much time trying; if you are trying, that is a great start! It is also fine to use inline equations for some notation/equations and plain text for others that you can’t figure out.


Tips on writing notation/equations in plain text

  • Instead of writing \(Y_i(1)\), you could write this: Y_i(1)
  • Instead of writing \(Y_i = \beta_0 + \beta_1X_i + u_i\), you could write this: Y_i = beta_0 + beta_1*X_i + u_i
  • Instead of writing \(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1X_i\), you could write something like this: Y_hat_i = beta_hat_0 + beta_hat_1*X_i
  • Don’t worry if it doesn’t look pretty!

Load libraries and data

Please run the code in the following chunk, which does the following:

  • Loads libraries
  • Loads and creates data frame from Tennessee STAR

Note: code chunk omitted from html document using include = FALSE


Run basic frequency tabulations on the variables star and treatment

# frequency tabulation of the original classroom assignment variable named star
df_stark %>% count(star)
#>   star    n
#> 1    2 1739
#> 2    3 2044
df_stark %>% count(star) %>% as_factor()
#>           star    n
#> 1        small 1739
#> 2 regular+aide 2044

# frequency tabulation of the variable named treatment, which we created from star
df_stark %>% count(treatment)
#>   treatment    n
#> 1         0 2044
#> 2         1 1739

# two-way frequency tabulation of star and treatment
  # basically, we run this to make sure that we created the variable treatment correctly
df_stark %>% group_by(treatment) %>% count(star)
#> # A tibble: 2 x 3
#> # Groups:   treatment [2]
#>   treatment             star     n
#>       <dbl>        <dbl+lbl> <int>
#> 1         0 3 [regular+aide]  2044
#> 2         1 2 [small]         1739
  #df_stark %>% group_by(star) %>% count(treatment)

# compare mean reading score by treatment status
df_stark %>% group_by(treatment) %>% summarize(
  n = n(),
  n_nonmiss_read = sum(!is.na(read)),
  read_mean = mean(read, na.rm = TRUE)
)
#> # A tibble: 2 x 4
#>   treatment     n n_nonmiss_read read_mean
#>       <dbl> <int>          <int>     <dbl>
#> 1         0  2044           2044      435.
#> 2         1  1739           1739      441.
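
The chunk that builds df_stark is not shown in this document, so the exact code used to create treatment from star is unknown. As a rough sketch only, a variable like treatment could be derived as follows, assuming star is coded 2 = small and 3 = regular+aide as in the tabulations above. The data frame toy and its values are hypothetical, not the STAR data.

```r
library(dplyr)

# Hypothetical stand-in for df_stark (made-up rows, not the real STAR data).
# Assumption: star is coded 2 = small class, 3 = regular class + teacher aide.
toy <- data.frame(star = c(2, 3, 3, 2, 3))

# treatment = 1 for small class, 0 for regular class + teacher aide
toy <- toy %>%
  mutate(treatment = if_else(star == 2, 1, 0))

# Cross-tabulate to check that each value of star maps to one value of treatment
tab <- toy %>% count(star, treatment)
tab
```

A cross-tabulation like the last line serves the same purpose as the two-way tabulation above: each value of star should appear with exactly one value of treatment.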

Part 1: Fundamentals of causal inference and experiments


We introduce the following notation:

  • \(i=1,\ldots,n\) refers to “units” or “subjects”
    • in our data frame df_stark, each observation \(i\) represents a kindergarten student
  • \(Y_i\): actual observed outcome \(Y\) (reading score) for unit \(i\) (kindergarten student)
    • in df_stark, the variable read
  • \(T_i\): actual observed treatment condition for unit \(i\) (kindergarten student). \(T_i=1\) is “treated” (small class size); \(T_i=0\) is “untreated” or “control” (regular class size+teacher aide)
    • in df_stark, the variable treatment


/1

1. Our analysis goal is to examine the causal effect of being in a small class size vs. a regular class size+teacher aide on the reading achievement of kindergarten students. Write out a one-sentence research question associated with this analysis goal. This research question will help guide our analyses.


  • YOUR ANSWER HERE:

/2

2. In general, what is the “treated potential outcome” \(Y_i(1)\) for a unit \(i\) and what is the “untreated potential outcome” \(Y_i(0)\) for a unit \(i\)? Write out the treated potential outcome and the untreated potential outcome for a unit \(i\) for our research question.

  • YOUR ANSWER HERE:

/1

3. In general, how does the value of the treatment assignment variable \(T_i\) determine which potential outcome (\(Y_i(1)\) or \(Y_i(0)\)) is the “observed outcome” (\(Y_i\))?

  • YOUR ANSWER HERE:

/2

4. If student \(i\) is assigned to small class size (\(T_i=1\)), which potential outcome do we observe as \(Y_i\) and which potential outcome do we not observe? Explain why we observe the one potential outcome and not the other.

  • YOUR ANSWER HERE:

/2

5. If student \(i\) is assigned to regular class size + teacher aide (\(T_i=0\)), which potential outcome do we observe as \(Y_i\) and which potential outcome do we not observe? Explain why we observe the one potential outcome and not the other.

  • YOUR ANSWER HERE:

/2

6. Write out the formula for the “unit causal effect” \(\tau_i\) and state what this formula means in words.

  • YOUR ANSWER HERE:

/1

7. Explain why it is not possible to calculate the unit causal effect in real life.

  • YOUR ANSWER HERE:

/2

8. Imagine that we did know both the treated potential outcome and the untreated potential outcome for students \(i=1\) through \(i=5\) (see table below). Calculate the unit causal effect for each student.

Assume we know treated \(Y_i(1)\) and untreated \(Y_i(0)\) potential outcomes for all \(i\). You can fill in your answer by replacing the ? mark.

| \(i\) | \(Y_i(1)\) Treated | \(Y_i(0)\) Untreated | \(\tau_i\) Unit effect |
|---|---|---|---|
| 1 | 65 | 60 | ? |
| 2 | 30 | 35 | ? |
| 3 | 25 | 30 | ? |
| 4 | 80 | 70 | ? |
| 5 | 45 | 45 | ? |

/2

9. For these same five students, calculate the average treatment effect two different ways: First, as the mean value of the treated potential outcome minus the mean value of the untreated potential outcome; Second, as the mean value of the unit causal effect.

  • YOUR ANSWER HERE:

Part 2: Experiments

/2

1. Write out the formula for \(\hat{ATE}\), the “difference in means” estimator of the average treatment effect (ATE). Explain what this formula means in words.

  • YOUR ANSWER HERE:

/3

2. Below, we show the value of the treatment assignment variable \(T_i\) and the observed outcome reading score \(Y_i\) for the first 10 observations in df_stark. Calculate the value of \(\hat{ATE}\) using the “difference in means” estimator for these 10 observations.

df_stark %>% select(id, treatment,read) %>% head(10)
#>      id treatment read
#> 1   943         1  447
#> 2   986         1  450
#> 3  1263         0  439
#> 4  2020         1  447
#> 5  2241         0  395
#> 6  3219         1  478
#> 7  3455         1  455
#> 8  3884         0  437
#> 9  4273         1  474
#> 10 4377         1  424
  • YOUR ANSWER HERE:

/1

3. In data frame df_stark, the variable lunch identifies whether the student qualifies for free lunch (coded as 1 = non-free; 2 = free). This variable was used as an indicator of household income because low-income students were eligible for free lunch at school. Below, we give a frequency distribution of lunch. We also show mean reading score by lunch, which shows that students in the “non-free” lunch group have higher average reading scores than students in the “free” lunch group. Now, consider our treatment variable (1 = small class; 0 = regular class + teacher aide). Imagine that, instead of being randomly assigned, students/parents self-selected into values of the treatment. Why might we be concerned that our estimator \(\bar{Y}_{treatment} - \bar{Y}_{control}\) does not capture the true average treatment effect?

# frequency count of lunch
df_stark %>% count(lunch)
#>   lunch    n
#> 1     1 1932
#> 2     2 1838
#> 3    NA   13
df_stark %>% count(lunch) %>% as_factor()
#>      lunch    n
#> 1 non-free 1932
#> 2     free 1838
#> 3     <NA>   13

# mean reading score by lunch
df_stark %>% group_by(lunch) %>% summarize(
  mean_read = mean(read, na.rm = TRUE)
)
#> # A tibble: 3 x 2
#>           lunch mean_read
#>       <dbl+lbl>     <dbl>
#> 1  1 [non-free]      446.
#> 2  2 [free]          429.
#> 3 NA                 434.
  • YOUR ANSWER HERE:

/1

4. Students in the Tennessee STAR experiment were randomly assigned to values of the treatment variable (instead of self-selection into the treatment). Imagine that prior research tells us that being from a high-income household has a positive causal effect on reading achievement. Given that students in Tennessee STAR were randomly assigned into the treatment, why are we not concerned that household income (measured by the variable lunch) affects our ability to estimate the average treatment effect?

  • YOUR ANSWER HERE:

/5

5. Below, we list the steps in hypothesis testing. Restate the research question and conduct a hypothesis test about whether the population mean reading score for students in the treatment group is equal to the population mean reading score for students in the control group, using an alpha level of 0.05.

Research Question:

  1. Hypothesis
    • formally state your “null” and “alternative” hypothesis
  2. Assumptions [YOU CAN SKIP THIS STEP]
    • state assumptions that are relied upon by the statistical test you are using to test your hypothesis
  3. Test statistic
    • Using some appropriate statistical analysis, calculate the “test statistic” necessary to test your hypothesis
  4. p-value (means probability value)
    • calculate the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one you calculated
  5. Alpha level/rejection region and conclusion
    • compare the p-value you observed to the alpha level and make a conclusion about your hypothesis test
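
The steps above can be sketched in R with t.test(), the same function that appears (commented out) in the next question. This is only an illustration on simulated data: the data frame toy, its variables group and y, and the chosen means are all hypothetical and unrelated to df_stark.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical outcome for 50 control (group = 0) and 50 treated (group = 1) units
toy <- data.frame(
  group = rep(c(0, 1), each = 50),
  y     = c(rnorm(50, mean = 100, sd = 15), rnorm(50, mean = 105, sd = 15))
)

# Step 1: H0: mu_1 = mu_0 versus Ha: mu_1 != mu_0 (two-sided)
# Step 3: t.test() with a formula computes the test statistic as
#         (mean of group 0 - mean of group 1) / standard error;
#         by default it does not assume equal variances (Welch test)
res <- t.test(y ~ group, data = toy)

res$statistic   # test statistic
res$p.value     # Step 4: two-sided p-value

# Step 5: reject H0 if the p-value is below the alpha level
alpha <- 0.05
reject_h0 <- res$p.value < alpha
reject_h0
```

Note the sign convention: because group 0 is listed first, a positive treatment effect shows up as a negative t statistic in this sketch.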
  • YOUR ANSWER HERE:

/3

6. Below is the plot (created by the user-defined function plot_t_distribution()) of the sampling distribution assuming that \(H_0: \mu_{treatment} = \mu_{control}\) is true. Explain in your own words what is happening in the plot below and explain what the different statistics, dotted lines, and shaded areas mean.

#t.test(formula = read ~ treatment, data = df_stark)
plot_t_distribution(data_df = df_stark, data_var = 'read',group_var = 'treatment', group_cat = c(1, 0), shade_pval = TRUE)

  • YOUR ANSWER HERE:

Part 3: Bivariate regression

Consider the research question, what is the relationship between teacher years of teaching experience (\(X\)) and kindergarten reading score (\(Y\))?

In the data frame df_stark, teacher years of experience is measured by the variable experience.

Below is a scatterplot of the relationship between teacher years of experience (\(X\)) and kindergarten reading score (\(Y\)). We have also added a linear ordinary least squares (OLS) prediction line.

df_stark %>%  ggplot(aes(x=experience, y=read)) + geom_point() + stat_smooth(method = 'lm')

/3

1. Using the lm() function, create an object named mod1 that contains results from the bivariate regression of the relationship between years of teaching experience (\(X\)) and Kindergarten reading score (\(Y\)). Apply the summary() function to the object mod1 to print a summary of these regression results.

/6

2. For your analysis of the relationship between teacher years of experience (\(X\)) and kindergarten reading score (\(Y\)), do the following: write out the population linear regression model (label the symbols and write out what the variables \(X\) and \(Y\) actually represent); write out the OLS prediction line (without estimate values); write out the OLS prediction line (with estimate values); interpret the point estimate value of \(\hat{\beta}_0\) in words; interpret the point estimate value of \(\hat{\beta}_1\) in words.


*Note: you can always use the general approach for interpreting \(\hat{\beta}_1\) in words:

  • “On average, a one-unit increase in \(X\) is associated with a \(\hat{\beta}_1\) increase (or decrease if \(\hat{\beta}_1\) is negative) in the value of \(Y\)”

  • When interpreting \(\hat{\beta}_1\) in words, replace the generic “one-unit increase in \(X\)” with text that is specific to your analysis (e.g., “a one-year increase in years of teacher experience \(X\)”); and do the same thing for “the value of \(Y\)”
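
As an illustration of how the estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) connect to the OLS prediction line, here is a small sketch on simulated data. The data frame toy, the variables x and y, and the numbers used to simulate them are hypothetical and are not the STAR variables.

```r
set.seed(2)  # make the simulated data reproducible

# Hypothetical data: x is a continuous predictor, y is an outcome
toy <- data.frame(x = runif(200, min = 0, max = 30))
toy$y <- 50 + 2 * toy$x + rnorm(200, sd = 5)

fit <- lm(y ~ x, data = toy)
summary(fit)

b0 <- coef(fit)[["(Intercept)"]]  # beta_hat_0: predicted y when x = 0
b1 <- coef(fit)[["x"]]            # beta_hat_1: change in predicted y per one-unit increase in x

# Predicted y at x = 10, computed by hand from the prediction line ...
by_hand <- b0 + b1 * 10

# ... and with predict(), which should give the same value
by_predict <- predict(fit, newdata = data.frame(x = 10))
```

Computing a prediction by hand from the two coefficients, as in by_hand, is the kind of work that a “show your work” question usually has in mind; predict() is a convenient way to check it.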

  • YOUR ANSWER HERE:

/2

3. What is the predicted reading score for a student taught by a teacher who has: 5 years of experience? 20 years of experience? Show your work.

  • YOUR ANSWER HERE:

/7

4. Returning to our original research question (what is the effect of being in a small class size (\(X_i=1\)) versus being in a regular class size with a teacher aide (\(X_i=0\)) on the reading achievement scores (\(Y_i\)) of Kindergarten students?), do the following: run the regression in R (using lm() and summary()); write out the population linear regression model; write out the OLS prediction line (without estimate values); write out the OLS prediction line (with estimate values); interpret the point estimate value of \(\hat{\beta}_1\) in words.


*Note: you can always use the general approach for interpreting \(\hat{\beta}_1\) in words:

  • “On average, a one-unit increase in \(X\) is associated with a \(\hat{\beta}_1\) increase (or decrease if \(\hat{\beta}_1\) is negative) in the value of \(Y\)”
  • This general approach is written with the idea of \(X\) being a continuous variable (e.g., teacher years of experience), but it can also work when \(X\) is a dichotomous variable because going from \(X_i=0\) (control) to \(X_i=1\) (treatment) is still a one-unit increase in \(X\)
  • Modifying the general approach when \(X\) is a dichotomous variable:
    • instead of this: “a one-unit increase in \(X\) is associated with a …”
    • you can write this: “being assigned to \(X_i=1\) as opposed to \(X_i=0\) is associated with a …”
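
To see why the “one-unit increase” reading still works for a dichotomous \(X\), the sketch below fits a regression on a 0/1 variable using simulated data. In a bivariate OLS regression on a 0/1 variable, the slope equals the difference in group means and the intercept equals the mean of the \(X_i=0\) group. The data frame toy and the variables d and y are hypothetical, not the STAR data.

```r
set.seed(3)  # make the simulated data reproducible

# Hypothetical data: d is a 0/1 indicator (40 zeros, 60 ones), y is an outcome
toy <- data.frame(d = rep(c(0, 1), times = c(40, 60)))
toy$y <- 20 + 3 * toy$d + rnorm(100, sd = 4)

fit <- lm(y ~ d, data = toy)

intercept <- coef(fit)[["(Intercept)"]]
slope     <- coef(fit)[["d"]]

# Group means computed directly from the data
mean_d0 <- mean(toy$y[toy$d == 0])
mean_d1 <- mean(toy$y[toy$d == 1])

# The slope matches the difference in means; the intercept matches the d = 0 mean
all.equal(slope, mean_d1 - mean_d0)
all.equal(intercept, mean_d0)
```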


  • YOUR ANSWER HERE:

Part 4: Post a comment/question

/2

  • Go to the class #problemsets channel and create a new post.
  • You can either:
    • Share something you learned or a question from this problem set. Make sure to mention the instructors (@ozanj, @Patricia Martín).
    • Respond to a post made by another student.

Knit to html and submit problem set

Knit to html by clicking the “Knit” button near the top of your RStudio window (icon with blue yarn ball), or use the drop-down menu and select “Knit to HTML”.

  • Go to the class website and under the “Readings & Assignments” >> “Week 5” tab, click on the “Problem set 2 submission link”
  • Submit both your html and .Rmd files
  • Use this naming convention “lastname_firstname_ps#” for your .Rmd (e.g. martin_patricia_ps2.Rmd)