Instructions

Task

Install/load packages and read in data

  • Run the code chunks below to install and load Tidyverse.
    You will be reading in a subset of the college scorecard data from lecture that includes the following 8 universities:
    • UC-Berkeley = 001312
    • UCLA = 001315
    • USC = 001328
    • Stanford = 001305
    • Columbia = 002707
    • Columbia, Teachers College = 003979
    • NYU = 002785
    • Harvard = 002155

Install Tidyverse (if you haven’t already) and load the library.

#install.packages("tidyverse") #uncomment this line if tidyverse is not installed yet
library(tidyverse)

Read in a subset of the college scorecard data

#Read in data set (data frame)
load(url("https://github.com/anyone-can-cook/educ152/raw/main/data/college_scorecard/output_data/scorecard_edu_small.RData"))

#print the data frame (a tibble prints only its first 10 rows)
scorecard_edu_sm
## # A tibble: 26 x 24
##    opeid6 unitid instnm control  ccbasic stabbr city  cipdig2 cipcode cipdesc
##    <chr>   <dbl> <chr>  <chr+l> <dbl+lb> <chr>  <chr> <chr>   <chr>   <chr>  
##  1 001305 243744 Stanf… Privat… 15 [Doc… CA     Stan… 13      1301    "Educa…
##  2 001315 110662 Unive… Public  15 [Doc… CA     Los … 13      1301    "Educa…
##  3 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1304    "Educa…
##  4 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1306    "Educa…
##  5 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1311    "Stude…
##  6 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1313    "Teach…
##  7 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1314    "Teach…
##  8 002155 166027 Harva… Privat… 15 [Doc… MA     Camb… 13      1301    "Educa…
##  9 002785 193900 New Y… Privat… 15 [Doc… NY     New … 13      1304    "Educa…
## 10 002785 193900 New Y… Privat… 15 [Doc… NY     New … 13      1305    "Educa…
## # … with 16 more rows, and 14 more variables: credlev <dbl+lbl>,
## #   creddesc <chr>, ipedscount1 <chr>, ipedscount2 <chr>,
## #   debt_all_stgp_eval_n <dbl>, debt_all_stgp_eval_mean <dbl>,
## #   debt_all_stgp_eval_mdn <dbl>, debt_all_stgp_eval_mdn10yrpay <dbl>,
## #   earn_count_wne_hi_1yr <dbl>, earn_mdn_hi_1yr <dbl>,
## #   earn_count_wne_hi_2yr <dbl>, earn_mdn_hi_2yr <dbl>, region <dbl+lbl>,
## #   locale <dbl+lbl>

Pipes

  • Now that we have Tidyverse installed and loaded, we can begin to use pipes.
  • Run the code chunk below, which uses the names() function to print the variable names in the scorecard_edu_sm dataframe.
names(scorecard_edu_sm)
##  [1] "opeid6"                        "unitid"                       
##  [3] "instnm"                        "control"                      
##  [5] "ccbasic"                       "stabbr"                       
##  [7] "city"                          "cipdig2"                      
##  [9] "cipcode"                       "cipdesc"                      
## [11] "credlev"                       "creddesc"                     
## [13] "ipedscount1"                   "ipedscount2"                  
## [15] "debt_all_stgp_eval_n"          "debt_all_stgp_eval_mean"      
## [17] "debt_all_stgp_eval_mdn"        "debt_all_stgp_eval_mdn10yrpay"
## [19] "earn_count_wne_hi_1yr"         "earn_mdn_hi_1yr"              
## [21] "earn_count_wne_hi_2yr"         "earn_mdn_hi_2yr"              
## [23] "region"                        "locale"
  • Say we wanted to view only the institution names. Run the code chunk below using the select() function to select the column or variable instnm.
scorecard_edu_sm %>% select(instnm)
## # A tibble: 26 x 1
##    instnm                              
##    <chr>                               
##  1 Stanford University                 
##  2 University of California-Los Angeles
##  3 University of Southern California   
##  4 University of Southern California   
##  5 University of Southern California   
##  6 University of Southern California   
##  7 University of Southern California   
##  8 Harvard University                  
##  9 New York University                 
## 10 New York University                 
## # … with 16 more rows
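As a minimal sketch of what the pipe is doing, the chunk below compares a piped call with the equivalent unpiped call. The data frame toy is made up for illustration (it is a hypothetical stand-in for scorecard_edu_sm, not the real data).

```r
library(dplyr)  # dplyr is one of the packages loaded by library(tidyverse)

# Hypothetical toy data, not the real scorecard file
toy <- tibble(
  instnm = c("Stanford University", "New York University"),
  stabbr = c("CA", "NY")
)

# Without a pipe: the data frame is written as the first argument
no_pipe <- select(toy, instnm)

# With a pipe: the object on the left is passed in as the first argument
with_pipe <- toy %>% select(instnm)

# Both approaches return the same one-column tibble
identical(no_pipe, with_pipe)
```

Read the pipe as "then": take toy, then select instnm.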
  • What if I only wanted to see universities in California? Run the code chunk below using the filter() function to filter for rows or observations of universities in California.
scorecard_edu_sm %>% select(instnm, stabbr) %>% filter(stabbr == "CA")
## # A tibble: 7 x 2
##   instnm                               stabbr
##   <chr>                                <chr> 
## 1 Stanford University                  CA    
## 2 University of California-Los Angeles CA    
## 3 University of Southern California    CA    
## 4 University of Southern California    CA    
## 5 University of Southern California    CA    
## 6 University of Southern California    CA    
## 7 University of Southern California    CA
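The order of the steps in a pipe matters. The sketch below, again on made-up data (toy is hypothetical), shows that filtering on stabbr after select() only works if stabbr is still one of the selected columns, while filtering first lets you drop stabbr afterwards.

```r
library(dplyr)

# Hypothetical toy data, not the real scorecard file
toy <- tibble(
  instnm  = c("Stanford University",
              "University of Southern California",
              "New York University"),
  stabbr  = c("CA", "CA", "NY"),
  control = c("Private, nonprofit", "Private, nonprofit", "Private, nonprofit")
)

# select() first, then filter(): stabbr must be kept so filter() can use it
ca_both <- toy %>%
  select(instnm, stabbr) %>%
  filter(stabbr == "CA")

# filter() first, then select(): stabbr can be dropped after filtering
ca_names <- toy %>%
  filter(stabbr == "CA") %>%
  select(instnm)

ca_both
ca_names
```

Writing toy %>% select(instnm) %>% filter(stabbr == "CA") would stop with an error, because stabbr no longer exists by the time filter() runs.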
  • Say I wanted to save the universities in CA as an object. I would assign the result to a name using the assignment operator <- (= also works, but <- is the usual convention in R). Below, I create an object called cali_schools.
cali_schools <- scorecard_edu_sm %>% filter(stabbr == "CA") %>% select(instnm, stabbr) %>% unique() #using the unique() function to only grab unique rows/observations
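  • Its underlying type is a list, which is what typeof() reports, because a data frame is stored as a list of columns. Its class shows that it is a tibble, the tidyverse version of a data frame.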
typeof(cali_schools)
## [1] "list"
class(cali_schools)
## [1] "tbl_df"     "tbl"        "data.frame"
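To see why typeof() and class() give different answers, the sketch below builds a small made-up tibble (toy_schools is a hypothetical name, not an object from the assignment) and inspects it in a few ways.

```r
library(dplyr)

# Hypothetical toy data with one duplicated row
toy_schools <- tibble(
  instnm = c("School A", "School A", "School B"),
  stabbr = c("CA", "CA", "CA")
) %>%
  unique()  # keep only unique rows, as in cali_schools above

typeof(toy_schools)         # "list": how the object is stored
is.data.frame(toy_schools)  # TRUE: how the object behaves
length(toy_schools)         # number of list elements, i.e. columns
nrow(toy_schools)           # number of rows left after unique()

# Each column is one element of the underlying list
toy_schools[["instnm"]]
```

In short, typeof() describes the storage, while class() describes what kind of object R treats it as.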

Now it is your turn:

Question 1: Using the dataframe scorecard_edu_sm, select the variables/columns instnm, stabbr, and control. Don’t forget to use a pipe %>%.

Solutions
scorecard_edu_sm %>% select(instnm, stabbr, control)
## # A tibble: 26 x 3
##    instnm                               stabbr control           
##    <chr>                                <chr>  <chr+lbl>         
##  1 Stanford University                  CA     Private, nonprofit
##  2 University of California-Los Angeles CA     Public            
##  3 University of Southern California    CA     Private, nonprofit
##  4 University of Southern California    CA     Private, nonprofit
##  5 University of Southern California    CA     Private, nonprofit
##  6 University of Southern California    CA     Private, nonprofit
##  7 University of Southern California    CA     Private, nonprofit
##  8 Harvard University                   MA     Private, nonprofit
##  9 New York University                  NY     Private, nonprofit
## 10 New York University                  NY     Private, nonprofit
## # … with 16 more rows


Question 2: Using the same code from above, filter for universities in the state of “NY” using the variable stabbr. Don’t forget to use pipes %>% in your code.

Solutions
scorecard_edu_sm %>% select(instnm, stabbr, control) %>% filter(stabbr == "NY")
## # A tibble: 18 x 3
##    instnm                                  stabbr control           
##    <chr>                                   <chr>  <chr+lbl>         
##  1 New York University                     NY     Private, nonprofit
##  2 New York University                     NY     Private, nonprofit
##  3 New York University                     NY     Private, nonprofit
##  4 New York University                     NY     Private, nonprofit
##  5 New York University                     NY     Private, nonprofit
##  6 New York University                     NY     Private, nonprofit
##  7 New York University                     NY     Private, nonprofit
##  8 Teachers College at Columbia University NY     Private, nonprofit
##  9 Teachers College at Columbia University NY     Private, nonprofit
## 10 Teachers College at Columbia University NY     Private, nonprofit
## 11 Teachers College at Columbia University NY     Private, nonprofit
## 12 Teachers College at Columbia University NY     Private, nonprofit
## 13 Teachers College at Columbia University NY     Private, nonprofit
## 14 Teachers College at Columbia University NY     Private, nonprofit
## 15 Teachers College at Columbia University NY     Private, nonprofit
## 16 Teachers College at Columbia University NY     Private, nonprofit
## 17 Teachers College at Columbia University NY     Private, nonprofit
## 18 Teachers College at Columbia University NY     Private, nonprofit


Question 3: Now say we want to save the result of our code from above as an object. Create an object called ny_schools. Don’t forget to use pipes %>% in your code. Hint: you may want to use unique() at the end of your code.

Solutions
ny_schools <- scorecard_edu_sm %>% select(instnm, stabbr, control) %>% filter(stabbr == "NY") %>% unique()
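As an optional aside, dplyr also provides distinct(), which does the same job as unique() here. The sketch below uses made-up data (toy is hypothetical, not the real scorecard file) to show that both drop the duplicated rows; it is an alternative, not the solution used in this assignment.

```r
library(dplyr)

# Hypothetical toy data with repeated institution rows
toy <- tibble(
  instnm  = c("New York University", "New York University",
              "Teachers College at Columbia University"),
  stabbr  = c("NY", "NY", "NY"),
  control = c("Private, nonprofit", "Private, nonprofit", "Private, nonprofit")
)

# Base R approach, as in the solution above
ny_unique <- toy %>% filter(stabbr == "NY") %>% unique()

# dplyr alternative
ny_distinct <- toy %>% filter(stabbr == "NY") %>% distinct()

nrow(ny_unique)
nrow(ny_distinct)
```

Either way, each institution appears once in the saved object.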