In-class exercise: Objects, Tidyverse, & Pipes

Instructions

Take a few minutes to review Section 3, Tidyverse & Pipes, of the R walkthrough document.

Task

Install/load packages and read in data

Run the code chunks below to install and load Tidyverse.
You will be reading in a subset of the college scorecard data from lecture that includes the following 8 universities:
- UC-Berkeley = 001312
- UCLA = 001315
- USC = 001328
- Stanford = 001305
- Columbia = 002707
- Columbia, Teachers College = 003979
- NYU = 002785
- Harvard = 002155

Install Tidyverse (if you haven’t already) and load the library.

#install.packages("tidyverse") #uncomment this line if tidyverse is not installed yet
library(tidyverse)

Read in a subset of the college scorecard data

#Read in data set (data frame)
load(url("https://github.com/anyone-can-cook/educ152/raw/main/data/college_scorecard/output_data/scorecard_edu_small.RData"))

#print the data frame (a tibble prints only its first 10 rows)
scorecard_edu_sm

## # A tibble: 26 x 24
##    opeid6 unitid instnm control  ccbasic stabbr city  cipdig2 cipcode cipdesc
##    <chr>   <dbl> <chr>  <chr+l> <dbl+lb> <chr>  <chr> <chr>   <chr>   <chr>  
##  1 001305 243744 Stanf… Privat… 15 [Doc… CA     Stan… 13      1301    "Educa…
##  2 001315 110662 Unive… Public  15 [Doc… CA     Los … 13      1301    "Educa…
##  3 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1304    "Educa…
##  4 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1306    "Educa…
##  5 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1311    "Stude…
##  6 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1313    "Teach…
##  7 001328 123961 Unive… Privat… 15 [Doc… CA     Los … 13      1314    "Teach…
##  8 002155 166027 Harva… Privat… 15 [Doc… MA     Camb… 13      1301    "Educa…
##  9 002785 193900 New Y… Privat… 15 [Doc… NY     New … 13      1304    "Educa…
## 10 002785 193900 New Y… Privat… 15 [Doc… NY     New … 13      1305    "Educa…
## # … with 16 more rows, and 14 more variables: credlev <dbl+lbl>,
## #   creddesc <chr>, ipedscount1 <chr>, ipedscount2 <chr>,
## #   debt_all_stgp_eval_n <dbl>, debt_all_stgp_eval_mean <dbl>,
## #   debt_all_stgp_eval_mdn <dbl>, debt_all_stgp_eval_mdn10yrpay <dbl>,
## #   earn_count_wne_hi_1yr <dbl>, earn_mdn_hi_1yr <dbl>,
## #   earn_count_wne_hi_2yr <dbl>, earn_mdn_hi_2yr <dbl>, region <dbl+lbl>,
## #   locale <dbl+lbl>

Pipes

Now that we have Tidyverse installed and loaded, we can begin to use pipes.
Run the code chunk below, which uses the names() function to print the variable names in the scorecard_edu_sm data frame.

names(scorecard_edu_sm)

##  [1] "opeid6"                        "unitid"                       
##  [3] "instnm"                        "control"                      
##  [5] "ccbasic"                       "stabbr"                       
##  [7] "city"                          "cipdig2"                      
##  [9] "cipcode"                       "cipdesc"                      
## [11] "credlev"                       "creddesc"                     
## [13] "ipedscount1"                   "ipedscount2"                  
## [15] "debt_all_stgp_eval_n"          "debt_all_stgp_eval_mean"      
## [17] "debt_all_stgp_eval_mdn"        "debt_all_stgp_eval_mdn10yrpay"
## [19] "earn_count_wne_hi_1yr"         "earn_mdn_hi_1yr"              
## [21] "earn_count_wne_hi_2yr"         "earn_mdn_hi_2yr"              
## [23] "region"                        "locale"

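Say we wanted to view only the institution names. Run the code chunk below, which uses the select() function to select the column (variable) instnm. The pipe %>% passes the object on its left as the first argument to the function on its right.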

scorecard_edu_sm %>% select(instnm)

## # A tibble: 26 x 1
##    instnm                              
##    <chr>                               
##  1 Stanford University                 
##  2 University of California-Los Angeles
##  3 University of Southern California   
##  4 University of Southern California   
##  5 University of Southern California   
##  6 University of Southern California   
##  7 University of Southern California   
##  8 Harvard University                  
##  9 New York University                 
## 10 New York University                 
## # … with 16 more rows
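
To see what the pipe is doing, it may help to compare a piped call with the same call written without a pipe. The sketch below uses a small made-up tibble named toy (a stand-in for scorecard_edu_sm, not the real data) so it runs without downloading anything.

```r
library(dplyr)  # loaded by library(tidyverse); provides %>%, select(), tibble()

# Hypothetical toy data standing in for scorecard_edu_sm
toy <- tibble(
  instnm = c("Stanford University", "Harvard University", "New York University"),
  stabbr = c("CA", "MA", "NY")
)

# With a pipe: toy is passed as the first argument of select()
piped <- toy %>% select(instnm)

# Without a pipe: the same call, with toy written as the first argument
nested <- select(toy, instnm)

# Both approaches return the same one-column tibble
identical(piped, nested)
```

With a single function the two forms are equally readable; the pipe pays off once several steps are chained, because the code then reads left to right in the order the steps run.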

What if I only wanted to see universities in California? Run the code chunk below using the filter() function to filter for rows or observations of universities in California.

scorecard_edu_sm %>% select(instnm, stabbr) %>% filter(stabbr == "CA")

## # A tibble: 7 x 2
##   instnm                               stabbr
##   <chr>                                <chr> 
## 1 Stanford University                  CA    
## 2 University of California-Los Angeles CA    
## 3 University of Southern California    CA    
## 4 University of Southern California    CA    
## 5 University of Southern California    CA    
## 6 University of Southern California    CA    
## 7 University of Southern California    CA
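
The order of the steps in a pipe matters: filter() can only use columns that are still present at that point. The code above works because stabbr is kept by select() before filtering. A minimal sketch with a made-up tibble named toy (an assumption for illustration, not the scorecard data) shows what happens when the column is dropped first.

```r
library(dplyr)  # loaded by library(tidyverse)

# Hypothetical toy data standing in for scorecard_edu_sm
toy <- tibble(
  instnm = c("Stanford University", "Harvard University", "New York University"),
  stabbr = c("CA", "MA", "NY")
)

# Filter first, then drop stabbr: works, because stabbr exists when filter() runs
right_order <- toy %>%
  filter(stabbr == "CA") %>%
  select(instnm)

# Drop stabbr first, then try to filter on it: filter() raises an error,
# which tryCatch() turns into a short message so the script keeps running
wrong_order <- tryCatch(
  toy %>% select(instnm) %>% filter(stabbr == "CA"),
  error = function(e) "error: stabbr is no longer in the data"
)

right_order
wrong_order
```

A common habit is to filter rows first and select columns last, so every column is still available while filtering.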

Say I wanted to create an object containing the universities in CA. I would assign the result to a name using the assignment operator <- (= also works, but <- is the usual convention in R). Below, I create the object cali_schools.

cali_schools <- scorecard_edu_sm %>% filter(stabbr == "CA") %>% select(instnm, stabbr) %>% unique() #using the unique() function to only grab unique rows/observations

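typeof() reports cali_schools as a list, because a data frame is built on top of a list (each column is one list element). class() shows that it is a tibble, which is a kind of data frame.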

typeof(cali_schools)

## [1] "list"

class(cali_schools)

## [1] "tbl_df"     "tbl"        "data.frame"
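
The difference between typeof() and class() can be explored on any data frame. The sketch below uses a small made-up tibble named toy (hypothetical, not the scorecard data) to show that a tibble is stored as a list of columns while also being a data frame.

```r
library(dplyr)  # loaded by library(tidyverse); provides tibble()

# Hypothetical toy data
toy <- tibble(
  instnm = c("Stanford University", "Harvard University"),
  stabbr = c("CA", "MA")
)

typeof(toy)         # how R stores the object internally: "list"
class(toy)          # what kind of object it behaves as: "tbl_df" "tbl" "data.frame"

is.list(toy)        # TRUE: each column is one element of the list
is.data.frame(toy)  # TRUE: a tibble is a kind of data frame
length(toy)         # 2: the length of a data frame is its number of columns

typeof(toy$instnm)  # "character": a single column is an ordinary vector
```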

Now it is your turn:

Question 1: Using the data frame scorecard_edu_sm, select the variables/columns instnm, stabbr, and control. Don’t forget to use a pipe %>%.

Solutions

scorecard_edu_sm %>% select(instnm, stabbr, control)

## # A tibble: 26 x 3
##    instnm                               stabbr control           
##    <chr>                                <chr>  <chr+lbl>         
##  1 Stanford University                  CA     Private, nonprofit
##  2 University of California-Los Angeles CA     Public            
##  3 University of Southern California    CA     Private, nonprofit
##  4 University of Southern California    CA     Private, nonprofit
##  5 University of Southern California    CA     Private, nonprofit
##  6 University of Southern California    CA     Private, nonprofit
##  7 University of Southern California    CA     Private, nonprofit
##  8 Harvard University                   MA     Private, nonprofit
##  9 New York University                  NY     Private, nonprofit
## 10 New York University                  NY     Private, nonprofit
## # … with 16 more rows

Question 2: Using the same code from above, filter for universities in New York (stabbr equal to “NY”). Don’t forget to use pipes %>% in your code.

Solutions

scorecard_edu_sm %>% select(instnm, stabbr, control) %>% filter(stabbr == "NY")

## # A tibble: 18 x 3
##    instnm                                  stabbr control           
##    <chr>                                   <chr>  <chr+lbl>         
##  1 New York University                     NY     Private, nonprofit
##  2 New York University                     NY     Private, nonprofit
##  3 New York University                     NY     Private, nonprofit
##  4 New York University                     NY     Private, nonprofit
##  5 New York University                     NY     Private, nonprofit
##  6 New York University                     NY     Private, nonprofit
##  7 New York University                     NY     Private, nonprofit
##  8 Teachers College at Columbia University NY     Private, nonprofit
##  9 Teachers College at Columbia University NY     Private, nonprofit
## 10 Teachers College at Columbia University NY     Private, nonprofit
## 11 Teachers College at Columbia University NY     Private, nonprofit
## 12 Teachers College at Columbia University NY     Private, nonprofit
## 13 Teachers College at Columbia University NY     Private, nonprofit
## 14 Teachers College at Columbia University NY     Private, nonprofit
## 15 Teachers College at Columbia University NY     Private, nonprofit
## 16 Teachers College at Columbia University NY     Private, nonprofit
## 17 Teachers College at Columbia University NY     Private, nonprofit
## 18 Teachers College at Columbia University NY     Private, nonprofit

Question 3: Now say we want to save our code from above as an object. Create an object called ny_schools. Don’t forget to use pipes %>% in your code. Hint: you may want to use unique() at the end of your code.

Solutions

ny_schools <- scorecard_edu_sm %>% select(instnm, stabbr, control) %>% filter(stabbr == "NY") %>% unique()
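
To see why unique() is useful here, the sketch below runs the same kind of pipeline on a small made-up tibble named toy (an assumption for illustration, not the scorecard data) that repeats an institution across rows. It also shows distinct(), the dplyr function that can be used for the same purpose.

```r
library(dplyr)  # loaded by library(tidyverse)

# Hypothetical toy data with repeated rows, standing in for scorecard_edu_sm
toy <- tibble(
  instnm = c("New York University", "New York University",
             "Teachers College at Columbia University", "Stanford University"),
  stabbr = c("NY", "NY", "NY", "CA")
)

# Without unique(): New York University appears twice
ny_all <- toy %>%
  filter(stabbr == "NY") %>%
  select(instnm, stabbr)

# With unique(): duplicate rows are dropped
ny_toy <- ny_all %>% unique()

# distinct() from dplyr also keeps one copy of each row
ny_toy_distinct <- ny_all %>% distinct()

nrow(ny_all)           # 3 rows before removing duplicates
nrow(ny_toy)           # 2 rows after unique()
nrow(ny_toy_distinct)  # 2 rows after distinct()
```

Note that assigning a result with <- does not print it; type the object's name on its own line to view it.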