Install/load packages and read in data
Install the tidyverse (if you haven’t already) and load the library.
#install.packages("tidyverse") #uncomment this line to install the package (only needed once)
library(tidyverse)
Read in a subset of the College Scorecard data
#Read in data set (data frame)
load(url("https://github.com/anyone-can-cook/educ152/raw/main/data/college_scorecard/output_data/scorecard_edu_small.RData"))
#print the data frame (a long tibble shows only its first 10 rows)
scorecard_edu_sm
## # A tibble: 26 x 24
## opeid6 unitid instnm control ccbasic stabbr city cipdig2 cipcode cipdesc
## <chr> <dbl> <chr> <chr+l> <dbl+lb> <chr> <chr> <chr> <chr> <chr>
## 1 001305 243744 Stanf… Privat… 15 [Doc… CA Stan… 13 1301 "Educa…
## 2 001315 110662 Unive… Public 15 [Doc… CA Los … 13 1301 "Educa…
## 3 001328 123961 Unive… Privat… 15 [Doc… CA Los … 13 1304 "Educa…
## 4 001328 123961 Unive… Privat… 15 [Doc… CA Los … 13 1306 "Educa…
## 5 001328 123961 Unive… Privat… 15 [Doc… CA Los … 13 1311 "Stude…
## 6 001328 123961 Unive… Privat… 15 [Doc… CA Los … 13 1313 "Teach…
## 7 001328 123961 Unive… Privat… 15 [Doc… CA Los … 13 1314 "Teach…
## 8 002155 166027 Harva… Privat… 15 [Doc… MA Camb… 13 1301 "Educa…
## 9 002785 193900 New Y… Privat… 15 [Doc… NY New … 13 1304 "Educa…
## 10 002785 193900 New Y… Privat… 15 [Doc… NY New … 13 1305 "Educa…
## # … with 16 more rows, and 14 more variables: credlev <dbl+lbl>,
## # creddesc <chr>, ipedscount1 <chr>, ipedscount2 <chr>,
## # debt_all_stgp_eval_n <dbl>, debt_all_stgp_eval_mean <dbl>,
## # debt_all_stgp_eval_mdn <dbl>, debt_all_stgp_eval_mdn10yrpay <dbl>,
## # earn_count_wne_hi_1yr <dbl>, earn_mdn_hi_1yr <dbl>,
## # earn_count_wne_hi_2yr <dbl>, earn_mdn_hi_2yr <dbl>, region <dbl+lbl>,
## # locale <dbl+lbl>
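To look at just the first few rows, head() is handy. The sketch below uses a small hypothetical tibble called `toy`, built from the institution names and states printed above (it is not the full scorecard_edu_sm data). It shows that head() returns 6 rows by default, that you can change that with `n`, and that glimpse() lists every column with its type.

```r
# Minimal sketch: `toy` is a hypothetical tibble built from rows printed above,
# not the real scorecard_edu_sm data
library(dplyr)

toy <- tibble(
  instnm = c("Stanford University", "University of California-Los Angeles",
             rep("University of Southern California", 5), "Harvard University",
             rep("New York University", 2)),
  stabbr = c(rep("CA", 7), "MA", "NY", "NY")
)

# head() returns the first 6 rows by default
first_six <- head(toy)
nrow(first_six)

# head(x, n = ...) returns the first n rows instead
first_three <- head(toy, n = 3)

# glimpse() prints each column sideways with its type and first values
glimpse(toy)
```

On the real data, the same calls would be head(scorecard_edu_sm) and glimpse(scorecard_edu_sm).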
Pipes
With the tidyverse installed and loaded, we can begin to use pipes. First, use names() to print the variable names in the scorecard_edu_sm data frame.
names(scorecard_edu_sm)
## [1] "opeid6" "unitid"
## [3] "instnm" "control"
## [5] "ccbasic" "stabbr"
## [7] "city" "cipdig2"
## [9] "cipcode" "cipdesc"
## [11] "credlev" "creddesc"
## [13] "ipedscount1" "ipedscount2"
## [15] "debt_all_stgp_eval_n" "debt_all_stgp_eval_mean"
## [17] "debt_all_stgp_eval_mdn" "debt_all_stgp_eval_mdn10yrpay"
## [19] "earn_count_wne_hi_1yr" "earn_mdn_hi_1yr"
## [21] "earn_count_wne_hi_2yr" "earn_mdn_hi_2yr"
## [23] "region" "locale"
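The pipe %>% takes the object on its left and passes it as the first argument of the function on its right, so `x %>% f()` is the same call as `f(x)`. The sketch below checks this on a small hypothetical tibble `toy` (three schools from the output above) and then chains two steps together.

```r
library(dplyr)  # provides the %>% pipe (re-exported from magrittr)

# Hypothetical toy tibble, not the real scorecard data
toy <- tibble(
  instnm = c("Stanford University", "Harvard University", "New York University"),
  stabbr = c("CA", "MA", "NY")
)

# These two lines make the same function call
names_direct <- names(toy)
names_piped  <- toy %>% names()

# Pipes read left to right: take toy, then keep stabbr, then count its rows
n_rows <- toy %>% select(stabbr) %>% nrow()
n_rows
```

Reading each %>% as "and then" is a helpful way to follow longer chains like the ones below.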
Use the select() function to select the column (variable) instnm.
scorecard_edu_sm %>% select(instnm)
## # A tibble: 26 x 1
## instnm
## <chr>
## 1 Stanford University
## 2 University of California-Los Angeles
## 3 University of Southern California
## 4 University of Southern California
## 5 University of Southern California
## 6 University of Southern California
## 7 University of Southern California
## 8 Harvard University
## 9 New York University
## 10 New York University
## # … with 16 more rows
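select() can do more than pick one column. The sketch below, on a hypothetical tibble `toy` whose cipdig2 and cipcode values are copied from the first rows printed earlier, selects several columns in a chosen order, drops a column with a minus sign, and uses the starts_with() helper to grab every column whose name begins with "cip".

```r
library(dplyr)

# Hypothetical toy tibble; values copied from the first printed rows
toy <- tibble(
  instnm  = c("Stanford University", "University of California-Los Angeles",
              "University of Southern California"),
  stabbr  = c("CA", "CA", "CA"),
  cipdig2 = c("13", "13", "13"),
  cipcode = c("1301", "1301", "1304")
)

# Select several columns by name; the output follows the order you list
by_name <- toy %>% select(stabbr, instnm)

# A minus sign drops a column and keeps all the others
without_state <- toy %>% select(-stabbr)

# starts_with() selects every column whose name begins with a prefix
cip_cols <- toy %>% select(starts_with("cip"))
names(cip_cols)
```

On the real data, something like scorecard_edu_sm %>% select(starts_with("earn")) would pick out the four earnings variables listed by names() above.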
Use the filter() function to keep only the rows (observations) for universities in California.
scorecard_edu_sm %>% select(instnm, stabbr) %>% filter(stabbr == "CA")
## # A tibble: 7 x 2
## instnm stabbr
## <chr> <chr>
## 1 Stanford University CA
## 2 University of California-Los Angeles CA
## 3 University of Southern California CA
## 4 University of Southern California CA
## 5 University of Southern California CA
## 6 University of Southern California CA
## 7 University of Southern California CA
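filter() can also combine conditions. In the sketch below (a hypothetical tibble `toy`; note that `control` is a plain character column here, whereas in scorecard_edu_sm it is a labelled column), %in% matches any of several states, a comma requires every condition to be TRUE, and | keeps rows where either condition is TRUE. Remember that filter() tests equality with ==, not =.

```r
library(dplyr)

# Hypothetical toy tibble; control is plain character here (an assumption)
toy <- tibble(
  instnm  = c("Stanford University", "University of California-Los Angeles",
              "Harvard University", "New York University"),
  stabbr  = c("CA", "CA", "MA", "NY"),
  control = c("Private, nonprofit", "Public",
              "Private, nonprofit", "Private, nonprofit")
)

# %in% keeps rows whose stabbr matches any value in the vector
ca_or_ma <- toy %>% filter(stabbr %in% c("CA", "MA"))

# Conditions separated by a comma must all be TRUE (same as &)
ca_private <- toy %>% filter(stabbr == "CA", control == "Private, nonprofit")

# | keeps rows where at least one condition is TRUE
ma_or_public <- toy %>% filter(stabbr == "MA" | control == "Public")
```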
To save output as an object, use the assignment operator <- or =. Below, I create an object called cali_schools.
cali_schools <- scorecard_edu_sm %>% filter(stabbr == "CA") %>% select(instnm, stabbr) %>% unique() #using the unique() function to keep only unique rows/observations
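Why is unique() needed? After keeping only instnm and stabbr, the University of Southern California rows become exact duplicates. The sketch below rebuilds the 7 California rows as a hypothetical tibble `toy_ca` and shows that unique() keeps 3 rows; dplyr's distinct() does the same job and is shown as an alternative.

```r
library(dplyr)

# Hypothetical toy version of the 7 California rows printed above
toy_ca <- tibble(
  instnm = c("Stanford University", "University of California-Los Angeles",
             rep("University of Southern California", 5)),
  stabbr = rep("CA", 7)
)

# unique() drops rows that are exact duplicates across all columns
ca_unique <- toy_ca %>% unique()

# distinct() is the dplyr verb for the same task
ca_distinct <- toy_ca %>% distinct()

nrow(ca_unique)
```

distinct() can also take column names, e.g. distinct(instnm), to deduplicate on just those columns.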
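The object cali_schools is a list object: typeof() reports "list" because a data frame is stored as a list of columns.
typeof(cali_schools)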
## [1] "list"
class(cali_schools)
## [1] "tbl_df" "tbl" "data.frame"
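Because a tibble is a list of columns with a data-frame class on top, list tools work on it. The sketch below uses a hypothetical two-row tibble `toy` to show this, and pulls a single column out as a plain vector with [[ ]], $, or the pipe-friendly pull().

```r
library(dplyr)

# Hypothetical toy tibble
toy <- tibble(
  instnm = c("Stanford University", "Harvard University"),
  stabbr = c("CA", "MA")
)

typeof(toy)   # "list": a data frame is stored as a list of columns
class(toy)    # "tbl_df" "tbl" "data.frame"
is.list(toy)  # TRUE
length(toy)   # number of list elements = number of columns (2)

# [[ ]] and $ extract one column as a plain vector
toy[["stabbr"]]
toy$instnm

# pull() is the dplyr equivalent that fits into a pipe
states <- toy %>% pull(stabbr)
```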
Question 1: Using the data frame scorecard_edu_sm, select the variables/columns instnm, stabbr, and control. Don’t forget to use a pipe %>%.
scorecard_edu_sm %>% select(instnm, stabbr, control)
## # A tibble: 26 x 3
## instnm stabbr control
## <chr> <chr> <chr+lbl>
## 1 Stanford University CA Private, nonprofit
## 2 University of California-Los Angeles CA Public
## 3 University of Southern California CA Private, nonprofit
## 4 University of Southern California CA Private, nonprofit
## 5 University of Southern California CA Private, nonprofit
## 6 University of Southern California CA Private, nonprofit
## 7 University of Southern California CA Private, nonprofit
## 8 Harvard University MA Private, nonprofit
## 9 New York University NY Private, nonprofit
## 10 New York University NY Private, nonprofit
## # … with 16 more rows
Question 2: Using the same code from above, filter for universities whose state (stabbr) is “NY”. Don’t forget to use a pipe %>% in your code.
scorecard_edu_sm %>% select(instnm, stabbr, control) %>% filter(stabbr == "NY")
## # A tibble: 18 x 3
## instnm stabbr control
## <chr> <chr> <chr+lbl>
## 1 New York University NY Private, nonprofit
## 2 New York University NY Private, nonprofit
## 3 New York University NY Private, nonprofit
## 4 New York University NY Private, nonprofit
## 5 New York University NY Private, nonprofit
## 6 New York University NY Private, nonprofit
## 7 New York University NY Private, nonprofit
## 8 Teachers College at Columbia University NY Private, nonprofit
## 9 Teachers College at Columbia University NY Private, nonprofit
## 10 Teachers College at Columbia University NY Private, nonprofit
## 11 Teachers College at Columbia University NY Private, nonprofit
## 12 Teachers College at Columbia University NY Private, nonprofit
## 13 Teachers College at Columbia University NY Private, nonprofit
## 14 Teachers College at Columbia University NY Private, nonprofit
## 15 Teachers College at Columbia University NY Private, nonprofit
## 16 Teachers College at Columbia University NY Private, nonprofit
## 17 Teachers College at Columbia University NY Private, nonprofit
## 18 Teachers College at Columbia University NY Private, nonprofit
Question 3: Now say we want to save our code from above as an object. Create an object called ny_schools. Don’t forget to use a pipe %>% in your code. Hint: you may want to use unique() at the end of your code.
ny_schools <- scorecard_edu_sm %>% select(instnm, stabbr, control) %>% filter(stabbr == "NY") %>% unique()
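To check what unique() did in Question 3, compare row counts before and after. The sketch below rebuilds the 18 NY rows from Question 2 as a hypothetical tibble `toy_ny` (7 New York University rows and 11 Teachers College rows, with plain character columns). unique() leaves one row per school, and count() shows how many rows each school had before deduplication.

```r
library(dplyr)

# Hypothetical toy version of the 18 NY rows printed in Question 2
toy_ny <- tibble(
  instnm  = c(rep("New York University", 7),
              rep("Teachers College at Columbia University", 11)),
  stabbr  = "NY",                 # length-1 values are recycled
  control = "Private, nonprofit"
)

# One row per distinct combination of instnm, stabbr, control
ny_unique <- toy_ny %>% unique()
nrow(ny_unique)

# count() tallies rows per school (column n)
row_counts <- toy_ny %>% count(instnm)
row_counts
```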