All you need is a Twitter account (user name and password) and you can be up and running in minutes!

Source: rtweet README

1 Introduction

Load packages:

# install.packages('rtweet')

library(rtweet)
library(tidyverse)

Documentation and resources:

2 rtweet package


What is the rtweet package?

  • R package for interacting with Twitter’s API
  • An API (application programming interface) is a computing interface which defines interactions between multiple software intermediaries (e.g., the kinds of calls or requests that can be made)
  • We can use it to make requests to Twitter’s API and request data


How to get started:

  • Create a Twitter account if you don’t have one already
  • Load the rtweet package
  • In your console (i.e., an interactive session of R), make a request to Twitter’s API using one of rtweet’s functions (e.g., search_tweets())
  • A browser window should pop up, prompting you to log in to your Twitter account and authorize the rstats2twitter app

    > tweets <- search_tweets('#ucla')
    Requesting token on behalf of user...
    Waiting for authentication in browser...
    Press Esc/Ctrl + C to abort

Figure: Twitter authentication

3 search_tweets() function


Description:

search_tweets() is a function from the rtweet package that returns Twitter statuses matching a user-provided search query.


Usage:

?search_tweets

# SYNTAX AND DEFAULT VALUES
search_tweets(
  q,
  n = 100,
  type = "recent",
  include_rts = TRUE,
  geocode = NULL,
  max_id = NULL,
  parse = TRUE,
  token = NULL,
  retryonratelimit = FALSE,
  verbose = TRUE,
  ...
)


Arguments:

  • q: query to be searched (same as if you typed it into Twitter’s search box)
    • Use OR to search for tweets containing either search term (e.g., data OR science)
    • Use from: to search for tweets from certain handles (e.g., from:UCLA)
    • Use filter: to only include certain kinds of tweets (e.g., filter:verified)
    • Use -filter: to exclude certain kinds of tweets (e.g., -filter:replies)
  • n: total number of tweets to return
  • include_rts: whether or not to include retweets in search results
  • etc.
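
Because q is an ordinary character string, the operators listed above can be assembled with base R before any request is sent. The following is a minimal sketch: the handles and the variable names (handles, q_either, q_no_replies) are illustrative assumptions, and the search_tweets() call is left commented out because it needs an authenticated session.

```r
# Illustrative handles; substitute the accounts you care about
handles <- c("UCLA", "UCLAAdmission")

# OR: tweets from either handle
q_either <- paste0("from:", handles, collapse = " OR ")
print(q_either)

# -filter: tweets from one handle, excluding replies
q_no_replies <- paste("from:UCLA", "-filter:replies")
print(q_no_replies)

# Pass a query to search_tweets() in an authenticated session, e.g.:
# tweets <- search_tweets(q_no_replies, n = 200, include_rts = FALSE)
```

Printing the query before sending it is a cheap way to catch a missing space or a misplaced operator.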

4 Example


Request data:

We will use rtweet to pull Twitter data from the PAC-12 universities. We will use the university admissions Twitter handle if there is one, or the main Twitter handle for the university if there isn’t one:

# PAC-12 university handles
p12 <- c('uaadmissions', 'FutureSunDevils', 'caladmissions', 'UCLAAdmission',
         'futurebuffs', 'uoregon', 'BeaverVIP', 'USCAdmission',
         'engagestanford', 'UtahAdmissions', 'UW', 'WSUPullman')

# Use `OR` to search for tweets containing any of the handles
my_query <- paste0('from:', p12, collapse = ' OR ')
my_query
[1] "from:uaadmissions OR from:FutureSunDevils OR from:caladmissions OR from:UCLAAdmission OR from:futurebuffs OR from:uoregon OR from:BeaverVIP OR from:USCAdmission OR from:engagestanford OR from:UtahAdmissions OR from:UW OR from:WSUPullman"
# Request Twitter data
p12_df <- search_tweets(my_query, n = 500)


Analyze data:

search_tweets() conveniently returns the data in a dataframe, so it’s ready for you to perform data manipulations!

# Inspect dataframe
str(p12_df, max.level = 1, strict.width = 'cut')
Classes 'tbl_df', 'tbl' and 'data.frame':   328 obs. of  90 variables:
 $ user_id                : chr  "22080148" "22080148" "22080148" "22080"..
 $ status_id              : chr  "1254177694599675904" "1253431405993840"..
 $ created_at             : POSIXct, format: "2020-04-25 22:37:18" "2020"..
 $ screen_name            : chr  "WSUPullman" "WSUPullman" "WSUPullman" "..
 $ text                   : chr  "Big Dez is headed to Indy!\n\n#GoCougs"..
 $ source                 : chr  "Twitter for iPhone" "Twitter Web App" "..
 $ display_text_width     : num  125 58 246 83 56 64 156 271 69 140 ...
 $ reply_to_status_id     : chr  NA NA NA NA ...
 $ reply_to_user_id       : chr  NA NA NA NA ...
 $ reply_to_screen_name   : chr  NA NA NA NA ...
 $ is_quote               : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ is_retweet             : logi  TRUE FALSE FALSE FALSE FALSE FALSE ...
 $ favorite_count         : int  0 322 30 55 186 53 22 44 11 0 ...
 $ retweet_count          : int  230 32 1 5 0 3 2 6 2 6 ...
 $ quote_count            : int  NA NA NA NA NA NA NA NA NA NA ...
 $ reply_count            : int  NA NA NA NA NA NA NA NA NA NA ...
 $ hashtags               :List of 328
 $ symbols                :List of 328
 $ urls_url               :List of 328
 $ urls_t.co              :List of 328
 $ urls_expanded_url      :List of 328
 $ media_url              :List of 328
 $ media_t.co             :List of 328
 $ media_expanded_url     :List of 328
 $ media_type             :List of 328
 $ ext_media_url          :List of 328
 $ ext_media_t.co         :List of 328
 $ ext_media_expanded_url :List of 328
 $ ext_media_type         : chr  NA NA NA NA ...
 $ mentions_user_id       :List of 328
 $ mentions_screen_name   :List of 328
 $ lang                   : chr  "en" "en" "en" "en" ...
 $ quoted_status_id       : chr  NA NA NA NA ...
 $ quoted_text            : chr  NA NA NA NA ...
 $ quoted_created_at      : POSIXct, format: NA NA ...
 $ quoted_source          : chr  NA NA NA NA ...
 $ quoted_favorite_count  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ quoted_retweet_count   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ quoted_user_id         : chr  NA NA NA NA ...
 $ quoted_screen_name     : chr  NA NA NA NA ...
 $ quoted_name            : chr  NA NA NA NA ...
 $ quoted_followers_count : int  NA NA NA NA NA NA NA NA NA NA ...
 $ quoted_friends_count   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ quoted_statuses_count  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ quoted_location        : chr  NA NA NA NA ...
 $ quoted_description     : chr  NA NA NA NA ...
 $ quoted_verified        : logi  NA NA NA NA NA NA ...
 $ retweet_status_id      : chr  "1254159118996127746" NA NA NA ...
 $ retweet_text           : chr  "Big Dez is headed to Indy!\n\n#GoCougs"..
 $ retweet_created_at     : POSIXct, format: "2020-04-25 21:23:29" NA ...
 $ retweet_source         : chr  "Twitter for iPhone" NA NA NA ...
 $ retweet_favorite_count : int  1402 NA NA NA NA NA NA NA NA 26 ...
 $ retweet_retweet_count  : int  230 NA NA NA NA NA NA NA NA 6 ...
 $ retweet_user_id        : chr  "1250265324" NA NA NA ...
 $ retweet_screen_name    : chr  "WSUCougarFB" NA NA NA ...
 $ retweet_name           : chr  "Washington State Football" NA NA NA ...
 $ retweet_followers_count: int  77527 NA NA NA NA NA NA NA NA 996 ...
 $ retweet_friends_count  : int  1448 NA NA NA NA NA NA NA NA 316 ...
 $ retweet_statuses_count : int  15363 NA NA NA NA NA NA NA NA 1666 ...
 $ retweet_location       : chr  "Pullman, WA" NA NA NA ...
 $ retweet_description    : chr  "Official Twitter home of Washington St"..
 $ retweet_verified       : logi  TRUE NA NA NA NA NA ...
 $ place_url              : chr  NA NA NA NA ...
 $ place_name             : chr  NA NA NA NA ...
 $ place_full_name        : chr  NA NA NA NA ...
 $ place_type             : chr  NA NA NA NA ...
 $ country                : chr  NA NA NA NA ...
 $ country_code           : chr  NA NA NA NA ...
 $ geo_coords             :List of 328
 $ coords_coords          :List of 328
 $ bbox_coords            :List of 328
 $ status_url             : chr  "https://twitter.com/WSUPullman/status/"..
 $ name                   : chr  "WSU Pullman" "WSU Pullman" "WSU Pullma"..
 $ location               : chr  "Pullman, Washington USA" "Pullman, Was"..
 $ description            : chr  "We are an award-winning research unive"..
 $ url                    : chr  "http://t.co/VxKZH9BuMS" "http://t.co/V"..
 $ protected              : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ followers_count        : int  43914 43914 43914 43914 43914 43914 4391..
 $ friends_count          : int  9717 9717 9717 9717 9717 9717 9717 9717 ..
 $ listed_count           : int  556 556 556 556 556 556 556 556 556 556 ..
 $ statuses_count         : int  15234 15234 15234 15234 15234 15234 1523..
 $ favourites_count       : int  20124 20124 20124 20124 20124 20124 2012..
 $ account_created_at     : POSIXct, format: "2009-02-26 23:39:34" "2009"..
 $ verified               : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ profile_url            : chr  "http://t.co/VxKZH9BuMS" "http://t.co/V"..
 $ profile_expanded_url   : chr  "http://www.wsu.edu" "http://www.wsu.ed"..
 $ account_lang           : logi  NA NA NA NA NA NA ...
 $ profile_banner_url     : chr  "https://pbs.twimg.com/profile_banners/"..
 $ profile_background_url : chr  "http://abs.twimg.com/images/themes/the"..
 $ profile_image_url      : chr  "http://pbs.twimg.com/profile_images/57"..
# Select certain variables
df <- p12_df %>% select('user_id', 'created_at', 'screen_name', 'text', 'location')
head(df)
# A tibble: 6 x 5
  user_id  created_at          screen_name text                 location   
  <chr>    <dttm>              <chr>       <chr>                <chr>      
1 22080148 2020-04-25 22:37:18 WSUPullman  "Big Dez is headed … Pullman, W…
2 22080148 2020-04-23 21:11:49 WSUPullman  Cougar Cheese. That… Pullman, W…
3 22080148 2020-04-21 04:00:00 WSUPullman  "Darien McLaughlin … Pullman, W…
4 22080148 2020-04-24 03:00:00 WSUPullman  6 houses, one pick.… Pullman, W…
5 22080148 2020-04-20 19:00:21 WSUPullman  Why did you choose … Pullman, W…
6 22080148 2020-04-20 02:20:01 WSUPullman  Tell us one of your… Pullman, W…
# Filter and sort rows
df %>% filter(screen_name == 'UCLAAdmission') %>% arrange(created_at)
# A tibble: 6 x 5
  user_id  created_at          screen_name  text                  location 
  <chr>    <dttm>              <chr>        <chr>                 <chr>    
1 2938776… 2020-04-22 01:45:02 UCLAAdmissi… Study abroad is one … Los Ange…
2 2938776… 2020-04-23 02:00:14 UCLAAdmissi… How do you know what… Los Ange…
3 2938776… 2020-04-24 02:11:51 UCLAAdmissi… A big benefit to liv… Los Ange…
4 2938776… 2020-04-25 00:05:59 UCLAAdmissi… Admission decisions … Los Ange…
5 2938776… 2020-04-25 00:29:12 UCLAAdmissi… Loving the Bruin fam… Los Ange…
6 2938776… 2020-04-25 00:34:47 UCLAAdmissi… @Ercramzep Congratul… Los Ange…
# Aggregate data
df %>% group_by(screen_name) %>% count() %>% arrange(desc(n))
# A tibble: 11 x 2
   screen_name         n
   <chr>           <int>
 1 CalAdmissions      85
 2 uoregon            61
 3 FutureSunDevils    51
 4 UW                 49
 5 WSUPullman         22
 6 uaadmissions       20
 7 UtahAdmissions     20
 8 UCLAAdmission       6
 9 USCAdmission        6
10 futurebuffs         5
11 BeaverVIP           3
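
The count table lists 11 accounts although the query named 12, because group_by() and count() only report screen names that actually appear in the data. The following is a minimal base R sketch for finding the handle that returned no tweets. It uses the handles from the query and the screen names copied from the table above; the variable names returned and missing are illustrative. The comparison ignores case because the returned capitalization can differ from what was typed, as with CalAdmissions.

```r
# Handles that were queried
p12 <- c('uaadmissions', 'FutureSunDevils', 'caladmissions', 'UCLAAdmission',
         'futurebuffs', 'uoregon', 'BeaverVIP', 'USCAdmission',
         'engagestanford', 'UtahAdmissions', 'UW', 'WSUPullman')

# Screen names that appear in the count table
returned <- c('CalAdmissions', 'uoregon', 'FutureSunDevils', 'UW',
              'WSUPullman', 'uaadmissions', 'UtahAdmissions', 'UCLAAdmission',
              'USCAdmission', 'futurebuffs', 'BeaverVIP')

# Case-insensitive comparison: which queried handles have no tweets?
missing <- p12[!tolower(p12) %in% tolower(returned)]
print(missing)
# [1] "engagestanford"
```

With the live data frame, unique(p12_df$screen_name) could stand in for the hand-copied returned vector.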