Introduction to R

1 What is R? Why R?

For detailed info visit R-project.org

The Inter-University Consortium for Political and Social Research (ICPSR) says:

R is “an alternative to traditional statistical packages such as SPSS, SAS, and Stata such that it is an extensible, open-source language and computing environment for Windows, Macintosh, UNIX, and Linux platforms. Such software allows for the user to freely distribute, study, change, and improve the software under the Free Software Foundation’s GNU General Public License.”

I don’t find this definition particularly helpful. I think of R as:

An “open source” programming language and software that provide collections of interrelated “functions”
“open source” means that R is free and created by the user community. The user community can modify basic things about R and add new capabilities to what R can do
a “function” is usually something that takes in some “input,” processes this input in some way, and creates some “output”
- e.g., the max() function takes as input a collection of numbers (e.g., 3,5,6) and returns as output the number with the maximum value
- e.g., the lm() function takes in as inputs a dataset and a statistical model you specify within the function, and returns as output the results of the regression model
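
As a quick illustration of the input, processing, output idea, here is a minimal sketch that calls both functions mentioned above. The built-in mtcars dataset and the model mpg ~ hp are assumptions chosen only for illustration.

```r
# max(): input is a collection of numbers, output is the largest value
largest <- max(c(3, 5, 6))
largest

# lm(): inputs are a model formula and a dataset, output is a fitted model object
# (mtcars ships with R; the choice of mpg ~ hp is purely illustrative)
fit <- lm(mpg ~ hp, data = mtcars)
coef(fit)  # estimated intercept and slope
```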

1.1 Base R vs. R packages

Base R

When you install R, you automatically install the “Base R” set of functions
Examples of a few of the functions in Base R:
- as.character() function
- print() function
- setwd() function

R packages

an R “package” (or “library”) is a collection of (related) functions developed by the R community
Examples of R packages:
- tidyverse package for manipulating and visualizing data
- igraph package for network analyses
- leaflet package for mapping
- rvest package for webscraping
All R packages are free!
Often a package we install may be a collection of packages (e.g., tidyverse) and/or may depend on other packages, which will be automatically installed for you

Installing and Loading R packages

You only need to install a package once. To install an R package, use the install.packages() function.

#install.packages("tidyverse")

You need to load a package every time you plan to use it. To load a package, use the library() function.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
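
The “Conflicts” message above means that two attached packages define a function with the same name. One common way to be explicit is the package::function() syntax. The sketch below uses the stats package because it ships with every R installation; the helper is_installed() is a hypothetical name introduced only for this example.

```r
# Hypothetical helper: TRUE if a package is installed, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")  # stats is part of every R installation

# package::function() calls a function from a specific package, which avoids
# ambiguity when two attached packages define a function with the same name
x <- c(1, 5, 9)
stats::median(x)
```

With tidyverse loaded, the same idea lets you write stats::filter() or dplyr::filter() to say which version you mean.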

1.2 RStudio

“RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.”

Pane Layout from RStudio User Guide

1.3 Markdown

What is Markdown?

Markdown is a lightweight markup language. A markup language is a set of rules that define how the layout and presentation of text and images appear in a document. One of the most popular markup languages is HTML. For example, when you go to a website, there are certain sections like a header, navigation bar, specific headings, color schemes, font styles, etc.
The Markdown language was created by John Gruber in 2004.

How is Markdown different than a WYSIWYG (WIZ–ee–wig) editor?

What you see is what you get (WYSIWYG) editing software enables you to point and click to make changes to the format, text, and images in a document and view the changes immediately (e.g., Microsoft Word, Google Docs).
In a Markdown file, you use Markdown syntax to format text and images.

Example Markdown syntax:

## Level two heading

### Level three heading

- Bullet point

**Bold text**

*italics*

What is markdown used for?

HTML or PDF documents
Websites
Note taking (Obsidian)
Books
Presentations and more

See The Markdown Guide for more on Markdown.

1.5 Quarto (.qmd)

What is Quarto?

From How Quarto works in R

From What can I use Quarto for?
- “Quarto is an open-source scientific and technical publishing system built on Pandoc. You can weave together narrative text and code to produce elegantly formatted output as documents, web pages, blog posts, books and more.”
Think of a Quarto file with the extension .qmd as a document written in Markdown syntax that embeds R code (Markdown + R code).
The integration of Quarto in R is relatively new and builds off of R Markdown.
- From RStudio: “R Markdown is a file format for making dynamic documents with R. An R Markdown document is written in markdown (an easy-to-write plain text format) and contains chunks of embedded R code.”
For the last ten years, R users have relied on R Markdown to produce static and dynamic documents. However, “Quarto is a multi-language, next generation version of R Markdown from Posit, with many new features and capabilities. Like R Markdown, Quarto uses knitr to execute R code, and is therefore able to render most existing Rmd files without modification.” From Quarto.

Key Takeaways:

Quarto does not depend on R; therefore, anyone using Python, JavaScript, Julia, etc., can create documents with Quarto (.qmd) and collaborate on projects much more easily.
The Quarto visual editor in RStudio provides a WYSIWYG editing interface, making document creation seamless.
Quarto works with R Markdown and has additional features, making it easier to use.

What is Quarto used for?

HTML, PDF, MS Word documents
Presentations in Revealjs, Powerpoint, and Beamer
Markdown (GitHub)
Wikis

How we will be using Quarto in this class

All lectures created using Quarto (.qmd) file extension
You will use Quarto to complete homework assignments

After this class you might:

never use Microsoft Word again!
Use Quarto to create: papers for class; presentations; journal manuscripts; your dissertation; etc.

Further Reading:

Markdown Guide, Basic Syntax: https://www.markdownguide.org/basic-syntax/
A Brief History of R Markdown: https://slides.yihui.org/2021-Brazilian-R-Day.html#1
R Markdown the Definitive Guide: https://bookdown.org/yihui/rmarkdown/
RStudio User Guide Quarto visual editor: https://docs.posit.co/ide/user/ide/guide/documents/visual-editor.html

2 Executing R commands

2.1 R as a calculator

[1] 5

5+2

[1] 7

10*3

[1] 30

2.2 Shortcuts for Executing Commands in R

Three ways to execute commands in R

Type/copy commands directly into the “console”
“Code chunks” in R Markdown and Quarto files (.Rmd and .qmd files)
- Can execute one command at a time, one chunk at a time, or “knit” the entire document
- Cmd/Ctrl + Enter: execute highlighted line(s) within chunk
- Cmd/Ctrl + Shift + k: “knit” entire document
R scripts (.R files)
- This is just a text file full of R commands
- Cmd/Ctrl + Enter: execute highlighted line(s)
- Cmd/Ctrl + Shift + Enter (without highlighting any lines): run entire script

5+2

[1] 7

10*3

[1] 30

3 R objects and data structures

3.1 Preview of lecture on objects

This section of the lecture provides a conceptual and practical introduction to “objects” in R
Important: goal is to begin to develop familiarity with concepts that we will introduce in more detail in later weeks
- I don’t expect you to understand or retain all this information perfectly
- So just focus on understanding as much as you can and ask any questions that come to mind

3.2 Assignment

Assignment refers to creating an “object” and assigning values to it

The object may be a variable, a dataset, a bit of text that reads “la la la”
<- is the assignment operator
- in other languages = is the assignment operator
general syntax:
- object_name <- object_values
- good practice to put a space before and after assignment operator

# Create an object and assign value
a <- 5
a

[1] 5

b <- "yay!"
b

[1] "yay!"

3.3 Objects

R is an “object-oriented” programming language (like Python, JavaScript). So, what is an object?

formal computer science definitions are confusing because they require knowledge of concepts we haven’t introduced yet
More intuitively, I think of objects as anything I assign values to
- For example, below, a and b are the names of objects I assigned values to

a <- 5
a

[1] 5

b <- "yay!"
b

[1] "yay!"

Ben Skinner (R maven) says “Objects are like boxes in which we can put things: data, functions, and even other objects.”

Many commercial statistical software packages (e.g., SPSS, Stata) operate on datasets, which consist of rows of observations and columns of variables

Usually, these packages can open only one dataset at a time
By contrast, in R everything is an object and there is no limit to the number of objects R can hold (except memory)

3.4 (Atomic) Vectors

The fundamental data structure in R is the “vector”

A vector is a collection of values
The individual values within a vector are called “elements”
Values in a vector can be numeric, character (e.g., “Apple”), or some other type

Formal classification of vectors in R

Here, I introduce the classification of vectors by Grolemund and Wickham

There are two broad types of vectors

Atomic vectors. An object that contains elements. Six “types” of atomic vectors:
- logical, integer, double, character, complex, and raw.
  - Integer and double vectors are collectively known as numeric vectors.
Lists. Like atomic vectors, lists are objects that contain elements
- elements within a list may be atomic vectors
- elements within a list may also be other lists; that is lists can contain other lists

Table 1. The six types of atomic vectors in R: double, integer, logical, character, complex, and raw

Type	Example	Comment
double (or numeric)	-0.5, 120.9, 5.0	Floating point numbers with double precision
integer	-1L, 3L, 5L	"Long" integers
logical	TRUE, FALSE	Boolean
character	"California", "New York"	Text
complex	-5+11i, 3+4i, 0+3i	Real+imaginary numbers
raw	01, ff	Raw bytes (as hexadecimal)

Atomic Vectors table adapted from (DiSCDown, Introduction to Programming with R, Chapter 4 Vectors)
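
A minimal sketch that checks one example value from each row of Table 1 with typeof(). The function as.raw() is used here only as one way to create a raw value.

```r
typeof(-0.5)          # "double"
typeof(3L)            # "integer"
typeof(TRUE)          # "logical"
typeof("California")  # "character"
typeof(3+4i)          # "complex"
typeof(as.raw(255))   # "raw"; as.raw(255) prints as ff
```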

Visual representation of the Grolemund and Wickham classification

Overview of data structures (Grolemund and Wickham, 2018, chapter 20)

One difference between atomic vectors and lists: homogeneous vs. heterogeneous elements

atomic vectors are homogeneous: all elements within atomic vector must be of the same type
lists can be heterogeneous: e.g., one element can be an integer and another element can be character
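
If you are ever unsure which of the two broad types an object is, base R can tell you. A small sketch, with object names v and l chosen only for illustration:

```r
v <- c(1, 2, 3)            # atomic vector: every element is a double
l <- list(1, "two", TRUE)  # list: elements have different types

is.atomic(v)  # TRUE
is.list(v)    # FALSE
is.atomic(l)  # FALSE
is.list(l)    # TRUE
```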

Intuitive approach to vectors used by Dr. Ben Skinner:

data type: logical, numeric (integer and double), character, etc.
data structure: vector, list, matrix, etc.

I find Skinner’s classification more intuitive conceptually. However, it isn’t completely consistent with how R and R functions think about objects.

Let’s practice creating simple vectors

Below we use the combine function c() to create a numeric vector that contains three elements

Help file says that c() “combines values into a vector or list”

#?c # to see help file for the c() "combine" function
x <- c(4, 7, 9) # create object called x, which is a vector with three elements 
# (each a number)
x # print object x

[1] 4 7 9

Vector where the elements are characters

animals <- c("lions", "tigers", "bears", "oh my") # create object called animals
animals

[1] "lions"  "tigers" "bears"  "oh my"

Student task

Either in the R console or within the R markdown file, do the following:

Create a vector called v1 with three elements, where all the elements are numbers. Then print the values.
Create a vector called v2 with four elements, where all the elements are characters (i.e., enclosed in single ’’ or double “” quotes). Then print the values.
Create a vector called v3 with five elements, where some elements are numeric and some elements are characters. Then print the values.

Solutions

v1 <- c(1, 2, 3) 
# create a vector called v1 with three elements
# all the elements are numbers
v1 # print value

[1] 1 2 3

v2 <- c("a", "b", "c", "d") 
# create a vector called v2 with four elements
# all the elements are characters
v2 # print value

[1] "a" "b" "c" "d"

v3 <- c(1, 2, 3, "a", "b") 
# create a vector called v3 with five elements
# some elements are numeric and some elements are characters
v3 # print value

[1] "1" "2" "3" "a" "b"

3.4.1 length() function to get the number of elements

“Length” of an atomic vector is the number of elements

For remainder of lecture, I’ll use the term vector to refer to atomic vectors

Use length() function to examine vector length

x <- c(4, 7, 9)
x

[1] 4 7 9

length(x)

[1] 3

animals <- c("lions", "tigers", "bears", "oh my")
animals

[1] "lions"  "tigers" "bears"  "oh my"

length(animals)

[1] 4

A single number (or a single string/character) is a vector with length==1

z <- 5
length(z)

[1] 1

length("Tommy")

[1] 1
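
A common point of confusion is that length() counts elements, not characters. The sketch below contrasts it with the base R function nchar(), which counts the characters inside each element.

```r
name <- "Tommy"
length(name)  # 1: one element in the vector
nchar(name)   # 5: five characters in that element

animals <- c("lions", "tigers", "bears", "oh my")
length(animals)  # 4
nchar(animals)   # 5 6 5 5 (the space in "oh my" counts as a character)
```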

3.4.2 typeof() function to get the data type of a vector

The “type” of an atomic vector refers to the elements within the vector.

While there are six “types” of atomic vectors, we’ll focus on the following types:

numeric:
- “integer” (e.g., 5)
- “double” (e.g., 5.5)
character (e.g., “ozan”)
logical (e.g., TRUE, FALSE)

Use typeof() function to examine vector type

x

[1] 4 7 9

typeof(x)

[1] "double"

p <- c(1.5, 1.6)
p

[1] 1.5 1.6

typeof(p)

[1] "double"

animals

[1] "lions"  "tigers" "bears"  "oh my"

typeof(animals)

[1] "character"

Data type of a vector, numeric

Numeric vectors can be “integer” (e.g., 5) or “double” (e.g., 5.5)

typeof(1.5)

[1] "double"

R stores numbers as doubles by default.

x

[1] 4 7 9

typeof(x)

[1] "double"

To make an integer, place an L after the number:

typeof(5)

[1] "double"

typeof(5L)

[1] "integer"

Data type of a vector, character

In contrast to “numeric” data types which are used to store numbers, the “character” data type is used to store strings of text.

Strings may contain any combination of numbers, letters, symbols, etc.
Character vectors are sometimes referred to as string vectors

When creating a vector where elements have type==character (or when referring to the value of a string), place single ’’ or double “” quotes around the text

the text within quotes is the “string”

c1 <- c("cat",'cash','candy cane')
c1

[1] "cat"        "cash"       "candy cane"

typeof(c1)

[1] "character"

length(c1)

[1] 3

Numeric values can also be stored as strings

c2 <- c("1","2","3")
c2

[1] "1" "2" "3"

typeof(c2)

[1] "character"

Data type of a vector, logical

Logical vectors can take three possible values: TRUE, FALSE, NA

TRUE, FALSE, NA are special keywords; they are different from the character strings "TRUE", "FALSE", "NA"
Don’t worry about NA for now

typeof(TRUE)

[1] "logical"

typeof("TRUE")

[1] "character"

typeof(c(TRUE,FALSE,NA))

[1] "logical"

typeof(c(TRUE,FALSE,NA,"FALSE"))

[1] "character"

log <- c(TRUE,TRUE,FALSE,NA,FALSE)
typeof(log)

[1] "logical"

length(log)

[1] 5

We’ll learn more about logical vectors later

All elements in (atomic) vector must have same data type.

Atomic vectors are homogeneous;

An atomic vector has one data type
all elements within an atomic vector must have the same data “type”

If a vector contains elements of different type, the vector type will be type of the most “complex” element

Atomic vector types from simplest to most complex:

logical < integer < double < character

typeof(c(TRUE,TRUE,NA))

[1] "logical"

# recall L after an integer forces type to be integer 
# rather than double
typeof(c(TRUE,TRUE,NA,1L))

[1] "integer"

typeof(c(TRUE,TRUE,NA,1.5))

[1] "double"

typeof(c(TRUE,TRUE,NA,1.5,"howdy!"))

[1] "character"
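
The examples above show R converting types automatically. You can also convert explicitly with the as.*() family of base R functions. A minimal sketch, reusing the character vector c2 from earlier and introducing n2 as an illustrative name:

```r
c2 <- c("1", "2", "3")
typeof(c2)            # "character"

n2 <- as.numeric(c2)  # convert the strings to numbers
typeof(n2)            # "double"
sum(n2)               # 6

as.integer(c(TRUE, FALSE, TRUE))  # 1 0 1
as.character(c(1.5, 2))           # "1.5" "2"
```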

3.4.3 Named vectors

All vectors can be “named” (i.e., name individual elements within vector)

Example of creating an unnamed vector

the str() function “compactly display[s] the internal structure of an R object” [from help file]; very useful for describing objects

#?str
x <- c(1,2,3,"hi!")
x

[1] "1"   "2"   "3"   "hi!"

str(x)

 chr [1:4] "1" "2" "3" "hi!"

Example of creating a named vector

y <- c(a=1,b=2,3,c="hi!")
y

    a     b           c 
  "1"   "2"   "3" "hi!"

str(y)

 Named chr [1:4] "1" "2" "3" "hi!"
 - attr(*, "names")= chr [1:4] "a" "b" "" "c"
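
One reason to name elements is that you can then look them up by name. A small sketch; the vector scores and its values are hypothetical.

```r
scores <- c(math = 90, reading = 85, science = 78)  # hypothetical values

names(scores)              # "math" "reading" "science"
scores["reading"]          # element selected by name (the name is kept)
unname(scores["reading"])  # 85
```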

3.5 Sequences

(Loose) definition: a sequence is a set of numbers in ascending or descending order

A vector containing a “sequence” of numbers (e.g., 1, 2, 3) can be created using the colon operator : with the notation start:end

-5:5

 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

5:-5

 [1]  5  4  3  2  1  0 -1 -2 -3 -4 -5

s<- 1:10 #same as this: s<- c(1:10)
s

 [1]  1  2  3  4  5  6  7  8  9 10

length(s)

[1] 10

Creating sequences using seq() function - basic syntax [with default values]:

seq(from = 1, to = 1, by = 1)

seq(10,15)

[1] 10 11 12 13 14 15

seq(from=10,to=15,by=1)

[1] 10 11 12 13 14 15

seq(from=100,to=150,by=10)

[1] 100 110 120 130 140 150
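
A few more variations on sequences, as a sketch. The length.out argument is part of seq() but is not covered above; it asks for a given number of evenly spaced values instead of a step size.

```r
typeof(1:10)                 # "integer": the colon operator creates integers here

seq(0, 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00
seq(10, 0, by = -5)          # 10 5 0: a negative "by" counts down
seq(0, 100, length.out = 5)  # 0 25 50 75 100
```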

3.6 Vectorized math

Most mathematical operations operate on each element of the vector

e.g., add a single value to a vector and that value will be added to each element of the vector

1:3

[1] 1 2 3

1:3+.5

[1] 1.5 2.5 3.5

(1:3)*2

[1] 2 4 6

Mathematical operations involving two vectors with the same length behave differently

e.g., for addition: add element 1 of vector 1 to element 1 of vector 2, add element 2 of vector 1 to element 2 of vector 2, etc.

c(1,1,1)+c(1,0,2)

[1] 2 1 3

c(1,1,1)*c(1,0,2)

[1] 1 0 2
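
Vectorized math is what makes everyday calculations short in R. The sketch below applies both patterns described above; all object names and values are hypothetical.

```r
# A single formula applied to every element of a vector
temps_f <- c(32, 50, 212)          # temperatures in Fahrenheit
temps_c <- (temps_f - 32) * 5 / 9
temps_c                            # 0 10 100

# Element-by-element arithmetic on two vectors of the same length
price    <- c(2.5, 4, 10)
quantity <- c(4, 1, 2)
price * quantity                   # 10 4 20
sum(price * quantity)              # 34
```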

3.7 Lists

What is a list?

Like (atomic) vectors, a list is an object that contains elements
Unlike vectors, data types can differ across elements within a list
An element within a list can be another list
- this characteristic makes lists more complicated than vectors
- suitable for representing hierarchical data

Lists are more complicated than vectors; today we’ll just provide a basic introduction

Create lists using list() function

Create a vector (for comparison purposes)

a <- c(1,2,3)
typeof(a)

[1] "double"

length(a)

[1] 3

Create a list

b <- list(1,2,3)
typeof(b)

[1] "list"

length(b)

[1] 3

b # print list is awkward

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

Investigate structure of lists using str() function

When investigating lists, str() is better than printing the list

b <- list(1,2,3)
typeof(b)

[1] "list"

length(b)

[1] 3

str(b) # 3 elements, each element is a numeric vector w/ length=1

List of 3
 $ : num 1
 $ : num 2
 $ : num 3

Each element of a list can be a vector of different length (i.e., different number of elements)

c <- list(c(3,4),c(-5,1,3))
typeof(c)

[1] "list"

length(c)

[1] 2

str(c) # 2 elements; element 1=vector w/ length=2; element 2=vector w/length=3

List of 2
 $ : num [1:2] 3 4
 $ : num [1:3] -5 1 3
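
To get the length of every element at once, base R provides lengths() (note the final s). A minimal sketch; the list is named c_list here, rather than c, only to avoid reusing the name of the c() function.

```r
c_list <- list(c(3, 4), c(-5, 1, 3))

length(c_list)   # 2: number of elements in the list
lengths(c_list)  # 2 3: length of each element
```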

Elements within lists can have different data types

Lists are heterogeneous

data types can differ across elements within a list

b <- list(1,2,"apple")
typeof(b)

[1] "list"

length(b)

[1] 3

str(b)

List of 3
 $ : num 1
 $ : num 2
 $ : chr "apple"

Vectors are homogeneous

a <- c(1,2,"apple")
typeof(a)

[1] "character"

str(a)

 chr [1:3] "1" "2" "apple"

Lists can contain other lists

x1 <- list(c(1,2), list("apple", "orange"), list(1, 2, 3))
str(x1)

List of 3
 $ : num [1:2] 1 2
 $ :List of 2
  ..$ : chr "apple"
  ..$ : chr "orange"
 $ :List of 3
  ..$ : num 1
  ..$ : num 2
  ..$ : num 3

The first element of list is a numeric vector with length=2

x1[[1]]

[1] 1 2

The second element is a list with length=2

first element is character vector with length=1
second element is character vector with length=1

x1[[2]]

[[1]]
[1] "apple"

[[2]]
[1] "orange"

The third element is a list with length=3

first element is numeric vector with length=1
second element is numeric vector with length=1
third element is numeric vector with length=1

x1[[3]]

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

You can name each element in the list

x2 <- list(a=c(1,2), b=list("apple", "orange"), c=list(1, 2, 3))
str(x2)

List of 3
 $ a: num [1:2] 1 2
 $ b:List of 2
  ..$ : chr "apple"
  ..$ : chr "orange"
 $ c:List of 3
  ..$ : num 1
  ..$ : num 2
  ..$ : num 3

names() function shows names of elements in the list

names(x2) # has names

[1] "a" "b" "c"

names(x1) # no names

NULL

Access individual elements in a “named” list

Syntax: list_name$element_name

x2 <- list(a=1, b=list("apple", "orange"), c=list(1, 2, 3))
x2$a

[1] 1

typeof(x2$a)

[1] "double"

length(x2$a)

[1] 1

typeof(x2$b)

[1] "list"

length(x2$b)

[1] 2

typeof(x2$c)

[1] "list"

length(x2$c)

[1] 3

Note: We’ll spend more time practicing “accessing elements of a list” in upcoming weeks
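
As a short preview of that practice, the sketch below combines the $ syntax with the double-bracket [[ ]] syntax used earlier, and shows how single brackets differ.

```r
x2 <- list(a = 1, b = list("apple", "orange"), c = list(1, 2, 3))

x2[["a"]]     # 1: same result as x2$a
x2$b[[1]]     # "apple": first element of the list stored in b
x2[[3]][[2]]  # 2: second element of the third element

typeof(x2["a"])    # "list": single brackets return a smaller list
typeof(x2[["a"]])  # "double": double brackets return the element itself
```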

Compare structure of list to structure of element within a list

str(x2)

List of 3
 $ a: num 1
 $ b:List of 2
  ..$ : chr "apple"
  ..$ : chr "orange"
 $ c:List of 3
  ..$ : num 1
  ..$ : num 2
  ..$ : num 3

str(x2$c)

List of 3
 $ : num 1
 $ : num 2
 $ : num 3

A DATASET IS JUST A LIST!!!!!

A data frame is a list with the following characteristics:

Data type can differ across elements (like all lists)
Each element (column) is a variable
Each element in a data frame must have the same length
- The length of an element is the number of observations (rows)
- Thus, each variable in a data frame has same number of observations
Each element is named
- these element names are the variable names
Typically, each element(variable) in a data frame is a vector
- Elements can also be lists. Happens when the variable has a complicated data structure
  - e.g., a variable that identifies the “@” mentions in a tweet

names(df)

[1] "mpg" "cyl" "hp"

head(df, n=4) # print first few rows

                mpg cyl  hp
Mazda RX4      21.0   6 110
Mazda RX4 Wag  21.0   6 110
Datsun 710     22.8   4  93
Hornet 4 Drive 21.4   6 110

Additionally, data frames have “attributes”; we’ll discuss those in upcoming weeks

A data frame is a named list

head(df, n= 5)

                   mpg cyl  hp
Mazda RX4         21.0   6 110
Mazda RX4 Wag     21.0   6 110
Datsun 710        22.8   4  93
Hornet 4 Drive    21.4   6 110
Hornet Sportabout 18.7   8 175

typeof(df)

[1] "list"

names(df)

[1] "mpg" "cyl" "hp"

length(df) # length=number of variables

[1] 3

str(df)

'data.frame':   32 obs. of  3 variables:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl: num  6 6 4 6 8 6 8 4 4 6 ...
 $ hp : num  110 110 93 110 175 105 245 62 95 123 ...

Like any named list, we can examine the elements

Individual elements of a data frame are the variables
These variables are vectors with length equal to the number of rows/observations

typeof(df$mpg)

[1] "double"

length(df$mpg) # length=number of rows/obs

[1] 32

str(df$mpg)

 num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
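
The code that creates df is not shown above. The printed rows match the built-in mtcars dataset, so the sketch below assumes df is the mpg, cyl, and hp columns of mtcars. The second part builds a tiny data frame from a named list; pets and its values are hypothetical.

```r
# Assumption: df is the mpg, cyl, and hp columns of the built-in mtcars data
df <- mtcars[, c("mpg", "cyl", "hp")]

is.data.frame(df)  # TRUE
is.list(df)        # TRUE: a data frame is a list
length(df)         # 3 variables (columns)
nrow(df)           # 32 observations (rows)

# A named list whose elements have equal length converts to a data frame
pets <- list(name = c("Rex", "Tom"), age = c(3, 5))
pets_df <- as.data.frame(pets)
str(pets_df)
```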

3.8 Main takeaways about atomic vectors and lists

Basic data structures

(Atomic) vectors: logical, integer, double, character.
- each element in vector must have same data type
Lists:
- Data type can differ across elements

Takeaways

These concepts are difficult; ok to feel confused
I will reinforce these concepts throughout the course
Good practice: run simple diagnostics on any new object
- length() : how many elements in the object
- typeof() : what type of data is the object
- str() : hierarchical structure of the object
These data structures (vectors, lists) and data types (e.g., character, numeric, logical) are the basic building blocks of programming in R, and similar structures and types appear in many other programming languages
Application to statistical analysis
- Datasets are just lists
- The individual elements – columns/variables – within a dataset are just vectors
These structures and data types are foundational for all “data science” applications
- e.g., mapping, webscraping, network analysis, etc.

4 Using R functions

4.1 What are functions

**Functions** are pre-written bits of code that accomplish some task.

Functions generally follow three sequential steps:

take in an input object(s)
process the input.
return (A) a new object or (B) a visualization (e.g., plot)

For example, sum() function calculates sum of elements in a vector

input. Takes in a vector of elements (numeric or logical)
processing. Calculates the sum of elements
return. Returns numeric vector of length=1; value is sum of input vector

sum(c(1,2,3))

[1] 6

typeof(sum(c(1,2,3))) # type of object created by sum()

[1] "double"

length(sum(c(1,2,3))) # length of object created by sum()

[1] 1

#sum(c(TRUE,TRUE,FALSE))
#typeof(sum(c(TRUE,TRUE,FALSE))); length(sum(c(TRUE,TRUE,FALSE)))
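
The commented-out lines above hint at the logical case. When sum() receives a logical vector, each TRUE counts as 1 and each FALSE as 0, so the result is the number of TRUE values. A small sketch with a hypothetical vector named passed:

```r
passed <- c(TRUE, TRUE, FALSE)

sum(passed)          # 2: number of TRUE values
typeof(sum(passed))  # "integer"
length(sum(passed))  # 1
mean(passed)         # proportion of TRUE values (2/3)
```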

4.2 Function syntax

Components of a function

function name (e.g., sum(), length(), seq())
function arguments
- Inputs that the function takes, which determine what function does
  - can be vectors, data frames, logical statements, etc.
- In “function call” you specify values to assign to these function arguments
  - e.g., sum(c(1,2,3))
- Separate arguments with a comma ,
  - e.g., seq(10,15)

Example: the sequence function, seq()

seq(10,15)

[1] 10 11 12 13 14 15

4.3 Function syntax: More on function arguments

Usually, function arguments have names

e.g., the seq() function includes the arguments from, to, by
when you call the function, you need to assign values to these arguments; but you usually don’t have to specify the name of the argument

seq(from=10, to=20, by=2)

[1] 10 12 14 16 18 20

seq(10,20,2)

[1] 10 12 14 16 18 20

Many function arguments have “default values”, set by whoever wrote the function

if you don’t specify a value for that argument, the default value is inserted
e.g., partial list of default values for seq(): seq(from=1, to=1, by=1)

seq()

[1] 1

seq(to=10)

 [1]  1  2  3  4  5  6  7  8  9 10

seq(10) # a single unnamed number is interpreted as 1:10, so the sequence ends at 10

 [1]  1  2  3  4  5  6  7  8  9 10
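
When you do name the arguments, their order no longer matters, and you can mix unnamed and named arguments in one call. A brief sketch; length.out is another seq() argument, not covered above, that sets how many values to return.

```r
seq(by = 2, to = 20, from = 10)  # 10 12 14 16 18 20 (named, so order is free)
seq(10, 20, by = 5)              # 10 15 20 (first two matched by position)
seq(10, 20, length.out = 3)      # 10 15 20
```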

4.4 Function arguments, the `na.rm` argument

When R performs a calculation and an input has value NA, output value is NA

5+4+NA

[1] NA

R functions that perform calculations often have argument named na.rm

na.rm argument asks whether to remove NA values prior to calculation
For most functions, default value is na.rm = FALSE
- This means “do not remove NAs” prior to calculation
- e.g., default values for sum() function: sum(..., na.rm = FALSE)
```
sum(c(1,2,3,NA), na.rm = FALSE) # default value
```
```
[1] NA
```
```
sum(c(1,2,3,NA))
```
```
[1] NA
```
If you specify na.rm = TRUE, NA values are removed prior to calculation

sum(c(1,2,3,NA), na.rm = TRUE)

[1] 6
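
The same argument works in other base R summary functions, and is.na() lets you check how many values are missing before deciding to drop them. A minimal sketch:

```r
x <- c(1, 2, 3, NA)

is.na(x)               # FALSE FALSE FALSE TRUE
sum(is.na(x))          # 1: number of missing values

mean(x)                # NA
mean(x, na.rm = TRUE)  # 2
max(x, na.rm = TRUE)   # 3
```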

4.5 Help files for functions

To see help file on a function, type ?function_name without parentheses

?sum
?seq

Contents of help files

Description. What the function does
Usage. Syntax, including default values for arguments
Arguments. Description of function arguments
Details. Details and idiosyncrasies of how the function works.
Value. What (object) the function “returns”
- e.g., sum() returns vector of length 1 whose value is sum of input vector
References. Additional reading
See Also. Related functions
Examples. Examples of function in action
Bottom of help file identifies the package the function comes from

**Practice!**

when you encounter a new function, spend two minutes reading the help file
over time, help files will feel less cryptic and will start to feel helpful
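
If you only want a quick reminder of a function's arguments and defaults without opening the full help file, base R's args() prints them in the console. A small sketch:

```r
# help("sum") is equivalent to typing ?sum

args(sum)                   # shows: function (..., na.rm = FALSE)
names(formals(args(sum)))   # "..." "na.rm"
formals(args(sum))$na.rm    # FALSE: the default value of na.rm
```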

4.6 Function arguments, the dot-dot-dot (`...`) argument

On help file for many functions, you will see an argument called ..., referred to as the “dot-dot-dot” argument

?sum
?seq

“Dot-dot-dot” arguments have several uses. What you should know for now:

... refers to arguments that are “un-named”; but user can specify values
- e.g., default syntax for sum(): sum(..., na.rm = FALSE)
  - argument na.rm is “named” (name is na.rm); argument ... un-named
... used to allow a function to take an arbitrary number of arguments:

#Here, sum function takes 1 un-named argument, specifically c(10,5,NA)
sum(c(10,5,NA),na.rm=TRUE)

[1] 15

#Here the sum function takes 3 un-named arguments
sum(10,5,NA,na.rm=TRUE)

[1] 15

#Here the sum function takes 5 un-named arguments
sum(10,5,10,20,NA,na.rm=TRUE)

[1] 45
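
sum() is not the only function built this way. The sketch below shows two other base R functions whose first argument is ...; in each case, any argument that comes after ... in the function definition (such as na.rm or sep) has to be given by name.

```r
max(4, 9, 2)                             # 9
max(4, NA, 2, na.rm = TRUE)              # 4

paste("lions", "tigers", "bears")        # "lions tigers bears"
paste("lions", "tigers", sep = " and ")  # "lions and tigers"
```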

5 Appendix

5.1 Directories and filepaths (skim)

Goal: Give you a very brief overview of “directories” (i.e., folders) and “filepaths” (tells you where folder is located) in R
For the most part, this course won’t require extensive knowledge of working with filepaths. But the second course in this sequence will.

(Current) Working directory

The folder/directory in which you are currently working
This is where R looks for files
Files located in your current working directory can be accessed without specifying a filepath because R automatically looks in this folder

Function getwd() shows current working directory

getwd()

[1] "/Users/jaquette/Documents/rclass1/lectures/intro_to_r"

Command list.files() lists all files located in working directory

getwd()

[1] "/Users/jaquette/Documents/rclass1/lectures/intro_to_r"

list.files()

 [1] "data-structures-overview.png" "fp1.JPG"                     
 [3] "fp2.JPG"                      "intro_to_r_files"            
 [5] "intro_to_r.html"              "intro_to_r.qmd"              
 [7] "intro_to_r.rmarkdown"         "pane_layout_23.jpeg"         
 [9] "pane_layout.png"              "rstudio-qmd-how-it-works.png"

Working directory, “Code chunks” vs. “console” and “R scripts”

When you run code chunks in RMarkdown files (.Rmd), the working directory is set to the filepath where the .Rmd file is stored

getwd()

[1] "/Users/jaquette/Documents/rclass1/lectures/intro_to_r"

list.files()

 [1] "data-structures-overview.png" "fp1.JPG"                     
 [3] "fp2.JPG"                      "intro_to_r_files"            
 [5] "intro_to_r.html"              "intro_to_r.qmd"              
 [7] "intro_to_r.rmarkdown"         "pane_layout_23.jpeg"         
 [9] "pane_layout.png"              "rstudio-qmd-how-it-works.png"

When you run code from the R Console or an R Script, the working directory is your R Project directory (we’ll cover this in the next section).

Command getwd() shows current working directory

getwd()

[1] "/Users/jaquette/Documents/rclass1/lectures/intro_to_r"

Absolute vs. relative filepath

Absolute file path: The absolute file path is the complete list of directories needed to locate a file or folder.
setwd("/Users/pm/Desktop/rclass1/lectures/intro_to_r")

Relative file path: The relative file path is the path relative to your current location/directory. Assuming your current working directory is in the “intro_to_r” folder and you want to change your directory to the data folder, your relative file path would look something like this:
setwd("../../data")

File path shortcuts (Mac)

Key	Description
~	tilde is a shortcut for user’s home directory (mine is my name pm)
../	moves up a level
../../	moves up two levels

Use relative path to move working directory up one level

getwd()

[1] "/Users/jaquette/Documents/rclass1/lectures/intro_to_r"

setwd('../')
getwd()

[1] "/Users/jaquette/Documents/rclass1/lectures"

Exercise

Let’s create a folder on our desktop and name it red
Inside the red folder, create two subfolders named orange and yellow
Inside the yellow folder create another subfolder named green

Make sure to name these folders in lowercase.

You should have 1 folder on your desktop called red. Inside the red folder you have two folders called orange and yellow. Inside the yellow folder you have a folder called green.

Here is a visual of how it should look…

File path visual

Exercise continued

Let’s say we want to get to the green folder using the absolute file path.
1. View your current working directory getwd()
2. Set your working directory to the green folder using the absolute file path
3. Now set your working directory to the orange folder using the relative file path (hint: use ../)

Solutions

[Solution for Mac users]

getwd()
setwd("~/Desktop/red/yellow/green")
getwd() 
setwd("../../orange")
getwd()
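
If you would rather not create folders on your desktop, here is a sketch of the same exercise that works on any operating system. It assumes it is acceptable to build the folders inside R's temporary directory, tempdir(); the object names base, old_wd, first_stop, and second_stop are introduced only for this example.

```r
# Build red/orange and red/yellow/green inside a temporary folder
base <- file.path(tempdir(), "red")
dir.create(file.path(base, "yellow", "green"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(base, "orange"), showWarnings = FALSE)

# setwd() returns the previous working directory, so we can restore it later
old_wd <- setwd(file.path(base, "yellow", "green"))
first_stop <- basename(getwd())
first_stop             # "green"

setwd("../../orange")  # up two levels to red, then down into orange
second_stop <- basename(getwd())
second_stop            # "orange"

setwd(old_wd)          # restore the original working directory
```

file.path() joins folder names with the correct separator, which avoids typing slashes by hand.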

5.2 Create “R project”

What is an R project?

Helps you keep all files for a project in one place
When you open an R project, the file-path of your current working directory is automatically set to the file-path of your R-project

How to create an “R project”

In RStudio, click on “File” >> “New Project” >> “Existing Directory”
“Browse” to find the folder in which you want to save the “.Rproj” file

let’s call this folder “my_rclass_folder”, but it can be any folder we want
the name of the “.Rproj” file will be my_rclass_folder.Rproj

Then click on Create Project

Using R Project

save files/folders associated with this project within the “my_rclass_folder”
When you want to work on the project, open the .Rproj file “my_rclass_folder.Rproj”
Whenever you run a command from your R Console or an R Script, the “working directory” will be set to the filepath where “my_rclass_folder.Rproj” is saved
As you will see in the future (EDUC260B), this will make it easy to collaborate with colleagues
- you will be able to run the exact same scripts on different computers!
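The reason this helps with collaboration is that scripts can refer to files by relative path from the project folder. Below is a rough sketch of the idea. It uses a temporary folder as a stand-in for “my_rclass_folder” and calls setwd() by hand to imitate what opening the .Rproj file does for you; the data sub-folder and example.csv are hypothetical names.

```r
# Stand-in for the project folder (in real use, opening the .Rproj file sets this for you)
project_root <- file.path(tempdir(), "my_rclass_folder")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)
old_wd <- setwd(project_root)

# Relative paths: no "/Users/..." prefix, so the same lines run on any computer
write.csv(data.frame(x = 1:3), file.path("data", "example.csv"), row.names = FALSE)
df <- read.csv(file.path("data", "example.csv"))
df

# Restore the original working directory
setwd(old_wd)
```

Because nothing in the read and write lines mentions a user name or drive, a colleague who opens the same project on another machine should be able to run them unchanged.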

Next, you follow these steps

You can add any additional sub-folders you want to the “rclass1” folder
- e.g., “syllabus”, “resources”
You can add any additional files you want to the sub-directory folders you unzipped
- e.g., in “rclass1/lectures/intro_to_r” you might add an additional document of notes you took

5.3 R Markdown

What is R Markdown

R Markdown documents embed R code, output associated with R code, and text into one document
An R Markdown document is a “‘living’ document that updates every time you compile [“knit”] it”
R Markdown documents have the extension .Rmd
- Can think of them as text files with the extension .Rmd rather than .txt
At top of .Rmd file you specify the “output” style, which dictates what kind of formatted document will be created
- e.g., html_document or pdf_document
When you compile [“knit”] a .Rmd file, the resulting formatted document can be an HTML document, a PDF document, an MS Word document, or many other types

This slide borrows from Darin Christensen

How people use R Markdown

R Markdown creates many types of static and dynamic/interactive documents

Example of static policy report
Example of dynamic/interactive presentation

How I use R Markdown

Journal manuscripts; reports; presentations; for taking notes when I am learning new methods or reading an empirical paper

How we will be using R Markdown files in this class:

Homework you submit will be .Rmd files, where “output” style will be html_document or pdf_document
Lectures we write are .Rmd files, where the output style will be html_document (can also be beamer_presentation or word_document)
- beamer_presentation is essentially a PDF document, where each page is a slide

Creating R Markdown documents

Do this with a partner

Approach for creating an R Markdown document:

Point-and-click from within RStudio
- Click on File >> New File >> R Markdown >> Document >> choose HTML >> click OK
  - Optional: add title (this is not the file name, just what appears at the top of document)
  - Optional: add author name
- Save the .Rmd file; File >> Save As
  - Any file name
  - Recommend you save it in same folder you saved this lecture
- “Knit” the entire .Rmd file
  - Point-and-click OR shortcut: Cmd/Ctrl + Shift + k

Components of a .Rmd file

An R Markdown (.Rmd) file consists of several parts

YAML header
- YAML originally stood for “yet another markup language” (officially it now stands for “YAML Ain’t Markup Language”)
- Controls settings that apply to the whole document (e.g., “output” should be html_document or pdf_document, whether to include table of contents, etc.)
- YAML header goes at the very top of the document
- Starts with a line of three horizontal dashes ---; ends with a line of three horizontal dashes ---
Text in body of .Rmd file
- e.g., headings; description of results, etc.
R code chunks in body of .Rmd file

a <- c(2,4,6)
a
a-1

R output associated with code chunks

[1] 2 4 6

[1] 1 3 5
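To make the three components concrete, the sketch below writes a minimal .Rmd file from R, with a YAML header, one line of body text, and one code chunk. The file name example.Rmd, the title, and the heading are made up for illustration, and the chunk fence is built with strrep() only to keep the example easy to copy.

```r
fence <- strrep("`", 3)   # three backticks, the marker that opens and closes a code chunk

rmd_lines <- c(
  "---",                                # YAML header starts
  'title: "My first R Markdown file"',  # hypothetical title
  "output: html_document",
  "---",                                # YAML header ends
  "",
  "## Results",                         # text in the body: a heading
  "",
  "The vector a minus one:",            # text in the body: a description
  "",
  paste0(fence, "{r}"),                 # R code chunk starts
  "a <- c(2, 4, 6)",
  "a - 1",
  fence                                 # R code chunk ends
)

# Save the file in R's temporary directory
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)
```

Opening rmd_path in RStudio and clicking Knit should produce an HTML document. If the rmarkdown package is installed, calling rmarkdown::render(rmd_path) from the console is an alternative to the Knit button.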

5.4 Comment: Running R code chunks vs. “knit” entire .Rmd file

Two ways to execute R commands in .Rmd file:

“Knit” entire .Rmd file
- shortcut: Cmd/Ctrl + Shift + k
“Run” code chunk or selected lines within code chunk
- Run selected line(s): Cmd/Ctrl + Enter
- Run current chunk: Cmd/Ctrl + Shift + Enter

Comment on default settings for RStudio:

When you knit entire .Rmd file, “objects” created within .Rmd file will not be available after file compiles
When you run code chunk (or selected lines in chunk), objects created by lines you run will be in your “environment” until you remove them or quit R session
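The second point can be checked directly from the console. In this small sketch, demo_vec is a hypothetical object name; the object stays in the environment after it is created and disappears only once it is removed.

```r
demo_vec <- c(2, 4, 6)              # created by running a line, so it lives in the environment

before_rm <- exists("demo_vec")     # TRUE: the object is available
rm(demo_vec)                        # remove it by hand
after_rm <- exists("demo_vec")      # FALSE: the object is gone

before_rm
after_rm
```

Objects created only while knitting never show up this way, because by default knitting runs the document in a separate R session.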

Output types of .Rmd file

Common/important output types:

html_document: R Markdown originally designed to create HTML documents
- Most features/code in .Rmd files were written for html_document
- Many of these features are available in other output types
- When learning R Markdown, best to start by learning html_document
pdf_document: Requires installation of tinytex R package or LaTeX (MiKTeX/MacTeX)
- How it works:
  - You write .Rmd code
  - When you compile, this .Rmd code is transformed into LaTeX code
  - LaTeX “engine” creates the formatted .pdf file
- Can include some of the same features available for html_document
- Can insert LaTeX commands in .Rmd file with pdf_document output
beamer_presentation: Requires installation of LaTeX
- “beamer” is the name of the LaTeX document class used to write presentations
- Essentially creates PDF of presentation slides
- Lectures for this class created with beamer_presentation output
- Note: YAML header includes beamer_header.tex file, which creates some formatting rules and additional commands

Learning more about R Markdown

Resources

Cheat sheets and quick reference:
- Cheat Sheet
- Quick Reference [I prefer the quick reference]
Chapters/books
- Chapter 27 of “R for Data Science” book
- R Markdown: The Definitive Guide book [I prefer this book]

How you will learn R Markdown

Lectures written as .Rmd file
- During class run “code chunks” and try to “knit” entire .Rmd file
I’ll assign small amount of reading on R Markdown
- Prior to next week:
  - Spend 15 minutes familiarizing yourself with Quick Reference
  - Read section 3.1 of R Markdown: The Definitive Guide, about creating html_document
Homework must be written in .Rmd file
- You will submit .Rmd file AND output of compiled file
- For next week, you will submit homework as html_document output

5.5 Matrices [optional]

Matrices

A matrix is a collection of elements arranged in a two-dimensional rectangular layout

A matrix is another “data structure,” in addition to vectors and lists
Create a matrix named m with 2 rows and 3 columns

m <- matrix( 
  c(2, 4, 3, 1, 5, 7), # the data elements 
  nrow=2,              # number of rows 
  ncol=3,              # number of columns 
  byrow = TRUE         # fill matrix by rows 
)

m # print matrix m

     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    1    5    7

Investigate matrix m

typeof(m) # type = "double"

[1] "double"

str(m) # type = numeric; has two rows and three columns

 num [1:2, 1:3] 2 1 4 5 3 7

class(m) # class = matrix (and also array in R 4.0.0 or later); more on class later

[1] "matrix" "array"
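A few more ways to inspect a matrix, as a minimal sketch: dim() returns the number of rows and columns, and square brackets select elements by row and column position. The second matrix, m_bycol (a made-up name), shows what happens when byrow is left at its default of FALSE.

```r
m <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)

dim(m)     # 2 rows, 3 columns
m[1, 2]    # element in row 1, column 2: 4
m[2, ]     # entire second row: 1 5 7

# Same data, but filled column by column (the default)
m_bycol <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3)
m_bycol[1, ]   # first row is now 2 3 5
```

This column-by-column storage is also why str(m) lists the elements in the order 2 1 4 5 3 7.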

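Like atomic vectors, matrices are homogeneous data structures: every element must be of the same type, so mixing numbers and characters coerces all elements to character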

m2 <- matrix( 
  c(2, 4, 3, "a", "b", "c"), # the data elements 
  nrow=2,              # number of rows 
  ncol=3,              # number of columns 
  byrow = TRUE         # fill matrix by rows 
)

m2

     [,1] [,2] [,3]
[1,] "2"  "4"  "3" 
[2,] "a"  "b"  "c"
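One way to confirm the coercion, sketched below, is to check the type of m2 and then convert a row back to numbers before doing arithmetic. The object name first_row is made up for illustration.

```r
m2 <- matrix(c(2, 4, 3, "a", "b", "c"), nrow = 2, ncol = 3, byrow = TRUE)

typeof(m2)    # "character": the numbers were stored as text

# Arithmetic needs numbers, so convert the first row back explicitly
first_row <- as.numeric(m2[1, ])
sum(first_row)   # 9
```

Trying the same conversion on the second row would give NA values with a warning, since "a", "b", and "c" have no numeric interpretation.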