1 Course information

Resource | Link
Class website (public) | https://anyone-can-cook.github.io/rclass1/
Questions & Discussion (private) | https://github.com/anyone-can-cook/rclass1_student_issues_f21
Announcements (private) | https://github.com/orgs/anyone-can-cook/teams/rclass1_f21_announcements
Class Zoom link (Meeting ID: 940 4537 6896) | https://ucla.zoom.us/j/94045376896

2 Course description

The primary goals of this course are (1) to teach fundamental skills of “data management,” which are important regardless of which programming language you use, and (2) to develop a strong foundation in the R programming language. The course is designed for students who never thought they would become programmers, and no prior experience with R is required. For goal (1), most statistics courses teach you how to analyze data that are ready for analysis. In real research projects, data management – the process of cleaning, manipulating, and integrating datasets in order to create analysis datasets – is often more challenging than conducting analyses. For goal (2), R is a free, open-source, object-oriented programming language. R is the most popular language for statistical analysis and one of the most popular languages for “data science” applications (e.g., web-scraping, interactive maps, network analysis). Students will become proficient in data management and R programming through weekly problem sets, which will be completed in groups.

2.1 Extended description

Data management consists of acquiring, investigating, cleaning, combining, and manipulating data. Most statistics courses teach you how to analyze data that are ready for analysis. In real research projects, cleaning the data and creating analysis datasets is often more time consuming than conducting analyses. This course teaches the fundamental data management and data manipulation skills necessary for creating analysis datasets.

The course will be taught using R, a free, open-source programming language. R has become the most popular language for statistical analysis, surpassing SPSS, Stata, and SAS. What differentiates R from these other languages is the thousands of open-source “libraries” created by R users. R is one of the most popular languages for “data science” because R libraries have been created for web-scraping, mapping, network analysis, etc. By learning R you can be confident that you know a programming language that can run any modeling technique you might need and has amazing capabilities for data collection and data visualization. By learning fundamentals of R in this course, you will be “one step away” from web-scraping, network analysis, interactive maps, quantitative text analysis, or whatever other data science application you are interested in.

The data management and programming skills you learn in this course will transfer to other object-oriented programming languages (e.g., Python).

The course primarily uses data and examples from education research. However, the course is designed to teach skills that are important for social science research more broadly and also for computational research within the humanities. We welcome students from across the university.

Recommended prerequisites (not absolutely required)

  • One prior introductory statistics course (e.g., as an undergraduate)
  • Proficiency in general computer skills is helpful, e.g., downloading files from the internet, renaming files, saving them to a folder of your choosing, finding this folder on your computer, etc.

3 Instructor and teaching assistants

3.1 Instructor

Ozan Jaquette

  • Pronouns: he/him/his
  • Office: Moore Hall, Room 3038
  • Email: ozanj@ucla.edu
  • Office hours:
    • Zoom office hours: Wednesdays 3-4PM, zoom link
    • And by appointment (afternoons)

3.2 Teaching assistants

Brianna Wright

Liza Chavac

  • Pronouns: she/her/hers
  • Email: lizachavac@gmail.com
  • Office hours:
    • Zoom office hours: Tuesdays 11am-12pm, zoom link
    • And by appointment

4 Course learning goals

  1. Understand fundamental concepts of object-oriented programming
    • What are the basic object types and how do they apply to statistical analysis?
    • What are object attributes and how do they apply to statistical analysis?
  2. Become familiar with the Base R approach and the Tidyverse approach to data manipulation
  3. Investigate data patterns
    • Sort datasets in ways that generate insights about data structure
    • Select specific observations and specific variables in order to identify data structure and to examine whether variables are created correctly
    • Create summary statistics of particular variables to diagnose errors in data
  4. Create variables
    • Create variables that require calculations across columns
    • Create variables that require processing across rows
  5. Combine multiple datasets
    • Join (merge) datasets
    • Append (stack) datasets
  6. Manipulate the organizational structure of datasets
    • Summarize and collapse observations by group
    • Reshape and “tidy” untidy data
  7. Learn guidelines and practical strategies for ensuring data quality when cleaning data and creating analysis variables
  8. Become proficient at using GitHub – the industry standard platform used by programmers to collaborate on projects – to ask questions about course material and to collaborate with your classmates
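As a rough illustration of goals 3 through 6, the sketch below uses base R on small made-up data frames. The data frames (`students`, `schools`, `transfers`), their column names, and all values are hypothetical and invented for this example; they do not come from the course materials, and the course may use Tidyverse functions for the same steps.

```r
# Hypothetical toy data, invented for illustration only
students <- data.frame(
  student_id = c(1, 2, 3, 4),
  school_id  = c("A", "A", "B", "B"),
  math       = c(80, 65, 90, 70),
  reading    = c(75, 85, 60, 95),
  stringsAsFactors = FALSE
)
schools <- data.frame(
  school_id   = c("A", "B"),
  school_name = c("Adams", "Baker"),
  stringsAsFactors = FALSE
)

# Goal 3: investigate data patterns by sorting and selecting
sorted   <- students[order(students$math, decreasing = TRUE), ]
low_math <- students[students$math < 75, c("student_id", "math")]
summary(students$math)  # summary statistics can help diagnose data errors

# Goal 4: create a variable from a calculation across columns
students$avg_score <- (students$math + students$reading) / 2

# Goal 5: join (merge) school names onto students, then append (stack) rows
joined <- merge(students, schools, by = "school_id", all.x = TRUE)
transfers <- data.frame(
  student_id = c(5, 6),
  school_id  = c("A", "B"),
  math       = c(55, 88),
  reading    = c(70, 92),
  stringsAsFactors = FALSE
)
transfers$avg_score <- (transfers$math + transfers$reading) / 2
stacked <- rbind(students, transfers)

# Goal 6: summarize and collapse observations by group
by_school <- aggregate(avg_score ~ school_id, data = stacked, FUN = mean)
print(by_school)
```

Each step stores its result in a new object rather than overwriting the input, which makes it easier to compare row counts before and after a join or an append.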

Another broad goal of the course is for students to begin developing practical proficiency in “computational thinking.” The California Computer Science Standards define computational thinking as “the human ability to formulate problems so that their solutions can be represented as computational steps or algorithms to be executed by a computer.” This course will encourage students to work on the following elements of computational thinking:

  • Before you start writing code to accomplish some task, write out the individual steps that must be completed to accomplish the task
  • When a particular piece of code is not working, develop a problem-solving approach where you change one element of the code at a time in order to systematically isolate and fix the problem
  • When you conceptually understand what you need to do but don’t know the code to accomplish the task, develop a set of “go to” practices to help you figure it out, for example:
    • Ask Google
    • Post a question on the course GitHub “issues” page
    • Become proficient at searching the course lecture slides and course textbook for answers
    • When you know the right function, but not how to use it, become proficient at reading function documentation
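One way to practice the first and last habits above in R is sketched below: the steps are written out as comments before any code, and the built-in help tools are used to check how a function behaves. The task and the `scores` vector are hypothetical, chosen only for illustration.

```r
# Hypothetical task: report the average score, ignoring missing values.
# Step 1: store the scores in a vector
# Step 2: count how many values are missing
# Step 3: compute the mean without the missing values

scores <- c(80, NA, 90, 70)        # Step 1 (made-up values)
n_missing <- sum(is.na(scores))    # Step 2

# Not sure how mean() treats NA? Check the documentation before guessing:
# ?mean or help("mean") opens the function's help page.
args(mean.default)                 # shows the arguments, including na.rm

avg <- mean(scores, na.rm = TRUE)  # Step 3
```

If the result looks wrong, changing one piece at a time (for example, removing `na.rm = TRUE` and rerunning) helps isolate which part of the code is responsible.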

4.1 Course structure

Overview. For the first two or three weeks of the course, synchronous lectures and class time will take all or most of the allotted Friday 9:00am to 11:50am time. In subsequent weeks, students will only be required to attend the Friday synchronous Zoom class from 9AM-10AM. However, the instructor will continue lecturing until all material for the week has been covered. All lectures will be recorded. Outside of class time, students are expected to work through lecture material on their own, complete a modest amount of required reading, and complete weekly problem sets in groups of three.

Asynchronous course materials. Asynchronous course materials for each week will focus on the topic for that week (e.g., “creating variables”). Course materials will consist of three types of resources: first, detailed lecture slides (PDF or HTML) with sample code; second, a video lecture of the instructor working through these slides (recorded on the previous Friday); and, third, the “.Rmd” file that created the PDF/HTML lecture slides. This .Rmd file will contain all “code chunks” and links to all data utilized in the lecture. Thus, students will “learn by doing” in that they will run R code on their own computer while they work through lecture materials on their own.

Example of weekly work flow:

  • Friday 10/15/21 9AM-11:50AM synchronous class
    • Ozan lectures about “Pipes, dplyr, and variable creation”
    • students only required to attend zoom from 9AM-10AM
  • Prior to Friday 10/22/21 9AM-11:50AM synchronous class
    • students work through “Pipes, dplyr, and variable creation” course materials
      • important: run code on your own in the “.Rmd” file
    • do any additional required reading (very modest)
    • complete problem set about “Pipes, dplyr, and variable creation” with your problem set group
  • Friday 10/22/21 9AM-11:50AM synchronous class
    • Ozan lectures about “processing across rows”
    • students only required to attend zoom from 9AM-10AM

4.2 How to succeed in this class

In just a few words, the keys to success in this class are: start early, ask for help, and help others.

Here are some substantive tips to help you succeed:

  • Work through weekly asynchronous lecture materials as soon as you can
    • The weekly asynchronous lecture materials (lecture PDF/HTML, lecture .Rmd file with code, video lecture) are the core of this course. Lecture materials are designed for you to run the code on your computer as you work through the lecture. Therefore, treat each lecture as an active learning experience rather than passively reading slides.
  • Start the weekly problem set early so that you have time to seek help on questions you are struggling with
  • If you can’t figure something out, ask for help!
    • Discuss with your problem set group
    • Ask a question on GitHub
    • Come to office hours
  • Be supportive of your classmates; together, we will create a classroom environment where we all help each other succeed!

5 Classroom environment

We all have a responsibility to ensure that every member of the class feels valued, respected, and comfortable feeling uncomfortable. Be mindful that our words affect others in ways we might not fully understand. We have a responsibility to express our ideas in a way that doesn’t make disparaging generalizations and doesn’t make people feel excluded. As an instructor, I am responsible for setting an example through my own conduct.

Learning data management, while trying to get a handle on R and unfamiliar data, can feel overwhelming! We must create an environment where students feel comfortable asking questions and talking about what they did not understand. Discomfort is part of the learning process. Unburden yourself from the weight of being an “expert.” Focus your energy on improving and helping your classmates improve.

5.1 Towards an anti-racist, anti-heteronormative learning experience

This course teaches data management and R programming, tools that are often perceived as objective, independent of context and content. But this is certainly not true! Racism, white supremacy, and heteronormative ideas of gender identity and sexual orientation are rooted in every aspect of data. Seemingly objective rules (e.g., “the right way to handle data”) affect the way data are gathered, how variables are created, the questions asked (or not asked), etc.

At times, this course will utilize data that reflect systemic gaps based on race, ethnicity, immigration status, and gender identity, among other aspects of identity. It is critical that we acknowledge that the social and economic marginalization reflected in data is rooted in systemic oppression that upholds white supremacy and heteronormativity, and that the processes used to create these data (e.g., how data are collected, the categories chosen to represent identity) are often based on notions of white supremacy and heteronormativity. We should all be reflecting about our own role in upholding these systems. When you encounter a data management strategy that may cause harm, we encourage you to raise concerns. Your instructor/TAs may need to think more critically about strategies they have been using for a long time!

6 Course website and communication

6.2 Course website

All course-related material can be found on the course website. Pre-recorded lecture videos, lecture slides (PDF/HTML), and .Rmd files will be posted on the class website under the associated sections. Additional resources (e.g., syllabus) may also be posted on the class website.

6.3 Course discussion

We will be using GitHub teams for class announcements; the link is listed under Course information as Announcements (private).

  • GitHub teams: The teaching team will post all class announcements using GitHub teams. The GitHub team discussions feature allows for quick and seamless communication to all members of an organization or team – in this case, to all students with a GitHub account enrolled in the course. Some features include:

    1. The class team can be viewed and @mentioned by all students who are enrolled in the class and belong to the organization.
    2. Posts can include code snippets, links, images, and references to issues, which makes them ideal for class discussion and participation.