|Class website (public)||https://anyone-can-cook.github.io/rclass1/|
|Questions & Discussion (private)||https://github.com/anyone-can-cook/rclass1_student_issues_f21|
|Class Zoom link (Meeting ID: 940 4537 6896)||https://ucla.zoom.us/j/94045376896|
The primary goals of this course are (1) to teach fundamental skills of “data management,” which are important regardless of which programming language you use, and (2) to develop a strong foundation in the R programming language. The course is designed for students who never thought they would become programmers, and no prior experience with R is required. For goal (1), most statistics courses teach you how to analyze data that are ready for analysis. In real research projects, data management – the process of cleaning, manipulating, and integrating datasets in order to create analysis datasets – is often more challenging than conducting analyses. For goal (2), R is a free, open-source, object-oriented programming language. R is the most popular language for statistical analysis and one of the most popular languages for “data science” applications (e.g., web-scraping, interactive maps, network analysis). Students will become proficient in data management and R programming through weekly problem sets, which will be completed in groups.
Data management consists of acquiring, investigating, cleaning, combining, and manipulating data. Most statistics courses teach you how to analyze data that are ready for analysis. In real research projects, cleaning the data and creating analysis datasets is often more time consuming than conducting analyses. This course teaches the fundamental data management and data manipulation skills necessary for creating analysis datasets.
The course will be taught using R, a free, open-source programming language. R has become the most popular language for statistical analysis, surpassing SPSS, Stata, and SAS. What differentiates R from these other languages is the thousands of open-source “libraries” created by R users. R is one of the most popular languages for “data science” because R libraries have been created for web-scraping, mapping, network analysis, etc. By learning R you can be confident that you know a programming language that can run any modeling technique you might need and has amazing capabilities for data collection and data visualization. By learning fundamentals of R in this course, you will be “one step away” from web-scraping, network analysis, interactive maps, quantitative text analysis, or whatever other data science application you are interested in.
The data management and programming skills you learn in this course will transfer to other object-oriented programming languages (e.g., Python).
The course primarily uses data and examples from education research. However, the course is designed to teach skills that are important for social science research more broadly and also for computational research within the humanities. We welcome students from across the university.
Recommended prerequisites (not absolutely required)
Another broad goal of the course is for students to begin developing practical proficiency in “computational thinking.” The California Computer Science Standards define computational thinking as “the human ability to formulate problems so that their solutions can be represented as computational steps or algorithms to be executed by a computer.” This course will encourage students to work on the following elements of computational thinking:
Overview. For the first two or three weeks of the course, we will have synchronous lectures and class time that will take all or most of the allotted Friday 9:00am to 11:50am time. In the subsequent weeks, students will only be required to attend the Friday synchronous Zoom class from 9:00am to 10:00am. However, the instructor will continue lecturing until all material for the week has been covered. All lectures will be recorded. Outside of class time, students are expected to work through lecture material on their own, complete a modest amount of required reading, and complete weekly problem sets in groups of three.
Asynchronous course materials. Asynchronous course materials for each week will focus on the topic for that week (e.g., “creating variables”). Course materials will consist of three types of resources: first, detailed lecture slides (PDF or HTML) with sample code; second, a video lecture of the instructor working through these slides (recorded on the previous Friday); and, third, the “.Rmd” file that created the PDF/HTML lecture slides. This .Rmd file will contain all “code chunks” and links to all data utilized in the lecture. Thus, students will “learn by doing” in that they will run R code on their own computer while they work through lecture materials on their own.
Example of weekly work flow:
In just a few words, the keys to success in this class are: start early, ask for help, and help others.
Here are some substantive tips to help you succeed:
We all have a responsibility to ensure that every member of the class feels valued, respected, and comfortable feeling uncomfortable. Be mindful that our words affect others in ways we might not fully understand. We have a responsibility to express our ideas in a way that doesn’t make disparaging generalizations and doesn’t make people feel excluded. As an instructor, I am responsible for setting an example through my own conduct.
Learning data management, while trying to get a handle on R and unfamiliar data, can feel overwhelming! We must create an environment where students feel comfortable asking questions and talking about what they did not understand. Discomfort is part of the learning process. Unburden yourself from the weight of being an “expert.” Focus your energy on improving and helping your classmates improve.
This course teaches data management and R programming, tools that are often perceived as objective, independent of context and content. But this is certainly not true! Racism, white supremacy, and heteronormative ideas of gender identity and sexual orientation are rooted in every aspect of data. Seemingly objective rules (e.g., “the right way to handle data”) affect the way data are gathered, how variables are created, the questions asked (or not asked), etc.
At times, this course will utilize data that reflect systemic gaps based on race, ethnicity, immigration status, and gender identity, among other aspects of identity. It is critical that we acknowledge that: the social and economic marginalization reflected in data is rooted in systemic oppression that upholds white supremacy and heteronormativity; and that the processes used to create these data (e.g., how data are collected, the categories chosen to represent identity) are often based on notions of white supremacy and heteronormativity. We should all be reflecting about our own role in upholding these systems. When you encounter a data management strategy that may cause harm, we encourage you to raise concerns. It may be that your instructor/TAs need to think more critically about strategies they have been using for a long time!
All course-related material can be found on the course website. Pre-recorded lecture videos, lecture slides (PDF/HTML), and .Rmd files will be posted on the class website under the associated sections. Additional resources (e.g., syllabus) may also be posted on the class website.
We will be using GitHub teams for class announcements.
GitHub teams: The teaching team will post all class announcements using GitHub teams. The GitHub team discussions feature allows for quick and seamless communication to all members of an organization or team – in this case, to all students with a GitHub account enrolled in the course. Some features include:
@mentioned by all students enrolled in the class and part of the organization.