EDUC 260A
Introduction to Programming and Data Management

Fall 2020

This is the first course in a programming/data science sequence designed for students who do not have a programming background and perhaps never thought they would write code. (The second course in the sequence is EDUC 260B: Fundamentals of Programming.)

The course has two foundational goals: (1) to develop core skills in "data management," which are important regardless of which programming language you use, and (2) to learn the fundamentals of the R programming language.

Data management consists of acquiring, investigating, cleaning, combining, and manipulating data. Most statistics courses teach you how to analyze data that are ready for analysis. In real research projects, cleaning the data and creating analysis datasets is often more time-consuming than conducting the analyses themselves. This course teaches the fundamental data management and data manipulation skills necessary for creating analysis datasets.

Syllabus & Resources

Class materials, including the syllabus, class Zoom link, GitHub issues, GitHub teams, and course textbook, are linked below.

Lecture Materials

Lecture materials are organized by topic. The materials for each lecture topic are linked below: lecture slides (HTML/PDF), the R Markdown file used to create the slides (.Rmd), pre-recorded video(s), and any materials used during synchronous class sessions.

PDF
.Rmd
Video recordings
PDF
.Rmd
Video recordings
HTML
.Rmd
Video recordings

Readings & Assignments

Each week's pre-recorded lectures, slides (HTML/PDF), readings, and problem set will be posted on the class website a week in advance. Students are expected to work through the asynchronous lecture materials (videos, slides, etc.) and submit the problem set before the synchronous class meeting on Friday.

Required readings
Encouraged readings
Problem set files
Problem set solutions
Required readings
Encouraged readings
Problem set files
Problem set solutions
Required readings
Encouraged readings
Problem set files
Problem set solutions
Required readings
Encouraged readings
Problem set files
Problem set solutions
Required readings
  • From Enter the tidyverse: processing across rows:
    • Section 1: Introduction
    • Section 2: Introduce group_by() and summarize()
    • Section 3: Combining group_by() and summarize()
    • Section 4: Summarize multiple columns
    • Section 5: Attach aggregate measures to your data frame
  • Section 5.6 from R for Data Science:
    • The textbook uses the flights data frame. To follow along, install the nycflights13 package with install.packages('nycflights13') and then load it in R with library(nycflights13).
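
The setup described above can be sketched as follows. This is a minimal illustration, not part of the assigned reading: it assumes the dplyr package (which provides group_by() and summarize()) is available, uses a CRAN mirror URL that is not given in the course materials, and picks the carrier and dep_delay columns of flights purely as an example.

```r
# Install each package only if it is missing (one-time setup per machine).
# The mirror URL is an assumption; any CRAN mirror should work.
for (pkg in c("nycflights13", "dplyr")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
}

# Load the packages (needed in every new R session).
library(nycflights13)  # provides the flights data frame
library(dplyr)         # provides group_by() and summarize()

# Example: average departure delay for each carrier.
# na.rm = TRUE drops missing delays (e.g., cancelled flights) before averaging.
delay_by_carrier <- flights %>%
  group_by(carrier) %>%
  summarize(mean_dep_delay = mean(dep_delay, na.rm = TRUE))

delay_by_carrier
```

The result has one row per carrier, which is the general pattern when group_by() is combined with summarize(): one output row for each group.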
Problem set files
Problem set solutions
Required readings
  • From Strings and dates lecture:
    • Section 1: Introduction
    • Section 2: Data structures and types
    • Section 3: String basics
    • Section 4: stringr package
    • Section 5: Dates and times
Encouraged readings
Problem set files
Problem set solutions
Required readings
  • From Attributes and class lecture:
    • Section 1: Attributes and augmented vectors
    • Section 2: Object class
    • Section 3: Class == factor
    • Section 4: Class == labelled
    • Section 5: Comparing labelled class to factor class
    • Appendix: Creating factor variables
Encouraged readings
Problem set files
Problem set solutions
Required readings
  • From Data quality lecture:
    • Section 1: Introduction
    • Section 2: Exploratory data analysis (EDA)
    • Section 3: Skip patterns in survey data
Problem set files
Problem set solutions
Required readings
  • From Tidy data:
    • Section 1: Introduction
    • Section 2: Data Structure vs. Data semantics
    • Section 3: Tidy vs. untidy data
    • Section 4: Tidying untidy data
    • Section 5: Missing values
Encouraged readings
Problem set files
Problem set solutions

Course Communication

GitHub is the industry-standard platform that programmers use to collaborate on projects. We will use GitHub for course communication and discussion.

Use GitHub issues to post any questions you have about the course material. When you open an issue, give it a title that states your question and tag the instructors in your post (@ozanj, @mpatricia01, @cyouh95). We encourage students to answer each other's questions and discuss ideas.
We will post all class announcements using GitHub teams. The GitHub team discussions feature allows quick communication with all members of an organization or team; in this case, all students with a GitHub account who are enrolled in the course. You will also create separate teams for your problem set groups.
If you have a personal question or issue, you can email the instructor or TA directly. We are also available during office hours or by appointment if there is anything you would like to discuss in private.