EDUC 260A
Introduction to Programming and Data Management


Fall 2024

This is the first course in a programming/data science sequence designed for students who do not have a programming background and perhaps never thought they would write code. (The second course in the sequence is EDUC 260B: Fundamentals of Programming.)

The course has two foundational goals: (1) to develop core skills in "data management," which are important regardless of which programming language you use, and (2) to learn the fundamentals of the R programming language.

Data management consists of acquiring, investigating, cleaning, combining, and manipulating data. Most statistics courses teach you how to analyze data that are already ready for analysis. In real research projects, however, cleaning the data and creating analysis datasets is often more time-consuming than conducting the analyses. This course teaches the fundamental data management and data manipulation skills necessary for creating analysis datasets.

Syllabus & Resources

Class materials including the syllabus, class Zoom link, GitHub issues, GitHub teams, and course textbook are linked below.

Lecture Materials

Lecture materials are organized by topic. Each lecture topic has associated materials linked below. These include lecture slides (HTML/PDF), the Quarto file used to create the slides (.qmd), and pre-recorded lecture videos.

HTML
.qmd
Video recordings
HTML
.qmd
Video recordings
HTML
.qmd
Video recordings

Readings & Assignments

Weekly pre-recorded lectures, slides (html/pdf), readings, and problem sets will be posted on the class website a week in advance. Students are expected to work through the asynchronous lecture materials (videos, slides, etc.) and submit the problem set prior to the synchronous class meeting on Friday.

Required readings
Encouraged readings
Problem set files
Required readings
Encouraged readings
Problem set files
Required readings
  • From Into the tidyverse: pipes and dplyr:
    • Section 1: Introduction (1.1, 1.2)
    • Section 2: Investigating data patterns (2.1, 2.2, 2.3)
    • Section 3: Pipes (3.1, 3.2, 3.3, 3.4, 3.5, 3.6)
Encouraged readings
Problem set files
Required readings
Encouraged readings
Problem set files
Required readings
Encouraged readings
Problem set files
Required readings
  • From Attributes and class lecture:
    • Section 1: Introduction
    • Section 2.2: Atomic vs augmented vectors
    • Section 3: Object class
    • Section 4: Class = factor
    • Section 5: Class = labelled
    • Section 6: Comparing class labelled to class factor
Encouraged readings
Problem set files
Required readings
  • From Strings and dates lecture:
    • Section 1: Introduction
    • Section 2: Data structures and types
    • Section 3: String basics
    • Section 4: stringr package
    • Section 5.1-5.2: Dates and times
Encouraged readings
Problem set files
Required readings
  • From ggplot lecture:
    • Section 1: Introduction
    • Section 2: Concepts
    • Section 3: Creating graphs using ggplot
    • Section 4: Customization
    • Section 5: Exporting plots
    • ggplot cheatsheet
Problem set files
Required readings
  • From Tidy data:
    • Section 1: Introduction
    • Section 2: Data structure vs. data semantics
    • Section 3: Tidy vs. untidy data
    • Section 4: Tidying untidy data
    • Section 5: Missing values
Encouraged readings
Required readings
  • From Joining Datasets:
    • Section 1: Introduction
    • Section 2: Keys
    • Section 3: Mutating joins
    • Section 4: Filtering joins
    • Section 5: Appending data
Encouraged readings
Problem set files
Required readings
Encouraged readings
Problem set files

Course Communication

GitHub is an industry-standard platform that programmers use to collaborate on projects. We will use GitHub for course communication and discussion.

You will use GitHub issues to post any questions you have about the course material. When you open an issue, please give it a title that states your question. For urgent matters, tag the instructors in your post (@ozanj, @augias, @ezamora0646). We encourage students to answer each other's questions and discuss ideas as well.
We will post all class announcements using GitHub teams. The GitHub team discussions feature allows quick and seamless communication with all members of an organization or team; in this case, all students enrolled in the course who have a GitHub account. You will also create separate teams for your problem set groups.
If you have a personal question or issue, you can email the instructor or TA directly. We are also available during office hours or by appointment if there is anything you would like to discuss with us in private.