Anyone Can Cook:
Foundations of Programming and Data Science
Est. 2020
Anyone Can Cook: Foundations of Programming and Data Science is a two-course sequence that teaches foundational programming and data science skills. The sequence is explicitly designed for students who do not have a programming background and who perhaps never thought they would learn to write code.
The sequence primarily uses R, a free, open-source, object-oriented programming language. R is the industry-standard programming language for statistical analysis in academic research. R is also one of the most popular languages for "data science." Eventually (perhaps in 5 or 10 years), when another programming language (e.g., Python) surpasses R as the most popular do-it-all tool for academic research, the course materials will be revised to use that language, but the course learning goals will remain largely unchanged.
The first course in the sequence is EDUC 260A: Introduction to Programming and Data Management. The primary goals of this course are (1) to teach fundamental skills of "data management" (e.g., cleaning data, creating variables, merging datasets), which are important regardless of which programming language you use, and (2) to develop a strong foundation in the R programming language. The second course in the sequence is EDUC 260B: Fundamentals of Programming. This course teaches practical programming skills and concepts (e.g., using Git for version control and collaboration, writing functions, regular expressions) that are fundamental across all modern object-oriented programming languages.
EDUC 260A: Introduction to Programming and Data Management
The primary goals of this course are (1) to teach fundamental skills of "data management," which are important regardless of which programming language you use, and (2) to develop a strong foundation in the R programming language. Regarding goal (1), most statistics courses teach you how to analyze data that are already prepared for analysis. In real research projects, data management (the process of cleaning, manipulating, and integrating datasets in order to create analysis datasets) is often more challenging than conducting the analyses themselves. Regarding goal (2), R is a free, open-source, object-oriented programming language. R is the most popular language for statistical analysis and one of the most popular languages for "data science." This course is designed for students who do not have a programming background. Students will become proficient in data management and R programming through weekly problem sets, which will be completed in groups. The course format consists of weekly asynchronous lectures and weekly synchronous workshop-style class sessions on Zoom.
EDUC 260B: Fundamentals of Programming
This course teaches practical programming skills and concepts that are important across all modern object-oriented programming languages (e.g., Python, JavaScript). Course topics include: organizing files, folders, and scripts; reading (importing) and writing (exporting) data; using Git and GitHub for version control and collaboration; writing functions; iteration (e.g., "loops"); conditional execution; and strings and regular expressions. These general programming skills are prerequisites for flashier data science applications (e.g., web scraping, streaming and analyzing social media data, interactive maps). Students will become proficient in these programming skills and concepts through weekly problem sets, which will be completed in groups. The course format consists of weekly asynchronous lectures and weekly synchronous workshop-style class sessions on Zoom.