Load packages:
library(tidyverse)
Resources used to create this lecture:
Video from Will Doyle, Professor at Vanderbilt University
What is version control?
How version control works:
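You write a document and save it as cookies.txt
Later, you make changes to cookies.txt (e.g., add alternative baking time for people who like “soft and chewy” cookies)
When you save those changes, you do not save an entirely new copy of cookies.txt; rather, you save the changes made relative to the previous version of cookies.txt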
Why use version control when you can just save a new version of the document?
Credit: Jorge Chan (and also, lifted this example from Benjamin Skinner’s intro to Git/GitHub lecture)
What is Git? (from git website)
“Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency”
What is a Git repository?
What is GitHub?
Even professional programmers find learning and understanding git to be challenging
“Whoah, I’ve just read this quick tutorial about git and oh my god it is cool. I feel now super comfortable using it, and I’m not afraid at all to break something.”— said no one ever (de Wulf)
Understanding and learning how to use Git can be intimidating. A lot of tutorials give you recipes for how to accomplish specific tasks (either point-and-click or issuing commands on command line), but don’t provide a conceptual understanding of how things work.
Here is how we will learn Git and GitHub over the course of the quarter:
What is a shell?
What is graphical user interface (GUI)?
In this course, we will perform Git operations solely using the command line. Why?
We will use the Unix shell called “Bash” to perform Git operations:
Why learn the command line and “command-line bullshittery,” from Philip J. Guo
“What is wonderful about doing applied computer science research in the modern era is that there are thousands of pieces of free software and other computer-based tools that researchers can leverage to create their research software. With the right set of tools, one can be 10x or even 100x more productive than peers who don’t know how to set up those tools.”
“But this power comes at a great cost: It takes a tremendous amount of command-line bullshittery to install, set up, and configure all of this wonderful free software. What I mean by command-line bullshittery is dealing with all of the arcane, obscure, strange bullshit of the command-line paradigm that most of these free tools are built upon….So perhaps what is more important to a researcher than programming ability is adeptness at dealing with command-line bullshittery, since that enables one to become 10x or even 100x more productive than peers by finding, installing, configuring, customizing, and remixing the appropriate pieces of free software.”
Helping my students overcome command-line bullshittery by Philip J. Guo
If you have a Windows computer, you will need to follow steps in this link to install Git for Windows, which will allow you to run Bash and Git commands.
If you have a Mac, you won’t need to download anything because it already comes with a Terminal app. However, if you have a newer version of macOS, you may need to run xcode-select --install in your Terminal before you’re able to use Git commands (see here for more info).
In RStudio, there is a
Terminal tab (next to the Console tab)
where you can run Bash commands and perform Git operations:
Credit: RStudio Terminal blog post by Gary Ritchie
If you are working from an R markdown file, you can also create bash code chunks (similar to R code chunks) for running shell commands. All you need to do is indicate {bash} for the code chunk:
Try running this code chunk on your own
echo "Hello, World!"
## Hello, World!
What is the difference between the RStudio Console and Terminal?
In this section, we will go over some of the commonly used bash command line commands. You can run these commands either in a standalone Git Bash/Terminal application, your RStudio Terminal, or in a bash code chunk of an R markdown file.
Generally, you can pull up the help file for a command by running:
command_name --help (Windows)
man command_name (Mac)
We’ll use the ls command as an example:
ls --help
man ls
ls: List directory contents
Syntax: ls [<option(s)>] [<directory_name(s)>]
  Square brackets [] indicate that the options and directory names are optional; you do not have to specify them
Options: prefixed with - or -- (see help file)
  - is the way to specify the short name version and -- is the way to specify the long name version of an option
  -a: Include directory entries whose names begin with a dot (.)
  -l: List files in long format (i.e., include additional information like file size, date of last modification, etc.)
Arguments:
  directory_name(s): Which directories to list the content of (default: current directory)
R analogue: list.files()
Example: Using ls to list content in current directory (default)
ls
## git_and_github.Rmd
## git_and_github.html
## render_toc.R
## windows_credential_manager_screen_clip.png
Example: Using ls to list content in parent directory
ls ..
## apis_and_json
## ggplot
## git_and_github
## organizing_and_io
## programming
## strings_and_regex
ls ../
## apis_and_json
## ggplot
## git_and_github
## organizing_and_io
## programming
## strings_and_regex
ls ".."
## apis_and_json
## ggplot
## git_and_github
## organizing_and_io
## programming
## strings_and_regex
Example: Using ls -a to list content in parent directory including entries whose names begin with a dot
ls -a ..
## .
## ..
## apis_and_json
## ggplot
## git_and_github
## organizing_and_io
## programming
## strings_and_regex
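Short options can be combined after a single dash, so ls -la behaves like ls -l -a. The following is a minimal sketch rather than part of the original lecture: it assumes a throwaway directory created with mktemp, and the file names notes.txt and .hidden.txt are made up for illustration.

```shell
# Work in a throwaway directory so nothing in your project is touched
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

# Create one regular file and one hidden file (its name begins with a dot)
echo "notes" > notes.txt
echo "secret" > .hidden.txt

# Plain `ls` skips the hidden file
ls

# `-a` includes entries whose names begin with a dot
ls -a

# `-la` combines the long format (-l) with hidden entries (-a)
ls -la
```

Running the three commands one after another makes it easy to see which entries each option adds.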
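This section shows bash code to:
Print text to the terminal or write it to a file (echo function)
Print the contents of a file (cat function)
Print the first part of a file (head function)
Print the last part of a file (tail function)
Copy files and directories (cp function)
Rename or move files and directories (mv function)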
Often, we want to insert text into a file from the command line
echo: Write to standard output (i.e., print to terminal)
Syntax: echo <text_to_print>
Help: help echo on Windows (note: this is different from the usual function_name --help syntax) or man echo on Mac
Arguments:
  text_to_print: Text to print to terminal
What the echo function does:
  If you run echo <text_to_print>, it will simply print that text on terminal
  echo <text_to_print> > file_name (overwrite file)
    > redirects the text to the file instead of the terminal
    > will overwrite contents of existing file
  echo <text_to_print> >> file_name (append to file)
    >> appends the text to the end of the file (i.e., not overwrite existing content of file)
When run with the -e option, echo interprets the following “backslash-escaped” characters (will explain more fully in the strings/regex unit)
  \a: alert (bell)
  \b: backspace
  \c: suppress further output
  \e: escape character
  \E: escape character
  \f: form feed
  \n: new line
  \r: carriage return
  \t: horizontal tab
  \v: vertical tab
  \\: backslash
Example: Using echo to print text to terminal
# help echo # help Windows
# man echo # help Mac
echo "Hello, World!"
## Hello, World!
cat: Concatenate and print files
Syntax: cat <file_name>
Arguments:
  file_name: File to print to terminal
Example: Using echo and > to redirect text to file and cat to print content of file
# Redirect text to file
echo "Hello, World!" > my_script.R
# Print contents of file
cat my_script.R
## Hello, World!
# We would overwrite contents of file when using `>`
echo "library(tidyverse)" > my_script.R
# Print contents of file
cat my_script.R
## library(tidyverse)
Example: Using echo and >> to append text to file and cat to print content of file
# Append line to R script by using `>>` (`>` would overwrite contents of file)
echo "mpg %>% head(5)" >> my_script.R
# Print contents of file
cat my_script.R
## library(tidyverse)
## mpg %>% head(5)
head: Print first part of file
Syntax: head [<option(s)>] [<file_name>]
Options:
  -n <int>: Print the first <int> lines (default: 10)
Arguments:
  file_name: File to print
tail: Print last part of file
Syntax: tail [<option(s)>] [<file_name>]
Options:
  -n <int>: Print the last <int> lines (default: 10)
Arguments:
  file_name: File to print
Example: Using head to print first part of file
# Preview first 10 lines by default (or up to 10 lines)
head my_script.R
## library(tidyverse)
## mpg %>% head(5)
# Preview first line
head -n 1 my_script.R
## library(tidyverse)
Example: Using tail to print last part of file
# Preview last 10 lines by default (or up to 10 lines)
tail my_script.R
## library(tidyverse)
## mpg %>% head(5)
# Preview last line
tail -n 1 my_script.R
## mpg %>% head(5)
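head and tail can also be chained with a pipe (|), which passes the output of one command to the next command as its input. As a minimal sketch (the scratch directory and the file name lines.txt are assumptions made for this illustration), the following prints only the second line of a three-line file:

```shell
# Work in a throwaway directory
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

# Build a three-line file with `>` (create) and `>>` (append)
echo "first line" > lines.txt
echo "second line" >> lines.txt
echo "third line" >> lines.txt

# Keep the first 2 lines, then keep the last 1 of those: prints line 2
head -n 2 lines.txt | tail -n 1
## second line
```

The same pattern works for any line number: head -n k keeps the first k lines and tail -n 1 keeps the last of them.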
cp: Copy files or directories
Syntax: cp [<option(s)>] <source_directory/file> <destination_directory/file>
Options:
  -r: Copies directories and their contents recursively (this flag is required to copy a directory)
Arguments:
  Copies source_directory/file to destination_directory/file
Example: Using cp to copy a file
# Print contents of my_script.R
cat my_script.R
## library(tidyverse)
## mpg %>% head(5)
# Make a copy of my_script.R called my_script_copy.R inside my_folder/
cp my_script.R my_folder/my_script_copy.R
# Print contents of my_script_copy.R
cat my_folder/my_script_copy.R
## library(tidyverse)
## mpg %>% head(5)
Example: Using cp -r to copy a directory
# View contents of my_folder/
pwd
ls my_folder
## /c/Users/ozanj/Documents/rclass2/lectures/git_and_github
## my_script_copy.R
## test_script.R
# Make a copy of my_folder/ (with its contents) called my_folder_copy/
cp -r my_folder my_folder_copy
# View contents of my_folder_copy/
ls my_folder_copy
## my_script_copy.R
## test_script.R
mv: Rename or move files
Syntax (rename): mv <old_directory/file> <new_directory/file>
Syntax (move): mv <directory/file(s)> <destination_directory>
Options:
  -n: do not overwrite an existing file
Example: Using mv to rename a file or directory
# Rename file
mv my_script.R create_dataset.R
# Rename directory
mv my_folder_copy my_folder_2
Example: Using mv to move files and directories into a directory
# View contents of my_folder/
ls my_folder
## my_script_copy.R
## test_script.R
# Move file and directory into the destination directory (last arg)
mv create_dataset.R my_folder_2 my_folder
# View contents of my_folder/
ls my_folder
## create_dataset.R
## my_folder_2
## my_script_copy.R
## test_script.R
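The -n option listed for mv is not used in the examples above, so here is a minimal sketch of what it does. The scratch directory and the file names report.txt and draft.txt are assumptions made for this illustration.

```shell
# Work in a throwaway directory
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

echo "old contents" > report.txt
echo "new contents" > draft.txt

# With -n, mv refuses to overwrite report.txt, so both files are left as they were.
# `|| true` is here because the exit status for a skipped move varies across mv versions.
mv -n draft.txt report.txt || true

cat report.txt
## old contents
```

Without -n, the same mv command would replace report.txt with the contents of draft.txt.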
This section introduces some core concepts and explains the basic Git “workflow” (i.e., how Git works)
Version control systems that save differences:
Imagine you create a file called twinkle.txt
“Version 1” of twinkle.txt has the following contents:
twinkle, twinkle, little star
You make changes to twinkle.txt and save those changes, resulting in “Version 2,” which has the following contents:
twinkle, twinkle, little star, how I wonder what you are!
When saving “Version 2” of twinkle.txt, centralized version control systems don’t store the entire file. Rather, they store the changes relative to the previous version. In our example, “Version 2” stores:
, how I wonder what you are!
Credit: Getting Started - What is Git
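To get a feel for what "storing differences" means, you can compare the two versions of twinkle.txt with the standard diff command-line tool. This is only an illustrative sketch: diff is a separate program, not what any particular version control system runs internally, and the file names twinkle_v1.txt and twinkle_v2.txt are made up for the example.

```shell
# Work in a throwaway directory
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

# Save the two versions from the example above as separate files
echo "twinkle, twinkle, little star" > twinkle_v1.txt
echo "twinkle, twinkle, little star, how I wonder what you are!" > twinkle_v2.txt

# Print the lines that differ between the two versions.
# diff exits with status 1 when the files differ; `|| true` keeps a strict shell going.
diff twinkle_v1.txt twinkle_v2.txt || true
```

Note that diff works line by line, so it reports the whole line as changed rather than only the appended words.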
Git stores data as snapshots rather than differences:
Credit: Getting Started - What is Git
What is a commit?
Credit: Lucas Maurer, medium.com
git add <filename(s)> command)
cookies.txt):
cookies.txt in a text editor. These are changes made in your local working directory.
cookies.txt and you want to commit those changes to your repository
git commit command)
cookies.txt:
Credit: Modified from Atlassian, Git push
Credit: Simon Maple, JRebel, https://www.jrebel.com/blog/git-cheat-sheet
Git commands:
add: Add file from working directory to staging area
commit: Commit file from staging area to local repository
push: Send files from local repository (your machine) to remote repository
  Think of push as “uploading”
fetch: Get files from remote repository and put them in local repository
pull: Get files from remote repository and put them in the working directory
  Think of pull as “downloading”
  pull is effectively fetch followed by merge (discussed later)
reset: After you add files from working directory to staging area, reset unstages those files
Git command cheatsheets:
When performing git operations on command line, all commands begin with git, for example:
git init
git clone url_of_remote_repository
git status
For an overview of git command syntax and a list of common git commands, type this in command line:
git --help
To see the help file for a particular git command (e.g., add, commit, clone), type git command_name --help. For example:
git add --help
# or this:
# git help add
Basic/essential git commands:
git init
  “… [.git/] within the existing directory that houses the internal data structure required for version control” (Git Handbook)
git clone url_of_remote_repository
git add file_name(s)
git commit -m "commit message"
  -m is an option to the git commit command, which specifies that you will add a brief description about changes you are committing. You can reference an issue in the commit message by using a hashtag followed by the issue number: #<issue_number>. These commits will appear on the issue page.
git status
git push
git pull
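The commands above can be tried end to end in a throwaway repository that is not connected to any remote. This is a minimal sketch, not the lecture's own example: the scratch directory, the contents of cookies.txt, and the "Example User" identity passed with -c (so the commit works even if git has not been configured yet) are all placeholders.

```shell
# Work in a throwaway directory
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

git init -q                         # create an empty repository (.git/ directory)
echo "flour, sugar, butter" > cookies.txt

git add cookies.txt                 # working directory -> staging area
git status --short                  # "A  cookies.txt": staged, not yet committed

# -c sets the author for this one command only (placeholder identity)
git -c user.name="Example User" -c user.email="user@example.com" \
  commit -q -m "add cookies.txt"    # staging area -> local repository

# Change the file, stage the change, then unstage it again with reset
echo "chocolate chips" >> cookies.txt
git add cookies.txt
git reset -q cookies.txt            # the edit stays in the working directory
git status --short                  # " M cookies.txt": modified but not staged
```

git status --short is a compact form of git status; the two-letter codes at the start of each line show whether a change is staged (first column) or only in the working directory (second column).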
What are local and remote repositories?
$ git clone https://github.com/username/repo.git
Username: your_github_username
Password: your_token
There are 2 basic ways to get your local
repository set up with a remote:
git remote: Show list of connected remote repositories
Help: git remote --help
Syntax: git remote [<option(s)>]
Options:
  -v: Show more detailed info about the remotes, including their URLs
Understanding how local and remote repositories are connected:
Use git remote to check which remote repository is connected (i.e., which remote(s) you can push to and pull from)
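If you want to practice with git remote without touching GitHub, a "bare" repository on your own machine can stand in for the remote. The sketch below is an assumption-laden illustration rather than the lecture's workflow: practice_remote.git and practice_local are made-up names, and a local file path is used where a GitHub URL would normally go.

```shell
# Work in a throwaway directory
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

# A bare repository has no working directory; it plays the role of GitHub here
git init -q --bare practice_remote.git

# Create a local repository and connect it to the stand-in remote
git init -q practice_local
cd practice_local
git remote add origin "$scratch_dir/practice_remote.git"

git remote        # prints the remote's name: origin
git remote -v     # prints the fetch and push URLs for origin
```

Before git remote add is run, git remote prints nothing, which is a quick way to tell that a local repository is not yet connected to any remote.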
This is usually the easiest way to get a local repository set up with a remote repository
Step 1: Obtain the URL of the remote repository on GitHub:
Initialize this repository with options before creating
Code button
Step 2: Clone the repository to your local machine:
Use the git clone command to clone the repository to your local machine
git add changes to file(s) from the local working directory to the staging area
git commit -m "commit message" all staged changes to the local repository
git push to push changes from your local repository to the remote repository
Credit: W3 docs, Git clone
git clone: Clone a repository into a new directory
Help: git clone --help
Syntax: git clone <repo_url>
Arguments:
  repo_url can be the HTTPS or SSH URL
Example: Using git clone to clone a repository
HTTPS URL: https://github.com/btskinner/downloadipeds.git
SSH URL: git@github.com:btskinner/downloadipeds.git
cd ~ # change to home directory
rm -rf downloadipeds # force remove `downloadipeds` (if it exists)
# Change directory to where you want to clone the repository
cd ~
# This will be the directory where the `downloadipeds` repository will be cloned
# Note that you do not need to create a `downloadipeds` sub-directory yourself
pwd
## /c/Users/ozanj/Documents
cd ~
# Clone the remote repository
git clone https://github.com/btskinner/downloadipeds.git # HTTPS URL
# git clone git@github.com:btskinner/downloadipeds.git # SSH URL
## Cloning into 'downloadipeds'...
# Change directory to the newly cloned `downloadipeds`
cd downloadipeds
pwd
# List out contents of repository
ls -la
## /c/Users/ozanj/Documents/downloadipeds
## total 93
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 .
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 ..
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 .git
## -rw-r--r-- 1 ozanj None 22 Feb 9 15:45 .gitignore
## -rw-r--r-- 1 ozanj None 1094 Feb 9 15:45 LICENSE
## -rw-r--r-- 1 ozanj None 4682 Feb 9 15:45 README.md
## -rw-r--r-- 1 ozanj None 6028 Feb 9 15:45 downloadipeds.R
## -rw-r--r-- 1 ozanj None 13876 Feb 9 15:45 ipeds_file_list.txt
# List out the connected remote, which is named `origin` by default
git remote
## /c/Users/ozanj/Documents/downloadipeds
## origin
# Display more details about the remote, including the repository URL
git remote -v
# https://github.com/btskinner/downloadipeds.git
## origin https://github.com/btskinner/downloadipeds.git (fetch)
## origin https://github.com/btskinner/downloadipeds.git (push)
Alternatively, you can create a new git repository on your local machine, and then connect it to the remote on GitHub, in three steps.
Step 1 = Create a local git repository:
git init
git add changes to file(s) from the local working directory to the staging area
git commit -m "commit message" all staged changes to the local repository
Step 2 = Create a remote repository on GitHub:
Initialize this repository with options
Step 3 = Connect your local repository to the remote:
Use git remote add to add a new remote for your local repository
Use the --set-upstream option with the git push command
Credit: Java T Point, Git Push
New git commands we will use in examples below
git remote: Add or modify a remote repository
Help: git remote --help
git remote add <remote_name> <remote_url>: Add a new remote
  remote_name: Name we choose to call our remote repository, conventionally origin
  remote_url: HTTPS/SSH URL of remote repository
git remote set-url <remote_name> <remote_url>: Update the URL for the specified remote
  remote_name: Name of the remote we want to update URL for
  remote_url: HTTPS/SSH URL we want to update to
git push: Set and push to upstream branch
Help: git push --help
Syntax: git push --set-upstream <remote_name> <branch_name>
  remote_name: Name of the remote repository to push to
  branch_name: Name of the remote branch you want your current branch to track
  Once the upstream is set, later pushes only need git push.
# CREATING AND CHANGING DIRECTORIES
cd ~ # change directories to home directory
#cd documents # change to "documents" [if necessary]
ls # list files in directory
# make new directory that will be our git repository
# rm -rf gitr_practice # remove if it exists
mkdir gitr_practice
cd gitr_practice # move to new directory
ls -a # show all files in directory
# INITIALIZING GIT REPOSITORY
# turn the current, empty directory into a fresh Git repository
git init
ls -a # show all files in directory
# CHANGING FILES IN WORKING DIRECTORY
# create a new README file with some sample text
echo "Hello. I thought we would be learning R this quarter" >> README.txt
# view the file README.txt
cat README.txt
# create a simple R script
echo "library(tidyverse)" >> simple_script.r
echo "mpg %>% head(5)" >> simple_script.r # add another line to simple_script.r
cat simple_script.r # show contents of file simple_script.r
# STAGE AND COMMIT FILES TO LOCAL REPOSITORY
# check status of git repository
git status
# add README.txt from working directory to staging area (will now become a file that is "tracked" by git)
git add README.txt
# add simple_script.r from working directory to staging area (will now become a file that is "tracked" by git)
git add simple_script.r
# check status
git status
# commit changes to local repository
git commit -m "Initial commit, README.txt simple_script.r"
git status
# CONNECT AND PUSH TO REMOTE REPOSITORY
# rename default branch name
git branch -M main
# provide the path for the repository you created on GitHub in the first step
#git remote add origin https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.git
git remote add origin https://github.com/ozanj/gitr_practice.git
# push changes to GitHub
git push --set-upstream origin main
Example: Using git remote to add a remote
cd ~ # change to home directory
rm -rf my_git_repo # force remove `my_git_repo` (if it exists)
mkdir my_git_repo # make directory `my_git_repo`
# Initialize a new git repository in `my_git_repo` directory
cd my_git_repo
git init
# next create a repo on github.com and name it my_git_repo
# don't have to give it this name, but I find it less confusing
# Add remote and name it `origin`
# paste the url you obtain from github
git remote add origin https://github.com/ozanj/my_git_repo.git
# Check remote
git remote -v
## Initialized empty Git repository in C:/Users/ozanj/Documents/my_git_repo/.git/
## origin https://github.com/ozanj/my_git_repo.git (fetch)
## origin https://github.com/ozanj/my_git_repo.git (push)
Note that we could’ve named the remote repository anything - it
doesn’t have to be origin:
# Add remote (https://github.com/anyone-can-cook/my_git_repo) and name it `my_remote`
git remote add my_remote https://github.com/anyone-can-cook/my_git_repo.git
# Check remote
git remote -v
## my_remote https://github.com/anyone-can-cook/my_git_repo.git (fetch)
## my_remote https://github.com/anyone-can-cook/my_git_repo.git (push)
Example: Using git remote to update URL for a remote
# Check remote
git remote -v
## my_remote https://github.com/anyone-can-cook/my_git_repo.git (fetch)
## my_remote https://github.com/anyone-can-cook/my_git_repo.git (push)
# Change the URL for the remote named `my_remote`
git remote set-url my_remote https://github.com/anyone-can-cook/my_git_repo_2.git
# Check remote
git remote -v
## my_remote https://github.com/anyone-can-cook/my_git_repo_2.git (fetch)
## my_remote https://github.com/anyone-can-cook/my_git_repo_2.git (push)
Example: Using git push to push a new branch
cd ~/my_git_repo
# Create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# Add R script and make a commit
git add create_dataset.R
git commit -m "initial commit"
git branch -M main
# Because this is a new local branch, we get an error if we just use `git push` on the initial push
git push
## fatal: The current branch main has no upstream branch.
## To push the current branch and set the remote as upstream, use
##
## git push --set-upstream my_remote main
As hinted in the error message, we need to use the --set-upstream option to set the upstream branch on the initial push for a new local branch:
# Recall that we are connected to a remote repository we named `my_remote`
git remote -v
## my_remote https://github.com/anyone-can-cook/my_git_repo_2.git (fetch)
## my_remote https://github.com/anyone-can-cook/my_git_repo_2.git (push)
# We can check status to see that we are currently on the `main` branch
# (Note that because we have yet to set an upstream branch,
# it does not say our main branch is ahead of remote by 1 commit)
git status
## On branch main
## nothing to commit, working tree clean
# Use the `--set-upstream` option with the remote and branch names to push new local branch
git push --set-upstream my_remote main
## To https://github.com/anyone-can-cook/my_git_repo_2.git
## * [new branch] main -> main
## Branch main set up to track remote branch main from my_remote.
# Check status
# (Now that we have set the upstream branch,
# it says our main branch is up-to-date with the remote's main branch)
git status
## On branch main
## Your branch is up-to-date with 'my_remote/main'.
##
## nothing to commit, working tree clean
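The upstream mechanics can also be rehearsed offline. The following minimal sketch assumes a local bare repository (practice_remote.git, a made-up name) standing in for GitHub, and a placeholder author identity passed with -c; neither comes from the lecture.

```shell
# Work in a throwaway directory
scratch_dir=$(mktemp -d)
cd "$scratch_dir"

git init -q --bare practice_remote.git     # stand-in for the GitHub repository
git init -q practice_local
cd practice_local
git remote add origin "$scratch_dir/practice_remote.git"

# Make one commit on a branch named main
echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git -c user.name="Example User" -c user.email="user@example.com" \
  commit -q -m "initial commit"
git branch -M main

# First push: name the remote and branch so main starts tracking origin/main
git push -q --set-upstream origin main

# Later pushes need no arguments because the upstream is now recorded
git push -q

# Compact status; the first line shows the branch and the upstream it tracks
git status --short --branch
## ## main...origin/main
```

The first line of the compact status has the form local_branch...remote/branch, which confirms that the upstream was recorded by the initial push.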
Once a directory is initialized as a git repository, you can choose to track the changes to any file in the directory:
A file becomes tracked once you have added it (git add)
git status can be used to check which files are tracked and which are not. Untracked files, except those listed in your .gitignore file, will be listed under Untracked files.
What is a .gitignore file? (see below for more details)
Files matching the patterns listed in .gitignore are ignored by git and do not appear under Untracked files when you check git status
You can create the .gitignore file yourself or click Add .gitignore when you are creating a new repository on GitHub and select the R template from the dropdown menu:
Credit: How to Make Git Forget Tracked Files Now In gitignore
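The effect of a .gitignore file is easy to see in a throwaway repository. This is a minimal sketch under stated assumptions: the file names analysis.R and scratch.log and the pattern *.log are made up for illustration and are not taken from GitHub's R template.

```shell
# Work in a throwaway directory
scratch_dir=$(mktemp -d)
cd "$scratch_dir"
git init -q

echo "library(tidyverse)" > analysis.R
echo "temporary output" > scratch.log

# Tell git to ignore every file whose name ends in .log
echo "*.log" > .gitignore

# Untracked files are marked with ??; scratch.log is absent because it is ignored
git status --short
```

The .gitignore file itself shows up as untracked, because it is an ordinary file that you would normally add and commit along with the rest of the project.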
Below are some common git commands you might use to
observe your repository:
git status
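git status: Shows the working tree status
Help: git status --help
Syntax: git status [<option(s)>]
Sections of the output:
  Changes to be committed: Files that have been staged with git add and will be included in the next git commit
  Changes not staged for commit: Tracked files (i.e., files you have used git add on before) that have since been changed (e.g., modified, deleted) in the working directory; stage these changes with git add
  Untracked files: Files that are not tracked (i.e., files you have never used git add on before); start tracking them with git add
Below is a sample output of git status: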
On branch main
Your branch is up-to-date with 'origin/main'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: clean_dataset.R
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: create_dataset.R
Untracked files:
(use "git add <file>..." to include in what will be committed)
analyze_dataset.R
Example: Using git status after creating a new file
After you create create_dataset.R in your git repository, it will be listed under Untracked files
# Create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
git status
On branch main
Your branch is up-to-date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
create_dataset.R
nothing added to commit but untracked files present (use "git add" to track)
Example: Using git status after adding a file
After you add create_dataset.R, you will see it listed under Changes to be committed
# Add R script
git add create_dataset.R
git status
On branch main
Your branch is up-to-date with 'origin/main'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: create_dataset.R
Example: Using git status after making a commit
# Make a commit
git commit -m "add create_dataset.R"
git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
Example: Using git status after modifying a tracked file
When a tracked file is modified, it is listed under Changes not staged for commit (as compared to under Untracked files when it’s never been tracked before)
# Modify create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: create_dataset.R
no changes added to commit (use "git add" and/or "git commit -a")
git log
git log: Show commit logs
Help: git log --help
Syntax: git log [<option(s)>]
Options:
  -n <int>: Show the latest <int> commits
Each entry in the output contains:
  commit <commit_hash>: Each commit can be uniquely identified by its hash ID (SHA-1)
  Author: <username> <email>: Username and email of the author of the commit
  Date: <commit_date>: Date of the commit
  <commit_message>: Commit message
Press q to exit this read mode.
Below is a sample output of git log:
commit 2e525e4b1c40f6cffb78438285a00cd7eed54ae0 (HEAD -> main)
Author: username <email@example.com>
Date: Thu Apr 2 23:53:30 2020 -0700
second commit
commit 8c20a14b99d7a490580045176287b979c93d9cb5
Author: username <email@example.com>
Date: Wed Apr 1 22:49:52 2020 -0700
initial commit
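To experiment with git log options, it helps to have a small history to look at. The sketch below builds two commits in a throwaway repository; the helper function commit_as_example and the "Example User" identity are hypothetical conveniences for this illustration, and --oneline is a standard git log option that the lecture does not cover.

```shell
# Work in a throwaway directory
scratch_dir=$(mktemp -d)
cd "$scratch_dir"
git init -q

# Helper (hypothetical, for this sketch only): commit with a placeholder author
commit_as_example() {
  git -c user.name="Example User" -c user.email="user@example.com" commit -q -m "$1"
}

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
commit_as_example "initial commit"

echo "mpg %>% head(5)" >> create_dataset.R
git add create_dataset.R
commit_as_example "second commit"

# Full entry for only the most recent commit
git log -n 1

# One line per commit: abbreviated hash followed by the commit message
git log --oneline
```

The abbreviated hashes printed by --oneline can be used wherever a commit hash is expected, for example as arguments to git diff.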
git diff
git diff: Show changes between files, commits, etc.
Help: git diff --help
git diff [<file_name(s)>]: Show changes made to unstaged files in working directory compared to the “index”
  The “index” holds the version of each file as of the last time you used git add on it
  This only covers tracked files (i.e., files listed under Changes not staged for commit when you check git status), since untracked files have no history in the “index” to compare against
  If there are no file_name(s) specified, git diff shows changes made to all tracked, unstaged files
git diff --cached [<file_name(s)>]: Show changes made to added files in staging area compared to the last commit
  These are the changes that would be recorded by the git commit command
  If there are no file_name(s) specified, git diff --cached shows changes made to all staged files (i.e., files listed under Changes to be committed when you check git status)
git diff <commit_hash> <commit_hash> [<file_name(s)>]: Show changes between the two specified commits
  If there are no file_name(s) specified, git diff <commit_hash> <commit_hash> shows changes between all files
Reading the output of git diff:
  The first line is diff --git a/<file_name> b/<file_name>, which indicates that two versions of file_name are being compared
  The next lines give metadata, such as the hashes of the two versions (index) or a note if a new file is involved (as in the case of git diff --cached for a previously untracked, staged file; see second example below)
  Each changed section starts with a line beginning with @@
  - in front of a line indicates that the line has been removed in b/<file_name> as compared to a/<file_name>
  + in front of a line indicates that the line has been added in b/<file_name> as compared to a/<file_name>
Below is a sample output of git diff:
diff --git a/create_dataset.R b/create_dataset.R
index c1cff38..5ea84e9 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1,2 +1,2 @@
library(tidyverse)
-mpg %>% head(5)
+mpg %>% filter(year == 2008)
Example: Using git diff for an untracked file
After you create create_dataset.R in your git repository, the file is untracked, so it does not appear in the output of git diff
# Create new R script
echo "library(tidyverse)" > create_dataset.R
git diff # No output
Example: Using git diff for a staged file
After you add create_dataset.R, it will be added to the “index”
git diff --cached can be used to view all staged changes
# Add R script
git add create_dataset.R
git diff --cached
diff --git a/create_dataset.R b/create_dataset.R
new file mode 100644
index 0000000..8b151a2
--- /dev/null
+++ b/create_dataset.R
@@ -0,0 +1 @@
+library(tidyverse)
Example: Using git diff for a modified, tracked file
Use git diff to see changes between the versions in the working directory and the staging area
# Modify create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
git diff
diff --git a/create_dataset.R b/create_dataset.R
index 8b151a2..c1cff38 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1 +1,2 @@
library(tidyverse)
+mpg %>% head(5)
Example: Using git diff after committing changes
Commit the staged changes (i.e., the line library(tidyverse) in create_dataset.R)
The output of git diff (i.e., comparing changes between the working directory and “index”) is the same as the previous example, when the changes were just staged and not yet committed
# Make a commit
git commit -m "add 1st line to create_dataset.R"
git diff
diff --git a/create_dataset.R b/create_dataset.R
index 8b151a2..c1cff38 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1 +1,2 @@
library(tidyverse)
+mpg %>% head(5)
Example: Using git diff between commits
Add the remaining changes to create_dataset.R in the working directory (i.e., the line mpg %>% head(5)) and make a second commit
# Add create_dataset.R and make a commit
git add create_dataset.R
git commit -m "add 2nd line to create_dataset.R"
git log
commit aa89efba9adddf8547b3743ba81a421dd2a28881 (HEAD -> main)
Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
Date: Sat Apr 4 03:20:15 2020 -0700
add 2nd line to create_dataset.R
commit d5c6e0958fb173af04f7e2c5d5fd81457e8ffd0c
Author: cyouh95 <25449416+cyouh95@users.noreply.github.com>
Date: Sat Apr 4 03:11:38 2020 -0700
add 1st line to create_dataset.R
Use git diff to check the differences between the two commits by specifying their hash IDs
Going from the first commit to the second, the line mpg %>% head(5) has been added between the two commits
git diff d5c6e09 aa89efb
diff --git a/create_dataset.R b/create_dataset.R
index 8b151a2..c1cff38 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1 +1,2 @@
library(tidyverse)
+mpg %>% head(5)
With the order of the two hashes reversed, the line mpg %>% head(5) has been removed between the two commits
git diff aa89efb d5c6e09
diff --git a/create_dataset.R b/create_dataset.R
index c1cff38..8b151a2 100644
--- a/create_dataset.R
+++ b/create_dataset.R
@@ -1,2 +1 @@
library(tidyverse)
-mpg %>% head(5)
Remember when we said that git stores data as snapshots (or check-ins) over time? That is, each “commit” we make is a snapshot of a miniature file system. In the picture below, each “version” (version 1, version 2, …) represents a new “commit,” in which some files in the repository have changed and some files have not changed.
Credit: Getting Started - What is Git
Well, in this section we’re going deep “under the hood” of git, to explain how this process works. It’s called the “Git Object Model.”
For this section, we’ll be working with a git repository on your local machine that is not connected to a remote repository.
In your everyday work with git, you usually won’t be going
“under the hood” of your git repository. So, why are we teaching you
this stuff?
.git/ directory
Every git repository that is created using git init contains a .git/ directory that “contains all the informations needed for git to work” (From Git series 1/3: Understanding git for real by exploring the .git directory):
cd ~ # change to home directory
pwd
## /c/Users/ozanj/Documents
Create and initialize the my_git_repo directory
cd ~ # change to home directory
pwd
rm -rf my_git_repo # force remove `my_git_repo` (if it exists)
mkdir my_git_repo # make directory `my_git_repo`
# Initialize a new git repository in `my_git_repo` directory
cd my_git_repo
git init
ls -al # list files: show hidden files -a; use long listing format
## /c/Users/ozanj/Documents
## Initialized empty Git repository in C:/Users/ozanj/Documents/my_git_repo/.git/
## total 52
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 .
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 ..
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 .git
What’s inside the .git/ directory?
cd ~/my_git_repo
# List out the contents of the .git/ directory (in tree form)
find .git -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g' # the quoted text is regular expressions; don't worry about understanding this!
## .git
## |____config
## |____description
## |____HEAD
## |____hooks
## | |____applypatch-msg.sample
## | |____commit-msg.sample
## | |____fsmonitor-watchman.sample
## | |____post-update.sample
## | |____pre-applypatch.sample
## | |____pre-commit.sample
## | |____pre-merge-commit.sample
## | |____pre-push.sample
## | |____pre-rebase.sample
## | |____pre-receive.sample
## | |____prepare-commit-msg.sample
## | |____push-to-checkout.sample
## | |____update.sample
## |____info
## | |____exclude
## |____objects
## | |____info
## | |____pack
## |____refs
## | |____heads
## | |____tags
We will be focusing on:
objects/: Directory containing all git objects
HEAD: Reference to the latest commit of the current branch
refs/: Directory containing the hash ID of the commit referred to by HEAD
We’ll get into git objects starting in the next section, and see an example of HEAD and refs/ in a later section.
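What is a git object?
Git objects are stored in the .git/objects directory
Each object is identified by a hash, which determines the sub-directory and file name within .git/objects that it is located in
Use the git cat-file command to view information about a git object whose hash you specify
Use git hash-object to compute (show) the hash for a git “blob” object based on the name of the associated file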
git cat-file: Provide content or type and size information for repository objects
git cat-file --help
git cat-file [<option(s)>] <object>
-p: Pretty-print the contents of <object> based on its type
-t: Instead of the content, show the object type identified by <object>
-s: Instead of the content, show the object size identified by <object>
There are 4 types of git objects (From The Git Object Model): blob, tree, commit, and tag
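A blob is generally a file which stores data, like text
Once a file has been added to the staging area, a blob object is created for it in the .git/objects directory
The hash of the blob determines where it is located within .git/objects
We can verify the hash of the blob using the git hash-object command
cd ~/my_git_repo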
# Create new R script (working directory)
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# Add R script to staging area
git add create_dataset.R
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## |____objects
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____info
## | |____pack
git hash-object: Compute the hash for a blob object from the contents of the named file
git hash-object --help
git hash-object <file_name>
We can use git hash-object to verify the hash for create_dataset.R:
# Generate blob object hash for R script
git hash-object create_dataset.R
## c1cff389562e8bc123e6691a60352fdf839df113
Using git cat-file to view blob object content
# View content of create_dataset.R
git cat-file -p c1cff38
## library(tidyverse)
## mpg %>% head(5)
Using git cat-file to view blob object type
# View object type for create_dataset.R
git cat-file -t c1cff38
## blob
Using git cat-file to view blob object size
# View object size for create_dataset.R
git cat-file -s c1cff38
## 35
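Where does the blob hash come from? As a rough sketch (assuming git's default SHA-1 object format and that the `sha1sum` utility is available, as it is in Git Bash and on most Linux systems), the hash is the SHA-1 of a short header, `blob <size>` followed by a null byte, plus the file contents. The temporary directory and variable names below are our own, used only for illustration.

```shell
# Work in a throwaway directory (hypothetical location; any directory works)
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the two-line R script from the lecture
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R

# Size of the file in bytes (tr strips padding that some versions of wc add)
size=$(wc -c < create_dataset.R | tr -d ' ')

# Hash the header "blob <size>" + null byte, followed by the file contents
manual_hash=$( { printf 'blob %s\0' "$size"; cat create_dataset.R; } | sha1sum | cut -d ' ' -f 1)

# Ask git for the hash; --no-filters hashes the bytes exactly as they are on disk
git_hash=$(git hash-object --no-filters create_dataset.R)

echo "size:   $size"
echo "manual: $manual_hash"
echo "git:    $git_hash"
```

The two hashes should be identical. This is why the same file contents always produce the same blob hash, no matter what the file is called or when it was added.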
After a blob, the next type of git object to discuss is a tree
A tree is a directory that contains references to blobs (files) or other trees (sub-directories)
But before saying more about trees, we’re going to take a detour to give you some skills to diagnose all these different objects we are going to encounter as we start adding folders and making commits
Let’s add the sub-directory my_git_repo/notes (type of git object = tree) and a couple of text files to the sub-directory (type of git object = blob)
cd ~/my_git_repo
rm -rf notes # force remove notes directory if it exists
# Create a sub-directory
mkdir notes
# Add files to the sub-directory (since git doesn't track empty directories)
echo "This is my first set of notes." > notes/note_1.txt
echo "This is my second set of notes." > notes/note_2.txt
# Add new files
git add .
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## warning: in the working copy of 'notes/note_1.txt', LF will be replaced by CRLF the next time Git touches it
## warning: in the working copy of 'notes/note_2.txt', LF will be replaced by CRLF the next time Git touches it
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____info
## | |____pack
We can use git hash-object to verify the hashes for the files notes/note_1.txt and notes/note_2.txt:
# Generate blob object hash for the file notes/note_1.txt
echo 'hash for notes/note_1.txt:'
git hash-object notes/note_1.txt
# Generate blob object hash for the file notes/note_2.txt
echo 'hash for notes/note_2.txt:'
git hash-object notes/note_2.txt
## hash for notes/note_1.txt:
## 6108458417308ddc15d7390a2f8db50cf65ec399
## hash for notes/note_2.txt:
## 476fb98775843929ca6c55b16b04752d973b3d2a
Now that we know the hashes associated with note_1.txt and note_2.txt, we can use git cat-file to print the contents of these files.
# View content of note_1.txt and note_2.txt
git cat-file -p 6108458 # note_1.txt
git cat-file -p 476fb98 # note_2.txt
## This is my first set of notes.
## This is my second set of notes.
The command git cat-file -p <hash> is different from the command cat <filename>:
git cat-file -p <hash> prints the contents of git objects stored in the (hidden) .git directory
cat <filename> prints the contents of files stored in a “regular” directory
# View content of note_1.txt and note_2.txt, the versions not in .git directory
cat notes/note_1.txt
cat notes/note_2.txt
## This is my first set of notes.
## This is my second set of notes.
We also created the directory notes, but tree objects (i.e., directories) are not created until a commit has been made
my_git_repo
After the files have been committed, tree objects will be
created for any sub-directories as well as for the root directory of the
repository:
cd ~/my_git_repo
# Make a commit
git commit -m "initial commit"
## [main (root-commit) 651cb58] initial commit
## 3 files changed, 4 insertions(+)
## create mode 100644 create_dataset.R
## create mode 100644 notes/note_1.txt
## create mode 100644 notes/note_2.txt
Now that we have made our first commit, let’s print the contents of the .git/objects directory
.git/objects now has 6 objects, each associated with a different hash. Ugh, this is getting confusing! What the hell are all these things!?
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____65
## | | |____1cb5811251ecfffa1768dbd2b895de45030837
## | |____6c
## | | |____f7bbf49af4f9fd5103cf9f0a3fa25226b12336
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____f5
## | | |____9085df29aed7826a89b23af3f67fc3ab96f643
## | |____info
## | |____pack
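One way to put a label on each of these six objects is to ask git for the type of every object in the repository. The sketch below rebuilds a repository like ours in a temporary directory (the directory and the placeholder user name/email are assumptions for illustration) and uses `git cat-file --batch-all-objects --batch-check`, which prints one line per object in the form `<hash> <type> <size>`.

```shell
# Build a throwaway repository that mirrors the one in the lecture
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
mkdir notes
echo "This is my first set of notes." > notes/note_1.txt
echo "This is my second set of notes." > notes/note_2.txt
git add .
git commit -q -m "initial commit"

# One line per object: <hash> <type> <size in bytes>
git cat-file --batch-all-objects --batch-check

# Count how many objects there are of each type
n_blob=$(git cat-file --batch-all-objects --batch-check | grep -c ' blob ')
n_tree=$(git cat-file --batch-all-objects --batch-check | grep -c ' tree ')
n_commit=$(git cat-file --batch-all-objects --batch-check | grep -c ' commit ')
echo "blobs: $n_blob, trees: $n_tree, commits: $n_commit"
```

The six objects break down into three blobs (the three files), two trees (the root directory and notes/), and one commit. The commands in the next part of the lecture let us work out which hash belongs to which of these.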
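Commands to diagnose git objects
git ls-tree -rt HEAD: list the hash of every blob and tree in the most recent commit
git rev-parse HEAD:<path/to/directory>: find the hash for a particular folder in the most recent commit, e.g., git rev-parse HEAD:notes
git log: show the hash associated with each commit
(my_git_repo is the root directory for our repository)
Example of using git log to show hash associated with commits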
cd ~/my_git_repo
git log
## commit 651cb5811251ecfffa1768dbd2b895de45030837
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:35 2023 -0800
##
## initial commit
git ls-tree: List the contents of a tree object
git ls-tree --help
git ls-tree [<option(s)>] <tree hash id> [<path>]
-r: recurse into subtrees (i.e., show contents of sub-folders)
-t: show trees when recursing (i.e., when showing contents of sub-folders, also show the entries for the sub-folders themselves)
Specify -r and -t together, like: -rt
<tree hash id>
HEAD is basically a shortcut to the most recent commit, so if we specify HEAD in place of a hash, this will show the directory structure of the repository associated with the most recent commit. For example,
git ls-tree -rt HEAD
To show a particular folder in the most recent commit, specify HEAD:path/to/directory. For example,
git ls-tree -rt HEAD:notes
cd ~/my_git_repo
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____65
## | | |____1cb5811251ecfffa1768dbd2b895de45030837
## | |____6c
## | | |____f7bbf49af4f9fd5103cf9f0a3fa25226b12336
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____f5
## | | |____9085df29aed7826a89b23af3f67fc3ab96f643
## | |____info
## | |____pack
Examples of using git ls-tree with HEAD or HEAD:<path> to identify git objects
cd ~/my_git_repo
# List all blobs and trees in the most recent commit
git ls-tree -rt HEAD
## 100644 blob c1cff389562e8bc123e6691a60352fdf839df113 create_dataset.R
## 040000 tree 6cf7bbf49af4f9fd5103cf9f0a3fa25226b12336 notes
## 100644 blob 6108458417308ddc15d7390a2f8db50cf65ec399 notes/note_1.txt
## 100644 blob 476fb98775843929ca6c55b16b04752d973b3d2a notes/note_2.txt
cd ~/my_git_repo
git ls-tree -rt HEAD:notes
## 100644 blob 6108458417308ddc15d7390a2f8db50cf65ec399 note_1.txt
## 100644 blob 476fb98775843929ca6c55b16b04752d973b3d2a note_2.txt
Examples of using git ls-tree [<hash id>] to identify git objects
<path>
cd ~/my_git_repo
git ls-tree -rt 6cf7bbf4
## 100644 blob 6108458417308ddc15d7390a2f8db50cf65ec399 note_1.txt
## 100644 blob 476fb98775843929ca6c55b16b04752d973b3d2a note_2.txt
cd ~/my_git_repo
git ls-tree -rt f59085d
## 100644 blob c1cff389562e8bc123e6691a60352fdf839df113 create_dataset.R
## 040000 tree 6cf7bbf49af4f9fd5103cf9f0a3fa25226b12336 notes
## 100644 blob 6108458417308ddc15d7390a2f8db50cf65ec399 notes/note_1.txt
## 100644 blob 476fb98775843929ca6c55b16b04752d973b3d2a notes/note_2.txt
git rev-parse command: find the hash for a particular folder associated with the most recent commit
git rev-parse HEAD: retrieve hash associated with latest commit
git rev-parse --short HEAD: retrieve abbreviated hash (by default, the first 7 digits) associated with latest commit
git rev-parse HEAD:path/to/directory: retrieve hash for a particular folder in most recent commit, e.g.,
git rev-parse HEAD:notes
Examples of using git rev-parse
cd ~/my_git_repo
echo 'retrieve hash associated with latest commit:'
git rev-parse HEAD
## retrieve hash associated with latest commit:
## 651cb5811251ecfffa1768dbd2b895de45030837
Retrieve hash associated with the folder notes in latest commit
cd ~/my_git_repo
echo 'retrieve hash associated folder notes in latest commit:'
git rev-parse HEAD:notes
## retrieve hash associated folder notes in latest commit:
## 6cf7bbf49af4f9fd5103cf9f0a3fa25226b12336
A tree is a directory that contains references to blobs (files) or other trees (sub-directories)
Show git hash associated with all blob (file) and tree (folder) objects within the root directory
cd ~/my_git_repo
# List all blobs and trees in the most recent commit
git ls-tree -rt HEAD
## 100644 blob c1cff389562e8bc123e6691a60352fdf839df113 create_dataset.R
## 040000 tree 6cf7bbf49af4f9fd5103cf9f0a3fa25226b12336 notes
## 100644 blob 6108458417308ddc15d7390a2f8db50cf65ec399 notes/note_1.txt
## 100644 blob 476fb98775843929ca6c55b16b04752d973b3d2a notes/note_2.txt
As we now see, the tree objects for the my_git_repo/ root directory and notes/ sub-directory exist, and another object has been created for the commit (more info on that in next section):
git cat-file -t
# View object type for my_git_repo/ and notes/ trees
git cat-file -t f59085d # this is hash for the root directory
git cat-file -t 6cf7bbf # this is hash for the "notes" sub-directory
# View object type for the commit
git cat-file -t $(git rev-parse --short HEAD) # git rev-parse retrieves latest commit hash
## tree
## tree
## commit
The content of a tree object is a list of all blobs (files) and other trees (sub-directories) in the directory. Each list entry follows the format:
<permission_code> <object_type> <object_hash> <object_name>
<permission_code>: Code indicating the file mode (type and read/write/execute permissions) of the object: 100644 (regular file) or 100755 (executable file) for blobs, and 040000 for trees
<object_type>: Type of the object (i.e., blob or tree)
<object_hash>: Reference to the object (i.e., the hash)
<object_name>: Name of the file or directory
Using git cat-file to view tree object content for my_git_repo/ root directory
First, show files in directory using the ls command with options -al:
# Show files in directory
ls -al
## total 53
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 .
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 ..
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 .git
## -rw-r--r-- 2 ozanj None 35 Feb 9 15:45 create_dataset.R
## drwxr-xr-x 1 ozanj None 0 Feb 9 15:45 notes
Second, show contents of tree (root directory) using git cat-file:
# View type and content of my_git_repo/ tree object
git cat-file -t f59085d # type
git cat-file -p f59085d # content
## tree
## 100644 blob c1cff389562e8bc123e6691a60352fdf839df113 create_dataset.R
## 040000 tree 6cf7bbf49af4f9fd5103cf9f0a3fa25226b12336 notes
Using git cat-file to view tree object content for notes/ sub-directory
# View type and content of notes/ tree object
git cat-file -t 6cf7bbf # type
git cat-file -p 6cf7bbf # content
## tree
## 100644 blob 6108458417308ddc15d7390a2f8db50cf65ec399 note_1.txt
## 100644 blob 476fb98775843929ca6c55b16b04752d973b3d2a note_2.txt
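The first column in these tree listings is the file mode. As a small sketch of how the three common values arise, the example below (in a throwaway repository with made-up file names and a placeholder identity) commits a regular file, a file marked as executable, and a sub-directory, then reads the mode of each entry from `git ls-tree`. We mark the script as executable with `git update-index --chmod=+x`, which records the executable bit in the staging area even on systems (such as Windows) where file permissions work differently.

```shell
# Throwaway repository with hypothetical file names
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo 'echo "hello"' > run_analysis.sh
echo "plain text" > readme.txt
mkdir notes
echo "a note" > notes/note_1.txt
git add .

# Record run_analysis.sh as executable in the staging area
git update-index --chmod=+x run_analysis.sh
git commit -q -m "files with different modes"

# Show the root tree: <mode> <type> <hash> <name>
git ls-tree HEAD

# Pull out just the mode (first column) for each entry
mode_script=$(git ls-tree HEAD run_analysis.sh | cut -d ' ' -f 1)
mode_text=$(git ls-tree HEAD readme.txt | cut -d ' ' -f 1)
mode_dir=$(git ls-tree HEAD notes | cut -d ' ' -f 1)
echo "script: $mode_script, text: $mode_text, directory: $mode_dir"
```

The executable script is recorded with mode 100755, the plain text file with 100644, and the notes directory with 040000.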
When a commit is made, a commit object is created that contains information about the commit:
tree <tree_hash>
parent <commit_hash>
author <username> <email> <time>
committer <username> <email> <time>
<commit_message>
tree: Reference to the root directory tree object (i.e., “snapshot” of repository at the point of commit)
parent: Reference to the parent commit (if not the first commit)
Other information about the commit (author, committer, commit_message)
All commits except for the initial commit will contain a reference to their parent commit.
Investigate contents associated with our first commit using git log
cd ~/my_git_repo
git log
## commit 651cb5811251ecfffa1768dbd2b895de45030837
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:35 2023 -0800
##
## initial commit
Using git cat-file to view commit object content for first commit
We use git rev-list HEAD | tail -n 1 to obtain the hash associated with the first commit, because the commit hash depends on the time the commit was made
# Retrieve commit hash for first commit
echo 'retrieve hash associated with first commit:'
git rev-list HEAD | tail -n 1
echo ''
echo 'show object type associated with the git object:'
git cat-file -t $(git rev-list HEAD | tail -n 1)
# View content of the commit object
echo ''
echo 'print contents of the commit:'
git cat-file -p $(git rev-list HEAD | tail -n 1)
## retrieve hash associated with first commit:
## 651cb5811251ecfffa1768dbd2b895de45030837
##
## show object type associated with the git object:
## commit
##
## print contents of the commit:
## tree f59085df29aed7826a89b23af3f67fc3ab96f643
## author Ozan Jaquette <ozanj@ucla.edu> 1675986335 -0800
## committer Ozan Jaquette <ozanj@ucla.edu> 1675986335 -0800
##
## initial commit
cd ~/my_git_repo
echo 'print contents root directory:'
git cat-file -p f59085df
## print contents root directory:
## 100644 blob c1cff389562e8bc123e6691a60352fdf839df113 create_dataset.R
## 040000 tree 6cf7bbf49af4f9fd5103cf9f0a3fa25226b12336 notes
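The chain we just followed by hand (commit, then root tree, then blob) can also be followed without copying hashes around. A minimal sketch, using a throwaway repository and a placeholder identity: `HEAD^{tree}` asks `git rev-parse` for the root tree of the latest commit, and `HEAD:<file>` asks for the blob of a file in that commit.

```shell
# Throwaway repository containing the R script from the lecture
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
git add create_dataset.R
git commit -q -m "initial commit"

# Step 1: the commit object names its root tree on the "tree" line
commit_hash=$(git rev-parse HEAD)
tree_in_commit=$(git cat-file -p "$commit_hash" | sed -n 's/^tree //p' | head -n 1)

# Step 2: the same tree hash, requested directly
tree_hash=$(git rev-parse 'HEAD^{tree}')

# Step 3: the blob that the root tree lists for create_dataset.R
blob_hash=$(git rev-parse HEAD:create_dataset.R)

echo "commit: $commit_hash"
echo "tree:   $tree_hash"
echo "blob:   $blob_hash"

# Print the file contents as stored in the commit
git cat-file -p "$blob_hash"
```

The tree hash read from the commit object and the one returned by `HEAD^{tree}` are the same, which is another way of seeing that a commit is a pointer to a snapshot of the root directory.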
Let’s create a second commit:
# Modify R script
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# Add R script
git add create_dataset.R
# Make another commit
git commit -m "second commit"
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [main dbe6fbb] second commit
## 1 file changed, 1 insertion(+)
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____49
## | | |____0ec1c138021b8d5c196c26a2a7b3de69afc2d1
## | |____52
## | | |____4db779f0a3e3b3b353b522285c7da4830e21f1
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____65
## | | |____1cb5811251ecfffa1768dbd2b895de45030837
## | |____6c
## | | |____f7bbf49af4f9fd5103cf9f0a3fa25226b12336
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____db
## | | |____e6fbbdb626285fd1c8943eb814c2402008213d
## | |____f5
## | | |____9085df29aed7826a89b23af3f67fc3ab96f643
## | |____info
## | |____pack
Using git cat-file to view commit object content for second commit
# Retrieve commit hash for latest commit
echo 'Retrieve commit hash for latest commit:'
git rev-parse HEAD
# View content of the commit object
echo ''
echo 'print contents of most recent commit object:'
git cat-file -p $(git rev-parse HEAD)
## Retrieve commit hash for latest commit:
## dbe6fbbdb626285fd1c8943eb814c2402008213d
##
## print contents of most recent commit object:
## tree 524db779f0a3e3b3b353b522285c7da4830e21f1
## parent 651cb5811251ecfffa1768dbd2b895de45030837
## author Ozan Jaquette <ozanj@ucla.edu> 1675986341 -0800
## committer Ozan Jaquette <ozanj@ucla.edu> 1675986341 -0800
##
## second commit
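Notice that the second commit added only three new objects (a blob, a root tree, and a commit), even though each commit is a full snapshot. That is because unchanged files and folders are reused. The sketch below (throwaway repository, placeholder identity) checks this by comparing the hash of the notes tree and of the root tree across the two commits; `HEAD~1` refers to the parent of the latest commit.

```shell
# Throwaway repository with two commits, as in the lecture
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "library(tidyverse)" > create_dataset.R
mkdir notes
echo "This is my first set of notes." > notes/note_1.txt
git add .
git commit -q -m "initial commit"

# Second commit changes only the R script
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
git add create_dataset.R
git commit -q -m "second commit"

# notes/ did not change, so both commits point at the same tree object
notes_first=$(git rev-parse HEAD~1:notes)
notes_second=$(git rev-parse HEAD:notes)

# The root directory did change, so each commit has its own root tree
root_first=$(git rev-parse 'HEAD~1^{tree}')
root_second=$(git rev-parse 'HEAD^{tree}')

echo "notes tree: $notes_first (first) vs $notes_second (second)"
echo "root tree:  $root_first (first) vs $root_second (second)"
```

The notes tree has the same hash in both commits, while the root tree differs. Git stores a new object only for the things that changed.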
A tag object is created when an annotated tag is generated (e.g., with git tag -a):
object <object_hash>
type <object_type>
tag <tag_name>
tagger <username> <email> <time>
<tag_message>
object: Reference to the tagged object
type: Object type of the tagged object (usually a commit)
Other information about the tag (tag, tagger, tag_message)
Let’s create a tag for the current commit:
# Create a tag
git tag -a v1 -m "version 1.0"
# View .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____49
## | | |____0ec1c138021b8d5c196c26a2a7b3de69afc2d1
## | |____52
## | | |____4db779f0a3e3b3b353b522285c7da4830e21f1
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____65
## | | |____1cb5811251ecfffa1768dbd2b895de45030837
## | |____6c
## | | |____f7bbf49af4f9fd5103cf9f0a3fa25226b12336
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____d3
## | | |____d7fd4e4df4a9d983c1987465e00b2a5b3207aa
## | |____db
## | | |____e6fbbdb626285fd1c8943eb814c2402008213d
## | |____f5
## | | |____9085df29aed7826a89b23af3f67fc3ab96f643
## | |____info
## | |____pack
Using git cat-file to view tag object
echo 'print hash associated with v1 tag:'
git show-ref -s v1
echo ''
echo 'print contents of the v1 tag:'
git cat-file -p $(git show-ref -s v1) # retrieves hash for v1 tag
## print hash associated with v1 tag:
## d3d7fd4e4df4a9d983c1987465e00b2a5b3207aa
##
## print contents of the v1 tag:
## object dbe6fbbdb626285fd1c8943eb814c2402008213d
## type commit
## tag v1
## tagger Ozan Jaquette <ozanj@ucla.edu> 1675986342 -0800
##
## version 1.0
# The tagged object was the second commit
git log
## commit dbe6fbbdb626285fd1c8943eb814c2402008213d
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:41 2023 -0800
##
## second commit
##
## commit 651cb5811251ecfffa1768dbd2b895de45030837
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:35 2023 -0800
##
## initial commit
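Only annotated tags (created with `-a`) get their own tag object. As a hedged sketch of the difference, the example below creates one annotated tag and one lightweight tag in a throwaway repository (the tag name v1-light and the identity are made up for illustration) and compares what each name resolves to. `v1^{commit}` asks git to follow the tag object through to the commit it points at.

```shell
# Throwaway repository with a single commit
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "initial commit"

git tag -a v1 -m "version 1.0"   # annotated tag: creates a tag object
git tag v1-light                 # lightweight tag: just a name for the commit

type_annotated=$(git cat-file -t v1)
type_light=$(git cat-file -t v1-light)
echo "v1 is a:       $type_annotated"
echo "v1-light is a: $type_light"

# Follow the annotated tag through to the commit it points at
head_commit=$(git rev-parse HEAD)
tag_object=$(git rev-parse v1)
tagged_commit=$(git rev-parse 'v1^{commit}')
light_target=$(git rev-parse v1-light)
echo "tag object:    $tag_object"
echo "tagged commit: $tagged_commit"
```

The annotated tag has its own hash (the tag object), which in turn points at the commit. The lightweight tag resolves directly to the commit hash and adds nothing to .git/objects.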
HEAD and refs/
The HEAD file is a pointer to your current (active) branch
The HEAD file points to the latest commit of the branch you are working on, whose hash ID is stored in the refs/ directory.
Especially when we get to working with multiple branches, HEAD becomes important as it keeps track of which branch you are currently on.
Note that when we print the full directory structure of the .git directory, we can see the file HEAD and the directory refs/heads
cd ~/my_git_repo
find .git -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## .git
## |____COMMIT_EDITMSG
## |____config
## |____description
## |____HEAD
## |____hooks
## | |____applypatch-msg.sample
## | |____commit-msg.sample
## | |____fsmonitor-watchman.sample
## | |____post-update.sample
## | |____pre-applypatch.sample
## | |____pre-commit.sample
## | |____pre-merge-commit.sample
## | |____pre-push.sample
## | |____pre-rebase.sample
## | |____pre-receive.sample
## | |____prepare-commit-msg.sample
## | |____push-to-checkout.sample
## | |____update.sample
## |____index
## |____info
## | |____exclude
## |____logs
## | |____HEAD
## | |____refs
## | | |____heads
## | | | |____main
## |____objects
## | |____47
## | | |____6fb98775843929ca6c55b16b04752d973b3d2a
## | |____49
## | | |____0ec1c138021b8d5c196c26a2a7b3de69afc2d1
## | |____52
## | | |____4db779f0a3e3b3b353b522285c7da4830e21f1
## | |____61
## | | |____08458417308ddc15d7390a2f8db50cf65ec399
## | |____65
## | | |____1cb5811251ecfffa1768dbd2b895de45030837
## | |____6c
## | | |____f7bbf49af4f9fd5103cf9f0a3fa25226b12336
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____d3
## | | |____d7fd4e4df4a9d983c1987465e00b2a5b3207aa
## | |____db
## | | |____e6fbbdb626285fd1c8943eb814c2402008213d
## | |____f5
## | | |____9085df29aed7826a89b23af3f67fc3ab96f643
## | |____info
## | |____pack
## |____refs
## | |____heads
## | | |____main
## | |____tags
## | | |____v1
The directory refs/heads has a file named main
cd ~/my_git_repo
find .git/refs -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____refs
## | |____heads
## | | |____main
## | |____tags
## | | |____v1
Below we are using the cat command (rather than git cat-file) to print the contents of a file
If we output the contents of the file .git/HEAD, we see it contains a reference to the main branch:
# View content of HEAD
cat .git/HEAD
## ref: refs/heads/main
Following that reference, we can find the hash ID of the latest commit stored inside refs/heads/main:
# View content of refs/heads/main
cat .git/refs/heads/main
## dbe6fbbdb626285fd1c8943eb814c2402008213d
We can use git log to verify that this is the hash ID of the latest commit:
# View commit log
git log
## commit dbe6fbbdb626285fd1c8943eb814c2402008213d
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:41 2023 -0800
##
## second commit
##
## commit 651cb5811251ecfffa1768dbd2b895de45030837
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:35 2023 -0800
##
## initial commit
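Reading .git/HEAD and .git/refs/heads/main with cat is great for seeing how things are stored, but git also has commands that return the same information. The sketch below (throwaway repository, placeholder identity, and a made-up branch name experiment) compares the two approaches and then shows how HEAD changes when we switch branches. We set the branch name to main explicitly so the example does not depend on your git defaults.

```shell
# Throwaway repository whose first branch is named main
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # make sure the branch is called main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "initial commit"

# Two ways to ask "which branch does HEAD point to?"
cat .git/HEAD
head_ref=$(git symbolic-ref HEAD)

# Two ways to ask "which commit is that branch on?"
hash_from_file=$(cat .git/refs/heads/main)
hash_from_git=$(git rev-parse HEAD)
echo "HEAD points to $head_ref, which is at $hash_from_git"

# Create and switch to a new branch: HEAD now points somewhere else
git checkout -q -b experiment
cat .git/HEAD
head_ref_after=$(git symbolic-ref HEAD)
hash_after=$(git rev-parse HEAD)
```

After switching branches, HEAD refers to refs/heads/experiment, but the commit hash is unchanged because the new branch starts at the same commit as main.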
More generally, the refs/ directory stores references to all branches and tags. In particular, refs/heads/ stores all your local branches:
# View local branches
ls .git/refs/heads
## main
On the other hand, refs/remotes/ contains the remote HEAD and your remote-tracking branches. In other words, it is a local copy of your remote repository.
Inside refs/remotes/, there will be a folder for each of your remotes (this requires a repository that is connected to a remote, unlike my_git_repo). For example, to view all references for the remote repository named origin, you can look under refs/remotes/origin:
# View remote HEAD and remote-tracking branches for origin
ls .git/refs/remotes/origin
## HEAD
## main
When you run git fetch, it will update the references in refs/remotes/ (i.e., your local copy of the remote repository), but it will not change anything in refs/heads/ (i.e., your local repository). Thus, git fetch is useful if you want a local copy of the most up-to-date changes in the remote repository (e.g., to preview changes), but don’t actually want to merge these changes into your local repository yet.
On the other hand, git pull is effectively a git fetch followed by a git merge (discussed more later). It will not only update refs/remotes/ but refs/heads/ as well to bring your local repository up-to-date with the remote.
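To see the difference between git fetch and git pull without needing GitHub, we can fake a remote on our own machine. This is only a sketch under several assumptions: the "remote" is a bare repository in a temporary directory, and the two clones (named alice and bob here, both made-up names) stand in for two collaborators. We record where bob's local branch and remote-tracking branch point before and after each command.

```shell
base=$(mktemp -d)
cd "$base"

# A bare repository plays the role of the remote (e.g., GitHub)
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# alice clones the (empty) remote, makes the first commit, and pushes it
git clone -q remote.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "first" > notes.txt
git add notes.txt
git commit -q -m "first commit"
git push -q origin main
cd ..

# bob clones the remote, then alice pushes a second commit
git clone -q remote.git bob
cd alice
echo "second" >> notes.txt
git commit -q -am "second commit"
git push -q origin main
cd ../bob

# Before fetching: bob knows nothing about the second commit
before_local=$(git rev-parse refs/heads/main)

# git fetch updates refs/remotes/ only
git fetch -q
after_fetch_local=$(git rev-parse refs/heads/main)
after_fetch_remote=$(git rev-parse refs/remotes/origin/main)

# git pull also moves the local branch (fast-forward only, to keep it simple)
git pull -q --ff-only
after_pull_local=$(git rev-parse refs/heads/main)

echo "local main before fetch: $before_local"
echo "local main after fetch:  $after_fetch_local"
echo "origin/main after fetch: $after_fetch_remote"
echo "local main after pull:   $after_pull_local"
```

After the fetch, bob's refs/remotes/origin/main has moved to the second commit while refs/heads/main has not. After the pull, the two agree.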
cd ~ # change to home directory
rm -rf my_git_repo # force remove `my_git_repo` (if it exists)
mkdir my_git_repo # make directory `my_git_repo`
# Initialize a new git repository in `my_git_repo` directory
cd my_git_repo
git init
## Initialized empty Git repository in C:/Users/ozanj/Documents/my_git_repo/.git/
# Create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# R script initially starts off under `Untracked Files`
git status
## On branch main
##
## No commits yet
##
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## create_dataset.R
##
## nothing added to commit but untracked files present (use "git add" to track)
# Add R script
git add create_dataset.R
# R script moves to `Changes to be committed`
git status
## On branch main
##
## No commits yet
##
## Changes to be committed:
## (use "git rm --cached <file>..." to unstage)
##
## new file: create_dataset.R
# Once R script has been added, a blob object is created for it in the .git/objects directory
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## |____objects
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____info
## | |____pack
# We can use `git hash-object` to verify the hash of the blob object
git hash-object create_dataset.R
## c1cff389562e8bc123e6691a60352fdf839df113
# With this hash, we can view the content of create_dataset.R
git cat-file -p c1cff38
## library(tidyverse)
## mpg %>% head(5)
# Make a commit
git commit -m "add create_dataset.R"
# The R script is now no longer listed
git status
## On branch main
## nothing to commit, working tree clean
# Check the commit history
git log
## commit 572522a3f76f84867a47bc79acbe1115cfcf4312
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:46 2023 -0800
##
## add create_dataset.R
# Verify that `HEAD` is indeed pointing to the last commit made, which is our initial commit
cat .git/HEAD
cat .git/refs/heads/main
## ref: refs/heads/main
## 572522a3f76f84867a47bc79acbe1115cfcf4312
# Further modify R script, which is now a tracked file
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# R script is now under `Changes not staged for commit`
git status
## On branch main
## Changes not staged for commit:
## (use "git add <file>..." to update what will be committed)
## (use "git restore <file>..." to discard changes in working directory)
## modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
# View what new changes were made to R script
# git diff (with no arguments) shows changes made in the working directory that are not yet staged; nothing is staged here, so this is the difference from the last commit
git diff
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## diff --git a/create_dataset.R b/create_dataset.R
## index c1cff38..490ec1c 100644
## --- a/create_dataset.R
## +++ b/create_dataset.R
## @@ -1,2 +1,3 @@
## library(tidyverse)
## mpg %>% head(5)
## +df <- mpg %>% filter(year == 2008)
# Add new changes made to R script
git add create_dataset.R
# .git/objects directory now contains blob objects for both versions of R script
# It also contains objects for the commit and root directory tree
find .git/objects -print | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## |____objects
## | |____49
## | | |____0ec1c138021b8d5c196c26a2a7b3de69afc2d1
## | |____57
## | | |____2522a3f76f84867a47bc79acbe1115cfcf4312
## | |____96
## | | |____6cc780d5994bc8a4ed535484cd7f8268e8e874
## | |____c1
## | | |____cff389562e8bc123e6691a60352fdf839df113
## | |____info
## | |____pack
# We can use `git hash-object` to verify the hash for the new blob object
git hash-object create_dataset.R
## 490ec1c138021b8d5c196c26a2a7b3de69afc2d1
# With this hash, we can view the content of the modified create_dataset.R
git cat-file -p 490ec1c
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
Note that the hash and contents of the above blob (i.e., file) are different from the hash and contents of the blob associated with the first commit (below)
# With the original hash, we can view the content of create_dataset.R as of the first commit
git cat-file -p c1cff38
## library(tidyverse)
## mpg %>% head(5)
Commit changes to script file
# Make a commit
git commit -m "modify create_dataset.R"
## [main 21f7f13] modify create_dataset.R
## 1 file changed, 1 insertion(+)
# Check the commit history
git log
## commit 21f7f135ebcd86f7df2da9c0f16b3396e2940158
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:48 2023 -0800
##
## modify create_dataset.R
##
## commit 572522a3f76f84867a47bc79acbe1115cfcf4312
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:46 2023 -0800
##
## add create_dataset.R
# Verify that `HEAD` is pointing to the last commit made, which is now our second commit
cat .git/HEAD
cat .git/refs/heads/main
## ref: refs/heads/main
## 21f7f135ebcd86f7df2da9c0f16b3396e2940158
# View content of commit object for second commit
git cat-file -p $(git rev-parse HEAD)
## tree 6de1187f46bbf4d76cafca7c0e5d3d61db6b5a53
## parent 572522a3f76f84867a47bc79acbe1115cfcf4312
## author Ozan Jaquette <ozanj@ucla.edu> 1675986348 -0800
## committer Ozan Jaquette <ozanj@ucla.edu> 1675986348 -0800
##
## modify create_dataset.R
Sometimes we want to undo changes we have made to files in our git repository.
When thinking about undoing changes to files, it is helpful to keep in mind this visualization of the three components and workflow of a git repository
Credit: Lucas Maurer, medium.com
Overview of four different undo changes operations
git restore <file(s)>
Unstaged changes to <file> in your working directory are discarded, gone forever
git restore --staged <file(s)>
Does the same thing as git reset HEAD <file_name(s)>
Staged changes to <file> are unstaged; these unstaged changes are retained in your working directory
git reset <commit_hash>
Moves the current branch back to <commit_hash>, removing later commits from the branch history; the changes from those commits are retained in your working directory as unstaged changes
git revert --no-edit <commit_hash>
Undoes the changes made in <commit_hash> by adding a new commit
git revert does not remove any previous commits; rather, it creates a new commit (e.g., commit 3) that changes things back to the way they were prior to some commit (e.g., prior to commit 2)
Image of git revert vs. git reset
Credit: NUKE Designs, Git revert
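To make the revert vs. reset distinction concrete, here is a small sketch in a throwaway repository (the file name, branch name reset-demo, and identity are all made up for illustration). We make two commits, then undo the second commit two different ways: with git revert on one branch and with git reset on another, counting the commits in the history after each.

```shell
# Throwaway repository with two commits
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "line 1" > file.txt
git add file.txt
git commit -q -m "commit 1"
echo "line 2" >> file.txt
git commit -q -am "commit 2"

# Keep a second branch at commit 2 so we can try reset separately
git branch reset-demo

# Option A: revert commit 2 (history grows by one commit)
git revert --no-edit HEAD > /dev/null
commits_after_revert=$(git rev-list --count HEAD)
lines_after_revert=$(wc -l < file.txt | tr -d ' ')
echo "after revert: $commits_after_revert commits, file has $lines_after_revert line(s)"

# Option B: reset back to commit 1 (history shrinks by one commit)
git checkout -q reset-demo
git reset -q HEAD~1
commits_after_reset=$(git rev-list --count HEAD)
lines_after_reset=$(wc -l < file.txt | tr -d ' ')
status_after_reset=$(git status --porcelain)
echo "after reset: $commits_after_reset commit(s), file has $lines_after_reset line(s)"
echo "status after reset: $status_after_reset"
```

After the revert there are three commits and the file is back to one line. After the reset there is one commit, and the second line is still in the file as an unstaged change (git status lists file.txt as modified).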
git restore: discard unstaged changes
git restore: Discard/undo changes made in working directory to tracked file(s)
git restore --help
git restore [<file_name(s)>]
Discards changes made to file_name(s) in the working directory
These are the changes listed under Changes not staged for commit when you check git status
Using git restore to discard changes to a tracked, unstaged file
In this git restore example we will:
Create the file create_dataset.R that contains two lines of code
Add create_dataset.R to the “staging area” and “commit” those changes to the local repository
Modify create_dataset.R
Use git restore <file_name> to undo changes made in the “working directory” to the file create_dataset.R
Confirm that create_dataset.R goes back to the way it was after the initial commit and the result of git status is “nothing to commit, working tree clean”
cd ~ # change to home directory
rm -rf my_git_repo # force remove `my_git_repo` (if it exists)
mkdir my_git_repo # make directory `my_git_repo`
cd my_git_repo
git init
# First, create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script so it is now tracked
git add create_dataset.R
git commit -m "add create_dataset.R"
# View how create_dataset.R looks when it was committed
cat create_dataset.R
## library(tidyverse)
## mpg %>% head(5)
# Modify R script
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# View how create_dataset.R looks now
cat create_dataset.R
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
echo 'output from git status:'
git status
# See exact changes that have been made to file since last commit
echo ''
echo 'output from git diff:'
git diff
## output from git status:
## On branch main
## Changes not staged for commit:
## (use "git add <file>..." to update what will be committed)
## (use "git restore <file>..." to discard changes in working directory)
## modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
##
## output from git diff:
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## diff --git a/create_dataset.R b/create_dataset.R
## index c1cff38..490ec1c 100644
## --- a/create_dataset.R
## +++ b/create_dataset.R
## @@ -1,2 +1,3 @@
## library(tidyverse)
## mpg %>% head(5)
## +df <- mpg %>% filter(year == 2008)
# Undo those changes using git restore
git restore create_dataset.R
# View file after discarding changes
echo 'view contents of file create_dataset.R after undoing changes using git restore:'
cat create_dataset.R
echo ''
echo 'output from git status:'
git status
## view contents of file create_dataset.R after undoing changes using git restore:
## library(tidyverse)
## mpg %>% head(5)
##
## output from git status:
## On branch main
## nothing to commit, working tree clean
git restore: unstage staged changes
Using git restore to unstage staged changes to file(s)
Add the --staged option to git restore
git restore --staged [<file(s)>]
Staged changes to <file(s)> are unstaged
The changes are then listed under Changes not staged for commit when you check git status
Using git restore --staged to unstage a file
In this example we will:
Create the file create_dataset.R that contains two lines of code
Add create_dataset.R to the “staging area” and “commit” those changes to the local repository
Modify create_dataset.R
Add the changes to create_dataset.R to the staging area
Use git restore --staged <file_name> to “unstage” changes we had “added” to the staging area.
The changes to create_dataset.R are now unstaged changes in the working directory rather than staged changes ready to be committed
git restore --staged does not delete the line of code we added after the initial commit
cd ~ # change to home directory
rm -rf my_git_repo # force remove `my_git_repo` (if it exists)
mkdir my_git_repo # make directory `my_git_repo`
cd my_git_repo
git init
# First, create new R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script so it is now tracked
git add create_dataset.R
git commit -m "add create_dataset.R"
# Modify R script
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# Add new changes to the staging area
git add create_dataset.R
# Check status to verify it has been staged (listed under `Changes to be committed`)
git status
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## On branch main
## Changes to be committed:
## (use "git restore --staged <file>..." to unstage)
## modified: create_dataset.R
# Use git restore --staged to unstage file
echo 'use git restore --staged to unstage changes added to the staging area'
git restore --staged create_dataset.R
# Check status to verify it has been unstaged (listed under `Changes not staged for commit`)
echo ''
echo 'output from git status (after using git restore --staged to unstage changes):'
git status
echo ''
echo 'print the file create_dataset.R (after using git restore --staged to unstage changes):'
cat create_dataset.R
## use git restore --staged to unstage changes added to the staging area
##
## output from git status (after using git restore --staged to unstage changes):
## On branch main
## Changes not staged for commit:
## (use "git add <file>..." to update what will be committed)
## (use "git restore <file>..." to discard changes in working directory)
## modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
##
## print the file create_dataset.R (after using git restore --staged to unstage changes):
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
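To see the staging area change more directly, one option is to compare git diff --cached (staged changes) with git diff (unstaged changes) before and after unstaging. This is a minimal sketch, assuming a throwaway directory and a placeholder Git identity that are not part of the lecture setup.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)   # throwaway repo instead of ~/my_git_repo
cd "$repo"
git init -q

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "add create_dataset.R"

# Modify the file and stage the change
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
git add create_dataset.R

# Files with staged changes (index vs. last commit)
staged_before=$(git diff --cached --name-only)

# Unstage: the change moves back to the working directory
git restore --staged create_dataset.R

staged_after=$(git diff --cached --name-only)   # nothing staged now
unstaged_after=$(git diff --name-only)          # change is still in the file

echo "staged before:  '$staged_before'"
echo "staged after:   '$staged_after'"
echo "unstaged after: '$unstaged_after'"
```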
git reset: discard commits
git reset: Move the branch back to a specific commit (commits made after it are discarded from the history)
- Help: git reset --help
- git reset <commit_hash>:
  - commit_hash is the hash of the commit you want to go back to
  - The HEAD pointer will be set to the specified commit
Example: Using git reset to undo a commit
cd ~ # change to home directory
rm -rf my_git_repo # force remove `my_git_repo` (if it exists)
mkdir my_git_repo # make directory `my_git_repo`
cd my_git_repo
git init
# First, create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "add 1st line to create_dataset.R"
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "add 2nd line to create_dataset.R"
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [main 12fa051] add 2nd line to create_dataset.R
## 1 file changed, 1 insertion(+)
# View commit log
git log
# this code retrieves the first commit hash
git rev-list HEAD | tail -n 1
## commit 12fa051d8fd7452519701667e93381bdceeeb41d
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:54 2023 -0800
##
## add 2nd line to create_dataset.R
##
## commit 996fd32ce120d748fb9360ec06f42cf2b31ac392
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:53 2023 -0800
##
## add 1st line to create_dataset.R
## 996fd32ce120d748fb9360ec06f42cf2b31ac392
# Specify the hash ID of the commit to reset back to (commits after it are undone)
git reset $(git rev-list HEAD | tail -n 1) # this retrieves the first commit hash
# View commit log - the 2nd commit has been removed
git log
## Unstaged changes after reset:
## M create_dataset.R
## commit 996fd32ce120d748fb9360ec06f42cf2b31ac392
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:53 2023 -0800
##
## add 1st line to create_dataset.R
# changes to files associated with undone commit(s) are retained in working directory
cat create_dataset.R
git status
## library(tidyverse)
## mpg %>% head(5)
## On branch main
## Changes not staged for commit:
## (use "git add <file>..." to update what will be committed)
## (use "git restore <file>..." to discard changes in working directory)
## modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
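Instead of looking up a hash, the commit to reset to can also be written relative to HEAD. The sketch below assumes a throwaway directory and a placeholder Git identity (illustrative assumptions). It uses HEAD~1 ("one commit before HEAD") and counts commits with git rev-list --count to confirm that the second commit is gone from the history while its changes stay in the working directory.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)   # throwaway repo instead of ~/my_git_repo
cd "$repo"
git init -q

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "add 1st line to create_dataset.R"

echo "mpg %>% head(5)" >> create_dataset.R
git add create_dataset.R
git commit -q -m "add 2nd line to create_dataset.R"

count_before=$(git rev-list --count HEAD)   # number of commits reachable from HEAD

# Reset to the commit just before HEAD (default mode keeps file contents)
git reset -q HEAD~1

count_after=$(git rev-list --count HEAD)
file_lines=$(wc -l < create_dataset.R | tr -d ' ')
status=$(git status --porcelain)            # " M <file>" means modified, not staged

echo "commits before reset: $count_before"
echo "commits after reset:  $count_after"
echo "lines still in file:  $file_lines"
echo "status: '$status'"
```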
git revert
git revert: Revert a specific commit; previous commits are retained
- Help: git revert --help
- git revert --no-edit <commit_hash>
  - The --no-edit option means that you will use the default message for the revert commit
  - Without --no-edit, you’ll be taken to a screen where you have a chance to edit the commit message of the new commit. Just enter :q to use the default message.
  - Undoes the changes made by <commit_hash>; does this by creating a new commit that takes the repository back to the way it was before <commit_hash>
The difference between git revert and git reset (see figure below):
- git reset removes a previous commit, so there will be no record of this commit in git log; it’ll be like it never happened
- git revert does not remove any previous commits; rather, it creates a new commit (e.g., commit 3) that changes things back to the way they were prior to some commit (e.g., prior to commit 2)
- Generally, use git revert so that it does not permanently erase history
- In some cases, git reset may also be an option
Credit: NUKE Designs, Git revert
Example: Using git revert to revert a commit
# First, create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "add 1st line to create_dataset.R"
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "add 2nd line to create_dataset.R"
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [main da3a943] add 2nd line to create_dataset.R
## 1 file changed, 1 insertion(+)
# View commit log
git log
# this code retrieves the hash ID associated with the most recent commit
git rev-parse HEAD
## commit da3a94323d5341dccd1aac22ca89afe28a7ff53c
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:56 2023 -0800
##
## add 2nd line to create_dataset.R
##
## commit 7e83925fc96e8c87e4b4e1e3ea96d7001d89607c
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:56 2023 -0800
##
## add 1st line to create_dataset.R
## da3a94323d5341dccd1aac22ca89afe28a7ff53c
# Specify the hash ID of the unwanted commit
git revert --no-edit $(git rev-parse HEAD) # git rev-parse retrieves latest commit hash
# View commit log; note, now there are three commits
git log
## [main 4affe54] Revert "add 2nd line to create_dataset.R"
## Date: Thu Feb 9 15:45:57 2023 -0800
## 1 file changed, 1 deletion(-)
## commit 4affe54b218c8e821023695ee58b6146011ed9b2
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:57 2023 -0800
##
## Revert "add 2nd line to create_dataset.R"
##
## This reverts commit da3a94323d5341dccd1aac22ca89afe28a7ff53c.
##
## commit da3a94323d5341dccd1aac22ca89afe28a7ff53c
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:56 2023 -0800
##
## add 2nd line to create_dataset.R
##
## commit 7e83925fc96e8c87e4b4e1e3ea96d7001d89607c
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:45:56 2023 -0800
##
## add 1st line to create_dataset.R
# The file now only contains the 1st line
cat create_dataset.R
## library(tidyverse)
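The contrast with git reset can be checked by counting commits. In this minimal sketch (throwaway directory and placeholder Git identity are assumptions), reverting the second commit leaves three commits in the history, while the file goes back to its one-line state.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)   # throwaway repo instead of ~/my_git_repo
cd "$repo"
git init -q

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "add 1st line to create_dataset.R"

echo "mpg %>% head(5)" >> create_dataset.R
git add create_dataset.R
git commit -q -m "add 2nd line to create_dataset.R"

# Revert the most recent commit; this adds a new commit rather than removing one
git revert --no-edit HEAD > /dev/null

count=$(git rev-list --count HEAD)            # commits reachable from HEAD
subject=$(git log -1 --format=%s)             # subject line of the newest commit
lines=$(wc -l < create_dataset.R | tr -d ' ')

echo "commits after revert: $count"
echo "newest commit:        $subject"
echo "lines in file:        $lines"
```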
What is a branch?
Credit: Modified from W3 docs, Git branch
Defining branches in terms of
commits:
Credit: Modified from Mastering git branches by Henrique Mota
Commit objects show relationship between commits and “references”
- The most recent commit has the hash c281829157334317e93172c8acca20ffaaa59cff; let’s call it commit2
- The commit object for commit2:
  - Contains a parent line with b764bdbcfbe2009f01e89283cbbf35c95b9e2ad6, which is the hash for commit1
  - This is how commit2 creates the “reference” to commit1
cd ~/my_git_repo
echo 'Print commit hash for latest commit:'
git rev-parse HEAD
echo ''
echo 'Print contents of most recent commit object:'
echo ''
git cat-file -p $(git rev-parse HEAD)
## Print commit hash for latest commit:
## c281829157334317e93172c8acca20ffaaa59cff
##
## print contents of most recent commit object:
##
## tree 524db779f0a3e3b3b353b522285c7da4830e21f1
## parent b764bdbcfbe2009f01e89283cbbf35c95b9e2ad6
## author Ozan Jaquette <ozanj@ucla.edu> 1643410447 -0800
## committer Ozan Jaquette <ozanj@ucla.edu> 1643410447 -0800
##
## second commit
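The parent line in a commit object can also be read programmatically. This is a minimal sketch, assuming a throwaway directory and a placeholder Git identity. It extracts the parent hash from the commit object with awk and compares it with git rev-parse HEAD^, where HEAD^ means "the first parent of HEAD". The first commit in a repository has no parent line at all.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)   # throwaway repo instead of ~/my_git_repo
cd "$repo"
git init -q

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "first commit"

echo "mpg %>% head(5)" >> create_dataset.R
git add create_dataset.R
git commit -q -m "second commit"

# Parent hash as stored inside the commit object
parent_from_object=$(git cat-file -p HEAD | awk '/^parent /{print $2}')

# Parent hash as resolved by Git from the HEAD^ notation
parent_from_rev=$(git rev-parse HEAD^)

# The first commit has zero parent lines (grep -c exits 1 on zero matches)
root_parents=$(git cat-file -p HEAD^ | grep -c '^parent ' || true)

echo "parent (from object): $parent_from_object"
echo "parent (from HEAD^):  $parent_from_rev"
echo "parent lines in first commit: $root_parents"
```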
Why use branches?
- Several people can work on different parts of the same file, e.g., predict_grad.Rmd. For example, one person writing functions to clean data and create analysis variables and another person writing functions to run models and store model results.
- Each person can work on their own version of predict_grad.Rmd at the same time
git branch
git branch: List, create, or delete branches
- Help: git branch --help
- git branch [<option(s)>]: List branches
  - The output shows * next to your current branch
  - -a: List all branches, both local and remote (remote branches will start with remotes/)
  - -r: List only remote branches
  - -v: Display details about latest commits next to each branch
- git branch <branch_name>: Create a new branch
- git branch -d <branch_name>: Delete a branch
- Renaming a branch:
  - -m or --move: Move/rename a branch
  - -f or --force: force; in combination with -m (move), “allow renaming the branch even if the new branch name already exists”
  - -M: shortcut for -m -f (move and force)
  - git branch -M <new_branch_name>
Example: Using git branch to list branches
Let’s create a new git repository in the example below. Note that we will not be able to list branches until we’ve made at least 1 commit:
# Initialize a new git repository in `my_git_repo` directory
cd my_git_repo
git init
# Note that you won't be able to list branches until you've made at least 1 commit
git branch
## Initialized empty Git repository in C:/Users/ozanj/Documents/my_git_repo/.git/
# Create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "import tidyverse in create_dataset.R"
git branch
## * main
We can use the -v
option to list branches with more
details about the latest commit on each branch:
# See detailed branch listing
git branch -v
## * main 488ccc8 import tidyverse in create_dataset.R
The -a
option will list both local and remote
branches.
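- Remote branches are shown with the prefix remotes/ in the output.
- Each remote branch is named after its remote (origin/main in the example below).
- You may also see the remote HEAD listed and where it’s pointing to (e.g., remote HEAD is pointing to remote main branch in the example below):
# List local and remote branches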
git branch -a
## * main
## remotes/origin/HEAD -> origin/main
## remotes/origin/main
To list only information on remote branches, we can use the
-r
option.
- Note that the output does not have remotes/ prepended, as that only appears when listing all branches using -a to be able to distinguish between local and remote branches:
# List only remote branches
git branch -r
## origin/HEAD -> origin/main
## origin/main
Example: Using git branch to create a new branch
# See branch listing
git branch
## * main
# Create new branch
git branch dev
# See branch listing
git branch
## dev
## * main
Example: Using git branch to delete a branch
A common practice around creating/deleting branches (but you don’t have to do things this way):
- Create a new branch, often named dev for “development”, to make some improvement to a script.
- When the improvement is done, merge the dev branch into the main branch (merging covered below).
- Delete the dev branch.
# See branch listing
git branch
## dev
## * main
# Delete branch
git branch -d dev
## Deleted branch dev (was 488ccc8).
# See branch listing
git branch
## * main
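One detail worth knowing about git branch -d is that it refuses to delete a branch whose commits have not been merged, which protects unmerged work. The sketch below assumes a throwaway directory and a placeholder Git identity. It also uses -D, the forced variant of -d, which is not covered elsewhere in this section, to show the difference.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)   # throwaway repo instead of ~/my_git_repo
cd "$repo"
git init -q

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"
git branch -M main

# Make a commit on dev that main does not have
git checkout -q -b dev
echo "mpg %>% head(5)" >> create_dataset.R
git commit -q -am "work on dev"
git checkout -q main

# -d refuses because dev is not fully merged into the current branch
if git branch -d dev 2> /dev/null; then safe_delete="deleted"; else safe_delete="refused"; fi

# -D forces the deletion; the unmerged commit is no longer on any branch
git branch -D dev > /dev/null

remaining=$(git branch --format='%(refname:short)')

echo "git branch -d dev: $safe_delete"
echo "remaining branches: $remaining"
```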
Example: Using git branch -M to force rename a branch
Recall that the option -M
is a shortcut for
-m
(move) and -f
(force)
git branch -M main
In the below example, we initialize a repo, make a commit, then force rename the branch name
cd ~ # change to home directory
rm -rf my_git_repo2 # force remove `my_git_repo2` (if it exists)
mkdir my_git_repo2 # make directory `my_git_repo2`
# Initialize a new git repository in `my_git_repo2` directory
cd my_git_repo2
git init
# Create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "import tidyverse in create_dataset.R"
## Initialized empty Git repository in C:/Users/ozanj/Documents/my_git_repo2/.git/
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [main (root-commit) 0d4de44] import tidyverse in create_dataset.R
## 1 file changed, 1 insertion(+)
## create mode 100644 create_dataset.R
List existing branches; note default branch was named “main”
cd ~/my_git_repo2
# See branch listing
git branch
## * main
Use git branch -M
to force rename a branch
cd ~/my_git_repo2
# Force rename current branch from current name to branch1
git branch -M branch1
List existing branches
# See branch listing
git branch
## * branch1
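The "force" part of -M only matters when the new name is already taken. The following is a minimal sketch, assuming a throwaway directory, a placeholder Git identity, and a hypothetical second branch named other. A plain -m rename onto the existing name is refused, while -M goes through and replaces the existing branch.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)   # throwaway repo instead of ~/my_git_repo2
cd "$repo"
git init -q

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"
git branch -M main

# Create a second branch so the target name already exists (hypothetical name)
git branch other

# Plain rename is refused because a branch named `other` already exists
if git branch -m other 2> /dev/null; then plain="renamed"; else plain="refused"; fi

# Forced rename succeeds and replaces the existing `other` branch
git branch -M other

current=$(git branch --show-current)
branches=$(git branch --format='%(refname:short)')

echo "git branch -m other: $plain"
echo "current branch:      $current"
echo "all branches:        $branches"
```

Because -M overwrites the existing branch, any commits that only that branch pointed to would no longer be on a branch, so it is worth checking git branch -v first.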
git checkout
git checkout: Switch branches
- Help: git checkout --help
- git checkout <branch_name>: Switches to an existing branch named branch_name
- git checkout -b <branch_name>: Creates a new branch named branch_name and switches to it
Credit: Modified from Pham Quy, Git tutorial
Example: Using git checkout to create a new branch and switch to it
# Force rename current branch to "main"
git branch -M main
# See branch listing
git branch
## * main
# Create new branch and switch to it
git checkout -b dev
## Switched to a new branch 'dev'
# See branch listing
git branch
## * dev
## main
Example: Using git checkout to switch to an existing branch
# See branch listing
git branch
## * dev
## main
# Switch to an existing branch
git checkout main
## Switched to branch 'main'
# See branch listing
git branch
## dev
## * main
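Newer versions of Git (2.23 and later) also provide git switch, which covers only the branch-switching part of git checkout. This lecture uses git checkout throughout; the sketch below is just an equivalent form, assuming a throwaway directory and a placeholder Git identity.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)   # throwaway repo instead of ~/my_git_repo2
cd "$repo"
git init -q

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"
git branch -M main

# Same effect as: git checkout -b dev
git switch -q -c dev
on_first=$(git branch --show-current)

# Same effect as: git checkout main
git switch -q main
on_second=$(git branch --show-current)

echo "after git switch -c dev: $on_first"
echo "after git switch main:   $on_second"
```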
What is an upstream branch?
- An upstream branch is the remote branch that a local branch tracks
- The upstream branch is set with git push --set-upstream <remote_name> <branch_name> (or equivalently, git push -u <remote_name> <branch_name>)
  - For example, when pushing the local main branch to GitHub for the first time
- In the figure below, the local dev branch is tracking the remote dev branch (i.e., the upstream branch). Recall that under the hood, we also have a local copy of the remote repository, so origin/dev here is this local, remote-tracking branch.
Credit: devconnected, How To Set Upstream Branch on Git
Example: Pushing local branch to the remote
When you create a new local branch, you may choose to push it to the
remote if you want a copy of it on GitHub, or if you want others to be
able to contribute to it. A new local branch has no upstream branch, so
by default a bare git push is rejected the first time. Pushing with
-u sets the upstream branch, and all subsequent pushes can then just
be git push.
In the below code chunks (code not run), we do the following:
- Initialize a new repo, make a commit, and create a new branch named dev
- Use git remote add <remote_name> <remote_url> to connect to a remote repo (assume you created this remote repo on github.com)
- Use git push -u origin main to push the local branch named “main” to the remote for the first time
- Switch to the dev branch and use git push -u origin dev to push the local branch named “dev” to the remote for the first time
Initialize new repo, make a commit, create a new branch named
dev
cd ~ # change to home directory
rm -rf my_git_repo # force remove `my_git_repo` (if it exists)
mkdir my_git_repo # make directory `my_git_repo`
# Initialize a new git repository in `my_git_repo` directory
cd my_git_repo
git init
cd ~/my_git_repo
# Create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "import tidyverse in create_dataset.R"
git branch
# Create and switch to new branch
git checkout -b dev
## Initialized empty Git repository in C:/Users/ozanj/Documents/my_git_repo/.git/
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [main (root-commit) 7c1e136] import tidyverse in create_dataset.R
## 1 file changed, 1 insertion(+)
## create mode 100644 create_dataset.R
## * main
## Switched to a new branch 'dev'
After creating the repo on GitHub, use git remote add to connect the local repo to it
cd ~/my_git_repo
git checkout main # switch to main branch; fine if we don't though
#git branch
git remote add origin https://github.com/ozanj/my_git_repo.git
git remote -v
cd ~/my_git_repo
# switch to main branch
git checkout main
git push -u origin main # -u origin main is needed because this is the first time we connect local main to remote main
## $ git push -u origin main
## Enumerating objects: 3, done.
## Counting objects: 100% (3/3), done.
## Writing objects: 100% (3/3), 259 bytes | 259.00 KiB/s, done.
## Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
## To https://github.com/ozanj/my_git_repo.git
## * [new branch] main -> main
## Branch 'main' set up to track remote branch 'main' from 'origin'.
Show local and remote branches
cd ~/my_git_repo
git branch -a
## dev
## * main
## remotes/origin/main
Switch to the dev branch and push changes to the remote repository (code not run)
cd ~/my_git_repo
# switch to dev branch
git checkout dev
# print local and remote branches
git branch -a
#push branch to remote for the first time
git push -u origin dev
#git push
## $ git push -u origin dev
## Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
## remote:
## remote: Create a pull request for 'dev' on GitHub by visiting:
## remote: https://github.com/ozanj/my_git_repo/pull/new/dev
## remote:
## To https://github.com/ozanj/my_git_repo.git
## * [new branch] dev -> dev
## Branch 'dev' set up to track remote branch 'dev' from 'origin'.
Show local and remote branches
cd ~/my_git_repo
git branch -a
## $ git branch -a
## * dev
## main
## remotes/origin/dev
## remotes/origin/main
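The push examples above are not run because they need a GitHub repo. The same upstream behavior can be tried offline by using a local bare repository as a stand-in for the remote. This is a minimal sketch: the bare repo path, the throwaway directory, and the placeholder Git identity are all assumptions for illustration. It reads each branch's upstream with git rev-parse --abbrev-ref and the branch@{upstream} notation.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
git init -q --bare "$work/remote.git"   # local stand-in for the GitHub repo
git init -q "$work/local"
cd "$work/local"

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"
git branch -M main

git remote add origin "$work/remote.git"
git push -q -u origin main              # first push: also sets the upstream

git checkout -q -b dev
# A freshly created local branch has no upstream yet
if git rev-parse --abbrev-ref 'dev@{upstream}' > /dev/null 2>&1; then
  dev_before="set"
else
  dev_before="none"
fi

git push -q -u origin dev               # first push of dev: sets its upstream

main_upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
dev_upstream=$(git rev-parse --abbrev-ref 'dev@{upstream}')

echo "dev upstream before push: $dev_before"
echo "main tracks: $main_upstream"
echo "dev tracks:  $dev_upstream"
```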
What is a merge?
Credit: Modified from Eduard Lebedyuk
Merge terminology:
How programmers use branches and
merges in day-to-day work:
Types of merges:
- Fast-forward merge: moves HEAD to point to the most recent commit from the “target branch”
Credit: Modified from Atlassian, Git merge
git merge
git merge: Merge branches
- Help: git merge --help
- git merge <branch_name>: All changes from branch_name will be merged into the current branch
- git merge --abort: If a conflict arises during the merge, this can be run to restore both branches to their original states
Example: Using git merge for fast-forward merge
Initialize new repo, make a commit, create a new branch named
dev
cd ~ # change to home directory
rm -rf my_git_repo # force remove `my_git_repo` (if it exists)
mkdir my_git_repo # make directory `my_git_repo`
# Initialize a new git repository in `my_git_repo` directory
cd my_git_repo
git init
cd ~/my_git_repo
# Create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "import tidyverse in create_dataset.R"
git branch
# Create and switch to new branch
git checkout -b dev
## Initialized empty Git repository in C:/Users/ozanj/Documents/my_git_repo/.git/
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [main (root-commit) e6e8fc4] import tidyverse in create_dataset.R
## 1 file changed, 1 insertion(+)
## create mode 100644 create_dataset.R
## * main
## Switched to a new branch 'dev'
Continuing from previous examples, we have the main
and
dev
branches, which are even with the same initial
commit:
git checkout main
# View commit log for `main` branch
git log
## Switched to branch 'main'
## commit e6e8fc432c85f0307e10f59b31379aea430e7adb
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:04 2023 -0800
##
## import tidyverse in create_dataset.R
# Switch to `dev` branch
git checkout dev
# View commit log for `dev` branch
git log
## Switched to branch 'dev'
## commit e6e8fc432c85f0307e10f59b31379aea430e7adb
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:04 2023 -0800
##
## import tidyverse in create_dataset.R
# View content of R script, which is the same on both `main` and `dev` branches
cat create_dataset.R
## library(tidyverse)
Now, let’s make a second commit on the dev
branch:
git branch
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "manipulate mpg dataset"
## * dev
## main
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [dev 745d052] manipulate mpg dataset
## 1 file changed, 2 insertions(+)
# View commit log for `dev` branch
git log
## commit 745d0529d6a9c8f93ddcba1d6b5f6576bf10b2a4
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:06 2023 -0800
##
## manipulate mpg dataset
##
## commit e6e8fc432c85f0307e10f59b31379aea430e7adb
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:04 2023 -0800
##
## import tidyverse in create_dataset.R
Let’s switch back to the main
branch and merge in
dev
. Since the dev
branch is ahead of
main
by 1 commit, the changes can be combined using a
fast-forward merge:
# Switch to `main` branch
git checkout main
# Merge `dev` branch into `main`
git merge dev
## Switched to branch 'main'
## Updating e6e8fc4..745d052
## Fast-forward
## create_dataset.R | 2 ++
## 1 file changed, 2 insertions(+)
# The commit log for `main` now matches the `dev` branch
git log
## commit 745d0529d6a9c8f93ddcba1d6b5f6576bf10b2a4
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:06 2023 -0800
##
## manipulate mpg dataset
##
## commit e6e8fc432c85f0307e10f59b31379aea430e7adb
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:04 2023 -0800
##
## import tidyverse in create_dataset.R
Let’s examine the git object associated with the commit:
# Commit object hash
git rev-parse HEAD # git rev-parse retrieves latest commit hash
git cat-file -t $(git rev-parse HEAD) # type = commit
git cat-file -p $(git rev-parse HEAD)
## 745d0529d6a9c8f93ddcba1d6b5f6576bf10b2a4
## commit
## tree 6de1187f46bbf4d76cafca7c0e5d3d61db6b5a53
## parent e6e8fc432c85f0307e10f59b31379aea430e7adb
## author Ozan Jaquette <ozanj@ucla.edu> 1675986366 -0800
## committer Ozan Jaquette <ozanj@ucla.edu> 1675986366 -0800
##
## manipulate mpg dataset
Examine the “tree” object associated with the commit:
git cat-file -t 6de1187f46bbf4d76cafca7c0e5d3d61db6b5a53 # type = tree
git cat-file -p 6de1187f46bbf4d76cafca7c0e5d3d61db6b5a53
## tree
## 100644 blob 490ec1c138021b8d5c196c26a2a7b3de69afc2d1 create_dataset.R
Examine the “blob” object (file) associated with the commit:
git cat-file -t 490ec1c138021b8d5c196c26a2a7b3de69afc2d1 # type = blob
git cat-file -p 490ec1c138021b8d5c196c26a2a7b3de69afc2d1
## blob
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
Examine the “parent” object associated with this commit:
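# Parent commit hash (here the parent is also the first commit, so the last line of git rev-list finds it; git rev-parse HEAD^ would return it too)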
git rev-list HEAD | tail -n 1
git cat-file -t $(git rev-list HEAD | tail -n 1) # type = commit
git cat-file -p $(git rev-list HEAD | tail -n 1)
## e6e8fc432c85f0307e10f59b31379aea430e7adb
## commit
## tree cb70185218351236255cdea1297210ceeaf6e3b5
## author Ozan Jaquette <ozanj@ucla.edu> 1675986364 -0800
## committer Ozan Jaquette <ozanj@ucla.edu> 1675986364 -0800
##
## import tidyverse in create_dataset.R
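Whether a merge can be a fast-forward can be checked before merging: it is possible when the current branch's latest commit is an ancestor of the branch being merged in. The sketch below assumes a throwaway directory and a placeholder Git identity. It uses git merge-base --is-ancestor for the check and then confirms that no merge commit was created.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)   # throwaway repo instead of ~/my_git_repo
cd "$repo"
git init -q

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"
git branch -M main

# dev gets one extra commit; main stays where it is
git checkout -q -b dev
echo "mpg %>% head(5)" >> create_dataset.R
git commit -q -am "manipulate mpg dataset"
git checkout -q main

# Exit status 0 if main's tip is an ancestor of dev's tip
if git merge-base --is-ancestor main dev; then kind="fast-forward"; else kind="3-way"; fi

git merge -q dev

main_tip=$(git rev-parse main)
dev_tip=$(git rev-parse dev)
# A fast-forward creates no merge commit, so HEAD has a single parent
parents=$(git cat-file -p HEAD | grep -c '^parent ')

echo "expected merge type: $kind"
echo "main and dev point to the same commit: $([ "$main_tip" = "$dev_tip" ] && echo yes || echo no)"
echo "parents of HEAD: $parents"
```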
Example: Using git merge for 3-way merge
Continuing from previous examples, we have the main
and
dev
branches, which are even with the same two commits:
# View commit log for `main` branch
git log
## commit 745d0529d6a9c8f93ddcba1d6b5f6576bf10b2a4
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:06 2023 -0800
##
## manipulate mpg dataset
##
## commit e6e8fc432c85f0307e10f59b31379aea430e7adb
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:04 2023 -0800
##
## import tidyverse in create_dataset.R
git branch
# View content of R script on the `main` branch
cat create_dataset.R
## dev
## * main
##
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
Now, let’s suppose the two branches diverge, both making changes
to the R script:
# Modify R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(10)" >> create_dataset.R # this line is modified
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
# Add and commit changes
git add create_dataset.R
git commit -m "update head() on line 2"
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [main 118f803] update head() on line 2
## 1 file changed, 1 insertion(+), 1 deletion(-)
View updated content of R script on the main
branch,
which now shows head(10)
instead of
head(5)
:
git branch
# View content of R script
cat create_dataset.R
## dev
## * main
##
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
Switch to dev
branch, and make change to file
create_dataset.R
:
# Switch to `dev` branch
git checkout dev
# Modify R script
echo "df <- df %>% filter(manufacturer == 'audi')" >> create_dataset.R # add new line
# Add and commit changes
git add create_dataset.R
git commit -m "add additional filter() on line 4"
## Switched to branch 'dev'
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [dev a6b4653] add additional filter() on line 4
## 1 file changed, 1 insertion(+)
View updated content of R script on the dev
branch,
which now has additional filter()
line at the end:
git branch
# View content of R script
cat create_dataset.R
## * dev
## main
##
## library(tidyverse)
## mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'audi')
Before we attempt to merge main
and
dev
branches, we can use git diff
to compare
the two branches:
git diff <branch1_name> <branch2_name>
# View diff between `main` and `dev` branches
git diff main dev
## diff --git a/create_dataset.R b/create_dataset.R
## index da2f5c5..6665541 100644
## --- a/create_dataset.R
## +++ b/create_dataset.R
## @@ -1,3 +1,4 @@
## library(tidyverse)
## -mpg %>% head(10)
## +mpg %>% head(5)
## df <- mpg %>% filter(year == 2008)
## +df <- df %>% filter(manufacturer == 'audi')
Let’s switch back to the main
branch and merge in
dev
. Since both branches made changes to the R script on
different lines, the changes can be combined without any conflicts via a
3-way merge:
# Switch to `main` branch
git checkout main
# Merge changes from `dev` into `main`
git merge dev
## Switched to branch 'main'
## Auto-merging create_dataset.R
## Merge made by the 'ort' strategy.
## create_dataset.R | 1 +
## 1 file changed, 1 insertion(+)
# View commit log - note that a new merge commit was created during the 3-way merge
git log
## commit d1264b137e1dee8040cbf249b29fb626c054db55
## Merge: 118f803 a6b4653
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:11 2023 -0800
##
## Merge branch 'dev'
##
## commit a6b4653b12ed97c4e608f4d01cc297a4ab0bf4a1
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:10 2023 -0800
##
## add additional filter() on line 4
##
## commit 118f803e5d1291e6ccf82659b4105b420fc80985
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:09 2023 -0800
##
## update head() on line 2
##
## commit 745d0529d6a9c8f93ddcba1d6b5f6576bf10b2a4
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:06 2023 -0800
##
## manipulate mpg dataset
##
## commit e6e8fc432c85f0307e10f59b31379aea430e7adb
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:04 2023 -0800
##
## import tidyverse in create_dataset.R
# View merged content of R script
cat create_dataset.R
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'audi')
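What distinguishes the merge commit created by a 3-way merge is that it has two parents, one from each branch. This is a minimal sketch, assuming a throwaway directory, a placeholder Git identity, and a hypothetical second file run_models.R so that the two branches cannot conflict.

```shell
set -e
# Placeholder identity so commits work without any git config (assumption)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)   # throwaway repo instead of ~/my_git_repo
cd "$repo"
git init -q

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R
git commit -q -m "import tidyverse in create_dataset.R"
git branch -M main

# dev adds a new file (hypothetical name)
git checkout -q -b dev
echo "library(tidyverse)" > run_models.R
git add run_models.R
git commit -q -m "add run_models.R"

# main changes a different file, so the branches have diverged
git checkout -q main
echo "mpg %>% head(10)" >> create_dataset.R
git commit -q -am "add head() to create_dataset.R"

if git merge-base --is-ancestor main dev; then kind="fast-forward"; else kind="3-way"; fi

# --no-edit accepts the default merge commit message
git merge -q --no-edit dev

parents=$(git cat-file -p HEAD | grep -c '^parent ')   # merge commits have 2 parents
subject=$(git log -1 --format=%s)

echo "merge type: $kind"
echo "parents of merge commit: $parents"
echo "merge commit message: $subject"
```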
git pull
git pull: Incorporate remote changes into your current branch
- Help: git pull --help
- git pull: This is equivalent to a git fetch followed by a git merge to incorporate remote changes to your current branch
  - git fetch is useful if you want a local copy of the most up-to-date changes in the remote repository (e.g., to preview changes), but don’t actually want to merge these changes into your local repository yet. On the other hand, running git pull will directly incorporate the changes.
  - git fetch will incorporate changes into your remote-tracking branch (e.g., origin/main, your local copy of the remote main branch) but not your local branch (e.g., your local main branch). Then, git merge can merge the change from your remote-tracking branch into your local branch.
Credit: Modified from Medium, Git Fetch vs Git Pull
Example: Using git pull to incorporate remote changes
Let’s say your remote branch is ahead of your local branch by some
commits. You can run git pull
to incorporate those
changes:
# Incorporate remote changes to current branch
git pull
After you run the command, you may see some output indicating
the progress as remote changes are being fetched:
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
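Then, the output will look something like the below:
- The first two lines show the changes being fetched to the origin/main branch, which is our local copy of the remote main branch
- The remaining lines show those changes being merged into our local branch (recall that git pull is just git fetch followed by git merge)
From github.com:anyone-can-cook/student_lastname_firstname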
1eeaff7..6c3e46f main -> origin/main
Updating 1eeaff7..6c3e46f
Fast-forward
README.md | 2 ++
my_script.R | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
As we’ll see in the next example, the first 2 lines of the output
come from git fetch being run and the remaining lines come
from git merge.
Example: Using git fetch and git merge to incorporate remote changes
Running git pull
essentially performs a
git fetch
followed by git merge
. If we only
want to fetch the remote changes to our local repository but not
incorporate them into our current branch, we can use
git fetch
:
# Fetch remote changes
git fetch
From github.com:anyone-can-cook/student_lastname_firstname
1eeaff7..6c3e46f main -> origin/main
We can verify that the fetch only updated our remote-tracking
branch origin/main
(i.e., our local copy of the remote
main
branch) and not our local main
branch by
checking the commit history of the branches.
Assuming we are currently on our local main
branch, we
can run git log
to view the commit history. In the output,
we see HEAD -> main
next to the most recent commit,
indicating that HEAD
is pointing to this commit on the
main
branch:
# Check commit history of local `main` branch
git log
commit e329908682dfefba0417bd7337cc660d0d5f133d (HEAD -> main)
Author: username <email@example.com>
Date: Fri Jan 22 11:15:50 2021 -0800
initial commit
Next, we can check the commit log of the remote-tracking branch
origin/main. In the output below, we can see that the
changes have indeed been fetched to this branch, as indicated by the
presence of the second commit. In parentheses next to the commits, we
can again see that our local main branch still only
contains the first commit while origin/main and
origin/HEAD have been updated with the second.
A bare HEAD label (without -> main) marks a detached HEAD, so the
output below appears to have been captured with origin/main checked
out. If you stay on your local main branch, HEAD -> main stays next to
the initial commit until you merge:
# Check commit history of `origin/main`
git log origin/main
commit 1eeaff75a681213890e5ce4850d17a1672a4ada6 (HEAD, origin/main, origin/HEAD)
Author: username <email@example.com>
Date: Fri Jan 22 11:27:40 2021 -0800
second commit
commit e329908682dfefba0417bd7337cc660d0d5f133d (main)
Author: username <email@example.com>
Date: Fri Jan 22 11:15:50 2021 -0800
initial commit
After we are satisfied with the fetched changes, we can manually
merge them into our local main
branch:
# Merge changes from `origin/main` into local `main`
git merge origin/main
Updating 1eeaff7..6c3e46f
Fast-forward
README.md | 2 ++
my_script.R | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
Alternatively, we could have run git pull instead of git merge origin/main. Because git pull runs git fetch followed by git merge, it would have fetched again and then merged in the same changes.
If we check the commit history on our local main branch again, we can see that it has now been updated:
# Check commit history of local `main` branch
git log
commit 1eeaff75a681213890e5ce4850d17a1672a4ada6 (HEAD -> main, origin/main, origin/HEAD)
Author: username <email@example.com>
Date: Fri Jan 22 11:27:40 2021 -0800
second commit
commit e329908682dfefba0417bd7337cc660d0d5f133d
Author: username <email@example.com>
Date: Fri Jan 22 11:15:50 2021 -0800
initial commit
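The fetch-then-merge sequence above can be reproduced without GitHub. The following is a minimal sketch, assuming a local repository can stand in for the remote; the directory names `upstream` and `local_copy` are hypothetical and not part of the lecture's example.

```shell
# Sketch: `upstream` stands in for the GitHub remote, `local_copy` for our clone
tmp=$(mktemp -d) && cd "$tmp"

# Create the stand-in remote with an initial commit on `main`
git init -q upstream && cd upstream
git checkout -q -b main
git config user.name "username"
git config user.email "email@example.com"
echo "first line" > README.md
git add README.md && git commit -q -m "initial commit"
cd ..

# Clone it; the clone's `origin` points at `upstream`
git clone -q upstream local_copy

# A collaborator adds a second commit to the remote
cd upstream
echo "second line" >> README.md
git commit -q -am "second commit"
cd ../local_copy

before_fetch=$(git rev-parse main)
git fetch -q origin
after_fetch=$(git rev-parse main)            # local `main` has not moved
remote_tracking=$(git rev-parse origin/main) # `origin/main` has the new commit

# Commits that were fetched but are not yet on local `main`
pending=$(git log --oneline main..origin/main)
echo "$pending"

# Fast-forward local `main` to match `origin/main`
git merge -q origin/main
after_merge=$(git rev-parse main)
```

The range `main..origin/main` is a convenient way to review exactly what a fetch brought in before deciding to merge it.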
When attempting a git merge, two types of merge conflict can arise: (1) when starting a merge; and (2) during a merge (from Git merge conflicts).
First, conflicts that arise when starting a merge. Git refuses to start the merge because uncommitted local changes would be overwritten by it:
error: Your local changes to the following files would be overwritten by merge:
<file_name>
Please commit your changes or stash them before you merge.
Aborting
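Second, conflicts that arise during the merge. Git starts the merge but cannot automatically combine changes that both branches made to the same lines, so it reports the conflict and marks the conflicted lines inside the file: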
Auto-merging <file_name>
CONFLICT (content): Merge conflict in <file_name>
Automatic merge failed; fix conflicts and then commit the result.
<normal_line_of_code>
<normal_line_of_code>
<<<<<<< HEAD
<conflicted_line_of_code__current_branch_version>
=======
<conflicted_line_of_code__target_branch_version>
>>>>>>> <branch_name>
<normal_line_of_code>
<normal_line_of_code>
These conflicts will need to be resolved manually (described in the next section), or the merge can be aborted using git merge --abort.
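As a minimal sketch of the abort path, the commands below build a throwaway repository (the directory name `demo` is hypothetical), trigger a content conflict, list the conflicted files, and then abort. It assumes that `git diff --name-only --diff-filter=U` is an acceptable way to list unmerged files; `git status` shows the same information in longer form.

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git checkout -q -b main
git config user.name "username"
git config user.email "email@example.com"

echo "df <- df %>% filter(manufacturer == 'audi')" > create_dataset.R
git add create_dataset.R && git commit -q -m "filter for audi"

# Change the line on a new branch
git checkout -q -b revision
echo "df <- df %>% filter(manufacturer == 'lincoln')" > create_dataset.R
git commit -q -am "filter for lincoln instead of audi"

# Change the same line on `main`
git checkout -q main
echo "df <- df %>% filter(manufacturer == 'chevrolet')" > create_dataset.R
git commit -q -am "filter for chevrolet instead of audi"

# The merge stops with a conflict and returns a non-zero exit status
git merge revision > /dev/null 2>&1
merge_status=$?

# List files that are still unmerged
conflicted=$(git diff --name-only --diff-filter=U)
echo "exit status: $merge_status, conflicted: $conflicted"

# Give up on the merge and return to the pre-merge state
git merge --abort
after_abort=$(git status --porcelain)   # empty when the working tree is clean
cat create_dataset.R
```

After the abort, the file on `main` contains the committed `chevrolet` line again and no conflict markers.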
Continuing from previous examples, our main branch currently looks like this:
# View commit log for `main` branch
git log
## commit d1264b137e1dee8040cbf249b29fb626c054db55
## Merge: 118f803 a6b4653
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:11 2023 -0800
##
## Merge branch 'dev'
##
## commit a6b4653b12ed97c4e608f4d01cc297a4ab0bf4a1
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:10 2023 -0800
##
## add additional filter() on line 4
##
## commit 118f803e5d1291e6ccf82659b4105b420fc80985
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:09 2023 -0800
##
## update head() on line 2
##
## commit 745d0529d6a9c8f93ddcba1d6b5f6576bf10b2a4
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:06 2023 -0800
##
## manipulate mpg dataset
##
## commit e6e8fc432c85f0307e10f59b31379aea430e7adb
## Author: Ozan Jaquette <ozanj@ucla.edu>
## Date: Thu Feb 9 15:46:04 2023 -0800
##
## import tidyverse in create_dataset.R
git branch
# View content of R script
cat create_dataset.R
## dev
## * main
##
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'audi')
Let’s create a new branch called revision that branches off main, then make a new commit on this branch:
# Create and switch to new branch
git checkout -b revision
# Modify R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(10)" >> create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
echo "df <- df %>% filter(manufacturer == 'lincoln')" >> create_dataset.R # this line is modified
# Add and commit change
git add create_dataset.R
git commit -m "filter for lincoln instead of audi"
## Switched to a new branch 'revision'
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [revision f15f183] filter for lincoln instead of audi
## 1 file changed, 1 insertion(+), 1 deletion(-)
View the updated content of the R script on the revision branch, which now filters for lincoln instead of audi on the last line:
git branch
# View content of R script
cat create_dataset.R
## dev
## main
## * revision
##
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'lincoln')
Back on the main branch, let’s modify the same line in the R script:
# Switch back to `main` branch
git checkout main
# Modify R script
echo "library(tidyverse)" > create_dataset.R
echo "mpg %>% head(10)" >> create_dataset.R
echo "df <- mpg %>% filter(year == 2008)" >> create_dataset.R
echo "df <- df %>% filter(manufacturer == 'chevrolet')" >> create_dataset.R # this line is modified
## Switched to branch 'main'
Notice that we have uncommitted changes in the working
directory:
# Check status
git status
## On branch main
## Changes not staged for commit:
## (use "git add <file>..." to update what will be committed)
## (use "git restore <file>..." to discard changes in working directory)
## modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
If we try to merge changes from revision into main now, the merge will fail to start because we have uncommitted changes to create_dataset.R that the merge would overwrite. The merge will be aborted:
# Merge changes from `revision` into `main`
git merge revision
## error: Your local changes to the following files would be overwritten by merge:
## create_dataset.R
## Please commit your changes or stash them before you merge.
## Aborting
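The error message offers two ways out: commit the changes (which the next example does) or stash them. The following is a minimal sketch of the stash route. The file `notes.txt` and its contents are hypothetical, and the two edits deliberately touch different lines so that re-applying the stash does not itself conflict.

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git checkout -q -b main
git config user.name "username"
git config user.email "email@example.com"

printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > notes.txt
git add notes.txt && git commit -q -m "add notes"

# Commit a change to the last line on a new branch
git checkout -q -b revision
printf 'line 1\nline 2\nline 3\nline 4\nline 5 (revision)\n' > notes.txt
git commit -q -am "edit last line"

# Back on `main`, edit the first line but do not commit
git checkout -q main
printf 'line 1 (local)\nline 2\nline 3\nline 4\nline 5\n' > notes.txt

# Git refuses to start the merge (non-zero exit status)
git merge revision > /dev/null 2>&1
first_try=$?

git stash -q            # shelve the uncommitted edit
git merge -q revision   # the merge can now start (a fast-forward here)
git stash pop -q        # re-apply the shelved edit on top of the merge result
cat notes.txt
```

If the stashed edit had touched the same line as the merged commit, `git stash pop` would report a conflict of its own, which would then be resolved the same way as a merge conflict.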
Continuing from the previous example, let’s say we committed our change to create_dataset.R on the main branch:
# Add and commit change
git add create_dataset.R
git commit -m "filter for chevrolet instead of audi"
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [main d0a78e8] filter for chevrolet instead of audi
## 1 file changed, 1 insertion(+), 1 deletion(-)
View the updated content of the R script on the main branch, which now filters for chevrolet instead of audi on the last line:
git branch
# View content of R script
cat create_dataset.R
## dev
## * main
## revision
##
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'chevrolet')
Recall that create_dataset.R on the revision branch looks like this:
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'lincoln')
If we try to merge changes from revision into main now, there will be a merge conflict because both branches modified the same line of the same file:
# Merge changes from `revision` into `main`
git merge revision
## Auto-merging create_dataset.R
## CONFLICT (content): Merge conflict in create_dataset.R
## Automatic merge failed; fix conflicts and then commit the result.
You can also tell which file(s) failed to merge by checking git status:
## On branch main
## You have unmerged paths.
## (fix conflicts and run "git commit")
## (use "git merge --abort" to abort the merge)
##
## Unmerged paths:
## (use "git add <file>..." to mark resolution)
##
## both modified: create_dataset.R
##
## no changes added to commit (use "git add" and/or "git commit -a")
The file(s) that failed to merge will contain markers added by Git that indicate which line(s) are conflicted:
# View content of R script
cat create_dataset.R
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## <<<<<<< HEAD
## df <- df %>% filter(manufacturer == 'chevrolet')
## =======
## df <- df %>% filter(manufacturer == 'lincoln')
## >>>>>>> revision
What to do when you encounter a merge conflict?
Either run git merge --abort to abort the merge and restore the branches back to their original states
Or resolve the conflict manually:
Open the conflicted file(s), look for the conflict markers (<<<<<<< HEAD, =======, >>>>>>> <branch_name>) and choose which version of the conflicted line to keep
git add the file(s) after you are done resolving the conflicts
git commit -m "<commit_message>" to complete the merge
# View content of R script
cat create_dataset.R
## library(tidyverse)
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## <<<<<<< HEAD
## df <- df %>% filter(manufacturer == 'chevrolet')
## =======
## df <- df %>% filter(manufacturer == 'lincoln')
## >>>>>>> revision
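We can manually edit the file to resolve the conflicts, usually by opening the file in a text editor.
For example, we could replace the conflicted lines and the conflict markers with a single line that filters for 'volkswagen' instead:
## library(tidyverse)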
## mpg %>% head(10)
## df <- mpg %>% filter(year == 2008)
## df <- df %>% filter(manufacturer == 'volkswagen')
Finally, we can add and commit the file to complete the
merge:
# Add/commit R script
git add create_dataset.R
git commit -m "merge revision branch"
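The whole resolution can be sketched end to end in a throwaway repository (the directory name `demo` is hypothetical). The `grep` check for leftover conflict markers is an extra safeguard assumed here, not a step required by Git; it is easy to stage a file that still contains a stray marker line.

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git checkout -q -b main
git config user.name "username"
git config user.email "email@example.com"

echo "df <- df %>% filter(manufacturer == 'audi')" > create_dataset.R
git add create_dataset.R && git commit -q -m "filter for audi"

git checkout -q -b revision
echo "df <- df %>% filter(manufacturer == 'lincoln')" > create_dataset.R
git commit -q -am "filter for lincoln instead of audi"

git checkout -q main
echo "df <- df %>% filter(manufacturer == 'chevrolet')" > create_dataset.R
git commit -q -am "filter for chevrolet instead of audi"

# The merge stops with a conflict
git merge revision > /dev/null 2>&1

# Count the conflict-marker lines Git wrote into the file
markers_before=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' create_dataset.R)

# Resolve by writing the version we want to keep (stands in for a text editor)
echo "df <- df %>% filter(manufacturer == 'volkswagen')" > create_dataset.R
markers_after=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' create_dataset.R || true)

# Mark the file as resolved and complete the merge
git add create_dataset.R
git commit -q -m "merge revision branch"

# A merge commit records two parents
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "markers before: $markers_before, after: $markers_after, parents: $parent_count"
```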
What is a pull request?
“Pull requests let you tell others about changes you’ve pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.” – GitHub Help
Why make a pull request?
Let’s say we create a new R script and add/commit that to the main branch:
# Create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "import tidyverse library"
Then, we create a new branch and make further changes to the R script on the branch:
# Create and switch to new branch
git checkout -b dev
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "preview mpg dataset"
## Switched to a new branch 'dev'
##
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [dev 78ce323] preview mpg dataset
## 1 file changed, 1 insertion(+)
At this point, we could push this new branch to the remote if we wanted to open a pull request. The alternative is to merge the changes directly into main:
# Switch back to main
git checkout main
# Merge in changes from the branch
git merge dev
## Switched to branch 'main'
## Updating 87138bc..78ce323
## Fast-forward
## create_dataset.R | 1 +
## 1 file changed, 1 insertion(+)
Then, we can push the changes to the remote’s main branch, which would also be the ultimate goal of a pull request:
# Push to remote's main
git push
All image credits: GitHub Help
Creating a topical branch:
Making the pull request:
On GitHub, select your branch and click New pull request:
Add a title and (optionally) a description for your pull request. You can also @-mention users or teams if you want:
Click Create Pull Request:
Your pull request will appear under the Pull requests tab:
Assigning reviewers:
On the right-hand side of the pull request, you are also able to assign Reviewers or Assignees, similar to an issue:
Reviewers should be the people you want to review the changes you made, while Assignees can be anyone else more generally involved in the pull request
The users listed under Reviewers (unlike Assignees) will also have a status icon:
Similar to the previous example, let’s say we create a new R script and add/commit it to the main branch:
# Create new R script
echo "library(tidyverse)" > create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "import tidyverse library"
Then, we create a new branch and make further changes to the R script on the branch:
# Create and switch to new branch
git checkout -b dev
# Modify R script
echo "mpg %>% head(5)" >> create_dataset.R
# Add/commit R script
git add create_dataset.R
git commit -m "preview mpg dataset"
## Switched to a new branch 'dev'
##
## warning: in the working copy of 'create_dataset.R', LF will be replaced by CRLF the next time Git touches it
## [dev 2c0fbc7] preview mpg dataset
## 1 file changed, 1 insertion(+)
At this point, we can push this new branch to the remote repository. Remember to set the upstream branch if this is the first time you are pushing the branch to remote:
# Push branch to remote (say our remote is called `origin` here)
git push --set-upstream origin dev
All the subsequent steps to open the pull request will be performed on GitHub.
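Before switching to GitHub, it can help to confirm that the branch was pushed and is tracking its remote counterpart. The following is a minimal sketch in which a local bare repository (the hypothetical `remote.git`) stands in for the GitHub remote; the checks themselves work the same against a real remote.

```shell
tmp=$(mktemp -d) && cd "$tmp"

# A bare repository stands in for the remote on GitHub
git init -q --bare remote.git
git clone -q remote.git project 2>/dev/null
cd project
git checkout -q -b main
git config user.name "username"
git config user.email "email@example.com"

echo "library(tidyverse)" > create_dataset.R
git add create_dataset.R && git commit -q -m "import tidyverse library"
git push -q --set-upstream origin main

# Work on a branch and push it, setting its upstream
git checkout -q -b dev
echo "mpg %>% head(5)" >> create_dataset.R
git commit -q -am "preview mpg dataset"
git push -q --set-upstream origin dev

# Which remote branch does `dev` track?
upstream=$(git rev-parse --abbrev-ref 'dev@{upstream}')

# How many commits would a pull request from `dev` into `main` contain?
ahead=$(git rev-list --count origin/main..origin/dev)
echo "dev tracks $upstream and is $ahead commit(s) ahead of origin/main"
```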
There are two ultimate responses to a pull request.
But before coming to one of these decisions, you will likely want to review the changes in more detail.
Under the Files tab, you can view all changes that would potentially be merged if the pull request is completed:
There, you will also see a button called Review changes that contains three options for leaving a review:
Comment:
Submit review
Approve:
Submit review
Request changes:
Select this option to request further changes before merging
Submit review
The reviewer status will be changed to
You will see that the merge box on the main pull request page is outlined in orange, along with a list of reviewers who requested changes:
To respond to the change request from each reviewer, there are three options:
Approve changes: The reviewer can select this to resolve the change request (other users will see See review instead)
Dismiss review: The review can be dismissed by anyone
Re-request review: Another review from the reviewer can be requested
Note that the merge box outline color and reviewer status do not affect the ability to merge the pull request
.gitignore file
What is a .gitignore file? (gitignore documentation)
A .gitignore file tells Git which files and directories to ignore; each line in the file specifies a pattern to ignore (more below)
Patterns are written as fnmatch style patterns
Ignored files will not show up under Untracked files when you check git status
.gitignore does not affect files already being tracked
To stop tracking a file that is already tracked, use git rm --cached
The .gitignore file is usually in your project root directory
You can have a single .gitignore or multiple .gitignore files in various subdirectories if you need to ignore different files in different locations
You can create the .gitignore file yourself or click Add .gitignore when you are creating a new repository on GitHub and select the R template from the dropdown menu
Templates for .gitignore can be found here (e.g., the R template)
Credit: How to Make Git Forget Tracked Files Now In gitignore
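The point about already-tracked files is easy to trip over, so here is a minimal sketch of it. The file name `data.csv` and the directory `demo` are hypothetical. Adding a tracked file to `.gitignore` changes nothing until the file is removed from the index with `git rm --cached`, which leaves the copy on disk untouched.

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "username"
git config user.email "email@example.com"

# Track a file first, then add it to .gitignore
echo "x,y" > data.csv
git add data.csv && git commit -q -m "add data.csv"
echo "data.csv" > .gitignore
git add .gitignore && git commit -q -m "ignore data.csv"

# The file is still tracked, so edits to it still show up
echo "1,2" >> data.csv
status_before=$(git status --porcelain)
echo "before: [$status_before]"

# Remove the file from the index only; the file on disk is kept
git rm -q --cached data.csv
git commit -q -m "stop tracking data.csv"

# Further edits are now ignored
echo "3,4" >> data.csv
status_after=$(git status --porcelain)
echo "after: [$status_after]"
```

Note that the file remains in earlier commits; `git rm --cached` only stops tracking it from this point on.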
Pattern formats in .gitignore file:
Lines starting with # are treated as comments
A leading ! means do not ignore this pattern
Use \ to escape a literal #, !, or trailing spaces
* matches anything except /
? matches any one character except /
Square brackets (e.g., [a-z], [0-9]) can be used to match one of the characters in a range
A / at the end will only match directories and not files
A / in the beginning or middle will only match relative to the directory the .gitignore file is in and not any subdirectories
To also match in subdirectories, add **/ to the start of the pattern
/**/ in the middle of the path matches zero or more directories
Let’s say we have a git repository with the following files and directory structure:
## .
## |____A1.csv
## |____A1.png
## |____A1.tsv
## |____ABC
## | |____README.md
## |____B2.csv
## |____blank.txt
## |____de.csv
When we check git status
, all the files are
untracked:
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## A1.csv
## A1.png
## A1.tsv
## ABC/README.md
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
Let’s create a .gitignore
file in the root
directory. In .gitignore
, we can specify which files to
ignore:
# Ignores `A1.csv`, `A1.png`, and `A1.tsv`
echo "A1.csv" > .gitignore
echo "A1.png" >> .gitignore
echo "A1.tsv" >> .gitignore
cat .gitignore
## A1.csv
## A1.png
## A1.tsv
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## ABC/README.md
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
We can use the wildcard * to match any sequence of characters other than /. For example, A* matches all files and directories that start with an A:
# Ignores `A1.csv`, `A1.png`, `A1.tsv`, and `ABC/` directory using `*`
echo "A*" > .gitignore
cat .gitignore
## A*
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
To specify a file or pattern not to match (i.e., not to ignore), put ! at the start of the line:
# Ignores all files and directories starting with `A` except `A1.png`
echo "A*" > .gitignore
echo "!A1.png" >> .gitignore
cat .gitignore
## A*
## !A1.png
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## A1.png
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
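When it is unclear why a file is (or is not) being ignored, `git check-ignore` reports which pattern decided the outcome. The following is a minimal sketch that reuses the `A*` and `!A1.png` patterns in a throwaway repository; the directory name `demo` is hypothetical.

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
touch A1.csv A1.png B2.csv
printf 'A*\n!A1.png\n' > .gitignore

# -v prints <source>:<line number>:<pattern>, a tab, then the path
match=$(git check-ignore -v A1.csv)
echo "$match"

# With -q, the exit status is 0 if the path is ignored and 1 if it is not
for f in A1.csv A1.png B2.csv; do
  if git check-ignore -q "$f"; then
    echo "$f: ignored"
  else
    echo "$f: not ignored"
  fi
done
```

Here `A1.csv` is ignored by line 1 of `.gitignore`, `A1.png` is rescued by the `!` pattern on line 2, and `B2.csv` matches no pattern at all.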
To only match directories, add a trailing / to your pattern:
# Ignores `ABC/` directory only and not files starting with `A`
echo "A*/" > .gitignore
cat .gitignore
## A*/
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## A1.csv
## A1.png
## A1.tsv
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
The ? can be used to match any one character that is not a /:
# Ignores `A1.csv` and `A1.tsv` using `?`
echo "A1.?sv" > .gitignore
cat .gitignore
## A1.?sv
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## A1.png
## ABC/README.md
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
Square brackets [] can be used to specify a set of characters, any one of which will match:
# Ignores `A1.csv` and `A1.tsv` using `[]`
echo "A1.[ct]sv" > .gitignore
cat .gitignore
## A1.[ct]sv
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## A1.png
## ABC/README.md
## B2.csv
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
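Ranges can also be specified using square brackets [] to match a range of characters (e.g., alphabetic or numeric). Note that the lowercase range [a-z] matches the uppercase A and B below only when Git compares file names case-insensitively (core.ignorecase is typically set to true on Windows and macOS); on a case-sensitive file system, use [A-Z] instead: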
# Ignores `A1.csv` and `B2.csv` using ranges
echo "[a-z][0-9].csv" > .gitignore
cat .gitignore
## [a-z][0-9].csv
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## A1.png
## A1.tsv
## ABC/README.md
## de.csv
##
## nothing added to commit but untracked files present (use "git add" to track)
Ranges can also be alphanumeric:
# Ignores `A1.csv`, `B2.csv`, and `de.csv` using ranges
echo "[a-z][a-z0-9].csv" > .gitignore
cat .gitignore
## [a-z][a-z0-9].csv
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## A1.png
## A1.tsv
## ABC/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
Let’s say we have a git repository with the following files and directory structure:
## .
## |____blank.txt
## |____doc
## | |____README.md
## |____intput
## | |____doc
## | | |____README.md
## |____output
## | |____doc
## | | |____README.md
## | |____plots
## | | |____doc
## | | | |____README.md
## |____README.md
When we check git status
, all the README.md
files are untracked:
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## README.md
## doc/README.md
## intput/doc/README.md
## output/doc/README.md
## output/plots/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
Let’s create a .gitignore file in the root directory. If we add README.md to .gitignore, all the README.md files will be ignored, because a pattern without a / matches at every level of the directory tree:
# Ignores all `README.md`
echo "README.md" > .gitignore
cat .gitignore
## README.md
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
##
## nothing added to commit but untracked files present (use "git add" to track)
If we add doc/README.md to the .gitignore file, only the doc/README.md in the project root directory (i.e., where the .gitignore file is located) will be ignored because there’s a / in the middle of the pattern:
# Ignores `doc/README.md` in the root directory where `.gitignore` is located
echo "doc/README.md" > .gitignore
cat .gitignore
## doc/README.md
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## README.md
## intput/doc/README.md
## output/doc/README.md
## output/plots/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
Similarly, if we start a pattern with / like /doc, it will only match things in the directory where the .gitignore file is located (i.e., not the doc folders nested within the subdirectories):
# Ignores `doc/` in the root directory where `.gitignore` is located
echo "/doc" > .gitignore
cat .gitignore
## /doc
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## README.md
## intput/doc/README.md
## output/doc/README.md
## output/plots/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
In order to match things in subdirectories, we need to add **/ to the start of the pattern. So **/doc will match doc both in the directory where .gitignore is located and in its subdirectories:
# Ignores all `doc/` in both the root directory and within subdirectories
echo "**/doc" > .gitignore
cat .gitignore
## **/doc
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
Having /**/ in the middle of the path will match zero or more directories:
# Ignores `output/doc` and `output/plots/doc`
echo "output/**/doc" > .gitignore
cat .gitignore
## output/**/doc
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## README.md
## doc/README.md
## intput/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
Having just * in the path will match exactly one directory:
# Ignores `output/plots/doc`
echo "output/*/doc" > .gitignore
cat .gitignore
## output/*/doc
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## README.md
## doc/README.md
## intput/doc/README.md
## output/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
The pattern */doc matches every doc/ folder that is inside some arbitrary folder (indicated by *) located in the root directory (i.e., the directory where .gitignore is located):
# Ignores `output/doc` and `intput/doc`
echo "*/doc" > .gitignore
cat .gitignore
## */doc
# Check status
git status -u
## On branch main
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
## .gitignore
## README.md
## doc/README.md
## output/plots/doc/README.md
##
## nothing added to commit but untracked files present (use "git add" to track)
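The directory patterns above can be compared side by side. The following is a minimal sketch that recreates the directory tree (directory names copied from the listing above, including its spelling of `intput`) and asks `git check-ignore` which `doc` folders each pattern ignores. The helper function `ignored_by` is hypothetical and written only for this illustration.

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo

# Recreate the four nested doc/ folders, each with a README.md
dirs="doc intput/doc output/doc output/plots/doc"
for d in $dirs; do
  mkdir -p "$d" && touch "$d/README.md"
done

# Write a one-pattern .gitignore and print the doc/ folders it ignores
ignored_by() {
  echo "$1" > .gitignore
  for d in $dirs; do
    if git check-ignore -q "$d/README.md"; then
      printf '%s ' "$d"
    fi
  done
}

for pattern in '/doc' '**/doc' 'output/**/doc' 'output/*/doc' '*/doc'; do
  echo "$pattern ignores: $(ignored_by "$pattern")"
done
```

Running one loop over all five patterns makes the differences between a leading `/`, `**/`, `/**/`, and a single `*` easier to see than checking `git status` after each change.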
Two primary ways people collaborate on GitHub:
What is a fork?
Why use forks?
Credit: Shaumik Daityari
Overview of fork and pull workflow:
There is a central_repo repository
You fork central_repo on GitHub to create your_fork
At this point, the your_fork repository only exists on GitHub
clone the your_fork repository to your local machine
add changes to index/staging area
commit changes to local your_fork repository
push changes to remote your_fork repository
Open a pull request asking that the changes in the your_fork repository be incorporated into the main central_repo repository
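The command-line part of this workflow can be sketched locally. This is only a simulation under stated assumptions: forking and opening the pull request happen on GitHub, so a bare clone stands in for the fork here, and the paths `central_repo`, `your_fork.git`, and `your_fork` are local stand-ins rather than real GitHub URLs.

```shell
tmp=$(mktemp -d) && cd "$tmp"

# Stand-in for the central repository on GitHub
git init -q central_repo
cd central_repo
git checkout -q -b main
git config user.name "username"
git config user.email "email@example.com"
echo "# project" > README.md
git add README.md && git commit -q -m "initial commit"
cd ..

# Forking on GitHub creates a copy under your account; a bare clone imitates that
git clone -q --bare central_repo your_fork.git

# Clone your fork to your local machine
git clone -q your_fork.git your_fork
cd your_fork
git config user.name "username"
git config user.email "email@example.com"

# add, commit, and push changes to your fork
echo "my contribution" >> README.md
git add README.md
git commit -q -m "update README"
git push -q origin main

# The fork now has the new commit; the central repository does not (yet)
local_tip=$(git rev-parse main)
fork_tip=$(git -C ../your_fork.git rev-parse main)
central_tip=$(git -C ../central_repo rev-parse main)
echo "fork: $fork_tip"
echo "central: $central_tip"
```

The difference between the fork's `main` and the central repository's `main` is what a pull request would propose to merge.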