Introduction to git/GitHub

Version control is a game changer; or how I learned to love git/GitHub

Stephanie Hicks (https://stephaniehicks.com/), Department of Biostatistics, Johns Hopkins (https://www.jhsph.edu)
08-31-2021

Pre-lecture materials

Read ahead

Before class, you can prepare by reading the following materials:

  1. Happy Git with R from Jenny Bryan
  2. Chapter on git and GitHub in dsbook from Rafael Irizarry

Acknowledgements

Material for this lecture was borrowed and adapted from

Learning objectives

At the end of this lesson you will:

Introduction to git/GitHub

This document gives a brief explanation of GitHub and how we will use it for this course.

git

Git is a version control system for file management. The main idea is that as you (and your collaborators) work on a project, the software tracks and records every change made by anyone.
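To make "tracks and records" concrete, here is a minimal sketch of git recording two versions of one file. It runs in a temporary folder; the file name `analysis.R`, the commit messages, and the name and email are placeholders chosen for illustration, not anything this course requires.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing else on your machine is touched.
repo="$(mktemp -d)"
cd "$repo"

# Turn the folder into a git repository.
git init --quiet

# Placeholder identity, stored only in this repository's configuration.
git config user.name "Example Student"
git config user.email "student@example.com"

# Create a file and record a first snapshot (a commit).
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

# Change the file; git can show what differs from the last commit.
echo "y <- 2" >> analysis.R
git diff

# Record the change as a second snapshot.
git add analysis.R
git commit --quiet -m "Define y"

# List the recorded history, newest first.
git log --oneline
```

Each commit is a snapshot you can return to later, which is what replaces the habit of saving many renamed copies of the same file.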

GitHub

GitHub is a hosting service on the internet for git-aware folders and projects.

Since we will only be using Git through GitHub, I tend not to distinguish between the two. In the following, I refer to all of it as just GitHub. Note that other hosting services for Git exist, e.g., Bitbucket, but GitHub is the most widely used one.

Why use git/GitHub?

You want to use GitHub to avoid this:

Figure 1: How not to use GitHub [image from PhD Comics]

[Source: PhD Comics]

GitHub gives you a clean way to track your projects. It is also very well suited to collaborative work. Historically, version control was used for software development. However, its use has broadened, and it is now used for many types of projects, including data science projects.

To learn a bit more about Git/GitHub and why you might want to use it, read this article by Jenny Bryan.

Note her explanation of what’s special with the README.md file on GitHub.

What to (not) do

GitHub is ideal if you have a project with a fair number of files, most of those files are text files (such as code, LaTeX, (R)markdown, etc.) and different people work on different parts of the project.

GitHub is less useful if you have a lot of non-text files (e.g., Word or PowerPoint) and different team members might want to edit the same document at the same time. In that case, a solution like Google Docs, Word+Dropbox, Word+OneDrive, etc. might be better.

How to use Git/GitHub

Git and GitHub are fundamentally based on commands you type into the command line. Lots of online resources show you how to use the command line. This is the most powerful approach, and the way I almost always interact with git/GitHub. However, many folks find it the most confusing way to use git/GitHub. Alternatively, there are graphical interfaces.

Note: As a student, you can (and should) upgrade to the Pro version of GitHub for free (access to unlimited private repositories is one benefit). See the GitHub Student Developer Pack for how to do this.

Getting Started

One of my favorite resources for getting started with git/GitHub is Happy Git with R by Jenny Bryan:

Figure 2: A screenshot of the Happy Git with R online book from Jenny Bryan.

It truly is one of the best resources out there for getting started with git/GitHub, especially for its integration with RStudio. Therefore, at this point, I encourage all of you to go read through the online book.

Some of you may only need to skim it, others will need to spend some time reading through it. Either way, I will bet that you won’t regret the time investment.

Using git/GitHub in our course

In this course, you will use git/GitHub in the following ways:

  1. Project 0 (optional) - You will create a website introducing yourself to folks in the course and deploy it on GitHub.
  2. Projects 1-3 - You will be asked to practice using git locally (in your own compute environment) to track your changes over time. If you wish (it is optional but highly suggested), you can also practice pushing your project solutions to a private GitHub repository on your GitHub account (e.g., with git add, git commit, git push, git pull, etc.).
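The commands named in item 2 could fit together roughly as sketched below. This is only an illustration under stated assumptions: a local bare repository stands in for your private GitHub repository so the example runs without a network connection, and the folder name `project1`, the file name `solution.Rmd`, and the name and email are all placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"

# A local bare repository stands in for a private GitHub repository here.
# With GitHub you would use the URL of your own repository instead.
git init --quiet --bare "$work/remote.git"

# Your local project folder, with a placeholder identity.
git init --quiet "$work/project1"
cd "$work/project1"
git config user.name "Example Student"
git config user.email "student@example.com"
git remote add origin "$work/remote.git"

# Stage and commit your work locally.
echo "Project 1 solution" > solution.Rmd
git add solution.Rmd
git commit --quiet -m "Start project 1"

# Push the current branch and remember which remote branch it tracks.
git push --quiet -u origin HEAD

# Later: fetch and merge anything new from the remote.
git pull --quiet
```

After the first `git push -u`, a plain `git push` or `git pull` in that folder uses the same remote branch, so the everyday cycle becomes add, commit, push.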

Learning these skills will be useful down the road if you ever work collaboratively on a project (e.g., writing code as a group). In that scenario, you will use the skills you have been practicing in your projects to work together as a team in a single GitHub repository.

Post-lecture materials

Final Questions

Here are some post-lecture questions to help you think about the material discussed.

Questions:

  1. What is version control?

  2. What is the difference between git and GitHub?

  3. What other version control software/tools are available besides git?

Additional Resources

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Hicks (2021, Aug. 31). Statistical Computing: Introduction to git/GitHub. Retrieved from https://stephaniehicks.com/jhustatcomputing2021/posts/2021-08-31-introduction-to-gitgithub/

BibTeX citation

@misc{hicks2021introduction,
  author = {Hicks, Stephanie},
  title = {Statistical Computing: Introduction to git/GitHub},
  url = {https://stephaniehicks.com/jhustatcomputing2021/posts/2021-08-31-introduction-to-gitgithub/},
  year = {2021}
}