This is a website dedicated to bringing together tips, tricks, and tools that I have found useful for analyzing data. The main focus is on command line tools and two programming languages: R and Python. I am also working on tutorials for learning specific programming languages (e.g. Python); if you are not familiar with these languages, it may help to work through those first.

Since there are many good data science courses available (for example, the Open Source Data Science Masters and the Johns Hopkins Data Science Specialization Course), I am not going to re-invent the wheel. Instead, this is a collection of tools, tips and tricks, articles, and resources that I have found helpful for data science. In each section, I give examples of relevant tools from the command line, R, and Python. This is definitely a work in progress and is mostly a way for me to keep track of things I find useful when analyzing data. My hope is that others will find it useful too.

Before starting any data analysis, it’s best to be clear on the type of question being asked. This will ultimately guide the decision of which analyses to perform. Jeff Leek defines this idea as “the data analytic question” (see Figure 2.1 in The Elements of Data Analytic Style).

Once you have clearly defined the type of question being asked, the mechanics of the data analysis can generally be broken up into the following tasks:

  1. Scraping data (e.g. from the web or an API)
  2. Data cleaning, wrangling, munging, transforming, reshaping, carpentry, etc.
  3. Exploratory data analysis (e.g. histograms, scatterplots, boxplots)
  4. Applying models (e.g. inference, prediction)
  5. Creating beautiful graphics and summaries of the analysis

Not every task will be needed for your specific analysis. For example, the data you are working with may already be clean (this is increasingly rare, but you never know).
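To make the five tasks listed above a little more concrete, here is a minimal Python sketch that walks through them in order using only the standard library. Everything in it is an assumption made for illustration: the inline `RAW_CSV` string stands in for scraped data, the column names (`height_cm`, `weight_kg`) and the function names are hypothetical, the "model" is just a least-squares line, and the final step prints a text summary rather than a graphic.

```python
import csv
import io
import statistics

# Hypothetical raw data; in a real project this would come from task 1
# (scraping a web page or calling an API). Two rows are deliberately dirty.
RAW_CSV = """height_cm,weight_kg
150,52
160,58
,61
170,66
180,abc
190,80
"""


def load_rows(text):
    """Task 1 stand-in: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Task 2: keep only rows where both fields parse as numbers."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((float(row["height_cm"]), float(row["weight_kg"])))
        except (TypeError, ValueError):
            continue  # drop rows with missing or non-numeric values
    return cleaned


def explore(values):
    """Task 3: a basic numeric summary of one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(pairs):
    """Task 4: ordinary least-squares fit of y = intercept + slope * x."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def summarize(pairs, intercept, slope):
    """Task 5: a plain-text summary in place of a graphic."""
    weights = explore([y for _, y in pairs])
    return (
        f"{weights['n']} clean rows, mean weight {weights['mean']:.1f} kg; "
        f"fitted weight = {intercept:.2f} + {slope:.3f} * height"
    )


if __name__ == "__main__":
    pairs = clean(load_rows(RAW_CSV))
    intercept, slope = fit_line(pairs)
    print(summarize(pairs, intercept, slope))
```

Each function maps onto one task, so skipping a task (for instance, cleaning data that is already clean) amounts to leaving one call out of the chain.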

Scraping data

Tools

Data wrangling

Tools

Exploratory Data Analysis

Tools

Statistical Models

Tools

Visualization and Summaries

Tools

Further reading & resources: