Exploratory Data Analysis matplotlib
Matplotlib is a Python module for 2D and 3D plotting that can be used from Python scripts, the IPython shell, or other interfaces such as IPython notebooks. It covers most of what you would expect from a tool for generating scientific figures. For example, you can use \LaTeX within the figure text and save figures in many different formats, such as .eps, .png, and .pdf.
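As a rough illustration of the two features just mentioned, the sketch below labels a figure with LaTeX-style math and saves it in several formats. It assumes matplotlib's built-in mathtext renderer (which handles a subset of LaTeX without a LaTeX installation) and a non-interactive backend; the output directory and the file name `sine` are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
# Text between dollar signs is rendered with matplotlib's mathtext.
ax.set_title(r"$y = \sin(x)$")
ax.set_xlabel(r"$x$ (radians)")

# The file extension selects the output format.
out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []
for ext in ("png", "pdf", "eps"):
    path = os.path.join(out_dir, "sine." + ext)  # hypothetical file name
    fig.savefig(path)
    saved.append(path)
plt.close(fig)
```

In a notebook you would normally drop the `matplotlib.use("Agg")` line and save next to the notebook instead of into a temporary directory.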
Import matplotlib
We import pyplot, which is part of matplotlib. pyplot is a collection of functions that make matplotlib work like MATLAB.

    %matplotlib inline
    import matplotlib.pyplot as plt
    from matplotlib import rcParams  # rcParams controls the default values for plotting in matplotlib

For example, to change the font size, line width and figure size:

    rcParams['font.size'] = 14
    rcParams['lines.linewidth'] = 2
    rcParams['figure.figsize'] = (10, 6)

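Changes to rcParams stay in effect for the rest of the session. The following sketch shows one way to limit an override to a single figure with `plt.rc_context`, and to restore matplotlib's built-in defaults with `matplotlib.rcdefaults()`. The values used here are arbitrary, and the Agg backend is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
from matplotlib import rcParams

# Session-wide defaults, as in the text above.
rcParams['font.size'] = 14
rcParams['lines.linewidth'] = 2

# rc_context applies the override only inside the with-block.
with plt.rc_context(rc={'lines.linewidth': 4}):
    inside = rcParams['lines.linewidth']
    fig, ax = plt.subplots()
    (thick_line,) = ax.plot([0, 1], [0, 1])  # picks up the temporary line width
    plt.close(fig)

after = rcParams['lines.linewidth']  # back to the session-wide value

# Restore matplotlib's built-in defaults for the rest of the session.
matplotlib.rcdefaults()
```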
Note: When you use matplotlib to plot figures in an IPython notebook, you can have the figures embedded in the notebook (instead of opening in a new window) by running the %matplotlib inline magic command.
Plotting with pyplot
You call a series of pyplot functions to build up and modify a figure, and then ask Python to show the figure.

    import numpy as np
    x = np.arange(0, 5, 0.1)
    y = np.sin(x)
    plt.plot(x, y)

A slightly more complicated version is

    plt.plot(x, y, 'b-', x, y*2, 'rs', x, y*4, 'g^')
    plt.legend(["sin(x)", "2 sin(x)", "4 sin(x)"], loc="upper right")
    plt.xlabel("X axis")
    plt.ylabel("Sine curve")
    plt.title('Pretty sine curve')
    plt.show()

Here we plot three sine curves with different amplitudes. The string following each pair of (x, y) values sets the color and the line or marker style. The first curve is a solid blue line, the second is drawn with red squares, and the third with green triangles.
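To make the format strings more concrete, here is a minimal sketch comparing the shorthand with the equivalent keyword arguments. It uses the `ax.plot` method rather than `plt.plot`, and the variable names `short`, `explicit` and `dashed` are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 5, 0.1)
y = np.sin(x)

fig, ax = plt.subplots()

# 'rs' = red ('r') square markers ('s'); with no line style given, no line is drawn.
(short,) = ax.plot(x, y, 'rs')

# The same appearance spelled out with keyword arguments.
(explicit,) = ax.plot(x, 2 * y, color='red', marker='s', linestyle='None')

# Color, marker and line style can be combined: green triangles joined by a dashed line.
(dashed,) = ax.plot(x, 4 * y, 'g^--')

plt.close(fig)
```

The keyword form is longer but is often easier to read once a plot needs more than a color and a marker.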
Other statistical graphs
| Matplotlib function | Description |
|---|---|
| plt.plot(x, y) | Plot x, y values as lines |
| plt.scatter(x, y) | Scatter plots as dots |
| plt.hist(x, bins) | Histogram with defined bin cutoff values |
| plt.bar(pos, heights) | Bar plot |
| plt.barh(pos, heights) | Bar plot (horizontal) |
| plt.pie() | Pie chart |
| plt.boxplot([np.random.rand(1000), np.random.rand(1000) + 1]) | Boxplot |
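
A few of the functions in the table can be tried out side by side. The sketch below is only an illustration: the random data, the bin edges and the bar heights are made up, and it uses the equivalent `Axes` methods so that each plot lands in its own panel.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the made-up data is reproducible
data = rng.normal(size=1000)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Histogram with explicit bin cutoff values (7 edges give 6 bins).
counts, edges, _ = axes[0].hist(data, bins=[-4, -2, -1, 0, 1, 2, 4])

# Bar plot: positions on the x axis and the height of each bar.
bars = axes[1].bar([0, 1, 2], [3, 7, 5])

# Boxplot of two samples, the second shifted up by 1.
box = axes[2].boxplot([rng.random(1000), rng.random(1000) + 1])

plt.close(fig)
```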
Customizing the matplotlib pyplot
| matplotlib.pyplot function | Description |
|---|---|
| plt.title() | Title of plot |
| plt.xlabel(), plt.ylabel() | X and Y axis labels |
| plt.xlim(a, b), plt.ylim(a, b) | X and Y axis limits |
| plt.legend(labels, loc="upper right") | Labels and location of legend |
| plt.xticks(loc, labels), plt.yticks(loc, labels) | X and Y axis ticks (locations and labels). Use the option rotation=45 to rotate the labels by 45 degrees |
| plt.grid() | Controls the axis grids (on, off, colors, etc.) |
| plt.annotate() | Create a piece of text referring to a data point (annotation) |
| plt.subplot(nrows, ncols, plot_number) | Defines which subplot to plot next (e.g. plt.subplot(121) selects the first subplot in a grid with 1 row and 2 columns; the last number is the specific subplot) |
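
The customization functions above can be combined on one figure. The following is a small sketch, not a recommended layout: the titles, tick labels and bar heights are placeholders chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 5, 0.1)
y = np.sin(x)

fig = plt.figure(figsize=(10, 4))

# Left panel: 1 row, 2 columns, subplot 1 (same as plt.subplot(121)).
plt.subplot(1, 2, 1)
plt.plot(x, y, label="sin(x)")
plt.title("Left panel")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim(0, 5)
plt.ylim(-1.5, 1.5)
plt.legend(loc="upper right")
plt.grid(True)
# Annotate the largest sampled value with an arrow pointing at it.
peak = x[np.argmax(y)]
plt.annotate("maximum", xy=(peak, y.max()), xytext=(peak + 1, 1.3),
             arrowprops={"arrowstyle": "->"})

# Right panel: subplot 2, with custom tick locations and rotated labels.
plt.subplot(1, 2, 2)
plt.bar([0, 1, 2], [3, 7, 5])
plt.xticks([0, 1, 2], ["low", "mid", "high"], rotation=45)
plt.title("Right panel")

left_ax, right_ax = fig.axes
```

Each pyplot call acts on the subplot selected most recently, which is why the second `plt.subplot` call comes before the bar plot and its tick settings.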