Led by Team Anaconda, Data Science Training
This course provides a stronger foundation in data visualization in Python, broader coverage of the Matplotlib library and an overview of seaborn, a package for statistical graphics. Topics covered include customizing graphics, plotting two-dimensional arrays (like pseudocolor plots, contour plots, and images), statistical graphics (like visualizing distributions and regressions), and working with time series and image data.
Review of basic plotting with Matplotlib, customizing plots using Matplotlib. Overlaying plots, making subplots, controlling axes, adding legends and annotations, and using different plot styles.
- Plotting many graphs on common axes
- Creating axes within a figure:
axes([x_lo, y_lo, width, height])
- Units between 0 and 1 (figure dimensions)
- Creating subplots within a figure:
subplot(nrows, ncols, nsubplot)
- Row-wise from top left
- Indexed from 1
- Controlling axis extents:
axis([xmin, xmax, ymin, ymax])
- Control over individual axis extents:
xlim([xmin, xmax]), ylim([ymin, ymax])
- Other axis() options:
axis('off'), axis('equal'), axis('square'), axis('tight')
import matplotlib.pyplot as plt
plt.plot(t, temperature, 'r')
plt.plot(t, dewpoint, 'b')
plt.xlabel('Date')
plt.title('Temperature & Dew Point')
plt.show()
plt.axes([0.05,0.05,0.425,0.9])
plt.plot(t, temperature, 'r')
plt.xlabel('Date')
plt.title('Temperature')
plt.axes([0.525,0.05,0.425,0.9])
plt.plot(t, dewpoint, 'b')
plt.xlabel('Date')
plt.title('Dew Point')
plt.show()
plt.subplot(2, 1, 1)
plt.plot(t, temperature, 'r')
plt.xlabel('Date')
plt.title('Temperature')
plt.subplot(2, 1, 2)
plt.plot(t, dewpoint, 'b')
plt.xlabel('Date')
plt.title('Dew Point')
plt.tight_layout()
plt.show()
plt.plot(yr, gdp)
plt.xlabel('Year')
plt.ylabel('Billions of Dollars')
plt.title('US Gross Domestic Product')
plt.xlim((1947, 1957))
plt.ylim((0, 1000))
plt.show()
- Legends provide labels for overlaid points and curves
- Legend locations: 'upper left', 'center right', 'best', ...
- Plot annotations
- Text labels and arrows using the annotate() method
- Options for annotate(): xy, xytext, arrowprops
- Keyword arrowprops is a dict of arrow properties: width, color, etc.
- Working with plot styles
- Style sheets in matplotlib
- Defaults for lines, points, backgrounds, etc.
- Switch styles globally with plt.style.use()
- plt.style.available: list of styles
import matplotlib.pyplot as plt
plt.scatter(setosa_len, setosa_wid, marker='o', color='red', label='setosa')
plt.scatter(versicolor_len, versicolor_wid, marker='o', color='green', label='versicolor')
plt.scatter(virginica_len, virginica_wid, marker='o', color='blue', label='virginica')
plt.legend(loc='upper right')
plt.title('Iris data')
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.show()
plt.annotate('setosa', xy=(5.0, 3.5))
plt.annotate('virginica', xy=(7.25, 3.5))
plt.annotate('versicolor', xy=(5.0, 2.0))
plt.show()
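The annotate() calls above place text only. A minimal sketch of the xytext and arrowprops options from the list follows; the three points are hypothetical stand-ins for the iris measurements, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in points (not the real iris data).
x = [5.0, 5.9, 6.6]
y = [3.4, 2.8, 3.0]

fig, ax = plt.subplots()
ax.scatter(x, y, color='red')

# xy is the point being labelled, xytext is where the label text is drawn,
# and arrowprops is a dict of arrow properties (colour and shaft width here).
note = ax.annotate('setosa', xy=(5.0, 3.4), xytext=(5.4, 3.7),
                   arrowprops={'color': 'black', 'width': 1})
# plt.show() would display the figure in an interactive session.
```

Leaving out arrowprops gives a plain text label at xytext (or at xy when xytext is also omitted).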
plt.style.use('ggplot')
plt.style.use('fivethirtyeight')
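A short sketch of listing the available style sheets and applying one temporarily. It assumes only the styles that ship with Matplotlib; plt.style.context() is used here as a non-global alternative to plt.style.use(), which the notes themselves do not cover.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

# plt.style.available is a list of the style-sheet names Matplotlib knows about.
styles = sorted(plt.style.available)
print(styles[:5])

# plt.style.use() changes the defaults globally; plt.style.context() limits
# the change to the enclosed block, which helps when comparing styles.
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title('ggplot style')
```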
Various techniques for visualizing two-dimensional arrays. The use, presentation, and orientation of grids for representing two-variable functions followed by discussions of pseudocolor plots, contour plots, color maps, two-dimensional histograms, and images.
- Using meshgrid()
- Orientations of 2D arrays & images
- Color bar
- Color map
- Visualizing bivariate functions
- Contour plots
- More examples at [http://matplotlib.org/gallery.html]
import numpy as np
u = np.linspace(-2, 2, 3)
v = np.linspace(-1, 1, 5)
X,Y = np.meshgrid(u, v)
Z = X**2/25 + Y**2/4
plt.set_cmap('gray')
plt.pcolor(Z)
plt.colorbar()
plt.show()
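The orientation of the arrays returned by meshgrid() is easy to get backwards, and the contour plots listed above have no example in these notes. The sketch below reuses the same u and v grid; the level count of 5 and the colormap are arbitrary choices.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

u = np.linspace(-2, 2, 3)   # 3 samples along x
v = np.linspace(-1, 1, 5)   # 5 samples along y
X, Y = np.meshgrid(u, v)
Z = X**2 / 25 + Y**2 / 4

# Rows follow v (the y direction); columns follow u (the x direction).
print(X.shape, Y.shape, Z.shape)   # (5, 3) (5, 3) (5, 3)

fig, ax = plt.subplots()
contours = ax.contour(X, Y, Z, 5, cmap='viridis')  # 5 = requested number of levels
fig.colorbar(contours, ax=ax)
```

Passing X and Y to contour() labels the axes with the real coordinates; calling it with Z alone would use array indices instead.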
- Visualizing bivariate distributions
- Histograms in 1D
- Bins in 2D
- Different shapes available for binning points
- Common choices: rectangles & hexagons
counts, bins, patches = plt.hist(x, bins=25)
plt.show()
# x & y are 1D arrays of same length
plt.hist2d(x, y, bins=(10, 20))
plt.colorbar()
plt.xlabel('weight ($\\mathrm{kg}$)')
plt.ylabel('acceleration ($\\mathrm{ms}^{-2}$)')
plt.show()
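The list above mentions hexagons as an alternative bin shape. A minimal sketch with plt.hexbin follows, using synthetic data (the arrays x and y here are random placeholders, not the course's auto dataset) and an arbitrary gridsize.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Synthetic, correlated 1D arrays of the same length.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.5, size=1000)

fig, ax = plt.subplots()
# gridsize gives the number of hexagons in the x and y directions.
hexes = ax.hexbin(x, y, gridsize=(15, 10))
fig.colorbar(hexes, ax=ax)
ax.set_xlabel('x')
ax.set_ylabel('y')
```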
- Working with images
- Grayscale images: rectangular 2D arrays
- Color images: typically three 2D arrays (channels)
- Loading images
- Reduction to gray-scale image
- Uneven samples, adjusting aspect ratio, adjusting extent
img = plt.imread('sunflower.jpg')
plt.imshow(img)
plt.axis('off')
plt.show()
collapsed = img.mean(axis=2)
plt.set_cmap('gray')
plt.imshow(collapsed, cmap='gray')
plt.axis('off')
plt.show()
uneven = collapsed[::4,::2]
plt.imshow(uneven, aspect=2.0)
plt.axis('off')
plt.show()
plt.imshow(uneven, cmap='gray', extent=(0,640,0,480))
plt.axis('off')
plt.show()
High-level tour of the Seaborn plotting library for producing statistical graphics in Python. Tools for computing and visualizing linear regressions, as well as tools for visualizing univariate distributions (like strip, swarm, and violin plots) and multivariate distributions (like joint plots, pair plots, and heatmaps). Grouping categories in plots.
- Linear regression plots
- 95% confidence interval highlighted
- Grouping factors (same plot)
- Grouping factors (subplots)
- Residual plots
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lmplot(x='total_bill', y='tip', data=tips)
sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex', palette='Set1')
sns.lmplot(x='total_bill', y='tip', data=tips, col='sex')
plt.show()
sns.residplot(x='total_bill', y='tip', data=tips, color='indianred')
plt.show()
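To make the regression and residual plots less of a black box, here is a rough NumPy-only sketch of a first-order least-squares fit and its residuals, which is the kind of quantity those plots display. The data is synthetic and the variable names only mimic the tips columns; this is not how seaborn is implemented internally.

```python
import numpy as np

# Synthetic stand-in for total_bill and tip (not the real tips dataset).
rng = np.random.default_rng(1)
total_bill = rng.uniform(5, 50, size=200)
tip = 1.0 + 0.1 * total_bill + rng.normal(scale=0.5, size=200)

# Fit tip = slope * total_bill + intercept by least squares.
slope, intercept = np.polyfit(total_bill, tip, deg=1)
fitted = slope * total_bill + intercept

# Residuals are the vertical distances between the data and the fitted line.
residuals = tip - fitted
print(round(slope, 2), round(intercept, 2))
```

A residual plot that shows no visible trend suggests the straight-line model is a reasonable description of the data.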
- Visualizing univariate ("one variable") distributions
- Strip plots
- Swarm plots
- Violin plots
- Combining plots
sns.stripplot(y='tip', data=tips)
sns.stripplot(x='day', y='tip', data=tips)
sns.stripplot(x='day', y='tip', data=tips, size=4, jitter=True)
plt.ylabel('tip ($)')
plt.show()
sns.swarmplot(x='day', y='tip', data=tips)
sns.swarmplot(x='day', y='tip', data=tips, hue='sex')
sns.swarmplot(x='tip', y='day', data=tips, hue='sex', orient='h')
plt.xlabel('tip ($)')
plt.show()
plt.subplot(1,2,1)
sns.boxplot(x='day', y='tip', data=tips)
plt.ylabel('tip ($)')
plt.subplot(1,2,2)
sns.violinplot(x='day', y='tip', data=tips)
plt.ylabel('tip ($)')
plt.tight_layout()
plt.show()
sns.violinplot(x='day', y='tip', data=tips, inner=None, color='lightgray')
sns.stripplot(x='day', y='tip', data=tips, size=4, jitter=True)
plt.ylabel('tip ($)')
plt.show()
- Visualizing bivariate and multivariate distributions
- Joint plots
- Pair plots
- Heat maps
sns.jointplot(x= 'total_bill', y= 'tip', data=tips)
sns.jointplot(x= 'total_bill', y= 'tip', data=tips, kind='kde')
plt.show()
sns.pairplot(tips, hue='sex')
plt.show()
sns.heatmap(covariance)
plt.title('Covariance plot')
plt.show()
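The heatmap call above uses a covariance matrix that these notes never construct. One possible way to build it is DataFrame.cov() from pandas; the frame below is synthetic, with column names chosen to resemble the numeric columns of tips.

```python
import numpy as np
import pandas as pd

# Synthetic numeric columns (placeholders, not the real tips data).
rng = np.random.default_rng(2)
total_bill = rng.uniform(5, 50, size=200)
df = pd.DataFrame({
    'total_bill': total_bill,
    'tip': 1.0 + 0.1 * total_bill + rng.normal(scale=0.5, size=200),
    'size': rng.integers(1, 7, size=200),
})

# Pairwise covariance of the columns: a square, symmetric DataFrame.
covariance = df.cov()
print(covariance.round(2))
```

The resulting frame can be passed straight to sns.heatmap(covariance), and its row and column labels become the tick labels.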
This chapter examines time series data and images. Customize plots of stock data, generate histograms of image pixel intensities, and enhance image contrast through histogram equalization.
- Time series
- pandas time series: datetime as index
- Datetime: represents periods or time-stamps
- Datetime index: specialized slicing
- weather.loc['2010-07-04']
- weather.loc['2010-03':'2010-04']
- weather.loc['2010-05']
- Plotting time series slices
- Time series with moving windows
- Averages, medians, standard deviations
temperature = weather['Temperature']
march_apr = temperature['2010-03':'2010-04'] # data of March & April 2010 only
plt.plot(temperature['2010-01'], color='red', label='Temperature')
dewpoint = weather['DewPoint']
plt.plot(dewpoint['2010-01'], color='blue', label='Dewpoint')
plt.legend(loc='upper right')
plt.xticks(rotation=60)
plt.show()
jan = temperature['2010-01']
dates = jan.index[::96] # Pick every 96th entry (every 4th day if the data is hourly)
labels = dates.strftime('%b %d') # Make formatted labels
plt.xticks(dates, labels, rotation=60)
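The moving-window statistics listed above (averages, medians, standard deviations) have no code in these notes. A minimal sketch with pandas rolling() follows, assuming hourly samples; the temperature series is a synthetic daily sine wave, not the course's weather data.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series for January 2010 (a placeholder for the weather data).
index = pd.date_range('2010-01-01', periods=31 * 24, freq=pd.Timedelta(hours=1))
temperature = pd.Series(
    10 + 5 * np.sin(2 * np.pi * np.arange(len(index)) / 24),
    index=index,
)

# A 24-sample window covers one day of hourly data.
daily_mean = temperature.rolling(window=24).mean()
daily_median = temperature.rolling(window=24).median()
daily_std = temperature.rolling(window=24).std()

# Partial-string indexing on the datetime index selects a single day.
one_day = temperature.loc['2010-01-05']
print(len(one_day), daily_mean.isna().sum())
```

The first window - 1 entries of each rolling result are NaN because the window is not yet full; each result can be plotted with plt.plot() like any other series.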
- Histogram equalization in images
- Equalized image
- Image histograms
- Rescaling the image
- Original and rescaled histograms
- Image histogram & CDF
- Equalizing intensity values
- Equalized histogram & CDF
orig = plt.imread('low-contrast-moon.jpg')
pixels = orig.flatten()
plt.hist(pixels, bins=256, range=(0,256), density=True, color='blue', alpha=0.3)
plt.show()
minval, maxval = orig.min(), orig.max()
print(minval, maxval)
rescaled = (255/(maxval-minval)) * (orig - minval) # keep the 2D shape so imshow() can display it
print(rescaled.min(), rescaled.max())
plt.imshow(rescaled)
plt.axis('off')
plt.show()
plt.hist(orig.flatten(), bins=256, range=(0,255), density=True, color='blue', alpha=0.2)
plt.hist(rescaled.flatten(), bins=256, range=(0,255), density=True, color='green', alpha=0.2)
plt.legend(['original', 'rescaled'])
plt.show()
plt.hist(pixels, bins=256, range=(0,256), density=True, color='blue', alpha=0.3)
plt.twinx()
orig_cdf, bins, patches = plt.hist(pixels, cumulative=True, bins=256, range=(0,256), density=True, color='red', alpha=0.3)
plt.title('Image histogram and CDF')
plt.xlim((0, 255))
plt.show()
new_pixels = np.interp(pixels,bins[:-1], orig_cdf*255)
new = new_pixels.reshape(orig.shape)
plt.imshow(new)
plt.axis('off')
plt.title('Equalized image')
plt.show()
plt.hist(new_pixels, bins=256, range=(0,256), density=True, color='blue', alpha=0.3)
plt.twinx()
plt.hist(new_pixels, bins=256, range=(0,256), density=True, cumulative=True, color='red', alpha=0.1)
plt.title('Equalized image histogram and CDF')
plt.xlim((0, 255))
plt.show()
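The equalization steps above depend on an image file and on plt.hist for the CDF. As a self-contained sketch of the same idea, the version below computes the histogram and CDF with np.histogram on a synthetic low-contrast image (random intensities between 100 and 149, an assumption made purely for illustration).

```python
import numpy as np

# Hypothetical low-contrast 8-bit image: intensities confined to 100..149.
rng = np.random.default_rng(3)
orig = rng.integers(100, 150, size=(64, 64)).astype(np.uint8)
pixels = orig.flatten()

# Normalised histogram with one bin per intensity, and its running sum (the CDF).
hist, bins = np.histogram(pixels, bins=256, range=(0, 256), density=True)
cdf = hist.cumsum()   # bin width is 1, so the CDF ends at 1.0

# Map each intensity through the scaled CDF to spread values over 0..255.
new_pixels = np.interp(pixels, bins[:-1], cdf * 255)
new = new_pixels.reshape(orig.shape)
print(orig.min(), orig.max(), new.min(), new.max())
```

The brightest pixel always maps to 255 because the CDF reaches 1.0 there, so the equalized image uses far more of the intensity range than the original.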