Summary of "Introduction to Data Visualization in Python" course on Datacamp

Led by Team Anaconda, Data Science Training

This course provides a stronger foundation in data visualization in Python, broader coverage of the Matplotlib library and an overview of seaborn, a package for statistical graphics. Topics covered include customizing graphics, plotting two-dimensional arrays (like pseudocolor plots, contour plots, and images), statistical graphics (like visualizing distributions and regressions), and working with time series and image data.

Customizing plots

Review of basic plotting with Matplotlib, customizing plots using Matplotlib. Overlaying plots, making subplots, controlling axes, adding legends and annotations, and using different plot styles.

  • Plotting many graphs on common axes
  • Creating axes within a figure: axes([x_lo, y_lo, width, height])
    • Units between 0 and 1 (figure dimensions)
  • Creating subplots within a figure: subplot(nrows, ncols, nsubplot)
    • Row-wise from top left
    • Indexed from 1
  • Controlling axis extents: axis([xmin, xmax, ymin, ymax])
  • Control over individual axis extents: xlim([xmin, xmax]), ylim([ymin, ymax])
    • Other axis() options: axis('off'), axis('equal'), axis('square'), axis('tight')
import matplotlib.pyplot as plt

# t, temperature, and dewpoint are assumed to be 1D arrays of equal length
plt.plot(t, temperature, 'r')
plt.plot(t, dewpoint, 'b')
plt.xlabel('Date')
plt.title('Temperature & Dew Point')
plt.show()

plt.axes([0.05,0.05,0.425,0.9])
plt.plot(t, temperature, 'r')
plt.xlabel('Date')
plt.title('Temperature')
plt.axes([0.525,0.05,0.425,0.9])
plt.plot(t, dewpoint, 'b')
plt.xlabel('Date')
plt.title('Dew Point')
plt.show()

plt.subplot(2, 1, 1)
plt.plot(t, temperature, 'r')
plt.xlabel('Date')
plt.title('Temperature')
plt.subplot(2, 1, 2)
plt.plot(t, dewpoint, 'b')
plt.xlabel('Date')
plt.title('Dew Point')
plt.tight_layout()
plt.show()

plt.plot(yr, gdp)
plt.xlabel('Year')
plt.ylabel('Billions of Dollars')
plt.title('US Gross Domestic Product')
plt.xlim((1947, 1957))
plt.ylim((0, 1000))
plt.show()
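The snippets above use xlim() and ylim() but not axis() itself. A minimal sketch of setting and reading back all four extents at once, using synthetic data rather than the course's GDP series:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (an assumption, not from the course)
x = np.linspace(0, 10, 101)
y = np.sin(x)

plt.figure()
plt.plot(x, y)

# Set all four extents at once: [xmin, xmax, ymin, ymax]
plt.axis([2, 8, -1.5, 1.5])

# Called with no arguments, axis() returns the current extents
xmin, xmax, ymin, ymax = plt.axis()
print(xmin, xmax, ymin, ymax)

# xlim() changes only the x extents; the y extents stay as they were
plt.xlim(0, 10)
xlims, ylims = plt.xlim(), plt.ylim()
print(xlims, ylims)

plt.show()
```

Calling axis() or xlim() without arguments is a convenient way to check what extents a plot actually ended up with.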
  • Legends provide labels for overlaid points and curves
    • Legend locations: upper left, center right, best, ...
  • Plot annotations
    • Text labels and arrows using annotate() method
    • Options for annotate(): xy, xytext, arrowprops
    • Keyword arrowprops is a dict of arrow properties: width, color, etc.
  • Working with plot styles
  • Style sheets in matplotlib
    • Defaults for lines, points, backgrounds, etc.
    • Switch styles globally with plt.style.use()
    • plt.style.available: list of styles
import matplotlib.pyplot as plt

plt.scatter(setosa_len, setosa_wid, marker='o', color='red', label='setosa')
plt.scatter(versicolor_len, versicolor_wid, marker='o', color='green', label='versicolor')
plt.scatter(virginica_len, virginica_wid, marker='o', color='blue', label='virginica')
plt.legend(loc='upper right')
plt.title('Iris data')
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.show()

plt.annotate('setosa', xy=(5.0, 3.5))
plt.annotate('virginica', xy=(7.25, 3.5))
plt.annotate('versicolor', xy=(5.0, 2.0))
plt.show()

plt.style.use('ggplot')
plt.style.use('fivethirtyeight')
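
The annotate() calls above place text only. A minimal sketch of the xytext and arrowprops options from the list, plus a style applied to a single figure through plt.style.context() instead of globally. The points and arrow settings are made up for illustration:

```python
import matplotlib.pyplot as plt

# Apply a style only inside this block instead of switching it globally
with plt.style.context('ggplot'):
    plt.figure()
    plt.scatter([5.0, 7.25], [3.5, 3.5])   # made-up points, not the Iris data

    # xy is the point being labeled, xytext is where the label text sits,
    # and arrowprops describes the arrow drawn from the text to the point
    note = plt.annotate('setosa', xy=(5.0, 3.5), xytext=(5.5, 3.8),
                        arrowprops={'color': 'red', 'width': 1.5})
    plt.show()

# plt.style.available is a plain list of style names
print('ggplot' in plt.style.available)
```

Without arrowprops, annotate() draws no arrow even if xytext is given.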

Plotting 2D arrays

Various techniques for visualizing two-dimensional arrays. The use, presentation, and orientation of grids for representing two-variable functions followed by discussions of pseudocolor plots, contour plots, color maps, two-dimensional histograms, and images.

  • Using meshgrid()
    • Orientations of 2D arrays & images
    • Color bar
    • Color map
  • Visualizing bivariate functions
    • Contour plots
  • More examples at [http://matplotlib.org/gallery.html]
import numpy as np
import matplotlib.pyplot as plt
u = np.linspace(-2, 2, 3)
v = np.linspace(-1, 1, 5)

X,Y = np.meshgrid(u, v)
Z = X**2/25 + Y**2/4

plt.set_cmap('gray')
plt.pcolor(Z)
plt.colorbar()

plt.show()
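
The outline lists contour plots, but the snippet above only shows pcolor(). A minimal sketch of contour() and contourf() on the same function, with a finer grid than the one above (the grid sizes and the number of levels are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

u = np.linspace(-2, 2, 65)   # x-coordinates
v = np.linspace(-1, 1, 33)   # y-coordinates
X, Y = np.meshgrid(u, v)
Z = X**2 / 25 + Y**2 / 4

# Rows of Z follow v (vertical), columns follow u (horizontal)
print(Z.shape)   # (33, 65)

plt.subplot(2, 1, 1)
plt.contour(X, Y, Z, 20)                    # 20 contour lines
plt.subplot(2, 1, 2)
plt.contourf(X, Y, Z, 20, cmap='viridis')   # filled contours
plt.colorbar()
plt.tight_layout()
plt.show()
```

Passing X and Y along with Z labels the axes with the real coordinates; a call like pcolor(Z) on its own uses row and column indices instead.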
  • Visualizing bivariate distributions
    • Histograms in 1D
    • Bins in 2D
      • Different shapes available for binning points
      • Common choices: rectangles & hexagons
counts, bins, patches = plt.hist(x, bins=25)
plt.show()

# x & y are 1D arrays of same length
plt.hist2d(x, y, bins=(10, 20))
plt.colorbar()
plt.xlabel('weight ($\\mathrm{kg}$)')
plt.ylabel('acceleration ($\\mathrm{ms}^{-2}$)')
plt.show()
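
Hexagons are mentioned above as the other common bin shape. A minimal sketch using plt.hexbin() on synthetic data (the data and gridsize are assumptions chosen for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)       # fixed seed, synthetic data only
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.5, size=1000)

# gridsize=(nx, ny): number of hexagons in the x and y directions
hexes = plt.hexbin(x, y, gridsize=(15, 10))
plt.colorbar()
plt.xlabel('x')
plt.ylabel('y')
plt.show()

# The per-hexagon counts are available from the returned collection
counts = hexes.get_array()
print(int(counts.sum()))
```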
  • Working with images
    • Grayscale images: rectangular 2D arrays
    • Color images: typically three 2D arrays (channels)
  • Loading images
  • Reduction to gray-scale image
  • Uneven samples, adjusting aspect ratio, adjusting extent
img = plt.imread('sunflower.jpg')

plt.imshow(img)
plt.axis('off')
plt.show()

collapsed = img.mean(axis=2)

plt.set_cmap('gray')
plt.imshow(collapsed, cmap='gray')
plt.axis('off')
plt.show()

uneven = collapsed[::4,::2]
plt.imshow(uneven, aspect=2.0)
plt.axis('off')
plt.show()

plt.imshow(uneven, cmap='gray', extent=(0, 640, 0, 480))
plt.axis('off')
plt.show()
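
The steps above depend on an image file. A minimal sketch of the same array operations on a small synthetic RGB array, which makes the shape changes and the choice of aspect easier to follow (the array and the sampling steps are made up):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 4x6 RGB image with values in [0, 1]; stands in for a loaded file
rng = np.random.default_rng(1)
img = rng.random((4, 6, 3))

gray = img.mean(axis=2)     # average the three channels
uneven = gray[::2, ::3]     # keep every 2nd row and every 3rd column
print(img.shape, gray.shape, uneven.shape)   # (4, 6, 3) (4, 6) (2, 2)

# Each remaining pixel stands for 2 rows by 3 columns of the original,
# so a pixel height/width ratio of 2/3 restores the proportions
plt.imshow(uneven, cmap='gray', aspect=2 / 3)
plt.axis('off')
plt.show()
```

The same reasoning gives aspect=2.0 for the [::4, ::2] sampling used above: four rows by two columns per remaining pixel.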

Statistical plots with Seaborn

High-level tour of the Seaborn plotting library for producing statistical graphics in Python. Tools for computing and visualizing linear regressions, as well as tools for visualizing univariate distributions (like strip, swarm, and violin plots) and multivariate distributions (like joint plots, pair plots, and heatmaps). Grouping categories in plots.

  • Linear regression plots
    • 95% confidence interval highlighted
    • Grouping factors (same plot)
    • Grouping factors (subplots)
    • Residual plots
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')
sns.lmplot(x='total_bill', y='tip', data=tips)
sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex', palette='Set1')
sns.lmplot(x='total_bill', y='tip', data=tips, col='sex')
plt.show()

# The tips dataset has no 'age' or 'fare' columns; use columns it does have
sns.residplot(x='total_bill', y='tip', data=tips, color='indianred')
plt.show()
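
A residual plot shows, for each observation, the gap between the observed value and the fitted regression line. A minimal sketch of that quantity computed with NumPy on synthetic data; this illustrates the idea only and is not seaborn's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 50, 200)                                # synthetic "total_bill"
y = 0.15 * x + 1.0 + rng.normal(scale=0.5, size=x.size)    # synthetic "tip"

# Ordinary least-squares line through the points
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

# Residuals: observed minus fitted; they average to zero for this kind of fit
residuals = y - fitted
print(slope, intercept)
print(abs(residuals.mean()) < 1e-9)
```

If the residuals show a visible trend or curve when plotted against x, a straight line is probably a poor model for the data.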
  • Visualizing univariate ("one variable") distributions
    • Strip plots
    • Swarm plots
    • Violin plots
    • Combining plots
sns.stripplot(y='tip', data=tips)
sns.stripplot(x='day', y='tip', data=tips)
sns.stripplot(x='day', y='tip', data=tips, size=4, jitter=True)
plt.ylabel('tip ($)')
plt.show()

sns.swarmplot(x='day', y='tip', data=tips)
sns.swarmplot(x='day', y='tip', data=tips, hue='sex')
sns.swarmplot(x='tip', y='day', data=tips, hue='sex', orient='h')
plt.xlabel('tip ($)')  # tip is on the x-axis in the horizontal plot
plt.show()

plt.subplot(1,2,1)
sns.boxplot(x='day', y='tip', data=tips)
plt.ylabel('tip ($)')

plt.subplot(1,2,2)
sns.violinplot(x='day', y='tip', data=tips)
plt.ylabel('tip ($)')

plt.tight_layout()
plt.show()

sns.violinplot(x='day', y='tip', data=tips, inner=None, color='lightgray')
sns.stripplot(x='day', y='tip', data=tips, size=4, jitter=True)
plt.ylabel('tip ($)')
plt.show()
  • Visualizing bivariate and multivariate distributions
    • Joint plots
    • Pair plots
    • Heat maps
sns.jointplot(x= 'total_bill', y= 'tip', data=tips)
sns.jointplot(x= 'total_bill', y= 'tip', data=tips, kind='kde')
plt.show()

sns.pairplot(tips, hue='sex')
plt.show()

sns.heatmap(covariance)
plt.title('Covariance plot')
plt.show()
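
The heatmap snippet above assumes a covariance matrix already exists. A minimal sketch of one way to build it with pandas; the DataFrame and its column names are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
a = rng.normal(size=500)
df = pd.DataFrame({
    'a': a,
    'b': 2 * a + rng.normal(scale=0.1, size=500),   # strongly tied to a
    'c': rng.normal(size=500),                      # independent of a
})

# Symmetric 3x3 DataFrame, labeled by column name on both axes
covariance = df.cov()
print(covariance.round(2))
```

Passing this DataFrame to sns.heatmap() draws one colored cell per entry, with the column names as tick labels.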

Analyzing time series and images

This chapter examines time series data and images. Customize plots of stock data, generate histograms of image pixel intensities, and enhance image contrast through histogram equalization.

  • Time series
    • pandas time series: datetime as index
    • Datetime: represents periods or time-stamps
    • Datetime index: specialized slicing
    • weather.loc['2010-07-04']
    • weather.loc['2010-03':'2010-04']
    • weather.loc['2010-05']
  • Plotting time series slices
  • Time series with moving windows
  • Averages, medians, standard deviations
temperature = weather['Temperature']
march_apr = temperature['2010-03':'2010-04']  # data of March & April 2010 only

plt.plot(temperature['2010-01'], color='red', label='Temperature')
dewpoint = weather['DewPoint']
plt.plot(dewpoint['2010-01'], color='blue', label='Dewpoint')
plt.legend(loc='upper right')
plt.xticks(rotation=60)
plt.show()

jan = temperature['2010-01']
dates = jan.index[::96]  # Pick every 4th day
labels = dates.strftime('%b %d') # Make formatted labels

plt.xticks(dates, labels, rotation=60)
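
The outline mentions moving windows, but none of the snippets above show them. A minimal sketch using pandas rolling() on synthetic hourly data; the series and the 24-hour window are assumptions, not the course's weather data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic hourly temperatures for January 2010 with a daily cycle
index = pd.date_range('2010-01-01', '2010-01-31 23:00', freq='h')
hours = np.arange(index.size)
temperature = pd.Series(10 + 5 * np.sin(2 * np.pi * hours / 24), index=index)

# Partial-string slicing on the datetime index (both end dates included)
first_week = temperature['2010-01-01':'2010-01-07']
print(first_week.size)   # 168

# 24-hour moving windows; the first 23 values are NaN (window not yet full)
smoothed = temperature.rolling(window=24).mean()
middle = temperature.rolling(window=24).median()
spread = temperature.rolling(window=24).std()

plt.plot(temperature, color='lightgray', label='hourly')
plt.plot(smoothed, color='red', label='24 h mean')
plt.legend(loc='upper right')
plt.xticks(rotation=60)
plt.show()
```

Because the window spans exactly one daily cycle, the moving average flattens the oscillation; shorter windows keep more of the short-term variation.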
  • Histogram equalization in images
    • Equalized image
    • Image histograms
  • Rescaling the image
    • Original and rescaled histograms
    • Image histogram & CDF
    • Equalizing intensity values
    • Equalized histogram & CDF
orig = plt.imread('low-contrast-moon.jpg')
pixels = orig.flatten()
# density=True replaces the normed=True keyword removed in Matplotlib 3.1
plt.hist(pixels, bins=256, range=(0,256), density=True, color='blue', alpha=0.3)
plt.show()

minval, maxval = orig.min(), orig.max()
print(minval, maxval)

# Rescale the 2D image (not the flattened pixels) so imshow() can display it
rescaled = (255/(maxval-minval)) * (orig - minval)
print(rescaled.min(), rescaled.max())

plt.imshow(rescaled)
plt.axis('off')
plt.show()

plt.hist(orig.flatten(), bins=256, range=(0,255), density=True, color='blue', alpha=0.2)
plt.hist(rescaled.flatten(), bins=256, range=(0,255), density=True, color='green', alpha=0.2)
plt.legend(['original', 'rescaled'])
plt.show()

plt.hist(pixels, bins=256, range=(0,256), density=True, color='blue', alpha=0.3)
plt.twinx()
orig_cdf, bins, patches = plt.hist(pixels, cumulative=True, bins=256, range=(0,256), density=True, color='red', alpha=0.3)
plt.title('Image histogram and CDF')
plt.xlim((0, 255))
plt.show()

new_pixels = np.interp(pixels,bins[:-1], orig_cdf*255)
new = new_pixels.reshape(orig.shape)
plt.imshow(new)
plt.axis('off')
plt.title('Equalized image')
plt.show()

plt.hist(new_pixels, bins=256, range=(0,256), density=True, color='blue', alpha=0.3)
plt.twinx()
plt.hist(new_pixels, bins=256, range=(0,256), density=True, cumulative=True, color='red', alpha=0.1)
plt.title('Equalized image histogram and CDF')
plt.xlim((0, 255))
plt.show()
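
The equalization above takes the CDF from the values returned by plt.hist(). A minimal sketch of the same mapping computed with np.histogram() on a synthetic low-contrast image, so it runs without the image file and without plotting. The image and its intensity range are made up:

```python
import numpy as np

# Synthetic low-contrast "image": 8-bit values squeezed into 100..150
rng = np.random.default_rng(4)
orig = rng.integers(100, 151, size=(64, 64)).astype(np.uint8)
pixels = orig.flatten()

# Histogram and normalized CDF over the 256 possible intensities
hist, bins = np.histogram(pixels, bins=256, range=(0, 256))
cdf = hist.cumsum() / pixels.size          # rises from 0 to 1

# Map each intensity through the CDF, then stretch the result to 0..255
new_pixels = np.interp(pixels, bins[:-1], cdf * 255)
new = new_pixels.reshape(orig.shape)

print(pixels.min(), pixels.max())
print(new_pixels.min(), new_pixels.max())
```

The equalized values spread over most of the 0..255 range even though the input occupies only a narrow band.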