@zhuang-hao-ming
Last active July 11, 2018 03:23
Drawing a parallel coordinates plot with matplotlib

Approach

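  1. To plot n-dimensional data, create a figure with 1 row and n-1 columns of subplots. When subplots in the same row share a y axis, only the first subplot shows its y tick labels. A parallel coordinates plot needs every subplot to have its own y tick labels, so sharey must be set to False.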

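  2. For each dimension, compute the minimum, the maximum, and the range (max - min); these are used for the tick labels. Normalize each dimension to [0, 1]; the normalized values are used for plotting.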

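  3. Draw the multidimensional data as line plots on every subplot, one line per record, with x = [0, 1, 2, ..., n-1] and y = [y1, y2, y3, ..., yn]. Restrict the first subplot to the x range [0, 1], the second to [1, 2], and so on. Each subplot then shows only two dimensions, and the second dimension shown in one subplot is the first dimension shown in the next.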

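  4. Place the y ticks of every subplot using the normalized values, and label them with the original (unnormalized) values. Because there are only n-1 subplots, the tick labels for the last dimension have to be drawn on the right-hand side of the last subplot.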

  5. Set the horizontal spacing between the subplots to 0 so that together they look like a single plot.

  6. Add the legend and the title.
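Steps 2 and 4 can be sketched without any plotting code. The following is a minimal illustration, not the code used in the script further down: the helper names `normalize` and `tick_positions_and_labels` and the sample horsepower values are made up for this example, and the column is assumed not to be constant.

```python
def normalize(values):
    """Rescale values linearly to [0, 1]; also return (min, max, range)."""
    lo, hi = min(values), max(values)
    span = hi - lo  # assumption: the column is not constant, so span != 0
    return [(v - lo) / span for v in values], (lo, hi, span)


def tick_positions_and_labels(lo, span, n_ticks):
    """Evenly spaced positions in [0, 1] and the original values they stand for."""
    positions = [i / (n_ticks - 1) for i in range(n_ticks)]
    labels = [round(lo + span * p, 2) for p in positions]
    return positions, labels


horsepower = [50.0, 100.0, 150.0, 250.0]  # made-up sample column
normalized, (lo, hi, span) = normalize(horsepower)
positions, labels = tick_positions_and_labels(lo, span, n_ticks=5)

print(normalized)  # values that get plotted
print(positions)   # where the y ticks are placed
print(labels)      # what the y ticks display
```

The plotted data and the tick positions live in the same [0, 1] space, while the labels are mapped back to the original units, which is why every axis can show its own scale.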

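Functions

Create a new Axes that shares the x axis with an existing subplot but has its own y axis on the opposite side. This is how a single subplot can be given two y axes.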

plt.twinx(axes[-1])
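As an illustrative sketch, the twin Axes below puts a second, independently scaled y axis on the right of a single subplot. The data, the axis limits, and the off-screen Agg backend are assumptions made for this example; `ax.twinx()` is the method that `plt.twinx(ax)` calls.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

fig, ax_left = plt.subplots()
ax_left.plot([0, 1], [0.2, 0.8])
ax_left.set_xlim(0, 1)
ax_left.set_ylim(0, 1)

# twinx() returns a new Axes drawn over ax_left:
# it shares the x axis and has its own y axis on the right
ax_right = ax_left.twinx()
ax_right.set_ylim(0, 100)  # scaled independently of the left y axis

fig.canvas.draw()
```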

Restrict ticks to fixed positions.

ax.xaxis.set_major_locator(ticker.FixedLocator([x[-2], x[-1]]))
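A small sketch of how the fixed locator is used per subplot: only the left edge of the visible x range gets a tick, labelled with the name of the dimension. The three-column list, the y values, and the Agg backend are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
from matplotlib import ticker

cols = ["displacement", "cylinders", "horsepower"]  # sample dimension names
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0.3, 0.9, 0.5])
ax.set_xlim(0, 1)  # show only the segment between the first two dimensions

# Only x = 0 gets a tick, and it is labelled with the column name
ax.xaxis.set_major_locator(ticker.FixedLocator([0]))
ax.set_xticklabels([cols[0]])

fig.canvas.draw()
```

The number of labels passed to `set_xticklabels` has to match the number of fixed tick positions.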

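Draw the legend. loc can be read as the anchor point on the legend box; here loc=2 is the legend's upper-left corner. bbox_to_anchor can be read as the position that anchor is pinned to, in axes coordinates; here (1.2, 1) is level with the top of the subplot and 20% of its width to the right of its right edge.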

plt.legend(
    [plt.Line2D((0,1),(0,0), color=colours[cat]) for cat in df['mpg'].cat.categories],
    df['mpg'].cat.categories,
    bbox_to_anchor=(1.2, 1), loc=2, borderaxespad=0.)
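The same legend call in a self-contained sketch. The category names "low" and "high" are hypothetical stand-ins for the mpg bins, and the Agg backend is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

colours = {"low": "#2e8ad8", "high": "#cd3785"}  # hypothetical categories

fig, ax = plt.subplots()
# Proxy artists: lines that are never added to the Axes and serve only as legend handles
handles = [Line2D((0, 1), (0, 0), color=c) for c in colours.values()]
legend = ax.legend(
    handles,
    list(colours),
    bbox_to_anchor=(1.2, 1),  # anchor position in axes coordinates
    loc="upper left",         # same as loc=2: the legend corner pinned to the anchor
    borderaxespad=0.0,
)

fig.canvas.draw()
```

Proxy handles are useful here because the plot draws one line per record, so building the legend from the plotted lines would produce one entry per row instead of one per category.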

References

http://benalexkeen.com/parallel-coordinates-in-matplotlib/

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import ticker
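df = pd.read_csv('Auto.csv')
# '?' entries in horsepower become NaN so the column can be parsed as numeric
df['horsepower'] = pd.to_numeric(df['horsepower'].replace('?', np.nan))
# Bin mpg into four categories; each category gets its own line colour
df['mpg'] = pd.cut(df['mpg'], [8, 16, 24, 32, 50])
# Alternative: draw the plot with pandas
# pd.plotting.parallel_coordinates(df[['mpg', 'displacement', 'cylinders', 'horsepower', 'weight', 'acceleration']], 'mpg')
# plt.show()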
cols = ['displacement', 'cylinders', 'horsepower', 'weight', 'acceleration']
x = [i for i, _ in enumerate(cols)]
colours = ['#2e8ad8', '#cd3785', '#c64c00', '#889a00']
# create dict of categories: colours
colours = {df['mpg'].cat.categories[i]: colours[i] for i, _ in enumerate(df['mpg'].cat.categories)}
# Create len(cols) - 1 subplots along the x axis
fig, axes = plt.subplots(1, len(x)-1, sharey=False, figsize=(15,5))
# Get min, max and range for each column
# Normalize the data for each column
min_max_range = {}
for col in cols:
    col_min, col_max = df[col].min(), df[col].max()
    # max - min skips NaN, whereas np.ptp can return NaN when values are missing
    col_range = col_max - col_min
    min_max_range[col] = [col_min, col_max, col_range]
    df[col] = (df[col] - col_min) / col_range
# Plot each row
for i, ax in enumerate(axes):
    for idx in df.index:
        mpg_category = df.loc[idx, 'mpg']
        ax.plot(x, df.loc[idx, cols], color=colours[mpg_category])
    # Show only the segment between dimension i and dimension i+1
    ax.set_xlim([x[i], x[i+1]])
# Set the tick positions and labels on y axis for each plot
# Tick positions based on normalised data
# Tick labels are based on original data
def set_ticks_for_axis(dim, ax, ticks):
    min_val, max_val, val_range = min_max_range[cols[dim]]
    step = val_range / float(ticks-1)
    tick_labels = [round(min_val + step * i, 2) for i in range(ticks)]
    # df holds normalised values at this point
    norm_min = df[cols[dim]].min()
    norm_range = df[cols[dim]].max() - norm_min
    norm_step = norm_range / float(ticks-1)
    tick_positions = [round(norm_min + norm_step * i, 2) for i in range(ticks)]
    ax.yaxis.set_ticks(tick_positions)
    ax.set_yticklabels(tick_labels)
for dim, ax in enumerate(axes):
    ax.xaxis.set_major_locator(ticker.FixedLocator([dim]))
    set_ticks_for_axis(dim, ax, ticks=6)
    ax.set_xticklabels([cols[dim]])
# Move the final axis' ticks to the right-hand side
ax = plt.twinx(axes[-1])
dim = len(axes)
ax.xaxis.set_major_locator(ticker.FixedLocator([x[-2], x[-1]]))
set_ticks_for_axis(dim, ax, ticks=6)
ax.set_xticklabels([cols[-2], cols[-1]])
# Remove space between subplots
plt.subplots_adjust(wspace=0)
# Add legend to plot
plt.legend(
    [plt.Line2D((0,1),(0,0), color=colours[cat]) for cat in df['mpg'].cat.categories],
    df['mpg'].cat.categories,
    bbox_to_anchor=(1.2, 1), loc=2, borderaxespad=0.)
plt.title("Values of car attributes by MPG category")
plt.show()