@allisonking
Last active January 8, 2020 22:27
Tufte slopegraph with different data sets
scrolling: yes

This is a reusable graph for drawing a slopegraph, a type of chart introduced by Edward Tufte that is best suited to comparing changes across different keys. The goal of this gist is to offer a quick way to turn a typical table of numbers into a table-like structure that keeps the numbers but adds lines, so trends are easy to see.

The graph expects its data as a CSV file in which the first column is a 'key' variable (e.g. cancer types) and the remaining columns are numbers. You only need to specify the ID of the SVG, the path to the data file, and the names for your table headers.
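
Here is a rough illustration of what happens to each CSV row. It is a d3-free sketch of the conversion that slopegraph.js performs in dataConverter. The key column stays a string. Every other column becomes a number rounded to one decimal place, and all columns are renamed to your header labels. The function name convertRow is hypothetical. It assumes the row's keys are in CSV column order, as in the objects d3.csv produces.

```javascript
// Hypothetical standalone version of slopegraph.js's dataConverter.
// Assumes `row` has its keys in CSV column order, like the objects d3.csv yields.
function convertRow(row, header) {
  const columns = Object.keys(row);
  const out = {};
  header.forEach((label, i) => {
    const raw = row[columns[i]];
    // Column 0 is the key (e.g. the cancer type) and stays a string.
    // Every other column is parsed as a number and rounded to one decimal place.
    out[label] = i === 0 ? raw : Math.round(Number(raw) * 10) / 10;
  });
  return out;
}

const parsed = {
  name: "Harry Potter",
  canon_percentage: "28.35771773928133",
  fanfiction_percentage: "25.926886490636863",
};
// Canon Percentage becomes 28.4, Fan Fiction Percentage becomes 25.9
console.log(convertRow(parsed, ["Character", "Canon Percentage", "Fan Fiction Percentage"]));
```

Because the labels come from the header option rather than the CSV, the column names in the file itself can be anything.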

The first example is from Tufte's Beautiful Evidence. The dataset itself is from Hermann Brenner, "Long-term survival rates of cancer patients achieved by the end of the 20th century: a period analysis," The Lancet, 360 (October 12, 2002), 1131-1135.

The second example is a dataset I collected by scraping metadata from Harry Potter fan fiction. Treat this graph only as a notional example of swapping in a different data set and getting a working graph without any code changes. Reading actual comparisons from it would be misleading, because 'canon percentage' and 'fan fiction percentage' are on different scales. Canon percentage is how many times a character is mentioned in the books out of all character mentions. Fan fiction percentage is the percentage of times a character is tagged as a main character in a fan fiction.

Possible improvements:

  • There is a lot of room to add interactivity.
  • The code reserves enough space for the worst case, in which every row's span (the difference between its largest and smallest values) equals the widest span in the data set. The spans in the cancer set are fairly similar, but the spans in the Harry Potter set vary much more. For example, the difference between canon and fan fiction Voldemort is much smaller than the difference between canon and fan fiction Draco Malfoy. The graph is sized so that every character could take as much space as Draco Malfoy if necessary. This leaves extra space at the bottom, because many of the characters need far less. An improvement would be to determine the space ratios dynamically.
  • Another type of slopegraph could place all values on the same axis, as described by Dave Nash in this thread.
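
The dynamic-space improvement above could look roughly like the following sketch, which assumes you already know each row's spread (max minus min). Currently miniScaler maps the widest spread to height / rows. Instead, this sketch shares out the height left after the text lines among the rows, in proportion to their own spreads. rowOffsets and its parameters are hypothetical. startHeight and textSize mean the same as the variables of those names in slopegraph.js.

```javascript
// Hypothetical layout helper. It gives each row vertical space proportional to its
// own spread (max - min), so the rows exactly fill `height`. The current code instead
// reserves height / rows for every row, as if each had the widest spread.
function rowOffsets(spreads, height, startHeight, textSize) {
  const total = spreads.reduce((sum, s) => sum + s, 0);
  // Space left for the slopes once every row has its line of text.
  const free = height - startHeight - spreads.length * textSize;
  // Pixels per data unit; 0 if there is nothing to spread out or no room left.
  const unit = total > 0 && free > 0 ? free / total : 0;
  const offsets = [];
  let y = startHeight;
  for (const spread of spreads) {
    offsets.push(y);               // top of this row
    y += textSize + spread * unit; // next row starts below this row's slope
  }
  return { offsets, unit };
}

const { offsets, unit } = rowOffsets([10, 0, 30], 100, 0, 10);
console.log(offsets, unit); // [ 0, 27.5, 37.5 ] 1.75
```

Within a row, a label would then sit at (max - value) * unit below the row's top, which is the job miniScaler does in the current code.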
cancer_survival.csv

cancer_type,5 year,10 year,15 year,20 year
Prostate,98.8,95.2,87.1,81.1
Thyroid,96.0,95.8,94.0,95.4
Testis,94.7,94.0,91.1,88.2
Melanomas,89.0,86.7,83.5,82.8
Breast,86.4,78.3,71.3,65.0
Hodgkin’s disease,85.1,79.8,73.8,67.1
"Corpus uteri, uterus",84.3,83.2,80.8,79.2
"Urinary, bladder",82.1,76.2,70.3,67.9
"Cervix, uteri",70.5,64.1,62.8,60.0
Larynx,68.8,56.7,54.8,38.7
Rectum,62.6,55.2,51.8,49.2
"Kidney, renal pelvis",61.8,55.4,49.8,47.3
Colon,61.7,55.4,53.9,52.3
Non-Hodgkin’s,57.8,46.3,38.3,34.3
"Oral cavity, pharynx",56.7,44.3,37.5,33.0
Ovary,55.0,49.3,49.9,49.6
Leukemia,42.5,32.4,29.7,26.2
"Brain, nervous system",32.0,29.2,27.6,26.1
Multiple myeloma,29.5,12.7,7.0,4.8
Stomach,23.8,19.4,19.0,14.9
Lung and bronchus,15.0,10.6,8.1,6.5
Esophagus,14.2,7.9,7.7,5.4
"Liver, bile duct",7.5,5.8,6.3,7.6
Pancreas,4.0,3.0,2.7,2.7

hp_canon_ff.csv

name,canon_percentage,fanfiction_percentage
Harry Potter,28.35771773928133,25.926886490636863
Ron Weasley,9.669987732998236,6.490124984003752
Hermione Granger,8.206923376118241,21.711214435012585
Albus Dumbledore,3.6217574723992466,1.7697393678283498
Rubeus Hagrid,3.027855069862071,0.12660495670349355
Severus Snape,2.9261287137599856,8.640702981700295
Voldemort,2.688268557580109,3.387109158384166
Sirius Black,2.200580438620112,8.102546602397304
Draco Malfoy,1.7921790383867396,18.863626668941688

<!DOCTYPE html>
<meta charset="utf-8">
<style>
  .label-background {
    fill: white;
    fill-opacity: 1;
  }
  body {
    font-family: Helvetica;
  }
</style>
<body>
  <h3>Cancer Survival Rates</h3>
  <svg width="700" height="700" id="cancer-container"></svg>
  <h3>Harry Potter Canon Mentions vs. Fan Fiction Tags</h3>
  <svg width="700" height="700" id="hp-container"></svg>
  <script src="https://d3js.org/d3.v4.min.js"></script>
  <script src="slopegraph.js"></script>
  <script>
    SlopeGraph({
      container: '#cancer-container',
      data: 'cancer_survival.csv',
      header: ['Cancer Type', '5 years', '10 years', '15 years', '20 years']
    });
    SlopeGraph({
      container: '#hp-container',
      data: 'hp_canon_ff.csv',
      header: ['Character', 'Canon Percentage', 'Fan Fiction Percentage']
    });
  </script>
</body>
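
Each SlopeGraph call above needs exactly one header label per CSV column. This is because dataConverter pairs header[i] with the i-th column. A shorter header silently drops columns, and a longer one produces NaN values. Below is a hypothetical pre-flight check (checkHeader is not part of the gist). It assumes you have the CSV's column names; d3-dsv exposes them as the columns property of the parsed rows.

```javascript
// Hypothetical check. SlopeGraph's dataConverter maps header[i] onto the i-th CSV
// column, so the header must have exactly one distinct label per column.
function checkHeader(columns, header) {
  const problems = [];
  if (header.length < columns.length) {
    problems.push("missing labels for columns: " + columns.slice(header.length).join(", "));
  } else if (header.length > columns.length) {
    problems.push("labels without a column (values would be NaN): " + header.slice(columns.length).join(", "));
  }
  if (new Set(header).size !== header.length) {
    // Rows are objects keyed by label, so a repeated label overwrites an earlier column.
    problems.push("duplicate labels: later columns would overwrite earlier ones");
  }
  return problems;
}

console.log(checkHeader(
  ["name", "canon_percentage", "fanfiction_percentage"],
  ["Character", "Canon Percentage"]
)); // [ 'missing labels for columns: fanfiction_percentage' ]
```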
slopegraph.js

function SlopeGraph(options) {
  var svg = d3.select(options.container),
      width = +svg.attr("width"),
      height = +svg.attr("height");

  var margin = {top: 15, right: 50, bottom: 0, left: 100};

  var xScale = d3.scaleBand().rangeRound([0, width - margin.right]).padding(0.1),
      miniScaler = d3.scaleLinear();

  var focus = svg.append('g')
      .attr('transform', 'translate(' + margin.left + ',' + margin.top + ')');

  // convert each CSV row: keep the key column as a string, parse the other columns
  // as numbers, and rename every column to its label in options.header
  var dataConverter = function(d) {
    var dataKeys = d3.keys(d);
    var row = {};
    options.header.forEach(function(label, i) {
      if (i !== 0) {
        // round to one decimal place; change the factor of 10 if more places are needed
        row[label] = Math.round(+d[dataKeys[i]] * 10) / 10;
      } else {
        row[label] = d[dataKeys[i]];
      }
    });
    return row;
  };

  var data;
  d3.csv(options.data, dataConverter, function(input_data) {
    data = input_data;

    // write out the header row, one column per header label, positioned with xScale
    var headers = d3.keys(data[0]);
    xScale.domain(options.header);
    focus.selectAll('.table-header')
        .data(headers)
        .enter()
      .append('text')
        .attr('text-anchor', 'middle')
        .attr('class', 'table-header')
        .attr('x', function(d) { return xScale(d); })
        .text(function(d) { return d; });

    // y position of the first row
    var startHeight = 30;
    // approximate height of one line of text in px (15 seems to match the font size)
    var textSize = 15;
    // extra height added below the text in each label's background rect
    var padding = 5;

    // mini scale for positions within a row: the widest spread in the data set
    // maps to height / (number of rows)
    miniScaler.range([0, height / data.length]);
    miniScaler.domain([0, getMaxSpread(data)]);

    // transform each row into {key, value} entries, annotated with the row's
    // first numeric value and its maximum value
    var lineData = [];
    data.forEach(function(d) {
      var entries = d3.entries(d);
      entries.forEach(function(entry) {
        entry['first'] = getFirstColumnValue(d);
        entry['max'] = getMaxValue(d);
      });
      lineData.push(entries);
    });

    // create a g for each row, shifted down by the space the previous rows used
    // (one line of text plus each row's scaled spread)
    var rows = focus.selectAll('.table-row')
        .data(data)
        .enter()
      .append('g')
        .attr('class', 'table-row')
        .attr('transform', function(d) {
          var spread = getSpread(d);
          var transform = 'translate(0, ' + startHeight + ')';
          startHeight = startHeight + textSize + miniScaler(spread);
          return transform;
        });

    // add the slope line
    var lines = rows.append('path')
        .datum(function(d, i) {
          // drop the key entry; only the numeric columns are part of the line
          return lineData[i].slice(1, lineData[i].length);
        })
        .attr('class', 'link')
        .attr('d', line)
        .attr('stroke', 'blue')
        .attr('stroke-width', 2)
        .attr('fill', 'none');

    // add a group per cell to hold the text and its background rect
    var datums = rows.selectAll('.datum')
        .data(function(d, i) {
          return lineData[i];
        })
        .enter().append('g')
        .attr('class', 'datum');

    // add the background rect (white via .label-background), drawn over the line
    var rectWidth = 40; // seems like a good amount for 3 digits (e.g. 12.3)
    var rectBackgrounds = datums.append('rect')
        .attr('class', 'label-background')
        .attr('y', -textSize)
        .attr('x', -rectWidth / 2)
        .attr('transform', transformDatum)
        .attr('width', rectWidth)
        .attr('height', textSize + padding);

    // add the text
    var textLabels = datums.append('text')
        .attr('class', 'datum-text')
        .attr('text-anchor', 'middle')
        .attr('transform', transformDatum)
        .text(function(d) { return d.value; });
  });

  // work out where to translate each datum's text and rect
  function transformDatum(d) {
    if (d.key == options.header[0]) {
      // the key label lines up with the row's first numeric value
      return 'translate(' + xScale(d.key) + ',' + miniScaler(d.max - d.first) + ')';
    }
    var y = d.max - d.value;
    return 'translate(' + xScale(d.key) + ',' + miniScaler(y) + ')';
  }

  // generates the slope line for each row
  var line = d3.line()
      .x(function(d) {
        return xScale(d.key) + 10;
      })
      .y(function(d) {
        return miniScaler(d.max - d.value) - 5;
      })
      .curve(d3.curveLinear);

  // helper functions to calculate maxima and spreads

  // the numeric values of a row in column order (the key column is skipped, so a
  // numeric-looking key such as a year can never be mistaken for a data value)
  function getValues(d) {
    return options.header.slice(1).map(function(label) { return d[label]; });
  }

  // value of the first numeric column in a row
  function getFirstColumnValue(d) {
    return d[options.header[1]];
  }

  // largest numeric value in a row
  function getMaxValue(d) {
    return d3.max(getValues(d));
  }

  // max - min across a row's numeric values
  function getSpread(d) {
    var extent = d3.extent(getValues(d));
    return extent[1] - extent[0];
  }

  // widest spread of any row in the data set
  function getMaxSpread(dataset) {
    return d3.max(dataset, getSpread);
  }
}
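
The vertical placement in transformDatum and line reduces to one linear map per row. miniScaler sends a spread of 0 to 0 px and the data set's widest spread to height / rows px. Each value is drawn at miniScaler(max - value), so the row's largest value sits at the top of the row. The key label uses the first numeric value, so it lines up with the start of the slope. Below is a d3-free sketch of that arithmetic. labelOffsets is a hypothetical name, and the sketch leaves out the +10 and -5 pixel nudges that line applies.

```javascript
// Hypothetical d3-free version of the per-row y positions in slopegraph.js.
// `values` are one row's numeric columns, `maxSpread` is the widest max - min of
// any row, and `rowBudget` is height / numberOfRows (the range given to miniScaler).
function labelOffsets(values, maxSpread, rowBudget) {
  const max = Math.max(...values);
  // The linear map d3.scaleLinear().domain([0, maxSpread]).range([0, rowBudget])
  // computes; a zero maxSpread is mapped to 0 here.
  const miniScaler = (x) => (maxSpread === 0 ? 0 : (x * rowBudget) / maxSpread);
  return {
    key: miniScaler(max - values[0]), // key label aligns with the first value
    values: values.map((v) => miniScaler(max - v)),
  };
}

console.log(labelOffsets([4, 8, 6], 8, 16)); // { key: 8, values: [ 8, 0, 4 ] }
```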