@schnerd
Last active September 15, 2016 13:09
D3 Cluster Scale
license: mit

D3 Cluster Scale is a scale that clusters values around natural breaks in the input domain. Like d3.scaleQuantile and d3.scaleQuantize, it maps a continuous domain to a discrete range. However, quantile and quantize often produce suboptimal results when the distribution of the data is skewed or contains extreme outliers.

This block demonstrates how the three scale types perform for the same input domain. Notice how quantile splits 1, 2, 4, and 5 across three different quantiles, even though the differences between them are very small. quantize, on the other hand, does a decent job of representing the magnitude of the values, but does not use the full color spectrum. The cluster scale uses the full spectrum, and each cluster contains values of a similar magnitude.
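To make the comparison concrete, here is a minimal sketch that assigns the sample values to six buckets using equal-width (quantize-style) and equal-count (quantile-style) thresholds. It is written without d3 and only approximates my understanding of how those scales pick thresholds; the function names `bucketOf`, `quantizeThresholds`, and `quantileThresholds` are hypothetical.

```javascript
// Sketch only: re-implements the threshold rules independently of d3.
// All function names below are hypothetical helpers, not d3 APIs.
var values = [1, 2, 4, 5, 12, 43, 52, 123, 234, 1244];
var bucketCount = 6;

// Index of the first threshold strictly greater than x (a "bisect right").
function bucketOf(thresholds, x) {
  var i = 0;
  while (i < thresholds.length && thresholds[i] <= x) i++;
  return i;
}

// Quantize-style: cut [min, max] into n equal-width intervals.
function quantizeThresholds(min, max, n) {
  var thresholds = [];
  for (var i = 1; i < n; i++) {
    thresholds.push(min + (max - min) * i / n);
  }
  return thresholds;
}

// Quantile-style: cut the sorted sample at the k/n quantiles,
// interpolating linearly between neighboring values.
function quantileThresholds(sorted, n) {
  var thresholds = [];
  for (var k = 1; k < n; k++) {
    var h = (sorted.length - 1) * k / n;
    var lo = Math.floor(h);
    var hi = Math.min(lo + 1, sorted.length - 1);
    thresholds.push(sorted[lo] + (sorted[hi] - sorted[lo]) * (h - lo));
  }
  return thresholds;
}

var qzThresholds = quantizeThresholds(values[0], values[values.length - 1], bucketCount);
var qlThresholds = quantileThresholds(values, bucketCount);

var quantizeBuckets = values.map(function (v) { return bucketOf(qzThresholds, v); });
var quantileBuckets = values.map(function (v) { return bucketOf(qlThresholds, v); });

console.log('quantize:', quantizeBuckets); // [0, 0, 0, 0, 0, 0, 0, 0, 1, 5]
console.log('quantile:', quantileBuckets); // [0, 0, 1, 2, 2, 3, 4, 4, 5, 5]
```

In this sketch the equal-width thresholds leave buckets 2 through 4 empty, while the equal-count thresholds put 1 and 2, 4, and 5 into three separate buckets, which is the behavior described above.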

While the cluster scale produces nice results, it can be an order of magnitude slower than quantile and quantize. For a sample size of 1000, the algorithm takes roughly 30ms in Chrome 53 on a 2013 MacBook Pro with a 3.1GHz Intel Core i7. The underlying clustering algorithm is Ckmeans from the simple-statistics library (the algorithm originally described by Haizhou Wang and Mingzhou Song). For now, the scale should only be used where the input domain has fewer than 1000 values or where performance is not critical. I'll be investigating faster clustering algorithms to use in this scale; if you have any suggestions, I'd love to hear them.
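For intuition about why optimal one-dimensional clustering costs more than quantile or quantize, here is a minimal sketch of exact 1-D k-means by dynamic programming. It is a simplified O(k·n²) illustration of the general idea, not the simple-statistics implementation and not the code this scale runs; `clusterSorted1D` and its internals are hypothetical names.

```javascript
// Illustrative exact 1-D k-means via dynamic programming, O(k * n^2).
// NOT the simple-statistics Ckmeans code; all names here are hypothetical.
function clusterSorted1D(values, k) {
  var x = values.slice().sort(function (a, b) { return a - b; });
  var n = x.length;
  if (k < 1 || k > n) {
    throw new Error('k must be between 1 and the number of values');
  }

  // Prefix sums give the squared error of any contiguous segment in O(1).
  var sum = [0], sumSq = [0];
  for (var i = 0; i < n; i++) {
    sum.push(sum[i] + x[i]);
    sumSq.push(sumSq[i] + x[i] * x[i]);
  }
  function cost(from, to) { // inclusive indices into x
    var count = to - from + 1;
    var s = sum[to + 1] - sum[from];
    return (sumSq[to + 1] - sumSq[from]) - (s * s) / count;
  }

  // best[m][j]: minimum cost of splitting x[0..j] into m + 1 clusters.
  // start[m][j]: index at which the last of those clusters begins.
  var best = [], start = [];
  for (var m = 0; m < k; m++) {
    best.push(new Array(n).fill(Infinity));
    start.push(new Array(n).fill(0));
    for (var j = m; j < n; j++) {
      if (m === 0) {
        best[0][j] = cost(0, j);
        continue;
      }
      for (var s0 = m; s0 <= j; s0++) {
        var candidate = best[m - 1][s0 - 1] + cost(s0, j);
        if (candidate < best[m][j]) {
          best[m][j] = candidate;
          start[m][j] = s0;
        }
      }
    }
  }

  // Walk the recorded start indices backwards to recover the clusters.
  var clusters = [];
  var end = n - 1;
  for (var level = k - 1; level >= 0; level--) {
    var begin = start[level][end];
    clusters.unshift(x.slice(begin, end + 1));
    end = begin - 1;
  }
  return clusters;
}

var sample = [1, 2, 4, 5, 12, 43, 52, 123, 234, 1244];
var groups = clusterSorted1D(sample, 6);
// The first value of every cluster after the first acts as a break point.
var breaks = groups.slice(1).map(function (g) { return g[0]; });
console.log(groups); // [[1, 2, 4, 5], [12], [43, 52], [123], [234], [1244]]
console.log(breaks); // [12, 43, 123, 234, 1244]
```

The nested loops over `j` and `s0` are what make this quadratic in the number of values, which is consistent with the advice to keep the input domain small.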

d3-scale-cluster is available as an npm module and as a d3 plugin included via a <script> tag.

### Usage

This scale has largely the same API as d3.scaleQuantile, except that it exposes clusters() instead of quantiles().

var scale = d3.scaleCluster()
    .domain([1, 2, 4, 5, 12, 43, 52, 123, 234, 1244])
    .range(['#E5D6EA', '#C798D3', '#9E58AF', '#7F3391', '#581F66', '#30003A']);

var clusters = scale.clusters(); // [12, 43, 123, 234, 1244]
var color = scale(52); // '#9E58AF'
var extent = scale.invertExtent('#9E58AF'); // [43, 123]
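One way to read the values in the comments above is to treat the clusters() output as a sorted list of thresholds, the way a quantile scale treats its quantiles. The sketch below reproduces the two lookups under that assumption without loading d3 or the plugin. The helpers `lookup` and `invertExtentSketch` are hypothetical, and the handling of the first and last color (falling back to the domain extent) is borrowed from how d3.scaleQuantile behaves rather than confirmed for this plugin.

```javascript
// Sketch: treats the clusters() output as thresholds. This is an assumption
// about the behavior, not the plugin's source. Helper names are hypothetical.
var breaks = [12, 43, 123, 234, 1244];
var range = ['#E5D6EA', '#C798D3', '#9E58AF', '#7F3391', '#581F66', '#30003A'];
var domainExtent = [1, 1244]; // min and max of the sample domain

// Map a value to the color whose interval contains it.
function lookup(x) {
  var i = 0;
  while (i < breaks.length && breaks[i] <= x) i++;
  return range[i];
}

// Return the [lower, upper) bounds of the interval mapped to a color.
// Edge intervals fall back to the domain extent (assumed, see lead-in).
function invertExtentSketch(color) {
  var i = range.indexOf(color);
  if (i < 0) return [NaN, NaN];
  return [
    i > 0 ? breaks[i - 1] : domainExtent[0],
    i < breaks.length ? breaks[i] : domainExtent[1]
  ];
}

console.log(lookup(52));                    // '#9E58AF'
console.log(invertExtentSketch('#9E58AF')); // [43, 123]
```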

Enjoy!

html, body {
  font-family: 'Helvetica Neue', Helvetica, sans-serif;
}
body {
  padding: 20px;
}
.scale {
  margin-top: 30px;
}
pre {
  margin: 0;
}
.values {
  text-align: center;
  font-size: 16px;
  display: inline-block;
  padding: 6px 10px;
  margin-left: 4px;
  border-bottom: 16px solid #ccc;
}
h2 {
  margin: 0;
}
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>D3 Cluster Scale</title>
  <link href="index.css" rel="stylesheet" type="text/css">
</head>
<body>
  <div id="root">
    <pre>Values: [1, 2, 4, 5, 12, 43, 52, 123, 234, 1244]</pre>
    <div class="scale" data-scale-type="quantile">
      <div>
        <h2>Quantile</h2>
      </div>
      <div class="legend"></div>
    </div>
    <div class="scale" data-scale-type="quantize">
      <div>
        <h2>Quantize</h2>
      </div>
      <div class="legend"></div>
    </div>
    <div class="scale" data-scale-type="cluster">
      <div>
        <h2>Cluster</h2>
      </div>
      <div class="legend"></div>
    </div>
  </div>
  <script src="https://d3js.org/d3.v4.min.js"></script>
  <script src="https://unpkg.com/d3-scale-cluster@1.0.1/dist/d3-scale-cluster.min.js"></script>
  <script src="index.js"></script>
</body>
</html>
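(function () {
  // Sample input shared by all three scales, and the six-color output range.
  var domain = [1, 2, 4, 5, 12, 43, 52, 123, 234, 1244];
  var range = ['#E5D6EA', '#C798D3', '#9E58AF', '#7F3391', '#581F66', '#30003A'];

  // Build one legend per .scale container. Array.prototype.forEach.call is
  // used because NodeList.forEach is missing in some older browsers.
  var containers = document.querySelectorAll('.scale');
  Array.prototype.forEach.call(containers, function (element) {
    var scaleType = element.dataset.scaleType;
    var scale;
    if (scaleType === 'quantile') {
      scale = d3.scaleQuantile().domain(domain).range(range);
    } else if (scaleType === 'quantize') {
      // quantize expects a [min, max] domain rather than the full sample.
      scale = d3.scaleQuantize().domain(d3.extent(domain)).range(range);
    } else {
      scale = d3.scaleCluster().domain(domain).range(range);
    }

    // Group the input values by the color the scale assigns to them.
    var colors = {}, color;
    for (var i = 0; i < domain.length; i++) {
      color = scale(domain[i]);
      if (!colors[color]) {
        colors[color] = [];
      }
      colors[color].push(domain[i]);
    }

    // Render one swatch per range color; unused colors get an empty swatch.
    element.querySelector('.legend').innerHTML = range.map(function (color) {
      var extent = colors[color] ? colors[color].join(', ') : '&nbsp;';
      return '<span class="values" style="border-color:' + color + '">' + extent + '</span>';
    }).join('');
  });
})();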