@odanoburu
Last active December 11, 2017 17:07
Programming language history visualization
license: gpl-3.0
height: 500
scrolling: yes
border: yes
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
VISUALIZING PROGRAMMING LANGUAGE HISTORY
bruno cuconato
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
2017-12-11
Table of Contents
─────────────────
1 About
2 Reason for existing
3 Intention
4 Target audience
5 Format
6 Resources
7 Problems and future work
1 About
═══════
We offer a visualization of programming language genealogy, improving
and extending previous work. The basic layout is an interactive graph
showing which programming languages are said to have influenced other
programming languages[1], according to [Wikidata].
On the horizontal axis, we have the year in which a programming
language first appeared. Within each year, nodes are sorted vertically
according to the number of languages they have influenced, but their
exact position is picked at random at every page rendering (only their
order stays the same). The exception is the nodes that have no
information about inception date; those are laid out horizontally on a
single line at the bottom of the graph. As an additional visual cue to
a node's importance, its radius grows linearly with its outdegree, up
to a certain limit.
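The layout rules above can be sketched in plain JavaScript, without D3. This is only an illustration under simplifying assumptions, not the code used by the visualization: the function names (`nodeRadius`, `bandedPositions`), the default radii, and the use of fixed, equally sized bands per language are all choices made for this example.

```javascript
// Hypothetical helpers illustrating the layout rules; names and default
// values are assumptions for this sketch.

// Radius grows linearly with outdegree and is capped at `maxRadius`.
function nodeRadius(outdegree, maxOutdegree, baseRadius = 5, maxRadius = 25) {
  if (maxOutdegree <= 0) return baseRadius;
  const t = Math.min(outdegree / maxOutdegree, 1); // clamp to [0, 1]
  return baseRadius + t * (maxRadius - baseRadius);
}

// Place the languages of one year column: the order is fixed by outdegree,
// but the exact y inside each language's band is random.
function bandedPositions(langs, top, bottom, random = Math.random) {
  const sorted = [...langs].sort((a, b) => a.outdegree - b.outdegree);
  const band = (bottom - top) / sorted.length;
  return sorted.map((lang, i) => ({
    id: lang.id,
    y: top + band * (i + random()), // somewhere inside band i
  }));
}

// Made-up column of three languages sharing the same year.
const column = [
  { id: "A", outdegree: 3 },
  { id: "B", outdegree: 0 },
  { id: "C", outdegree: 7 },
];
const placed = bandedPositions(column, 0, 300);
```

Reloading the page corresponds to calling `bandedPositions` again: the order of the ids stays the same while the `y` values change.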
By virtue of the [d3-sparql] library, our visualization is
self-updating, querying Wikidata's interface at every refresh. This
means that every future improvement to the underlying data (for
instance, the addition of a new programming language) will be
incorporated automatically into the visualization.
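As a rough idea of what querying at every refresh involves, the sketch below builds a request against a SPARQL endpoint with the standard `fetch` API instead of d3-sparql. The helper names (`buildSparqlUrl`, `fetchBindings`) and the short example query are assumptions made for illustration; the endpoint URL is the one used by this project.

```javascript
// Hypothetical helpers; they are not part of d3-sparql or of this project.

// Encode the SPARQL text as the `query` parameter of a GET request.
function buildSparqlUrl(endpoint, query) {
  const url = new URL(endpoint);
  url.searchParams.set("query", query);
  return url.toString();
}

// Fetch the results and flatten each row to plain {variable: value} pairs.
async function fetchBindings(endpoint, query) {
  const response = await fetch(buildSparqlUrl(endpoint, query), {
    headers: { Accept: "application/sparql-results+json" },
  });
  if (!response.ok) {
    throw new Error("SPARQL request failed: " + response.status);
  }
  const json = await response.json();
  // SPARQL JSON results keep rows under results.bindings,
  // with each cell shaped like {type, value}.
  return json.results.bindings.map((row) =>
    Object.fromEntries(
      Object.entries(row).map(([name, cell]) => [name, cell.value])
    )
  );
}

const exampleEndpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql";
const exampleQuery = "SELECT ?lang WHERE { ?lang wdt:P31 wd:Q9143. } LIMIT 5";
const exampleUrl = buildSparqlUrl(exampleEndpoint, exampleQuery);
```

Because the request is issued in the browser on each page load, no copy of the data has to be stored with the visualization.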
At the time of writing (2017-12-11), there are 1437 programming
languages Wikidata knows about, 902 of which have no information about
date of creation. The graph is currently relatively sparse, as there
is only information about 594 relations of influence between
languages.
[Wikidata] https://wikidata.org
[d3-sparql] https://github.com/zazuko/d3-sparql
2 Reason for existing
═════════════════════
The main reason for this visualization is to provide an interface for
visual exploration of programming language history, along the lines of
[this one] or the one in the figure below. We have no
narrative to advance or support.
[./sammet1972.png]
[this one] http://rigaux.org/language-study/diagram.html
3 Intention
═══════════
Our intent with this proposal is to offer a pragmatic and analytical
setting for the visualization of programming language genealogy. We
have no need to convince our readers of anything, or to provoke
emotions in them. In fact, our readers are invited to explore the data
so that they can provoke their own curiosity.
4 Target audience
═════════════════
We have no target audience in mind, although it is unlikely that
anyone without an interest in programming languages will read it. We
therefore assume from our readers a certain level of attraction to the
visualization's theme, which keeps us from having to catch their
interest in the way an ad designer has to catch her audience's
attention.
5 Format
════════
We employ the [D3.js library] to produce our visualization project. We
use [ariutta's pan-and-zoom library for SVG] to add those capabilities
to the SVG produced by D3. Another important component of our project
is the Wikidata query interface, which we access to obtain our
data. All in all, this project requires a working knowledge of
Javascript (programming language), SVG (XML-based vector image
format), HTML (markup language), and of SPARQL (semantic query
language).
Replicability: All code written will be available on GitHub, so that any
replicability requirements should be satisfied.
Scalability: Scalability should not be a problem, as the growth in
programming languages is likely to be limited. There
currently exists a problem of scale, however, in that the
great number of nodes and edges can already hinder browser
performance.
[D3.js library] https://d3js.org
[ariutta's pan-and-zoom library for SVG]
https://github.com/ariutta/svg-pan-zoom
6 Resources
═══════════
Server: the visualization can be done 100% on client-side, so obtaining
a decent server is not necessary. We depend on Wikidata's and on
D3.js's servers, though.
Browsers: we will take D3.js's statement of compatibility at face value,
so we believe our visualization should work on any modern
browser (Google Chrome, Mozilla Firefox, Apple Safari,
Microsoft Edge). Tests will be carried out solely on Chrome.
7 Problems and future work
══════════════════════════
Node positioning: We decided not to use a graph where nodes were placed
by force simulation, using D3's module /d3-force/. We
had a technical reason for this: in this kind of
visualization, a node's position has no meaning in
itself, which prevents us from using it to convey
information to the viewer. We also had a practical
motive: the simulation necessary to determine the
nodes' positions is quite expensive in terms of
resources, especially if there are many nodes – this
made our visualization very slow to render.
Wikidata's data: We found several shortcomings in Wikidata's data: some
programming languages had more than one inception date,
while most of them had none. The first problem was
overcome by choosing the earliest date provided, while
the latter demanded special positions in the graph for
these nodes.
Wikidata's query interface: there is an open bug (see tickets [T178564]
and [T165228]) in Wikidata's query interface
in which the query results have the wrong
encoding. This does not prevent the
visualization, but does create some weird
characters in the names of a few languages.
Exploration: the wealth of information made it very difficult to create
a visualization that was easy to navigate. We added zooming
and panning to the visualization in order to improve this
situation, but as a trade-off we lost the capability of
searching for a given node. We are not satisfied with our
approach to the problem.
[T178564] https://phabricator.wikimedia.org/T178564
[T165228] https://phabricator.wikimedia.org/T165228
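The handling of inception dates described under "Wikidata's data" can be sketched as follows. In the project itself the earliest date is selected inside the SPARQL query; the client-side version here is only an illustration, and the row shape (`{langId, year}`, with `year` set to `null` when no date is known), the function names, and the sample ids are assumptions.

```javascript
// Hypothetical post-processing of query rows; not the project's own code.

// Keep the earliest known year per language; null means "no date at all".
function earliestYearByLanguage(rows) {
  const years = new Map();
  for (const { langId, year } of rows) {
    const known = years.get(langId);
    if (year === null) {
      if (!years.has(langId)) years.set(langId, null);
    } else if (known === undefined || known === null || year < known) {
      years.set(langId, year);
    }
  }
  return years;
}

// Dated languages go on the year axis; undated ones get the bottom line.
function splitByDate(years) {
  const dated = [];
  const undated = [];
  for (const [langId, year] of years) {
    (year === null ? undated : dated).push(langId);
  }
  return { dated, undated };
}

// Made-up sample rows: langA has two candidate dates, langB has none.
const rows = [
  { langId: "langA", year: 1972 },
  { langId: "langA", year: 1978 },
  { langId: "langB", year: null },
  { langId: "langC", year: 1995 },
];
const years = earliestYearByLanguage(rows);
const groups = splitByDate(years);
```

With the sample rows, `langA` keeps 1972 and `langB` ends up in the undated group.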
Footnotes
─────────
[1] I am using programming language /history/, /genealogy/, and
/network/ interchangeably, with this meaning.
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="user-scalable=no, width=device-width, initial-scale=1, maximum-scale=1"/>
<style type="text/css">
html,body{
width: 100%;
height: 100%;
}
#mainViewContainer {
width: 95%;
height: 95%;
border: 1px solid black;
margin: 10px;
padding: 3px;
overflow: hidden;
}
#mainView {
width: 100%;
height: 100%;
min-height: 100%;
display: inline;
}
#scopeContainer {
z-index: 120;
}
</style>
</head>
<body>
<script src="https://d3js.org/d3.v4.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/d3-sparql@1.0.0/build/d3-sparql.min.js"></script>
<script src="langs-graph.js"></script>
<script src="https://cdn.jsdelivr.net/npm/svg-pan-zoom@3.5.2/dist/svg-pan-zoom.min.js"></script>
<div id="mainViewContainer">
<svg id="mainView"></svg>
</div>
</body>
</html>
var wikidataUrl = 'https://query.wikidata.org/bigdata/namespace/wdq/sparql';
var query = `SELECT ?langId ?langLabel (MIN(?langYear) as ?languageYear) ?influencedId ?influencedLabel
WHERE {
?lang (wdt:P31/wdt:P279*) wd:Q9143.
BIND(SUBSTR(STR(?lang), STRLEN(STR(wd:)) + 2) AS ?langId)
OPTIONAL { ?lang wdt:P571 ?langDate. }
OPTIONAL { ?lang wdt:P577 ?langPub. }
BIND(IF(BOUND(?langDate), ?langDate, ?langPub) AS ?langDate)
BIND(IF(BOUND(?langDate), YEAR(?langDate), "<nothing>") AS ?langYear)
OPTIONAL { ?influenced (wdt:P31/wdt:P279*) wd:Q9143;
wdt:P737 ?lang. }
BIND(IF(BOUND(?influenced), SUBSTR(STR(?influenced), STRLEN(STR(wd:)) + 2), "<nothing>") AS ?influencedId)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?langId ?langLabel ?influencedId ?influencedLabel`;
d3.sparql(wikidataUrl, query).get(function(error, query_data) {
if (error) throw error;
// nest by ID
var influenceData = d3.nest()
.key(function(d) { return d.langId; })
.entries(query_data);
// returns [{"key": year, "values": [{langId: "837537", langLabel:
// "QBasic", languageYear: "1991", influencedId: "8335348",
// influencedLabel: "FreeBASIC", influencedYear: "2004"} ...]...]
var completeYearData = d3.nest()
.key(function(d) { return d.languageYear; })
.entries(query_data);
var yearData = [],
missingYearData = [];
// sort by inception date presence
completeYearData.forEach(function (d) {if (d.key == "<nothing>") {
missingYearData.push(d);
} else {
yearData.push(d);}});
// no need for key here, as there's only one
missingYearData = missingYearData[0];
// x-axis is arranged according to inception year
var [xMin, xMax] = d3.extent(yearData,
function(d) {return parseInt(d.key,10);});
// y-axis size is determined by the maximum number of rows
// (language/influence pairs) in any single year
var ySize = d3.max(yearData, function(d) {return d.values.length;});
/// layout vars and scale functions
var nodeRadius = 5, // radius base size
radiusSpan = 5, // how much the radius varies according to influence
horizontalOffset = 30,
verticalOffset = 30,
nodeHeight = 2*nodeRadius*radiusSpan + verticalOffset,
yearWidth = 2*nodeRadius*radiusSpan + horizontalOffset;
// In the SVG container the top-left corner is (0,0) . It is the
// origin and the x axis runs from left to right and y axis runs
// from top to bottom.
var svg = d3.select("svg"),
height = nodeHeight * ySize,
realHeight = height + nodeHeight * 3, // space for no-date
// nodes
width = yearWidth * Math.max(yearData.length, missingYearData.values.length),
realWidth = width + yearWidth * 3;
var xScale = d3.scaleLinear()
.domain([xMin, xMax])
.range([yearWidth, width]);
//// make map of coordinates of every language node
var countMap = new Map(),
xCoordMap = new Map(),
yCoordMap = new Map(),
maxInfluence = 0;
for (let element of influenceData) {
let influenceSize = element.values.length;
if (influenceSize > maxInfluence) { maxInfluence = influenceSize; };
countMap.set(element.key, influenceSize);
};
function influenceCountComp (a,b) {
if (countMap.get(a.langId) > countMap.get(b.langId)) {
return 1;
}
if (countMap.get(a.langId) < countMap.get(b.langId)) {
return -1;
}
return 0;
};
// coordinates for nodes without inception: x-axis according to
// index
var noInceptionXScale = d3.scaleLinear()
.domain([0, missingYearData.values.length])
.range([yearWidth, width]);
var sortedNoInceptionLangs = missingYearData.values.sort(influenceCountComp),
index = 0;
for (let lang of sortedNoInceptionLangs) {
xCoordMap.set(lang.langId, yearWidth + noInceptionXScale(index));
yCoordMap.set(lang.langId, 2*nodeHeight + height); // y-axis constant
index += 1;
};
// coordinates for nodes with inception
for (let yearNest of yearData) {
var yearNum = parseInt(yearNest["key"], 10),
sortedLangs = yearNest.values.sort(influenceCountComp),
langsNum = sortedLangs.length,
leftBound = nodeHeight,
ySpace = height - nodeHeight;
for (let lang of sortedLangs) {
let rightBound = leftBound + ySpace/langsNum,
thisY = d3.randomUniform(leftBound, rightBound)();
xCoordMap.set(lang.langId, xScale(yearNum));
yCoordMap.set(lang.langId, thisY);
leftBound = thisY;
}
};
////
var nodeActiveColour = "#558E6F",
linkActiveColour = "#818182";
var graph = svg.append("g")
.classed("graph", true);
var yearLabels = graph.selectAll("text")
.data(yearData)
.enter()
.append("text")
.attr("x", function(d){ return xScale(parseInt(d.key, 10)); })
.attr("y", nodeHeight/2)
.text(function(d){ return d.key; });
var gLangs = graph.selectAll("g")
.data(influenceData)
.enter()
.append("g")
.attr("id", function(d) {return d.key;});
var radiusScale = d3.scaleLinear()
.domain([0, maxInfluence])
.range([nodeRadius, radiusSpan*nodeRadius]);
var gNodes = gLangs.append("circle")
.attr("r", function(d) { return radiusScale(countMap.get(d.key)); })
.attr("cx", function(d) { return xCoordMap.get(d.key); })
.attr("cy", function(d) { return yCoordMap.get(d.key); })
.style("fill", nodeActiveColour)
.text(function(d) {return d.values[0].langLabel;})
.attr("id", function(d) {return d.key;});
gLangs.each(function(d,i) {
var lang = d3.select(this);
let x1 = xCoordMap.get(d.key),
y1 = yCoordMap.get(d.key);
if (d.values[0].influencedId != "<nothing>") {
lang.selectAll("line")
.data(d.values)
.enter()
.append("line")
.attr("x1", x1)
.attr("y1", y1)
.attr("x2", function(d) {return xCoordMap.get(d.influencedId);})
.attr("y2", function(d) {return yCoordMap.get(d.influencedId);})
.style("stroke-width", "1")
.style("stroke-opacity", "0.8")
.style("stroke", linkActiveColour);
}
lang.append("text")
.attr("x",x1)
.attr("y",y1 - nodeRadius)
.attr("transform", "rotate(30 " + x1 +','+y1+')')
.text(d.values[0].langLabel);
});
gLangs.on("click", highlightChildren);
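function highlightChildren(d) {
// dim every node and link, then restore the clicked language
gLangs.selectAll("circle")
.attr("fill-opacity", "0.5");
// links were drawn with .style("stroke-opacity", ...); an inline style
// overrides a presentation attribute, so .style() is needed here too
gLangs.selectAll("line")
.style("stroke-opacity", "0.2");
var lang = d3.select(this);
lang.selectAll("line")
.style("stroke-opacity", "1");
lang.select("circle")
.attr("fill-opacity", "1");
};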
// zoom + pan
var langsPan = svgPanZoom("#mainView", {
viewportSelector: ".svg-pan-zoom_viewport"
, preventMouseEventsDefault: true
, zoomScaleSensitivity: 1.25
, maxZoom: 100
});
});