Programming language history visualization
license: gpl-3.0
height: 500
scrolling: yes
border: yes
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
VISUALIZING PROGRAMMING LANGUAGE HISTORY
bruno cuconato
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
2017-12-11
Table of Contents
─────────────────
1 About
2 Reason for existing
3 Intention
4 Target audience
5 Format
6 Resources
7 Problems and future work
1 About
═══════
We offer a visualization of programming language genealogy, improving
and extending previous work. The basic layout is an interactive graph
showing which programming languages are said to have influenced other
programming languages[1], according to [Wikidata].
On the horizontal axis, we have the year in which a programming
language first appeared. Within each year, nodes are sorted vertically
according to the number of languages they have influenced, but their
exact position is picked at random at every page rendering (only their
order stays the same). The exception is the nodes with no inception
date; those are laid out horizontally on a single line at the bottom
of the graph. As an additional visual cue to a node's importance, its
radius grows linearly with its outdegree, up to a certain limit.
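The placement rules above can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the code used by the visualization: the names `radiusFor` and `orderedRandomYs`, their parameters, and the injectable `rng` argument are hypothetical, and the sketch uses fixed, equal-height bands per node, which is one simple way to keep the order stable while the exact position varies.

```javascript
// Hypothetical helper: radius grows linearly with outdegree and is
// capped at base * span once outdegree reaches maxOutdegree.
function radiusFor(outdegree, maxOutdegree, base, span) {
  if (maxOutdegree <= 0) return base;
  var t = Math.min(outdegree, maxOutdegree) / maxOutdegree;
  return base + t * (base * span - base);
}

// Hypothetical helper: for `count` nodes of one year, already sorted by
// influence, pick one random y inside each node's own horizontal band.
// Positions change between calls, but the vertical order never does.
function orderedRandomYs(count, top, bottom, rng) {
  rng = rng || Math.random; // rng must return a number in [0, 1)
  var band = (bottom - top) / count;
  var ys = [];
  for (var i = 0; i < count; i++) {
    ys.push(top + band * (i + rng()));
  }
  return ys;
}
```

With `base = 5` and `span = 5`, a node that influenced nothing gets radius 5 and the most influential node gets radius 25; passing a fixed `rng` makes the layout reproducible, which can help when debugging.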
By virtue of the [d3-sparql] library, our visualization is
self-updating, querying Wikidata's interface at every refresh. This
means that any future improvement to the underlying data (for
instance, the addition of a new programming language) will be
incorporated automatically into the visualization.
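To make "querying at every refresh" concrete: each page load amounts to an HTTP GET against a SPARQL endpoint. The sketch below only builds such a request URL. `buildSparqlUrl` is a hypothetical helper, the short query is a made-up example, and d3-sparql may assemble its request differently; the `query` and `format=json` parameters are the ones the Wikidata Query Service accepts for GET requests.

```javascript
// Hypothetical helper: build a GET URL for a SPARQL endpoint.
function buildSparqlUrl(endpoint, sparql) {
  return endpoint + "?query=" + encodeURIComponent(sparql) + "&format=json";
}

// Endpoint as used by this project; the query is only an illustration.
var exampleEndpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql";
var exampleQuery = "SELECT ?lang WHERE { ?lang wdt:P31 wd:Q9143. } LIMIT 5";
var exampleUrl = buildSparqlUrl(exampleEndpoint, exampleQuery);

// In a browser, the URL could then be fetched on every page load, e.g.:
//   fetch(exampleUrl).then(function (r) { return r.json(); })
```

Because nothing is cached on our side, the data shown is whatever Wikidata returns at that moment.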
At the time of writing (2017-12-11), Wikidata knows about 1437
programming languages, 902 of which have no date of creation. The
graph is currently relatively sparse, as only 594 relations of
influence between languages are recorded.
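Counts of this kind can be derived from the query rows themselves. A minimal sketch, assuming rows shaped like the ones our query returns (`langId`, `languageYear`, `influencedId`, with `"<nothing>"` standing in for a missing value); `summarize` is a hypothetical helper, and the sample rows are made up for illustration (only the QBasic and FreeBASIC identifiers come from our data).

```javascript
// Hypothetical helper: count distinct languages, languages without a
// year, and distinct influence relations in a list of query rows.
function summarize(rows) {
  var langs = new Set(), undated = new Set(), edges = new Set();
  rows.forEach(function (row) {
    langs.add(row.langId);
    if (row.languageYear === "<nothing>") undated.add(row.langId);
    if (row.influencedId !== "<nothing>") {
      edges.add(row.langId + ">" + row.influencedId);
    }
  });
  return { languages: langs.size, undated: undated.size, influences: edges.size };
}

// Illustrative rows only; language "1" is invented.
var sampleRows = [
  { langId: "837537", languageYear: "1991", influencedId: "8335348" },
  { langId: "8335348", languageYear: "2004", influencedId: "<nothing>" },
  { langId: "1", languageYear: "<nothing>", influencedId: "837537" },
  { langId: "1", languageYear: "<nothing>", influencedId: "8335348" }
];
var sampleSummary = summarize(sampleRows);
```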
[Wikidata] https://wikidata.org
[d3-sparql] https://github.com/zazuko/d3-sparql
2 Reason for existing
═════════════════════
The main reason for this visualization is to provide an interface for
visual exploration of programming language history, along the lines of
[this one] or the one in the figure below. We have no narrative to
advance or support.
[./sammet1972.png]
[this one] http://rigaux.org/language-study/diagram.html
3 Intention
═══════════
Our intent with this proposal is to offer a pragmatic and analytical
setting for the visualization of programming language genealogy. We
have no need to convince our readers of anything, or to provoke
emotions in them. In fact, our readers are invited to explore the data
so that they can provoke their own curiosity.
4 Target audience
═════════════════
We have no target audience in mind, although it is unlikely that
anyone without an interest in programming languages will read it. We
therefore assume from our readers a certain level of attraction to the
visualization's theme, which keeps us from having to catch their
interest in the way an ad designer has to catch her audience's
attention.
5 Format
════════
We employ the [D3.js library] to produce our visualization project. We
use [ariutta's pan-and-zoom library for SVG] to add panning and zooming
to the SVG produced by D3. Another important component of our project
is the Wikidata query interface, which we access to obtain our
data. All in all, this project requires a working knowledge of
JavaScript (programming language), SVG (XML-based vector image
format), HTML (markup language), and SPARQL (semantic query
language).
Replicability: All code written will be available on GitHub, so that any
               replicability requirements should be satisfied.
Scalability: Future growth should not be a problem, as the number of
             programming languages is likely to grow slowly. There is
             already a problem of scale, however: the large number of
             nodes and edges can hinder browser performance.
[D3.js library] https://d3js.org
[ariutta's pan-and-zoom library for SVG]
https://github.com/ariutta/svg-pan-zoom
6 Resources
═══════════
Server: the visualization runs entirely on the client side, so a
        dedicated server is not necessary. We do depend on Wikidata's
        and on D3.js's servers, though.
Browsers: we take D3.js's statement of compatibility at face value, so
          we believe our visualization should work on any modern
          browser (Google Chrome, Mozilla Firefox, Apple Safari,
          Microsoft Edge). Tests will be carried out solely on Chrome.
7 Problems and future work
══════════════════════════
Node positioning: We decided not to place nodes by force simulation
                  using D3's /d3-force/ module. We had a technical
                  reason for this: in this kind of visualization, a
                  node's position has no meaning in itself, which
                  prevents us from using it to convey information to
                  the viewer. We also had a practical motive: the
                  simulation needed to determine the nodes' positions
                  is quite expensive in terms of resources, especially
                  when there are many nodes, and it made our
                  visualization very slow to render.
Wikidata's data: We found several shortcomings in Wikidata's data: some
                 programming languages had more than one inception date,
                 while most of them had none. We overcame the first
                 problem by choosing the earliest date provided, while
                 the second demanded special positions in the graph for
                 the undated nodes.
Wikidata's query interface: there is an open bug (see tickets [T178564]
                            and [T165228]) in Wikidata's query interface
                            that causes query results to have the wrong
                            encoding. This does not prevent the
                            visualization from working, but it does
                            produce some odd characters in the names of
                            a few languages.
Exploration: the wealth of information made it very difficult to create
             a visualization that was easy to navigate. We added zooming
             and panning to the visualization in order to improve this
             situation, but as a trade-off we lost the ability to
             search for a given node. We are not satisfied with our
             approach to the problem.
[T178564] https://phabricator.wikimedia.org/T178564
[T165228] https://phabricator.wikimedia.org/T165228
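The earliest-date rule described under "Wikidata's data" can be illustrated in a few lines. In the project itself the rule is applied inside the SPARQL query, through `MIN`; the plain JavaScript version below is only a sketch of the same idea. `earliestYear` is a hypothetical helper, the sample dates are made up, and `"<nothing>"` is the placeholder our query uses for a missing year.

```javascript
// Hypothetical helper: given the date strings recorded for one
// language, return the earliest year, or "<nothing>" if there is none.
function earliestYear(dates) {
  var years = dates
    .map(function (d) { return new Date(d).getUTCFullYear(); })
    .filter(function (y) { return !isNaN(y); }); // drop unparsable dates
  return years.length ? Math.min.apply(null, years) : "<nothing>";
}

// Illustrative input: a language with two conflicting inception dates.
var conflictingDates = ["1995-01-01T00:00:00Z", "1991-01-01T00:00:00Z"];
var chosenYear = earliestYear(conflictingDates);
```

A language for which the function returns `"<nothing>"` is one of those that end up on the separate line at the bottom of the graph.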
Footnotes
─────────
[1] We use programming language /history/, /genealogy/, and
/network/ interchangeably in this sense.
<!DOCTYPE html>
<html>
  <head>
    <meta name="viewport" content="user-scalable=no, width=device-width, initial-scale=1, maximum-scale=1"/>
    <style type="text/css">
      html,body{
        width: 100%;
        height: 100%;
      }
      #mainViewContainer {
        width: 95%;
        height: 95%;
        border: 1px solid black;
        margin: 10px;
        padding: 3px;
        overflow: hidden;
      }
      #mainView {
        width: 100%;
        height: 100%;
        min-height: 100%;
        display: inline;
      }
      #scopeContainer {
        z-index: 120;
      }
    </style>
  </head>
  <body>
    <script src="https://d3js.org/d3.v4.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/d3-sparql@1.0.0/build/d3-sparql.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/svg-pan-zoom@3.5.2/dist/svg-pan-zoom.min.js"></script>
    <div id="mainViewContainer">
      <svg id="mainView"></svg>
    </div>
    <!-- loaded last, so that the SVG element and svgPanZoom already exist when the query callback runs -->
    <script src="langs-graph.js"></script>
  </body>
</html>
var wikidataUrl = 'https://query.wikidata.org/bigdata/namespace/wdq/sparql';
var query = `SELECT ?langId ?langLabel (MIN(?langYear) as ?languageYear) ?influencedId ?influencedLabel
WHERE {
  ?lang (wdt:P31/wdt:P279*) wd:Q9143.
  BIND(SUBSTR(STR(?lang), STRLEN(STR(wd:)) + 2) AS ?langId)
  OPTIONAL { ?lang wdt:P571 ?langDate. }
  OPTIONAL { ?lang wdt:P577 ?langPub. }
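  # inception date if present, else publication date; BIND has to
  # introduce a fresh variable, so ?langDate is not reassigned
  BIND(COALESCE(?langDate, ?langPub) AS ?langFirstDate)
  BIND(IF(BOUND(?langFirstDate), YEAR(?langFirstDate), "<nothing>") AS ?langYear)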
  OPTIONAL { ?influenced (wdt:P31/wdt:P279*) wd:Q9143;
                         wdt:P737 ?lang. }
  BIND(IF(BOUND(?influenced), SUBSTR(STR(?influenced), STRLEN(STR(wd:)) + 2), "<nothing>") AS ?influencedId)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?langId ?langLabel ?influencedId ?influencedLabel`;
d3.sparql(wikidataUrl, query).get(function(error, query_data) {
    if (error) throw error;
    // nest by ID
    var influenceData = d3.nest()
        .key(function(d) { return d.langId; })
        .entries(query_data);
    // nest by year; returns [{"key": year, "values": [{langId: "837537",
    // langLabel: "QBasic", languageYear: "1991", influencedId: "8335348",
    // influencedLabel: "FreeBASIC"} ...]} ...]
    var completeYearData = d3.nest()
        .key(function(d) { return d.languageYear; })
        .entries(query_data);
    var yearData = [],
        missingYearData = [];
    // split by inception date presence
    completeYearData.forEach(function (d) {
        if (d.key === "<nothing>") {
            missingYearData.push(d);
        } else {
            yearData.push(d);
        }
    });
    // no need for key here, as there's only one; fall back to an empty
    // group in case every language has a date
    missingYearData = missingYearData[0] || {key: "<nothing>", values: []};
    // x-axis is arranged according to inception year
    var [xMin, xMax] = d3.extent(yearData,
                                 function(d) {return parseInt(d.key,10);});
    // y-axis size is determined by the maximum number of rows in any
    // single year
    var ySize = d3.max(yearData, function(d) {return d.values.length;});
    /// layout vars and scale functions
    var nodeRadius = 5, // radius base size
        radiusSpan = 5, // how much the radius varies according to influence
        horizontalOffset = 30,
        verticalOffset = 30,
        nodeHeight = 2*nodeRadius*radiusSpan + verticalOffset,
        yearWidth = 2*nodeRadius*radiusSpan + horizontalOffset;
    // In the SVG container the top-left corner is (0,0). It is the
    // origin and the x axis runs from left to right and y axis runs
    // from top to bottom.
    var svg = d3.select("svg"),
        height = nodeHeight * ySize,
        realHeight = height + nodeHeight * 3, // space for no-date nodes
        width = yearWidth * Math.max(yearData.length, missingYearData.values.length),
        realWidth = width + yearWidth * 3;
    var xScale = d3.scaleLinear()
        .domain([xMin, xMax])
        .range([yearWidth, width]);
    //// make map of coordinates of every language node
    var countMap = new Map(),
        xCoordMap = new Map(),
        yCoordMap = new Map(),
        maxInfluence = 0;
    for (let element of influenceData) {
        // count only real influence rows: a language that influenced
        // nothing still has one placeholder row ("<nothing>")
        let influenceSize = element.values.filter(function(v) {
            return v.influencedId !== "<nothing>";
        }).length;
        if (influenceSize > maxInfluence) { maxInfluence = influenceSize; }
        countMap.set(element.key, influenceSize);
    }
    // ascending order of influence count
    function influenceCountComp (a,b) {
        if (countMap.get(a.langId) > countMap.get(b.langId)) {
            return 1;
        }
        if (countMap.get(a.langId) < countMap.get(b.langId)) {
            return -1;
        }
        return 0;
    }
    // coordinates for nodes without inception: x-axis according to
    // index
    var noInceptionXScale = d3.scaleLinear()
        .domain([0, missingYearData.values.length])
        .range([yearWidth, width]);
    var sortedNoInceptionLangs = missingYearData.values.sort(influenceCountComp),
        index = 0;
    for (let lang of sortedNoInceptionLangs) {
        xCoordMap.set(lang.langId, yearWidth + noInceptionXScale(index));
        yCoordMap.set(lang.langId, 2*nodeHeight + height); // y-axis constant
        index += 1;
    }
    // coordinates for nodes with inception
    for (let yearNest of yearData) {
        var yearNum = parseInt(yearNest["key"], 10),
            sortedLangs = yearNest.values.sort(influenceCountComp),
            langsNum = sortedLangs.length,
            leftBound = nodeHeight,
            ySpace = height - nodeHeight;
        for (let lang of sortedLangs) {
            let rightBound = leftBound + ySpace/langsNum,
                thisY = d3.randomUniform(leftBound, rightBound)();
            xCoordMap.set(lang.langId, xScale(yearNum));
            yCoordMap.set(lang.langId, thisY);
            leftBound = thisY;
        }
    }
    ////
    var nodeActiveColour = "#558E6F",
        linkActiveColour = "#818182";
    var graph = svg.append("g")
        .classed("graph", true);
    var yearLabels = graph.selectAll("text")
        .data(yearData)
        .enter()
        .append("text")
        .attr("x", function(d){ return xScale(parseInt(d.key, 10)); })
        .attr("y", nodeHeight/2)
        .text(function(d){ return d.key; });
    var gLangs = graph.selectAll("g")
        .data(influenceData)
        .enter()
        .append("g")
        .attr("id", function(d) {return d.key;});
    var radiusScale = d3.scaleLinear()
        .domain([0, maxInfluence])
        .range([nodeRadius, radiusSpan*nodeRadius]);
    var gNodes = gLangs.append("circle")
        .attr("r", function(d) { return radiusScale(countMap.get(d.key)); })
        .attr("cx", function(d) { return xCoordMap.get(d.key); })
        .attr("cy", function(d) { return yCoordMap.get(d.key); })
        .style("fill", nodeActiveColour);
    // text set directly on a <circle> is never displayed; a <title>
    // child shows the label as a tooltip instead. The circle no longer
    // repeats the id already carried by its parent <g>.
    gNodes.append("title")
        .text(function(d) {return d.values[0].langLabel;});
    gLangs.each(function(d,i) {
        var lang = d3.select(this);
        let x1 = xCoordMap.get(d.key),
            y1 = yCoordMap.get(d.key);
        if (d.values[0].influencedId != "<nothing>") {
            lang.selectAll("line")
                .data(d.values)
                .enter()
                .append("line")
                .attr("x1", x1)
                .attr("y1", y1)
                .attr("x2", function(d) {return xCoordMap.get(d.influencedId);})
                .attr("y2", function(d) {return yCoordMap.get(d.influencedId);})
                .style("stroke-width", "1")
                .style("stroke-opacity", "0.8")
                .style("stroke", linkActiveColour);
        }
        lang.append("text")
            .attr("x",x1)
            .attr("y",y1 - nodeRadius)
            .attr("transform", "rotate(30 " + x1 +','+y1+')')
            .text(d.values[0].langLabel);
    });
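    gLangs.on("click", highlightChildren);
    function highlightChildren(d) {
        // dim everything first; opacity is set with .style() because the
        // links get their stroke-opacity as an inline style, which takes
        // precedence over a presentation attribute set with .attr()
        gLangs.selectAll("circle")
            .style("fill-opacity", "0.5");
        gLangs.selectAll("line")
            .style("stroke-opacity", "0.2");
        // then restore the clicked language and its outgoing links
        var lang = d3.select(this);
        lang.selectAll("line")
            .style("stroke-opacity", "1");
        lang.select("circle")
            .style("fill-opacity", "1");
    }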
    // zoom + pan
    var langsPan = svgPanZoom("#mainView", {
        viewportSelector: ".svg-pan-zoom_viewport",
        preventMouseEventsDefault: true,
        zoomScaleSensitivity: 1.25,
        maxZoom: 100
    });
});