@allisonking
Last active April 20, 2021 00:01
Harry Potter fan fiction cooccurrence matrix

This is a cooccurrence matrix based on Les Miserables Co-occurrence, but with Harry Potter characters. Two characters cooccur when both are tagged as featured characters in a given fan fiction. Once each row is normalized (described next), the diagonal values are all the same, 100%, since every fan fiction that has Albus Dumbledore also has Albus Dumbledore.
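
To make the definition concrete, here is a minimal sketch of how such counts could be tallied. The `fics` array and the `countCooccurrences` helper are hypothetical names made up for illustration; the actual pipeline that produced the data in this gist is not shown.

```javascript
// Hypothetical sample data: each fan fiction lists its tagged characters.
const fics = [
  { id: 1, characters: ["Remus L.", "N. Tonks"] },
  { id: 2, characters: ["Remus L.", "Sirius B."] },
  { id: 3, characters: ["Remus L.", "Sirius B.", "N. Tonks"] },
];

// Count, for every ordered pair (a, b), the fics tagged with both.
// A character is also paired with itself, so counts[a][a] is the
// total number of fics featuring a (the diagonal).
function countCooccurrences(fics) {
  const counts = {};
  for (const fic of fics) {
    const tags = Array.from(new Set(fic.characters)); // ignore duplicate tags
    for (const a of tags) {
      counts[a] = counts[a] || {};
      for (const b of tags) {
        counts[a][b] = (counts[a][b] || 0) + 1;
      }
    }
  }
  return counts;
}

const counts = countCooccurrences(fics);
console.log(counts["Remus L."]["Remus L."]); // 3
console.log(counts["Remus L."]["N. Tonks"]); // 2
console.log(counts["N. Tonks"]["Remus L."]); // 2 (raw counts are symmetric)
```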

Instead of showing the raw number of fan fictions shared by two characters, each cell is normalized by dividing by the total number of fan fictions featuring the row's character. Therefore, the value in [Draco Malfoy][Harry Potter] is most likely not the same as [Harry Potter][Draco Malfoy]. This reveals some interesting properties: for example, of all Lupin fan fictions, 16.23% also feature Tonks, whereas of Tonks fan fictions, 70.57% also feature Lupin. This perhaps shows a split in the fandom over who else Lupin spends a lot of time with in fan fiction... WolfStar, anyone?
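
The asymmetry can be checked by hand with the Lupin and Tonks counts from the table below (37010 Remus L. fan fictions, 8514 N. Tonks fan fictions, 6008 featuring both). This is an illustrative sketch only; `normalizeRows` and `pct` are made-up helper names, not part of matrix.js.

```javascript
// Raw counts copied from the table: diagonal entries are per-character
// totals, off-diagonal entries are shared fan fictions.
const raw = {
  "Remus L.": { "Remus L.": 37010, "N. Tonks": 6008 },
  "N. Tonks": { "Remus L.": 6008, "N. Tonks": 8514 },
};

// Divide every entry in a row by that row's diagonal value.
function normalizeRows(raw) {
  const normalized = {};
  for (const rowName of Object.keys(raw)) {
    const total = raw[rowName][rowName];
    normalized[rowName] = {};
    for (const colName of Object.keys(raw[rowName])) {
      normalized[rowName][colName] = raw[rowName][colName] / total;
    }
  }
  return normalized;
}

// Format a fraction as a percentage with two decimals.
const pct = (x) => (x * 100).toFixed(2) + "%";

const normalized = normalizeRows(raw);
console.log(pct(normalized["Remus L."]["N. Tonks"])); // 16.23%
console.log(pct(normalized["N. Tonks"]["Remus L."])); // 70.57%
```

The same shared count (6008) yields two different percentages because each row has its own denominator.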

character Albus D. Albus S. P. Andromeda T. Angelina J. Astoria G. Bellatrix L. Bill W. Blaise Z. Cedric D. Charlie W. Cho C. Draco M. Fleur D. Fred W. George W. Ginny W. Harry P. Hermione G. James P. James S. P. Katie B. Lily Evans P. Lily Luna P. Lucius M. Luna L. Marauders Minerva M. Molly W. N. Tonks Narcissa M. Neville L. OC Oliver W. Pansy P. Percy W. Peter P. Petunia D. Regulus B. Remus L. Ron W. Rose W. Scorpius M. Seamus F. Severus S. Sirius B. Teddy L. Theodore N. Tom R. Jr. Victoire W. Voldemort
Albus D. 10372 25 6 1 1 51 3 4 11 6 7 190 14 33 36 72 2239 357 133 12 0 196 6 40 54 46 2941 15 26 13 32 371 4 4 8 16 37 17 238 136 7 17 7 1948 265 9 9 321 0 445
Albus S. P. 25 6693 0 0 11 7 1 2 0 5 2 187 2 7 8 190 785 60 76 1174 0 35 617 2 11 7 32 7 3 2 27 1000 1 1 1 0 0 5 5 45 1580 3316 2 149 22 169 3 23 40 15
Andromeda T. 6 0 2658 1 6 669 4 0 1 2 1 53 2 2 3 10 99 27 11 1 0 25 2 93 4 6 11 26 215 602 5 53 0 1 2 1 1 52 81 3 1 7 0 19 194 179 0 4 14 16
Angelina J. 1 0 1 1735 2 0 2 3 8 11 7 18 3 549 910 17 41 39 0 3 45 1 1 2 2 0 1 7 3 0 3 27 69 2 9 0 0 0 4 17 1 2 2 4 3 3 0 0 0 0
Astoria G. 1 11 6 2 2357 4 1 69 3 1 1 1838 3 7 5 63 132 133 3 6 2 2 12 47 31 0 0 1 3 55 28 55 5 62 0 0 0 1 2 45 27 159 4 9 2 10 43 5 5 3
Bellatrix L. 51 7 669 0 4 7242 5 1 2 17 2 235 20 15 10 50 354 422 51 1 0 60 8 270 49 11 45 44 126 894 64 206 2 6 3 23 1 121 101 36 5 7 3 304 1037 10 2 144 0 1255
Bill W. 3 1 4 2 1 5 1808 0 3 213 0 34 561 44 57 116 163 231 3 2 4 1 1 5 23 1 4 71 51 6 1 100 6 9 62 1 2 4 48 95 3 1 1 26 21 29 3 1 68 2
Blaise Z. 4 2 0 3 69 1 0 3975 8 16 10 1148 4 11 7 500 525 1186 1 0 5 2 2 5 256 0 3 3 3 13 77 231 6 386 3 0 0 4 9 122 5 13 38 41 10 1 305 12 3 10
Cedric D. 11 0 1 8 3 2 3 8 2426 11 382 80 70 29 31 41 616 502 2 0 13 5 3 1 61 1 2 0 9 1 20 318 70 12 9 5 0 2 10 24 0 0 4 18 15 1 4 13 0 14
Charlie W. 6 5 2 11 1 17 213 16 11 2655 5 189 41 53 64 69 280 503 4 6 57 4 3 4 46 1 9 60 328 4 14 381 55 21 49 1 0 1 25 76 3 11 0 39 22 13 13 1 5 5
Cho C. 7 2 1 7 1 2 0 10 382 5 2209 190 18 9 26 110 762 136 1 1 5 0 2 3 76 0 4 0 4 4 15 42 25 7 12 1 2 0 9 40 2 1 11 14 3 1 4 4 0 6
Draco M. 190 187 53 18 1838 235 34 1148 80 189 190 110555 43 253 261 12946 34761 44837 66 35 13 79 74 1495 1800 14 98 34 48 1003 526 4130 58 1659 39 7 9 29 217 2612 206 930 100 1601 297 101 494 236 9 512
Fleur D. 14 2 2 3 3 20 561 4 70 41 18 43 2100 10 9 57 464 530 2 1 2 4 0 5 25 0 12 34 53 7 10 72 7 2 17 2 0 2 13 58 2 1 1 11 22 15 1 2 58 5
Fred W. 33 7 2 549 7 15 44 11 29 53 9 253 10 10712 4824 256 646 3017 44 24 205 28 8 5 135 43 47 172 32 2 39 1568 79 20 130 4 0 4 92 289 16 7 14 121 126 16 5 9 5 38
George W. 36 8 3 910 5 10 57 7 31 64 26 261 9 4824 11944 319 810 1699 26 8 151 19 8 9 315 32 42 183 16 1 36 1239 77 24 197 4 1 4 71 386 12 11 22 167 95 24 4 12 10 42
Ginny W. 72 190 10 17 63 50 116 500 41 69 110 12946 57 256 319 44599 22153 3583 156 220 14 165 206 76 942 23 59 234 100 41 545 367 79 111 80 4 23 20 273 1646 19 54 82 317 316 124 53 947 25 466
Harry P. 2239 785 99 41 132 354 163 525 616 280 762 34761 464 646 810 22153 151951 27888 2593 610 60 2221 516 654 2821 250 698 272 572 241 1288 6345 180 908 120 85 635 142 2189 9173 127 356 158 12273 4805 831 220 2278 61 5256
Hermione G. 357 60 27 39 133 422 231 1186 502 503 136 44837 530 3017 1699 3583 27888 127244 290 51 11 246 36 592 1004 107 883 123 118 287 690 1795 286 613 167 10 21 111 1458 26687 508 295 129 11213 1897 93 480 1140 11 615
James P. 133 76 11 0 3 51 3 1 2 4 1 66 2 44 26 156 2593 290 45143 57 4 33888 663 96 9 477 104 17 36 97 16 866 6 2 3 1093 91 224 3620 38 30 35 2 1247 8342 35 6 56 1 125
James S. P. 12 1174 1 3 6 1 2 0 0 6 1 35 1 24 8 220 610 51 57 4355 1 37 550 0 12 17 29 10 2 1 22 1220 2 1 2 5 3 4 20 20 584 526 3 21 37 404 3 8 90 5
Katie B. 0 0 0 45 2 0 4 5 13 57 5 13 2 205 151 14 60 11 4 1 2016 4 0 1 6 0 0 2 4 1 7 35 1062 7 18 0 0 0 4 8 0 0 5 7 1 2 7 3 1 1
Lily Evans P. 196 35 25 1 2 60 1 2 5 4 0 79 4 28 19 165 2221 246 33888 37 4 46677 49 116 25 1146 68 56 45 151 20 770 2 3 2 110 657 169 2285 36 23 114 2 6619 3648 22 1 46 2 211
Lily Luna P. 6 617 2 1 12 8 1 2 3 3 2 74 0 8 8 206 516 36 663 550 0 49 5876 4 11 7 14 13 1 6 14 336 0 0 2 2 25 1 30 22 352 1808 1 162 40 667 0 22 56 7
Lucius M. 40 2 93 2 47 270 5 5 1 4 3 1495 5 5 9 76 654 592 96 0 1 116 4 6666 43 6 19 32 30 2114 9 242 2 25 12 11 5 44 96 28 6 52 0 910 123 3 4 42 0 313
Luna L. 54 11 4 2 31 49 23 256 61 46 76 1800 25 135 315 942 2821 1004 9 12 6 25 11 43 11735 15 33 13 30 19 1929 320 26 105 36 2 3 14 110 614 5 11 62 265 69 14 295 71 2 50
Marauders 46 7 6 0 0 11 1 0 1 1 0 14 0 43 32 23 250 107 477 17 0 1146 7 6 15 2754 62 4 33 5 4 571 0 0 0 59 10 76 348 14 4 1 1 271 332 13 0 7 1 29
Minerva M. 2941 32 11 1 0 45 4 3 2 9 4 98 12 47 42 59 698 883 104 29 0 68 14 19 33 62 7325 47 23 11 95 287 15 7 11 7 20 5 159 71 12 12 3 1133 185 17 1 241 3 67
Molly W. 15 7 26 7 1 44 71 3 0 60 0 34 34 172 183 234 272 123 17 10 2 56 13 32 13 4 47 2615 48 36 4 34 2 0 68 1 7 2 45 188 20 15 2 42 48 21 3 5 17 16
N. Tonks 26 3 215 3 3 126 51 3 9 328 4 48 53 32 16 100 572 118 36 2 4 45 1 30 30 33 23 48 8514 30 15 138 7 8 25 4 1 15 6008 25 0 0 2 260 350 241 0 6 15 12
Narcissa M. 13 2 602 0 55 894 6 13 1 4 4 1003 7 2 1 41 241 287 97 1 1 151 6 2114 19 5 11 36 30 5535 10 105 0 22 6 4 9 83 103 17 5 31 1 310 206 16 6 15 1 111
Neville L. 32 27 5 3 28 64 1 77 20 14 15 526 10 39 36 545 1288 690 16 22 7 20 14 9 1929 4 95 4 15 10 7109 477 14 98 20 2 2 2 51 258 21 33 80 254 43 28 45 14 64 55
OC 371 1000 53 27 55 206 100 231 318 381 42 4130 72 1568 1239 367 6345 1795 866 1220 35 770 336 242 320 571 287 34 138 105 477 32822 410 66 71 53 46 347 1763 747 504 690 155 2816 4161 315 133 926 71 538
Oliver W. 4 1 0 69 5 2 6 6 70 55 25 58 7 79 77 79 180 286 6 2 1062 2 0 2 26 0 15 2 7 0 14 410 3513 9 502 1 0 0 12 9 1 0 7 10 13 3 1 3 3 8
Pansy P. 4 1 1 2 62 6 9 386 12 21 7 1659 2 20 24 111 908 613 2 1 7 3 0 25 105 0 7 0 8 22 98 66 9 4250 27 0 0 0 8 303 2 6 26 39 5 6 117 12 4 13
Percy W. 8 1 2 9 0 3 62 3 9 49 12 39 17 130 197 80 120 167 3 2 18 2 2 12 36 0 11 68 25 6 20 71 502 27 3002 7 0 3 12 124 0 4 2 32 11 4 4 5 0 18
Peter P. 16 0 1 0 0 23 1 0 5 1 1 7 2 4 4 4 85 10 1093 5 0 110 2 11 2 59 7 1 4 4 2 53 1 0 7 2474 6 21 1056 21 3 1 1 69 1190 1 0 5 0 118
Petunia D. 37 0 1 0 0 1 2 0 0 0 2 9 0 0 1 23 635 21 91 3 0 657 25 5 3 10 20 7 1 9 2 46 0 0 0 6 1907 9 32 8 0 0 2 185 65 3 0 4 0 3
Regulus B. 17 5 52 0 1 121 4 4 2 1 0 29 2 4 4 20 142 111 224 4 0 169 1 44 14 76 5 2 15 83 2 347 0 0 3 21 9 3582 176 6 1 2 1 182 1598 5 6 11 0 63
Remus L. 238 5 81 4 2 101 48 9 10 25 9 217 13 92 71 273 2189 1458 3620 20 4 2285 30 96 110 348 159 45 6008 103 51 1763 12 8 12 1056 32 176 37010 96 3 4 12 2149 17923 403 7 12 20 50
Ron W. 136 45 3 17 45 36 95 122 24 76 40 2612 58 289 386 1646 9173 26687 38 20 8 36 22 28 614 14 71 188 25 17 258 747 9 303 124 21 8 6 96 38037 468 120 59 448 113 35 29 57 10 120
Rose W. 7 1580 1 1 27 5 3 5 0 3 2 206 2 16 12 19 127 508 30 584 0 23 352 6 5 4 12 20 0 5 21 504 1 2 0 3 0 1 3 468 10266 7893 1 19 9 142 0 9 75 7
Scorpius M. 17 3316 7 2 159 7 1 13 0 11 1 930 1 7 11 54 356 295 35 526 0 114 1808 52 11 1 12 15 0 31 33 690 0 6 4 1 0 2 4 120 7893 13813 1 27 8 117 4 8 60 7
Seamus F. 7 2 0 2 4 3 1 38 4 0 11 100 1 14 22 82 158 129 2 3 5 2 1 0 62 1 3 2 2 1 80 155 7 26 2 1 2 1 12 59 1 1 1778 13 6 2 20 3 2 3
Severus S. 1948 149 19 4 9 304 26 41 18 39 14 1601 11 121 167 317 12273 11213 1247 21 7 6619 162 910 265 271 1133 42 260 310 254 2816 10 39 32 69 185 182 2149 448 19 27 13 50641 1745 28 24 148 2 756
Sirius B. 265 22 194 3 2 1037 21 10 15 22 3 297 22 126 95 316 4805 1897 8342 37 1 3648 40 123 69 332 185 48 350 206 43 4161 13 5 11 1190 65 1598 17923 113 9 8 6 1745 44733 70 9 64 3 126
Teddy L. 9 169 179 3 10 10 29 1 1 13 1 101 15 16 24 124 831 93 35 404 2 22 667 3 14 13 17 21 241 16 28 315 3 6 4 1 3 5 403 35 142 117 2 28 70 5306 1 11 1754 3
Theodore N. 9 3 0 0 43 2 3 305 4 13 4 494 1 5 4 53 220 480 6 3 7 1 0 4 295 0 1 3 0 6 45 133 1 117 4 0 0 6 7 29 0 4 20 24 9 1 1977 13 1 10
Tom R. Jr. 321 23 4 0 5 144 1 12 13 1 4 236 2 9 12 947 2278 1140 56 8 3 46 22 42 71 7 241 5 6 15 14 926 3 12 5 5 4 11 12 57 9 8 3 148 64 11 13 7498 3 796
Victoire W. 0 40 14 0 5 0 68 3 0 5 0 9 58 5 10 25 61 11 1 90 1 2 56 0 2 1 3 17 15 1 64 71 3 4 0 0 0 0 20 10 75 60 2 2 3 1754 1 3 2385 0
Voldemort 445 15 16 0 3 1255 2 10 14 5 6 512 5 38 42 466 5256 615 125 5 1 211 7 313 50 29 67 16 12 111 55 538 8 13 18 118 3 63 50 120 7 7 3 756 126 3 10 796 0 12353
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>After All This Time?</title>
    <!-- Import D3 -->
    <script src="https://d3js.org/d3.v4.min.js"></script>
    <link rel="stylesheet" type='text/css' href="style.css">
  </head>
  <body>
    <div id='matrix-container'></div>
    <script src="matrix.js"></script>
    <script>
      Matrix({
        container: '#matrix-container',
        start_color : 'white',
        end_color : '#054169'
      })
    </script>
  </body>
</html>
/* Adapted from https://bl.ocks.org/arpitnarechania/caeba2e6579900ea12cb2a4eb157ce74 */
function Matrix(options) {
  var margin = {top: 100, right: 100, bottom: 50, left: 100},
      width = 700,
      height = 700,
      container = options.container,
      startColor = options.start_color,
      endColor = options.end_color;
  var widthLegend = 100;

  // create the svg
  var svg = d3.select(container).append("svg")
      .attr('height', height + margin.top + margin.bottom)
      .attr('width', width + margin.left + margin.right);
  var focus = svg.append('g')
      .attr('transform', 'translate(' + margin.left + ',' + margin.top + ')');

  var xScale = d3.scaleBand()
      .range([0, width]);
  var yScale = d3.scaleBand()
      .range([0, height]);
  var colorScale = d3.scaleLinear()
      .range([startColor, endColor]);

  var character_names = [];
  d3.csv('cooccurrences-min.csv', function(data) {
    var characters = d3.nest()
        .key(function(d) { return d.character; })
        .entries(data);
    var max_value = 0;

    // make a matrix out of the cooccurrence csv
    var matrix = [];
    characters.forEach(function(character, i) {
      // add the character names to a list
      var name = character.key;
      character_names.push(name);
      // the 'character' column is only the row label, not a count, so drop it
      delete character.values[0]['character'];
      // the diagonal of the cooccurrence matrix is the total number of
      // fan fictions written about that character
      var sum = character.values[0][name];
      var matrix_row = [];
      for (var key in character.values[0]) {
        // get the fraction (0 to 1) by dividing by the sum
        character.values[0][key] = character.values[0][key]/sum;
        var val = character.values[0][key];
        // store the info in the matrix
        var info = {row_name: name, col_name: key, value : val};
        matrix_row.push(info);
        // track the max fraction, ignoring values of exactly 1 (the diagonal),
        // so the color scale will be right
        if (val != 1 && val > max_value) {
          max_value = val;
        }
      }
      matrix.push(matrix_row);
    });

    var num_characters = characters.length;
    xScale.domain(d3.range(num_characters));
    yScale.domain(d3.range(num_characters));
    colorScale.domain([0, max_value]);

    // add rows
    var row = focus.selectAll('.row')
        .data(matrix)
        .enter().append('g')
        .attr('class', 'row')
        .attr('transform', function(d, i) { return 'translate(0, ' + yScale(i) +')';});

    // add cells to each row
    var cell = row.selectAll('.cell')
        .data(function(d) { return d; })
        .enter().append('g')
        .attr('class', 'cell')
        .attr('transform', function(d, i) { return 'translate('+xScale(i)+',0)'})
        .on('mouseover', handleCellMouseOver)
        .on('mouseout', handleCellMouseOut);

    cell.append('rect')
        .attr('width', xScale.bandwidth())
        .attr('height', yScale.bandwidth())
        .style('stroke-width', 0);

    // fill in the color using the fraction; the diagonal (value of 1) is gray
    row.selectAll('.cell')
        .data(function(d, i) {
          return matrix[i];
        })
        .attr('fill', function(d, i) {
          if (d.value==1) {
            return 'rgb(175, 175, 175)';
          } else {
            return colorScale(d.value);
          }
        });

    // manually make an axis
    var labels = focus.append('g')
        .attr('class', 'labels');

    var columnLabels = labels.selectAll('.column-label')
        .data(character_names)
        .enter().append('g')
        .attr('class','column-label')
        .attr('transform', function(d, i) { return 'translate('+xScale(i) + ',' +'-8)'});

    // tick marks
    columnLabels.append('line')
        .style('stroke', 'black')
        .style('stroke-width', '1px')
        .attr('x1', xScale.bandwidth()/2)
        .attr('x2', xScale.bandwidth()/2)
        .attr('y1', 0)
        .attr('y2', 5);

    columnLabels.append('text')
        .attr('x', xScale.bandwidth()/2)
        .attr('y', -yScale.bandwidth()/2)
        .attr('dy', '.82em')
        .attr('text-anchor', 'start')
        .attr('transform', 'rotate(-60)')
        .text(function(d, i) { return d; });

    // row labels (class matches the '.row-label' selector)
    var rowLabels = labels.selectAll('.row-label')
        .data(character_names)
        .enter().append('g')
        .attr('class', 'row-label')
        .attr('transform', function(d, i) { return 'translate(0, '+ yScale(i) +')'; });

    rowLabels.append('line')
        .style('stroke', 'black')
        .style('stroke-width', '1px')
        .attr('x1', 0)
        .attr('x2', -5)
        .attr('y1', yScale.bandwidth()/2)
        .attr('y2', yScale.bandwidth()/2);

    rowLabels.append('text')
        .attr('x', -8)
        .attr('y', yScale.bandwidth()/2)
        .attr('dy', '.32em')
        .attr('text-anchor', 'end')
        .text(function(d, i) { return d;});
  });

  /* function to add info about that particular cell as a text box when hovering */
  function handleCellMouseOver(d, i) {
    var row_idx = character_names.indexOf(d.row_name);
    var percentage = (d.value*100).toFixed(2);
    var group = focus.append('g')
        .attr('id', 'id-name');

    // since we know all the text will look about the same, can hard code the word wrap
    group.append('text')
        .attr('x', xScale(i))
        .attr('y', yScale(row_idx)-20)
        .attr('text-anchor', 'middle')
        .text(function() {
          var t = 'Of '+ d.row_name + ' fan fictions, ';
          return t;
        });
    group.append('text')
        .attr('x', xScale(i))
        .attr('y', yScale(row_idx)-8)
        .attr('text-anchor','middle')
        .text(function(d2) {
          var t = percentage + "% feature " + d.col_name;
          return t;
        });

    // get the bbox so we can place a background that makes text easier to see
    var bbox = group.node().getBBox();
    var bboxPadding = 5;

    // place the background
    var rect = group.insert('rect', ':first-child')
        .attr('x', bbox.x - bboxPadding/2)
        .attr('y', bbox.y - bboxPadding/2)
        .attr('width', bbox.width + bboxPadding)
        .attr('height', bbox.height + bboxPadding)
        .attr('rx', 10)
        .attr('ry', 10)
        .attr('class', 'label-background-strong');
  }

  /* remove the text box when the mouse leaves the cell */
  function handleCellMouseOut(d, i) {
    d3.select('#id-name').remove();
  }
}
.axis {
  font-family: Helvetica, sans-serif;
  font-size: 14px;
}

.axis text {
  fill: black;
}

svg text {
  font-family: Helvetica, sans-serif;
  font-size: 14px;
}

.label-background-strong {
  fill: white;
  fill-opacity: .8;
}