once you have the data summarized properly in the spreadsheet you can encode it as JSON
grab all the relevant data from the spreadsheet:
id name role
1 Cindy engineering
2 Jonathan sales
3 Marilyn sales
4 Mark sales
5 Robert engineering
id account value
101 Exxon Mobil $2,000,000
102 Valero Energy $1,600,000
103 General Electric $1,500,000
104 Apple $1,450,000
interaction occurrences
1_101 1
2_102 6
3_103 1
3_101 1
1_103 2
2_103 3
3_104 4
2_101 2
1_104 1
clean up the data so you only have what you need
this table stays the same, you need all of this data
id name role
1 Cindy engineering
2 Jonathan sales
3 Marilyn sales
4 Mark sales
5 Robert engineering
this table had extra data about account value which you won't render
id account
101 Exxon Mobil
102 Valero Energy
103 General Electric
104 Apple
this table had an interaction column which could be broken up to give source and target of the interaction
also, the source (employee) is split back out to a single number matching the identifiers in the table above
source target occurrences
1 101 1
2 102 6
3 103 1
3 101 1
1 103 2
2 103 3
3 104 4
2 101 2
1 104 1
now that the data is cleaned up, you can do the work to encode it as JSON
what's tricky about this part is maintaining the integrity of your sample data while moving through a complex transformation of the data
your goal is to produce something that takes this shape:
{
"nodes" : [
{"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
{"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
...
],
"links" : [
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
...
]
}
critically, the data types of each of these should be specified.
here's what you decide based on the sample data for the force directed graph you chose
field type notes
EMPLOYEE_NAME string a string is just a piece of text
EMPLOYEE_ROLE integer the roles will be reduced to numbers
INDEX_OF_EMPLOYEE integer starting at 0, the row in which the employee name appears
INDEX_OF_CUSTOMER integer starting at 0, the row in which the customer name appears
NUMBER_OF_INTERACTIONS integer the number of occurrences of an interaction
given the decisions made above, you set out to transform the data into a JSON object
you start by copying the template
{
"nodes" : [
{"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
{"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
...
],
"links" : [
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
...
]
}
then encoding the collection employees as nodes
in other words, employees ...
1 Cindy engineering
2 Jonathan sales
3 Marilyn sales
4 Mark sales
5 Robert engineering
become nodes in the graph
{
"nodes" : [
{"name" : "Cindy", "group" : 1},
{"name" : "Jonathan", "group" : 2},
{"name" : "Marilyn", "group" : 2},
{"name" : "Mark", "group" : 2},
{"name" : "Robert", "group" : 1},
...
],
"links" : [
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
...
]
}
and then customers ...
id account
101 Exxon Mobil
102 Valero Energy
103 General Electric
104 Apple
become nodes in the graph
note that the company's group is also a number, but set to a value that is different from employee roles
{
"nodes" : [
{"name" : "Cindy", "group" : 1},
{"name" : "Jonathan", "group" : 2},
{"name" : "Marilyn", "group" : 2},
{"name" : "Mark", "group" : 2},
{"name" : "Robert", "group" : 1},
{"name" : "Exxon Mobil", "group" : 3},
{"name" : "Valero Energy", "group" : 3},
{"name" : "General Electric", "group" : 3},
{"name" : "Apple", "group" : 3}
],
"links" : [
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
...
]
}
finally you decide to encode the interactions as links between the nodes
this is where keeping track of your changes becomes tricky.
the source and target refer to identifiers
source target occurrences
1 101 1
2 102 6
3 103 1
3 101 1
1 103 2
2 103 3
3 104 4
2 101 2
1 104 1
{
"nodes" : [
{"name" : "Cindy", "group" : 1},
{"name" : "Jonathan", "group" : 2},
{"name" : "Marilyn", "group" : 2},
{"name" : "Mark", "group" : 2},
{"name" : "Robert", "group" : 1},
{"name" : "Exxon Mobil", "group" : 3},
{"name" : "Valero Energy", "group" : 3},
{"name" : "General Electric", "group" : 3},
{"name" : "Apple", "group" : 3}
],
"links" : [
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
...
]
}
the process you follow here is to encode each interaction and mark it as "encoded" somehow while you work, like replacing the identifiers with names to mark a row as encoded to keep things straight.
so your first few steps might leave you with this:
employees table for reference
id name role
1 Cindy engineering
2 Jonathan sales
3 Marilyn sales
4 Mark sales
5 Robert engineering
customers table for reference
id account
101 Exxon Mobil
102 Valero Energy
103 General Electric
104 Apple
interactions table with source and target identifiers converted to their text equivalents
source target occurrences
Cindy Exxon Mobil 1
Jonathan Valero Energy 6
Marilyn General Electric 1
3 101 1
1 103 2
2 103 3
3 104 4
2 101 2
1 104 1
{
"nodes" : [
0 {"name" : "Cindy", "group" : 1},
1 {"name" : "Jonathan", "group" : 2},
2 {"name" : "Marilyn", "group" : 2},
3 {"name" : "Mark", "group" : 2},
4 {"name" : "Robert", "group" : 1},
5 {"name" : "Exxon Mobil", "group" : 3},
6 {"name" : "Valero Energy", "group" : 3},
7 {"name" : "General Electric", "group" : 3},
8 {"name" : "Apple", "group" : 3}
],
"links" : [
{"source" : 0, "target" : 5 , "value" : 1},
{"source" : 1, "target" : 6 , "value" : 6},
{"source" : 2, "target" : 7 , "value" : 1},
{"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
...
]
}
and so on.
of course you might realize at this point that it would have been helpful to have queried the original data for interactions differently, resolving employee and customer identifiers as names instead of numbers.
nevertheless, you press on because things are looking up -- you're almost ready to visualize your data!
here's the final result of continuing the process outlined above
interactions table with source and target identifiers converted to their text equivalents
source target occurrences
Cindy Exxon Mobil 1
Jonathan Valero Energy 6
Marilyn General Electric 1
Marilyn Exxon Mobil 1
Cindy General Electric 2
Jonathan General Electric 3
Marilyn Apple 4
Jonathan Exxon Mobil 2
Cindy Apple 1
or, if we resolve these names to their positions in the "nodes" collection
0 5 1 <--- these first 3 rows are already encoded as JSON
1 6 6
2 7 1
2 5 1 <--- so you'll start here
0 7 2
1 7 3
2 8 4
1 5 2
0 8 1
and now ... encoded as JSON
{
"nodes" : [
{"name" : "Cindy", "group" : 1},
{"name" : "Jonathan", "group" : 2},
{"name" : "Marilyn", "group" : 2},
{"name" : "Mark", "group" : 2},
{"name" : "Robert", "group" : 1},
{"name" : "Exxon Mobil", "group" : 3},
{"name" : "Valero Energy", "group" : 3},
{"name" : "General Electric", "group" : 3},
{"name" : "Apple", "group" : 3}
],
"links" : [
{"source" : 0, "target" : 5 , "value" : 1},
{"source" : 1, "target" : 6 , "value" : 6},
{"source" : 2, "target" : 7 , "value" : 1},
{"source" : 2, "target" : 5 , "value" : 1},
{"source" : 0, "target" : 7 , "value" : 2},
{"source" : 1, "target" : 7 , "value" : 3},
{"source" : 2, "target" : 8 , "value" : 4},
{"source" : 1, "target" : 5 , "value" : 2},
{"source" : 0, "target" : 8 , "value" : 1}
]
}
Here is the spreadsheet that produced the input data for this work