amgando/force-directed-graph.md Secret

## force-directed-graph.md

      
    Raw
  

              force-directed-graph.md
            
          
    solution

once you have the data summarized properly in the spreadsheet you can encode it as JSON
grab all the relevant data from the spreadsheet:
employees

id  name      role
1   Cindy     engineering
2   Jonathan  sales
3   Marilyn   sales
4   Mark      sales
5   Robert    engineering

customers

id    account           value
101   Exxon Mobil       $2,000,000
102   Valero Energy     $1,600,000
103   General Electric  $1,500,000
104   Apple             $1,450,000

interactions

interaction  occurrences
1_101        1
2_102        6
3_103        1
3_101        1
1_103        2
2_103        3
3_104        4
2_101        2
1_104        1


clean up the data so you only have what you need
employees

this table stays the same, you need all of this data
id  name      role
1   Cindy     engineering
2   Jonathan  sales
3   Marilyn   sales
4   Mark      sales
5   Robert    engineering

customers

this table had extra data about account value which you won't render
id    account         
101   Exxon Mobil     
102   Valero Energy   
103   General Electric
104   Apple           

interactions

this table had an interaction column which could be broken up to give source and target of the interaction
also, the source (employee) is split back out to a single number matching the identifiers in the table above
source  target  occurrences
1       101     1
2       102     6
3       103     1
3       101     1
1       103     2
2       103     3
3       104     4
2       101     2
1       104     1


now that the data is cleaned up, you can do the work to encode it as JSON
what's tricky about this part is maintaining the integrity of your sample data while moving through a complex transformation of the data
your goal is to produce something that takes this shape:
{
  "nodes" : [
    {"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
    {"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
    ...
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}
critically, the data types of each of these should be specified.
here's what you decide based on the sample data for the force directed graph you chose
field                   type      notes
EMPLOYEE_NAME           string    a string is just a piece of text
EMPLOYEE_ROLE           integer   the roles will be reduced to numbers
INDEX_OF_EMPLOYEE       integer   starting at 0, the row in which the employee name appears
INDEX_OF_CUSTOMER       integer   starting at 0, the row in which the customer name appears
NUMBER_OF_INTERACTIONS  integer   the number of occurrences of an interaction


given the decisions made above, you set out to transform the data into a JSON object
you start by copying the template
{
  "nodes" : [
    {"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
    {"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
    ...
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}
then encoding the collection employees as nodes
in other words, employees ...
1   Cindy     engineering
2   Jonathan  sales
3   Marilyn   sales
4   Mark      sales
5   Robert    engineering

become nodes in the graph
{
  "nodes" : [
    {"name" : "Cindy", "group" : 1},
    {"name" : "Jonathan", "group" : 2},
    {"name" : "Marilyn", "group" : 2},
    {"name" : "Mark", "group" : 2},
    {"name" : "Robert", "group" : 1},
    ...
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}

and then customers ...
id    account
101   Exxon Mobil
102   Valero Energy
103   General Electric
104   Apple

become nodes in the graph
note that the company's group is also a number, but set to a value that is different from employee roles
{
  "nodes" : [
    {"name" : "Cindy", "group" : 1},
    {"name" : "Jonathan", "group" : 2},
    {"name" : "Marilyn", "group" : 2},
    {"name" : "Mark", "group" : 2},
    {"name" : "Robert", "group" : 1},
    
    {"name" : "Exxon Mobil", "group" : 3},
    {"name" : "Valero Energy", "group" : 3},
    {"name" : "General Electric", "group" : 3},
    {"name" : "Apple", "group" : 3}
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}

finally you decide to encode the interactions as links between the nodes
this is where keeping track of your changes becomes tricky.
the source and target refer to identifiers
source  target  occurrences
1       101     1
2       102     6
3       103     1
3       101     1
1       103     2
2       103     3
3       104     4
2       101     2
1       104     1

{
  "nodes" : [
    {"name" : "Cindy", "group" : 1},
    {"name" : "Jonathan", "group" : 2},
    {"name" : "Marilyn", "group" : 2},
    {"name" : "Mark", "group" : 2},
    {"name" : "Robert", "group" : 1},
    {"name" : "Exxon Mobil", "group" : 3},
    {"name" : "Valero Energy", "group" : 3},
    {"name" : "General Electric", "group" : 3},
    {"name" : "Apple", "group" : 3}
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}

the process you follow here is to encode each interaction and mark it as "encoded" somehow while you work, like replacing the identifiers with names to mark a row as encoded to keep things straight.
so your first few steps might leave you with this:
employees table for reference
id  name      role
1   Cindy     engineering
2   Jonathan  sales
3   Marilyn   sales
4   Mark      sales
5   Robert    engineering

customers table for reference
id    account         
101   Exxon Mobil     
102   Valero Energy   
103   General Electric
104   Apple           

interactions table with source and target identifiers converted to their text equivalents
source    target            occurrences
Cindy     Exxon Mobil       1
Jonathan  Valero Energy     6
Marilyn   General Electric  1
3       101     1
1       103     2
2       103     3
3       104     4
2       101     2
1       104     1

{
  "nodes" : [
0    {"name" : "Cindy", "group" : 1},
1    {"name" : "Jonathan", "group" : 2},
2    {"name" : "Marilyn", "group" : 2},
3    {"name" : "Mark", "group" : 2},
4    {"name" : "Robert", "group" : 1},
5    {"name" : "Exxon Mobil", "group" : 3},
6    {"name" : "Valero Energy", "group" : 3},
7    {"name" : "General Electric", "group" : 3},
8    {"name" : "Apple", "group" : 3}
  ],
  "links" : [
    {"source" : 0, "target" : 5 , "value" : 1},
    {"source" : 1, "target" : 6 , "value" : 6},
    {"source" : 2, "target" : 7 , "value" : 1},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}
and so on.
of course you might realize at this point that it would have been helpful to have queried the original data for interactions differently, resolving employee and customer identifiers as names instead of numbers.
nevertheless, you press on because things are looking up -- you're almost ready to visualize your data!

here's the final result of continuing the process outlined above
interactions table with source and target identifiers converted to their text equivalents
source    target            occurrences
Cindy     Exxon Mobil       1
Jonathan  Valero Energy     6
Marilyn   General Electric  1
Marilyn   Exxon Mobil       1
Cindy     General Electric  2
Jonathan  General Electric  3
Marilyn   Apple             4
Jonathan  Exxon Mobil       2
Cindy     Apple             1

or, if we resolve these names to their positions in the "nodes" collection
0 5 1 <--- these first 3 rows are already encoded as JSON
1 6 6
2 7 1

2 5 1 <--- so you'll start here
0 7 2
1 7 3
2 8 4
1 5 2
0 8 1

and now ... encoded as JSON
{
  "nodes" : [
    {"name" : "Cindy", "group" : 1},
    {"name" : "Jonathan", "group" : 2},
    {"name" : "Marilyn", "group" : 2},
    {"name" : "Mark", "group" : 2},
    {"name" : "Robert", "group" : 1},
    {"name" : "Exxon Mobil", "group" : 3},
    {"name" : "Valero Energy", "group" : 3},
    {"name" : "General Electric", "group" : 3},
    {"name" : "Apple", "group" : 3}
  ],
  "links" : [
    {"source" : 0, "target" : 5 , "value" : 1},
    {"source" : 1, "target" : 6 , "value" : 6},
    {"source" : 2, "target" : 7 , "value" : 1},
    {"source" : 2, "target" : 5 , "value" : 1},
    {"source" : 0, "target" : 7 , "value" : 2},
    {"source" : 1, "target" : 7 , "value" : 3},
    {"source" : 2, "target" : 8 , "value" : 4},
    {"source" : 1, "target" : 5 , "value" : 2},
    {"source" : 0, "target" : 8 , "value" : 1}
  ]
}