Skip to content

Instantly share code, notes, and snippets.

@amgando
Created November 8, 2014 07:25
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save amgando/f732090803dca06f7352 to your computer and use it in GitHub Desktop.
Save amgando/f732090803dca06f7352 to your computer and use it in GitHub Desktop.

solution

once you have the data summarized properly in the spreadsheet you can encode it as JSON

grab all the relevant data from the spreadsheet:

employees

id  name      role
1   Cindy     engineering
2   Jonathan  sales
3   Marilyn   sales
4   Mark      sales
5   Robert    engineering

customers

id    account           value
101   Exxon Mobil       $2,000,000
102   Valero Energy     $1,600,000
103   General Electric  $1,500,000
104   Apple             $1,450,000

interactions

interaction  occurrences
1_101        1
2_102        6
3_103        1
3_101        1
1_103        2
2_103        3
3_104        4
2_101        2
1_104        1

clean up the data so you only have what you need

employees

this table stays the same, you need all of this data

id  name      role
1   Cindy     engineering
2   Jonathan  sales
3   Marilyn   sales
4   Mark      sales
5   Robert    engineering

customers

this table had extra data about account value which you won't render

id    account         
101   Exxon Mobil     
102   Valero Energy   
103   General Electric
104   Apple           

interactions

this table had an interaction column which could be broken up to give source and target of the interaction

also, the source (employee) is split back out to a single number matching the identifiers in the table above

source  target  occurrences
1       101     1
2       102     6
3       103     1
3       101     1
1       103     2
2       103     3
3       104     4
2       101     2
1       104     1

now that the data is cleaned up, you can do the work to encode it as JSON

what's tricky about this part is maintaining the integrity of your sample data while moving through a complex transformation of the data

your goal is to produce something that takes this shape:

{
  "nodes" : [
    {"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
    {"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
    ...
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}

critically, the data types of each of these should be specified.

here's what you decide based on the sample data for the force directed graph you chose

field                   type      notes
EMPLOYEE_NAME           string    a string is just a piece of text
EMPLOYEE_ROLE           integer   the roles will be reduced to numbers
INDEX_OF_EMPLOYEE       integer   starting at 0, the row in which the employee name appears
INDEX_OF_CUSTOMER       integer   starting at 0, the row in which the customer name appears
NUMBER_OF_INTERACTIONS  integer   the number of occurrences of an interaction

given the decisions made above, you set out to transform the data into a JSON object

you start by copying the template

{
  "nodes" : [
    {"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
    {"name" : "EMPLOYEE_NAME", "group" : EMPLOYEE_ROLE},
    ...
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}

then encoding the collection employees as nodes

in other words, employees ...

1   Cindy     engineering
2   Jonathan  sales
3   Marilyn   sales
4   Mark      sales
5   Robert    engineering

become nodes in the graph

{
  "nodes" : [
    {"name" : "Cindy", "group" : 1},
    {"name" : "Jonathan", "group" : 2},
    {"name" : "Marilyn", "group" : 2},
    {"name" : "Mark", "group" : 2},
    {"name" : "Robert", "group" : 1},
    ...
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}

and then customers ...

id    account
101   Exxon Mobil
102   Valero Energy
103   General Electric
104   Apple

become nodes in the graph

note that the company's group is also a number, but set to a value that is different from employee roles

{
  "nodes" : [
    {"name" : "Cindy", "group" : 1},
    {"name" : "Jonathan", "group" : 2},
    {"name" : "Marilyn", "group" : 2},
    {"name" : "Mark", "group" : 2},
    {"name" : "Robert", "group" : 1},
    
    {"name" : "Exxon Mobil", "group" : 3},
    {"name" : "Valero Energy", "group" : 3},
    {"name" : "General Electric", "group" : 3},
    {"name" : "Apple", "group" : 3}
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}

finally you decide to encode the interactions as links between the nodes

this is where keeping track of your changes becomes tricky.

the source and target refer to identifiers

source  target  occurrences
1       101     1
2       102     6
3       103     1
3       101     1
1       103     2
2       103     3
3       104     4
2       101     2
1       104     1
{
  "nodes" : [
    {"name" : "Cindy", "group" : 1},
    {"name" : "Jonathan", "group" : 2},
    {"name" : "Marilyn", "group" : 2},
    {"name" : "Mark", "group" : 2},
    {"name" : "Robert", "group" : 1},
    {"name" : "Exxon Mobil", "group" : 3},
    {"name" : "Valero Energy", "group" : 3},
    {"name" : "General Electric", "group" : 3},
    {"name" : "Apple", "group" : 3}
  ],
  "links" : [
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}

the process you follow here is to encode each interaction and mark it as "encoded" somehow while you work, like replacing the identifiers with names to mark a row as encoded to keep things straight.

so your first few steps might leave you with this:

employees table for reference

id  name      role
1   Cindy     engineering
2   Jonathan  sales
3   Marilyn   sales
4   Mark      sales
5   Robert    engineering

customers table for reference

id    account         
101   Exxon Mobil     
102   Valero Energy   
103   General Electric
104   Apple           

interactions table with source and target identifiers converted to their text equivalents

source    target            occurrences
Cindy     Exxon Mobil       1
Jonathan  Valero Energy     6
Marilyn   General Electric  1
3       101     1
1       103     2
2       103     3
3       104     4
2       101     2
1       104     1
{
  "nodes" : [
0    {"name" : "Cindy", "group" : 1},
1    {"name" : "Jonathan", "group" : 2},
2    {"name" : "Marilyn", "group" : 2},
3    {"name" : "Mark", "group" : 2},
4    {"name" : "Robert", "group" : 1},
5    {"name" : "Exxon Mobil", "group" : 3},
6    {"name" : "Valero Energy", "group" : 3},
7    {"name" : "General Electric", "group" : 3},
8    {"name" : "Apple", "group" : 3}
  ],
  "links" : [
    {"source" : 0, "target" : 5 , "value" : 1},
    {"source" : 1, "target" : 6 , "value" : 6},
    {"source" : 2, "target" : 7 , "value" : 1},
    {"source" : INDEX_OF_EMPLOYEE, "target" : INDEX_OF_CUSTOMER , "value" : NUMBER_OF_INTERACTIONS},
    ...
  ]
}

and so on.

of course you might realize at this point that it would have been helpful to have queried the original data for interactions differently, resolving employee and customer identifiers as names instead of numbers.

nevertheless, you press on because things are looking up -- you're almost ready to visualize your data!


here's the final result of continuing the process outlined above

interactions table with source and target identifiers converted to their text equivalents

source    target            occurrences
Cindy     Exxon Mobil       1
Jonathan  Valero Energy     6
Marilyn   General Electric  1
Marilyn   Exxon Mobil       1
Cindy     General Electric  2
Jonathan  General Electric  3
Marilyn   Apple             4
Jonathan  Exxon Mobil       2
Cindy     Apple             1

or, if we resolve these names to their positions in the "nodes" collection

0 5 1 <--- these first 3 rows are already encoded as JSON
1 6 6
2 7 1

2 5 1 <--- so you'll start here
0 7 2
1 7 3
2 8 4
1 5 2
0 8 1

and now ... encoded as JSON

{
  "nodes" : [
    {"name" : "Cindy", "group" : 1},
    {"name" : "Jonathan", "group" : 2},
    {"name" : "Marilyn", "group" : 2},
    {"name" : "Mark", "group" : 2},
    {"name" : "Robert", "group" : 1},
    {"name" : "Exxon Mobil", "group" : 3},
    {"name" : "Valero Energy", "group" : 3},
    {"name" : "General Electric", "group" : 3},
    {"name" : "Apple", "group" : 3}
  ],
  "links" : [
    {"source" : 0, "target" : 5 , "value" : 1},
    {"source" : 1, "target" : 6 , "value" : 6},
    {"source" : 2, "target" : 7 , "value" : 1},
    {"source" : 2, "target" : 5 , "value" : 1},
    {"source" : 0, "target" : 7 , "value" : 2},
    {"source" : 1, "target" : 7 , "value" : 3},
    {"source" : 2, "target" : 8 , "value" : 4},
    {"source" : 1, "target" : 5 , "value" : 2},
    {"source" : 0, "target" : 8 , "value" : 1}
  ]
}
@amgando
Copy link
Author

amgando commented Nov 8, 2014

Here is the spreadsheet that produced the input data for this work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment