@szeitlin
Last active August 29, 2015 14:06
forkable version

TraitSync k-Nearest Neighbors and Cosine Similarity

Introduction

With k-nearest neighbors (k-NN), a similarity is calculated between every pair of elements in the data set using a distance or similarity metric ^[1]^ chosen by the researcher (there are many such metrics). The distance or similarity between two elements is computed from their attributes. An element’s k nearest neighbors are the k elements closest to it according to that metric.

In this Graph Gist, I’m using k-NN with cosine similarity ^[2]^. Using their traits, I’ll calculate the cosine similarity between each person’s assessment of their own traits and someone else’s assessment of them.


Initial Data Model

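For now, the database consists of Person nodes and Assessment nodes, with each trait stored as a property on a RATED relationship: (Person0)-[:assessed]->(Assessment0)-[:RATED {Trait:00}]->(Person0). The first Person is the rater and the second is the person being rated; they are the same node for a self-assessment. There will be 5 people, 9 assessments and 55 traits in the dataset.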


Initial Data Setup

CREATE
(Person1:Person {name:'Nicole White'}),
(Person2:Person {name:'Greta Workman'}),
(Person3:Person {name:'Kenny Bastani'}),
(Person4:Person {name:'Grace Andrews'}),
(Person5:Person {name:'Michael Hunger'}),
(Assessment1:Assessment {name:'assessment1'}),
(Assessment2:Assessment {name:'assessment2'}),
(Assessment3:Assessment {name:'assessment3'}),
(Assessment4:Assessment {name:'assessment4'}),
(Assessment5:Assessment {name:'assessment5'}),
(Assessment6:Assessment {name:'assessment6'}),
(Assessment7:Assessment {name:'assessment7'}),
(Assessment8:Assessment {name:'assessment8'}),
(Assessment9:Assessment {name:'assessment9'}),
(Person1)-[:assessed]-> (Assessment5)-[:RATED {Analytical:50}]->(Person1),
(Person5)-[:assessed]-> (Assessment6)-[:RATED {LikesFamiliarity:95}]->(Person5),
(Person3)-[:assessed]-> (Assessment4)-[:RATED {Practical:42}]->(Person3),
(Person1)-[:assessed]-> (Assessment1)-[:RATED {Realistic:56}]->(Person1),
(Person5)-[:assessed]-> (Assessment9)-[:RATED {Rational:7}]->(Person5),
(Person5)-[:assessed]-> (Assessment5)-[:RATED {Deep:65}]->(Person5),
(Person2)-[:assessed]-> (Assessment1)-[:RATED {Observant:18}]->(Person2),
(Person1)-[:assessed]-> (Assessment7)-[:RATED {OpenMinded:16}]->(Person1),
(Person4)-[:assessed]-> (Assessment2)-[:RATED {Confident:47}]->(Person4),
(Person3)-[:assessed]-> (Assessment4)-[:RATED {LikesNovelty:60}]->(Person3),
(Person4)-[:assessed]-> (Assessment6)-[:RATED {HighEnergy:59}]->(Person4),
(Person2)-[:assessed]-> (Assessment6)-[:RATED {Realistic:32}]->(Person2),
(Person4)-[:assessed]-> (Assessment7)-[:RATED {LikesFamiliarity:95}]->(Person4),
(Person4)-[:assessed]-> (Assessment9)-[:RATED {Decisive:52}]->(Person4),
(Person3)-[:assessed]-> (Assessment9)-[:RATED {Outgoing:79}]->(Person3),
(Person2)-[:assessed]-> (Assessment2)-[:RATED {Patient:27}]->(Person2),
(Person5)-[:assessed]-> (Assessment9)-[:RATED {GoalOriented:37}]->(Person5),
(Person1)-[:assessed]-> (Assessment3)-[:RATED {Serious:67}]->(Person1),
(Person3)-[:assessed]-> (Assessment2)-[:RATED {IntellectuallyDriven:61}]->(Person3),
(Person1)-[:assessed]-> (Assessment3)-[:RATED {Relaxed:13}]->(Person1),
(Person2)-[:assessed]-> (Assessment9)-[:RATED {PrefersWorkingAlone:72}]->(Person2),
(Person2)-[:assessed]-> (Assessment1)-[:RATED {Intuitive:74}]->(Person2),
(Person3)-[:assessed]-> (Assessment6)-[:RATED {Spontaneous:16}]->(Person3),
(Person5)-[:assessed]-> (Assessment2)-[:RATED {Compulsive:17}]->(Person5),
(Person1)-[:assessed]-> (Assessment5)-[:RATED {Independent:14}]->(Person1),
(Person5)-[:assessed]-> (Assessment7)-[:RATED {DetailOriented:90}]->(Person5),
(Person3)-[:assessed]-> (Assessment9)-[:RATED {ProcessOriented:59}]->(Person3),
(Person5)-[:assessed]-> (Assessment9)-[:RATED {Inquisitive:9}]->(Person5),
(Person2)-[:assessed]-> (Assessment5)-[:RATED {DetailOriented:99}]->(Person2),
(Person5)-[:assessed]-> (Assessment7)-[:RATED {Sensitive:66}]->(Person5),
(Person3)-[:assessed]-> (Assessment1)-[:RATED {ProcessOriented:52}]->(Person3),
(Person2)-[:assessed]-> (Assessment9)-[:RATED {ProcessOriented:64}]->(Person2),
(Person2)-[:assessed]-> (Assessment3)-[:RATED {Steady:86}]->(Person2),
(Person2)-[:assessed]-> (Assessment5)-[:RATED {Intuitive:86}]->(Person2),
(Person4)-[:assessed]-> (Assessment5)-[:RATED {Flexible:8}]->(Person4),
(Person4)-[:assessed]-> (Assessment3)-[:RATED {Visionary:65}]->(Person4),
(Person4)-[:assessed]-> (Assessment7)-[:RATED {Protective:69}]->(Person4),
(Person4)-[:assessed]-> (Assessment7)-[:RATED {Stable:47}]->(Person4),
(Person4)-[:assessed]-> (Assessment9)-[:RATED {Passionate:78}]->(Person4),
(Person2)-[:assessed]-> (Assessment9)-[:RATED {Direct:7}]->(Person2),
(Person5)-[:assessed]-> (Assessment4)-[:RATED {Particular:91}]->(Person5),
(Person5)-[:assessed]-> (Assessment6)-[:RATED {Altruistic:30}]->(Person5),
(Person3)-[:assessed]-> (Assessment4)-[:RATED {Daring:55}]->(Person3),
(Person3)-[:assessed]-> (Assessment2)-[:RATED {LikesFamiliarity:48}]->(Person3),
(Person3)-[:assessed]-> (Assessment3)-[:RATED {Inquisitive:86}]->(Person3),
(Person3)-[:assessed]-> (Assessment6)-[:RATED {Empathic:55}]->(Person3),
(Person2)-[:assessed]-> (Assessment2)-[:RATED {OpenMinded:73}]->(Person2),
(Person4)-[:assessed]-> (Assessment7)-[:RATED {Loyal:51}]->(Person4),
(Person5)-[:assessed]-> (Assessment1)-[:RATED {Steady:70}]->(Person5),
(Person1)-[:assessed]-> (Assessment1)-[:RATED {Empathic:35}]->(Person1),
(Person3)-[:assessed]-> (Assessment6)-[:RATED {Leader:30}]->(Person3),
(Person5)-[:assessed]-> (Assessment4)-[:RATED {Confident:59}]->(Person5),
(Person1)-[:assessed]-> (Assessment1)-[:RATED {Competitive:13}]->(Person1),
(Person3)-[:assessed]-> (Assessment2)-[:RATED {Relaxed:85}]->(Person3),
(Person4)-[:assessed]-> (Assessment5)-[:RATED {FutureFocused:25}]->(Person4),
(Person4)-[:assessed]-> (Assessment6)-[:RATED {Deliberate:64}]->(Person4),
(Person5)-[:assessed]-> (Assessment6)-[:RATED {Creative:14}]->(Person2),
(Person1)-[:assessed]-> (Assessment1)-[:RATED {Direct:32}]->(Person1),
(Person1)-[:assessed]-> (Assessment4)-[:RATED {Meticulous:10}]->(Person2),
(Person5)-[:assessed]-> (Assessment1)-[:RATED {LikesFamiliarity:62}]->(Person5),
(Person1)-[:assessed]-> (Assessment5)-[:RATED {Rational:13}]->(Person2),
(Person2)-[:assessed]-> (Assessment5)-[:RATED {Optimistic:10}]->(Person5),
(Person5)-[:assessed]-> (Assessment9)-[:RATED {IntellectuallyDriven:97}]->(Person1),
(Person1)-[:assessed]-> (Assessment3)-[:RATED {Intuitive:18}]->(Person2),
(Person2)-[:assessed]-> (Assessment9)-[:RATED {Direct:87}]->(Person1),
(Person3)-[:assessed]-> (Assessment5)-[:RATED {Direct:62}]->(Person1),
(Person1)-[:assessed]-> (Assessment3)-[:RATED {Practical:86}]->(Person3),
(Person2)-[:assessed]-> (Assessment1)-[:RATED {IntellectuallyDriven:54}]->(Person5),
(Person5)-[:assessed]-> (Assessment9)-[:RATED {Unconventional:61}]->(Person2),
(Person3)-[:assessed]-> (Assessment6)-[:RATED {ProcessOriented:67}]->(Person1),
(Person1)-[:assessed]-> (Assessment8)-[:RATED {Direct:7}]->(Person1),
(Person5)-[:assessed]-> (Assessment2)-[:RATED {Alert:24}]->(Person2),
(Person4)-[:assessed]-> (Assessment4)-[:RATED {Analytical:23}]->(Person4),
(Person3)-[:assessed]-> (Assessment1)-[:RATED {Patient:6}]->(Person5),
(Person3)-[:assessed]-> (Assessment4)-[:RATED {Spontaneous:78}]->(Person3),
(Person1)-[:assessed]-> (Assessment2)-[:RATED {Optimistic:13}]->(Person5),
(Person2)-[:assessed]-> (Assessment9)-[:RATED {IntellectuallyDriven:66}]->(Person3),
(Person3)-[:assessed]-> (Assessment3)-[:RATED {ProcessOriented:91}]->(Person4),
(Person1)-[:assessed]-> (Assessment5)-[:RATED {Leader:98}]->(Person4),
(Person4)-[:assessed]-> (Assessment5)-[:RATED {HighEnergy:93}]->(Person1),
(Person5)-[:assessed]-> (Assessment2)-[:RATED {Daring:50}]->(Person3),
(Person5)-[:assessed]-> (Assessment7)-[:RATED {Altruistic:73}]->(Person3),
(Person1)-[:assessed]-> (Assessment1)-[:RATED {Meticulous:27}]->(Person4),
(Person5)-[:assessed]-> (Assessment5)-[:RATED {LikesNovelty:95}]->(Person2),
(Person4)-[:assessed]-> (Assessment5)-[:RATED {Serious:54}]->(Person2),
(Person5)-[:assessed]-> (Assessment3)-[:RATED {LikesNovelty:82}]->(Person5),
(Person5)-[:assessed]-> (Assessment9)-[:RATED {Direct:31}]->(Person4),
(Person1)-[:assessed]-> (Assessment4)-[:RATED {Patient:6}]->(Person4),
(Person2)-[:assessed]-> (Assessment3)-[:RATED {Earthy:71}]->(Person4),
(Person5)-[:assessed]-> (Assessment7)-[:RATED {Creative:99}]->(Person4),
(Person2)-[:assessed]-> (Assessment4)-[:RATED {PhysicallyDriven:23}]->(Person5),
(Person1)-[:assessed]-> (Assessment1)-[:RATED {Aggressive:20}]->(Person2),
(Person4)-[:assessed]-> (Assessment8)-[:RATED {PrefersWorkingwithOthers:14}]->(Person2),
(Person3)-[:assessed]-> (Assessment2)-[:RATED {Leader:89}]->(Person4),
(Person1)-[:assessed]-> (Assessment4)-[:RATED {Steady:45}]->(Person5),
(Person2)-[:assessed]-> (Assessment4)-[:RATED {Particular:1}]->(Person3),
(Person4)-[:assessed]-> (Assessment2)-[:RATED {OpenMinded:98}]->(Person4),
(Person2)-[:assessed]-> (Assessment5)-[:RATED {PhysicallyDriven:59}]->(Person2),
(Person3)-[:assessed]-> (Assessment6)-[:RATED {Realistic:71}]->(Person4),
(Person1)-[:assessed]-> (Assessment5)-[:RATED {Sensitive:6}]->(Person3),
(Person2)-[:assessed]-> (Assessment8)-[:RATED {Steady:54}]->(Person4),
(Person4)-[:assessed]-> (Assessment6)-[:RATED {DetailOriented:82}]->(Person1),
(Person5)-[:assessed]-> (Assessment7)-[:RATED {GoalOriented:75}]->(Person5),
(Person4)-[:assessed]-> (Assessment8)-[:RATED {HighEnergy:82}]->(Person5),
(Person2)-[:assessed]-> (Assessment4)-[:RATED {Leader:52}]->(Person1),
(Person2)-[:assessed]-> (Assessment5)-[:RATED {Intuitive:35}]->(Person1),
(Person5)-[:assessed]-> (Assessment2)-[:RATED {Unconventional:98}]->(Person5),
(Person4)-[:assessed]-> (Assessment6)-[:RATED {Empathic:17}]->(Person5),
(Person3)-[:assessed]-> (Assessment5)-[:RATED {ProcessOriented:89}]->(Person2),
(Person2)-[:assessed]-> (Assessment5)-[:RATED {Direct:82}]->(Person4),
(Person3)-[:assessed]-> (Assessment3)-[:RATED {Flexible:21}]->(Person2),
(Person2)-[:assessed]-> (Assessment6)-[:RATED {Patient:84}]->(Person1);

MATCH (n) RETURN n;

Cosine Similarity

Introduction & Example

Cosine similarity is the cosine of the angle between two n-dimensional vectors in an n-dimensional space. It is the dot product of the two vectors divided by the product of the two vectors' lengths (or magnitudes). For two vectors A and B in an n-dimensional space:

\( \LARGE similarity(A, B) = \frac{A \cdot B}{\|A\| \times \|B\|} = \frac{\sum\limits_{i=1}^n A_{i} \times B_{i}}{\sqrt{\sum\limits_{i=1}^n A_{i}^2} \times \sqrt{\sum\limits_{i=1}^n B_{i}^2}} \)

Cosine similarity ranges between -1 and 1, where -1 is perfectly dissimilar and 1 is perfectly similar. ^[3]^
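
The formula can be checked outside the database with a short Python sketch. It assumes each vector is stored as a dictionary from dimension name to value, with a missing dimension counted as 0; the function name `cosine_similarity` and the sample vectors are illustrative, not taken from the gist's data.

```python
import math
from typing import Dict

def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {dimension: value}.

    A dimension missing from a vector is treated as 0 (an assumption).
    """
    # Only dimensions present in both vectors contribute to the dot product.
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    length_a = math.sqrt(sum(value ** 2 for value in a.values()))
    length_b = math.sqrt(sum(value ** 2 for value in b.values()))
    if length_a == 0 or length_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (length_a * length_b)

same_direction = cosine_similarity({"x": 1, "y": 2}, {"x": 2, "y": 4})
orthogonal = cosine_similarity({"x": 1}, {"y": 1})
opposite = cosine_similarity({"x": 1, "y": 2}, {"x": -1, "y": -2})
example = cosine_similarity({"x": 3, "y": 4}, {"x": 4, "y": 3})
print(same_direction, orthogonal, opposite, example)
```

For the last pair the dot product is 3×4 + 4×3 = 24 and both lengths are 5, so the similarity is 24 / 25 = 0.96.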

To be as clear as possible, I’ll pull two people from the data set and show how to manually calculate their cosine similarity.

MATCH (n:Person)<--(a:Assessment)
RETURN n, a;

MATCH (a1:Assessment)-[r1:RATED]->(p2:Person)<-[r2:RATED]-(a2:Assessment)
RETURN p2.name AS Person, r1 AS `One`, r2 AS `Two`;

Each person should be thought of as a vector where their coordinates are defined by their ratings.
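
To make the vector view concrete, here is a small Python sketch that groups ratings of one person into a trait vector per rater. The five rows are copied from the CREATE statement above; the tuple layout and the helper name `vectors_for` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (rater, assessment, trait, score, subject): a few rows from the CREATE statement.
Rating = Tuple[str, str, str, int, str]
ratings: List[Rating] = [
    ("Person2", "Assessment1", "Observant", 18, "Person2"),
    ("Person2", "Assessment1", "Intuitive", 74, "Person2"),
    ("Person1", "Assessment3", "Intuitive", 18, "Person2"),
    ("Person1", "Assessment4", "Meticulous", 10, "Person2"),
    ("Person1", "Assessment1", "Aggressive", 20, "Person2"),
]

def vectors_for(subject: str, rows: List[Rating]) -> Dict[str, Dict[str, int]]:
    """Group one subject's ratings into a trait vector per rater."""
    by_rater: Dict[str, Dict[str, int]] = defaultdict(dict)
    for rater, _assessment, trait, score, rated in rows:
        if rated == subject:
            # If a rater scores the same trait twice, the last score wins
            # (an arbitrary choice for this sketch).
            by_rater[rater][trait] = score
    return dict(by_rater)

vectors = vectors_for("Person2", ratings)
self_view = vectors["Person2"]   # Person2's assessment of themselves
other_view = vectors["Person1"]  # Person1's assessment of Person2
print(self_view, other_view)
```

The two vectors share only the Intuitive coordinate, which is typical of sparse data like this: most traits are scored by one rater but not the other.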

Add Cosine Similarities to the Graph

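I want to create a [:SIMILARITY] relationship between each pair of assessments that rate the same person, where their cosine similarity is a property of the relationship. The queries that accomplish this for a single trait, Patient, are below: the first returns the dot product, the second stores the similarity on the relationship, and the third returns the five most similar neighbors.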

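// Dot product of the Patient scores across assessments that rate the same person.
MATCH (a1:Assessment)-[x:RATED]->(p1:Person)<-[y:RATED]-(a2:Assessment)
RETURN SUM(x.Patient * y.Patient) AS xyDotProduct;

// Store the cosine similarity on a SIMILARITY relationship.
// Only pairs where both assessments scored Patient for the same person contribute;
// swap Patient for another trait (e.g. Direct) to compare a different score.
MATCH (a1:Assessment)-[x:RATED]->(p1:Person)<-[y:RATED]-(a2:Assessment)
WHERE x.Patient IS NOT NULL AND y.Patient IS NOT NULL
WITH  SUM(x.Patient * y.Patient) AS xyDotProduct,
      SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.Patient) | xDot + a^2)) AS xLength,
      SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.Patient) | yDot + b^2)) AS yLength,
      a1, a2
MERGE (a1)-[s:SIMILARITY]-(a2)
SET   s.similarity = xyDotProduct / (xLength * yLength)
RETURN s.similarity AS `Cosine Similarity`;

// The five most similar neighbors, highest similarity first.
MATCH (a1:Assessment)-[s:SIMILARITY]-(a2:Assessment)
WITH a2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN a2.name AS Neighbor, sim AS Similarity;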

1. A distance metric measures distance; the higher the distance the further apart the neighbors. A similarity metric measures similarity; the higher the similarity the closer the neighbors.
2. Cosine similarity is well suited to sparse vectors.
3. In this data set in particular, similarities will never be negative, since all trait scores are non-negative (0-100).
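
A short justification of why trait scores cannot produce a negative similarity, using the same symbols as the formula above: when every coordinate is non-negative the dot product is non-negative, and the dot product can never exceed the product of the two lengths (the Cauchy-Schwarz inequality).

\( A_{i} \ge 0,\; B_{i} \ge 0 \;\Rightarrow\; A \cdot B = \sum\limits_{i=1}^n A_{i} \times B_{i} \ge 0 \;\Rightarrow\; 0 \le similarity(A, B) \le 1 \)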
@szeitlin
Author

To view as a graph gist: paste this URL into the top right corner at http://gist.neo4j.org

Summary:
Assessments are performed by Persons on themselves or other Persons using Traitify. Traits from Traitify (using dummy data to start with) are scored (0-100) as weights on relationships. The goal is to return a similarity score between a Person's Assessment of themselves and someone else's Assessment of them. Potentially useful for applications like corporate 360 evaluations, hiring, and dating.
