@sushihangover
Forked from nicolewhite/k-NN.adoc
Created July 31, 2016 17:25

Movie Recommendations with k-Nearest Neighbors and Cosine Similarity


Introduction

The k-nearest neighbors (k-NN) algorithm is among the simplest algorithms in data mining. The researcher chooses a distance or similarity metric ^[1]^ (there are many), and that metric is computed between every pair of elements in the data set from the two elements' attributes. An element’s k nearest neighbors are the k elements closest to it according to that metric.

In this Graph Gist, I’m using k-NN with cosine similarity ^[2]^ as the similarity metric to calculate movie recommendations. I wanted a fun data set, so I asked people on Twitter and bothered a few people via email to fill out this form. Using their movie ratings, I’ll calculate the cosine similarity between each pair of people. I’ll then calculate recommendations for each person’s unrated movies by averaging the ratings given by that person’s k nearest neighbors. My methodology will be explained in detail throughout.


Initial Data Model

For now, the database consists only of Person nodes and Movie nodes, where (Person)-[:RATED {rating:00}]->(Movie). There are 15 people and 30 movies in the data set.

[Figure: personratedmovie]

Initial Data Setup

CREATE
(Person1:Person {name:'Michael Sherman'}),
(Person2:Person {name:'Zoltan Varju'}),
(Person3:Person {name:'Peter Neubauer'}),
(Person4:Person {name:'Grace Andrews'}),
(Person5:Person {name:'Michael Hunger'}),
(Person6:Person {name:'Toby Craig'}),
(Person7:Person {name:'Huston Hedinger'}),
(Person8:Person {name:'Nigel Small'}),
(Person9:Person {name:'Wes Freeman'}),
(Person10:Person {name:'Luanne Misquitta'}),
(Person11:Person {name:'shiv swami'}),
(Person12:Person {name:'Pernilla Lindh'}),
(Person13:Person {name:'Max De Marzi'}),
(Person14:Person {name:'Chris Leishman'}),
(Person15:Person {name:'Kenny Bastani'}),
(Movie1:Movie {name:'Titanic'}),
(Movie2:Movie {name:'Forrest Gump'}),
(Movie3:Movie {name:'Mean Girls'}),
(Movie4:Movie {name:'The Bourne Trilogy'}),
(Movie5:Movie {name:'Jurassic Park'}),
(Movie6:Movie {name:'The 40 Year Old Virgin'}),
(Movie7:Movie {name:'Thank You for Smoking'}),
(Movie8:Movie {name:'Happy Gilmore'}),
(Movie9:Movie {name:'Knocked Up'}),
(Movie10:Movie {name:'A Beautiful Mind'}),
(Movie11:Movie {name:'Bridesmaids'}),
(Movie12:Movie {name:'The Dark Knight Trilogy'}),
(Movie13:Movie {name:'Charlie\'s Angels'}),
(Movie14:Movie {name:'Avatar'}),
(Movie15:Movie {name:'Children of Men'}),
(Movie16:Movie {name:'Gladiator'}),
(Movie17:Movie {name:'Shutter Island'}),
(Movie18:Movie {name:'Forgetting Sarah Marshall'}),
(Movie19:Movie {name:'Inception'}),
(Movie20:Movie {name:'The Social Network'}),
(Movie21:Movie {name:'Marley and Me'}),
(Movie22:Movie {name:'Taken'}),
(Movie23:Movie {name:'Pan\'s Labyrinth'}),
(Movie24:Movie {name:'Inglourious Basterds'}),
(Movie25:Movie {name:'The Ocean\'s Trilogy'}),
(Movie26:Movie {name:'The Notebook'}),
(Movie27:Movie {name:'The Devil Wears Prada'}),
(Movie28:Movie {name:'The Truman Show'}),
(Movie29:Movie {name:'WALL-E'}),
(Movie30:Movie {name:'Paranormal Activity'}),
(Person1)-[:RATED {rating:4}]->(Movie1),
(Person1)-[:RATED {rating:3}]->(Movie2),
(Person1)-[:RATED {rating:8}]->(Movie3),
(Person1)-[:RATED {rating:8}]->(Movie5),
(Person1)-[:RATED {rating:6}]->(Movie6),
(Person1)-[:RATED {rating:6}]->(Movie8),
(Person1)-[:RATED {rating:4}]->(Movie9),
(Person1)-[:RATED {rating:7}]->(Movie12),
(Person1)-[:RATED {rating:5}]->(Movie14),
(Person1)-[:RATED {rating:2}]->(Movie16),
(Person1)-[:RATED {rating:9}]->(Movie20),
(Person1)-[:RATED {rating:4}]->(Movie27),
(Person1)-[:RATED {rating:9}]->(Movie28),
(Person1)-[:RATED {rating:10}]->(Movie29),
(Person2)-[:RATED {rating:3}]->(Movie1),
(Person2)-[:RATED {rating:8}]->(Movie2),
(Person2)-[:RATED {rating:7}]->(Movie5),
(Person2)-[:RATED {rating:5}]->(Movie6),
(Person2)-[:RATED {rating:8}]->(Movie7),
(Person2)-[:RATED {rating:6}]->(Movie9),
(Person2)-[:RATED {rating:8}]->(Movie10),
(Person2)-[:RATED {rating:7}]->(Movie13),
(Person2)-[:RATED {rating:5}]->(Movie16),
(Person2)-[:RATED {rating:9}]->(Movie24),
(Person2)-[:RATED {rating:9}]->(Movie25),
(Person2)-[:RATED {rating:7}]->(Movie27),
(Person2)-[:RATED {rating:2}]->(Movie28),
(Person3)-[:RATED {rating:8}]->(Movie1),
(Person3)-[:RATED {rating:10}]->(Movie2),
(Person3)-[:RATED {rating:7}]->(Movie3),
(Person3)-[:RATED {rating:9}]->(Movie4),
(Person3)-[:RATED {rating:8}]->(Movie5),
(Person3)-[:RATED {rating:5}]->(Movie6),
(Person3)-[:RATED {rating:8}]->(Movie7),
(Person3)-[:RATED {rating:7}]->(Movie8),
(Person3)-[:RATED {rating:7}]->(Movie9),
(Person3)-[:RATED {rating:9}]->(Movie10),
(Person3)-[:RATED {rating:5}]->(Movie11),
(Person3)-[:RATED {rating:7}]->(Movie12),
(Person3)-[:RATED {rating:3}]->(Movie13),
(Person3)-[:RATED {rating:10}]->(Movie14),
(Person3)-[:RATED {rating:7}]->(Movie15),
(Person3)-[:RATED {rating:9}]->(Movie16),
(Person3)-[:RATED {rating:7}]->(Movie17),
(Person3)-[:RATED {rating:5}]->(Movie18),
(Person3)-[:RATED {rating:8}]->(Movie19),
(Person3)-[:RATED {rating:3}]->(Movie20),
(Person3)-[:RATED {rating:4}]->(Movie21),
(Person3)-[:RATED {rating:6}]->(Movie22),
(Person3)-[:RATED {rating:9}]->(Movie23),
(Person3)-[:RATED {rating:8}]->(Movie24),
(Person3)-[:RATED {rating:7}]->(Movie25),
(Person3)-[:RATED {rating:5}]->(Movie26),
(Person3)-[:RATED {rating:2}]->(Movie27),
(Person3)-[:RATED {rating:4}]->(Movie28),
(Person3)-[:RATED {rating:8}]->(Movie29),
(Person3)-[:RATED {rating:5}]->(Movie30),
(Person4)-[:RATED {rating:8}]->(Movie1),
(Person4)-[:RATED {rating:9}]->(Movie2),
(Person4)-[:RATED {rating:7}]->(Movie3),
(Person4)-[:RATED {rating:9}]->(Movie4),
(Person4)-[:RATED {rating:8}]->(Movie5),
(Person4)-[:RATED {rating:6}]->(Movie6),
(Person4)-[:RATED {rating:7}]->(Movie8),
(Person4)-[:RATED {rating:6}]->(Movie9),
(Person4)-[:RATED {rating:8}]->(Movie10),
(Person4)-[:RATED {rating:8}]->(Movie11),
(Person4)-[:RATED {rating:9}]->(Movie12),
(Person4)-[:RATED {rating:7}]->(Movie13),
(Person4)-[:RATED {rating:7}]->(Movie14),
(Person4)-[:RATED {rating:10}]->(Movie15),
(Person4)-[:RATED {rating:9}]->(Movie16),
(Person4)-[:RATED {rating:9}]->(Movie17),
(Person4)-[:RATED {rating:5}]->(Movie18),
(Person4)-[:RATED {rating:10}]->(Movie19),
(Person4)-[:RATED {rating:7}]->(Movie20),
(Person4)-[:RATED {rating:7}]->(Movie21),
(Person4)-[:RATED {rating:6}]->(Movie26),
(Person4)-[:RATED {rating:9}]->(Movie27),
(Person4)-[:RATED {rating:7}]->(Movie28),
(Person5)-[:RATED {rating:10}]->(Movie2),
(Person5)-[:RATED {rating:8}]->(Movie5),
(Person5)-[:RATED {rating:6}]->(Movie12),
(Person5)-[:RATED {rating:10}]->(Movie13),
(Person5)-[:RATED {rating:6}]->(Movie14),
(Person5)-[:RATED {rating:4}]->(Movie16),
(Person5)-[:RATED {rating:8}]->(Movie19),
(Person5)-[:RATED {rating:5}]->(Movie20),
(Person5)-[:RATED {rating:7}]->(Movie25),
(Person6)-[:RATED {rating:7}]->(Movie1),
(Person6)-[:RATED {rating:8}]->(Movie2),
(Person6)-[:RATED {rating:7}]->(Movie4),
(Person6)-[:RATED {rating:8}]->(Movie5),
(Person6)-[:RATED {rating:9}]->(Movie16),
(Person6)-[:RATED {rating:10}]->(Movie23),
(Person6)-[:RATED {rating:8}]->(Movie24),
(Person6)-[:RATED {rating:6}]->(Movie25),
(Person6)-[:RATED {rating:8}]->(Movie28),
(Person6)-[:RATED {rating:8}]->(Movie29),
(Person6)-[:RATED {rating:9}]->(Movie30),
(Person7)-[:RATED {rating:8}]->(Movie1),
(Person7)-[:RATED {rating:9}]->(Movie2),
(Person7)-[:RATED {rating:4}]->(Movie3),
(Person7)-[:RATED {rating:9}]->(Movie4),
(Person7)-[:RATED {rating:9}]->(Movie5),
(Person7)-[:RATED {rating:9}]->(Movie6),
(Person7)-[:RATED {rating:9}]->(Movie7),
(Person7)-[:RATED {rating:7}]->(Movie8),
(Person7)-[:RATED {rating:7}]->(Movie9),
(Person7)-[:RATED {rating:8}]->(Movie10),
(Person7)-[:RATED {rating:9}]->(Movie11),
(Person7)-[:RATED {rating:9}]->(Movie12),
(Person7)-[:RATED {rating:5}]->(Movie13),
(Person7)-[:RATED {rating:9}]->(Movie14),
(Person7)-[:RATED {rating:6}]->(Movie15),
(Person7)-[:RATED {rating:8}]->(Movie16),
(Person7)-[:RATED {rating:8}]->(Movie18),
(Person7)-[:RATED {rating:7}]->(Movie19),
(Person7)-[:RATED {rating:7}]->(Movie20),
(Person7)-[:RATED {rating:9}]->(Movie22),
(Person7)-[:RATED {rating:8}]->(Movie25),
(Person7)-[:RATED {rating:9}]->(Movie26),
(Person7)-[:RATED {rating:6}]->(Movie28),
(Person7)-[:RATED {rating:9}]->(Movie29),
(Person8)-[:RATED {rating:5}]->(Movie1),
(Person8)-[:RATED {rating:9}]->(Movie2),
(Person8)-[:RATED {rating:10}]->(Movie4),
(Person8)-[:RATED {rating:8}]->(Movie5),
(Person8)-[:RATED {rating:10}]->(Movie12),
(Person8)-[:RATED {rating:9}]->(Movie13),
(Person8)-[:RATED {rating:7}]->(Movie14),
(Person8)-[:RATED {rating:6}]->(Movie19),
(Person8)-[:RATED {rating:10}]->(Movie22),
(Person8)-[:RATED {rating:10}]->(Movie24),
(Person8)-[:RATED {rating:9}]->(Movie25),
(Person9)-[:RATED {rating:8}]->(Movie1),
(Person9)-[:RATED {rating:8}]->(Movie2),
(Person9)-[:RATED {rating:9}]->(Movie4),
(Person9)-[:RATED {rating:9}]->(Movie5),
(Person9)-[:RATED {rating:9}]->(Movie6),
(Person9)-[:RATED {rating:8}]->(Movie9),
(Person9)-[:RATED {rating:8}]->(Movie10),
(Person9)-[:RATED {rating:7}]->(Movie13),
(Person9)-[:RATED {rating:9}]->(Movie14),
(Person9)-[:RATED {rating:7}]->(Movie19),
(Person9)-[:RATED {rating:8}]->(Movie20),
(Person9)-[:RATED {rating:8}]->(Movie21),
(Person9)-[:RATED {rating:8}]->(Movie22),
(Person9)-[:RATED {rating:7}]->(Movie23),
(Person9)-[:RATED {rating:7}]->(Movie25),
(Person9)-[:RATED {rating:7}]->(Movie27),
(Person9)-[:RATED {rating:9}]->(Movie29),
(Person10)-[:RATED {rating:3}]->(Movie1),
(Person10)-[:RATED {rating:5}]->(Movie2),
(Person10)-[:RATED {rating:5}]->(Movie3),
(Person10)-[:RATED {rating:8}]->(Movie4),
(Person10)-[:RATED {rating:7}]->(Movie5),
(Person10)-[:RATED {rating:3}]->(Movie9),
(Person10)-[:RATED {rating:9}]->(Movie10),
(Person10)-[:RATED {rating:5}]->(Movie11),
(Person10)-[:RATED {rating:7}]->(Movie12),
(Person10)-[:RATED {rating:9}]->(Movie13),
(Person10)-[:RATED {rating:10}]->(Movie14),
(Person10)-[:RATED {rating:8}]->(Movie16),
(Person10)-[:RATED {rating:8}]->(Movie20),
(Person10)-[:RATED {rating:9}]->(Movie24),
(Person10)-[:RATED {rating:9}]->(Movie25),
(Person10)-[:RATED {rating:5}]->(Movie26),
(Person10)-[:RATED {rating:9}]->(Movie27),
(Person10)-[:RATED {rating:9}]->(Movie29),
(Person11)-[:RATED {rating:10}]->(Movie1),
(Person11)-[:RATED {rating:10}]->(Movie2),
(Person11)-[:RATED {rating:5}]->(Movie3),
(Person11)-[:RATED {rating:7}]->(Movie4),
(Person11)-[:RATED {rating:9}]->(Movie5),
(Person11)-[:RATED {rating:5}]->(Movie6),
(Person11)-[:RATED {rating:5}]->(Movie7),
(Person11)-[:RATED {rating:6}]->(Movie8),
(Person11)-[:RATED {rating:7}]->(Movie9),
(Person11)-[:RATED {rating:10}]->(Movie10),
(Person11)-[:RATED {rating:7}]->(Movie11),
(Person11)-[:RATED {rating:9}]->(Movie12),
(Person11)-[:RATED {rating:8}]->(Movie13),
(Person11)-[:RATED {rating:10}]->(Movie14),
(Person11)-[:RATED {rating:7}]->(Movie15),
(Person11)-[:RATED {rating:7}]->(Movie16),
(Person11)-[:RATED {rating:7}]->(Movie17),
(Person11)-[:RATED {rating:7}]->(Movie18),
(Person11)-[:RATED {rating:6}]->(Movie19),
(Person11)-[:RATED {rating:9}]->(Movie20),
(Person11)-[:RATED {rating:7}]->(Movie21),
(Person11)-[:RATED {rating:8}]->(Movie22),
(Person11)-[:RATED {rating:7}]->(Movie23),
(Person11)-[:RATED {rating:8}]->(Movie24),
(Person11)-[:RATED {rating:7}]->(Movie25),
(Person11)-[:RATED {rating:8}]->(Movie26),
(Person11)-[:RATED {rating:7}]->(Movie27),
(Person11)-[:RATED {rating:9}]->(Movie28),
(Person11)-[:RATED {rating:9}]->(Movie29),
(Person11)-[:RATED {rating:7}]->(Movie30),
(Person12)-[:RATED {rating:5}]->(Movie1),
(Person12)-[:RATED {rating:10}]->(Movie2),
(Person12)-[:RATED {rating:8}]->(Movie3),
(Person12)-[:RATED {rating:5}]->(Movie4),
(Person12)-[:RATED {rating:10}]->(Movie5),
(Person12)-[:RATED {rating:4}]->(Movie6),
(Person12)-[:RATED {rating:10}]->(Movie7),
(Person12)-[:RATED {rating:5}]->(Movie8),
(Person12)-[:RATED {rating:5}]->(Movie9),
(Person12)-[:RATED {rating:10}]->(Movie10),
(Person12)-[:RATED {rating:5}]->(Movie11),
(Person12)-[:RATED {rating:10}]->(Movie12),
(Person12)-[:RATED {rating:5}]->(Movie13),
(Person12)-[:RATED {rating:10}]->(Movie14),
(Person12)-[:RATED {rating:9}]->(Movie15),
(Person12)-[:RATED {rating:10}]->(Movie16),
(Person12)-[:RATED {rating:7}]->(Movie17),
(Person12)-[:RATED {rating:5}]->(Movie18),
(Person12)-[:RATED {rating:5}]->(Movie19),
(Person12)-[:RATED {rating:7}]->(Movie20),
(Person12)-[:RATED {rating:10}]->(Movie21),
(Person12)-[:RATED {rating:7}]->(Movie22),
(Person12)-[:RATED {rating:10}]->(Movie23),
(Person12)-[:RATED {rating:8}]->(Movie24),
(Person12)-[:RATED {rating:8}]->(Movie25),
(Person12)-[:RATED {rating:10}]->(Movie26),
(Person12)-[:RATED {rating:5}]->(Movie27),
(Person12)-[:RATED {rating:9}]->(Movie28),
(Person12)-[:RATED {rating:7}]->(Movie29),
(Person12)-[:RATED {rating:3}]->(Movie30),
(Person13)-[:RATED {rating:7}]->(Movie1),
(Person13)-[:RATED {rating:10}]->(Movie2),
(Person13)-[:RATED {rating:7}]->(Movie3),
(Person13)-[:RATED {rating:8}]->(Movie4),
(Person13)-[:RATED {rating:9}]->(Movie5),
(Person13)-[:RATED {rating:4}]->(Movie6),
(Person13)-[:RATED {rating:6}]->(Movie7),
(Person13)-[:RATED {rating:3}]->(Movie8),
(Person13)-[:RATED {rating:7}]->(Movie9),
(Person13)-[:RATED {rating:9}]->(Movie10),
(Person13)-[:RATED {rating:4}]->(Movie11),
(Person13)-[:RATED {rating:7}]->(Movie12),
(Person13)-[:RATED {rating:6}]->(Movie13),
(Person13)-[:RATED {rating:6}]->(Movie14),
(Person13)-[:RATED {rating:9}]->(Movie15),
(Person13)-[:RATED {rating:9}]->(Movie16),
(Person13)-[:RATED {rating:8}]->(Movie17),
(Person13)-[:RATED {rating:7}]->(Movie18),
(Person13)-[:RATED {rating:8}]->(Movie19),
(Person13)-[:RATED {rating:5}]->(Movie20),
(Person13)-[:RATED {rating:4}]->(Movie21),
(Person13)-[:RATED {rating:4}]->(Movie22),
(Person13)-[:RATED {rating:10}]->(Movie23),
(Person13)-[:RATED {rating:7}]->(Movie24),
(Person13)-[:RATED {rating:10}]->(Movie25),
(Person13)-[:RATED {rating:8}]->(Movie26),
(Person13)-[:RATED {rating:8}]->(Movie27),
(Person13)-[:RATED {rating:10}]->(Movie28),
(Person13)-[:RATED {rating:10}]->(Movie29),
(Person13)-[:RATED {rating:9}]->(Movie30),
(Person14)-[:RATED {rating:5}]->(Movie1),
(Person14)-[:RATED {rating:8}]->(Movie2),
(Person14)-[:RATED {rating:8}]->(Movie4),
(Person14)-[:RATED {rating:2}]->(Movie5),
(Person14)-[:RATED {rating:10}]->(Movie7),
(Person14)-[:RATED {rating:9}]->(Movie9),
(Person14)-[:RATED {rating:9}]->(Movie10),
(Person14)-[:RATED {rating:8}]->(Movie13),
(Person14)-[:RATED {rating:7}]->(Movie14),
(Person14)-[:RATED {rating:9}]->(Movie15),
(Person14)-[:RATED {rating:8}]->(Movie16),
(Person14)-[:RATED {rating:9}]->(Movie19),
(Person14)-[:RATED {rating:6}]->(Movie20),
(Person14)-[:RATED {rating:7}]->(Movie22),
(Person14)-[:RATED {rating:9}]->(Movie24),
(Person14)-[:RATED {rating:7}]->(Movie25),
(Person14)-[:RATED {rating:5}]->(Movie27),
(Person14)-[:RATED {rating:6}]->(Movie28),
(Person14)-[:RATED {rating:7}]->(Movie29),
(Person15)-[:RATED {rating:8}]->(Movie1),
(Person15)-[:RATED {rating:10}]->(Movie2),
(Person15)-[:RATED {rating:4}]->(Movie3),
(Person15)-[:RATED {rating:5}]->(Movie4),
(Person15)-[:RATED {rating:6}]->(Movie5),
(Person15)-[:RATED {rating:7}]->(Movie6),
(Person15)-[:RATED {rating:8}]->(Movie7),
(Person15)-[:RATED {rating:8}]->(Movie8),
(Person15)-[:RATED {rating:6}]->(Movie9),
(Person15)-[:RATED {rating:10}]->(Movie10),
(Person15)-[:RATED {rating:10}]->(Movie11),
(Person15)-[:RATED {rating:9}]->(Movie12),
(Person15)-[:RATED {rating:4}]->(Movie13),
(Person15)-[:RATED {rating:10}]->(Movie14),
(Person15)-[:RATED {rating:5}]->(Movie15),
(Person15)-[:RATED {rating:9}]->(Movie16),
(Person15)-[:RATED {rating:5}]->(Movie17),
(Person15)-[:RATED {rating:10}]->(Movie18),
(Person15)-[:RATED {rating:8}]->(Movie19),
(Person15)-[:RATED {rating:8}]->(Movie20),
(Person15)-[:RATED {rating:5}]->(Movie21),
(Person15)-[:RATED {rating:6}]->(Movie22),
(Person15)-[:RATED {rating:5}]->(Movie23),
(Person15)-[:RATED {rating:8}]->(Movie24),
(Person15)-[:RATED {rating:7}]->(Movie25),
(Person15)-[:RATED {rating:9}]->(Movie26),
(Person15)-[:RATED {rating:10}]->(Movie27),
(Person15)-[:RATED {rating:10}]->(Movie28),
(Person15)-[:RATED {rating:9}]->(Movie29),
(Person15)-[:RATED {rating:8}]->(Movie30);

MATCH p = ()--() RETURN p LIMIT 10;

Cosine Similarity

Introduction & Example

Cosine similarity is the cosine of the angle between two n-dimensional vectors in an n-dimensional space. It is the dot product of the two vectors divided by the product of the two vectors' lengths (or magnitudes). For two vectors A and B in an n-dimensional space:

\( \LARGE similarity(A, B) = \frac{A \cdot B}{\|A\| \times \|B\|} = \frac{\sum\limits_{i=1}^n A_{i} \times B_{i}}{\sqrt{\sum\limits_{i=1}^n A_{i}^2} \times \sqrt{\sum\limits_{i=1}^n B_{i}^2}} \)

Cosine similarity ranges between -1 and 1, where -1 is perfectly dissimilar and 1 is perfectly similar. ^[3]^
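
As a quick check of those bounds, here is a minimal Python sketch. The function name `cosine_similarity` and the sample vectors are illustrative assumptions, not part of the data set. It shows the two extremes and that vectors with only positive components, such as ratings, always give a positive similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors (not taken from the movie data).
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])     # approximately 1
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])        # approximately -1
all_positive = cosine_similarity([1, 10, 1], [10, 1, 10])    # between 0 and 1
```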

To be as clear as possible, I’ll pull two people from the data set and show how to manually calculate their cosine similarity.

Consider my UT Austin classmate Michael Sherman and Neo4j’s Michael Hunger. We are only interested in the movies that both of them rated, as cosine similarity is only calculated over non-NULL dimensions:

MATCH  (p1:Person {name:'Michael Sherman'})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(p2:Person {name:'Michael Hunger'})
RETURN m.name AS Movie, r1.rating AS `M. Sherman's Rating`, r2.rating AS `M. Hunger's Rating`

Each person should be thought of as a vector where their coordinates are defined by their movie ratings. Thus:

\( \overrightarrow{M. Sherman} = \langle 3, 8, 7, 5, 2, 9 \rangle \)

\( \overrightarrow{M. Hunger} = \langle 10, 8, 6, 6, 4, 5 \rangle \)

\( \large similarity(M. Sherman, M. Hunger) = \frac{3 \cdot 10 + 8 \cdot 8 + 7 \cdot 6 + 5 \cdot 6 + 2 \cdot 4 + 9 \cdot 5}{\sqrt{3^2 + 8^2 + 7^2 + 5^2 + 2^2 + 9^2} \times \sqrt{10^2 + 8^2 + 6^2 + 6^2 + 4^2 + 5^2}} = \frac{219}{15.2315 \times 16.6433} = 0.8639 \)
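
The same arithmetic can be reproduced outside the database. This is a small Python sketch that uses the two vectors above; the variable names are my own.

```python
import math

# Ratings for the six movies both people rated, in the order used above.
sherman = [3, 8, 7, 5, 2, 9]
hunger = [10, 8, 6, 6, 4, 5]

dot_product = sum(s * h for s, h in zip(sherman, hunger))   # 219
sherman_length = math.sqrt(sum(s ** 2 for s in sherman))    # sqrt(232)
hunger_length = math.sqrt(sum(h ** 2 for h in hunger))      # sqrt(277)
similarity = dot_product / (sherman_length * hunger_length)

print(round(similarity, 4))  # 0.8639
```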

Add Cosine Similarities to the Graph

I want to create a [:SIMILARITY] relationship between each pair of people in the graph, where their cosine similarity is a property of the relationship. The query that accomplishes this is:

MATCH (p1:Person)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
WITH  SUM(x.rating * y.rating) AS xyDotProduct,
      SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
      SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
      p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET   s.similarity = xyDotProduct / (xLength * yLength)

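There is only one [:SIMILARITY] relationship between each pair of people: MERGE with an undirected pattern matches an existing relationship in either direction before creating a new one.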
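
For readers who want to see the pairwise logic without a database, here is a rough Python equivalent. It is a sketch, not a translation of the Cypher: the `ratings` dictionary holds only two people copied from the setup data, and `co_rated_cosine` is a name I made up. As in the query, only movies rated by both people contribute, and each unordered pair gets a single entry.

```python
import math
from itertools import combinations

# Excerpt of the data set: person -> {movie: rating}.
ratings = {
    "Michael Sherman": {
        "Titanic": 4, "Forrest Gump": 3, "Mean Girls": 8, "Jurassic Park": 8,
        "The 40 Year Old Virgin": 6, "Happy Gilmore": 6, "Knocked Up": 4,
        "The Dark Knight Trilogy": 7, "Avatar": 5, "Gladiator": 2,
        "The Social Network": 9, "The Devil Wears Prada": 4,
        "The Truman Show": 9, "WALL-E": 10,
    },
    "Michael Hunger": {
        "Forrest Gump": 10, "Jurassic Park": 8, "The Dark Knight Trilogy": 6,
        "Charlie's Angels": 10, "Avatar": 6, "Gladiator": 4, "Inception": 8,
        "The Social Network": 5, "The Ocean's Trilogy": 7,
    },
}

def co_rated_cosine(r1, r2):
    """Cosine similarity over the movies both people rated; None if they share none."""
    shared = r1.keys() & r2.keys()
    if not shared:
        return None
    dot = sum(r1[m] * r2[m] for m in shared)
    len1 = math.sqrt(sum(r1[m] ** 2 for m in shared))
    len2 = math.sqrt(sum(r2[m] ** 2 for m in shared))
    return dot / (len1 * len2)

# One entry per unordered pair, mirroring the single undirected relationship.
similarities = {}
for p1, p2 in combinations(ratings, 2):
    sim = co_rated_cosine(ratings[p1], ratings[p2])
    if sim is not None:  # pairs with no shared movies get no entry
        similarities[frozenset((p1, p2))] = sim
```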

Let’s confirm the cosine similarities generated with Cypher are consistent with the cosine similarity calculated manually for M. Sherman and M. Hunger:

MATCH  (p1:Person {name:'Michael Sherman'})-[s:SIMILARITY]-(p2:Person {name:'Michael Hunger'})
RETURN s.similarity AS `Cosine Similarity`

Looks good!

Updated Graph Model

The updated graph model now looks like this:

[Figure: similarityadded]

MATCH (p1:Person)-[:RATED]->(m:Movie)<-[:RATED]-(p2:Person),
      (p1)-[:SIMILARITY]-(p2)
RETURN p1, p2, m LIMIT 3;

View Your Nearest Neighbors

With the similarities added to the graph, it is easy to view your k-nearest neighbors. Let’s view Graph Alchemist Grace's 5-nearest neighbors:

MATCH    (p1:Person {name:'Grace Andrews'})-[s:SIMILARITY]-(p2:Person)
WITH     p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT    5
RETURN   p2.name AS Neighbor, sim AS Similarity

These people, in descending order, rated movies most similarly to Grace.
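
The ORDER BY ... DESC plus LIMIT step is just a top-k selection. Here is a minimal Python sketch of the same idea. The neighbor names and similarity values are invented for illustration and are not Grace's actual results.

```python
import heapq

# Hypothetical similarity scores for one person; the values are made up.
similarity_to = {
    "Neighbor A": 0.97,
    "Neighbor B": 0.91,
    "Neighbor C": 0.99,
    "Neighbor D": 0.88,
    "Neighbor E": 0.95,
    "Neighbor F": 0.93,
    "Neighbor G": 0.90,
}

def nearest_neighbors(scores, k):
    """Return the k (name, similarity) pairs with the highest similarity, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

top5 = nearest_neighbors(similarity_to, 5)
for name, sim in top5:
    print(name, sim)
```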

Edit query 6 in the console if you filled out the form and want to see your nearest neighbors! Your name in the graph is exactly how it was entered in the form.


Calculate Movie Recommendations

Ultimately, I want to provide recommendations for movies that a person hasn’t rated (which I am naively assuming to mean that they haven’t seen the movie). As mentioned earlier, I do this by averaging the ratings from that person’s k nearest neighbors, counting only the neighbors who rated the relevant movie. ^[4]^ I use k = 3 for the movie recommendations; each recommendation should be read as an estimate of how much the person would like (or how the person would rate) a movie they haven’t seen.
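
To make the averaging rule concrete, here is a small Python sketch under stated assumptions: the neighbor list, the movie names "Movie Z" and "Movie Y", and the function `recommend` are all hypothetical. Note how a close neighbor who did not rate a movie is skipped and the next nearest neighbor who did rate it takes the slot.

```python
# Hypothetical neighbors of one person: (similarity, {movie: rating}). Values are made up.
neighbors = [
    (0.99, {"Movie Z": 8}),
    (0.97, {"Movie Y": 6}),                 # did not rate Movie Z
    (0.95, {"Movie Z": 9, "Movie Y": 7}),
    (0.93, {"Movie Z": 4}),
    (0.90, {"Movie Z": 10, "Movie Y": 2}),
]

def recommend(neighbors, movie, k=3):
    """Average the ratings of the k most similar neighbors who rated `movie`."""
    rated = [(sim, r[movie]) for sim, r in neighbors if movie in r]
    rated.sort(key=lambda pair: pair[0], reverse=True)
    top = [rating for _, rating in rated[:k]]
    return sum(top) / len(top) if top else None

reco_z = recommend(neighbors, "Movie Z")  # (8 + 9 + 4) / 3 = 7.0
reco_y = recommend(neighbors, "Movie Y")  # (6 + 7 + 2) / 3 = 5.0
```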

Get Movie Recommendations for Zoltan

Let’s get Zoltan's recommendations for the movies he hasn’t seen:

MATCH    (b:Person)-[r:RATED]->(m:Movie), (b)-[s:SIMILARITY]-(a:Person {name:'Zoltan Varju'})
WHERE    NOT((a)-[:RATED]->(m))
WITH     m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH     m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH     movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / SIZE(ratings) AS reco
ORDER BY reco DESC
RETURN   movie AS Movie, reco AS Recommendation

It looks like Zoltan’s nearest neighbors most enjoyed WALL-E, Pan’s Labyrinth, The Bourne Trilogy, and Taken, and Zoltan should check out these movies!

Edit query 7 in the console to get your movie recommendations! If you rated all movies, you don’t get any recommendations. Sorry!

A Breakdown of the Movie Recommendations Query

Get all of the people who rated the movies that Zoltan didn’t rate, their ratings for those movies, and also get their similarities with Zoltan:

MATCH    (b:Person)-[r:RATED]->(m:Movie), (b)-[s:SIMILARITY]-(a:Person {name:'Zoltan Varju'})
WHERE    NOT((a)-[:RATED]->(m))

With the movies, similarities, and ratings, sort first by movie name and then by similarity descending ^[5]^ so that the ratings are in the correct order for collection in the next step:

WITH     m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC

Group by movie and grab the first three ratings into a collection called ratings:

WITH     m.name AS movie, COLLECT(rating)[0..3] AS ratings

Average the ratings in the ratings collection. Return the movie name and the average rating as the recommendation:

WITH     movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / SIZE(ratings) AS reco
ORDER BY reco DESC
RETURN   movie AS Movie, reco AS Recommendation

1. A distance metric measures distance; the higher the distance the further apart the neighbors. A similarity metric measures similarity; the higher the similarity the closer the neighbors.
2. Cosine similarity is well suited to sparse vectors; many people did not rate every movie, so their vectors contain several NULLs.
3. In this data set in particular, every similarity is positive and generally close to 1, because all ratings are positive and fall within the narrow range of 1-10.
4. It is important to note that, while person A’s nearest three neighbors overall might be persons B, C, and D, person A’s movie recommendations might come from the average movie ratings from persons beyond this range if any of persons B-D did not rate the relevant movie. So, if person A needs a recommendation for movie Z, but person C did not rate Z, then the recommendation would consist of the average rating from persons B, D, and E, assuming person E is person A’s next nearest neighbor who rated movie Z.
5. The higher the similarity, the closer the neighbor.