@rajarsheem
Last active January 4, 2016 12:04
Coursera Recommender System assignment for User-User collaborative filtering
1648 5136 918 2824 3867 860 3712 2968 3525 4323 3617 4360 2756 89 442 3556 5261 2492 5062 2486 4942 2267 4809 3853 2288
11: Star Wars: Episode IV - A New Hope (1977) 4.5 5 4.5 4 4 5 4 5 4 4 3 4 4.5 4 3.5
12: Finding Nemo (2003) 5 5 4 4 4.5 4.5 4 5 4 5 4.5 4 3.5 4 2 3.5 3.5
13: Forrest Gump (1994) 5 4.5 5 4.5 4.5 5 4.5 5 5 4.5 4.5 5 3 4 5 3.5 4.5 4.5 4 3.5 4.5 3.5 3.5
14: American Beauty (1999) 4 4.5 2 3.5 5 3.5 5 3.5 4 4 3.5 4.5 3.5 4 3.5
22: Pirates of the Caribbean: The Curse of the Black Pearl (2003) 4 5 3 4.5 4 2.5 5 3 4 4 4.5 4 1 3 1.5 4 4 2.5 3.5 5 3.5
24: Kill Bill: Vol. 1 (2003) 3 5 4 3 3 0.5 3.5 5 4 4 4 5 5 5 0.5 4 4 4.5 4 5 5 3
38: Eternal Sunshine of the Spotless Mind (2004) 5 5 0.5 4 5 3 5 3 4 3 5 1.5 5 5
63: Twelve Monkeys (a.k.a. 12 Monkeys) (1995) 3 4 2.5 3.5 2 2 4 4 3.5 4 5
77: Memento (2000) 5 5 4.5 3 4.5 3.5 3.5 4 4 4 5 4 4 4.5 5 4 4.5 5
85: Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981) 5 4.5 3 4 5 4.5 5 4 5 4 4.5 3.5 4 3
98: Gladiator (2000) 3.5 5 4 4 4.5 3.5 4 5 3 5 4 3.5 3.5 5 4 2 4.5 3 4.5 3 4
105: Back to the Future (1985) 4.5 5 5 4.5 5 4 5 3.5 4 5 5 4 5 3.5 4.5 4.5 4.5 2
107: Snatch (2000) 5 3.5 4.5 4 4 3.5 4 3 2.5 3
114: Pretty Woman (1990) 3.5 4 4 2.5 3.5 5 4.5 3 3.5 3.5 4 2.5 3 2
120: The Lord of the Rings: The Fellowship of the Ring (2001) 5 4 5 5 5 4 0.5 3 4 5 3.5 3.5 4.5 4 4.5 2.5 5 4
121: The Lord of the Rings: The Two Towers (2002) 5 4.5 4 5 5 4.5 4 1 4 5 5 3 1 4 4.5 4 4.5 5 4
122: The Lord of the Rings: The Return of the King (2003) 5 4 5 5 5 4 0.5 3 4 5 3.5 5 5 4.5 4.5 2.5 4 4
134: O Brother Where Art Thou? (2000) 2 4.5 4 2 2.5
141: Donnie Darko (2001) 3 2.5 2.5 4.5 4.5 3 3.5 4.5 2 4 4
146: Crouching Tiger Hidden Dragon (Wo hu cang long) (2000) 2.5 4 3.5 5 5 2.5 0.5 3.5 3.5 4.5 3.5 3.5 4 4 4 4
153: Lost in Translation (2003) 4.5 4 5 3.5 4 3 4 4 4.5 4 4.5 1.5
155: The Dark Knight (2008) 5 5 5 5 4 5 3.5 5 4 5 4 5 3 4.5 5 4.5 4.5 4.5 3 4.5 5 3.5
161: Ocean's Eleven (2001) 4 4.5 4 3.5 4 5 4 5 4.5 5 4.5 5 3.5 3.5 3.5 4.5 2.5 4.5 3 4.5 3.5 4
180: Minority Report (2002) 3 4.5 4.5 3.5 5 2.5 4 2.5 5 3.5 3 4 4 3 5 3 3.5 4
187: Sin City (2005) 3 4 1.5 3.5 4.5 2.5 3 3 3 1.5 4 4 4.5 4.5
194: Amelie (2001) 5 1 5 5 4 4 3 4 5 2.5 4 4.5 4.5
197: Braveheart (1995) 4 4 5 4.5 4 1 4 4 3.5 4.5 1.5 4.5 3 4 2.5 5
238: The Godfather (1972) 5 3 5 4 4.5 4.5 5 5 4 3 5 4 5 3 4 5 5 5 5
243: High Fidelity (2000) 4 2 3.5 2.5
268: Batman (1989) 2.5 5 2.5 5 4.5 4 3 3 2.5 4 2.5 2 4 3.5
272: Batman Begins (2005) 5 5 4 4 3.5 5 3.5 4 3 5 4 4 3 3.5 4.5 4 4 4.5 4 3.5
274: The Silence of the Lambs (1991) 5 4.5 4.5 4.5 4.5 5 3 4 4 4 4.5 1.5 4.5 4.5 4 4 5
275: Fargo (1996) 5 4.5 4.5 5 4.5 5 4 5
278: The Shawshank Redemption (1994) 5 4.5 5 3 1 5 3.5 5 3.5 4.5 4 3 4.5 0.5 4.5 5 4
280: Terminator 2: Judgment Day (1991) 3.5 3.5 4.5 3.5 4.5 4.5 3 5 4 4 1.5 3.5 2.5 3.5 4 3.5 3 3
329: Jurassic Park (1993) 4.5 5 3.5 4.5 5 5 4 5 4.5 3 5 4 3.5 3.5 3 4 4 4.5 2 3
393: Kill Bill: Vol. 2 (2004) 3 5 4 3 3.5 4.5 2.5 5 3.5 5 4 4 5 4.5 0.5 3 3.5 4.5 4 5 5 3
414: Batman Forever (1995) 1.5 3.5 2.5 1.5 3 3.5 5 4 3 3 3 2 1.5 4 4
424: Schindler's List (1993) 2.5 4.5 5 5 5 5 4.5 4.5 3 4 5 2.5 3.5 5 4.5 3
453: A Beautiful Mind (2001) 4 4.5 5 4 3 4.5 4.5 4 5 5 4 3 5 4 3.5 4.5 1 4.5 3 5 3.5 4
462: Erin Brockovich (2000) 2.5 3.5 1.5 4 2.5 3 4 4.5 3 4 3 3
550: Fight Club (1999) 5 3.5 4.5 4 4.5 4 5 3 4.5 3.5 5 4 3.5 4.5 5 2 4.5 5 5
557: Spider-Man (2002) 4 3 4 3 5 3 5 4.5 3 4 3.5 3 2.5 1 3 3.5 3.5 1.5
558: Spider-Man 2 (2004) 2.5 2 4 4.5 4.5 3 4 4.5 3 3.5 2 3 1 3.5 3.5 3 4.5 2.5 4 1.5
568: Apollo 13 (1995) 3.5 5 4.5 5 5 4.5 5 4 4 4 4 4 3.5 1.5 5 3
581: Dances with Wolves (1990) 2.5 4 5 4.5 4 2 4.5 4 2.5 3
585: Monsters Inc. (2001) 5 5 3.5 4 5 1 3.5 3.5 5 3.5 4 4 3.5 5
597: Titanic (1997) 4 4 4.5 2 3.5 3.5 3 4 5 3.5 4.5 4 4 4.5 3 3.5 3.5 4.5 3 5 4
601: E.T. the Extra-Terrestrial (1982) 4.5 3 4.5 1.5 4 5 3.5 5 4.5 2 3.5 3.5 3 4.5 0.5
602: Independence Day (a.k.a. ID4) (1996) 4 4.5 3 2.5 3.5 3.5 5 4.5 4 4 4 3 1.5 3 3.5 1.5 3.5 2 3
603: The Matrix (1999) 4.5 5 5 5 5 4 5 5 3.5 3.5 4 4 4.5 5 3.5 5 4 4.5 4.5 4 3 5 4.5 4
604: The Matrix Reloaded (2003) 4 4 4.5 5 2.5 2.5 5 5 4 3 4 2.5 3.5 5 4 3 2 3 2.5 5 3 4 1.5 3.5
607: Men in Black (a.k.a. MIB) (1997) 3 4.5 5 3 4 3 4.5 1.5 3.5 5 3.5 4 2.5 3 2 3 4.5 4 3.5 2 4.5 4 4.5 2.5
629: The Usual Suspects (1995) 4 5 4 4.5 2.5 4.5 5 4 4 3.5 4 5 5 3
640: Catch Me If You Can (2002) 3.5 4.5 3 4 4 5 3.5 5 5 4.5 4 4 4 3 3.5
641: Requiem for a Dream (2000) 4.5 3 3 5 5 5 3.5 2.5 4 4
664: Twister (1996) 4 2.5 4.5 3 4 2.5 3 3.5 4.5 3.5 0.5
671: Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001) 4 5 4 4 3 5 3.5 0.5 4 4.5 5 4 3 3.5 2 0.5 5 3
672: Harry Potter and the Chamber of Secrets (2002) 3 3.5 5 3 4.5 3.5 5 3 0.5 4.5 4 5 4 3 4 0.5 3 2 0.5 2.5
680: Pulp Fiction (1994) 4 3 5 5 4.5 1 4.5 5 3.5 4 4.5 5 5 4.5 3 3 4.5 4.5 3.5 5 4 3
745: The Sixth Sense (1999) 4 4 4 3 4.5 5 3.5 3 4 3.5 4 4 3 3 4.5
752: V for Vendetta (2006) 4 5 4 5 4 4.5 2 3.5 3.5 3 2.5 3 4.5 3 4.5 4.5 3.5 2.5 3.5 3.5
786: Almost Famous (2000) 3.5 5 4 1.5 2.5 3 5
788: Mrs. Doubtfire (1993) 5 2.5 4 2 3.5 5 3 4 3.5 2.5 3.5 2 1.5 2
807: Seven (a.k.a. Se7en) (1995) 5 5 4 4.5 1 4.5 5 4 3 5 4.5 2 5 4 4.5
808: Shrek (2001) 4.5 5 3 5 4 5 5 3.5 1.5 4.5 3 5 3.5 3 4.5 2 3.5 3.5 3 3 4 4 4
809: Shrek 2 (2004) 4 5 4 3.5 3.5 5 4.5 3.5 4.5 3 4 3 4 1.5 1.5 3 3.5 3.5
812: Aladdin (1992) 5 2.5 3 3 1.5 3.5 3 4 3 3.5 2.5 3.5 2 5 4
854: The Mask (1994) 2.5 3.5 3 4.5 2 3 4 2 1 2.5 2.5 2.5 4 2.5 4.5 5 3.5 2.5
857: Saving Private Ryan (1998) 4 3.5 5 4.5 4.5 4.5 3.5 5 3.5 5 5 2 4 2 3.5 4 5 4 3 3.5
862: Toy Story (1995) 4 5 5 3 4 3.5 5 4.5 4 4 4.5 4.5 5 4 3.5 3.5 3 3
954: Mission: Impossible (1996) 4.5 4 4 4.5 4.5 3.5 3.5 3.5 4 2.5 2 4 4 2.5 4 4 2 4
955: Mission: Impossible II (2000) 4 3.5 4 4 4.5 3 5 4.5 3.5 4 3 3 3.5 2 4 4 2.5 3.5
1422: The Departed (2006) 5 3.5 3 4 5 4 5 3.5 5 5 4 5 2 5 4.5 4
1572: Die Hard: With a Vengeance (1995) 5 2 3 3 4 4 4 3.5 5 5 4 3.5 5 2 3.5
1597: Meet the Parents (2000) 2 3.5 3.5 5 1 3 5 4 4 3 4 2 2
1637: Speed (1994) 3 2 4 4.5 2.5 3 4 3.5 3 4.5 3.5 2 4.5 3
1891: Star Wars: Episode V - The Empire Strikes Back (1980) 4.5 5 4 4.5 4 5 4 5 4 4 3 3.5 1 5 4.5 4.5 4.5
1892: Star Wars: Episode VI - Return of the Jedi (1983) 4.5 5 4.5 4 3 4.5 4 5 4 4 3 3 3.5 4.5 4 2.5
1894: Star Wars: Episode II - Attack of the Clones (2002) 3 2.5 5 4 3.5 4.5 3.5 5 5 3 3 4 1.5 3 3
1900: Traffic (2000) 1 5 4 3.5 3 3 3.5 3 4 4.5
2024: The Patriot (2000) 4 4.5 0.5 3 1 4.5 3 4 2.5 0.5 5 3 3
2164: Stargate (1994) 1.5 3.5 5 3 4 3 3 2.5
2501: The Bourne Identity (2002) 4.5 3.5 5 2 4 4 4 4 5 4 0.5 3.5 4 4 2.5 4 4
2502: The Bourne Supremacy (2004) 4 4 5 5 3.5 3.5 4 5 3.5 0.5 3.5 4 3 4 4 4.5
3049: Ace Ventura: Pet Detective (1994) 3 4 2.5 3.5 5 1 3.5 5 2.5 4 1 1 1 0.5 3 1
4327: Charlie's Angels (2000) 1.5 3 1 2.5 5 3 5 4 2 1 3.5 1 4 2.5
5503: The Fugitive (1993) 4 4 1 3.5 3.5 4 3 4 2.5 3 2 4
7443: Chicken Run (2000) 1 4 2.5 3 3 3 3 3.5 2.5 2.5 3
8358: Cast Away (2000) 4 2.5 4.5 5 4 4 5 3 3.5 2.5 4 4
8467: Dumb & Dumber (1994) 1 1 4 4.5 2 4 5 4 1.5 0.5 4 0.5 4 0.5
8587: The Lion King (1994) 4 5 5 3.5 4 4 5 4.5 4 4.5 4.5 4 3 5 4 4 0.5 3 3 4.5
9331: Clear and Present Danger (1994) 2.5 3.5 3.5 3.5 3 0.5
9741: Unbreakable (2000) 3.5 3.5 2.5 3 3
9802: The Rock (1996) 5 3.5 4 5 2.5 3 4.5 3.5 2 3 3.5 2.5 5 4
9806: The Incredibles (2004) 3.5 5 3.5 3 3.5 4.5 5 3.5 4 4.5 4 3.5 0.5 5 3.5 3 4.5 3.5
10020: Beauty and the Beast (1991) 3 5 4 3.5 2 4 3 4 3 4 4 2 2.5 4
36657: X-Men (2000) 4.5 4 4.5 5 4 5 4 3.5 4 3.5 3 4 3.5 3 4
36658: X2: X-Men United (2003) 3.5 4 3.5 4.5 4.5 4 5 4.5 3.5 4 3.5 2 3 4 3.5 4
36955: True Lies (1994) 3 3.5 4 1.5 3 2 4 4 1.5 4
import com.opencsv.CSVReader;
import org.apache.commons.collections4.BidiMap;
import org.apache.commons.collections4.bidimap.TreeBidiMap;
import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
class UserUserCollaborativeFiltering {
    // When false, getMean() returns 0 and predictions are not mean-centered.
    static boolean isNormalized;

    public static void main(String[] args) throws IOException {
        // Ratings indexed as [user][movie]; -1 marks a missing rating.
        double[][] dataMatrix = new double[25][100];
        BidiMap<Integer, String> userID = new TreeBidiMap<>();
        BidiMap<Integer, String> movieID = new TreeBidiMap<>();
        // try-with-resources closes the reader even if parsing fails.
        try (CSVReader reader = new CSVReader(new FileReader("/home/rajarshee/Documents/Assignment1.csv"))) {
            // Header row: the first cell is skipped, the rest are user IDs.
            String[] header = reader.readNext();
            for (int i = 1; i < header.length; i++)
                userID.put(i - 1, header[i]);
            String[] line;
            int row = 0;
            while ((line = reader.readNext()) != null) {
                movieID.put(row, line[0]);
                for (int i = 1; i < 26; i++)
                    dataMatrix[i - 1][row] = line[i].isEmpty() ? -1 : Double.parseDouble(line[i]);
                row++;
            }
        }
        String user = "89";
        isNormalized = true;
        int target = userID.getKey(user);
        double[][] correlationMatrix = getCorrelationMatrix(dataMatrix);
        List<Pair> sortedNeighbors = neighbors(target, correlationMatrix);
        double targetMean = getMean(target, dataMatrix);
        List<Pair> predictions = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            // Weighted average of the top 5 neighbors' mean-centered ratings for movie i.
            double n = 0, d = 0;
            for (int j = 0; j < 5; j++) {
                int neighbor = sortedNeighbors.get(j).u;
                if (dataMatrix[neighbor][i] != -1) {
                    double weight = correlationMatrix[target][neighbor];
                    n += (dataMatrix[neighbor][i] - getMean(neighbor, dataMatrix)) * weight;
                    d += weight;
                }
            }
            // -1 means none of the 5 neighbors rated the movie (or their weights sum to zero).
            predictions.add(new Pair(i, (d == 0) ? -1 : targetMean + n / d));
        }
        // Pair sorts by descending value, so the highest predictions print first.
        Collections.sort(predictions);
        predictions.forEach(p -> System.out.println(movieID.get(p.u) + "," + p.v));
    }

    // Debug helper: prints the matrix one row per line, tab-separated.
    public static void displayData(double[][] d) {
        for (int i = 0; i < d.length; i++) {
            for (double x : d[i])
                System.out.print(x + "\t");
            System.out.println();
        }
    }

    // Pairwise Pearson correlation between all 25 users.
    public static double[][] getCorrelationMatrix(double[][] d) {
        double[][] correlationMatrix = new double[25][25];
        for (int i = 0; i < 25; i++) {
            for (int j = 0; j < 25; j++) {
                correlationMatrix[i][j] = getCorrelation(i, j, d);
            }
        }
        return correlationMatrix;
    }
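
    // Pearson correlation between users i and j over the movies both have rated.
    public static double getCorrelation(int i, int j, double[][] d) {
        List<Double> x = new ArrayList<>(), y = new ArrayList<>();
        for (int k = 0; k < 100; k++) {
            if (d[i][k] != -1 && d[j][k] != -1) {
                x.add(d[i][k]);
                y.add(d[j][k]);
            }
        }
        // PearsonsCorrelation rejects arrays shorter than 2, so treat users with
        // fewer than two co-rated movies as uncorrelated.
        if (x.size() < 2)
            return 0;
        return new PearsonsCorrelation().correlation(
                x.stream().mapToDouble(t -> t).toArray(),
                y.stream().mapToDouble(t -> t).toArray());
    }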

    // Returns every user except i, paired with its correlation to i, most similar first.
    public static List<Pair> neighbors(int i, double[][] corr) {
        double[] temp = corr[i];
        List<Pair> l = new ArrayList<>();
        for (int j = 0; j < temp.length; j++) {
            if (j != i) {
                l.add(new Pair(j, temp[j]));
            }
        }
        Collections.sort(l);
        return l;
    }
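
    // Mean of the ratings user i actually gave (missing entries are skipped).
    // Returns 0 when normalization is off, and NaN if the user rated nothing.
    public static double getMean(int i, double[][] d) {
        if (!isNormalized)
            return 0;
        double sum = 0;
        int n = 0;
        for (double x : d[i]) {
            if (x != -1) {
                sum += x;
                ++n;
            }
        }
        return sum / n;
    }
}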
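
// An (index, value) pair that sorts by descending value.
class Pair implements Comparable<Pair> {
    public int u;
    public double v;

    Pair(int u, double v) {
        this.u = u;
        this.v = v;
    }

    @Override
    public int compareTo(Pair o) {
        // Double.compare keeps the ordering consistent even for NaN values,
        // which sort first under this descending order.
        return Double.compare(o.v, this.v);
    }

    @Override
    public String toString() {
        return u + ":" + v;
    }
}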
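
The prediction step above can be tried on a toy matrix without the CSV file or the external libraries. The following is only a minimal sketch, not the assignment's reference solution. The class `UserUserSketch`, its method names, and the 3x4 ratings matrix are made up for illustration. It marks missing ratings with `NaN` instead of `-1`, computes Pearson by hand instead of using `PearsonsCorrelation`, and divides by the sum of absolute weights, which differs from the plain sum used above whenever a chosen neighbor has a negative correlation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class UserUserSketch {
    // Mean of the ratings a user actually gave (NaN entries are skipped).
    static double mean(double[] r) {
        double sum = 0;
        int n = 0;
        for (double x : r) {
            if (!Double.isNaN(x)) { sum += x; n++; }
        }
        return n == 0 ? 0 : sum / n;
    }

    // Pearson correlation over co-rated items; 0 when it is undefined.
    static double pearson(double[] a, double[] b) {
        double sa = 0, sb = 0;
        int n = 0;
        for (int k = 0; k < a.length; k++) {
            if (!Double.isNaN(a[k]) && !Double.isNaN(b[k])) { sa += a[k]; sb += b[k]; n++; }
        }
        if (n < 2) return 0;
        double ma = sa / n, mb = sb / n, cov = 0, va = 0, vb = 0;
        for (int k = 0; k < a.length; k++) {
            if (!Double.isNaN(a[k]) && !Double.isNaN(b[k])) {
                double da = a[k] - ma, db = b[k] - mb;
                cov += da * db;
                va += da * da;
                vb += db * db;
            }
        }
        return (va == 0 || vb == 0) ? 0 : cov / Math.sqrt(va * vb);
    }

    // Mean-centered, similarity-weighted prediction from the k most similar users.
    // Returns NaN when none of those neighbors rated the item.
    static double predict(double[][] ratings, int user, int item, int k) {
        List<Integer> others = new ArrayList<>();
        for (int u = 0; u < ratings.length; u++) {
            if (u != user) others.add(u);
        }
        // Most similar users first.
        others.sort(Comparator.comparingDouble(
                (Integer u) -> pearson(ratings[user], ratings[u])).reversed());
        double num = 0, den = 0;
        for (int n = 0; n < Math.min(k, others.size()); n++) {
            int v = others.get(n);
            if (Double.isNaN(ratings[v][item])) continue;
            double w = pearson(ratings[user], ratings[v]);
            num += w * (ratings[v][item] - mean(ratings[v]));
            den += Math.abs(w);
        }
        return den == 0 ? Double.NaN : mean(ratings[user]) + num / den;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 3, 4, Double.NaN}, // user 0 has not rated item 3
            {4, 2, 3, 4},          // user 1 rates like user 0, one point lower
            {1, 5, 2, 1},          // user 2 has roughly opposite taste
        };
        // User 1 is the single nearest neighbor (correlation 1.0), so the
        // prediction is user 0's mean (4.0) plus user 1's offset (4 - 3.25).
        System.out.println(predict(ratings, 0, 3, 1)); // prints 4.75
    }
}
```

With `k = 1` only user 1 contributes, so the result is easy to check by hand. Raising `k` to 2 also pulls in user 2, whose negative weight pushes the prediction in the opposite direction of that user's deviation from their own mean.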