@sadeghian92
Created May 5, 2017 11:33
PHP script to create User-Rating Matrix from MovieLens 100K Dataset. https://datahub.io/dataset/movielens/resource/e2117a93-4fd4-41c3-b0e8-6a8ff8b1ad09
<?php
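// u.data is tab-separated: user id, item id, rating, timestamp.
$delimiter = "\t";
$fp = fopen('YOUR_ABSOLUTE_PATH_TO_DIRECTORY' . 'u.data', 'r');
if ($fp === false) {
    exit("Could not open u.data\n");
}
$ratings = []; // $ratings[userId][itemId] = rating
$items = [];   // set of item ids seen in the file
// Read until end of file instead of assuming exactly 100000 lines.
while (($line = fgets($fp, 2048)) !== false) {
    $line = rtrim($line, "\r\n");
    if ($line === '') {
        continue; // skip blank lines
    }
    $data = str_getcsv($line, $delimiter);
    $items[$data[1]] = $data[1];
    $ratings[$data[0]][$data[1]] = $data[2];
}
fclose($fp);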
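$fw = fopen('YOUR_ABSOLUTE_PATH_TO_DIRECTORY' . 'user-rating-matrix.data', 'w');
// Header row: one column per item, in ascending item-id order.
fwrite($fw, "Users");
sort($items, SORT_NUMERIC);
foreach ($items as $item) {
    fwrite($fw, "\tItem-" . $item);
}
fwrite($fw, "\n");
// One row per user; MovieLens 100K has 943 users, numbered from 1.
for ($i = 1; $i <= 943; $i++) {
    fwrite($fw, "User-" . $i);
    $userRatings = isset($ratings[$i]) ? $ratings[$i] : [];
    // Walk the same item list as the header so columns stay aligned.
    foreach ($items as $item) {
        // "?" marks an item this user has not rated.
        fwrite($fw, "\t" . (isset($userRatings[$item]) ? $userRatings[$item] : "?"));
    }
    fwrite($fw, "\n");
}
fclose($fw);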
?>
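
To check the output format without downloading the dataset, the same pivot can be run on a few in-memory triples. This is only a sketch: the function name `buildRatingMatrix`, its `$missing` parameter, and the sample values are hypothetical and are not part of the script above or of `u.data`. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the original script): pivots
// [user, item, rating] triples into tab-separated matrix rows.
function buildRatingMatrix(array $triples, string $missing = "?"): array
{
    $ratings = []; // $ratings[user][item] = rating
    $items = [];   // set of item ids seen
    foreach ($triples as [$user, $item, $rating]) {
        $items[$item] = $item;
        $ratings[$user][$item] = $rating;
    }
    sort($items, SORT_NUMERIC);    // columns in ascending item-id order
    ksort($ratings, SORT_NUMERIC); // rows in ascending user-id order

    // Header row: "Users" followed by one "Item-<id>" label per column.
    $header = ["Users"];
    foreach ($items as $item) {
        $header[] = "Item-" . $item;
    }
    $rows = [implode("\t", $header)];

    // One row per user; unrated items get the $missing placeholder.
    foreach ($ratings as $user => $userRatings) {
        $cells = ["User-" . $user];
        foreach ($items as $item) {
            $cells[] = $userRatings[$item] ?? $missing;
        }
        $rows[] = implode("\t", $cells);
    }
    return $rows;
}

// Made-up sample triples, not taken from u.data.
$sample = [[1, 10, 5], [2, 20, 3], [1, 20, 4]];
echo implode("\n", buildRatingMatrix($sample)), "\n";
```

Unlike the script, this sketch takes its rows and columns from the ids present in the input rather than from fixed user and item counts, so it also works on a subset of the ratings.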