@sudipto80
Created November 20, 2015 09:54
MovieLens 100k Data Prep code
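// Read the MovieLens 100k ratings file; each line is: user id, item id, rating, timestamp.
var tuples = File.ReadAllLines(@"C:\personal\ml f#\ml-100k\ml-100k\u.data")
    .Select(f => f.Split(new char[] { ' ', '\t' }))
    .Select(f => new { Row = Convert.ToInt32(f[0]), Col = Convert.ToInt32(f[1]), Val = f[2] })
    .ToList(); // materialize once so the lines are not re-parsed on every lookup below
int columns = 5; // for 5 movies (IDs 1 to 5)
int rows = 400;  // for 400 users (IDs 1 to 400)
(rows * columns).Dump(); // number of cells in the output matrix
List<string> allLines = new List<string>();
for (int i = 1; i <= rows; i++)
{
    List<string> values = new List<string>();
    for (int j = 1; j <= columns; j++)
    {
        // Rating given by user i to movie j, or null if that user never rated it.
        var fg = tuples.FirstOrDefault(t => t.Row == i && t.Col == j);
        if (fg != null)
            values.Add(fg.Val + ".0");
        else
            values.Add("0.0"); // missing rating
    }
    // One matrix row: [r1;r2;...]
    allLines.Add("[" + string.Join(";", values) + "]");
}
// Whole matrix: rows separated by ";" and a newline, wrapped in outer brackets.
("[" + string.Join(";" + Environment.NewLine, allLines) + "]").Dump();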
@sudipto80 (Author)

This uses LINQPad's Dump() method, so run the script as "C# Statement(s)" in LINQPad and use the output it produces.
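
If LINQPad is not available, one possible workaround is a small stand-in for Dump() so the same calls compile in an ordinary console program. The sketch below is an assumption, not part of the gist: `DumpShim`, `Program`, and `FormatRow` are hypothetical names, and the ratings in `Main` are made up for illustration rather than read from u.data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for LINQPad's Dump(): writes the value to the console
// and returns it, so existing .Dump() calls keep compiling outside LINQPad.
public static class DumpShim
{
    public static T Dump<T>(this T value)
    {
        Console.WriteLine(value);
        return value;
    }
}

public static class Program
{
    // Formats one matrix row the same way the script does: "[a;b;c]".
    public static string FormatRow(IEnumerable<string> values)
    {
        return "[" + string.Join(";", values) + "]";
    }

    public static void Main()
    {
        // Made-up ratings for illustration only, not taken from u.data.
        var row = new List<string> { "5.0", "0.0", "3.0" };
        FormatRow(row).Dump();   // prints [5.0;0.0;3.0]
        (400 * 5).Dump();        // prints 2000
    }
}
```

With a shim like this in place, the body of the script could be pasted into `Main` (plus `using System.IO;` and `using System.Linq;`) and run as a normal console application. Note that the real Dump() renders rich output in LINQPad, while this version only prints plain text.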
