@samklr
Forked from mathias-brandewinder/gist:5558573
Last active December 17, 2015 22:59
// This F# dojo is directly inspired by the Digit Recognizer
// competition from Kaggle.com:
// http://www.kaggle.com/c/digit-recognizer
// The datasets below are simply shorter versions of
// the training dataset from Kaggle.
// 0. Load data files from the following location:
// http://brandewinder.blob.core.windows.net/public/digitssample.csv
// http://brandewinder.blob.core.windows.net/public/digitscheck.csv
// first file is a training set of 5,000 examples
// second file is a validation set to test your model
// 1. Read data file "digitssample.csv"
open System
open System.IO
// the following might come in handy:
// File.ReadAllLines(path)
// returns an array of strings, one per line of the file
// [ YOUR CODE GOES HERE! ]
// 2. Break each line of the file into an array of strings,
// separating by commas, using Array.map
// Array.map quick-starter:
// Array.map takes an array, and transforms it
// into another array by applying a function to it.
// Example: starting from an array of strings:
let strings = [| "Machine"; "Learning"; "with"; "F#"; "is"; "fun" |]
// we can transform it into a new array,
// containing the length of each string:
let lengths = Array.map (fun (s:string) -> s.Length) strings
// We can make it look nicer, using pipe-forward:
let lengths2 = strings |> Array.map (fun s -> s.Length)
// the following function might help
let csvToSplit = "1,2,3,4,5"
let splitResult = csvToSplit.Split(',')
// [ YOUR CODE GOES HERE! ]
// 3. Did you notice that the file has a header row? We want to get rid of it.
// Array slicing quick starter:
// let's start with an Array of ints:
let someNumbers = [| 0 .. 10 |] // create an array from 0 to 10
// you can access Array elements by index:
let first = someNumbers.[0]
// you can also slice the array:
let twoToFive = someNumbers.[ 1 .. 4 ] // grab indices 1 to 4, i.e. the 2nd through 5th elements
// [ YOUR CODE GOES HERE! ]
// 4. Now that we have an array containing arrays of strings,
// and the headers are gone, we need to transform it
// into an array of arrays of integers.
// Array.map seems like a good idea again :)
// The following might help:
let castedInt = int "42" // int is a conversion function, not a cast
// or, alternatively:
let convertedInt = Convert.ToInt32("42")
// [ YOUR CODE GOES HERE! ]
// 5. Rather than dealing with a raw array of ints,
// for convenience let's store these into an array of Records
// Record quick starter: we can declare a
// Record (a lightweight, immutable class) type that way:
type Example = { Number:int; Pixels:int[] }
// and instantiate one this way:
let example = { Number = 1; Pixels = [| 1; 2; 3; |] }
// [ YOUR CODE GOES HERE! ]
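// A minimal sketch of steps 2 to 5 on a tiny in-memory sample, so that it
// runs without downloading anything. Assumption: as in the Kaggle training
// file, the first column is the label and the remaining columns are pixels.
// The names sampleLines and sampleExamples are hypothetical, not part of the dojo.
let sampleLines = [| "label,pixel0,pixel1"; "1,0,255"; "7,12,0" |]
let sampleExamples =
    sampleLines.[1 ..]                                   // drop the header row
    |> Array.map (fun (line:string) -> line.Split(','))  // split each line on commas
    |> Array.map (Array.map int)                         // convert every field to an int
    |> Array.map (fun row -> { Number = row.[0]; Pixels = row.[1 ..] })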
// 6. We need to compute the distance between images
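// Math reminder: the Euclidean distance is
// distance [ x1; y1; z1 ] [ x2; y2; z2 ] = sqrt((x1-x2)^2+(y1-y2)^2+(z1-z2)^2)
// For finding the closest image, the square root can be skipped:
// comparing squared distances picks the same nearest neighbour.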
// Array.map2 could come in handy here.
// Array.map2 quick start example
// Suppose we have 2 arrays:
let point1 = [| 0; 1; 2 |]
let point2 = [| 3; 4; 5 |]
// Array.map2 takes 2 arrays at a time
// and maps pairs of elements, for instance:
let map2Example = Array.map2 (fun p1 p2 -> p1 + p2) point1 point2
// [ YOUR CODE GOES HERE! ]
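// A minimal sketch of one way to do this with Array.map2, reusing point1
// and point2 from above. sketchDistance is a hypothetical name. It returns
// the squared Euclidean distance (no square root), which is enough to
// compare which of two images is closer.
let sketchDistance (a:int[]) (b:int[]) =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b  // squared difference per pixel
    |> Array.sum                                   // add them up
let sketchDistanceExample = sketchDistance point1 point2 // 9 + 9 + 9 = 27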
// 7. We are now ready to write a classifier function!
// The classifier should take a set of pixels
// (an array of ints) as an input, search for the
// closest example in our sample, and predict
// the value of that closest element.
// Array.minBy can be handy here, to find
// the closest element in the Array of examples.
// Array.minBy quick start:
// suppose we have an Array of Example:
let someData =
    [| { Number = 0; Pixels = [| 0; 1 |] };
       { Number = 1; Pixels = [| 9; 2 |] };
       { Number = 2; Pixels = [| 3; 4 |] }; |]
// We can find, for instance, the element with
// the largest first pixel using Array.maxBy
// (Array.minBy works the same way, but returns the smallest):
let findThatGuy =
    someData
    |> Array.maxBy (fun x -> x.Pixels.[0])
// The classifier function should probably
// look like this - except that this one will
// classify everything as a 0:
let classify (unknown:int[]) =
    // do something smart here
    // like find the Example with
    // the shortest distance to
    // the unknown element...
    0
// [ YOUR CODE GOES HERE! ]
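// A hedged sketch of a nearest-neighbour classifier, tried out on someData
// from above. sketchClassify is a hypothetical name; unlike classify, it
// takes the sample as an explicit argument so that it stands on its own.
let sketchClassify (sample:Example[]) (unknown:int[]) =
    // squared Euclidean distance between two pixel arrays
    let squaredDistance (a:int[]) (b:int[]) =
        Array.map2 (fun x y -> (x - y) * (x - y)) a b |> Array.sum
    // pick the example whose pixels are closest to the unknown image
    let closest = sample |> Array.minBy (fun ex -> squaredDistance ex.Pixels unknown)
    closest.Number
// [| 8; 3 |] is closest to [| 9; 2 |], so this predicts 1
let sketchPrediction = sketchClassify someData [| 8; 3 |]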
// 8. Now that we have a classifier, we need to check
// how good it is. Let's take each Example in the 2nd file
// (digitscheck.csv), compare its true value with
// the value predicted by the classifier,
// and count the correct calls.
// [ YOUR CODE GOES HERE! ]
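// A minimal sketch of the check, assuming the validation file has already
// been parsed into an array of Example. sketchAccuracy is a hypothetical
// name; it returns the fraction of examples the classifier gets right.
let sketchAccuracy (classifier:int[] -> int) (validation:Example[]) =
    let correct =
        validation
        |> Array.filter (fun ex -> classifier ex.Pixels = ex.Number)
        |> Array.length
    float correct / float validation.Length
// The stub classify above always answers 0, and only one of the three
// examples in someData is a 0, so this comes out at one third.
let sketchAccuracyOfStub = sketchAccuracy classify someData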