// This F# dojo is directly inspired by the Digit Recognizer
// competition from Kaggle.com:
// http://www.kaggle.com/c/digit-recognizer
// The datasets below are simply shorter versions of
// the training dataset from Kaggle.

// 0. Download the data files from the following locations:
// http://brandewinder.blob.core.windows.net/public/digitssample.csv
// http://brandewinder.blob.core.windows.net/public/digitscheck.csv
// The first file is a training set of 5,000 examples;
// the second file is a validation set to test your model.

// 1. Read the data file "digitssample.csv".
open System
open System.IO
// The following might come in handy:
// File.ReadAllLines(path)
// returns an array of strings, one per line of the file.
// [ YOUR CODE GOES HERE! ]
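// One possible sketch for step 1 (readRawLines is a hypothetical helper,
// not part of the original dojo): a thin wrapper around File.ReadAllLines.
// It assumes the CSV has already been downloaded; the location used in
// the usage comment below is an assumption about where you saved it.
let readRawLines (path:string) : string[] =
    // one string per line of the file, header row included
    System.IO.File.ReadAllLines(path)
// Usage, assuming digitssample.csv was saved next to this script:
// let rawLines = readRawLines (System.IO.Path.Combine(__SOURCE_DIRECTORY__, "digitssample.csv"))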

// 2. Break each line of the file into an array of strings,
// splitting on commas, using Array.map.
// Array.map quick-starter:
// Array.map takes an array and transforms it
// into another array by applying a function to each element.
// Example: starting from an array of strings:
let strings = [| "Machine"; "Learning"; "with"; "F#"; "is"; "fun" |]
// we can transform it into a new array
// containing the length of each string:
let lengths = Array.map (fun (s:string) -> s.Length) strings // [| 7; 8; 4; 2; 2; 3 |]
// We can make it look nicer using pipe-forward. Because the array
// comes first, F# already knows s is a string, so the annotation can go:
let lengths2 = strings |> Array.map (fun s -> s.Length)
// The String.Split method might help:
let csvToSplit = "1,2,3,4,5"
let splitResult = csvToSplit.Split(',') // [| "1"; "2"; "3"; "4"; "5" |]
// [ YOUR CODE GOES HERE! ]
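// One possible sketch for step 2 (splitLines is a hypothetical name):
// Array.map applies String.Split to every line, turning an array of
// lines into an array of arrays of fields. It assumes plain
// comma-separated values without quoted fields, which should hold for
// these numeric files.
let splitLines (lines:string[]) : string[][] =
    lines |> Array.map (fun (line:string) -> line.Split(','))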

// 3. Did you notice that the file has a header row? We want to get rid of it.
// Array slicing quick-starter:
// let's start with an array of ints:
let someNumbers = [| 0 .. 10 |] // create an array from 0 to 10
// you can access array elements by index (indices start at 0):
let first = someNumbers.[0]
// you can also slice the array:
let twoToFive = someNumbers.[ 1 .. 4 ] // 2nd to 5th elements (indices 1 to 4, inclusive): [| 1; 2; 3; 4 |]
// [ YOUR CODE GOES HERE! ]
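// One possible sketch for step 3 (dropHeader is a hypothetical name):
// an open-ended slice starting at index 1 keeps every row except the
// first one, which is the header. Array.tail is an alternative.
let dropHeader (rows:'T[]) : 'T[] =
    rows.[1 ..]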

// 4. Now that we have an array containing arrays of strings,
// and the header is gone, we need to transform it
// into an array of arrays of integers.
// Array.map seems like a good idea again :)
// The following might help:
let castedInt = int "42" // int is a conversion function, not a C#-style cast
// or, alternatively:
let convertedInt = Convert.ToInt32("42")
// [ YOUR CODE GOES HERE! ]
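// One possible sketch for step 4 (toIntRows is a hypothetical name):
// an outer Array.map walks the rows and an inner Array.map converts each
// field with the int function. It assumes every field is a valid integer,
// so it should run after the header has been removed; int throws an
// exception on text such as "label".
let toIntRows (rows:string[][]) : int[][] =
    rows |> Array.map (fun row -> row |> Array.map int)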

// 5. Rather than dealing with a raw array of ints,
// for convenience let's store these in an array of records.
// Record quick-starter: we can declare a
// record type (a lightweight, immutable class) this way:
type Example = { Number:int; Pixels:int[] }
// and instantiate one this way:
let example = { Number = 1; Pixels = [| 1; 2; 3 |] }
// [ YOUR CODE GOES HERE! ]
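// One possible sketch for step 5. It sits in its own module so that its
// record type cannot clash with Example above; the module, type and
// function names are hypothetical, and LabeledImage simply mirrors
// Example. It assumes each row holds the label first, followed by the
// pixel values, as in Kaggle's training file.
module Step5Sketch =
    type LabeledImage = { Number:int; Pixels:int[] }

    // label at index 0, pixels from index 1 to the end of the row
    let toLabeledImage (row:int[]) : LabeledImage =
        { Number = row.[0]; Pixels = row.[1 ..] }
// With a helper like this, the whole training set is one more
// Array.map over the integer rows: rows |> Array.map Step5Sketch.toLabeledImage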

// 6. We need to compute the distance between images.
// Math reminder: the squared euclidean distance is
// distance [ x1; y1; z1 ] [ x2; y2; z2 ] = (x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2
// (the euclidean distance itself is the square root of that sum).
// Array.map2 could come in handy here.
// Array.map2 quick-starter:
// suppose we have 2 arrays:
let point1 = [| 0; 1; 2 |]
let point2 = [| 3; 4; 5 |]
// Array.map2 takes 2 arrays of the same length
// and maps pairs of elements at the same index, for instance:
let map2Example = Array.map2 (fun p1 p2 -> p1 + p2) point1 point2 // [| 3; 5; 7 |]
// [ YOUR CODE GOES HERE! ]
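// One possible sketch for step 6 (squaredDistance is a hypothetical name):
// Array.map2 pairs up pixels at the same index and squares their
// differences, then Array.sum adds them up. Skipping the square root is
// a design choice: sqrt is increasing, so it never changes which example
// is closest. Array.map2 throws if the arrays differ in length. With
// 28 x 28 = 784 pixels valued 0 to 255, the sum stays well inside int range.
let squaredDistance (pixels1:int[]) (pixels2:int[]) : int =
    Array.map2 (fun p1 p2 -> (p1 - p2) * (p1 - p2)) pixels1 pixels2
    |> Array.sum
// For instance, squaredDistance [| 0; 1; 2 |] [| 3; 4; 5 |] evaluates to 27.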

// 7. We are now ready to write a classifier function!
// The classifier should take a set of pixels
// (an array of ints) as input, search for the
// closest example in our sample, and predict
// the value (Number) of that closest example.
// Array.minBy can be handy here, to find
// the closest element in the array of examples.
// Array.minBy quick-starter:
// suppose we have an array of Example:
let someData =
    [| { Number = 0; Pixels = [| 0; 1 |] };
       { Number = 1; Pixels = [| 9; 2 |] };
       { Number = 2; Pixels = [| 3; 4 |] } |]
// We can find, for instance, the element with the
// largest first pixel (Array.maxBy works like
// Array.minBy, but returns the largest instead):
let findThatGuy =
    someData
    |> Array.maxBy (fun x -> x.Pixels.[0]) // { Number = 1; Pixels = [| 9; 2 |] }
// The classifier function should probably
// look like this, except that this one
// classifies everything as a 0:
let classify (unknown:int[]) =
    // do something smart here,
    // like finding the Example with
    // the shortest distance to
    // the unknown element...
    0
// [ YOUR CODE GOES HERE! ]
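// One possible sketch for step 7: a 1-nearest-neighbour classifier.
// It is wrapped in its own module so it is self-contained; the module,
// type and function names are hypothetical, and LabeledImage mirrors
// Example. Array.minBy picks the training example with the smallest
// squared distance, and its Number is the prediction. Array.minBy throws
// on an empty array, so the training set must not be empty.
module Step7Sketch =
    type LabeledImage = { Number:int; Pixels:int[] }

    let squaredDistance (pixels1:int[]) (pixels2:int[]) : int =
        Array.map2 (fun p1 p2 -> (p1 - p2) * (p1 - p2)) pixels1 pixels2
        |> Array.sum

    // predict the label of the closest training example
    let classifyWith (training:LabeledImage[]) (unknown:int[]) : int =
        let closest =
            training
            |> Array.minBy (fun ex -> squaredDistance ex.Pixels unknown)
        closest.Number
// Partially applying the training set gives a function of type
// int[] -> int, the same shape as classify above. Usage, assuming a
// hypothetical trainingSet built as in step 5:
// let classifyDigit = Step7Sketch.classifyWith trainingSet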

// 8. Now that we have a classifier, we need to check
// how good it is. Let's take each Example in the second file
// (digitscheck.csv) and, for each example, compare its
// true value with the value predicted by the classifier,
// counting the correct calls.
// [ YOUR CODE GOES HERE! ]
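// One possible sketch for step 8 (module, type and function names are
// hypothetical; LabeledImage mirrors Example). It accepts any classifier
// of type int[] -> int, counts the validation examples it labels
// correctly, and returns that count as a fraction of the total. With an
// empty validation set the division yields NaN.
module Step8Sketch =
    type LabeledImage = { Number:int; Pixels:int[] }

    let accuracy (classifier:int[] -> int) (validation:LabeledImage[]) : float =
        // number of correct calls
        let correct =
            validation
            |> Array.filter (fun ex -> classifier ex.Pixels = ex.Number)
            |> Array.length
        float correct / float validation.Length
// Usage, assuming hypothetical classifyDigit and validationSet values
// built in the previous steps:
// printfn "Correct: %.1f%%" (100.0 * Step8Sketch.accuracy classifyDigit validationSet)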