Created December 22, 2009 19:24
open System
open System.IO
open System.Text.RegularExpressions
let splitter =
    let reg = Regex(@"\W+", RegexOptions.Compiled ||| RegexOptions.ExplicitCapture)
    fun str -> reg.Split(str)
let sw = System.Diagnostics.Stopwatch.StartNew()
let counts = System.Collections.Generic.Dictionary()
Directory.EnumerateFiles(@"d:\temp\20_newsgroups", "*", SearchOption.AllDirectories)
|> Seq.collect (fun fn -> File.ReadLines(fn))
|> Seq.collect (fun line -> splitter (line.ToLowerInvariant()))
|> Seq.iter (fun word ->
    match counts.TryGetValue(word) with
    | true, count -> counts.[word] <- count + 1
    | false, _ -> counts.Add(word, 1))
// note: we're not sorting the dictionary or doing anything with the results!
printfn "elapsed: %A" sw.Elapsed