@rugyoga
Created May 5, 2020 21:18
Elixir program to find the most frequently occurring alphanumeric character in a very large data file.
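defmodule AlphaMaxPar do
  # Count the ASCII letters and digits in one chunk of text.
  # Returns a map from single-character string to occurrence count.
  def count_alpha(string) do
    string
    |> String.codepoints()
    |> Enum.filter(fn c -> c =~ ~r/[A-Za-z0-9]/ end)
    |> Enum.reduce(Map.new(), fn c, acc -> Map.update(acc, c, 1, &(&1 + 1)) end)
  end
end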
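# Read the file in fixed-size chunks and count each chunk in a separate task.
File.stream!("2019-annual/taxa.txt", [encoding: :utf8], 1_000_000)
|> Task.async_stream(&AlphaMaxPar.count_alpha/1, max_concurrency: 6, ordered: false)
|> Enum.map(fn {:ok, x} -> x end)
# Merge the per-chunk maps by summing the counts for each character.
|> Enum.reduce(fn a, b -> Map.merge(a, b, fn _k, v1, v2 -> v1 + v2 end) end)
# Pick the {character, count} pair with the highest count.
|> Enum.max_by(fn {_k, v} -> v end)
|> IO.inspect()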
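
The count-then-merge idea above can be tried on a few in-memory strings without the data file. The following is only a minimal sketch: the module name `AlphaMaxSketch`, the helpers `merge_counts/2` and `most_frequent/1`, and the sample chunks are hypothetical and not part of the original gist. It also swaps the `Map.update` reduction for `Enum.frequencies/1`, which assumes Elixir 1.10 or later.

```elixir
# Hypothetical module for illustration; not part of the original gist.
defmodule AlphaMaxSketch do
  # Count ASCII letters and digits in one chunk of text.
  def count_alpha(string) do
    string
    |> String.codepoints()
    |> Enum.filter(fn c -> c =~ ~r/[A-Za-z0-9]/ end)
    |> Enum.frequencies()
  end

  # Combine two count maps, summing the counts of shared keys.
  def merge_counts(a, b), do: Map.merge(a, b, fn _k, v1, v2 -> v1 + v2 end)

  # Count every chunk, merge the results, and return the top {char, count} pair.
  # Raises Enum.EmptyError if no alphanumeric character is found at all.
  def most_frequent(chunks) do
    chunks
    |> Enum.map(&count_alpha/1)
    |> Enum.reduce(%{}, &merge_counts/2)
    |> Enum.max_by(fn {_k, v} -> v end)
  end
end

# Sample chunks are made up for this sketch.
IO.inspect(AlphaMaxSketch.most_frequent(["aab, ", "b a 1"]))
# prints {"a", 3}
```

Because each chunk is counted independently and the merge only adds numbers, the order in which chunks finish does not affect the totals, which is why `ordered: false` is safe in the parallel version.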