@mwinkle
Last active November 12, 2015 02:12
An example of using F# to implement a U-SQL UDO (in this case, a processor).
// sample U-SQL UDO (Processor) written in F#
// Note: currently (11/2015) requires deployment of FSharp.Core
namespace fSharpProcessor

open Microsoft.Analytics.Interfaces

type myProcessor() =
    inherit IProcessor()

    // Maps local country names to their English equivalents.
    let CountryTranslation =
        dict [ "Deutschland", "Germany"
               "Schwiiz", "Switzerland"
               "UK", "United Kingdom"
               "USA", "United States of America"
               "中国", "PR China" ]

    override this.Process(row, output) =
        let UserId = row.Get("UserId")
        let Name = row.Get("Name")
        let Address = row.Get("Address")
        let City = row.Get("City")
        let State = row.Get("State")
        let PostalCode = row.Get("PostalCode")
        let Country1 = row.Get("Country")
        // Translate the country if it is in the table; otherwise pass it through unchanged.
        let Country = if CountryTranslation.ContainsKey(Country1) then CountryTranslation.[Country1] else Country1
        let Phone = row.Get("Phone")
        // Write the columns to the output row by position.
        output.Set(0, UserId)
        output.Set(1, Name)
        output.Set(2, Address)
        output.Set(3, City)
        output.Set(4, State)
        output.Set(5, PostalCode)
        output.Set(6, Country)
        output.Set(7, Phone)
        output.AsReadOnly()
@isaacabraham

Here's a first alternative version of the code above (the suggestion applies to C# as well as F#). If the row types supported indexers, the code might be slightly more readable:

type MyProcessorTwo() = 
    inherit IProcessor()

    let countryTranslation = 
        [ "Deutschland", "Germany"
          "Schwiiz", "Switzerland"
          "UK", "United Kingdom"
          "USA", "United States of America"
          "中国", "PR China" ] |> Map

    override __.Process(row, output) = 
        // Using a Map rather than a dict allows us to use more idiomatic option types in F#.
        let country = 
            let country = row.["Country"]
            country
            |> countryTranslation.TryFind
            |> defaultArg <| country

        // Indexers rather than Get/Set methods.
        output.["UserId"] <- row.["UserId"]
        output.["Name"] <- row.["Name"]
        output.["Address"] <- row.["Address"]
        output.["City"] <- row.["City"]
        output.["State"] <- row.["State"]
        output.["PostalCode"] <- row.["PostalCode"]
        output.["Country"] <- country
        output.["Phone"] <- row.["Phone"]

        output.AsReadOnly()
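
The lookup in the code above can be tried on its own, without any U-SQL types. This is only a minimal sketch: the shortened table and the names translate and translate' are made up for illustration. It shows that Map.TryFind returns an option, and that defaultArg (or Option.defaultValue, assuming FSharp.Core 4.1 or later) supplies the fallback when the key is missing.

```fsharp
// Shortened translation table, for illustration only.
let countryTranslation =
    [ "Deutschland", "Germany"
      "UK", "United Kingdom" ] |> Map

// TryFind yields Some translation or None; defaultArg falls back to the input.
let translate (country: string) =
    defaultArg (countryTranslation.TryFind country) country

// The same lookup as a pipeline, using Option.defaultValue.
let translate' (country: string) =
    countryTranslation
    |> Map.tryFind country
    |> Option.defaultValue country

printfn "%s" (translate "Deutschland")  // prints "Germany"
printfn "%s" (translate "France")       // prints "France" (no entry, passed through)
```

With dict, the same logic needs a ContainsKey check followed by an indexer access, as in the original gist.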

@isaacabraham

Here's another way that is more idiomatic F#. I've removed the need for the interface: the UDO is now just a free-standing function that takes the input row and the output row. As long as its signature matches IRow * IUpdatableRow -> IRow, the code would be valid.

/// This module might contain all my UDOs. No need for a separate class for each one - why do we
/// need an interface that only has a single function?
module MyUDOs =
    let countryTranslation = 
        [ "Deutschland", "Germany"
          "Schwiiz", "Switzerland"
          "UK", "United Kingdom"
          "USA", "United States of America"
          "中国", "PR China" ] |> Map

    let translateCountryUdo (row:IRow, output:IUpdatableRow) =
        let country = 
            let country = row.["Country"]
            countryTranslation.TryFind country
            |> defaultArg <| country

        output.[0] <- row.["UserId"]
        output.[1] <- row.["Name"]
        output.[2] <- row.["Address"]
        output.[3] <- row.["City"]
        output.[4] <- row.["State"]
        output.[5] <- row.["PostalCode"]
        output.[6] <- country
        output.[7] <- row.["Phone"]

        output.AsReadOnly()
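
If the runtime still expects a processor object, a function of this shape could in principle be wrapped with an object expression. The sketch below is only an illustration under loud assumptions: IRow, IUpdatableRow and IProcessor here are simplified, hypothetical stand-ins (string columns only), not the real Microsoft.Analytics.Interfaces types, and toProcessor, readOnlyRow, updatableRow and upperCaseName are invented names.

```fsharp
open System.Collections.Generic

// Hypothetical, simplified stand-ins for the U-SQL row types (string columns only).
type IRow =
    abstract Get : column: string -> string

type IUpdatableRow =
    abstract Set : column: string * value: string -> unit
    abstract AsReadOnly : unit -> IRow

// Stand-in for the abstract base class a processor inherits from.
[<AbstractClass>]
type IProcessor() =
    abstract Process : row: IRow * output: IUpdatableRow -> IRow

// Wrap any function of the right shape in a processor via an object expression.
let toProcessor (udo: IRow * IUpdatableRow -> IRow) : IProcessor =
    { new IProcessor() with
        member __.Process(row, output) = udo (row, output) }

// Dictionary-backed rows, used only to try the adapter out.
let readOnlyRow (values: IDictionary<string, string>) =
    { new IRow with
        member __.Get column = values.[column] }

let updatableRow () =
    let values = Dictionary<string, string>()
    { new IUpdatableRow with
        member __.Set(column, value) = values.[column] <- value
        member __.AsReadOnly() = readOnlyRow values }

// A free-standing UDO: copies Name through in upper case.
let upperCaseName (row: IRow, output: IUpdatableRow) =
    output.Set("Name", row.Get("Name").ToUpperInvariant())
    output.AsReadOnly()

let processor = toProcessor upperCaseName
let result = processor.Process(readOnlyRow (dict [ "Name", "isaac" ]), updatableRow ())
printfn "%s" (result.Get("Name"))  // prints "ISAAC"
```

The adapter is written once, so each UDO can stay a plain function in a module.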

@isaacabraham

Here's yet another example, which uses a hypothetical U-SQL type provider. The provider would parse the U-SQL script and provide bespoke, strongly-typed parsing methods for the schemas in that file.

/// Create Type Provider for Script.usql
type MyScript = USqlTypeProvider<"Script.usql">

module MyUDO =
    let countryTranslation = 
        [ "Deutschland", "Germany"
          "Schwiiz", "Switzerland"
          "UK", "United Kingdom"
          "USA", "United States of America"
          "中国", "PR China" ] |> Map

    let translateCountry (row:IRow, output:IUpdatableRow) =
        // "Wrap" IRow as a strongly-typed object whose schema is inferred from the
        // drivers definition in Script.usql
        let row = MyScript.Parsers.driver(row)
        let country =
            let country = row.Country // row is now strongly typed, can access all properties safely.
            countryTranslation.TryFind country |> defaultArg <| country

        // provide a strongly-typed generator function for the drivers_CountryName schema
        // takes in "output", populates it, and returns it AsReadOnly()
        output
        |> MyScript.Generators.drivers_CountryName(row.UserId, row.Name, row.Address, row.City, row.State, row.PostalCode, country, row.Phone)
