Last active
August 7, 2017 01:14
-
-
Save rntz/4af5d185663180188c640526bbbf25cd to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- https://stackoverflow.com/questions/38831961/what-declarative-language-is-good-at-analysis-of-tree-like-data | |
-- This is my attempt to implement your examples in a suitably extended | |
-- hypothetical version of Datafun. | |
-- 1. Currently Datafun only has sets, while SQL has bags/multisets. Some of your | |
-- queries would be nicer if Datafun supported bags; for example, with bags we | |
-- can express a `mean` function's type as: | |
-- | |
-- mean : Bag Float -> Float | |
-- | |
-- With only sets, we must instead use (where {a} is the type for "a set of | |
-- values of type a"): | |
-- | |
-- mean : (a -> Float) -> {a} -> Float | |
-- | |
-- where (mean f s) returns the mean of the value of `f` across every value in | |
-- the set `s`. | |
-- | |
-- Bags make a lot of aggregations nicer, so we want to add them Datafun | |
-- eventually, but that's a few papers down the line yet. | |
-- 2. Datafun also has no way to sort sets (which your third example uses to pick | |
-- the maximum two elements). As with aggregations and bags, we'd like to add | |
-- sorting and lists to add to Datafun but haven't yet. | |
-- | |
-- Taking the first two elements of a list also poses a problem: what if the | |
-- list has < 2 elements? Datafun has no runtime errors, and I don't see this | |
-- changing. We'd probably use a Maybe type to handle this. | |
-- 3. Finally, there's an even deeper problem with sorting and finding maxima, | |
-- which is determinism: when sorting a set by a certain key, how do we order | |
-- elements with the same key? When finding a maximum (or two) by a certain key, | |
-- what if there are multiple maxima with the same value? I mostly ignore this | |
-- problem. | |
-- 4. So, for these examples, I'll pretend Datafun has the following types & | |
-- aggregation functions: | |
-- | |
-- type Maybe a | |
-- type Float | |
-- type Int | |
-- max : (a -> Float) -> {a} ->+ {a} | |
-- mean : (a -> Float) -> {a} -> Float | |
-- maxTwo : (a -> Float) -> {a} -> Maybe (a, a) | |
-- | |
-- `max` will return a set, in case there are multiple maxima (or none). | |
-- curly braces describe sets: {Int} is a set of Ints | |
-- square braces describe records/structs: [field1: type1, field2: type2, ...] | |
type Event = [ timestamp: Int | |
, missingEnergy: Float | |
, tracks: { [ momentum: Float | |
, theta: Float | |
, hits: { [ detector: Int | |
, charge: Float | |
, time: Float | |
, ... ] } | |
] } | |
, showers: { [ ... ] } | |
] | |
-- example1 : Event -> {Float} | |
fun example1 event = | |
{ x.momentum | |
| x in max (\t -> len(t.hits)) | |
{ t | t in event.tracks, abs(t.theta) < 2.4 } } | |
-- example2 : Event -> Float | |
fun example2 event = | |
mean (\x -> x.charge) | |
{ h | track in event.tracks, track.momentum > 10 | |
, hit in track.hits, hit.time < 10 } | |
-- example3 : Event -> Maybe Float | |
fun example3 event = | |
case maxTwo (\t -> t.momentum) event.tracks | |
| None -> None | |
| Some (one, two) -> | |
Some (one.theta * one.momentum + two.theta * two.momentum) | |
/ (one.momentum + two.momentum) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment