NickSeagull/AnalyzeDesignDocument.md

## AnalyzeDesignDocument.md

      
    Raw
  

              AnalyzeDesignDocument.md
            
          
    Getting started with Analyze

NOTE: NOTHING OF THIS HAS BEEN IMPLEMENTED YET, THIS IS A DESIGN DOCUMENT

Taken from Deedle in 10 minutes, adapted to Analyze
import Analyze
import Analyze.Frame as Frame
import Analyze.Series as Series
import qualified Date.Time.Calendar  -- from the 'time' package
Creating series and frames

We can create a series with any type that instantiates
the IsList type class.
idxes = [ "A"
        , "B"
        , "C"
        ]

values = [10, 20, 30]

firstSeries = Series.new idxes values
Also, we can create a series from a list of tuples:
secondSeries = Series.ofObservations
    [ ( "A", 10)
    , ( "B", 20)
    , ( "C", 30)
    ]
We can create a series of implicit (ordinal) keys by doing:
thirdSeries = Series.ofValues [ 10.0, 20.0, 30.0 ]
Now we can create a Frame using the first and the second Series,
as they share the keys.
df1 = Frame.new ["first", "second"] [firstSeries, secondSeries]
A Frame has two type parameters: Frame rowKey columnKey. We use
this to index. The types of the data itself are not specified, and
instead, we do so when getting the data from the Frame.
We can create a data frame with Int indexes for rows or columns by:
df2 = Frame.ofColumns [ ("first", firstSeries), ("second", secondSeries) ]
df3 = Frames.ofRows [ ("first", firstSeries), ("second", secondSeries)]
Also, we can specify our indexes for rows and columns by specifying
(rowKey, columnKey, value):
df4 = Frame.ofValues
    [ ("Monday", "John", 1.0)
    , ("Tuesday", "Joe", 2.1)
    , ("Tuesday", "John", 4.0)
    , ("Wednesday", "John", -5.4)
    ]
If your data types derive the Generic type class you can create a
data frame from a list of those too:
data Price = Price
    { day :: Text
    , quantity :: Float
    } deriving (Generic)

instance Serialize Price

prices :: [Price]
prices =
    [ Price "1-1-17" 10.0
    , Price "2-1-17" 12.0
    , Price "3-1-17" 13.0
    ]

df5 = Frame.ofRecords prices
Finally, we can also load a data frame from CSV.
msftCsv = Frame.readCsv "resources/MSFT.csv"
fbCsv = Frame.readCsv "resources/FB.csv"
The types are not inferred when loading like that, the user must specify them
later.
Specifying index and joining

msftOrd <- Frame.withFrame msftCsv $ do
    Frame.indexRowsDate "Date"
    Frame.sortRowsByKey
We can now get only the open and close prices, and add a new column.
msft <- Frame.withFrame msftOrd $ do
    Frame.sliceColumns [ "Open", "Close" ]
    openColumn       <- Frame.getColumn "Open"
    closeColumn      <- Frame.getColumn "Close"
    let differenceColumn = zipWith (-) openColumn closeColumn
    Frame.addColumn "Difference" differenceColumn
We can do the same thing for Facebook:
fb <- Frame.withFrame fbCsv $ do
    Frame.indexRowsDate "Date"
    Frame.sortRowsByKey
    Frame.sliceColumns [ "Open", "Close" ]
    openColumn       <- Frame.getColumn "Open"
    closeColumn      <- Frame.getColumn "Close"
    let differenceColumn = zipWith (-) openColumn closeColumn
    Frame.addColumn "Difference" differenceColumn
Let's create a single data frame that contains Microsoft and Facebook data.
Before joining those data frames, we have to rename their columns so their names
aren't duplicated.
let msftNames = ["MsftOpen", "MsftClose", "MsftDiff"]
msftRenamed <- Frame.withFrame msft $
    Frame.indexColumnsWith msftNames

let fbNames = ["FbOpen", "FbClose", "FbDiff"]
fbRenamed <- Frame.withFrame fb $
    Frame.indexColumnsWith fbNames

let joinedOut = Frame.withFrame msftRenamed $
    Frame.outerJoin fbRenamed

let joinedIn = Frame.withFrame msftRenamed $
    Frame.innerJoin fbRenamed
Selecting values and slicing

let val  = Frame.getRow (Data.Time.Calendar.fromGregorian 2013 1 2) joinedIn
let val' = Frame.getRow (Data.Time.Calendar.fromGregorian 2013 1 2) joinedIn
         & Series.get "FbOpen"
Projection and filtering

-- 'modifying' modifies the original 'Frame' instead of copying it
Frame.modifying joinedOut $ do
    comparison <- Frame.mapRowValues $ \row ->
        if (Series.get "msftOpen" row) > (Series.get "fbOpen" row)
            then "MSFT"
            else "FB"
    Frame.addColumn "Comparison" comparison
We can now get the number of days when Microsoft stock prices were above Facebook
and the other way round:
let msftCount = Frame.withFrame joinedOut $ do
    Frames.getColumn "Comparison" (Series.as :: String)
    >>= Series.filterValues (== "MSFT")
    >>= Series.countValues
-- msftCount = 220

let fbCount = Frame.withFrame joinedOut $ do
    Frames.getColumn "Comparison" (Series.as :: String)
    >>= Series.filterValues (== "FB")
    >>= Series.countValues
-- fbCount = 103
Grouping and aggregation

Group rows by month and year:
monthly <- Frames.withFrame joinedIn $
    Frame.groupRowsUsing $ \(y,m,_) _ -> fromGregorian y m 1