@olekscode
Created March 17, 2020 15:30
Metacello new
	baseline: 'DataFrame';
	repository: 'github://PolyMathOrg/DataFrame/src';
	load.
covidDataFile := '/Users/oleks/Documents/Data/COVID-19-geographic-disbtribution-worldwide-2020-03-17.csv' asFileReference.
"Pharo does not understand this date format by default,
so we read the CSV with all values as strings (without type inference)
and then convert the dates and numbers manually."
reader := DataFrameCsvReader new.
reader shouldParseTypes: false.
covid := DataFrame readFrom: covidDataFile using: reader.
covid columnNames.
"an OrderedCollection('DateRep' 'Day' 'Month' 'Year' 'Cases' 'Deaths' 'Countries and territories' 'GeoId')"
"These columns are redundant."
covid removeColumns: #(Day Month Year GeoId).
covid
	renameColumn: 'DateRep' to: 'Date';
	renameColumn: 'Countries and territories' to: 'Country'.
covid columnNames.
"an OrderedCollection('Date' 'Cases' 'Deaths' 'Country')"
covid
	toColumn: 'Date'
	applyElementwise: [ :each |
		Date readFrom: each pattern: 'dd/mm/yyyy' ].
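"A quick check of the date pattern on a single value.
The sample string is made up for illustration and assumes the file
stores dates as day/month/year. It is wrapped in a read stream before parsing."
sampleDate := Date readFrom: '17/03/2020' readStream pattern: 'dd/mm/yyyy'.
sampleDate dayOfMonth.
"17"
sampleDate monthIndex.
"3"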
covid
	toColumns: #(Cases Deaths)
	applyElementwise: [ :each | each asInteger ].
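"What asInteger does to individual strings (sample values made up, not taken from the file).
Text without any digits yields nil, so a malformed cell would likely end up as nil
in the column rather than raise an error."
'148' asInteger.
"148"
'n/a' asInteger.
"nil"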
covidFrance := covid select: [ :row | (row at: 'Country') = 'France' ].
(covidFrance columns: #(Cases Deaths)) sum.
"a DataSeries('Cases'->6633 'Deaths'->148)"
deathsByCountry := (covid
	group: 'Deaths'
	by: 'Country'
	aggregateUsing: #sum) sortDescending.
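"To look only at the top of the ranking: a sketch that assumes DataSeries
understands head: (not used elsewhere in this script); the count 10 is arbitrary."
deathsByCountry head: 10.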