@l1x
Last active November 29, 2016 23:09
using Query, DataFrames, Feather
# Read a headerless CSV, skipping the first line of the file
df = readtable("data/netflix_prize/download/training_set/mv_0016765.txt", header = false, skipstart=1)
#=
julia> head(df)
6×3 DataFrames.DataFrame
│ Row │ x1      │ x2 │ x3           │
├─────┼─────────┼────┼──────────────┤
│ 1   │ 185150  │ 3  │ "2005-07-05" │
│ 2   │ 2256305 │ 3  │ "2005-07-13" │
│ 3   │ 496476  │ 5  │ "2005-07-05" │
│ 4   │ 1026389 │ 5  │ "2005-07-06" │
│ 5   │ 1609049 │ 2  │ "2000-11-21" │
│ 6   │ 2423875 │ 4  │ "2001-07-10" │
=#
# Parse the date strings in column x3 into Date values
dfmt = Dates.DateFormat("y-m-d")
df[:real_date] = Date(df[:x3], dfmt)
#=
julia> head(df)
6×4 DataFrames.DataFrame
│ Row │ x1      │ x2 │ x3           │ real_date  │
├─────┼─────────┼────┼──────────────┼────────────┤
│ 1   │ 185150  │ 3  │ "2005-07-05" │ 2005-07-05 │
│ 2   │ 2256305 │ 3  │ "2005-07-13" │ 2005-07-13 │
│ 3   │ 496476  │ 5  │ "2005-07-05" │ 2005-07-05 │
│ 4   │ 1026389 │ 5  │ "2005-07-06" │ 2005-07-06 │
│ 5   │ 1609049 │ 2  │ "2000-11-21" │ 2000-11-21 │
│ 6   │ 2423875 │ 4  │ "2001-07-10" │ 2001-07-10 │
=#
# Drop the original string column now that real_date replaces it
delete!(df, :x3)
#=
julia> head(df)
6×3 DataFrames.DataFrame
│ Row │ x1      │ x2 │ real_date  │
├─────┼─────────┼────┼────────────┤
│ 1   │ 185150  │ 3  │ 2005-07-05 │
│ 2   │ 2256305 │ 3  │ 2005-07-13 │
│ 3   │ 496476  │ 5  │ 2005-07-05 │
│ 4   │ 1026389 │ 5  │ 2005-07-06 │
│ 5   │ 1609049 │ 2  │ 2000-11-21 │
│ 6   │ 2423875 │ 4  │ 2001-07-10 │
=#
# Extract the year from each date into its own column
df[:year] = map(Dates.year, df[:real_date])
#=
julia> head(df)
6×4 DataFrames.DataFrame
│ Row │ x1      │ x2 │ real_date  │ year │
├─────┼─────────┼────┼────────────┼──────┤
│ 1   │ 185150  │ 3  │ 2005-07-05 │ 2005 │
│ 2   │ 2256305 │ 3  │ 2005-07-13 │ 2005 │
│ 3   │ 496476  │ 5  │ 2005-07-05 │ 2005 │
│ 4   │ 1026389 │ 5  │ 2005-07-06 │ 2005 │
│ 5   │ 1609049 │ 2  │ 2000-11-21 │ 2000 │
│ 6   │ 2423875 │ 4  │ 2001-07-10 │ 2001 │
=#
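# A minimal sketch, not part of the original gist, of the date parsing and
# year extraction steps above on a plain vector of strings. It assumes
# Julia 1.x, where Dates is a standard library that must be loaded explicitly.
# The names sample_fmt, sample_raw, sample_dates and sample_years are
# hypothetical and chosen only to avoid clobbering df and dfmt.
# On recent DataFrames.jl releases the readtable, head and column delete!
# calls used above have been replaced (CSV.jl's CSV.read, first(df, n) and
# select!(df, Not(:col)) respectively), so those lines would likely need
# adjusting there; this sketch sidesteps that by using only Dates.
using Dates
sample_fmt = Dates.DateFormat("y-m-d")
sample_raw = ["2005-07-05", "2000-11-21"]
sample_dates = Date.(sample_raw, sample_fmt)  # broadcast the parser over each string
sample_years = Dates.year.(sample_dates)      # one integer year per date
# sample_years == [2005, 2000]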