@puyokw
Last active November 21, 2017 07:20
td_intern rossmann
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates
import datetime
import pandas_td as td
import os
con = td.connect(apikey=os.environ['TD_API_KEY'],endpoint='https://api.treasuredata.com/')
engine = con.query_engine(database='rossmann', type='presto')
# read_td_table returns at most 10,000 rows by default, so pass the full row count of the training table
train = td.read_td_table('train_original', engine, limit=1017209)
train.head()
test = td.read_td_table('test_original', engine)
test.head()
store = td.read_td_table('store_raw', engine)
store.head()
# convert the date strings to matplotlib date numbers (floats)
datetimes = [datetime.datetime.strptime(t, "%Y-%m-%d") for t in train.date]
plotData = matplotlib.dates.date2num(datetimes)
plotData = pd.DataFrame(plotData, columns=['datetimes'])
train = train.join(plotData)
def splitTime(x):
    mysplit = datetime.datetime.strptime(x, "%Y-%m-%d")
    return [mysplit.year, mysplit.month, mysplit.day]
# 2014-11-12 -> year=2014, month=11, day=12
train = train.join( pd.DataFrame(train.date.apply(splitTime).tolist(), columns = ['year','mon','day']))
# visualize
# DataFrame.sort was removed in pandas 0.20; sort_values is its replacement
train = train.sort_values('datetimes')
plt.figure(1,figsize=(20,10))
plt.plot_date(train.loc[train.store==1,'datetimes'],train.loc[train.store==1,'sales'],linestyle='-')
plt.title('store 1')
plt.savefig('sales_store1.png')

myui commented Nov 21, 2017

Comments from one of our customers.

■ The actual data included milliseconds, so code was added to trim them

datetimes = [datetime.datetime.strptime(t, "%Y-%m-%d") for t in train.date]
↓
datetimes = [datetime.datetime.strptime(t.split(' ')[0], "%Y-%m-%d") for t in train.date]
mysplit = datetime.datetime.strptime(x, "%Y-%m-%d")
↓
mysplit = datetime.datetime.strptime(x.split(' ')[0], "%Y-%m-%d")
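
One caveat when trimming the time part: `str.rstrip` takes a set of characters to strip, not a literal suffix, so `rstrip(' 00:00:00.000')` can also remove the last digit of a day ending in `0`. The sketch below illustrates this and shows a split-based alternative. The helper name `date_part` and the sample value are hypothetical, and the timestamp format `YYYY-MM-DD 00:00:00.000` is assumed from the comment above.

```python
import datetime

def date_part(ts):
    """Return the 'YYYY-MM-DD' part of a timestamp string (hypothetical helper)."""
    # Split on the first space; a plain date string has no space and passes through unchanged.
    return ts.split(' ')[0]

sample = '2015-07-30 00:00:00.000'  # made-up value in the assumed format

# rstrip removes any trailing run of the characters ' ', '0', ':' and '.',
# so the final 0 of the day is stripped together with the time part.
stripped = sample.rstrip(' 00:00:00.000')

parsed = datetime.datetime.strptime(date_part(sample), "%Y-%m-%d")
print(stripped)
print(parsed.date())
```

Because `date_part` leaves a bare `YYYY-MM-DD` string untouched, the same call should work whether or not the table includes the time part.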

■ The join raised a warning about overlapping columns, so a suffix was added

train = train.join(plotData)
↓
train = train.join(plotData, rsuffix='_date')
train = train.join( pd.DataFrame(train.date.apply(splitTime).tolist(), columns = ['year','mon','day']))
↓
train = train.join( pd.DataFrame(train.date.apply(splitTime).tolist(), columns = ['year','mon','day']), rsuffix='_date')

■ The condition was written as a string, but the value is numeric, so the single quotes were removed

plt.plot_date(train.loc[train.store=='1','datetimes'],train.loc[train.store=='1','sales'],linestyle='-')
↓
plt.plot_date(train.loc[train.store==1,'datetimes'],train.loc[train.store==1,'sales'],linestyle='-')
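
The quotes matter because Python does not coerce between strings and integers in equality checks, and pandas is expected to follow the same rule for an integer column. A minimal sketch in plain Python, using made-up rows rather than the actual `train` table:

```python
# Made-up rows standing in for the train table; store is an integer here.
rows = [
    {'store': 1, 'sales': 100},
    {'store': 2, 'sales': 200},
    {'store': 1, 'sales': 150},
]

# An int never compares equal to a str, so this filter selects nothing.
as_string = [r for r in rows if r['store'] == '1']

# Comparing against the integer selects the two rows for store 1.
as_number = [r for r in rows if r['store'] == 1]

print(len(as_string), len(as_number))
```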
