@puyokw
Last active November 21, 2017 07:20
td_intern rossmann
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates
import datetime
import pandas_td as td
import os
con = td.connect(apikey=os.environ['TD_API_KEY'],endpoint='https://api.treasuredata.com/')
engine = con.query_engine(database='rossmann', type='presto')
# read_td_table returns at most 10,000 rows by default, so pass the full row count of the train table
train = td.read_td_table('train_original', engine, limit=1017209)
train.head()
test = td.read_td_table('test_original', engine)
test.head()
store = td.read_td_table('store_raw', engine)
store.head()
# convert date strings to matplotlib date numbers (floats)
datetimes = [datetime.datetime.strptime(t, "%Y-%m-%d") for t in train.date]
plotData = matplotlib.dates.date2num(datetimes)
plotData= pd.DataFrame(plotData, columns=['datetimes'])
train = train.join(plotData)
def splitTime(x):
    mysplit = datetime.datetime.strptime(x, "%Y-%m-%d")
    return [mysplit.year, mysplit.month, mysplit.day]
# 2014-11-12 -> year=2014, month=11, day=12
train = train.join( pd.DataFrame(train.date.apply(splitTime).tolist(), columns = ['year','mon','day']))
# visualize
# DataFrame.sort was removed from pandas; sort_values is its replacement
train = train.sort_values('datetimes')
plt.figure(1,figsize=(20,10))
plt.plot_date(train.loc[train.store==1,'datetimes'],train.loc[train.store==1,'sales'],linestyle='-')
plt.title('store 1')
plt.savefig('sales_store1.png')
myui commented Nov 21, 2017

Comments from one of our customers:

■ The actual data included a time part with milliseconds, so I added code to cut it off

datetimes = [datetime.datetime.strptime(t, "%Y-%m-%d") for t in train.date]
↓
datetimes = [datetime.datetime.strptime(t.split(' ')[0], "%Y-%m-%d") for t in train.date]
mysplit = datetime.datetime.strptime(x, "%Y-%m-%d")
↓
mysplit = datetime.datetime.strptime(x.split(' ')[0], "%Y-%m-%d")
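A caveat on trimming the time part: `str.rstrip(' 00:00:00.000')` treats its argument as a set of characters to strip, not as a literal suffix, so a date whose day ends in 0 loses that digit as well. The sketch below shows the effect and one way around it. The helper name `date_part` and the sample value are assumptions for illustration, not taken from the actual table.

```python
import datetime

def date_part(timestamp):
    """Return the leading YYYY-MM-DD part of a timestamp string (hypothetical helper)."""
    # Split on the first space; strings without a time part pass through unchanged.
    return timestamp.split(' ')[0]

raw = '2015-07-30 00:00:00.000'  # assumed shape of the stored value

# rstrip drops every trailing ' ', '0', ':' and '.', including the final 0 of the day
stripped = raw.rstrip(' 00:00:00.000')

# Splitting keeps the full date, so strptime sees the intended day
parsed = datetime.datetime.strptime(date_part(raw), "%Y-%m-%d")
```

Here `stripped` ends up as `'2015-07-3'`, which `strptime` would accept and silently read as the 3rd of the month, while `parsed` keeps the 30th.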

■ The join complained about duplicate columns, so I added a suffix

train = train.join(plotData)
↓
train = train.join(plotData, rsuffix='_date')
train = train.join( pd.DataFrame(train.date.apply(splitTime).tolist(), columns = ['year','mon','day']))
↓
train = train.join( pd.DataFrame(train.date.apply(splitTime).tolist(), columns = ['year','mon','day']), rsuffix='_date')
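The overlap can be reproduced on toy data. This is a minimal sketch under the assumption that the left frame already carries a `datetimes` column, for example because the join cell was run twice; the frames and values are hypothetical. In recent pandas versions the overlap typically surfaces as a `ValueError` rather than a warning.

```python
import pandas as pd

# Hypothetical toy frames: both sides carry a 'datetimes' column
left = pd.DataFrame({'store': [1, 2], 'datetimes': [735234.0, 735235.0]})
right = pd.DataFrame({'datetimes': [735234.0, 735235.0]})

# Without a suffix, join refuses to combine overlapping column names
try:
    left.join(right)
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)

# rsuffix renames only the right-hand duplicate; the left column keeps its name
joined = left.join(right, rsuffix='_date')
```

With `rsuffix='_date'` the result has the columns `store`, `datetimes`, and `datetimes_date`, so later code that reads `train.datetimes` still refers to the original column.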

■ The condition was written as a string, but the store column is numeric, so I removed the single quotes

plt.plot_date(train.loc[train.store=='1','datetimes'],train.loc[train.store=='1','sales'],linestyle='-')
↓
plt.plot_date(train.loc[train.store==1,'datetimes'],train.loc[train.store==1,'sales'],linestyle='-')
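Why the quotes matter can be seen on a small integer column. This is a sketch with a hypothetical `stores` Series standing in for `train.store`, assuming that column has an integer dtype.

```python
import pandas as pd

# Hypothetical stand-in for train.store with an integer dtype
stores = pd.Series([1, 2, 1])

# Comparing integers with the string '1' matches nothing
mask_str = stores == '1'

# Comparing with the integer 1 selects the intended rows
mask_int = stores == 1
```

The string comparison yields an all-False mask, so `train.loc[...]` would select no rows and the plot would come out empty without raising an error.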
