# For testing, set up only two days of partitions (2021-10-01 to 2021-10-02)
import pandas as pd
partitions = pd.date_range(start='20211001', end='20211002', freq='D').strftime('%Y%m%d')
# Drop the dt column and register the result as a temp view
dfListingCalendarRefined.drop("dt").createOrReplaceTempView("LISTING_CALENDAR_RAW")
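# Inside the loop, register each partition and run INSERT OVERWRITE
for p in partitions:
    # Register the partition and point it at its S3 location
    # (no trailing semicolon: older Spark versions reject it in spark.sql)
    spark.sql(f"""
        ALTER TABLE airbnb_db.listing_calendar ADD IF NOT EXISTS PARTITION (
            dt = '{p}'
        )
        LOCATION 's3://airbnb-data-lake/db/listing_calendar/dt={p}'
    """)
    # Overwrite only this partition with the rows whose date matches it
    spark.sql(f"""
        INSERT OVERWRITE TABLE airbnb_db.listing_calendar
        PARTITION (dt = '{p}')
        SELECT /*+ REPARTITION(2) */ *
        FROM LISTING_CALENDAR_RAW
        WHERE date = to_date('{p}', 'yyyyMMdd')
    """)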
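
Before running the loop above against a real table, it can help to print the generated statements and check them by eye. The following is a minimal dry-run sketch that needs neither Spark nor pandas. The helper names `daily_partitions` and `build_statements` are hypothetical and not part of the original snippet; the table, view, and S3 path are copied from it.

```python
from datetime import date, timedelta


def daily_partitions(start, end):
    """Return yyyymmdd strings for every day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


def build_statements(p,
                     table="airbnb_db.listing_calendar",
                     view="LISTING_CALENDAR_RAW",
                     base="s3://airbnb-data-lake/db/listing_calendar"):
    """Build the two SQL strings for one partition value p (hypothetical helper)."""
    add_partition = (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt = '{p}') "
        f"LOCATION '{base}/dt={p}'"
    )
    insert_overwrite = (
        f"INSERT OVERWRITE TABLE {table} PARTITION (dt = '{p}') "
        f"SELECT /*+ REPARTITION(2) */ * FROM {view} "
        f"WHERE date = to_date('{p}', 'yyyyMMdd')"
    )
    return add_partition, insert_overwrite


# Dry run: print the statements instead of passing them to spark.sql
partitions = daily_partitions(date(2021, 10, 1), date(2021, 10, 2))
for p in partitions:
    for statement in build_statements(p):
        print(statement)
```

Once the printed statements look right, each string could be passed to `spark.sql(...)` in place of the `print` call.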