@1ambda
Created December 20, 2021 12:11
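from pyspark.sql import SparkSession
from pyspark.sql.functions import add_months, col, to_date

# Reuses the existing session in a pyspark shell or notebook, or creates one.
spark = SparkSession.builder.getOrCreate()

# Transformation (lazy): not executed yet.
# Note: inferSchema="true" still makes Spark scan the file once to infer column types.
df = spark.read.load("./marketing_campaign.csv",
                     format="csv", sep="\t", inferSchema="true", header="true")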
# Transformation (lazy): not executed yet. Keeps a subset of columns under snake_case names.
dfSelected = df.select(
    col("ID").alias("id"),
    col("Year_Birth").alias("year_birth"),
    col("Education").alias("education"),
    col("Kidhome").alias("count_kid"),
    col("Teenhome").alias("count_teen"),
    col("Dt_Customer").alias("date_customer"),
    col("Recency").alias("days_last_login")
)
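# Transformation (lazy): not executed yet.
# Parses date_customer with the pattern d-M-yyyy and shifts it by 72 months (6 years).
dfConverted = dfSelected.withColumn(
    "date_joined",
    add_months(to_date(col("date_customer"), "d-M-yyyy"), 72))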
# Action: triggers execution of all the transformations above.
dfConverted.count()  # returns 2240
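The comments above separate lazy transformations from actions. As a rough mental model only, here is a toy class in plain Python. `LazyFrame`, `read_rows`, and the two sample rows are hypothetical and are not Spark's API or implementation. A transformation only appends a step to a plan, and an action loads the data and runs the whole plan:

```python
class LazyFrame:
    """Toy model of lazy evaluation (hypothetical, not part of PySpark)."""

    def __init__(self, source, plan=None):
        self.source = source      # callable that produces the input rows
        self.plan = plan or []    # recorded transformations, not yet run

    def select(self, mapping):
        # Transformation: record a projection/rename step and return a new frame.
        step = lambda rows: [{new: r[old] for old, new in mapping.items()} for r in rows]
        return LazyFrame(self.source, self.plan + [step])

    def collect(self):
        # Action: load the data, then apply every recorded step in order.
        rows = self.source()
        for step in self.plan:
            rows = step(rows)
        return rows

    def count(self):
        # Action: built on collect(), so it also runs the full plan.
        return len(self.collect())


loads = []

def read_rows():
    loads.append(1)  # record every time the "file" is actually read
    return [{"ID": 1, "Kidhome": 0}, {"ID": 2, "Kidhome": 1}]

df = LazyFrame(read_rows)
df_selected = df.select({"ID": "id", "Kidhome": "count_kid"})
print(len(loads))           # 0: defining the transformation read nothing
print(df_selected.count())  # 2
print(len(loads))           # 1: the action triggered the read
```

In this toy model every action re-runs the plan from the source; Spark likewise recomputes a DataFrame per action unless it is persisted, for example with `cache()`.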