@1ambda
Created December 21, 2021 15:24
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql.window import Window
df = spark.read.load("./ecommerce_event.csv",
                     format="csv", inferSchema="true", header="true")
df.count()  # 4264752 rows; the file is about 450 MiB
df.printSchema()
root
|-- event_time: string (nullable = true)
|-- event_type: string (nullable = true)
|-- product_id: integer (nullable = true)
|-- category_id: long (nullable = true)
|-- category_code: string (nullable = true)
|-- brand: string (nullable = true)
|-- price: double (nullable = true)
|-- user_id: integer (nullable = true)
|-- user_session: string (nullable = true)
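
With `inferSchema="true"`, Spark makes an extra pass over the CSV to guess column types, which is noticeable on a file of this size. Below is a minimal sketch of passing the schema explicitly instead, assuming the column names and types printed above are the ones you want. `EVENT_SCHEMA_DDL` and `read_events` are hypothetical names introduced here for illustration; they are not part of the original gist.

```python
# Hypothetical constant: DDL form of the schema shown by df.printSchema() above.
# Types mirror the inferred ones (integer -> INT, long -> BIGINT).
EVENT_SCHEMA_DDL = ", ".join([
    "event_time STRING",
    "event_type STRING",
    "product_id INT",
    "category_id BIGINT",
    "category_code STRING",
    "brand STRING",
    "price DOUBLE",
    "user_id INT",
    "user_session STRING",
])


def read_events(spark, path):
    """Read the event CSV with an explicit schema, skipping schema inference.

    `spark` is assumed to be an existing SparkSession (as in a notebook or
    the pyspark shell); `path` is the CSV location.
    """
    return spark.read.csv(path, header=True, schema=EVENT_SCHEMA_DDL)
```

Usage would look like `df = read_events(spark, "./ecommerce_event.csv")`. `event_time` is kept as a string here to match the inferred schema; converting it to a timestamp would depend on the format actually used in the file.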