
@1ambda
Created December 20, 2021 11:27
dfSelected.printSchema() # Check the schema.
dfSelected.describe().show() # Check summary statistics. In PySpark, you can use `toPandas` instead of `show`.
# Output of printSchema()
root
|-- id: integer (nullable = true)
|-- year_birth: integer (nullable = true)
|-- education: string (nullable = true)
|-- count_kid: integer (nullable = true)
|-- count_teen: integer (nullable = true)
|-- date_customer: string (nullable = true)
|-- days_last_login: integer (nullable = true)
# Output of describe().show()
+-------+------------------+------------------+---------+-------------------+------------------+-------------+-----------------+
|summary|                id|        year_birth|education|          count_kid|        count_teen|date_customer|  days_last_login|
+-------+------------------+------------------+---------+-------------------+------------------+-------------+-----------------+
|  count|              2240|              2240|     2240|               2240|              2240|         2240|             2240|
|   mean| 5592.159821428571|1968.8058035714287|     null|0.44419642857142855|           0.50625|         null|        49.109375|
| stddev|3246.6621975643416|11.984069456885827|     null| 0.5383980977345935|0.5445382307698761|         null|28.96245280837821|
|    min|                 0|              1893| 2n Cycle|                  0|                 0|   01-01-2013|                0|
|    max|             11191|              1996|      PhD|                  2|                 2|   31-12-2013|               99|
+-------+------------------+------------------+---------+-------------------+------------------+-------------+-----------------+
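
One caveat when reading the `min` and `max` rows: the schema lists `date_customer` as a string, so those two values are the lexicographically smallest and largest strings, not necessarily the earliest and latest dates. The sketch below uses plain Python and hypothetical sample values (not taken from the dataset) in the same day-month-year layout to show how the two orderings can disagree. In Spark, parsing the column first (for example with `to_date` and a `dd-MM-yyyy` pattern, assuming that is the actual layout) should give a chronological result.

```python
from datetime import datetime

# Hypothetical sample values in the same dd-MM-yyyy layout as date_customer.
dates = ["31-12-2013", "01-01-2014", "15-06-2012"]

# Strings compare character by character, so the day field dominates the ordering.
string_min = min(dates)
string_max = max(dates)

# Parsing to real dates first gives chronological ordering.
parsed = [datetime.strptime(d, "%d-%m-%Y").date() for d in dates]
date_min = min(parsed)
date_max = max(parsed)

print("string min/max:", string_min, string_max)
print("date   min/max:", date_min, date_max)
```

Here the string minimum is `01-01-2014` even though it is the latest date in the sample, which is why the string-based summary should not be read as a date range.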