@1ambda
Created December 21, 2021 00:43
# 'collect()' is an Action: the Executors read the data from the files and send it to the Driver.
# If the data has been cached, e.g. via cache(), it can be read from memory and sent from there instead.
collected = dfPartitioned.collect()
# Result of type(collected)
list
# Result of collected[0]
Row(id=7196, year_birth=1950, education='PhD', count_kid=1, count_teen=1, date_customer='08-02-2014', days_last_login=20, date_joined=datetime.date(2020, 2, 8))