@developer-sdk
Created June 20, 2016 12:56
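# Read the file at the given URL as Unicode text
# (sc is the SparkContext, which the pyspark shell predefines)
textFile = sc.textFile("file_url", use_unicode=True)
# Encode each line as UTF-8, then count occurrences of the first tab-separated field
# Note: str(line.encode('utf-8')) targets Python 2; on Python 3 it produces "b'...'" text,
# and line.split("\t") can be applied to the line directly instead
counts = textFile.flatMap(lambda line: str(line.encode('utf-8')).split("\n"))\
    .map(lambda line: (line.split("\t")[0], 1))\
    .reduceByKey(lambda a, b: a + b)
# Save the counts to the "result" directory on HDFS
counts.saveAsTextFile("result")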
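
The counting logic can be checked locally without a cluster. The following is a minimal plain-Python sketch, not part of the original gist: `count_first_field` and the `sample` lines are hypothetical names and data chosen for illustration, and the function mirrors only the `map`/`reduceByKey` step (key on the first tab-separated field, then sum the ones).

```python
from collections import Counter


def count_first_field(lines):
    """Count how often each first tab-separated field occurs (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Same key extraction as the map step: first field before a tab
        key = line.split("\t")[0]
        counts[key] += 1
    return dict(counts)


# Illustrative input only; includes non-ASCII keys to mirror the Unicode use case
sample = ["사과\t1", "apple\tx", "사과\t2"]
result = count_first_field(sample)
print(result)
```

Running the same sample through the Spark job should yield the same key/count pairs, though the order of the saved output is not guaranteed.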
@ChrisYeGX

Thank you. It works!
