@kretes
Created October 11, 2019 08:57
Repro for Spark CSV escape issue
a,b
at_the_end\,1
in_\_side,1
"comma,at_the_end\\",1
"comma,in_\\_side",1
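import pandas as pd

# Assumes an active SparkSession named `spark` (e.g. in a PySpark notebook).
df = spark.createDataFrame([{'a': "at_the_end\\", "b": 1},
                            {'a': "in_\\_side", "b": 1},
                            {'a': "comma,at_the_end\\", "b": 1},
                            {'a': "comma,in_\\_side", "b": 1},
                            ])
path = "/tmp/spark-quote9"
# Write a single CSV part file with backslash as the escape character.
df.coalesce(1).write.mode("overwrite").csv(path, header=True, escape="\\", quote='"')
# IPython shell escape: dump the written part file(s) to a local file.
! hdfs dfs -text {path}/* > /tmp/csv
# Read back with the same escape character; unquoted backslashes were not escaped on write.
pd.read_csv("/tmp/csv", escapechar="\\")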
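
The effect on a reader can be sketched without a Spark cluster. The snippet below is only an illustration, not part of the original repro: it feeds the CSV lines shown above to Python's standard-library `csv` module (swapped in here for pandas and Spark) with backslash as the escape character. The names `lines` and `rows` are made up for this example, and other readers may treat escapes differently.

```python
import csv

# CSV lines copied verbatim from the Spark output above (raw strings keep
# the backslashes exactly as they appear in the file).
lines = [
    r'a,b',
    r'at_the_end\,1',
    r'in_\_side,1',
    r'"comma,at_the_end\\",1',
    r'"comma,in_\\_side",1',
]

# Parse with backslash as the escape character, mirroring the write options.
rows = list(csv.reader(lines, escapechar="\\", quotechar='"'))

for row in rows:
    print(row)
```

With this reader, the two unquoted rows do not round-trip: in `at_the_end\,1` the backslash escapes the comma, so the row collapses into a single field, and in `in_\_side,1` the backslash is consumed, leaving `in__side`. The two quoted rows, where Spark doubled the backslash, come back with their original values.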

07ARB commented Oct 11, 2019

Hi Tomasz Bartczak,
This is the first Jira issue I am going to look into. Could you please guide me on how to check it?
