@ayee
Created December 21, 2015 23:03
Split Spark dataframe columns with literal
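from pyspark.sql.functions import split

# Assumes `sc` is an existing SparkContext, e.g. the one the pyspark shell provides.
df = sc.parallelize([[1, 'Foo:10'], [2, 'Bar:11'], [3, 'Car:12']]).toDF(['Event', 'eventtype'])
# split() returns an array column; index into it to pull out each part as a string.
df = df.withColumn('Thing', split(df.eventtype, ':')[0])
df = df.withColumn('Ranking', split(df.eventtype, ':')[1])
df.collect()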
# [Row(Event=1, eventtype=u'Foo:10', Thing=u'Foo', Ranking=u'10'),
# Row(Event=2, eventtype=u'Bar:11', Thing=u'Bar', Ranking=u'11'),
# Row(Event=3, eventtype=u'Car:12', Thing=u'Car', Ranking=u'12')]
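
The second argument to `split` is interpreted as a regular expression rather than a literal string, so `':'` works unescaped only because it has no special meaning in a regex. Below is a minimal sketch of escaping a delimiter such as `|` before splitting. It uses Python's `re` module as a stand-in for Spark's Java regex engine, on the assumption that the two agree for simple punctuation delimiters. The helper name `literal_pattern` and the sample strings are hypothetical, not part of the gist.

```python
import re

def literal_pattern(delimiter):
    """Return a regex pattern that matches `delimiter` literally (hypothetical helper)."""
    return re.escape(delimiter)

# ':' has no special meaning in a regex, so escaping leaves the split unchanged.
print(re.split(literal_pattern(':'), 'Foo:10'))

# '|' means alternation in a regex; escape it to split on the bar itself.
print(re.split(literal_pattern('|'), 'Foo|10'))
```

With the DataFrame above, the escaped pattern would presumably be passed the same way, for example `split(df.eventtype, literal_pattern('|'))`, if the column used `|` as its separator.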