@karpanGit
Last active February 7, 2023 06:00
pyspark, apply mapping
# map values in pyspark
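import pyspark.sql.functions as F
from itertools import chain
from pyspark.sql import SparkSession
# reuse the active session if one exists (e.g. in the pyspark shell), otherwise start one
spark = SparkSession.builder.getOrCreate()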
data = [['a', 1], ['b', 2], ['a', 3], ['d', 4]]
data = spark.createDataFrame(data, schema=['name', 'val'])
data.show()
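# create mapping column: create_map takes alternating key, value columns
mapping = {'a': 'hello a', 'b': 'hello b', 'c': 'hello c'}
mapping_expr = F.create_map([F.lit(x) for x in chain.from_iterable(mapping.items())])
# apply the mapping; names missing from the dict ('d' here) come back as null
res = data.select(mapping_expr[F.col('name')])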
res.show()
# +---------------------------------------------+
# |map(a, hello a, b, hello b, c, hello c)[name]|
# +---------------------------------------------+
# |                                      hello a|
# |                                      hello b|
# |                                      hello a|
# |                                         null|
# +---------------------------------------------+
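The mapping step above relies on flattening the dict into one alternating key, value sequence before each element is wrapped in `F.lit`. The sketch below shows that flattening in plain Python, with no Spark session required. The `names` list and the `dict.get` lookup are only an illustrative stand-in for the `name` column and the map lookup; they are not part of the original snippet.

```python
from itertools import chain

mapping = {'a': 'hello a', 'b': 'hello b', 'c': 'hello c'}

# Flatten the (key, value) pairs into a single alternating sequence,
# which is the argument order create_map expects.
flat = list(chain.from_iterable(mapping.items()))
print(flat)
# ['a', 'hello a', 'b', 'hello b', 'c', 'hello c']

# Plain-Python analogue of the lookup (illustrative only):
# dict.get returns None for a missing key, much as the map lookup yields null for 'd'.
names = ['a', 'b', 'a', 'd']
looked_up = [mapping.get(name) for name in names]
print(looked_up)
# ['hello a', 'hello b', 'hello a', None]
```

Dicts preserve insertion order in Python 3.7+, so every key stays immediately before its value in the flattened list.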