A personal diary of DataFrame munging over the years.
Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
from pyspark.sql.types import StringType | |
from pyspark.sql.functions import udf | |
maturity_udf = udf(lambda age: "adult" if age >=18 else "child", StringType()) | |
df = spark.createDataFrame([{'name': 'Alice', 'age': 1}]) | |
df.withColumn("maturity", maturity_udf(df.age)) | |
df.show() |
A personal diary of DataFrame munging over the years.
Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)