PySpark cheatsheet: different methods for changing a column name
# Changing a column name with the withColumnRenamed method
df = df.withColumnRenamed('existing_column_name', 'new_column_name')
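When several columns need new names, the same call can be applied in a loop. The following is a minimal sketch, not part of the original cheatsheet: the helper name `rename_columns` and the `mapping` argument are hypothetical, and it assumes only that `df` exposes `withColumnRenamed(old, new)` the way a PySpark DataFrame does.

```python
def rename_columns(df, mapping):
    """Apply withColumnRenamed once per (old, new) pair in mapping.

    `rename_columns` and `mapping` are illustrative names, not PySpark APIs.
    """
    for old_name, new_name in mapping.items():
        # withColumnRenamed returns a new DataFrame, so rebind df on each step
        df = df.withColumnRenamed(old_name, new_name)
    return df

# Usage with a real DataFrame (column names are placeholders):
# df = rename_columns(df, {"existing_column_name": "new_column_name"})
```

Because `withColumnRenamed` leaves the DataFrame unchanged when the old name does not exist, a misspelled key in `mapping` fails silently; checking the keys against `df.columns` first may be worthwhile.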
# Changing column names with selectExpr (list every column you want to keep; unlisted columns are dropped)
df = df.selectExpr("existing_column_name AS existing_1", "new_column_name AS new_1")
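Since `selectExpr` keeps only the columns it is given, typing out every column by hand gets tedious on wide tables. One possible workaround, sketched here under assumptions, is to generate the expression list from the existing column names. The helper `rename_exprs` is hypothetical, and it assumes the column names contain no backtick characters.

```python
def rename_exprs(columns, mapping):
    """Build one selectExpr expression per column, aliasing only those in mapping.

    `rename_exprs` is an illustrative helper, not a PySpark API.
    Assumes no column name contains a backtick.
    """
    exprs = []
    for name in columns:
        if name in mapping:
            # Renamed column: backtick-quote both the old and the new name
            exprs.append(f"`{name}` AS `{mapping[name]}`")
        else:
            # Untouched column: pass it through so it is not dropped
            exprs.append(f"`{name}`")
    return exprs

# Usage with a real DataFrame (column names are placeholders):
# df = df.selectExpr(*rename_exprs(df.columns, {"existing_column_name": "new_column_name"}))
```

The backticks keep names with spaces or dots from being parsed as expressions; for plain identifiers they are harmless.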
# Changing column names with pyspark.sql.functions: col and alias
from pyspark.sql.functions import col
df = df.select(col("existing_column_name").alias("existing_1"), col("new_column_name").alias("new_1"))
# Changing column names with a SQL select statement
# (createOrReplaceTempView replaces registerDataFrameAsTable, which is deprecated since Spark 2.0)
df.createOrReplaceTempView("df_table")
df = sqlContext.sql("SELECT existing_column_name AS existing_1, new_column_name AS new_1 FROM df_table")