@vincentclaes
Last active May 18, 2018 13:21
Convert DateType columns to TimestampType, because a DataFrame read from Parquet cannot publish DateType columns to a Hive table (see HIVE-6384)
import org.apache.spark.sql.types.{DateType, TimestampType}
import org.apache.spark.sql.DataFrame
/**
 * Casts every DateType column to TimestampType, because a DataFrame read from Parquet
 * cannot publish DateType columns to a Hive table (see HIVE-6384).
 * https://stackoverflow.com/questions/37357009/cloudera-5-6-parquet-does-not-support-date-see-hive-6384
 *
 * @param df Spark DataFrame
 * @return the DataFrame with every DateType column cast to TimestampType; other columns are unchanged
 */
def convertDateToTimestamp(df: DataFrame): DataFrame = {
  df.columns.foldLeft(df) { (memoDf, colName) =>
    if (memoDf.schema(colName).dataType == DateType) {
      memoDf.withColumn(colName, memoDf(colName).cast(TimestampType))
    } else {
      memoDf
    }
  }
}
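
A minimal usage sketch, assuming Spark 2.x or later is on the classpath and that a local `SparkSession` is acceptable for a quick check. The object name `ConvertDateToTimestampExample`, the column names `day` and `label`, and the sample row are hypothetical and exist only for illustration. The function is repeated inside the object so that the sketch compiles on its own.

```scala
import java.sql.Date

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DateType, TimestampType}

object ConvertDateToTimestampExample {

  // Same logic as convertDateToTimestamp above, repeated so this sketch is self-contained.
  def convertDateToTimestamp(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (memoDf, colName) =>
      if (memoDf.schema(colName).dataType == DateType)
        memoDf.withColumn(colName, memoDf(colName).cast(TimestampType))
      else memoDf
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a single-threaded local session is enough for this check.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("convert-date-to-timestamp-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one DateType column and one StringType column.
    val df = Seq((Date.valueOf("2018-05-18"), "a")).toDF("day", "label")
    val converted = convertDateToTimestamp(df)

    // The date column is cast to a timestamp; the string column keeps its type.
    assert(df.schema("day").dataType == DateType)
    assert(converted.schema("day").dataType == TimestampType)
    assert(converted.schema("label").dataType == df.schema("label").dataType)

    converted.printSchema()
    spark.stop()
  }
}
```

In a pipeline, the conversion would typically be applied to the DataFrame read from Parquet just before it is written to the Hive table.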