Skip to content

Instantly share code, notes, and snippets.

@AydinChavez
Forked from ebuildy/flatten.java
Last active April 25, 2018 12:18
Show Gist options
  • Save AydinChavez/5653102c1cd34e37038de89152c75c92 to your computer and use it in GitHub Desktop.
Save AydinChavez/5653102c1cd34e37038de89152c75c92 to your computer and use it in GitHub Desktop.
Flatten Spark data frame fields structure, via SQL in Java. This fork also supports column names having a hyphe in it (e.g. "col-1"), dots and @-sign.
class Toto
{
public void Main()
{
final DataFrame source = GetDataFrame();
final String querySelectSQL = flattenSchema(source.schema(), null);
source.registerTempTable("source");
final DataFrame flattenData = sqlContext.sql("SELECT " + querySelectSQL + " FROM source")
}
/**
* Generate SQL to select columns as flat.
*/
public static String flattenSchema(StructType schema, String prefix) {
final StringBuilder selectSQLQuery = new StringBuilder();
for (StructField field : schema.fields()) {
final String fieldName = field.name();
String colName = prefix == null ? (fieldName.contains("-") || fieldName.contains("@") || fieldName.contains(".") ? "`" + fieldName + "`" : fieldName) : (prefix + "." + (fieldName.contains("-") || fieldName.contains("@") || fieldName.contains(".")? "`" + fieldName + "`" : fieldName));
String colNameTarget = colName.replace(".", "_").replace("-", "").replace("`", "").replace("@", "");
if (field.dataType().getClass().equals(StructType.class)) {
selectSQLQuery.append(flattenSchema((StructType) field.dataType(), colName));
} else {
selectSQLQuery.append(colName);
selectSQLQuery.append(" as ");
selectSQLQuery.append(colNameTarget);
}
selectSQLQuery.append(",");
}
if (selectSQLQuery.length() > 0) {
selectSQLQuery.deleteCharAt(selectSQLQuery.length() - 1);
}
return selectSQLQuery.toString();
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment