@arun-y
Last active March 4, 2018 16:35
Generating a Spark Dataset<Row> from a JSON string
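import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import java.util.ArrayList;
import java.util.List;

// `objectMapper` is a Jackson ObjectMapper, `input` a JSON array string, `spark` a SparkSession.
ArrayNode rootNode = (ArrayNode) objectMapper.readTree(input);
ObjectNode rootObject = (ObjectNode) rootNode.get(0); // only the first element is converted
List<StructField> fields = new ArrayList<>();
List<Object> values = new ArrayList<>();
rootObject.fields().forEachRemaining(e -> {
    fields.add(DataTypes.createStructField(e.getKey(), DataTypes.StringType, false)); // every column is a string
    values.add(e.getValue().asText());
});
StructType schema = DataTypes.createStructType(fields);
Row row = RowFactory.create(values.toArray());
List<Row> rows = new ArrayList<>();
rows.add(row);
Dataset<Row> df = spark.createDataFrame(rows, schema);
df.show();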
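The snippet above converts only the first element of the JSON array. A minimal sketch of how the same idea might extend to every element follows. It isolates the row-building step from Spark and Jackson, so plain `Map<String, String>` objects stand in for the parsed JSON objects. The class `RowAligner` and its methods `columnsOf` and `alignRows` are hypothetical names introduced for illustration, not part of the original gist or of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowAligner {
    // Column order is taken from the first record, mirroring how the
    // snippet derives its schema from the first array element.
    static List<String> columnsOf(List<Map<String, String>> records) {
        if (records.isEmpty()) {
            throw new IllegalArgumentException("need at least one record to derive columns");
        }
        return new ArrayList<>(records.get(0).keySet());
    }

    // Builds one value array per record, in column order.
    // Keys missing from a record become null.
    static List<Object[]> alignRows(List<Map<String, String>> records, List<String> columns) {
        List<Object[]> rows = new ArrayList<>();
        for (Map<String, String> record : records) {
            Object[] values = new Object[columns.size()];
            for (int i = 0; i < columns.size(); i++) {
                values[i] = record.get(columns.get(i));
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("id", "1");
        first.put("name", "a");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("name", "b"); // no "id" key in this record

        List<Map<String, String>> records = Arrays.asList(first, second);
        List<String> columns = columnsOf(records);
        for (Object[] row : alignRows(records, columns)) {
            System.out.println(Arrays.toString(row)); // prints [1, a] then [null, b]
        }
    }
}
```

In a Spark job, each `Object[]` could be passed to `RowFactory.create(...)` and the resulting rows collected into the list given to `createDataFrame`. Because missing keys become null here, the schema fields would need to be declared nullable rather than `false` as in the snippet.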