@Schm1tz1
Last active November 18, 2020 11:52
How to handle nested AVRO Schemas: Union, several Files
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;

public void nestedSchemaTest() throws IOException {
    Schema.Parser parser = new Schema.Parser();
    // Nested schemas as a union: both records live in one file, the dependency first
    parser.parse(new File("schemaUnion.avsc"));
    // Several files: load the dependency first, then the "derived" schema that references it
    parser.parse(new File("schema1.avsc"));
    parser.parse(new File("schema2.avsc"));
    // The parser keeps every named type it has parsed so far
    System.out.println(parser.getTypes());
    System.out.println(parser.getTypes().get("person").toString(true));
    System.out.println(parser.getTypes().get("person2").toString(true));
}
{
  "type": "record",
  "name": "Address2",
  "fields": [
    {
      "name": "streetaddress",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    }
  ]
}
{
  "type": "record",
  "name": "person2",
  "fields": [
    {
      "name": "firstname",
      "type": "string"
    },
    {
      "name": "lastname",
      "type": "string"
    },
    {
      "name": "address",
      "type": "Address2"
    }
  ]
}
[
  {
    "type": "record",
    "name": "Address",
    "fields": [
      {
        "name": "streetaddress",
        "type": "string"
      },
      {
        "name": "city",
        "type": "string"
      }
    ]
  },
  {
    "type": "record",
    "name": "person",
    "fields": [
      {
        "name": "firstname",
        "type": "string"
      },
      {
        "name": "lastname",
        "type": "string"
      },
      {
        "name": "address",
        "type": "Address"
      }
    ]
  }
]
@ksr4innovation

Is there any dynamic way to load the dependent schema first, then the "derived" schema?

@Schm1tz1
Author

Schm1tz1 commented Nov 18, 2020

Is there any dynamic way to load the dependent first, then "derived" schema

At least I haven't found an "AVRO-native" way to do this. Some ideas:

  • Define everything in one place, so you have only "one" schema with each substructure defined at its first use.
  • Wrap the parser in some exception handling to catch the unresolved dependencies, resolve them, and retry. Quick and dirty maybe, but too much random trial and error for me, to be honest.
  • A more clever solution: pre-parse the schema files with some kind of JSON parser, build a list/tree of the dependencies, and generate a loading order for the schema files by traversing this tree.
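
The last idea could look roughly like the sketch below. It covers only the ordering step (a topological sort using Kahn's algorithm) and assumes you have already extracted, for each file, the type names it declares and the type names it references without declaring, for example with a JSON parser of your choice. `SchemaLoadOrder` and `loadOrder` are hypothetical names, not part of Avro, and namespaces are ignored for simplicity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Avro: orders schema files so that every
// file comes after the files that declare the types it references.
class SchemaLoadOrder {

    /**
     * @param defines    file name -> type names declared in that file
     * @param references file name -> type names used but not declared in that file
     * @return the file names in a safe loading order
     */
    static List<String> loadOrder(Map<String, Set<String>> defines,
                                  Map<String, Set<String>> references) {
        // Map each type name to the file that declares it.
        Map<String, String> owner = new HashMap<>();
        defines.forEach((file, names) -> names.forEach(n -> owner.put(n, file)));

        // Build the graph: one edge from the declaring file to each dependent file.
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        Map<String, Integer> unmet = new LinkedHashMap<>();
        for (String file : defines.keySet()) {
            dependents.put(file, new ArrayList<>());
            unmet.put(file, 0);
        }
        for (String file : defines.keySet()) {
            for (String name : references.getOrDefault(file, Collections.emptySet())) {
                String provider = owner.get(name);
                if (provider == null) {
                    throw new IllegalArgumentException("No file defines type " + name);
                }
                if (!provider.equals(file)) {
                    dependents.get(provider).add(file);
                    unmet.merge(file, 1, Integer::sum);
                }
            }
        }

        // Kahn's algorithm: repeatedly emit files with no unmet dependencies.
        Deque<String> ready = new ArrayDeque<>();
        unmet.forEach((file, count) -> { if (count == 0) ready.add(file); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String file = ready.remove();
            order.add(file);
            for (String dependent : dependents.get(file)) {
                if (unmet.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (order.size() != defines.size()) {
            throw new IllegalStateException("Circular dependency between schema files");
        }
        return order;
    }
}
```

For the files in this gist, you would pass `schema1.avsc` as declaring `Address2`, and `schema2.avsc` as declaring `person2` and referencing `Address2`. The result then places `schema1.avsc` before `schema2.avsc`, and you can feed the files to a single `Schema.Parser` in that order.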
