@reckart
Created February 2, 2018 10:13
Convert brat ann files to WebAnno-compatible XMI
/*
* Convert brat annotations to an XMI format compatible with WebAnno.
*
* Using this script requires some preparation:
* - Log in to WebAnno and create a new project
* - Import a simple text document into the project (the content does not matter)
* - Create the custom layers to which you want to map your brat annotations.
* In this example, we assume that you want to map all brat entity/event annotations
* to a single WebAnno span layer called "Component" and all brat relations
* to a WebAnno relation layer called "Relation". Mind that in WebAnno presently only a
* single relation layer can attach to any span layer.
* On both layers, add a string feature called "value". This will hold the name of
* the original brat annotation.
* - Ensure that the layer behaviors (e.g. allow cross-sentence) match the way your
* annotations in brat have been created. I.e. if you have any cross-sentence
* entities in brat, make sure that the "Component" layer allows cross-sentence
* annotations.
* - Open the text document you imported before in the WebAnno annotation editor.
* - Export the document in XMI format from the export button in the action bar.
* - Unzip the exported file and open the typesystem.xml file for editing.
* - Under the "types" XML element, remove all "typeDescription" elements except
* those describing the layers you have previously created ("webanno.custom.Component"
* and "webanno.custom.Relation"). Take note of the path to the typesystem.xml
* file as you have to pass it as the third parameter to this script.
* - Adjust the PARAM_TYPE_MAPPINGS below according to the brat annotations you
* have been using. Mind that brat annotation names cannot contain underscores
* or dots.
* - Ensure that your brat ".ann" files do not contain spurious trailing space
* characters, otherwise you might get errors such as
* Illegal event annotation format [E11 subst:T41 ].
* - Run this script with the following parameters:
* - the path of the ".ann" file you want to convert. You can use wildcards
* if you want to convert multiple files.
* - the output directory to which the converted files are written
* - the path to the typesystem.xml file you have prepared before
* - If all goes well, it will take a moment for the script to download its
* dependencies and then it should convert each of your ".ann" files to a ".xmi"
* file.
* - Log in to WebAnno and open the project you have created in the first step.
* - Import the XMI files into this project.
*/
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.brat-asl',
      version='1.9.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.xmi-asl',
      version='1.9.0')
import static org.apache.uima.util.CasCreationUtils.*;
import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.factory.CollectionReaderFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;
import static org.apache.uima.fit.factory.TypeSystemDescriptionFactory.*;
import de.tudarmstadt.ukp.dkpro.core.io.brat.*;
import de.tudarmstadt.ukp.dkpro.core.io.xmi.*;
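// Optional guard, not part of the original script: stop with a usage hint unless
// exactly three arguments (".ann" path or pattern, output directory, typesystem.xml)
// were passed. When using wildcards, you will likely need to quote the pattern so
// that the shell does not expand it into several arguments.
if (args.length != 3) {
    System.err.println('Usage: groovy <script> <ann-path-or-pattern> <output-dir> <typesystem.xml>')
    System.exit(1)
}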
// Load the default type system and the custom UIMA type system
// specification passed as the third parameter to this script.
def ts = mergeTypeSystems([
    createTypeSystemDescription(),
    createTypeSystemDescriptionFromPath(args[2])]);
// Assemble and run pipeline
runPipeline(
    createReaderDescription(BratReader, ts,
        BratReader.PARAM_SOURCE_LOCATION, args[0],
        // Specify which brat annotations map to which UIMA annotation types.
        // Mind that normally you want to map multiple brat annotations to
        // a generic annotation type.
        BratReader.PARAM_TYPE_MAPPINGS, [
            "subst -> webanno.custom.Component",
            "pron -> webanno.custom.Component",
            "ptc -> webanno.custom.Component",
            "prep -> webanno.custom.Component",
            "verb -> webanno.custom.Component",
            "konj -> webanno.custom.Component",
            "concerning -> webanno.custom.Relation",
            "location -> webanno.custom.Relation",
            "purpose -> webanno.custom.Relation"],
        // Specify which UIMA annotation types are spans. The name given after
        // the colon indicates the UIMA feature where the original brat annotation
        // name is stored.
        BratReader.PARAM_TEXT_ANNOTATION_TYPES, "webanno.custom.Component:value",
        // Specify which UIMA annotation types are relations. The name given after
        // the colon indicates the UIMA feature where the original brat annotation
        // name is stored. The source and target feature names must be Governor and
        // Dependent as this is presently hard-coded in WebAnno.
        BratReader.PARAM_RELATION_TYPES, "webanno.custom.Relation:Governor:Dependent{A}:value"),
    createEngineDescription(XmiWriter, ts,
        XmiWriter.PARAM_TARGET_LOCATION, args[1],
        XmiWriter.PARAM_STRIP_EXTENSION, true))
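
The header comment of the script warns that trailing space characters in ".ann" files can cause errors such as "Illegal event annotation format [E11 subst:T41 ]". A minimal Groovy sketch for cleaning the files before conversion might look like the following. It is not part of the original gist: the helper name `stripTrailingWhitespace` and the `brat-data` directory are assumptions made for illustration, and the sketch presumes that no annotation text legitimately ends in whitespace.

```groovy
// Hypothetical helper (not from the original gist): removes trailing spaces and
// tabs from every line of a brat ".ann" document and ends it with one newline.
String stripTrailingWhitespace(String annText) {
    annText.readLines().collect { it.replaceAll(/[ \t]+$/, '') }.join('\n') + '\n'
}

// The line quoted in the error message, with its trailing space removed.
assert stripTrailingWhitespace("E11\tsubst:T41 \n") == "E11\tsubst:T41\n"

// Clean every ".ann" file in a directory in place. The directory name is an
// assumption; nothing happens if it does not exist.
def dir = new File('brat-data')
if (dir.isDirectory()) {
    dir.eachFileMatch(~/.*\.ann/) { f ->
        f.setText(stripTrailingWhitespace(f.getText('UTF-8')), 'UTF-8')
    }
}
```

Since the files are rewritten in place, it is safer to run this on a copy of the brat data and then point the conversion script at the cleaned copy.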
@reckart
Author

reckart commented Nov 20, 2020

Ok, so I finally managed to have a look at this.

I could fix the exception by adding the following two lines to the .ann file:

A2	role T1 speaker
A2	role T2 speak

The BratReader right now expects that the filler of a slot feature carries an attribute called role which contains the role name.

I have opened an issue in DKPro Core for this, but I don't know when I will get around to fixing and releasing it.

dkpro/dkpro-core#1489

@jerrychen007

Thank you for the information!
