reckart/306b8ffddd30bee1f3afd0468a9ad31d
/*
 * Convert brat annotations to an XMI format compatible with WebAnno.
 *
 * Using this script requires some preparation:
 * - Log in to WebAnno and create a new project.
 * - Import a simple text document into the project (the content does not matter).
 * - Create the custom layers to which you want to map your brat annotations.
 *   In this example, we assume that you want to map all brat entity/event annotations
 *   to a single WebAnno span layer called "Component" and all brat relations
 *   to a WebAnno relation layer called "Relation". Note that WebAnno presently allows
 *   only a single relation layer to attach to any given span layer.
 *   On both layers, add a string feature called "value". This feature will hold the
 *   name of the original brat annotation.
 * - Ensure that the layer behaviors (e.g. allow cross-sentence) match the way your
 *   annotations in brat have been created, i.e. if you have any cross-sentence
 *   entities in brat, make sure that the "Component" layer allows cross-sentence
 *   annotations.
 * - Open the text document you imported before in the WebAnno annotation editor.
 * - Export the document in XMI format using the export button in the action bar.
 * - Unzip the exported file and open the typesystem.xml file for editing.
 * - Under the "types" XML element, remove all "typeDescription" elements except
 *   those describing the layers you have previously created ("webanno.custom.Component"
 *   and "webanno.custom.Relation"). Take note of the path to the typesystem.xml
 *   file, as you have to pass it as the third parameter to this script.
 * - Adjust the PARAM_TYPE_MAPPINGS below according to the brat annotations you
 *   have been using. Note that brat annotation names cannot contain underscores
 *   or dots.
 * - Ensure that your brat ".ann" files do not contain spurious trailing space
 *   characters, otherwise you might get errors such as
 *   Illegal event annotation format [E11 subst:T41 ].
 * - Run this script with the following parameters:
 *   - the path of the ".ann" file you want to convert. You can use wildcards
 *     if you want to convert multiple files.
 *   - the output directory to which the converted files are written
 *   - the path to the typesystem.xml file you have prepared before
 * - If all goes well, the script will take a moment to download its
 *   dependencies and then convert each of your ".ann" files to a ".xmi"
 *   file.
 * - Log in to WebAnno and open the project you created in the first step.
 * - Import the XMI files into this project.
 */
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.brat-asl',
      version='1.9.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.xmi-asl',
      version='1.9.0')
import static org.apache.uima.util.CasCreationUtils.*;
import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.factory.CollectionReaderFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;
import static org.apache.uima.fit.factory.TypeSystemDescriptionFactory.*;
import de.tudarmstadt.ukp.dkpro.core.io.brat.*;
import de.tudarmstadt.ukp.dkpro.core.io.xmi.*;

// Load the default type system and the custom UIMA type system
// specification passed as the third parameter to this script.
def ts = mergeTypeSystems([
    createTypeSystemDescription(),
    createTypeSystemDescriptionFromPath(args[2])]);

// Assemble and run the pipeline
runPipeline(
    createReaderDescription(BratReader, ts,
        BratReader.PARAM_SOURCE_LOCATION, args[0],
        // Specify which brat annotations map to which UIMA annotation types.
        // Note that you normally want to map multiple brat annotations to
        // a generic annotation type.
        BratReader.PARAM_TYPE_MAPPINGS, [
            "subst -> webanno.custom.Component",
            "pron -> webanno.custom.Component",
            "ptc -> webanno.custom.Component",
            "prep -> webanno.custom.Component",
            "verb -> webanno.custom.Component",
            "konj -> webanno.custom.Component",
            "concerning -> webanno.custom.Relation",
            "location -> webanno.custom.Relation",
            "purpose -> webanno.custom.Relation"],
        // Specify which UIMA annotation types are spans. The name given after
        // the colon indicates the UIMA feature where the original brat annotation
        // name is stored.
        BratReader.PARAM_TEXT_ANNOTATION_TYPES, "webanno.custom.Component:value",
        // Specify which UIMA annotation types are relations. The last name
        // indicates the UIMA feature where the original brat annotation name is
        // stored. The source and target feature names must be Governor and
        // Dependent, as this is presently hard-coded in WebAnno.
        BratReader.PARAM_RELATION_TYPES, "webanno.custom.Relation:Governor:Dependent{A}:value"),
    createEngineDescription(XmiWriter, ts,
        XmiWriter.PARAM_TARGET_LOCATION, args[1],
        XmiWriter.PARAM_STRIP_EXTENSION, true))
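The typesystem.xml step in the preparation list can also be scripted instead of edited by hand. The following is a minimal sketch in plain Groovy using only the JDK's DOM API. The helper names (trimTypeSystem, parseXml, toXml) are hypothetical, the file is overwritten in place (keep a copy), and it assumes that the first name element inside each typeDescription is the type's own name, which is the usual layout of UIMA type system descriptors.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.OutputKeys
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import org.w3c.dom.Document
import org.w3c.dom.Element

// Hypothetical helper: remove every typeDescription whose type name is not in keepTypes.
Document trimTypeSystem(Document doc, Set<String> keepTypes) {
    def descriptions = doc.getElementsByTagName('typeDescription')
    // Collect first: the NodeList is live, so removing while iterating would skip nodes.
    def toRemove = []
    for (int i = 0; i < descriptions.length; i++) {
        Element td = (Element) descriptions.item(i)
        // ASSUMPTION: the first <name> below a typeDescription is the type name
        // (feature names come later, inside <features>).
        def nameNodes = td.getElementsByTagName('name')
        String name = nameNodes.length > 0 ? nameNodes.item(0).textContent.trim() : ''
        if (!keepTypes.contains(name)) {
            toRemove << td
        }
    }
    toRemove.each { it.parentNode.removeChild(it) }
    return doc
}

Document parseXml(String xml) {
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    return builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))
}

String toXml(Document doc) {
    def transformer = TransformerFactory.newInstance().newTransformer()
    transformer.setOutputProperty(OutputKeys.INDENT, 'yes')
    def writer = new StringWriter()
    transformer.transform(new DOMSource(doc), new StreamResult(writer))
    return writer.toString()
}

// Usage (hypothetical script name): groovy trim-typesystem.groovy path/to/typesystem.xml
if (args.length > 0) {
    def file = new File(args[0])
    def doc = parseXml(file.getText('UTF-8'))
    trimTypeSystem(doc, ['webanno.custom.Component', 'webanno.custom.Relation'] as Set)
    file.setText(toXml(doc), 'UTF-8')
}
```

Adjust the set of kept type names if your custom layers are named differently.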
Dear reckart,
after upgrading to DKPro Core 2.1.0, I'm still getting the following exception:
    java.lang.NullPointerException
        at org.dkpro.core.io.brat.BratReader.fillSlots(BratReader.java:525)
        at org.dkpro.core.io.brat.BratReader.readAnnotations(BratReader.java:285)
        at org.dkpro.core.io.brat.BratReader.getNext(BratReader.java:237)
        at org.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:41)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:187)
        at pipeline.run(pipeline.groovy:28)
I'm pretty sure the .ann file works in brat.
Would you mind taking a look at my files? No one in my team has experience in Java and we don't know how to handle the exception.
https://drive.google.com/drive/folders/17v5xwRuoJ_w9TGcK9L7x0X29aB2-8Ugt?usp=sharing
Thank you for your patience.
@jerrychen007 I have downloaded your sample and hope to have a look at it over the weekend.
Ok, so I finally managed to have a look at this.
I could fix the exception by adding the following two lines to the .ann file:
A2 role T1 speaker
A2 role T2 speak
Right now, the BratReader expects the filler of a slot feature to carry an attribute called role
which contains the role name.
I have opened an issue in DKPro Core for this, but I don't know when I will get around to fixing and releasing it.
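The role-attribute requirement described above, together with the trailing-space pitfall mentioned in the script header, can be checked before running the conversion. A minimal sketch in plain Groovy, assuming standard brat standoff lines (an ID, a tab, then the annotation) and assuming that every filler referenced in an event's slots needs its own role attribute. The lintAnn helper is hypothetical; it only reports problems and does not modify the file.

```groovy
// Hypothetical lint for brat ".ann" files. It checks two pitfalls mentioned on this page:
//  1. trailing whitespace, which can lead to "Illegal event annotation format" errors
//  2. event slot fillers without a "role" attribute, which the BratReader currently expects
// ASSUMPTION: every slot filler of an event needs its own role attribute.
List<String> lintAnn(String annText) {
    List<String> problems = []
    List<String> lines = annText.readLines()

    // Collect the IDs that carry a "role" attribute, e.g. "A2<TAB>role T1 speaker"
    Set<String> withRole = [] as Set
    lines.each { String line ->
        String[] cols = line.split('\t')
        if (cols.length >= 2 && cols[0].startsWith('A')) {
            String[] parts = cols[1].trim().split(/\s+/)
            if (parts.length >= 2 && parts[0] == 'role') {
                withRole << parts[1]
            }
        }
    }

    lines.eachWithIndex { String line, int idx ->
        int lineNo = idx + 1
        if (line =~ /\s$/) {
            problems << "line ${lineNo}: trailing whitespace".toString()
        }
        String[] cols = line.split('\t')
        if (cols.length >= 2 && cols[0].startsWith('E')) {
            // The first token is "EventType:TriggerId"; the remaining ones are "Slot:FillerId"
            List<String> slots = (cols[1].trim().split(/\s+/) as List).drop(1)
            slots.each { String slot ->
                String filler = slot.substring(slot.lastIndexOf(':') + 1)
                if (!withRole.contains(filler)) {
                    problems << "line ${lineNo}: filler ${filler} of slot ${slot} in ${cols[0]} has no role attribute".toString()
                }
            }
        }
    }
    return problems
}

// Usage (hypothetical script name): groovy lint-ann.groovy file1.ann file2.ann ...
args.each { String path ->
    lintAnn(new File(path).getText('UTF-8')).each { println "${path}: ${it}" }
}
```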
Thank you for the information!
When using DKPro Core, please make sure to use the same version for all DKPro Core modules.
Also, we changed the groupId and the artifact naming scheme a while back. E.g. instead of the "de.tudarmstadt.ukp.dkpro.core" groupId used in the script above,
you should switch to "org.dkpro.core".
You can find a full list here: https://search.maven.org/search?q=g:org.dkpro.core%20v:1.12.0
Let me know if you still get an exception after upgrading to DKPro Core 1.12.0. That makes it easier for me to correlate your stack trace with the current DKPro Core v1 code.
P.S.: Alternatively, you could also try DKPro Core 2.1.0 instead of 1.12.0.
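For illustration, the dependency header of the script might look roughly like this after switching to DKPro Core 1.12.0. This is a sketch, not a verified configuration: it assumes the new artifact names follow the pattern dkpro-core-<module>-asl under the org.dkpro.core groupId, and that 1.12.0 still uses the de.tudarmstadt.ukp.dkpro.core Java packages. It groups the two @Grab entries with @Grapes; the rest of the script would stay the same.

```groovy
// Sketch only: dependency header for DKPro Core 1.12.0, ASSUMING the new Maven
// coordinates follow the pattern org.dkpro.core:dkpro-core-<module>-asl.
// Check the exact artifact names in the Maven Central search linked above.
// Both DKPro Core modules use the same version.
@Grapes([
    @Grab(group='org.dkpro.core', module='dkpro-core-io-brat-asl', version='1.12.0'),
    @Grab(group='org.dkpro.core', module='dkpro-core-io-xmi-asl', version='1.12.0')
])
// ASSUMPTION: in 1.x the Java packages keep the old de.tudarmstadt.ukp.dkpro.core prefix.
// For 2.1.0, change the version and import org.dkpro.core.io.brat.BratReader and
// (assumed) org.dkpro.core.io.xmi.XmiWriter instead; the 2.1.0 stack trace on this
// page shows BratReader under org.dkpro.core.io.brat.
import de.tudarmstadt.ukp.dkpro.core.io.brat.BratReader
import de.tudarmstadt.ukp.dkpro.core.io.xmi.XmiWriter

// Quick sanity check that both components resolve before running the conversion.
println "Resolved ${BratReader.name} and ${XmiWriter.name}"
```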