/*
 * Convert brat annotations to an XMI format compatible with WebAnno.
 *
 * Using this script requires some preparation:
 * - Log in to WebAnno and create a new project.
 * - Import a simple text document into the project (the content does not matter).
 * - Create the custom layers to which you want to map your brat annotations.
 *   In this example, we assume that you want to map all brat entity/event annotations
 *   to a single WebAnno span layer called "Component" and all brat relations
 *   to a WebAnno relation layer called "Relation". Mind that in WebAnno presently only a
 *   single relation layer can attach to any span layer.
 *   On both layers, add a string feature called "value". This will take the name of
 *   the original brat annotation.
 * - Ensure that the layer behaviors (e.g. allow cross-sentence) match the way your
 *   annotations in brat have been created. I.e. if you have any cross-sentence
 *   entities in brat, make sure that the "Component" layer allows cross-sentence
 *   annotations.
 * - Open the text document you imported before in the WebAnno annotation editor.
 * - Export the document in XMI format from the export button in the action bar.
 * - Unzip the exported file and open the typesystem.xml file for editing.
 * - Under the "types" XML element, remove all "typeDescription" elements except
 *   those describing the layers you have previously created ("webanno.custom.Component"
 *   and "webanno.custom.Relation"). Take note of the path to the typesystem.xml
 *   file as you have to pass it as the third parameter to this script.
 * - Adjust the PARAM_TYPE_MAPPINGS below according to the brat annotations you
 *   have been using. Mind that brat annotation names cannot contain underscores
 *   or dots.
 * - Ensure that your brat ".ann" files do not contain spurious trailing space
 *   characters, otherwise you might get errors such as
 *   Illegal event annotation format [E11 subst:T41 ].
 * - Run this script with the following parameters:
 *   - the path of the ".ann" file you want to convert. You can use wildcards
 *     if you want to convert multiple files.
 *   - the output directory to which the converted files are written
 *   - the path to the typesystem.xml file you have prepared before
 * - If all goes well, it will take a moment for the script to download its
 *   dependencies and then it should convert each of your ".ann" files to a ".xmi"
 *   file.
 * - Log in to WebAnno and open the project you have created in the first step.
 * - Import the XMI files into this project.
 */
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.brat-asl',
      version='1.9.0')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.xmi-asl',
      version='1.9.0')
import static org.apache.uima.util.CasCreationUtils.*;
import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.factory.CollectionReaderFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;
import static org.apache.uima.fit.factory.TypeSystemDescriptionFactory.*;
import de.tudarmstadt.ukp.dkpro.core.io.brat.*;
import de.tudarmstadt.ukp.dkpro.core.io.xmi.*;

// Load the default type system and the custom UIMA type system
// specification passed as the third parameter to this script.
def ts = mergeTypeSystems([
    createTypeSystemDescription(),
    createTypeSystemDescriptionFromPath(args[2])]);

// Assemble and run pipeline
runPipeline(
    createReaderDescription(BratReader, ts,
        BratReader.PARAM_SOURCE_LOCATION, args[0],
        // Specify which brat annotations map to which UIMA annotation types.
        // Mind that normally you want to map multiple brat annotations to
        // a generic annotation type.
        BratReader.PARAM_TYPE_MAPPINGS, [
            "subst -> webanno.custom.Component",
            "pron -> webanno.custom.Component",
            "ptc -> webanno.custom.Component",
            "prep -> webanno.custom.Component",
            "verb -> webanno.custom.Component",
            "konj -> webanno.custom.Component",
            "concerning -> webanno.custom.Relation",
            "location -> webanno.custom.Relation",
            "purpose -> webanno.custom.Relation"],
        // Specify which UIMA annotation types are spans. The name given after
        // the colon indicates the UIMA feature where the original brat annotation
        // name is stored.
        BratReader.PARAM_TEXT_ANNOTATION_TYPES, "webanno.custom.Component:value",
        // Specify which UIMA annotation types are relations. The name given after
        // the colon indicates the UIMA feature where the original brat annotation
        // name is stored. The source and target feature names must be Governor and
        // Dependent as this is presently hard-coded in WebAnno.
        BratReader.PARAM_RELATION_TYPES, "webanno.custom.Relation:Governor:Dependent{A}:value"),
    createEngineDescription(XmiWriter, ts,
        XmiWriter.PARAM_TARGET_LOCATION, args[1],
        XmiWriter.PARAM_STRIP_EXTENSION, true))
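The header comment of the script warns that trailing spaces in ".ann" files can cause "Illegal event annotation format" errors. A minimal sketch of a clean-up step follows. The helper name stripTrailingWhitespace and the file name example.ann are hypothetical and not part of the original script, and the sketch assumes the standard tab-separated brat format. Lines of text-bound annotations (IDs starting with "T") are deliberately left alone so that the covered text they quote is not changed.

```groovy
// Hypothetical helper (not part of the original script): remove trailing
// spaces and tabs from the lines of a brat ".ann" file's content.
// Text-bound annotation lines (IDs starting with "T") are left untouched so
// that the covered text they quote is never altered.
def stripTrailingWhitespace(String annText) {
    annText.readLines().collect { line ->
        line.startsWith('T') ? line : line.replaceAll(/[ \t]+$/, '')
    }.join('\n') + '\n'
}

// Usage sketch: rewrite a file in place ("example.ann" is a placeholder name).
// def f = new File('example.ann')
// f.text = stripTrailingWhitespace(f.text)
```

Running this over the ".ann" files before starting the conversion should leave event lines such as "E11 subst:T41 " without the trailing space.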
@shohre10539 discontinuous spans are not supported in the DKPro Core type system. However, DKPro Core 1.9.3 should be able to read the files but discards all but the first fragment of discontinuous spans. Try changing the version numbers in the script from "1.9.0" to "1.9.3".
Thanks.
Hi, can we export webanno annotations to a format compatible with brat? Thanks!
You can export as XMI and then build a pipeline with the DKPro Core XmiReader and BratWriter to convert it. Because the annotation models of brat and WebAnno/DKPro Core differ somewhat, your experience with this process may vary.
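A minimal sketch of such a reverse pipeline, modeled on the script above. It assumes that XmiReader and BratWriter accept the same PARAM_SOURCE_LOCATION and PARAM_TARGET_LOCATION parameters that the script uses for BratReader and XmiWriter, and that version 1.9.3 of both modules is used. The argument order is chosen for illustration only. Custom span and relation layers may need additional BratWriter configuration that is not shown here.

```groovy
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.brat-asl',
      version='1.9.3')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.xmi-asl',
      version='1.9.3')
import static org.apache.uima.util.CasCreationUtils.*;
import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.factory.CollectionReaderFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;
import static org.apache.uima.fit.factory.TypeSystemDescriptionFactory.*;
import de.tudarmstadt.ukp.dkpro.core.io.brat.*;
import de.tudarmstadt.ukp.dkpro.core.io.xmi.*;

// Assumed arguments (chosen for this sketch):
// args[0]: the XMI files exported from WebAnno (wildcards allowed)
// args[1]: the output directory for the brat files
// args[2]: the typesystem.xml describing the custom layers
def ts = mergeTypeSystems([
    createTypeSystemDescription(),
    createTypeSystemDescriptionFromPath(args[2])]);

// Read XMI, write brat. No type mappings are configured here.
runPipeline(
    createReaderDescription(XmiReader, ts,
        XmiReader.PARAM_SOURCE_LOCATION, args[0]),
    createEngineDescription(BratWriter, ts,
        BratWriter.PARAM_TARGET_LOCATION, args[1]))
```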
Thank you !
When running the pipeline through Groovy, I always get this error:
Caught: java.lang.IllegalStateException: Type [webanno.custom.Component] has no feature naemd [Speak]
The event annotations in my brat .ann file look like this:
T1 Speaker 132 136 Kony
T2 Quote 17 27 "Newshour"
E1 Speaker:T1 Speak:T2
The pipeline I'm using:
runPipeline(
    createReaderDescription(BratReader, ts,
        BratReader.PARAM_SOURCE_LOCATION, "*.ann",
        BratReader.PARAM_TYPE_MAPPINGS, [
            "Speaker -> webanno.custom.Component",
            "Quote -> webanno.custom.Component",
            "Speak -> webanno.custom.Relation"],
        BratReader.PARAM_TEXT_ANNOTATION_TYPES, "webanno.custom.Component:value",
        BratReader.PARAM_RELATION_TYPES, "webanno.custom.Relation:Governor:Dependent{A}:value"),
Thank you.
If I see this correctly, you are trying to map a brat Event-type annotation to a span/relation in WebAnno.
It should, however, be possible to map brat Event-types to a WebAnno span layer with a link-type feature [1, 2]. If your Speaker layer contains link features named "Speaker" and "Speak", then BratReader should (hopefully) just do the right thing.
Thank you for your reply! In my case, I created only two layers, Component and Relation, and added a link feature to the Component layer.
Now it returns:
Caught: java.lang.NullPointerException
java.lang.NullPointerException
at de.tudarmstadt.ukp.dkpro.core.io.brat.BratReader.fillSlots(BratReader.java:384)
at de.tudarmstadt.ukp.dkpro.core.io.brat.BratReader.readAnnotations(BratReader.java:216)
at de.tudarmstadt.ukp.dkpro.core.io.brat.BratReader.getNext(BratReader.java:173)
at de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:41)
at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:175)
at pipeline.run(pipeline.groovy:34)
Would you mind providing an example that includes the brat .ann file, the typesystem.xml and pipeline.groovy? Details about the link feature would also help a lot.
Thank you.
Which DKPro Core version are you using?
I'm using:
@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.10.0',
      module='de.tudarmstadt.ukp.dkpro.core.opennlp-asl')
import de.tudarmstadt.ukp.dkpro.core.opennlp.*;
@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.10.0',
      module='de.tudarmstadt.ukp.dkpro.core.languagetool-asl')
import de.tudarmstadt.ukp.dkpro.core.languagetool.*;
@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.10.0',
      module='de.tudarmstadt.ukp.dkpro.core.maltparser-asl')
import de.tudarmstadt.ukp.dkpro.core.maltparser.*;
@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.10.0',
      module='de.tudarmstadt.ukp.dkpro.core.io.text-asl')
import de.tudarmstadt.ukp.dkpro.core.io.text.*;
@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.10.0',
      module='de.tudarmstadt.ukp.dkpro.core.io.conll-asl')
import de.tudarmstadt.ukp.dkpro.core.io.conll.*;
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.brat-asl',
      version='1.9.3')
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.io.xmi-asl',
      version='1.9.3')
import static org.apache.uima.util.CasCreationUtils.*;
import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.factory.CollectionReaderFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;
import static org.apache.uima.fit.factory.TypeSystemDescriptionFactory.*;
import de.tudarmstadt.ukp.dkpro.core.io.brat.*;
import de.tudarmstadt.ukp.dkpro.core.io.xmi.*;
Is that the correct version? It actually works fine without Event-type annotations from Brat.
When using DKPro Core, please make sure to use the same version for all DKPro Core modules.
Also, we changed the groupId and the artifact naming scheme a while back. E.g. instead of
@Grab(group='de.tudarmstadt.ukp.dkpro.core', version='1.10.0',
      module='de.tudarmstadt.ukp.dkpro.core.opennlp-asl')
you should switch to
@Grab(group='org.dkpro.core', module='dkpro-core-opennlp-asl', version='1.12.0')
You can find a full list here: https://search.maven.org/search?q=g:org.dkpro.core%20v:1.12.0
Let me know if you still get an exception after upgrading to DKPro Core 1.12.0. That makes it easier for me to correlate your stack trace with the current DKPro Core v1 code.
P.S.: Alternatively, you could try DKPro Core 2.1.0 instead of 1.12.0.
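For the conversion script above, the advice could translate into a header like the following sketch for DKPro Core 2.1.0. The module names dkpro-core-io-brat-asl and dkpro-core-io-xmi-asl are assumed to follow the naming pattern shown for opennlp. The org.dkpro.core.io.brat package matches the stack trace reported further down in this thread, while org.dkpro.core.io.xmi is assumed by analogy. Please check both against the Maven Central listing linked above.

```groovy
// Same version for every DKPro Core module, using the new groupId and
// artifact naming scheme (module names assumed from the pattern above).
@Grab(group='org.dkpro.core', module='dkpro-core-io-brat-asl', version='2.1.0')
@Grab(group='org.dkpro.core', module='dkpro-core-io-xmi-asl', version='2.1.0')
import static org.apache.uima.util.CasCreationUtils.*;
import static org.apache.uima.fit.pipeline.SimplePipeline.*;
import static org.apache.uima.fit.factory.CollectionReaderFactory.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;
import static org.apache.uima.fit.factory.TypeSystemDescriptionFactory.*;
// In DKPro Core 2.x the Java packages start with org.dkpro.core
// instead of de.tudarmstadt.ukp.dkpro.core.
import org.dkpro.core.io.brat.*;
import org.dkpro.core.io.xmi.*;
```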
Dear reckart,
I'm still getting java.lang.NullPointerException
at org.dkpro.core.io.brat.BratReader.fillSlots(BratReader.java:525)
at org.dkpro.core.io.brat.BratReader.readAnnotations(BratReader.java:285)
at org.dkpro.core.io.brat.BratReader.getNext(BratReader.java:237)
at org.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:41)
at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:187)
at pipeline.run(pipeline.groovy:28)
after upgrading to DKPro Core 2.1.0. I'm pretty sure the .ann file works in brat.
Would you mind taking a look at my files? No one in my team has experience in Java and we don't know how to handle the exception.
https://drive.google.com/drive/folders/17v5xwRuoJ_w9TGcK9L7x0X29aB2-8Ugt?usp=sharing
Thank you for your patience.
@jerrychen007 I have downloaded your sample and hope to have a look at it over the weekend.
Ok, so I finally managed to have a look at this.
I could fix the exception by adding the following two lines to the .ann file:
A2 role T1 speaker
A2 role T2 speak
The BratReader right now expects that the filler of a slot feature carries an attribute called role which contains the role name.
I have opened an issue in DKPro Core for this, but I don't know when I will get to fix/release it.
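To find out which annotations would need such a role attribute, a check along the following lines might help. It is only a sketch: the helper name participantsWithoutRole is hypothetical, it assumes the standard tab-separated brat format, and it treats every text-bound annotation referenced in an event line (including the trigger, as in the fix above) as a participant.

```groovy
// Hypothetical helper: list the text-bound annotations (T-ids) that take part
// in a brat event but have no "role" attribute attached to them.
def participantsWithoutRole(String annText) {
    def participants = [] as LinkedHashSet
    def withRole = [] as Set
    annText.readLines().each { line ->
        def fields = line.split('\t')
        if (fields.size() < 2) return   // skip lines without a second field
        if (fields[0].startsWith('E')) {
            // Event line: "E1<TAB>Type:T1 Slot:T2 ..."
            fields[1].trim().split(/\s+/).each { pair ->
                def target = pair.substring(pair.indexOf(':') + 1)
                if (target.startsWith('T')) participants << target
            }
        } else if (fields[0].startsWith('A')) {
            // Attribute line: "A1<TAB>role T1 value"
            def parts = fields[1].trim().split(/\s+/)
            if (parts.size() >= 2 && parts[0] == 'role') withRole << parts[1]
        }
    }
    return (participants - withRole) as List
}
```

For the sample file from this thread, the helper should report T1 and T2 before the fix and nothing once both role attributes are present.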
Thank you for the information!
I am trying to convert my brat-annotated files to XMI format so that I can import them into the WebAnno tool embedded in INCEpTION.
I have two span components, so I first added two layers, webanno.custom.componentone and webanno.custom.componenttwo, exported them, and prepared my typesystem.xml file.
However, running this script does not produce any output. Since brat allows a single annotation to connect separate stretches of text (e.g. across sentences), my .ann files may contain lines such as:
T8 ComponentOne 4142 4204;4240 4305 blahblahblahblah etc
Apparently the script does not support such annotations. It fails with the error: java.lang.IllegalArgumentException: Illegal text annotation format
It also doesn't create any output file.
Is there a way for the script to handle this kind of annotation?
Thank you.
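As noted in the reply at the top of this thread, DKPro Core 1.9.3 should read such files but keeps only the first fragment of each discontinuous span. To see in advance which annotations would be affected, a small check like the following sketch can be run over the content of an ".ann" file. The helper name discontinuousSpanIds is hypothetical, and the sketch assumes the standard tab-separated brat format, in which the fragments of a span are separated by ";".

```groovy
// Hypothetical helper: return the IDs of text-bound annotations whose offsets
// consist of several ";"-separated fragments (brat's discontinuous spans).
def discontinuousSpanIds(String annText) {
    annText.readLines().findAll { line ->
        def fields = line.split('\t')
        // Only the second field (type and offsets) is inspected, so a ";"
        // inside the covered text does not count.
        fields.size() >= 2 && fields[0].startsWith('T') && fields[1].contains(';')
    }.collect { it.split('\t')[0] }
}

// Usage sketch ("example.ann" is a placeholder name):
// println discontinuousSpanIds(new File('example.ann').text)
```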