CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
The CoNLL-2003 shared task (Sang et al., 2003) addresses language-independent named entity recognition, with data in English and German.
The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word is on its own line, and an empty line follows each sentence. The first item on each line is the word, the second its part-of-speech (POS) tag, the third its syntactic chunk tag, and the fourth its named entity tag. Chunk tags and named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. The tag B-TYPE is used only when two phrases of the same type immediately follow each other: the first word of the second phrase is tagged B-TYPE to show that it starts a new phrase. A word tagged O is not part of any phrase.
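The format described above can be read with a few lines of Python. The following is a minimal sketch, not the official task tooling: the function names `read_conll` and `iob1_spans` are hypothetical, and the sample sentence is only illustrative. It assumes every non-empty line has exactly four space-separated fields, and it treats a change of type between adjacent tags as the start of a new phrase.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str, str, str]  # (word, POS tag, chunk tag, NE tag)
Span = Tuple[str, int, int]        # (type, start index, end index exclusive)


def read_conll(lines: Iterable[str]) -> List[List[Token]]:
    """Group four-column lines into sentences; an empty line ends a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split(" ")
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        current.append((fields[0], fields[1], fields[2], fields[3]))
    if current:  # file may not end with an empty line
        sentences.append(current)
    return sentences


def iob1_spans(tags: List[str]) -> List[Span]:
    """Recover phrases from I-TYPE / B-TYPE / O tags."""
    spans: List[Span] = []
    start, current_type = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, tag_type = "O", None
        else:
            prefix, _, tag_type = tag.partition("-")
        # An open phrase ends at O, at a B- tag, or when the type changes.
        if current_type is not None and (prefix != "I" or tag_type != current_type):
            spans.append((current_type, start, i))
            start, current_type = None, None
        # Any non-O tag with no open phrase starts a new one.
        if tag_type is not None and current_type is None:
            start, current_type = i, tag_type
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Illustrative sample in the four-column format (single spaces, empty line at end).
SAMPLE = """U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O

"""

sentences = read_conll(SAMPLE.splitlines())
words = [token[0] for token in sentences[0]]
ne_tags = [token[3] for token in sentences[0]]
for ne_type, begin, end in iob1_spans(ne_tags):
    print(ne_type, " ".join(words[begin:end]))
```

The same `iob1_spans` helper works on the chunk column (index 2), since chunk tags follow the same I-TYPE / B-TYPE / O scheme.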
The English data is a collection of newswire articles from the Reuters Corpus. The annotation was done by people at the University of Antwerp. For copyright reasons, we make only the annotations available. To build the complete data sets you will need access to the Reuters Corpus, which can be obtained from NIST free of charge for research purposes.
References | Method | F1 |
---|---|---|
Florian et al. (2003) | Combination of various machine-learning classifiers | 88.76 |
Ando et al. (2005) | Semi-supervised approach | 89.31 |
L Ratinov et al. (2009) | Word-class Model | 90.80 |
D Lin et al. (2009) | W500 + P125 + P64 | 90.90 |
Collobert et al. (2011) | NN+SLL+LM2 | 88.67 |
Collobert et al. (2011) | NN+SLL+LM2+Gazetteer | 89.59 |
Suzuki et al. (2011) | L1CRF | 91.02 |
Passos et al. (2014) | Baseline + Gaz + LexEmb | 90.90 |
Huang et al. (2015) | BI-LSTM-CRF | 90.10 |
JPC Chiu et al. (2015) | BLSTM-CNN + emb + lex | 91.62 |
Luo et al. (2015) | JERL | 91.20 |
Model | Dataset | Val. Accuracy | Epochs |
---|---|---|---|
CapsNet | Conll2003 dataset | 96.47 | 10 |
CapsNet GRU | Rotten Tomatoes Dataset + Fasttext_300d | 85.86 | 20 |
CapsNet |
- Named Entity Recognition with Bidirectional LSTM-CNNs (CL'15), JPC Chiu et al. [pdf]
- Bidirectional LSTM-CRF Models for Sequence Tagging (EMNLP'15), Z Huang et al. [pdf]
- Joint entity recognition and disambiguation (EMNLP'15), G Luo et al. [pdf]
- Lexicon infused phrase embeddings for named entity resolution (ACL'14), A Passos et al. [pdf]
- Learning condensed feature representations from large unsupervised data sets for supervised learning (ACL'11), J Suzuki et al. [pdf]
- Natural Language Processing (Almost) from Scratch (CL'11), R Collobert et al. [pdf]
- Design Challenges and Misconceptions in Named Entity Recognition (CoNLL'09), L Ratinov et al. [pdf]
- Phrase Clustering for Discriminative Learning (ACL'09), D Lin et al. [pdf]
- A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data (JMLR'05), RK Ando et al. [pdf]
- Named Entity Recognition through Classifier Combination (HLT-NAACL'03), R Florian et al. [pdf]
- Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition (CoNLL'03), EFTK Sang et al. [pdf]
See Also