@amitkumarj441
Last active July 14, 2018 15:54
Named entity recognition stuff

CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition

The CoNLL-2003 shared task (Sang et al. 2003) deals with language-independent named entity recognition for English and German.

Dataset

The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word is on its own line, and an empty line follows each sentence. The first item on each line is the word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag, and the fourth the named entity tag. Chunk tags and named entity tags have the format I-TYPE, meaning that the word is inside a phrase of type TYPE. The tag B-TYPE is used only when two phrases of the same type immediately follow each other: the first word of the second phrase is tagged B-TYPE to show that it starts a new phrase. A word tagged O is not part of any phrase.
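The format described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the function names `read_conll` and `iob1_spans` are made up for illustration, the sample sentence is only an example in the four-column layout, and skipping `-DOCSTART-` lines is an assumption about document-boundary markers in the data files.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str, str, str]  # (word, POS tag, chunk tag, NE tag)


def read_conll(lines: Iterable[str]) -> List[List[Token]]:
    """Group four-column lines into sentences; an empty line ends a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        # Assumption: lines starting with -DOCSTART- mark document boundaries.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk, ner = line.split()
        current.append((word, pos, chunk, ner))
    if current:
        sentences.append(current)
    return sentences


def iob1_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Turn I-TYPE / B-TYPE / O tags into (type, start, end) spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start, current_type = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing O closes the last span
        prefix, _, typ = tag.partition("-")
        # A phrase ends on O, on a change of type, or on an explicit B- tag.
        if current_type is not None and (
            tag == "O" or prefix == "B" or typ != current_type
        ):
            spans.append((current_type, start, i))
            start, current_type = None, None
        if tag != "O" and current_type is None:
            start, current_type = i, typ
    return spans


SAMPLE = """\
U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O
"""

sentences = read_conll(SAMPLE.splitlines())
ner_tags = [token[3] for token in sentences[0]]
print(iob1_spans(ner_tags))
```

For the sample sentence this yields one ORG, one PER and one LOC span. The B-TYPE case only matters when two phrases of the same type are adjacent, for example `["I-PER", "I-PER", "B-PER"]`, which is split into two PER spans instead of one.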

The English data is a collection of newswire articles from the Reuters Corpus. The annotation was done by people at the University of Antwerp. For copyright reasons, only the annotations are distributed; building the complete data sets requires access to the Reuters Corpus, which can be obtained from NIST free of charge for research purposes.

Results

| Reference | Method | F1 |
|---|---|---|
| Florian et al. (2003) | Combination of various machine-learning classifiers | 88.76 |
| Ando et al. (2005) | Semi-supervised approach | 89.31 |
| Ratinov et al. (2009) | Word-class Model | 90.80 |
| Lin et al. (2009) | W500 + P125 + P64 | 90.90 |
| Collobert et al. (2011) | NN+SLL+LM2 | 88.67 |
| Collobert et al. (2011) | NN+SLL+LM2+Gazetteer | 89.59 |
| Suzuki et al. (2011) | L1CRF | 91.02 |
| Passos et al. (2014) | Baseline + Gaz + LexEmb | 90.90 |
| Huang et al. (2015) | BI-LSTM-CRF | 90.10 |
| Chiu et al. (2015) | BLSTM-CNN + emb + lex | 91.62 |
| Luo et al. (2015) | JERL | 91.20 |

Experiments

| Model | Dataset | Val. Accuracy | Epochs |
|---|---|---|---|
| CapsNet | Conll2003 dataset | 96.47 | 10 |
| CapsNet GRU | Rotten Tomatoes Dataset + Fasttext_300d | 85.86 | 20 |
| CapsNet | | | |

References

  • Named Entity Recognition with Bidirectional LSTM-CNNs (CL'15), JPC Chiu et al.
  • Bidirectional LSTM-CRF Models for Sequence Tagging (EMNLP'15), Z Huang et al.
  • Joint entity recognition and disambiguation (EMNLP'15), G Luo et al.
  • Lexicon infused phrase embeddings for named entity resolution (ACL'14), A Passos et al.
  • Learning condensed feature representations from large unsupervised data sets for supervised learning (ACL'11), J Suzuki et al.
  • Natural Language Processing (Almost) from Scratch (CL'11), R Collobert et al.
  • Design Challenges and Misconceptions in Named Entity Recognition (CoNLL'09), L Ratinov et al.
  • Phrase Clustering for Discriminative Learning (ACL'09), D Lin et al.
  • A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data (JMLR'05), RK Ando et al.
  • Named Entity Recognition through Classifier Combination (HLT-NAACL'03), R Florian et al.
  • Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition (CoNLL'03), EFTK Sang et al.

See Also

https://www.clips.uantwerpen.be/conll2003/ner/
