amitkumarj441/ner.md

## ner.md

      
CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
The CoNLL-2003 shared task (Sang et al., 2003) addresses language-independent named entity recognition, with data for two languages: English and German.
Dataset

The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word is on its own line, and an empty line follows each sentence. The first item on each line is the word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag, and the fourth the named entity tag. Chunk tags and named entity tags have the format I-TYPE, meaning that the word is inside a phrase of type TYPE. The tag B-TYPE is used only when two phrases of the same type immediately follow each other: the first word of the second phrase is tagged B-TYPE to show that it starts a new phrase. A word tagged O is not part of any phrase.
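The format described above can be read with a few lines of Python. The following is a minimal sketch, not the official tooling: the names `read_conll` and `iob1_spans` are illustrative, the sample sentence is only an example in the four-column layout, and skipping `-DOCSTART-` marker lines is an assumption about how the distributed files mark document boundaries.

```python
from typing import Iterable, List, Optional, Tuple

Token = Tuple[str, str, str, str]  # (word, POS tag, chunk tag, NE tag)
Span = Tuple[int, int, str]        # (start, end exclusive, entity type)


def read_conll(lines: Iterable[str]) -> List[List[Token]]:
    """Group four-column lines into sentences; an empty line ends a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("-DOCSTART-"):
            # Assumption: document-boundary marker lines carry no tokens.
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        current.append((fields[0], fields[1], fields[2], fields[3]))
    if current:
        sentences.append(current)
    return sentences


def iob1_spans(tags: List[str]) -> List[Span]:
    """Decode I-TYPE / B-TYPE / O tags into (start, end, type) phrases."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, cur = "O", None
        else:
            prefix, cur = tag.split("-", 1)
        # An open phrase ends at O, at a B- tag, or when the type changes.
        if start is not None and (prefix != "I" or cur != etype):
            spans.append((start, i, etype))
            start, etype = None, None
        if prefix != "O" and start is None:
            start, etype = i, cur
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


SAMPLE = """\
U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O
"""

sentences = read_conll(SAMPLE.splitlines())
ner_tags = [token[3] for token in sentences[0]]
print(iob1_spans(ner_tags))  # [(0, 1, 'ORG'), (2, 3, 'PER'), (5, 6, 'LOC')]
```

Because a B-TYPE tag appears only between adjacent phrases of the same type, the decoder treats a plain I-TYPE after O (or after a different type) as the start of a new phrase.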
The English data is a collection of newswire articles from the Reuters Corpus, annotated by people at the University of Antwerp. For copyright reasons, the shared task organizers distribute only the annotations; building the complete data sets requires access to the Reuters Corpus, which can be obtained from NIST free of charge for research purposes.
Results

| Reference | Method | F1 |
| --- | --- | --- |
| Florian et al. (2003) | Combination of various machine-learning classifiers | 88.76 |
| Ando et al. (2005) | Semi-supervised approach | 89.31 |
| Ratinov et al. (2009) | Word-class Model | 90.80 |
| Lin et al. (2009) | W500 + P125 + P64 | 90.90 |
| Collobert et al. (2011) | NN+SLL+LM2 | 88.67 |
| Collobert et al. (2011) | NN+SLL+LM2+Gazetteer | 89.59 |
| Suzuki et al. (2011) | L1CRF | 91.02 |
| Passos et al. (2014) | Baseline + Gaz + LexEmb | 90.90 |
| Huang et al. (2015) | BI-LSTM-CRF | 90.10 |
| Chiu et al. (2015) | BLSTM-CNN + emb + lex | 91.62 |
| Luo et al. (2015) | JERL | 91.20 |
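By the shared task's convention, F1 scores on this data are entity-level: a predicted entity counts as correct only if both its boundaries and its type match the gold annotation exactly. The sketch below shows that computation under the assumption that entities have already been decoded into `(sentence index, start, end, type)` tuples. The function name `entity_prf` and the tuple layout are illustrative and are not taken from the official evaluation script.

```python
from typing import Set, Tuple

# (sentence index, start token, end token exclusive, entity type)
Entity = Tuple[int, int, int, str]


def entity_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 (in percent) over entity sets."""
    correct = len(gold & pred)  # entities with identical span and type
    precision = 100.0 * correct / len(pred) if pred else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 0, 1, "ORG"), (0, 2, 3, "PER"), (0, 5, 6, "LOC")}
# The second entity has the right span but the wrong type, so it does not count.
pred = {(0, 0, 1, "ORG"), (0, 2, 3, "LOC"), (0, 5, 6, "LOC")}

p, r, f = entity_prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Token-level accuracy is a different quantity: since most tokens are tagged O, it is typically much higher than entity-level F1 and is not directly comparable to it.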


Experiments

| Model | Dataset | Val. Accuracy | Epochs |
| --- | --- | --- | --- |
| CapsNet | CoNLL-2003 dataset | 96.47 | 10 |
| CapsNet GRU | Rotten Tomatoes Dataset + Fasttext_300d | 85.86 | 20 |
| CapsNet | | | |


References

Named Entity Recognition with Bidirectional LSTM-CNNs (arXiv'15, TACL'16), JPC Chiu et al.
Bidirectional LSTM-CRF Models for Sequence Tagging (arXiv'15), Z Huang et al.
Joint Entity Recognition and Disambiguation (EMNLP'15), G Luo et al.
Lexicon Infused Phrase Embeddings for Named Entity Resolution (CoNLL'14), A Passos et al.
Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning (ACL'11), J Suzuki et al.
Natural Language Processing (Almost) from Scratch (JMLR'11), R Collobert et al.
Design Challenges and Misconceptions in Named Entity Recognition (CoNLL'09), L Ratinov et al.
Phrase Clustering for Discriminative Learning (ACL'09), D Lin et al.
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data (JMLR'05), RK Ando et al.
Named Entity Recognition through Classifier Combination (HLT-NAACL'03), R Florian et al.
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition (CoNLL'03), EFTK Sang et al.

See Also

☶ Named Entity Recognition (State of The Art)

https://www.clips.uantwerpen.be/conll2003/ner/