@morrisalp
Created November 19, 2019 17:22
Load the CoNLL-2003 dataset using pandas
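import pandas as pd


def read_conll(filename):
    # Each non-blank line is "TOKEN POS CHUNK NE"; blank lines separate sentences.
    df = pd.read_csv(filename,
                     sep=' ', header=None, keep_default_na=False,
                     names=['TOKEN', 'POS', 'CHUNK', 'NE'],
                     quoting=3,  # 3 == csv.QUOTE_NONE: keep quote characters as literal tokens
                     skip_blank_lines=False)
    # Blank lines parse as rows with an empty TOKEN; counting them numbers the sentences.
    df['SENTENCE'] = (df.TOKEN == '').cumsum()
    # Drop the blank separator rows themselves.
    return df[df.TOKEN != '']


train_df = read_conll('conll2003/train.txt')
valid_df = read_conll('conll2003/valid.txt')
test_df = read_conll('conll2003/test.txt')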
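
The frame returned by read_conll has one row per token, while most NER pipelines want one list of tokens and tags per sentence. A minimal sketch of that regrouping is below. The helper name `to_sentences` and the sample rows are hypothetical, not part of the original gist, and the sketch assumes document boundaries appear as `-DOCSTART-` tokens, as they do in the usual CoNLL-2003 files.

```python
import pandas as pd


def to_sentences(df):
    """Group a token-level frame into one (tokens, tags) pair per sentence."""
    # Assumption: document separators show up as -DOCSTART- tokens; drop them.
    df = df[df.TOKEN != '-DOCSTART-']
    # SENTENCE is the running sentence index computed by read_conll.
    grouped = df.groupby('SENTENCE', sort=True)
    return [(g.TOKEN.tolist(), g.NE.tolist()) for _, g in grouped]


# Hand-made rows in the same column layout read_conll returns (illustrative only).
sample = pd.DataFrame({
    'TOKEN': ['-DOCSTART-', 'EU', 'rejects', 'German', 'call', 'Peter', 'Blackburn'],
    'POS': ['-X-', 'NNP', 'VBZ', 'JJ', 'NN', 'NNP', 'NNP'],
    'CHUNK': ['-X-', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-NP', 'I-NP'],
    'NE': ['O', 'B-ORG', 'O', 'B-MISC', 'O', 'B-PER', 'I-PER'],
    'SENTENCE': [0, 1, 1, 1, 1, 2, 2],
})
sentences = to_sentences(sample)
```

With real data the same call would be `to_sentences(train_df)`. Filtering `-DOCSTART-` before grouping means the document-separator "sentence" disappears from the result instead of appearing as a one-token entry.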