@justindavies
Created January 28, 2020 06:38
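import sys
import json

# Entity labels to keep; spans carrying any other label are dropped.
ALLOWED_LABELS = ['RISK', 'ORG', 'GPE', 'DATE', 'LAW', 'CARDINAL', 'MONEY', 'PRODUCT', 'ORDINAL', 'PERCENT', 'LOC', 'NORP', 'EVENT', 'WORK_OF_ART', 'FAC', 'PERSON', 'TIME']

# Read the file named on the command line; each line is one JSON object.
with open(sys.argv[1], "r") as fo:
    for raw in fo:
        line = json.loads(raw)
        # Rename "labels" to "entities"; use an empty list when it is absent.
        if "labels" in line:
            line["entities"] = line.pop("labels")
        else:
            line["entities"] = []
        # Convert each [start, end, label] triple into a dict.
        tmp_ents = []
        for e in line["entities"]:
            if e[2] in ALLOWED_LABELS:
                tmp_ents.append({"start": e[0], "end": e[1], "label": e[2]})
        line["entities"] = tmp_ents
        # Skip records whose text is 5 characters or shorter.
        if len(line["text"]) > 5:
            print(json.dumps({"entities": line["entities"], "text": line["text"]}))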