Ian Porada ianporada

"""Detokenizes the raw ECMT Korean coref data and uploads to Hugging Face."""
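from kiwipiepy import Kiwi

# Build the Kiwi analyzer once and reuse it for every call.
kiwi = Kiwi()


def detokenize(tokens):
    # Pair each token's surface form ("text") with its POS tag ("xpos").
    token_tuples = [(tok["text"], tok["xpos"]) for tok in tokens]
    # Join the morphemes into text; return_positions=True also returns each morpheme's position in the joined string.
    return kiwi.join(token_tuples, return_positions=True, lm_search=True)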