@michelkana
Created July 26, 2019 15:08
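# Assumes df is a pandas DataFrame with one row per word and the columns
# yb (words as text), yb_utf8 (the same words as bytes) and word_type (labels).

# Longest word, measured in characters and in bytes
max_word_len = df.yb.str.len().max()
max_word_len_utf8 = df.yb_utf8.str.len().max()
# Number of distinct labels and total number of words
nb_labels = len(df.word_type.unique())
nb_words = df.shape[0]
print("Number of words: ", nb_words)
print("Number of labels: ", nb_labels)
print("Max word length: {} characters and {} bytes".format(max_word_len, max_word_len_utf8))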
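The snippet above relies on a `df` that is defined elsewhere. As a minimal sketch of how these statistics behave, the example below builds a small toy DataFrame. The sample words, the labels, and the way `yb_utf8` is derived (UTF-8 encoding of `yb`) are assumptions for illustration only and are not taken from the original data.

```python
import pandas as pd

# Hypothetical toy data: only the column names come from the gist.
df = pd.DataFrame({
    "yb": ["caf\u00e9", "na\u00efve", "go", "tea"],
    "word_type": ["noun", "adjective", "verb", "noun"],
})
# Assumption: yb_utf8 holds each word encoded as UTF-8 bytes.
df["yb_utf8"] = df.yb.str.encode("utf-8")

max_word_len = df.yb.str.len().max()            # longest word in characters
max_word_len_utf8 = df.yb_utf8.str.len().max()  # longest word in bytes
nb_labels = len(df.word_type.unique())          # distinct labels
nb_words = df.shape[0]                          # one row per word

print("Number of words: ", nb_words)
print("Number of labels: ", nb_labels)
print("Max word length: {} characters and {} bytes".format(max_word_len, max_word_len_utf8))
```

With non-ASCII characters, the byte length exceeds the character length, which is why the two maxima are tracked separately.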
@pancodia

pancodia commented Nov 8, 2021

Further, may I ask how we usually handle the situation where the training data is missing some label categories that could still appear in production?

@michelkana
Author

@pancodia thanks for getting back to me. Sorry for the late reply. I was traveling. Did you find a fix? If yes, can you share or do you still need help?
