@sharvaridhote
Created February 2, 2021 04:41
Load Data
import random


def load_data(df, split=0.2):
    """
    Function adapted from spaCy.
    Prepare the training data in the format spaCy expects.

    Parameters:
        df: training data in a pandas DataFrame
        split: float - fraction of the shuffled rows placed in the first
            (train) set; the remainder forms the validation set. Defaults to 0.2
    Returns:
        tuples: (train texts, train labels), (validation texts, validation labels)
    """
    # df_tolist is a helper defined elsewhere (not shown here); the unpacking
    # below requires it to return a list of (text, label) pairs
    df_train = df_tolist(df)
    # Shuffle the data in place
    random.shuffle(df_train)
    texts, labels = zip(*df_train)
    # Build the category dict for each sentence
    cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]
    # Index at which the shuffled data is cut into train and validation parts
    split = int(len(df_train) * split)
    return (texts[:split], cats[:split]), (texts[split:], cats[split:])
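
To make the return shape concrete, here is a minimal sketch of the same shuffle, label, and split steps applied to a plain list of (text, label) pairs. The function name `split_pairs`, the `seed` parameter, and the sample rows are hypothetical and only for illustration; the sketch skips the `df_tolist` step and assumes the pairs are already in hand.

```python
import random


def split_pairs(pairs, split=0.2, seed=None):
    """Hypothetical stand-in for load_data that takes (text, label) pairs directly."""
    pairs = list(pairs)  # copy so the caller's list is not shuffled in place
    random.Random(seed).shuffle(pairs)
    texts, labels = zip(*pairs)
    # One dict per text, with exactly one of the two categories set to True
    cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]
    cut = int(len(pairs) * split)
    return (texts[:cut], cats[:cut]), (texts[cut:], cats[cut:])


# Made-up sample rows: 1 marks a positive text, 0 a negative one
rows = [
    ("great product", 1),
    ("terrible service", 0),
    ("would buy again", 1),
    ("never again", 0),
    ("works fine", 1),
]

(train_texts, train_cats), (dev_texts, dev_cats) = split_pairs(rows, split=0.2, seed=0)
print(len(train_texts), len(dev_texts))  # 1 4
```

Note that `split` is the fraction that goes into the first tuple, so the default of 0.2 leaves only 20% of the rows in the train set; a caller who wants an 80/20 train/validation split would pass `split=0.8`.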