@ronykroy
Last active October 21, 2019 06:31
20 Newsgroups data prep from sklearn
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split

# Download the 20 Newsgroups training subset (the default), stripping headers, footers and quotes
dataset = fetch_20newsgroups(shuffle=True, random_state=1, remove=('headers', 'footers', 'quotes'))
documents = dataset.data
df = pd.DataFrame({'label': dataset.target, 'text': dataset.data})
df.rename(columns={'label': 'target'}, inplace=True)  # renaming cols
# Hold out 15% for test, then 15% of the remainder for validation, stratified by class
df_trn, df_test = train_test_split(df, stratify=df['target'], test_size=0.15, random_state=11)
df_trn, df_val = train_test_split(df_trn, stratify=df_trn['target'], test_size=0.15, random_state=11)
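
The two-stage split above can be wrapped in a small helper and checked on toy data without downloading the corpus. This is only a sketch: the helper name `split_train_val_test`, its parameters, and the toy frame `toy` are illustrative assumptions, not part of the original gist.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_train_val_test(df, label_col, test_size=0.15, val_size=0.15, seed=11):
    """Hypothetical helper: hold out a stratified test set first,
    then carve a stratified validation set out of the remainder."""
    df_rest, df_test = train_test_split(
        df, stratify=df[label_col], test_size=test_size, random_state=seed
    )
    df_trn, df_val = train_test_split(
        df_rest, stratify=df_rest[label_col], test_size=val_size, random_state=seed
    )
    return df_trn, df_val, df_test


# Toy stand-in for the newsgroups frame (assumption): 4 balanced classes, 400 rows.
toy = pd.DataFrame({
    'target': [i % 4 for i in range(400)],
    'text': [f'doc {i}' for i in range(400)],
})

trn, val, test = split_train_val_test(toy, 'target')

# Every row lands in exactly one split, and class shares stay close to 25% in each.
print(len(trn), len(val), len(test))
print(test['target'].value_counts(normalize=True).sort_index())
```

Because the validation fraction is taken from what remains after the test split, 0.15 and 0.15 give roughly 72.25% train, 12.75% validation and 15% test of the full data, not 70/15/15.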