@davidlenz
Last active March 4, 2023 15:09
20 newsgroup dataset from sklearn to csv.
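from sklearn.datasets import fetch_20newsgroups
import pandas as pd


def twenty_newsgroup_to_csv():
    # Download the training split, stripping headers, footers, and quoted replies.
    newsgroups_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'))

    # One row per post: the raw text plus its integer class label.
    df = pd.DataFrame({'text': newsgroups_train.data, 'target': newsgroups_train.target})

    # Lookup table whose index is the integer label and whose value is the newsgroup name.
    targets = pd.DataFrame(newsgroups_train.target_names, columns=['title'])

    # Attach the readable newsgroup name to every post.
    out = pd.merge(df, targets, left_on='target', right_index=True)
    out['date'] = pd.to_datetime('now')  # timestamp of the export
    out.to_csv('20_newsgroup.csv')


twenty_newsgroup_to_csv()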
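A minimal sketch of reading the exported file back into pandas, in case that helps. The helper name `load_newsgroup_csv` is hypothetical (it is not part of the gist), and dropping rows with empty text is an assumption about what you want: removing headers, footers, and quotes can leave some posts empty, and `read_csv` reads those empty fields back as NaN.

```python
import pandas as pd


def load_newsgroup_csv(path='20_newsgroup.csv'):
    # Hypothetical helper, not part of the original gist.
    # The first CSV column is the DataFrame index that to_csv() wrote out.
    df = pd.read_csv(path, index_col=0, parse_dates=['date'])
    # Empty posts come back as NaN in the 'text' column; drop them
    # (an assumption; keep them if you need the full row count).
    return df.dropna(subset=['text'])


# Example usage, after the script above has written 20_newsgroup.csv:
# df = load_newsgroup_csv()
# print(df['title'].value_counts().head())
```

The column names `text`, `target`, `title`, and `date` are the ones the script above writes.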
@Adamthe1st

Thank you so much! This was helpful.

@Rishabh-creator601

This is too good. You can also visit my GitHub: github.com/rishabh-creator601

@JumpingDino

Hi guys! If anyone needs the data itself as a .csv, you can download it here:
https://github.com/JumpingDino/datasets/blob/master/20newsgroup/20_newsgroup.csv

Thanks a lot for the code, David :D!