Last active
March 4, 2023 15:09
20 newsgroup dataset from sklearn to csv.
from sklearn.datasets import fetch_20newsgroups
import pandas as pd


def twenty_newsgroup_to_csv():
    # Download the training split, stripping headers, footers and quoted replies.
    newsgroups_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'))

    # One row per post: the raw text and its integer class label.
    # Building from a dict keeps 'target' as an integer column instead of object.
    df = pd.DataFrame({'text': newsgroups_train.data, 'target': newsgroups_train.target})

    # Lookup table whose row index is the integer label and whose value is the group name.
    targets = pd.DataFrame(newsgroups_train.target_names, columns=['title'])

    out = pd.merge(df, targets, left_on='target', right_index=True)
    out['date'] = pd.to_datetime('now')  # timestamp of the export
    out.to_csv('20_newsgroup.csv')


twenty_newsgroup_to_csv()
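For anyone who wants to load the resulting file again, here is a minimal sketch of reading it back with pandas. The helper name `load_newsgroup_csv` is hypothetical and not part of the original gist; it assumes the CSV has the `text`, `target`, `title` and `date` columns written above, plus the unnamed index column that `to_csv` adds by default.

```python
import pandas as pd


def load_newsgroup_csv(path='20_newsgroup.csv'):
    # Hypothetical helper, not part of the original gist.
    # The first CSV column is the DataFrame index that to_csv() wrote out.
    df = pd.read_csv(path, index_col=0, parse_dates=['date'])
    # Posts that are empty after stripping headers, footers and quotes are
    # read back as NaN by read_csv; turn them into empty strings again.
    df['text'] = df['text'].fillna('')
    return df
```

Calling `load_newsgroup_csv()` with no arguments reads `20_newsgroup.csv` from the current directory. The `fillna` step is a design choice: it is probably what you want before passing the text to a vectorizer, but drop it if you would rather filter the empty posts out.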
This is too good. You can also visit my GitHub: github.com/rishabh-creator601
Hi guys! If anyone needs the data itself as a .csv, you can download it here:
https://github.com/JumpingDino/datasets/blob/master/20newsgroup/20_newsgroup.csv
Thanks a lot for the code, David :D
Thank you so much! This was helpful.