Created May 10, 2019 13:08
Enron email dataset splitter/formatter
# Assumes you have the Enron email dataset as emails.csv
import pandas as pd

data = pd.read_csv("emails.csv")
# Show full column text (use -1 instead of None on pandas older than 1.0)
pd.set_option("display.max_colwidth", None)

# Split each raw message into its first 15 header lines plus the remaining body
new = data["message"].str.split("\n", n=15, expand=True)
data["from"] = new[2]
data["fromn"] = new[8]
data["to"] = new[3]
data["ton"] = new[9]
data["subject"] = new[4]
data["msg"] = new[15]
data.drop(columns=["message", "file"], inplace=True)

# Strip the header labels; .str.replace skips missing values instead of raising
data["from"] = data["from"].str.replace("From:", "", regex=False)
data["fromn"] = data["fromn"].str.replace("X-From:", "", regex=False)
data["to"] = data["to"].str.replace("To:", "", regex=False)
data["ton"] = data["ton"].str.replace("X-To:", "", regex=False)
data["subject"] = data["subject"].str.replace("Subject:", "", regex=False)
data["msg"] = data["msg"].str.replace("\n", " ", regex=False)

# Let's look only at short emails (under 100 characters) that are not replies
data[(data["msg"].str.len() < 100) & ~data["subject"].str.contains("Re:", na=False)].sample(5)
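The script above relies on every header sitting at a fixed line position, which may not hold when a header such as To: wraps onto several lines. As an alternative sketch (not the gist author's method), the standard-library `email` parser can look headers up by name instead. The helper names `_clean` and `parse_message` and the two sample messages are made up for illustration, and the filter here counts words rather than characters.

```python
from email import message_from_string


def _clean(value):
    """Collapse folded header lines and runs of whitespace into single spaces."""
    return " ".join(str(value or "").split())


def parse_message(raw):
    """Parse one raw message into the same fields the script above extracts."""
    msg = message_from_string(raw)
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart payloads come back as a list
        body = ""
    return {
        "from": _clean(msg.get("From")),
        "fromn": _clean(msg.get("X-From")),
        "to": _clean(msg.get("To")),
        "ton": _clean(msg.get("X-To")),
        "subject": _clean(msg.get("Subject")),
        "msg": _clean(body),
    }


# Hypothetical sample messages standing in for rows of the "message" column
sample = [
    "Message-ID: <1>\nFrom: a@example.com\nTo: b@example.com,\n\tc@example.com\n"
    "Subject: Lunch\nX-From: A\nX-To: B, C\n\nSee you at noon.\nBring the report.",
    "Message-ID: <2>\nFrom: b@example.com\nTo: a@example.com\n"
    "Subject: Re: Lunch\nX-From: B\nX-To: A\n\nOK.",
]

rows = [parse_message(m) for m in sample]

# Keep non-replies whose body has 100 words or fewer
short_non_replies = [
    r for r in rows
    if len(r["msg"].split()) <= 100 and "Re:" not in r["subject"]
]
```

With the real dataset, something like `pd.DataFrame(data["message"].map(parse_message).tolist())` should give a frame with the same column names as the script above, assuming the `message` column holds the raw email text.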
Hi,
How can I find a .csv version of the Enron dataset? Do you have a link?
Thanks!
Did you find the link?
Please check https://data.world/brianray/enron-email-dataset