Created
January 9, 2018 17:43
How to load from the numpy file
import numpy as np
import pandas as pd

# An .npz archive must be opened with np.load, not the built-in open().
# Arrays saved without explicit names are keyed 'arr_0', 'arr_1', ...
with np.load('./holographic.npz') as d:
    indices = d['arr_0']
    X_train = d['arr_1']
    X_val = d['arr_2']
    y_train = d['arr_3']
    y_val = d['arr_4']

# When the data is loaded into the DataFrame, some lines are read incorrectly:
# multiple tweets are merged into a single record, so the tweet length exceeds 140.
# These records are removed as follows:
data = pd.read_csv("./datasets/train/SemEval2018-T3-train-taskA_emoji.txt", sep="\t")
data = data[data['Tweet text'].map(len) <= 140]

# Now you can use "indices" on the lists in this data.
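
The keys `arr_0` ... `arr_4` used above are the default names NumPy assigns when arrays are passed to `np.savez` positionally. The sketch below illustrates that round trip on toy data. The helper `roundtrip_npz` and the arrays `demo_indices` and `demo_X` are hypothetical stand-ins, not the contents of `holographic.npz`, and how that file was actually written is an assumption.

```python
import os
import tempfile

import numpy as np


def roundtrip_npz(*arrays):
    """Save arrays positionally with np.savez and load them back as a dict.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.npz")
        # Positional arguments get the default keys arr_0, arr_1, ...
        np.savez(path, *arrays)
        # NpzFile is a context manager; indexing it reads each array into memory.
        with np.load(path) as d:
            return {key: d[key] for key in d.files}


# Toy stand-ins (assumed shapes, not the real dataset).
demo_indices = np.array([2, 0, 1])
demo_X = np.arange(6).reshape(3, 2)

loaded = roundtrip_npz(demo_indices, demo_X)
print(sorted(loaded))  # ['arr_0', 'arr_1']
```

If the archive had instead been written with keyword arguments, for example `np.savez(path, indices=...)`, the keys would be those keyword names, so listing `d.files` is a quick check before indexing.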