@re4388
Created January 24, 2020 06:29
a1
!wget https://raw.githubusercontent.com/Tony607/Keras-Text-Transfer-Learning/master/train_5500.txt
!wget https://raw.githubusercontent.com/Tony607/Keras-Text-Transfer-Learning/master/test_data.txt
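`!wget` is an IPython/Colab shell escape and is not valid syntax in a plain Python script. Here is a minimal stdlib sketch for running outside a notebook. It assumes the two files still live at the URLs above. `download` and `BASE_URL` are hypothetical names that are not part of the original gist.

```python
import urllib.request
from pathlib import Path

# Base URL copied from the wget commands above (assumption: the files stay at this path).
BASE_URL = "https://raw.githubusercontent.com/Tony607/Keras-Text-Transfer-Learning/master/"


def download(url, dest):
    """Fetch `url` into `dest` unless `dest` already exists; return `dest` as a Path."""
    dest = Path(dest)
    if not dest.exists():
        # urlretrieve writes the response body to the given file name.
        urllib.request.urlretrieve(url, dest)
    return dest


# Usage (makes network requests, so it is left commented out here):
# for name in ("train_5500.txt", "test_data.txt"):
#     download(BASE_URL + name, name)
```

Skipping files that already exist lets the script be re-run without fetching again. Plain `wget`, by contrast, would save a second copy as `train_5500.txt.1`.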
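import re

import pandas as pd

def get_dataframe(filename):
    # Parse "LABEL:fine question text" lines into a (label, text) DataFrame.
    with open(filename, 'r') as f:
        lines = f.read().splitlines()
    data = []
    for line in lines:
        # The first space-separated token is the label, e.g. "DESC:manner";
        # keep only the coarse class before ':'.
        label = line.split(' ')[0].split(':')[0]
        text = ' '.join(line.split(' ')[1:])
        # Strip disallowed characters. Inside the class, \"-. is a range
        # (" through .), so # $ % & ( ) * - are kept as well.
        text = re.sub(r'[^A-Za-z0-9 ,\?\'\"-._\+\!/\`@=;:]+', '', text)
        data.append([label, text])
    df = pd.DataFrame(data, columns=['label', 'text'])
    df.label = df.label.astype('category')
    return df

df_train = get_dataframe('train_5500.txt')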
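You can check what the cleaning step keeps without downloading anything. To do that, pull the per-line logic of `get_dataframe` out into a small function. This is a sketch only. `parse_line` and `DISALLOWED` are hypothetical names. The sample questions are made up to match the `LABEL:fine text` layout the function assumes; they are not taken from `train_5500.txt`.

```python
import re

# Same pattern as in get_dataframe, written as a raw string. Inside the class, \"-. is a
# range from '"' to '.', so # $ % & ( ) * and - are kept even though they are not listed.
DISALLOWED = re.compile(r'[^A-Za-z0-9 ,\?\'\"-._\+\!/\`@=;:]+')


def parse_line(line):
    """Return (coarse_label, cleaned_text) for one 'LABEL:fine question ...' line."""
    first, _, rest = line.partition(' ')  # rest equals ' '.join(line.split(' ')[1:])
    label = first.split(':')[0]           # "DESC:manner" -> "DESC"
    return label, DISALLOWED.sub('', rest)


# Made-up sample lines in the assumed format.
samples = [
    "DESC:manner How do bees make honey ?",
    "NUM:count How many #1 hits ~ did they have ?",
    "LOC:city Where is São Paulo ?",
]
for s in samples:
    print(parse_line(s))
# ('DESC', 'How do bees make honey ?')
# ('NUM', 'How many #1 hits  did they have ?')
# ('LOC', 'Where is So Paulo ?')
```

The last two samples show side effects of the filter. `~` and non-ASCII letters such as `ã` are deleted outright rather than transliterated. `#` survives because it falls inside the `"-.` range.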