@laibamehnaz
Created January 4, 2019 11:12
import gzip
import sys
import logging
print("BEGIN NOTEBOOK")
print("OPENING GZIP FILE")
#sys.stdout.write()
with gzip.open('/mnt/data/rajiv/AmazonProductReviewDataset/all.txt.gz', 'rt') as f:
    file_content = f.read()
print("FILE READ")
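## LIST OF SEPARATED SENTENCES
# Split on the "review/text" field marker; everything before the first marker is skipped.
b = file_content.split('review/text')
texts = " "
for i in range(1, len(b)):
    # The file is opened in text mode ('rt'), so lines end with a real newline,
    # not the two-character sequence backslash + n. Keep only the rest of the marker's line.
    texts = texts + '\n' + b[i].split('\n')[0]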
with open('/home/rajivratn/laiba/AmazonReviews/Amazon_Reviews_Seperated.txt', 'w') as f:
    f.write(texts)
print("FILE WRITTEN")
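
The script above reads the whole archive into memory before splitting it, which may not be practical for a large dump. A minimal sketch of a streaming alternative follows. The function name `extract_review_texts`, the `PREFIX` constant, and the UTF-8 encoding with `errors="replace"` are assumptions for illustration and are not part of the original gist. It also assumes each review sits on a single line that starts with `review/text:`, so unlike the split-based approach it will not match the marker in the middle of a line.

```python
import gzip

# Assumed field marker, based on the 'review/text' string the script splits on.
PREFIX = "review/text:"


def extract_review_texts(src_path, dst_path):
    """Stream a gzipped review dump and write one review text per line.

    Hypothetical helper: returns the number of reviews written.
    """
    count = 0
    # Encoding and error handling are assumptions; adjust them to the actual data.
    with gzip.open(src_path, "rt", encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # decompresses and yields one line at a time
            if line.startswith(PREFIX):
                # Drop the marker and surrounding whitespace, keep the text.
                dst.write(line[len(PREFIX):].strip() + "\n")
                count += 1
    return count
```

Calling `extract_review_texts(input_path, output_path)` with the two paths used in the script would produce a comparable output file while holding only one line in memory at a time.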