@haridas
Created November 2, 2018 03:24
NLP pre-processing - Remove unicode chars from text
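import glob
import os
import pandas as pd

# JSON files to pre-process, matched in the current directory.
files = glob.glob('out-*.json')


def remove_unicode_char(file_name):
    """Rewrite file_name in place, dropping every non-ASCII byte."""
    with open(file_name, 'rb') as f:
        raw = f.read()
    with open(file_name, 'w', encoding='ascii') as nf:
        nf.write(raw.decode(encoding="ascii", errors="ignore"))
    print("=> ", file_name)


def json_to_csv(file_name):
    """Write a CSV copy of a JSON file next to it, under the same base name."""
    df = pd.read_json(file_name)
    print("=> ", file_name)
    # splitext strips only the final extension; other dots in the path survive.
    df.to_csv(os.path.splitext(file_name)[0] + ".csv")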
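
To see what decoding as ASCII with errors="ignore" does to text, here is a minimal sketch on an in-memory string. The helper names strip_non_ascii and fold_to_ascii are hypothetical and not part of the gist. The second one is an alternative, not the gist's method: it uses unicodedata.normalize so that accented letters keep their base letter instead of vanishing.

```python
import unicodedata


def strip_non_ascii(text):
    # Hypothetical helper: same effect as the gist's decode step,
    # applied to a str instead of a file's raw bytes.
    return text.encode("utf-8").decode("ascii", errors="ignore")


def fold_to_ascii(text):
    # Hypothetical alternative: decompose accented characters (NFKD)
    # into base letter plus combining mark, then drop whatever is
    # still non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


# "café naïve", written with escapes so the source file stays ASCII.
sample = "caf\u00e9 na\u00efve"
print(strip_non_ascii(sample))  # caf nave
print(fold_to_ascii(sample))    # cafe naive
```

Plain stripping shortens words, which can change tokens in later NLP steps. Folding may be the better choice when the text contains many accented Latin characters.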