@flaschbier
Last active April 1, 2016 02:04

If the file you are showing us is really huge, you will not be able to read it in one go with the json module, like so:

import json

with open('users.txt') as f:
    users = json.load(f)

What you consider huge, however, may differ from what your computer considers huge, so give it a try even if your file contains a few thousand users.

Now for the truly huge case. Here a single big read is not possible, and you have to process the data line by line. Parsing JSON yourself is not a good idea, but if you can rely on certain patterns in your input data, you can write a simple loop that processes each user block individually:

import json

def user(li):
    """Handle the JSON fragment for one record. li is a list of strings."""
    s = " ".join(li)
    # deal with trailing commas for all but the last record
    if s.endswith(","):
        s = s[:-1]
    u = json.loads(s) # assumes well-formed JSON
    # put it in a database or so...
    print(u)

with open("users.txt", "r") as f:
    buf = [] # buffer to collect all lines for one record
    for line in f:
        line = line.strip()
        if not line: # empty line indicates end of record
            if buf: # ignore repeated empty lines
                user(buf)
            buf = [] # flush buffer
        else:
            buf.append(line)

# handle the last record if the file does not end with an empty line
if buf:
    user(buf)
# assuming that input was (almost) well-formed JSON, we are done

This assumes that the users are separated by an empty line and that each block is well-formed JSON. Any flaw in the data format will immediately raise an exception, so for production you will have to add proper error handling.
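
As a rough idea of what such error handling could look like, here is a minimal sketch. The names iter_records and iter_users and the errors list are made up for illustration. It assumes the same blank-line-separated layout as above, and that skipping a broken record and remembering its number is acceptable for your use case.

```python
import json

def iter_records(f):
    """Yield each blank-line-separated block of f as a list of stripped lines."""
    buf = []
    for line in f:
        line = line.strip()
        if line:
            buf.append(line)
        elif buf:  # blank line closes the current record
            yield buf
            buf = []
    if buf:  # last record without a trailing blank line
        yield buf

def iter_users(f, errors):
    """Yield parsed users; append (record number, message) to errors on failure."""
    for number, block in enumerate(iter_records(f), start=1):
        s = " ".join(block)
        if s.endswith(","):
            s = s[:-1]
        try:
            yield json.loads(s)
        except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
            errors.append((number, str(exc)))
```

Because iter_users is a generator, only one record is held in memory at a time. You would call it like `for u in iter_users(f, errors): ...` and inspect errors afterwards.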

Also, if the file is huge you will not be able to keep all users in memory, so you will want to change the user() function to write each user to a database or a similar store.
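
One possible way to do that, sketched with the sqlite3 module from the standard library. The helper names open_db and store_user, the table name users and its single data column are assumptions made for this example; your real schema will depend on what a user record contains.

```python
import json
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the hypothetical users table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn

def store_user(conn, li):
    """Parse the JSON fragment for one record and insert it as one row."""
    s = " ".join(li)
    if s.endswith(","):
        s = s[:-1]
    u = json.loads(s)  # assumes well-formed JSON
    # store the record re-serialized as JSON text; the caller decides when to commit
    conn.execute("INSERT INTO users (data) VALUES (?)", (json.dumps(u),))
```

In the loop above you would call store_user(conn, buf) instead of user(buf) and call conn.commit() once at the end, or every few thousand records.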

@flaschbier
Author

The answer was already in the editor when the question was closed...
