@williballenthin
Created January 27, 2021 22:25
sort the given jsonl file by the given key, writing the output to STDOUT.
"""
sort the given jsonl document (distinct json documents separated by newlines)
by the given key, writing the output to STDOUT.

example:

    python sort-jsonl-by-key.py log.jsonl "timestamp"

note: this reads the entire document into memory before sorting.
a future revision might use mmap to avoid keeping the whole file in memory.
"""
import re
import sys
import json
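# read the whole file up front; sorting needs every line's key anyway.
with open(sys.argv[1], "rb") as f:
    buf = f.read().decode("utf-8")

key = sys.argv[2]

# collect (key value, start offset, end offset) for each non-empty line,
# so we sort small tuples rather than copies of the lines themselves.
lines = []
for match in re.finditer(r"^(.*)$", buf, re.M):
    line = buf[match.start():match.end()]
    if not line.strip():
        # skip blank lines, including a stray "\r" from CRLF files.
        continue

    linedoc = json.loads(line)
    linekey = linedoc[key]
    lines.append((linekey, match.start(), match.end()))

# tuples sort by key value first; ties fall back to file order via the offset.
lines.sort()

for _, start, end in lines:
    print(buf[start:end])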
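
The docstring mentions mmap as a possible future revision. Below is a minimal sketch of what that might look like; it is not the author's implementation. The function name `sort_jsonl_mmap` and its `out` parameter (a binary writable stream) are hypothetical, introduced only for illustration. It assumes the file is UTF-8 with `\n` line endings and that every non-blank line contains the key. Note that the index of key values and offsets still lives in memory; only the file contents are left to the OS page cache.

```python
import json
import mmap


def sort_jsonl_mmap(path, key, out):
    """write the lines of the jsonl file at `path` to `out`, sorted by `key`.

    `out` is assumed to be a binary stream, e.g. sys.stdout.buffer.
    """
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so handle that case up front.
        f.seek(0, 2)
        if f.tell() == 0:
            return
        f.seek(0)

        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            size = len(buf)
            index = []
            start = 0
            while start < size:
                # find the end of the current line without copying the file.
                end = buf.find(b"\n", start)
                if end == -1:
                    end = size

                line = buf[start:end]
                if line.strip():
                    # json.loads accepts UTF-8 bytes directly.
                    index.append((json.loads(line)[key], start, end))
                start = end + 1

            # same ordering as the script above: key first, then file offset.
            index.sort()

            for _, line_start, line_end in index:
                out.write(buf[line_start:line_end])
                out.write(b"\n")
```

To mirror the script's command line, this could be called as `sort_jsonl_mmap(sys.argv[1], sys.argv[2], sys.stdout.buffer)` after `import sys`.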