Check if two JSON objects are the same by first ordering them
import json
import os

# Put filenames here; this script assumes these files are in the same
# directory as the script.
FILENAME_1 = "2.json"
FILENAME_2 = "3.json"


def ordered(obj):
    """Recursively sort dicts (as lists of key/value pairs) and lists."""
    if isinstance(obj, dict):
        return sorted((k, ordered(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return sorted(ordered(x) for x in obj)
    return obj


def main():
    script_dir = os.path.dirname(os.path.abspath(__file__))
    ordered_files = []
    for filename in (FILENAME_1, FILENAME_2):
        path = os.path.join(script_dir, filename)
        with open(path) as f:
            file_parsed = json.load(f)
        file_ordered = ordered(file_parsed)
        ordered_files.append(file_ordered)

        # Write the ordered form next to the input so the two files can be diffed.
        # ordered() has already turned every dict into a sorted list of pairs,
        # so sort_keys has no further effect here.
        new_name = f"{os.path.splitext(filename)[0]}_prettier.json"
        new_path = os.path.join(script_dir, new_name)
        with open(new_path, "w") as new_file:
            json.dump(file_ordered, new_file, indent=4, sort_keys=True)

    # True if both files hold the same data regardless of ordering.
    print(ordered_files[0] == ordered_files[1])


if __name__ == "__main__":
    main()

biancadanforth commented Nov 5, 2019

This is a helper script I made while reviewing @danielhertenstein's FathomFox PR to parallelize the Vectorizer. I wanted to know whether the vectors.json files produced by the serial Vectorizer and the parallelized Vectorizer were identical for the same samples and the same ruleset. Since the parallelized Vectorizer can finish pages in a different order, I needed to sort each JSON object before comparing them. Thankfully, the two outputs were the same.
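
To illustrate why sorting is needed before comparing, here is a minimal sketch. The `serial_output` and `parallel_output` values are made-up data, not the real vectors.json layout; only `ordered` comes from the script above.

```python
def ordered(obj):
    """Recursively sort dicts (as lists of key/value pairs) and lists."""
    if isinstance(obj, dict):
        return sorted((k, ordered(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return sorted(ordered(x) for x in obj)
    return obj


# Hypothetical data: the same two pages, finished in a different order.
serial_output = {
    "pages": [
        {"filename": "a.html", "features": [0.5, 1.0]},
        {"filename": "b.html", "features": [0.2, 0.7]},
    ]
}
parallel_output = {
    "pages": [
        {"filename": "b.html", "features": [0.2, 0.7]},
        {"filename": "a.html", "features": [0.5, 1.0]},
    ]
}

# A plain comparison is order-sensitive for lists, so it reports a difference.
plain_equal = serial_output == parallel_output

# After ordering both sides, only the content matters.
ordered_equal = ordered(serial_output) == ordered(parallel_output)

print(plain_equal)
print(ordered_equal)
```

One thing to keep in mind: `ordered` sorts every nested list, including ones whose element order is meaningful, so `[1, 0]` and `[0, 1]` compare as equal after ordering.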

Edit: Credit for the ordered function goes to this Stack Overflow post.
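
The ordered function relies on `sorted` being able to compare list elements with each other, and in Python 3 that raises a `TypeError` when a list mixes types (for example a number next to a string or an object). The sketch below is one possible workaround, not part of the original script: `canonical` is a hypothetical helper name, and sorting lists by their JSON text is my own assumption about what a reasonable ordering would be.

```python
import json


def canonical(obj):
    """Return a copy of obj with every list sorted by its JSON text.

    Dicts are left as dicts, since dict equality already ignores key order.
    """
    if isinstance(obj, dict):
        return {k: canonical(v) for k, v in obj.items()}
    if isinstance(obj, list):
        items = [canonical(x) for x in obj]
        # Sorting on the serialized string avoids comparing mixed types directly.
        return sorted(items, key=lambda x: json.dumps(x, sort_keys=True))
    return obj


# Hypothetical mixed-type lists holding the same elements in different orders.
mixed_a = [1, "a", None, {"b": [2, 1]}]
mixed_b = [{"b": [1, 2]}, None, "a", 1]

same = canonical(mixed_a) == canonical(mixed_b)
print(same)
```

Like the original function, this treats every list as unordered, so it is only appropriate when list order carries no meaning.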

@biancadanforth

Also, thanks to @mythmon for giving this a look over! My Python skills are quite basic. The latest revision (6) incorporates his feedback.
