@relsqui
Last active May 5, 2016 23:15
Reads set files as formatted by mtgjson on stdin and outputs a concordancer-ready text file of non-vanilla cards with metadata in angle brackets.
#!/bin/bash
for set in *.json; do
    echo "$set ..." >&2
    # unix2dos converts line endings for the sake of Windows-based concordancers.
    # It's in the dos2unix package (named for the opposite tool).
    # Quote "$set" so filenames containing spaces survive word splitting.
    ./mtgcorpus.py < "$set" | unix2dos > "$(basename "$set" .json).txt"
done
echo "done" >&2
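For machines without the dos2unix package, the same loop can be approximated in Python. This is only a sketch under stated assumptions: `output_path`, `to_crlf`, and `convert_all` are hypothetical helper names that are not part of the gist, and `to_crlf` only imitates the LF-to-CRLF conversion that unix2dos performs. It works on bytes so that it makes no assumption about the encoding of the script's output.

```python
import subprocess
import sys
from pathlib import Path


def output_path(set_path):
    """Hypothetical helper: LEA.json -> LEA.txt, next to the input file."""
    return Path(set_path).with_suffix(".txt")


def to_crlf(data):
    """Convert LF line endings to CRLF without doubling existing CRLFs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_all(directory=".", script="./mtgcorpus.py"):
    """Run the corpus script over every *.json set file in `directory`."""
    for set_path in sorted(Path(directory).glob("*.json")):
        print("{} ...".format(set_path.name), file=sys.stderr)
        # Feed the set file to the script on stdin, as the shell loop does.
        with open(set_path, "rb") as source:
            result = subprocess.run(
                [script], stdin=source, stdout=subprocess.PIPE, check=True
            )
        # Write bytes so Python performs no newline translation of its own.
        output_path(set_path).write_bytes(to_crlf(result.stdout))
    print("done", file=sys.stderr)
```

Calling `convert_all()` from the directory that holds the set files and `mtgcorpus.py` should then produce one `.txt` file per set, as the shell loop does.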
#!/usr/bin/python3
import json
import sys
# Parse JSON from stdin.
edition = json.loads(sys.stdin.read())
# Start with headers giving some information about the set.
print("""<Set: {}>
<Block: {}>
<Release Date: {}>
<Border: {}>""".format(edition["name"], edition.get("block", "n/a"), edition["releaseDate"], edition["border"]))
for card in edition["cards"]:
    # Only include cards that have any rules text.
    if "text" in card:
        # The concordancers I'm using are bad at Unicode, let's go easy on them.
        # (These characters come up on subtypes and modal spells.)
        types = card["type"].replace("—", "-")
        rules = card["text"].replace("—", "-").replace("•", "*")
        # We find more interesting collocates if we abstract out card names.
        rules = rules.replace(card["name"], "~")
        # Hide reminder text by putting it in angle brackets.
        rules = rules.replace("(", "<(").replace(")", ")>")
        # Split the rules into lines to find ability words and hide them too.
        lines = rules.splitlines()
        for i, line in enumerate(lines):
            if "-" in line and not line.startswith("Choose"):
                lines[i] = "<" + line.replace("- ", "-> ")
        rules = "\n".join(lines)
        print()
        # Add more metadata: name, cost, types, rarity.
        print("<{}>".format(card["name"]))
        print("<{} / {} / {}>".format(card.get("manaCost", ""), types, card["rarity"]))
        # The only concordancer-visible content is the actual rules.
        print(rules)
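To see what the rules-text steps do to a single card, the transformations can be pulled into a standalone function. This is a minimal sketch: `prepare_rules` is a hypothetical name that does not appear in the gist, and the card name and rules text used with it are made up for illustration rather than taken from mtgjson data.

```python
def prepare_rules(name, text):
    """Apply the script's rules-text transformations to one card."""
    # \u2014 is the em dash and \u2022 is the bullet used on modal spells.
    rules = text.replace("\u2014", "-").replace("\u2022", "*")
    # Abstract out the card's own name.
    rules = rules.replace(name, "~")
    # Hide reminder text in angle brackets.
    rules = rules.replace("(", "<(").replace(")", ")>")
    # Hide ability words: "<" opens before the word, "->" closes after the dash.
    lines = rules.splitlines()
    for i, line in enumerate(lines):
        if "-" in line and not line.startswith("Choose"):
            lines[i] = "<" + line.replace("- ", "-> ")
    return "\n".join(lines)


# Made-up card text, for illustration only.
sample = (
    "Flying (It can be blocked only by fliers.)\n"
    "Landfall \u2014 Whenever a land enters, Example Scout gets +1/+1."
)
print(prepare_rules("Example Scout", sample))
```

The ability-word check matches any hyphen, so a line such as "Target creature gets -1/-1 until end of turn." also gains a leading `<` with no closing `->`. Whether that matters probably depends on how the concordancer treats an unclosed bracket.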