
@thomaswp
Created June 24, 2018 14:41

JSON-AST Format

Purpose

The purpose of this format is to provide a language-agnostic way of representing source code that can be compiled to an Abstract Syntax Tree (AST). The format is represented using JSON and should be able to capture the important properties of most code.

Format

All code in these datasets is represented as abstract syntax trees (ASTs), stored in a JSON format. Each JSON object represents one node in the AST and has the following properties:

  • type [required]: The type of the node (e.g. "if-statement", "expression", "variable-declaration", etc.). In Snap, this could be the name of a built-in block (e.g. "forward", "turn"). The set of possible types is pre-defined by a given programming language, as they generally correspond to keywords. The possible types for a given language are defined in the grammar file for the dataset, discussed later.
  • value [optional]: This contains any user-defined value for the node, such as the identifier for a variable or function, the value of a literal, or the name of an imported module. These are things the student names, and they could take any value. Note: In the Snap datasets, string literal values have been removed to anonymize the dataset; however, these values are generally not relevant for hint generation.
  • children [optional]: A map of this node's children, if any. In Python, the keys of the map indicate the relationship between parent and child (e.g. a while loop might have a "condition" child). In the Snap dataset, they are simply numbers indicating the ordering of the children (e.g. arguments "0", "1" and "2"). The values are objects representing the children, or null where a child is absent (e.g. "returns" in the example below).
  • children-order [optional] (spelled "childrenOrder" in the example output below): The order of this node's children, represented as an array of keys from the children map. This is necessary because JSON objects are unordered, so a parser may not preserve key order; even so, the children in the map should be written in the correct order.
  • id [optional]: A trace-unique ID for the node that is kept constant across ASTs in the same trace. This is useful in block-based languages, for example, to identify a given block even if it moves within the AST.
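
The properties above are enough to walk a tree in source order. The following is a minimal Python sketch, not part of the format itself: the helper names ordered_children and walk are hypothetical, the order key is assumed to use the "childrenOrder" spelling that appears in the example output, and the fallback to the map's own key order when that key is missing is an assumption.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

Node = Dict[str, Any]


def ordered_children(node: Node) -> List[Tuple[str, Optional[Node]]]:
    """Return (key, child) pairs in the order given by "childrenOrder"."""
    children = node.get("children") or {}
    # Assumption: fall back to the map's own key order if no order is stored.
    order = node.get("childrenOrder") or list(children)
    return [(key, children.get(key)) for key in order]


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, str, Optional[str]]]:
    """Yield (depth, type, value) for every non-null node, parents first."""
    yield depth, node["type"], node.get("value")
    for _key, child in ordered_children(node):
        if child is not None:  # absent children are stored as null
            yield from walk(child, depth + 1)


# A small hand-written tree using only the documented properties.
example = {
    "type": "Return",
    "children": {"value": {"type": "Str", "value": "Hello World!"}},
    "childrenOrder": ["value"],
}

for depth, node_type, value in walk(example):
    print("  " * depth + node_type, "" if value is None else repr(value))
```

Reading the order from the array instead of the map keeps the traversal stable even when a JSON library does not preserve key order.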

Example

Python Code:

def helloWorld():
    return 'Hello World!'

JSON-AST Output:

{
    "children": {
        "body": {
            "children": {
                "0": {
                    "children": {
                        "decorator_list": {
                            "type": "list"
                        },
                        "args": {
                            "children": {
                                "defaults": {
                                    "type": "list"
                                },
                                "kwarg": null,
                                "vararg": null,
                                "kwonlyargs": {
                                    "type": "list"
                                },
                                "kw_defaults": {
                                    "type": "list"
                                },
                                "args": {
                                    "type": "list"
                                }
                            },
                            "type": "arguments",
                            "childrenOrder": [
                                "args",
                                "vararg",
                                "kwonlyargs",
                                "kw_defaults",
                                "kwarg",
                                "defaults"
                            ]
                        },
                        "body": {
                            "children": {
                                "0": {
                                    "children": {
                                        "value": {
                                            "children": {},
                                            "value": "Hello World!",
                                            "type": "Str",
                                            "childrenOrder": []
                                        }
                                    },
                                    "type": "Return",
                                    "childrenOrder": [
                                        "value"
                                    ]
                                }
                            },
                            "type": "list",
                            "childrenOrder": [
                                "0"
                            ]
                        },
                        "returns": null
                    },
                    "value": "helloWorld",
                    "type": "FunctionDef",
                    "childrenOrder": [
                        "args",
                        "body",
                        "decorator_list",
                        "returns"
                    ]
                }
            },
            "type": "list",
            "childrenOrder": [
                "0"
            ]
        }
    },
    "id": "97ea8451df0420e0ec5e262a30365e3b",
    "correct": true,
    "type": "Module",
    "childrenOrder": [
        "body"
    ]
}
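
A tree of this shape could plausibly be produced from Python's standard ast module. The sketch below is an illustration under stated assumptions and is not the converter used to build the datasets: the function name to_json_ast is hypothetical, the first scalar field of a node (such as a function's name) is assumed to become "value", and None fields are assumed to become null children. Its output will not match the example exactly. For instance, recent Python versions report string literals as "Constant" instead of "Str" and expose extra fields, and the "id" and "correct" properties are not generated.

```python
import ast
import json
from typing import Any, Dict, Optional


def to_json_ast(obj: Any) -> Optional[Dict[str, Any]]:
    """Convert an ast node, a list of nodes, or None into a JSON-AST value."""
    if obj is None:
        return None  # absent children are emitted as null
    if isinstance(obj, list):
        # Lists become "list" nodes whose children are keyed "0", "1", ...
        node: Dict[str, Any] = {"type": "list"}
        keys = [str(i) for i in range(len(obj))]
        if keys:
            node["children"] = {k: to_json_ast(c) for k, c in zip(keys, obj)}
            node["childrenOrder"] = keys
        return node
    node = {"type": type(obj).__name__}
    children: Dict[str, Any] = {}
    for field, value in ast.iter_fields(obj):
        if value is None or isinstance(value, (ast.AST, list)):
            children[field] = to_json_ast(value)
        elif "value" not in node:
            # Assumption: the first scalar field is the user-defined value.
            node["value"] = value
    if children:
        node["children"] = children
        node["childrenOrder"] = list(children)
    return node


source = "def helloWorld():\n    return 'Hello World!'\n"
tree = to_json_ast(ast.parse(source))
# default=repr guards against literal values JSON cannot encode (e.g. bytes).
print(json.dumps(tree, indent=4, default=repr))
```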