@barbu110
Created April 18, 2024 09:47
Lark grammar to parse logs of objects in Java applications

Java applications usually log objects in a format similar to:

[Foo(bar=baz, baz=[a, b, c], doo={a=b, b=c})]

To my knowledge this format is not standardized, and it is hard to read when large objects or lists of objects are printed.

The grammar here should handle most use cases, but it will most likely serve only as a starting point.

list: ("[" list_item (LIST_SEPARATOR list_item)* "]" WS?) | ("[" WS? "]" WS?)
list_item: WS? object_instance WS?
object_instance: IDENT WS? object_params
object_params: empty_object_params | non_empty_object_params
empty_object_params: "(" WS? ")"
non_empty_object_params: "(" assignment_list_non_empty ")"
assignment_list_non_empty: object_param (LIST_SEPARATOR object_param)*
object_param: WS? object_param_name WS? "=" WS? object_param_value WS?
object_param_name: IDENT
object_param_value: WS? (value_null | value_bool | value_int | value_str | value_dict | value_optional | value_list) WS?
value_null: "null"
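// "!" keeps the matched keyword in the tree; without it Lark drops both literals, so true cannot be told apart from false.
!value_bool: "true" | "false"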
value_int: /-?[0-9]+/
value_str: /[a-zA-Z0-9_.+@-]+/
value_dict: value_dict_empty | value_dict_non_empty
value_dict_empty: "{" WS? "}"
value_dict_non_empty: "{" assignment_list_non_empty "}"
value_list: value_list_inner_empty | value_list_inner_non_empty
value_list_inner_empty: "[" WS? "]"
value_list_inner_non_empty: "[" object_param_value (LIST_SEPARATOR object_param_value)* "]"
value_optional: "Optional[" object_param_value "]"
LIST_SEPARATOR: WS? "," WS?
WS: /\s+/
IDENT: /[a-zA-Z_-][a-zA-Z0-9_.-]*/
VALUE: /[a-zA-Z0-9-{}]+/
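To get a feel for which values the grammar accepts, the regular expressions behind value_str and IDENT can be tried directly with Python's re module. This is only a sketch: the helper name accepted and the sample values are made up for illustration, and the patterns are written with the hyphen last, which matches the same characters as the grammar's patterns.

```python
import re

# The regular expressions behind value_str and IDENT, written with the
# hyphen last so it cannot be mistaken for a range such as 9-_.
VALUE_STR = re.compile(r"[a-zA-Z0-9_.+@-]+")
IDENT = re.compile(r"[a-zA-Z_-][a-zA-Z0-9_.-]*")


def accepted(pattern, text):
    """Return True if the whole text matches the terminal's pattern."""
    return pattern.fullmatch(text) is not None


# Made-up sample values, not taken from real logs.
value_samples = ["baz", "user@example.com", "2024-04-18", "12.5",
                 "hello world", '"quoted"']
for sample in value_samples:
    print(f"value_str {sample!r}: {accepted(VALUE_STR, sample)}")

ident_samples = ["Foo", "com.example.Foo", "3Foo", "Outer$Inner"]
for sample in ident_samples:
    print(f"IDENT {sample!r}: {accepted(IDENT, sample)}")
```

Values that contain spaces or quotes are rejected, and so are class names containing `$`, so logs with such fields would need the patterns extended. Note also that plain digits and the keywords null, true, and false match value_str as well as their dedicated rules, so it is worth checking which rule the parser picks for them.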
from lark import Lark

# Build the parser from the grammar file shown above.
with open("log-parser.lark", "r") as grammar_file:
    parser = Lark(grammar_file, start=["list"], debug=True)

# Read the logged expression that should be parsed.
with open("log-expression", "r") as log_expression:
    expression = log_expression.read()

print(expression)

# Parse the expression and print the resulting tree, one node per line.
tree = parser.parse(expression)
print(tree.pretty(" "))