@leandrosilva
Last active March 17, 2023 19:05
Parsing Syslog files with Python and PyParsing
$ python xlog.py sample.log
{'appname': 'test.app', 'timestamp': '2012-09-06 15:19:32', 'hostname': 'codezone.local', 'pid': '68898', 'priority': '132', 'message': 'bla bla bla warn'}
{'appname': 'test.app', 'timestamp': '2012-09-06 15:19:32', 'hostname': 'codezone.local', 'pid': '68902', 'priority': '131', 'message': 'bla bla bla error'}
{'appname': 'Dock', 'timestamp': '2012-09-06 15:19:32', 'hostname': 'codezone.local', 'pid': '154', 'priority': '11', 'message': 'CGSReleaseWindowList: called with 5 invalid window(s)'}
{'appname': 'WindowServer', 'timestamp': '2012-09-06 15:19:32', 'hostname': 'codezone.local', 'pid': '79', 'priority': '11', 'message': 'CGXSetWindowListAlpha: Invalid window 0'}
$ python xlog.py sample.log | grep test.app
{'priority': '132', 'timestamp': '2020-03-04 12:42:40', 'hostname': 'codezone.local', 'appname': 'test.app', 'pid': '68898', 'message': 'bla bla bla warn'}
{'priority': '131', 'timestamp': '2020-03-04 12:42:40', 'hostname': 'codezone.local', 'appname': 'test.app', 'pid': '68902', 'message': 'bla bla bla error'}
<132>Sep 6 14:35:48 codezone.local test.app[68898]: bla bla bla warn
<131>Sep 6 14:35:58 codezone.local test.app[68902]: bla bla bla error
<11>Sep 6 14:37:53 codezone.local Dock[154]: CGSReleaseWindowList: called with 5 invalid window(s)
<11>Sep 6 14:38:09 codezone.local WindowServer[79]: CGXSetWindowListAlpha: Invalid window 0
import string
import sys
from time import strftime

from pyparsing import Word, alphas, Suppress, Combine, nums, Optional, Regex


class Parser(object):
    def __init__(self):
        ints = Word(nums)

        # priority: the <NNN> prefix
        priority = Suppress("<") + ints + Suppress(">")

        # timestamp: month abbreviation, day, HH:MM:SS
        month = Word(string.ascii_uppercase, string.ascii_lowercase, exact=3)
        day = ints
        hour = Combine(ints + ":" + ints + ":" + ints)
        timestamp = month + day + hour

        # hostname
        hostname = Word(alphas + nums + "_" + "-" + ".")

        # appname, optionally followed by [pid]
        appname = Word(alphas + "/" + "-" + "_" + ".") + Optional(Suppress("[") + ints + Suppress("]")) + Suppress(":")

        # message: the rest of the line
        message = Regex(".*")

        # pattern build
        self.__pattern = priority + timestamp + hostname + appname + message

    def parse(self, line):
        parsed = self.__pattern.parseString(line)

        payload = {}
        payload["priority"] = parsed[0]
        # NOTE: this is the time the script runs, not the time in the log line
        # (the log's month, day, and time are in parsed[1], parsed[2], parsed[3]).
        payload["timestamp"] = strftime("%Y-%m-%d %H:%M:%S")
        payload["hostname"] = parsed[4]
        payload["appname"] = parsed[5]
        # The [pid] part is optional, so only read it when all 8 tokens are present.
        payload["pid"] = parsed[6] if len(parsed) == 8 else None
        # The message is always the last token.
        payload["message"] = parsed[-1]

        return payload


# ---------------------------------


def main():
    parser = Parser()

    if len(sys.argv) == 1:
        print("Usage:\n  $ python xlog.py ./sample.log")
        sys.exit(1)

    syslogPath = sys.argv[1]

    with open(syslogPath) as syslogFile:
        for line in syslogFile:
            fields = parser.parse(line)
            print(fields)


if __name__ == "__main__":
    main()
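
The `priority` value the parser extracts is the syslog PRI number, which packs two fields together as facility * 8 + severity. A minimal sketch of splitting it apart is below; `decode_priority` and `SEVERITIES` are illustrative names that are not part of xlog.py, and the severity labels follow the conventional syslog keywords.

```python
# Sketch only: decode_priority and SEVERITIES are hypothetical helpers,
# not part of the original xlog.py.

# Conventional syslog severity keywords, indexed by severity number 0-7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_priority(priority):
    """Return (facility_number, severity_name) for a PRI value such as '132'."""
    facility, severity = divmod(int(priority), 8)
    return facility, SEVERITIES[severity]


# The three PRI values that appear in sample.log.
for pri in ("132", "131", "11"):
    print(pri, decode_priority(pri))
```

For the sample log this gives facility 16 with severity `warning` for 132 and `err` for 131, which lines up with the "warn" and "error" messages on those lines.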
@inp2

inp2 commented Jul 21, 2016

This is great!

@NitinMahajan1

How can I change the line below so that it takes the timestamp from the syslog file instead of using the current time?

strftime("%Y-%m-%d %H:%M:%S")

@msjuck

msjuck commented Feb 28, 2019

payload["timestamp"] = strftime("%Y-%m-%d %H:%M:%S")

This line prints the current time of the server running the script, not the time of the syslog message.

@codyroche

Hey leandrosilva, thanks for posting this. It helped me get started with pyparsing!

One note I'd add to save someone else some time: string.uppercase and string.lowercase no longer work on a current setup (they were removed from the standard string module in Python 3). They are replaced by string.ascii_uppercase and string.ascii_lowercase.

In hindsight it's obvious, but since I was just learning pyparsing, it took me a bit of research to find!

@leandrosilva
Author

Thanks mate.
I did update it.
Cheers.

@codyroche

Hey NitinMahajan1 and msjuck,

If you want the date from the syslog line, you can change the parser payload to build the timestamp from the parsed month, day, and time tokens, like this.

with f-strings:
payload["timestamp"] = f"{parsed[1]} {parsed[2]} {parsed[3]}"

or with .format for those who can't run Python 3.6+ or don't like f-strings (I won't judge):
payload["timestamp"] = "{} {} {}".format(parsed[1], parsed[2], parsed[3])
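
The concatenation above yields a string such as "Sep 6 14:35:48". To get the same "YYYY-MM-DD HH:MM:SS" shape the script printed originally, the tokens can be run through `datetime.strptime`. This is a rough sketch: `syslog_timestamp` is a hypothetical helper, and the year has to be supplied by the caller because these log lines do not carry one.

```python
from datetime import datetime


def syslog_timestamp(month, day, clock, year):
    """Build 'YYYY-MM-DD HH:MM:SS' from syslog tokens plus an assumed year.

    month, day, and clock correspond to parsed[1], parsed[2], and parsed[3].
    %b expects English month abbreviations under the default C/English locale.
    """
    parsed = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


# Tokens from the first line of sample.log; the year 2012 is an assumption.
print(syslog_timestamp("Sep", "6", "14:35:48", year=2012))
```

Inside `parse`, this could be called as `syslog_timestamp(parsed[1], parsed[2], parsed[3], year=...)`, with the year taken from wherever makes sense for your logs (for example, the current year).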

@yosiasz

yosiasz commented Mar 17, 2023

amazing! just what I was looking for. thanks so much!
I'm hoping to move the patterns out of the code and into a config file, so the script could be called like this:

python xlog.py sample.log --config.file=config.yaml
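
One possible shape for that idea is sketched below. To stay dependency-free it deliberately swaps two things: the config is JSON rather than YAML, and the pattern is a standard-library `re` expression with named groups rather than a pyparsing grammar. The `--config` flag, the `pattern` key, and all function names are assumptions made for illustration, not something xlog.py supports.

```python
import argparse
import json
import re

# Hypothetical config contents; in practice this text would live in a file.
# The regex is an illustrative equivalent of the gist's grammar, not a port of it.
EXAMPLE_CONFIG = r'''
{
  "pattern": "<(?P<priority>\\d+)>(?P<timestamp>\\w{3}\\s+\\d+ [\\d:]+) (?P<hostname>\\S+) (?P<appname>[^\\[:]+)(?:\\[(?P<pid>\\d+)\\])?: (?P<message>.*)"
}
'''


def load_pattern(config_text):
    """Compile the regular expression stored under the 'pattern' key."""
    return re.compile(json.loads(config_text)["pattern"])


def parse_line(pattern, line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = pattern.match(line)
    return match.groupdict() if match else None


def build_arg_parser():
    """Command line: python xlog.py sample.log --config config.json"""
    cli = argparse.ArgumentParser(description="Parse a syslog file")
    cli.add_argument("logfile")
    cli.add_argument("--config", required=True, help="path to a JSON config file")
    return cli


pattern = load_pattern(EXAMPLE_CONFIG)
print(parse_line(pattern, "<132>Sep 6 14:35:48 codezone.local test.app[68898]: bla bla bla warn"))
```

In a real script, `build_arg_parser().parse_args()` would supply the paths, and the config text would be read from `args.config` before calling `load_pattern`.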
