@fbender
Forked from craigds/xml-to-har.py
Last active September 14, 2020 11:19
Convert Internet Explorer 'capture network traffic' XML to a HAR file (by @craigds, force-unicode-version).
#!/usr/bin/env python
"""
Converts Internet Explorer 'capture network traffic' XML to a HAR file.

Turns out that XML is just a HAR file anyways, but in XML form. So this
just converts it to JSON, and Bob's your uncle.

Requires Python 2.7+ and lxml.
"""
from __future__ import unicode_literals

import argparse
import json
import sys

from lxml import objectify

if sys.version_info >= (3,):
    str_type = str
else:
    str_type = unicode

# Tags whose children become a JSON array instead of a JSON object.
list_things = {
    'pages',
    'entries',
    'cookies',
    'queryString',
    'headers',
}


def xml_to_dict(element):
    """Recursively convert an objectify element into lists, dicts and strings."""
    children = element.getchildren()
    if element.tag in list_things:
        return [xml_to_dict(e) for e in children]
    if children:
        return {e.tag: xml_to_dict(e) for e in children}
    return str_type(element.pyval)


def main():
    parser = argparse.ArgumentParser(description="Convert IE's crazy XML-HAR into a real HAR file")
    # Binary mode: argparse.FileType has no 'encoding' keyword on Python 2.7
    # (it raises TypeError), and lxml reads the encoding from the XML itself.
    parser.add_argument('infile', type=argparse.FileType('rb'))
    parser.add_argument('outfile', type=argparse.FileType('wb'))
    args = parser.parse_args()

    tree = objectify.parse(args.infile)
    root = tree.getroot()
    d = {root.tag: xml_to_dict(root)}

    # json.dumps escapes non-ASCII characters by default, so the encoded
    # output is plain ASCII and therefore also valid UTF-8.
    args.outfile.write(json.dumps(d, indent=2, sort_keys=True).encode('UTF-8'))


if __name__ == '__main__':
    main()
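
To make the list-versus-object rule concrete, here is a minimal sketch of the same recursion applied to a tiny document. It makes two assumptions that are not part of the gist: the sample XML is hand-written for illustration (it is not a real IE capture), and the standard library's `xml.etree.ElementTree` stands in for `lxml.objectify` so the snippet runs without extra dependencies. As a consequence, leaf values are the raw element text rather than `str(pyval)`. The names `LIST_TAGS`, `SAMPLE` and `element_to_har` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Same tag names as list_things in the script above.
LIST_TAGS = {'pages', 'entries', 'cookies', 'queryString', 'headers'}

# Hand-written sample, only meant to exercise both branches of the recursion.
SAMPLE = """<log>
  <version>1.1</version>
  <entries>
    <entry>
      <request>
        <method>GET</method>
        <headers>
          <header><name>Accept</name><value>*/*</value></header>
        </headers>
      </request>
    </entry>
  </entries>
</log>"""


def element_to_har(element):
    """Convert an element to a list, a dict, or a string, depending on its tag."""
    children = list(element)
    if element.tag in LIST_TAGS:
        # Container tags: the wrapper tags of the children are dropped.
        return [element_to_har(child) for child in children]
    if children:
        # Any other element with children becomes an object keyed by tag.
        return {child.tag: element_to_har(child) for child in children}
    # Leaf element: keep its text (empty string if there is none).
    return element.text or ''


root = ET.fromstring(SAMPLE)
har = {root.tag: element_to_har(root)}
print(json.dumps(har, indent=2, sort_keys=True))
```

Note that `entries` and `headers` turn into arrays whose items no longer carry the `entry` or `header` wrapper tag, which is the shape HAR expects.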
@moos
moos commented Aug 2, 2019

Hmmm...

File "/Users/me/dev/xml-to-har.py", line 45, in main
    parser.add_argument('infile', type=argparse.FileType('r', encoding='UTF-8'), default=sys.stdin)
TypeError: __init__() got an unexpected keyword argument 'encoding'

$ python --version
Python 2.7.10

$ pip show lxml
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Name: lxml
Version: 4.4.0
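
The `TypeError` in the traceback above comes from passing `encoding=` to `argparse.FileType`, which Python 2.7 does not accept. One possible workaround, sketched here under the assumption that plain path arguments are acceptable, is to open the output with `io.open`, which takes an `encoding` keyword on both Python 2.7 and Python 3. The helper names `write_json` and `read_json` and the file name `capture.har` are hypothetical and not part of the gist.

```python
import io
import json
import os
import tempfile


def write_json(path, data):
    """Write data as UTF-8 JSON using io.open, which accepts encoding= on 2.7 and 3.x."""
    text = json.dumps(data, indent=2, sort_keys=True)
    if isinstance(text, bytes):
        # Python 2: json.dumps returns an ASCII byte string here, while a
        # text-mode io.open handle only accepts unicode.
        text = text.decode('ascii')
    with io.open(path, 'w', encoding='UTF-8') as handle:
        handle.write(text)


def read_json(path):
    """Read a UTF-8 JSON file back into Python objects."""
    with io.open(path, 'r', encoding='UTF-8') as handle:
        return json.loads(handle.read())


# Hypothetical round trip through a temporary file.
tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, 'capture.har')
write_json(out_path, {'log': {'version': '1.1'}})
round_trip = read_json(out_path)
```

The input side needs no such handling if the path or a binary file object is handed to lxml directly, since the parser picks up the encoding from the XML declaration.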
