@craSH
Created March 29, 2011 14:49
Parse a HAR (HTTP Archive) and return URLs which resulted in a given HTTP response code
#!/usr/bin/env python
"""
Parse a HAR (HTTP Archive) and return URLs which resulted in a given HTTP response code
HAR Spec: http://groups.google.com/group/http-archive-specification/web/har-1-2-spec
Copyleft 2010 Ian Gallagher <crash@neg9.org>
Example usage: ./har_response_urls.py foo.har 404
"""
import json

if '__main__' == __name__:
    import sys

    if len(sys.argv) < 3:
        print "Usage: %s <har_file> <HTTP response code>" % sys.argv[0]
        sys.exit(1)

    har_file = sys.argv[1]
    response_code = int(sys.argv[2])

    # Read HAR archive (skip over binary header if present - Fiddler2 exports contain this)
    har_data = open(har_file, 'rb').read()
    skip = 3 if '\xef\xbb\xbf' == har_data[:3] else 0
    har = json.loads(har_data[skip:])

    matching_entries = filter(lambda x: response_code == x['response']['status'], har['log']['entries'])
    matching_urls = set(map(lambda x: x['request']['url'], matching_entries))

    print >>sys.stderr, "URLs which resulted in an HTTP %d response:" % response_code
    for url in matching_urls:
        print url
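
The script above uses Python 2 print statements, so it will not run unchanged on Python 3. A rough Python 3 sketch of the same idea follows. The function names load_har and urls_with_status are made up for this illustration and are not part of the gist; the 'utf-8-sig' codec is used here to drop the three-byte UTF-8 byte order mark that the original skips by hand.

```python
import json


def load_har(path):
    """Load a HAR file as a dict.

    Assumption: the file is UTF-8. The 'utf-8-sig' codec drops a leading
    byte order mark (EF BB BF) when one is present and is harmless otherwise.
    """
    with open(path, encoding='utf-8-sig') as handle:
        return json.load(handle)


def urls_with_status(har, status):
    """Return the set of request URLs whose response status equals `status`."""
    return {
        entry['request']['url']
        for entry in har['log']['entries']
        if entry['response']['status'] == status
    }
```

Calling urls_with_status(load_har('foo.har'), 404) should give the same set of URLs that the original script prints for ./har_response_urls.py foo.har 404.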
@sruti842

sruti842 commented Jun 13, 2020

@craSH Thank you for your example. I was implementing browsermob-proxy and trying to parse through har file.
I have a question. How would I change the matching_entries to check for status_codes between 400 and 499 ?
matching_entries = filter(lambda x: status_code_min >= int(x['response']['status']) and status_code_max <= int(x['response']['status']), json_dict['log']['entries'])

This won't work, as lambda doesn't allow more than 2 arguments.
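
For what it's worth, the lambda in the question takes only one argument (x), so the argument count is probably not the culprit. The two comparisons look reversed: as written they require the status to be at most 400 and at least 499, which no status satisfies. A small sketch with made-up entries (the entries list is hypothetical sample data, not taken from a real HAR file):

```python
status_code_min = 400
status_code_max = 499

# Hypothetical sample entries shaped like har['log']['entries'].
entries = [
    {'response': {'status': 200}},
    {'response': {'status': 404}},
    {'response': {'status': 503}},
]

# As posted: needs status <= 400 and status >= 499, so nothing matches.
as_posted = list(filter(
    lambda x: status_code_min >= int(x['response']['status'])
    and status_code_max <= int(x['response']['status']),
    entries))

# With both comparisons flipped, the same one-argument lambda matches 4xx.
flipped = list(filter(
    lambda x: status_code_min <= int(x['response']['status'])
    and int(x['response']['status']) <= status_code_max,
    entries))

print(as_posted)  # []
print(flipped)    # [{'response': {'status': 404}}]
```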

@craSH
Author

craSH commented Jun 13, 2020

Hi @sruti842 ! Ping on a nice old gist :)

You can use Python's in range(..) test as the lambda's condition here, like so:

status_min = 400
status_max = 499

matching_entries = filter(lambda x: int(x['response']['status']) in range(status_min, status_max + 1), har['log']['entries'])

That should return anything with status codes between 400 and 499, inclusive. Keep in mind that the upper bound to range() is excluded, so you must add one in order to have that behavior. Adjust as desired for your coding style. Here's a little example that you can just run in python or ipython to see the pattern in action:

Python 2.7.17 (default, Nov  7 2019, 10:07:09)
Type "copyright", "credits" or "license" for more information.

IPython 5.9.0 -- An enhanced Interactive Python.

In [1]: status_min = 400
In [2]: status_max = 499
In [3]: x = { 'response': { 'status': 110 } }
In [4]: y = { 'response': { 'status': 450 } }
In [5]: z = { 'response': { 'status': 503 } }
In [6]: filter(lambda x: int(x['response']['status']) in range (status_min, status_max + 1), [x, y, z])

Out[6]: [{'response': {'status': 450}}]
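
One caveat if you try the session above on Python 3: filter() returns a lazy iterator there rather than a list, so Out[6] would show a filter object until it is wrapped in list(). A list comprehension with a chained comparison sidesteps that and avoids the status_max + 1 adjustment. This is only a sketch; the helper name in_status_range and the entries list are made up for illustration.

```python
status_min = 400
status_max = 499

# Hypothetical entries mirroring the x, y, z dicts from the session above.
entries = [
    {'response': {'status': 110}},
    {'response': {'status': 450}},
    {'response': {'status': 503}},
]


def in_status_range(entry, low=status_min, high=status_max):
    """True when the entry's status lies in [low, high], both bounds included."""
    return low <= int(entry['response']['status']) <= high


matching_entries = [e for e in entries if in_status_range(e)]
print(matching_entries)  # [{'response': {'status': 450}}]
```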

@sruti842

sruti842 commented Jun 13, 2020

Thank you @craSH. It worked for what I was implementing!
