@N0taN3rd
Created June 14, 2018 17:57
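from pywb.warcserver.index.cdxobject import CDXObject


def read_cdxj(path):
    """Print the URL of every HTML capture with a 200 status in a CDXJ index."""
    # CDXObject parses raw bytes, so the index is opened in binary mode.
    with open(path, 'rb') as cdxjin:
        for line in cdxjin:
            cdx = CDXObject(line)
            # 'mime' may be missing; fall back to '' so the substring test cannot raise.
            mime = cdx.get('mime') or ''
            if 'html' in mime and cdx.get('status') == '200':
                print(cdx.get('url'))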
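
For trying the same filter without pywb installed, here is a minimal stdlib-only sketch. It assumes each CDXJ line has the layout `<urlkey> <timestamp> <json>` and that the JSON block carries `url`, `mime`, and `status` as strings; the helper name `html_ok_urls` and the sample lines are hypothetical and not part of the original gist or of pywb.

```python
import json


def html_ok_urls(lines):
    """Yield URLs of HTML captures with status 200 from CDXJ lines (bytes or str)."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode('utf-8')
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: "<urlkey> <timestamp> <json>"
        parts = line.split(' ', 2)
        if len(parts) < 3:
            continue  # not a CDXJ line in the assumed layout
        fields = json.loads(parts[2])
        mime = fields.get('mime') or ''
        if 'html' in mime and fields.get('status') == '200':
            yield fields.get('url')


# Hypothetical sample records, made up for illustration only.
sample = [
    'com,example)/ 20180614175700 {"url": "http://example.com/", "mime": "text/html", "status": "200"}',
    'com,example)/logo.png 20180614175701 {"url": "http://example.com/logo.png", "mime": "image/png", "status": "200"}',
    'com,example)/gone 20180614175702 {"url": "http://example.com/gone", "mime": "text/html", "status": "404"}',
]
print(list(html_ok_urls(sample)))
```

Returning a generator instead of printing inside the loop lets the caller decide whether to print, count, or write the URLs to a file.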