@edsu
Created July 26, 2023 16:01
A little example of writing files as resource records to a WARC file.
from warcio.warcwriter import WARCWriter

with open('test.warc.gz', 'wb') as output:
    writer = WARCWriter(output, gzip=True)

    # write some metadata for the WARC as a warcinfo record
    rec = writer.create_warcinfo_record('test.warc.gz', {
        'software': 'warcio',
        'description': 'An example of packaging up two images in a WARC'
    })
    writer.write_record(rec)

    # add image1.jpeg to the WARC file as a resource record;
    # the payload file stays open until the record has been written
    with open('image1.jpeg', 'rb') as payload:
        rec = writer.create_warc_record(
            'file:image1.jpeg',
            record_type='resource',
            warc_content_type='image/jpeg',
            payload=payload
        )
        writer.write_record(rec)

    # add image2.jpeg to the WARC file as a resource record
    with open('image2.jpeg', 'rb') as payload:
        rec = writer.create_warc_record(
            'file:image2.jpeg',
            record_type='resource',
            warc_content_type='image/jpeg',
            payload=payload
        )
        writer.write_record(rec)
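
The script above hard-codes the record URI and the `image/jpeg` content type for each file. For a longer or mixed list of files, one possible approach is to derive both from the file name. The sketch below uses only the standard library; `resource_record_args` is a hypothetical helper (not part of warcio), and both the `file:<name>` URI form (copied from the gist's convention) and the `application/octet-stream` fallback are assumptions.

```python
import mimetypes
from pathlib import Path


def resource_record_args(path):
    """Return (uri, content_type) for a local file.

    Hypothetical helper: the URI mirrors the 'file:<name>' form used in
    the script above, and the content type is guessed from the extension.
    """
    name = Path(path).name
    content_type, _encoding = mimetypes.guess_type(name)
    # Assumption: fall back to a generic binary type when the extension is unknown.
    return f"file:{name}", content_type or "application/octet-stream"


# Show what would be used for a few example paths.
for path in ["image1.jpeg", "photos/image2.jpeg", "notes.txt"]:
    uri, content_type = resource_record_args(path)
    print(uri, content_type)
```

The two returned values could then be passed as the first argument and as `warc_content_type` to `writer.create_warc_record`, inside a loop over the files to package.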