Example WARC from wget
GitHub Gist anjackson/382598643813b9ebecc3, created August 1, 2015 07:57
WARC/1.0
WARC-Type: warcinfo
Content-Type: application/warc-fields
WARC-Date: 2015-07-31T16:32:22Z
WARC-Record-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Filename: humans.warc.gz
WARC-Block-Digest: sha1:AARITJBDT4LFDLBOUU63IJAD2MK7WFL3
Content-Length: 241

software: Wget/1.16.3 (darwin14.1.0)
format: WARC File Format 1.0
conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
robots: classic
wget-arguments: "--warc-file=humans" "https://www.google.com/humans.txt"
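The warcinfo block above is an application/warc-fields body. It holds one "name: value" pair per line, using the same syntax as the record headers. The following is a minimal Python sketch that splits such a block into a dictionary. It assumes one field per line and no folded continuation lines. parse_warc_fields is a hypothetical helper and is not part of wget or of any WARC library.

```python
def parse_warc_fields(block: str) -> dict:
    """Split an application/warc-fields block into a name -> value dict.

    Sketch only: assumes one "name: value" pair per line, no folded lines.
    """
    fields = {}
    for line in block.splitlines():  # accepts both CRLF and LF line endings
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first colon only, so URL values such as
        # "http://..." keep their own colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


warcinfo_block = (
    "software: Wget/1.16.3 (darwin14.1.0)\r\n"
    "format: WARC File Format 1.0\r\n"
    "conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf\r\n"
    "robots: classic\r\n"
    'wget-arguments: "--warc-file=humans" "https://www.google.com/humans.txt"\r\n'
)

info = parse_warc_fields(warcinfo_block)
print(info["software"])  # Wget/1.16.3 (darwin14.1.0)
print(info["robots"])    # classic
```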
WARC/1.0
WARC-Type: request
WARC-Target-URI: https://www.google.com/humans.txt
Content-Type: application/http;msgtype=request
WARC-Date: 2015-07-31T16:32:23Z
WARC-Record-ID: <urn:uuid:A69CB3BD-6FED-40A8-891A-A9D0BA5A7577>
WARC-IP-Address: 74.125.24.106
WARC-Warcinfo-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Block-Digest: sha1:A4JBGO5JITBQTSACSH347OTILSQAOEQZ
Content-Length: 154

GET /humans.txt HTTP/1.1
User-Agent: Wget/1.16.3 (darwin14.1.0)
Accept: */*
Accept-Encoding: identity
Host: www.google.com
Connection: Keep-Alive
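Content-Length counts only the bytes of the record block, not the WARC header lines. Rebuilding the request block above with CRLF line endings (as HTTP uses), plus the blank line that ends the request headers, gives exactly 154 bytes. The WARC-Block-Digest values wget writes have the form "sha1:" followed by the Base32-encoded SHA-1 of that block. The sketch below rebuilds the block and computes a digest in that form. block_digest is a hypothetical name. Whether its result matches sha1:A4JBGO5JITBQTSACSH347OTILSQAOEQZ depends on the reconstruction being byte-exact, so treat the comparison as a check rather than a given.

```python
import base64
import hashlib


def block_digest(block: bytes) -> str:
    """Return a digest in the form used here: "sha1:" + Base32(SHA-1(block))."""
    return "sha1:" + base64.b32encode(hashlib.sha1(block).digest()).decode("ascii")


# The HTTP request block, rebuilt with CRLF line endings and the blank
# line that terminates the HTTP request headers.
request_block = (
    b"GET /humans.txt HTTP/1.1\r\n"
    b"User-Agent: Wget/1.16.3 (darwin14.1.0)\r\n"
    b"Accept: */*\r\n"
    b"Accept-Encoding: identity\r\n"
    b"Host: www.google.com\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
)

print(len(request_block))           # 154, the record's Content-Length
print(block_digest(request_block))  # compare with the WARC-Block-Digest header
```

A 20-byte SHA-1 encodes to exactly 32 Base32 characters with no padding, which is why every digest in this file has the same length.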
WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:EBEED9E3-7FD9-4BBA-B82B-3C801069F459>
WARC-Warcinfo-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Concurrent-To: <urn:uuid:A69CB3BD-6FED-40A8-891A-A9D0BA5A7577>
WARC-Target-URI: https://www.google.com/humans.txt
WARC-Date: 2015-07-31T16:32:23Z
WARC-IP-Address: 74.125.24.106
WARC-Block-Digest: sha1:TXMPRCEB7AODZHBD5C27W6VG6ZJSOVDB
WARC-Payload-Digest: sha1:VG5HLEL4BBSYEVABD3URLB5ZW7O3XBOY
Content-Type: application/http;msgtype=response
Content-Length: 687

HTTP/1.1 200 OK
Vary: Accept-Encoding
Content-Type: text/plain
Last-Modified: Tue, 11 Mar 2014 22:11:10 GMT
Date: Fri, 31 Jul 2015 16:32:23 GMT
Expires: Fri, 31 Jul 2015 16:32:23 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
Server: sffe
X-XSS-Protection: 1; mode=block
Alternate-Protocol: 443:quic,p=1
Accept-Ranges: none
Transfer-Encoding: chunked

11e
Google is built by a large team of engineers, designers, researchers, robots, and others in many different sites across the globe. It is updated continuously, and built with more tools and technologies than we can shake a stick at. If you'd like to help us out, see google.com/careers.
0
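The response block keeps the HTTP message as it came off the wire, so the body is still in chunked transfer coding (Transfer-Encoding: chunked). Each chunk begins with its size in hexadecimal. Here 11e is 0x11E = 286 bytes, the same figure the wget log reports as saved, and the zero-size chunk marks the end. The following is a minimal decoder that assumes a well-formed body. dechunk is a hypothetical helper: it skips chunk extensions and does not parse trailer fields.

```python
def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body.

    Sketch only: chunk extensions are ignored and trailer fields are not parsed.
    """
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            break  # last chunk reached
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


print(int("11e", 16))  # 286
print(dechunk(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"))  # b'hello world'
```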
WARC/1.0
WARC-Type: metadata
WARC-Record-ID: <urn:uuid:B0B3862C-B271-4670-A4B5-B127576C6118>
WARC-Warcinfo-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Target-URI: metadata://gnu.org/software/wget/warc/MANIFEST.txt
WARC-Date: 2015-07-31T16:32:23Z
WARC-Block-Digest: sha1:NEIRM547MH3YUQMT75OCLPB7ERKNBQHL
Content-Type: text/plain
Content-Length: 48

<urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC/1.0
WARC-Type: resource
WARC-Record-ID: <urn:uuid:2491AF6D-D1AA-4072-8893-4C3DF2C6E0AF>
WARC-Warcinfo-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Concurrent-To: <urn:uuid:B0B3862C-B271-4670-A4B5-B127576C6118>
WARC-Target-URI: metadata://gnu.org/software/wget/warc/wget_arguments.txt
WARC-Date: 2015-07-31T16:32:23Z
WARC-Block-Digest: sha1:PXARGWNHTQPELVXL6XZFQWOCBB5KIGIQ
Content-Type: text/plain
Content-Length: 58

"--warc-file=humans" "https://www.google.com/humans.txt"
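Records in this file refer to one another by WARC-Record-ID rather than by position. Every record after the warcinfo names it in WARC-Warcinfo-ID, and the response names its request in WARC-Concurrent-To. Both resource records (wget_arguments.txt and wget.log) name the metadata record the same way. The following sketch indexes those WARC-Concurrent-To links. It uses the IDs from this file, shortened to their first block for readability, and concurrent_groups is a hypothetical helper.

```python
from collections import defaultdict

# (WARC-Type, WARC-Record-ID, WARC-Concurrent-To or None) for each record in
# this file; UUIDs are shortened to their first block for readability.
records = [
    ("warcinfo", "CD4DD5EA", None),
    ("request", "A69CB3BD", None),
    ("response", "EBEED9E3", "A69CB3BD"),
    ("metadata", "B0B3862C", None),
    ("resource", "2491AF6D", "B0B3862C"),  # wget_arguments.txt
    ("resource", "ACDDCC95", "B0B3862C"),  # wget.log
]


def concurrent_groups(recs):
    """Map each record ID to the IDs of records declaring themselves concurrent to it."""
    groups = defaultdict(list)
    for _warc_type, rec_id, concurrent_to in recs:
        if concurrent_to is not None:
            groups[concurrent_to].append(rec_id)
    return dict(groups)


print(concurrent_groups(records))
# {'A69CB3BD': ['EBEED9E3'], 'B0B3862C': ['2491AF6D', 'ACDDCC95']}
```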
WARC/1.0
WARC-Type: resource
WARC-Record-ID: <urn:uuid:ACDDCC95-8802-432E-991F-2B4F1037A63B>
WARC-Warcinfo-ID: <urn:uuid:CD4DD5EA-710A-43A4-9E75-2238B9664926>
WARC-Concurrent-To: <urn:uuid:B0B3862C-B271-4670-A4B5-B127576C6118>
WARC-Target-URI: metadata://gnu.org/software/wget/warc/wget.log
WARC-Date: 2015-07-31T16:32:23Z
WARC-Block-Digest: sha1:XREPNKKJBBSGFL37GKDOEJFI2DMVZBMT
Content-Type: text/plain
Content-Length: 472

Opening WARC file 'humans.warc.gz'.
--2015-07-31 17:32:22-- https://www.google.com/humans.txt
Resolving www.google.com... 74.125.24.106, 74.125.24.105, 74.125.24.99, ...
Connecting to www.google.com|74.125.24.106|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: 'humans.txt'
0K 6.49M=0s
2015-07-31 17:32:23 (6.49 MB/s) - 'humans.txt' saved [286]
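The file was written as humans.warc.gz (see WARC-Filename in the warcinfo record). Compressed WARC files are conventionally written with each record as its own gzip member. Python's gzip module reads concatenated members as a single stream. As a result, a reader only has to walk the uncompressed records in order:

1. the version line,
2. the header lines up to a blank line,
3. Content-Length bytes of block,
4. the CRLF CRLF separator.

The following is a minimal standard-library sketch that assumes well-formed records without folded header lines. iter_warc_records and make_record are hypothetical helpers. The demo builds a tiny two-record archive in memory rather than reading the real file.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, block) pairs from a binary stream of uncompressed WARC records.

    Sketch only: assumes well-formed records and no folded header lines.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of input
        if not version.strip():
            continue  # tolerate extra blank lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the WARC header
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        block = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # skip the CRLF CRLF that closes each record
        yield headers, block


def make_record(warc_type: str, block: bytes) -> bytes:
    """Hypothetical helper: serialize a minimal WARC record for the demo below."""
    head = f"WARC/1.0\r\nWARC-Type: {warc_type}\r\nContent-Length: {len(block)}\r\n\r\n"
    return head.encode("ascii") + block + b"\r\n\r\n"


# Two records, each compressed as its own gzip member, then concatenated.
archive = b"".join(
    gzip.compress(make_record(t, b))
    for t, b in [("warcinfo", b"robots: classic\r\n"), ("resource", b"hello\n")]
)

with gzip.open(io.BytesIO(archive), "rb") as f:
    for headers, block in iter_warc_records(f):
        print(headers["WARC-Type"], len(block))
# warcinfo 17
# resource 6
```

To read the actual file, pass gzip.open("humans.warc.gz", "rb") to iter_warc_records instead. For anything beyond a quick look, a maintained WARC library such as warcio handles the many edge cases this sketch ignores.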