Show notes for the 10th Anniversary of HTTP Archive episode of the State of the Web podcast

HTTP Archive's 10th Anniversary - The State of the Web

Published November 19, 2020

Rick meets with Steve Souders, who created the HTTP Archive project 10 years ago this month, to talk about its origins and reflect on its growth. They're also joined by Patrick Meenan, creator of WebPageTest and a maintainer of HTTP Archive, and Paul Calvano, a past State of the Web guest and fellow maintainer of HTTP Archive.

Links to resources discussed in this episode:

---------- Forwarded message ---------
From: Meenan, Patrick
Date: Wed, Sep 29, 2010 at 6:23 AM
Subject: RE: 1000 HAR files?
To: Steve Souders
-r:9 means run #9 and -c:0 means first view (cached = 0). If you leave them off it will pull the median run automatically anyway so you should be good.
Thanks,
-Pat
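Pat's explanation of the `-r`/`-c` parameters can be sketched as a small helper that builds the three result URLs quoted below from a test ID. This is an illustrative sketch based only on the URL patterns in this thread, not on WebPageTest's documented API; the function and parameter names are my own.

```python
# Sketch: build WebPageTest result URLs from a test ID, following the
# URL patterns quoted in this email thread. -r:N selects run N and
# -c:0 selects first view (cached = 0); omitting both makes the
# video-compare page fall back to the median run automatically.
BASE = "http://www.webpagetest.org"

def result_urls(test_id, run=None, cached=0):
    urls = {"summary": f"{BASE}/result/{test_id}/"}
    if run is None:
        # Median run is selected automatically when -r/-c are omitted.
        urls["video"] = f"{BASE}/video/compare.php?tests={test_id}"
    else:
        urls["details"] = f"{BASE}/result/{test_id}/{run}/details/"
        urls["video"] = f"{BASE}/video/compare.php?tests={test_id}-r:{run}-c:{cached}"
    return urls
```

For example, `result_urls("100927_64ZW", run=9)` reproduces the three Walmart URLs discussed below.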
-----Original Message-----
From: Steve Souders
Sent: Wednesday, September 29, 2010 12:22 PM
To: Meenan, Patrick
Subject: Re: 1000 HAR files?
This is awesome.
I need to decide what to save in my DB to point back to the results. I'm thinking I'll just save the "Test ID". For example, Walmart's test id is 100927_64ZW. Here are the three URLs associated with Walmart:
http://www.webpagetest.org/result/100927_64ZW/
http://www.webpagetest.org/result/100927_64ZW/9/details/
http://www.webpagetest.org/video/compare.php?tests=100927_64ZW-r:9-c:0
All of those are easy to generate given a test id except for the last one. What is "-r:9-c:0"? Why do those params vary for different sites (eg "r:1-c:0")? What is "lost" if you drop those params and just do this
URL:
http://www.webpagetest.org/video/compare.php?tests=100927_64ZW
You can play with it here. DON'T SHARE THIS URL!
http://stevesouders.com/webipa/
"Dinoquery" is a funky toy of mine. I can explain more on the phone. For now, let me just mention I have two MySQL table: "pages" and "requests".
Talk to you later.
-Steve
On 9/28/2010 12:58 PM, Meenan, Patrick wrote:
> I updated the zip file with the latest. I could get 5 more to run so it's up to 950 but the remaining ones all look to be broken pages (no DNS entry, connections fail, 400's or 500's).
>
> http://www.webpagetest.org/software/har.zip
>
> The results.txt file has the individual results. If you look for the urls that have a test ID but no results url those are the urls that failed (and I re-submitted the missing tests 3-4 times to get as many stragglers as possible). If you want to change some of the urls let me know.
>
> Thanks,
>
> -Pat
>
> -----Original Message-----
> From: Meenan, Patrick
> Sent: Tuesday, September 28, 2010 1:00 PM
> To: 'Steve Souders'
> Subject: RE: 1000 HAR files?
>
> http://www.webpagetest.org/software/har.zip
>
> There are 945 .har files as well as a results.txt file that has the data for the 995 tests. I'll work on filling in the remaining 50.
>
> Thanks,
>
> -Pat
>
> -----Original Message-----
> From: Steve Souders
> Sent: Tuesday, September 28, 2010 12:04 PM
> To: Meenan, Patrick
> Subject: Re: 1000 HAR files?
>
> Awesome.
>
> Is there a way I could start on a hundred or so - if it's not too hard to package up? Or is there a scriptable way for me to find the HAR files myself and download them?
>
> I'd like to start importing them to see if my MySQL code is good.
>
> -Steve
>
> On 9/28/2010 8:30 AM, Meenan, Patrick wrote:
>> Just a quick update. Should have results later this afternoon. The tests actually ran all night and just finished. Looks like there were 71 urls that I needed to re-test so I just re-submitted those. When it's all finished I'll package it up and ship it over.
>>
>> Thanks,
>>
>> -Pat
>>
>> -----Original Message-----
>> From: Steve Souders
>> Sent: Monday, September 27, 2010 11:05 PM
>> To: Meenan, Patrick
>> Subject: Re: 1000 HAR files?
>>
>> Just the median. Thanks!
>>
>> -Steve
>>
>> On 9/27/2010 6:04 PM, Meenan, Patrick wrote:
>>> Do you want har files for just the median run or hars that have all 9 runs for each test in them? Just checking quickly before I write the code that pulls them down.
>>>
>>> Thanks,
>>>
>>> -Pat
>>>
>>> -----Original Message-----
>>> From: Steve Souders
>>> Sent: Monday, September 27, 2010 6:25 PM
>>> To: Meenan, Patrick
>>> Subject: Re: 1000 HAR files?
>>>
>>> That's awesome. Thanks, Pat!
>>>
>>> -Steve
>>>
>>> On 9/27/2010 3:20 PM, Meenan, Patrick wrote:
>>>> Cool. Tests have been submitted and are running. Looks like there were 995 valid urls in the file and with 9 runs each it will probably be running well into the night. I'll shoot you the results when it finishes as well as the har files for all of the median runs.
>>>>
>>>> Thanks,
>>>>
>>>> -Pat
>>>>
>>>> -----Original Message-----
>>>> From: Steve Souders
>>>> Sent: Monday, September 27, 2010 5:57 PM
>>>> To: Meenan, Patrick
>>>> Subject: Re: 1000 HAR files?
>>>>
>>>> FYI - I'm going to put all the data into a MySQL DB and make it publicly available, so people can see, for example, what percentage of scripts are gzipped?
>>>>
>>>> -Steve
>>>>
>>>>
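The "percentage of scripts that are gzipped" query Steve describes (against the "pages" and "requests" tables he mentions) might look something like the sketch below. This uses an in-memory SQLite stand-in for his MySQL database, and the column names (`mimeType`, `resp_content_encoding`) are assumptions for illustration, not the actual HTTP Archive schema.

```python
import sqlite3

# Hypothetical sketch of the kind of analysis Steve describes: load
# request rows into a "requests" table, then compute what percentage
# of script responses were served with gzip. Column names are
# illustrative assumptions, not the real HTTP Archive schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE requests (
    url TEXT, mimeType TEXT, resp_content_encoding TEXT)""")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [("http://a.com/x.js", "application/javascript", "gzip"),
     ("http://b.com/y.js", "application/javascript", None),
     ("http://c.com/z.css", "text/css", "gzip")])

# SUM of a boolean expression counts matching rows (NULLs are skipped).
pct = conn.execute("""
    SELECT 100.0 * SUM(resp_content_encoding = 'gzip') / COUNT(*)
    FROM requests
    WHERE mimeType LIKE '%javascript%'
""").fetchone()[0]
print(pct)  # 50.0
```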
>>>> On 9/27/2010 2:55 PM, Steve Souders wrote:
>>>>> I need HAR for the HTTP headers. It would be better if they
>>>>> were all from the same US location - then I can actually use the
>>>>> load times. I only want empty cache. I'd like video - but it's not critical.
>>>>>
>>>>> Is that doable?
>>>>>
>>>>> Once I get an ID or something, I can definitely figure out the
>>>>> URLs for other stuff - no worries there.
>>>>>
>>>>> -Steve
>>>>>
>>>>>
>>>>> On 9/27/2010 2:22 PM, Meenan, Patrick wrote:
>>>>>> Should be back to a PC in 15 min. Right now the script submits
>>>>>> the urls for testing across a bunch of locations for 9 runs with
>>>>>> video capture and first view only. The result is a csv file with
>>>>>> the urls, test id and some data (median run and link to the results I think).
>>>>>> With the test id and median run everything basically uses a well
>>>>>> defined URL structure so it is easy to do anything.
>>>>>>
>>>>>> Do you want a bunch of locations or just Dulles (and do you want
>>>>>> video and/or repeat view)?
>>>>>>
>>>>>> The API tests run at a lower priority so even a huge batch won't
>>>>>> impact users using the site.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -Pat
>>>>>>
>>>>>> On Sep 27, 2010, at 5:01 PM, "Steve Souders" wrote:
>>>>>>
>>>>>>> http://stevesouders.com/misc/fortune1000.txt
>>>>>>>
>>>>>>> I'm happy to run a script to find when they're done. Does that
>>>>>>> also provide some URL to the results?
>>>>>>>
>>>>>>> -Steve
>>>>>>>
>>>>>>>
>>>>>>> On 9/27/2010 1:47 PM, Meenan, Patrick wrote:
>>>>>>>> doh, nm - just realized the tld's are actually hyperlinks :-)
>>>>>>>>
>>>>>>>> Any chance you can dump it as just a list of urls with one url
>>>>>>>> per line? Something like:
>>>>>>>>
>>>>>>>> http://www.google.com/
>>>>>>>> http://www.aol.com/
>>>>>>>>
>>>>>>>> That way I could literally feed it into the existing script.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -Pat
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Steve Souders
>>>>>>>> Sent: Monday, September 27, 2010 4:42 PM
>>>>>>>> To: Meenan, Patrick
>>>>>>>> Cc: Ryan Hickman
>>>>>>>> Subject: Re: 1000 HAR files?
>>>>>>>>
>>>>>>>> (and I can reformat that list of URLs into any format
>>>>>>>> you
>>>>>>>> need)
>>>>>>>>
>>>>>>>> On 9/27/2010 1:40 PM, Steve Souders wrote:
>>>>>>>>> That would be awesome!
>>>>>>>>>
>>>>>>>>> Here's the list of URLs:
>>>>>>>>> http://stevesouders.com/misc/fortune1000.php
>>>>>>>>>
>>>>>>>>> Some might not work - I'll deal with that.
>>>>>>>>>
>>>>>>>>> What does "let you know" mean - email? I'm just wondering how
>>>>>>>>> to get the 1000 HAR URLs back.
>>>>>>>>>
>>>>>>>>> -Steve
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 9/27/2010 1:38 PM, Meenan, Patrick wrote:
>>>>>>>>>> That should be fine (though thanks for the heads-up so I
>>>>>>>>>> don't block your IP :-)). I gave Ryan a php script that can
>>>>>>>>>> take a list of urls, submit them for testing and then another
>>>>>>>>>> one to let you know when the tests were all finished (for the
>>>>>>>>>> bulk video testing). Adding the logic to pull the HAR's for
>>>>>>>>>> each one shouldn't be difficult (I can modify it if you'd like).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -Pat
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Steve Souders
>>>>>>>>>> Sent: Monday, September 27, 2010 4:34 PM
>>>>>>>>>> To: Meenan, Patrick
>>>>>>>>>> Subject: 1000 HAR files?
>>>>>>>>>>
>>>>>>>>>> Hi, Pat.
>>>>>>>>>>
>>>>>>>>>> Just FYI - I need to generate a HAR file for the Fortune 1000.
>>>>>>>>>> I tried doing that as one huge file (using HttpWatch) but my
>>>>>>>>>> Perl decode_json has never finished running. So instead I'm
>>>>>>>>>> going to try and figure a web API way of doing that with WebPagetest.org.
>>>>>>>>>> If that's a problem, just let me know.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> -Steve
>>>>>>>>>>
>>>>>>>>>>
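The workflow in the thread above (submit a list of URLs for 9 first-view runs, poll for completion, then pull the HAR for each finished test) could be sketched against WebPageTest's HTTP API roughly as follows. The endpoint names (`runtest.php`, `testStatus.php`, `export.php`) follow WebPageTest's public API as best I know it, but the exact parameters should be treated as assumptions; this sketch only builds the request URLs and makes no network calls.

```python
import urllib.parse

# Sketch of the bulk-testing workflow from the thread: one runtest
# request per URL (9 runs, first view only, JSON response), a status
# poll per test ID, and a HAR export once the test finishes.
BASE = "http://www.webpagetest.org"

def submit_url(page_url, runs=9):
    """Request URL to submit one page for testing (first view only)."""
    params = urllib.parse.urlencode(
        {"url": page_url, "runs": runs, "fvonly": 1, "f": "json"})
    return f"{BASE}/runtest.php?{params}"

def status_url(test_id):
    """Request URL to poll whether a test has finished."""
    return f"{BASE}/testStatus.php?f=json&test={test_id}"

def har_url(test_id):
    """Request URL for the HAR export of a finished test."""
    return f"{BASE}/export.php?test={test_id}"
```

A driver script would call `submit_url` for each line of fortune1000.txt, poll `status_url` until every test reports complete, then download each `har_url`, which matches the 995-submitted / 945-returned accounting in the emails above.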
---------- Forwarded message ---------
From: Simon Perkins
Date: Thu, Feb 5, 2009 at 7:14 AM
Subject: RE: web page export format
To: Rob Campbell, Jan Odvarko
Cc: Steve Souders, Rob Campbell
Hi Rob,
I agree with your sentiments about XML and I personally find it's not that
legible anyway with larger, complex data structures. Its one great advantage
though is that every modern programming environment supports it.
We've thought about HTML output. It certainly would be useful but it's not
really an ideal format if you want to import the data into a reporting or
display tool.
Regards
Simon
-----Original Message-----
From: Rob Campbell
Sent: 05 February 2009 16:30
To: Jan Odvarko
Cc: 'Simon Perkins'; 'Steve Souders'; 'Rob Campbell'
Subject: Re: web page export format
hi!
On 5-Feb-09, at 12:00, Jan Odvarko wrote:
>> One of the problems with XML is that it gets messy if you record
>> content
>> because it normally includes lots of angle brackets and possibly
>> XML. We
>> currently use a CDATA section but that doesn't cope well with XML
>> content
>> that has CDATA sections!
> Have you experimented even with other formats like e.g. JSON?
We mentioned this in our weekly meeting yesterday. I'm not a huge fan
of XML for the reasons you mention as well as document complexity.
JSON was one proposed alternative (which lacks legibility) and HTML
another. HTML has a nice side-benefit of being immediately displayable
in a browser.
> I think that in the first phase, Firebug could try adopt the scheme
> what you
> have, which could provide some feedback about flexibility of the
> format.
Regardless of format, I like the idea of having Firebug replicate
HTTPWatch's schema. Having some consistency here may lead to some form
of standardized output.
> I am also cc-ing Rob Campbell, who is member of Firebug Working
> Group and
> also interested in possibilities how to properly export net data from
> Firebug.
Thanks!
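This format discussion is what eventually produced the HAR (HTTP Archive) format: a JSON schema rather than the XML or HTML options weighed above. For context, a minimal HAR 1.2 document has roughly the shape sketched below; only a small subset of the spec's fields is shown, and the values are invented examples.

```python
import json

# Minimal sketch of the JSON shape that the export-format discussion
# above converged on (HAR 1.2). Only a subset of fields is shown.
har = {
    "log": {
        "version": "1.2",
        "creator": {"name": "example-exporter", "version": "0.1"},
        "entries": [
            {
                "startedDateTime": "2010-09-27T15:20:00.000Z",
                "time": 123,
                "request": {
                    "method": "GET",
                    "url": "http://www.google.com/",
                    "httpVersion": "HTTP/1.1",
                    "headers": [], "queryString": [], "cookies": [],
                    "headersSize": -1, "bodySize": 0,
                },
                "response": {
                    "status": 200, "statusText": "OK",
                    "httpVersion": "HTTP/1.1",
                    "headers": [{"name": "Content-Encoding",
                                 "value": "gzip"}],
                    "cookies": [],
                    "content": {"size": 6, "mimeType": "text/html"},
                    "redirectURL": "", "headersSize": -1, "bodySize": 6,
                },
                "cache": {},
                "timings": {"send": 1, "wait": 100, "receive": 22},
            }
        ],
    }
}
# Being plain JSON neatly sidesteps the CDATA-escaping problem Simon
# describes: recorded content is just a string value.
serialized = json.dumps(har)
```

Because the payload is plain JSON, embedded markup in recorded responses needs no special escaping, which was the main objection to XML in this thread.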