@annashipman
Last active December 14, 2015 08:29
A problem with some FURLs on The National Archives

##TL;DR

The way The National Archives (TNA) resolves some FURLs means that our links to TNA return a 404 even though the page is actually present on TNA.

##Example

http://www.businesslink.gov.uk/tools is a 410

Our created archive link is http://webarchive.nationalarchives.gov.uk/20120823131012/http://www.businesslink.gov.uk/tools
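
The archive link above follows a simple pattern: the TNA web archive host, a snapshot timestamp, then the original URL. A minimal sketch of that construction, where `archiveLink` and `TNA_ARCHIVE_ROOT` are hypothetical names used for illustration and not taken from the redirector repository:

```javascript
// Hypothetical helper: builds a TNA web archive link from a snapshot
// timestamp and the original URL, following the pattern of the link above.
const TNA_ARCHIVE_ROOT = "http://webarchive.nationalarchives.gov.uk";

function archiveLink(timestamp, originalUrl) {
  // The original URL is appended as-is, scheme included.
  return `${TNA_ARCHIVE_ROOT}/${timestamp}/${originalUrl}`;
}

console.log(archiveLink("20120823131012", "http://www.businesslink.gov.uk/tools"));
```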

This returns a 302. New location: http://webarchive.nationalarchives.gov.uk/20120823131012/http://www.businesslink.gov.uk/tools/

This returns a 200. HOWEVER, the page returned includes some JavaScript that replaces the location according to some conditions. (Full response here https://gist.github.com/annashipman/5057991).

The problematic line is this one: location.replace(""+target_domain_name+"/bdotg/action/detail?type=ONEOFFPAGE&itemId=1084515656&furlname=tools&furlparam=tools" + queryParams)

target_domain_name is http://webarchive.nationalarchives.gov.uk/20120823131012/http://www.businesslink.gov.uk

If there were no queryParams, the location would be set to this:

http://webarchive.nationalarchives.gov.uk/20120823131012/http://www.businesslink.gov.uk/bdotg/action/detail?type=ONEOFFPAGE&itemId=1084515656&furlname=tools&furlparam=tools

which is a 200 and is in fact the actual BusinessLink page that the FURL redirects to. (If this worked, there would be no need for our Archive Link solution for FURLs, which starts at line 75 here: https://github.com/alphagov/redirector/blob/master/tools/generate_410.sh)

EXCEPT:

Here is where queryParams is created:

    if ( referrer.length > 0 )
    {
      queryParams = queryParams +  "&" + referrer;
    }
    else
    {
      referrer = new String(document.referrer);
      queryParams = queryParams + "&ref=" + URLencode(referrer);
    }

    if ( domain.length > 0 )
    {
      queryParams = queryParams + "&" +  domain;
    }

This means the location is actually set to http://webarchive.nationalarchives.gov.uk/20120823131012/http://www.businesslink.gov.uk/bdotg/action/detail?type=ONEOFFPAGE&itemId=1084515656&furlname=tools&furlparam=tools&ref=&domain=webarchive.nationalarchives.gov.uk

which is a 404.
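
To make the failure concrete, the sketch below replays the excerpt's logic outside the browser. It relies on values the excerpt does not show, so treat them as assumptions: that `queryParams` and `referrer` start as empty strings, that `document.referrer` is empty, that `domain` holds `domain=webarchive.nationalarchives.gov.uk`, and that TNA's `URLencode` behaves like `encodeURIComponent` for this input. `buildLocation` is a hypothetical wrapper name.

```javascript
const targetDomainName =
  "http://webarchive.nationalarchives.gov.uk/20120823131012/http://www.businesslink.gov.uk";
const furlPath =
  "/bdotg/action/detail?type=ONEOFFPAGE&itemId=1084515656&furlname=tools&furlparam=tools";

// Hypothetical wrapper around the excerpt's queryParams logic.
function buildLocation({ referrer, documentReferrer, domain }) {
  let queryParams = ""; // assumed initial value, not shown in the excerpt

  if (referrer.length > 0) {
    queryParams = queryParams + "&" + referrer;
  } else {
    // encodeURIComponent stands in for the page's URLencode (assumption).
    queryParams = queryParams + "&ref=" + encodeURIComponent(documentReferrer);
  }

  if (domain.length > 0) {
    queryParams = queryParams + "&" + domain;
  }

  // Mirrors the location.replace(...) argument in the page.
  return "" + targetDomainName + furlPath + queryParams;
}

const actualLocation = buildLocation({
  referrer: "",
  documentReferrer: "",
  domain: "domain=webarchive.nationalarchives.gov.uk", // assumed value
});
console.log(actualLocation);
```

Under these assumptions the result matches the 404ing URL. Note that the `else` branch appends `&ref=` even when there is no referrer at all, so the redirect target never comes out as the bare URL that returns a 200.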

##Conclusion

This page exists on TNA: http://webarchive.nationalarchives.gov.uk/20120823131012/http://www.businesslink.gov.uk/bdotg/action/detail?type=ONEOFFPAGE&itemId=1084515656&furlname=tools&furlparam=tools

It is the correct destination for this TNA link: http://webarchive.nationalarchives.gov.uk/20120823131012/http://www.businesslink.gov.uk/tools (note this part of the destination URL: &furlname=tools). Despite that, the link doesn't work, because the JavaScript on the intermediate page appends extra query parameters and the resulting URL is a 404.
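
The difference between the URL that exists on TNA and the one the page actually redirects to comes down to the appended query parameters. As a rough illustration, the sketch below compares the two URLs quoted in this write-up using the standard `URL` API; `extraParams` is a hypothetical helper name.

```javascript
const workingUrl =
  "http://webarchive.nationalarchives.gov.uk/20120823131012/http://www.businesslink.gov.uk/bdotg/action/detail?type=ONEOFFPAGE&itemId=1084515656&furlname=tools&furlparam=tools";
const brokenUrl = workingUrl + "&ref=&domain=webarchive.nationalarchives.gov.uk";

// Hypothetical helper: lists query parameter names present in the
// second URL but absent from the first.
function extraParams(baseUrl, otherUrl) {
  const known = new Set(new URL(baseUrl).searchParams.keys());
  return [...new URL(otherUrl).searchParams.keys()].filter((key) => !known.has(key));
}

console.log(extraParams(workingUrl, brokenUrl));
```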
