@foolip
Last active May 24, 2019 10:29

Use of Accept-Language

Sites were sampled from the Majestic Million, 10 sites per order of magnitude of rank. This was done based on the suspicion that use of the header differs in the long tail of sites.
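The exact sampling procedure isn't recorded here, so the following is only a minimal sketch of one way to draw 10 sites per order of magnitude from a ranked list. The function name `sample_by_magnitude` and its parameters are hypothetical, and the input is assumed to be a plain list of domains ordered by rank.

```python
import random


def sample_by_magnitude(ranked_domains, per_bucket=10, seed=None):
    """Pick `per_bucket` random domains from each order-of-magnitude rank bucket.

    `ranked_domains[0]` is assumed to be rank 1. Buckets start at rank 10 to
    mirror the buckets used in this write-up (10-99, 100-999, ...).
    Returns a dict mapping (low, high) rank bounds to a list of (rank, domain).
    """
    rng = random.Random(seed)
    samples = {}
    low = 10
    while low <= len(ranked_domains):
        high = min(low * 10 - 1, len(ranked_domains))
        # Ranks are 1-based, list indexes are 0-based.
        bucket = [(rank, ranked_domains[rank - 1]) for rank in range(low, high + 1)]
        samples[(low, low * 10 - 1)] = rng.sample(bucket, min(per_bucket, len(bucket)))
        low *= 10
    return samples
```

With the full Majestic Million list this would yield five buckets, matching the five rank ranges reported below.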

Testing was done in Sweden with different Accept-Language headers based on informed guesses about what languages the sites might support. Between any two attempts on the same site, cookies were cleared to avoid a previous language being remembered, but it is still possible that the server kept state based on IP that affected the results.
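These notes don't say which tool was used for the requests. As a rough scripted equivalent, assuming Python's standard `urllib`, each attempt could use a brand-new cookie jar so that no language choice carries over between attempts. The helper names `build_request`, `fresh_opener`, and `fetch` are made up for illustration.

```python
import urllib.request
from http.cookiejar import CookieJar


def build_request(url, accept_language):
    """Create a GET request carrying the given Accept-Language header."""
    return urllib.request.Request(url, headers={"Accept-Language": accept_language})


def fresh_opener():
    """Return an opener with an empty cookie jar (nothing remembered from earlier attempts)."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(url, accept_language, timeout=10):
    """Fetch `url` once with clean cookie state.

    Returns the final URL after redirects, the Content-Language response
    header (often absent), and the raw body.
    """
    opener, _ = fresh_opener()
    with opener.open(build_request(url, accept_language), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get("Content-Language"), resp.read()
```

Something like `fetch("https://doodle.com/", "zh,fr;q=0.9")` would then show whether a redirect happened. A script like this does not run JavaScript, so it would miss things such as GDPR prompts, and it cannot rule out server-side state keyed on IP.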

Summary

By rank (buckets of 10):

  • 10-99: 2/10 sites fully localized, 2/10 localized GDPR prompts, 2/10 used physical location/IP.
  • 100-999: 2/10 sites localized. 1/10 localized GDPR prompt and failed to localize content despite having it.
  • 1000-9999: 2/10 sites localized. 2/10 failed to localize despite having the content.
  • 10000-99999: 0/10 localized. 2nd sample taken, still 0/10.
  • 100000-999999: 1/10 localized, but on a pretty broken site.

Reflections:

  • High-quality localization appears only in the top-ranked buckets. Not surprising, because it takes a lot of resources.
  • Quite a few sites had translations but seemed to ignore the header, offering in-page controls instead.
  • Third-party widgets for GDPR prompts were localized. Perhaps they use navigator.language?

Details

Rank 10-99

forbes.com (54)

Used pt-BR,pt;q=0.9,en;q=0.8. Main content still served in English, but GDPR prompt partially in Portuguese:

[Screenshot: www.forbes.com]

Forbes has other sites like https://www.forbes.com.br/ (Brazil/Portuguese) but doesn't redirect to them.

mit.edu (87)

Used zh-CN,zh;q=0.9,pt-BR;q=0.8,pt;q=0.7,fr;q=0.6. No indication that anything is done with the header.

godaddy.com (31)

Used zh-CN,zh;q=0.9,pt-BR;q=0.8,pt;q=0.7,fr;q=0.6. Redirected to se.godaddy.com (Swedish) presumably based on IP, even though https://br.godaddy.com/ and https://fr.godaddy.com/ are available.

itunes.apple.com (13)

Used zh-CN,zh;q=0.9,pt-BR;q=0.8,pt;q=0.7,fr;q=0.6. Site still served in English but with banner at top in Swedish suggesting to switch country.

[Screenshot: www.apple.com/itunes/]

t.co (46)

Twitter redirect service, served in English, but it's just a placeholder site explaining what the domain is for.

weebly.com (56)

Used zh-CN,zh;q=0.9,fr;q=0.8. Mixed English and Chinese, with the page title (not in screenshot) and the large text (an image) in Chinese. There is also some Chinese (创建您的网站) in the sign-up form, but it doesn't seem like a useful level of localization.

[Screenshot: www.weebly.com]

live.com (82)

Used zh-CN,zh;q=0.9,fr;q=0.8. Redirected to https://outlook.live.com/owa/ in simplified Chinese, fully localized.

oracle.com (70)

Used de,nl;q=0.9. Got English site with German GDPR prompt:

[Screenshot: www.oracle.com/index.html]

There is a German site that can be found through the menu: https://www.oracle.com/de/index.html

maps.google.com (21)

Used zh-CN,zh;q=0.9,fr;q=0.8. Gives simplified Chinese UI even though it shows Stockholm and thus clearly could have prioritized physical location over preferred language.

issuu.com (60)

Used zh-CN,zh;q=0.9,fr;q=0.8. Got English. Then used da,fr;q=0.9 based on the Danish presence in https://issuu.com/careers but still got English. No sign of any languages other than English being provided.

Rank 100-999

photobucket.com (223)

Used zh-CN,zh;q=0.9,fr;q=0.8,es;q=0.7. Got English site, no indication that other languages are available.

i0.wp.com (143)

Not an end user domain: https://en.forums.wordpress.com/topic/what-are-iowpcom-and-i2wpcom/

hhs.gov (617)

Used zh-CN,zh;q=0.9,fr;q=0.8,es;q=0.7. U.S. Department of Health & Human Services. Served in English with "Language Assistance Available" links at the bottom, but the site itself isn't localized.

patreon.com (694)

Used zh-CN,zh;q=0.9,fr;q=0.8,es;q=0.7. Served in English with no indication of localization being available.

pnas.org (485)

Used zh-CN,zh;q=0.9,fr;q=0.8,es;q=0.7. Served in English with no indication of localization being available.

bu.edu (525)

Used zh-CN,zh;q=0.9,fr;q=0.8,es;q=0.7. Served in English with no indication of localization being available.

sec.gov (562)

Used zh-CN,zh;q=0.9,fr;q=0.8,es;q=0.7. Served in English with no indication of localization being available.

debian.org (106)

Used zh-CN,zh;q=0.9,fr;q=0.8,es;q=0.7. Served in simplified Chinese. Links at the bottom to many other languages supported.

vice.com (277)

Used zh-CN,zh;q=0.9,fr;q=0.8,es;q=0.7. Redirects to https://www.vice.com/en_us (English) even though https://www.vice.com/fr (French) is available. Tried with just fr with the same result, except that the GDPR popup was now in French. Seems like they aren't looking at all languages in the header?

[Screenshot: www.vice.com/en_us]

scribd.com (173)

Used fr and was redirected to https://fr.scribd.com/ in French. Tried zh,fr;q=0.9 and then got https://zh.scribd.com/ in English. For a user that only knows Chinese and French, this isn't very helpful, given that a French version exists.
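To make the expected fallback concrete, here is a minimal sketch of server-side negotiation that walks the whole header in preference order rather than stopping at the first entry. It is not how any of the tested sites are implemented; `parse_accept_language` and `negotiate` are hypothetical names, the matching is a simplified exact-then-primary-subtag lookup, and the `*` wildcard is not handled.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, most preferred first."""
    prefs = []
    for index, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            prefs.append((q, index, tag))
    # Highest weight first; ties keep their order in the header.
    prefs.sort(key=lambda p: (-p[0], p[1]))
    return [tag for _, _, tag in prefs]


def negotiate(header, supported, default="en"):
    """Pick the first supported language the user accepts, else `default`."""
    by_lower = {s.lower(): s for s in supported}
    for tag in parse_accept_language(header):
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # e.g. "pt-br" falls back to "pt"
        if primary in by_lower:
            return by_lower[primary]
    return default
```

Under these assumptions, `negotiate("zh,fr;q=0.9", ["en", "fr"])` returns `"fr"`: Chinese is unavailable, so the next acceptable language wins over the English default.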

Rank 1000-9999

google.com.sg (3078)

Used zh,fr;q=0.9 and got simplified Chinese. Singapore does use simplified Chinese, but that's not the reason: visiting google.se gives the same result.

ama.org (6747)

Used zh,fr;q=0.9. Served in English with no indication of localization being available.

nexusmods.com (6951)

Used zh,fr;q=0.9. Served in English with no indication of localization being available.

doodle.com (1682)

Used zh,fr;q=0.9. Redirected to https://doodle.com/fr/ in French. The right outcome given that Chinese isn't one of the supported languages:

[Screenshot: doodle.com/fr/]

mcprc.gov.cn (6786)

Unable to open site, but is most likely in simplified Chinese given TLD.

unbounce.com (5557)

Used zh,de;q=0.9 and got English site even though https://unbounce.com/de/ exists. Changing order to de,zh;q=0.9 or keeping just de still didn't give the German site. Possibly uses IP, or just always has English as default.

ansible.com (9531)

Used de,zh;q=0.9. Served in English with no indication of localization being available.

law360.com (4322)

Used de,zh;q=0.9. Served in English with no indication of localization being available.

chaoxing.com (4264)

Used de,zh;q=0.9. Served in simplified Chinese but not because of header, the result is the same if en is used.

my1.ru (5388)

Used en-US,en;q=0.9. Got https://www.ucoz.ru/ in Russian, although English (https://www.ucoz.com/) is available:

[Screenshot: www.ucoz.ru]

Rank 10000-99999

oi.com.br (42047)

Used en-US,en;q=0.9 and got the site in Portuguese. I don't read the language, but there is no indication that other languages are available.

costadelmar.com (88512)

Used es,en;q=0.9. Served in English with no indication of localization being available. Costa Del Mar sounds Spanish but is a US company, and there's no sign of a Spanish version of the site.

stromectol.irish (91731)

Used es,en;q=0.9. Served in English with no indication of localization being available. A very simple site which might just be part of some SEO scheme.

eb.mil.br (43755)

Used es,en;q=0.9. Unable to load, but http://www.eb.mil.br/ works. Served in Portuguese, no indication that other languages are available.

sibirnerud.ru (36074)

Used es,en;q=0.9. A Russian site that is just a login form, translation says "The site is down for maintenance".

dylanlerner.com (86373)

Used es,en;q=0.9. Served in English. A site for a Music Producer, Powered by WordPress, but very little content. For WordPress sites, plugins like https://wordpress.org/plugins/polylang/ might be used.

canadianpharmaciescenter.com (19905)

Used es,en;q=0.9. Looks very spammy, all text is images. Served in English even though "ES" is one of the languages shown at the top right. Those are also just images, not clickable, just fake UI:

[Screenshot: canadianpharmaciescenter.com]

appi.org (90598)

Used es,en;q=0.9. Served in English with no indication of localization being available.

convertit.com (73453)

Used es,en;q=0.9. Served in English, looks like a placeholder site from the early 2000s.

thiepcuoicucre.com (64199)

Used es,en;q=0.9. Served a 200 OK empty response, so a white page. Googling thiepcuoicucre finds Thiệp Cưới giá cực rẻ on Facebook, someone selling Vietnamese wedding invitation cards, so the site would have been in Vietnamese. (Aside: this is a good example of small businesses using Facebook instead of a website.)

Rank 100000-999999

thearea.org (897290)

Used es,en;q=0.9. Served in English with no indication of localization being available.

chinaoceantrade.com (297944)

Used es,en;q=0.9. Served in simplified Chinese with no indication of localization being available. Tried again with zh-TW,zh;q=0.9,es;q=0.8,en;q=0.7 to probe for traditional Chinese, but still simplified.

areafour.com (850472)

Used zh-TW,zh;q=0.9,es;q=0.8,en;q=0.7. A Boston pizza place, looks very tasty. Served in English with no indication of localization being available.

incredibleplacestolive.com (146058)

Used es,en;q=0.9. Served in English, "This Page Is Under Construction" with some "related searches" links.

demol.nl (293410)

Used es,en;q=0.9. Served in Dutch, but with some menu items in English. Tried with just nl and got the same result, so not a case of partial localization, but probably intentional.

scuteristi.ro (619133)

Used de,nl;q=0.9. Can't load site.

fdchain.ink (273852)

Used nl. HTTPS cert error which if bypassed leads to a funny mix of a 404 page and a Dutch site:

[Screenshot: fdchain.ink]

Other languages are also supported, trying with de,nl;q=0.9 results in a German site. So Accept-Language is used.
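The conclusion above came from looking at the pages by eye. A rough automated version of the same comparison, assuming the site declares its language in the `lang` attribute of the `<html>` element (many sites do not, or declare it incorrectly), could look like this. The names `declared_language` and `header_seems_used` are hypothetical.

```python
from html.parser import HTMLParser


class _LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> start tag seen."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_language(html_text):
    """Return the lowercased <html lang="..."> value, or None if absent."""
    finder = _LangFinder()
    finder.feed(html_text)
    return finder.lang.lower() if finder.lang else None


def header_seems_used(pages_by_header):
    """Guess whether Accept-Language matters for a site.

    `pages_by_header` maps each Accept-Language value that was tried to the
    HTML returned for it. If the declared languages differ between attempts,
    the header (or something correlated with it) is probably being used.
    """
    langs = {declared_language(body) for body in pages_by_header.values()}
    langs.discard(None)
    return len(langs) > 1
```

A site that returns the same declared language for every header tried, such as a page that is Chinese regardless of whether `de` or `en` is sent, would come out as not using the header under this heuristic.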

lzcf6.net (434287)

Used de,nl;q=0.9. Simplified Chinese site promoting online games and gambling. Not probed for traditional Chinese, seems very unlikely given short-lived content. Says "推荐浏览器:谷歌、IE10、Firefox" at the bottom :)

findersfeefortune.com (688253)

Used de,nl;q=0.9. Can't load site.

scopemed.org (198649)

Used de,nl;q=0.9. A "Directory for Medical Articles" in English with no indication of localization being available. Articles are in English so localization would be surprising.

2nd sample of rank 10000-99999

More sites were tested because in the original sample none in this rank were localized.

icchiesainvalmalenco.gov.it (73378)

Used de,nl;q=0.9. 404 Not Found. TLD is Italian.

psi-ingenieria.com (66677)

Used de,nl;q=0.9,en;q=0.8. Spanish with no indication of localization being available.

pedromatamala.cl (85928)

Used de,nl;q=0.9,en;q=0.8. 500 Server Error.

icalshare.com (76223)

Used de,nl;q=0.9,en;q=0.8. English with no indication of localization being available.

barentsobserver.com (71850)

Used de,nl;q=0.9,en;q=0.8. Redirects to https://barentsobserver.com/en. https://barentsobserver.com/ru exists. Tried again with ru,nb;q=0.9 but still got English site.

ov2.info (58702)

Used de,nl;q=0.9,en;q=0.8. Simplified Chinese. No indication of localization being available.

rexfeatures.com (90356)

Used de,fr;q=0.9,en;q=0.8. English with no indication of localization being available.

jdxchina.com (53523)

Used de,fr;q=0.9,en;q=0.8. Simplified Chinese. There's a clear link to http://jdxchina.com/english/ but using just en still doesn't redirect to it.

liverpoolmuseums.org.uk (15314)

Used en and got English. There's a Google "Select Language" option at the top right:

[Screenshot: www.liverpoolmuseums.org.uk]

It doesn't work very well, translating only some text.

Tried with zh-CN,zh;q=0.9 and that changes the language in the Google language widget, but doesn't translate by default.

This is probably the Google Translate website widget (https://translate.google.com/intl/en/about/website/), which has been discontinued.

xj6678.net (86315)

Used en-US,en;q=0.9. Simplified Chinese with no indication of localization being available.
