Last active
August 29, 2020 06:00
Workaround for "500 Internal Server Error" returned by FDSN requests to California servers
# THE ISSUE: HTTP "500 Internal Server Error" for FDSN requests to California
# earthquake data centers (NCEDC, SCEC/SCEDC).
#
# THE CAUSE: against HTTP "best practices", and seemingly against common sense,
# California FDSN servers only accept requests from a small whitelist of
# user agents. Neither Julia nor SeisIO is on their lists, so SeisIO must
# emulate a web browser in the User Agent setting to be able to connect. This
# creates a sort of "arms race" wherein the User Agent value in SeisIO.webhdr
# must be updated whenever a web browser releases a new version. This is a
# complete waste of everyone's time: theirs, yours, and mine.
#
# THE SOLUTION: the script below sets SeisIO.webhdr to identify SeisIO as the
# latest version of Chrome when making FDSN requests.
using Pkg

# Install any missing dependencies. Julia 1.4 and later query
# Pkg.dependencies(); older versions fall back to Pkg.installed().
if VERSION < v"1.4"
  for p in ["HTTP", "Cascadia", "Gumbo", "SeisIO"]
    if get(Pkg.installed(), p, nothing) === nothing
      Pkg.add(p)
    else
      println(p * " found, not installing.")
    end
  end
else
  # Collect the names and versions of direct dependencies only
  installs = Dict{String, VersionNumber}()
  deps = Pkg.dependencies()
  for (uuid, dep) in deps
    dep.is_direct_dep || continue
    (dep.version === nothing) && continue
    installs[dep.name] = dep.version
  end
  for p in ["HTTP", "Cascadia", "Gumbo", "SeisIO"]
    if get(installs, p, nothing) === nothing
      Pkg.add(p)
    else
      println(p * " found, not installing.")
    end
  end
end

using Cascadia, Gumbo, SeisIO
using HTTP: request

"""
    set_useragent(; echo::Bool=false)

Set User Agent in {SeisIO}/src/constants.jl. If `echo=true`, only output
what would be done; don't overwrite the file.
"""
function set_useragent(; echo::Bool=false)
  cfile = joinpath(dirname(pathof(SeisIO)), "constants.jl")
  isfile(cfile) || error("Can't find constants.jl!")
  s1 = "\"User-Agent\" => \""
  s2 = "\""
  consts = readlines(cfile)

  # Scrape the current Chrome user agent strings
  r = request("GET", "https://www.whatismybrowser.com/guides/the-latest-user-agent/chrome")
  req = String(copy(r.body))
  h = parsehtml(req)
  qres = eachmatch(sel".code", h.root)
  usr_strings = [text(e[1]) for e in qres]
  #=
  Something like
  ["Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36",
  "Mozilla/5.0 (Linux; Android 8.0.0;) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.62 Mobile Safari/537.36",
  "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/78.0.3904.84 Mobile/15E148 Safari/605.1"]
  =#

  # Pick the user agent string that matches this operating system
  str = (
    if Sys.iswindows()
      "Windows"
    elseif Sys.isapple()
      "Macintosh"
    else
      "X11"
    end)
  i = findfirst(s -> occursin(str, s), usr_strings)

  # Find the line that defines webhdr and splice in the new User Agent value;
  # fall back to "Julia <version>" if no scraped string matched this OS
  j = findfirst(s -> occursin("webhdr", s), consts)
  j === nothing && error("Can't find webhdr in constants.jl!")
  wh = split(consts[j], s1, limit=2, keepempty=true)
  whh = wh[1]
  wht = split(wh[2], s2, limit=2, keepempty=true)[2]
  old_webhdr = consts[j]
  consts[j] = whh * s1 * (i === nothing ? "Julia " * Base.VERSION_STRING : usr_strings[i]) * s2 * wht

  if echo
    printstyled("To change in ", cfile, ":\n", color=:light_blue)
    printstyled("- ", old_webhdr, "\n", color=:red, bold=true)
    printstyled("+ ", consts[j], "\n", color=:green, bold=true)
  else
    # The do-block closes the file even if a write fails
    open(cfile, "w") do io
      for l in consts
        println(io, l)
      end
    end
  end
  return nothing
end
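
# Illustration only: an offline sketch of the string splice that set_useragent
# performs on the webhdr line. The helper name `swap_agent` and the sample line
# in the usage note are hypothetical; the actual webhdr line in SeisIO's
# constants.jl may look different. Like the code above, this assumes the line
# contains a "User-Agent" => "..." pair and throws an error otherwise.
function swap_agent(line::AbstractString, agent::AbstractString)
  key = "\"User-Agent\" => \""             # text up to the opening quote of the value
  head, rest = split(line, key, limit=2)   # `rest` begins with the old value
  tail = split(rest, "\"", limit=2)[2]     # text after the old value's closing quote
  return head * key * agent * "\"" * tail
end
# Usage, on a made-up line:
#   swap_agent("const webhdr = Dict(\"User-Agent\" => \"old\")", "new")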
nothing

#= Aside to SCEC/NCEDC: no other FDSN servers on Earth do this. Moreover, a
connection whitelist is a terrible idea in an objective sense, even in cases
where User Agent validation is necessary, because a comprehensive list of valid
user agents cannot be maintained in an automated way [1]. This is why most
webmasters and sysadmins, including all FDSN servers but yours, use a connection
blacklist for problematic agent strings, rather than a whitelist for good ones.

There is no sane reason to only allow a small set of whitelisted user agents.
Most people cannot update to the very latest browser version the moment an
update gets released. Is your intent to prevent researchers from accessing your
data?

Anyone can set a User Agent to anything, as this script shows. Reasonable
security measures against User Agent exploits all require that servers do
something with User Agent [2]. Do you *need* that with an FDSN server?

It's pretty annoying to have to update my master branch or do a patch release
every time Chrome updates their browser because of one web request method at
two data centers.

References
[1] https://webmasters.stackexchange.com/questions/90074/how-to-determine-if-a-user-agent-string-has-proper-syntax-or-might-be-a-hacking
[2] https://www.sans.org/reading-room/whitepapers/malicious/user-agent-field-analyzing-detecting-abnormal-malicious-organization-33874
=#