@jpjones76
Last active August 29, 2020 06:00
Workaround for "500 Internal Server Error" returned by FDSN requests to California servers
# THE ISSUE: HTTP "500 Internal Server Error" for FDSN requests to California
# earthquake data centers (NCEDC, SCEC/SCEDC).
#
# THE CAUSE: against HTTP "best practices", and seemingly against common sense,
# California FDSN servers only accept requests from a small whitelist of
# user agents. Neither Julia nor SeisIO is on their lists, so SeisIO must
# emulate a web browser in the User Agent setting to be able to connect. This
# creates a sort of "arms race" wherein the User Agent value in SeisIO.webhdr
# must be updated whenever a web browser releases a new version. This is a
# complete waste of everyone's time: theirs, yours, and mine.
#
# THE SOLUTION: the script below sets SeisIO.webhdr to identify SeisIO as the
# latest version of Chrome when making FDSN requests.
using Pkg

# Install any missing dependencies. Pkg.dependencies() is used on Julia 1.4 and
# later, where Pkg.installed() is deprecated.
if VERSION < v"1.4"
    for p in ["HTTP", "Cascadia", "Gumbo", "SeisIO"]
        if get(Pkg.installed(), p, nothing) === nothing
            Pkg.add(p)
        else
            println(p * " found, not installing.")
        end
    end
else
    # Collect the direct dependencies of the active environment by name
    installs = Dict{String, VersionNumber}()
    deps = Pkg.dependencies()
    for (uuid, dep) in deps
        dep.is_direct_dep || continue
        (dep.version === nothing) && continue
        installs[dep.name] = dep.version
    end
    for p in ["HTTP", "Cascadia", "Gumbo", "SeisIO"]
        if get(installs, p, nothing) === nothing
            Pkg.add(p)
        else
            println(p * " found, not installing.")
        end
    end
end

using Cascadia, Gumbo, SeisIO
using HTTP: request

"""
    set_useragent(; echo::Bool=false)

Set the User-Agent in {SeisIO}/src/constants.jl. If `echo=true`, only print
what would be changed; don't overwrite the file.
"""
function set_useragent(; echo::Bool=false)
    cfile = joinpath(dirname(pathof(SeisIO)), "constants.jl")
    isfile(cfile) || error("Can't find constants.jl!")

    # Delimiters around the User-Agent value in the webhdr definition
    s1 = "\"User-Agent\" => \""
    s2 = "\""
    consts = readlines(cfile)

    # Scrape the current Chrome User-Agent strings
    r = request("GET", "https://www.whatismybrowser.com/guides/the-latest-user-agent/chrome")
    req = String(copy(r.body))
    h = parsehtml(req)
    qres = eachmatch(sel".code", h.root)
    usr_strings = [text(e[1]) for e in qres]
    #=
    Something like
    ["Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36",
     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36",
     "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36",
     "Mozilla/5.0 (Linux; Android 8.0.0;) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.62 Mobile Safari/537.36",
     "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/78.0.3904.84 Mobile/15E148 Safari/605.1"]
    =#
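    # Platform token that identifies this OS in a Chrome User-Agent string
    str = if Sys.iswindows()
        "Windows"
    elseif Sys.isapple()
        "Macintosh"
    else
        "X11"
    end

    # i: first scraped string for this platform; j: line that defines webhdr
    i = findfirst(s -> occursin(str, s), usr_strings)
    j = findfirst(s -> occursin("webhdr", s), consts)
    j === nothing && error("Can't find webhdr in constants.jl!")

    # Split the webhdr line around the old User-Agent value, then rejoin it
    # around the new one. Fall back to a "Julia <version>" string if no
    # matching Chrome string was scraped.
    wh = split(consts[j], s1, limit=2, keepempty=true)
    whh = wh[1]
    wht = split(wh[2], s2, limit=2, keepempty=true)[2]
    old_webhdr = consts[j]
    consts[j] = whh * s1 * (i === nothing ? "Julia " * Base.VERSION_STRING : usr_strings[i]) * s2 * wht
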
    if echo
        # Dry run: show the change as a diff without touching the file
        printstyled("To change in ", cfile, ":\n", color=:light_blue)
        printstyled("- ", old_webhdr, "\n", color=:red, bold=true)
        printstyled("+ ", consts[j], "\n", color=:green, bold=true)
    else
        # Overwrite constants.jl with the updated webhdr line
        open(cfile, "w") do io
            for line in consts
                println(io, line)
            end
        end
    end
    return nothing
end
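
# USAGE SKETCH (not part of the original script). A dry run first seems the
# safer order, since a real run overwrites constants.jl in place:
#
#   set_useragent(echo=true)   # print the proposed change only
#   set_useragent()            # rewrite constants.jl
#
# The split-and-rejoin step inside set_useragent can also be tried offline.
# The helper below is a hypothetical illustration: the name `swap_useragent`
# and the sample line in the comment are assumptions, not SeisIO code, and the
# real webhdr line in constants.jl may be laid out differently.
function swap_useragent(line::AbstractString, ua::AbstractString)
    s1 = "\"User-Agent\" => \""
    # Everything before the old value, and everything from the old value onward
    parts = split(line, s1, limit=2, keepempty=true)
    length(parts) == 2 || return String(line)   # no User-Agent entry: unchanged
    # Drop the old value; keep whatever follows its closing quote
    tail = split(parts[2], "\"", limit=2, keepempty=true)
    length(tail) == 2 || return String(line)    # unterminated value: unchanged
    return parts[1] * s1 * ua * "\"" * tail[2]
end
# e.g. swap_useragent("const webhdr = Dict(\"User-Agent\" => \"Old/1.0\")", "New/2.0")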
nothing
#= Aside to SCEC/NCEDC: no other FDSN servers on Earth do this. Moreover, a
connection whitelist is a terrible idea in an objective sense, even in cases
where User Agent validation is necessary, because a comprehensive list of valid
user agents cannot be maintained in an automated way [1]. This is why most
webmasters and sysadmins, including all FDSN servers but yours, use a connection
blacklist for problematic agent strings, rather than a whitelist for good ones.

There is no sane reason to only allow a small set of whitelisted user agents.
Most people cannot update to the very latest browser version the moment an
update gets released. Is your intent to prevent researchers from accessing your
data?

Anyone can set a User Agent to anything, as this script shows. Reasonable
security measures against User Agent exploits all require that servers do
something with User Agent [2]. Do you *need* that with an FDSN server?

It's pretty annoying to have to update my master branch or do a patch release
every time Chrome updates their browser because of one web request method at
two data centers.

References
[1] https://webmasters.stackexchange.com/questions/90074/how-to-determine-if-a-user-agent-string-has-proper-syntax-or-might-be-a-hacking
[2] https://www.sans.org/reading-room/whitepapers/malicious/user-agent-field-analyzing-detecting-abnormal-malicious-organization-33874
=#