There are several different files in the client source code that read configuration (also sometimes called "options" or "settings") from different places.
An additional difficulty is that the client consists of two separate components:
-
The code in
src/annotator/
is JavaScript code that's injected directly into the parent document and can read configuration from the document (for example config objects as<script type="application/json">
elements in the parent document). -
The code in
src/sidebar/
(which is the AngularJS sidebar app) is JavaScript code that's in a different-origin<iframe>
of the parent document and cannot directly read config from the parent document (because of the same origin policy). -
TODO: Also probably Chrome extension code (separate repo) that reads its own config from some other place.
Config settings need to be passed from the code in the parent document to the code in the iframe. We'll see how this is done below.
This is how client configuration is processed on page load:
-
The parent document can contain JSON documents containing config objects.
If the parent document contains a
<script class="js-hypothes-config" type="application/json">
element then the client will read configuration from it at startup (see below).This is intended to be used by publisher sites that embed the Hypothesis client and want to customize its configuration.
Note that the settings that the client will accept from parent documents are whitelisted because the parent document is untrusted. But also note that this whitelist is currently only used by the AngularJS sidebar app code running in the
<iframe>
, it's not used by the other half of the client code that runs in the parent document (and that also reads config from the parent document, and that creates the iframe!)The browser extension uses this when a direct link (a URL ending with
#annotations:<ANNOTATION_ID>
) is visited. It reads the<ANNOTATION_ID>
from the URL fragment and copies it into a"js-hypothesis-config"
object for the client to pick up. This is because some websites change or remove the URL's#
fragment before the client starts up and has a chance to read the<ANNOTATION_ID>
from the fragment.(Note that if you're not following a direct link, like if you just visit a web page and activate the Chrome extension, the extension still injects an empty
{}
"js-hypothesis-config"
object into the page.) -
The iframe's own document can also contain the same "js-hypothesis-config"
<script>
elements and those will also be read by the client at startup as well (see below).For example the Hypothesis web service puts one of these JSON documents in the page at
view-source:https://hypothes.is/app.html
to configure the instance of the client that's served there:<script class="js-hypothesis-config" type="application/json"> { "googleAnalytics": "<GOOGLE_ANALYTICS_TRACKING_ID>", "serviceUrl": "https://hypothes.is/", "release": "<DOCKER_HUB_TAG>", "authDomain": "hypothes.is", "websocketUrl": "wss://hypothes.is/ws", "apiUrl": "https://hypothes.is/api/", "raven": { "dsn": "<SENTRY_DSN>", "release": "<DOCKER_HUB_TAG>" } } </script>
Here's the line of code that inserts it. The settings that the web service places in this config object come from the web service's own configuration settings, see
h/views/client.py
.As well as inserting the
"js-hypothesis-config"
object into the parent document (as explained above), the browser extension also inserts one into the iframe's own document. -
The parent document can contain a
window.hypothesisConfig
function that returns a config object.This is another way for a page that embeds Hypothesis to customize its configuration. Unlike with the
"js-hypothesis-config"
objects, thewindow.hypothesisConfig
function can specify JavaScript callback functions for the client to call, which is used for theonLogin
callback.window.hypothesisConfig = function () { return { ... }; // Config settings };
There's a working example of this in the test publisher site.
-
shared/settings.js
exports asettings(document)
function that finds all<script "js-hypothesis-config">
elements in the given document, parses them as JSON, and returns a config object. -
src/annotator/config.js
exports aconfig(window)
function that returns anoptions
object by following these steps:-
Finds the
<link type="application/annotator+html" src="...">
element in thewindow.document
of the givenwindow
and sets itshref
URL asoptions.app
.Aside about
<link type="application/annotator+html">
:This
<link>
element seems to be used for two different things:First, the
href
of this<link>
element is used as thesrc
attribute for the sidebar AngularJS app's<iframe>
whenhost.coffee
injects the<iframe>
into the parent document.Second, boot.js looks for a
<link type="application/annotator+html">
in the page in detect the presence of another instance of Hypothesis already in the page. It aborts booting the Hypothesis client if there's already one in the page.I believe this is intended to have the Chrome extension abort on launch, if there is already an embedded instance of Hypothesis in the page?
But it seems like there may be a race condition here between the Chrome extension and embedded instance of Hypothesis?
Also it seems like, to thwart attacks based on sites embedding their own malicious copies of the client and hoping users don't notice that they're not using the real client, launching the Chrome extension should override any existing embedded client in the page (but doing this overriding robustly is probably difficult).
-
Calls the
settings()
function fromshared/settings.js
on thewindow.document
of the givenwindow
. Merges the returned settings object intooptions
. -
Calls the
window.hypothesisConfig
function of the givenwindow
, and merges the returned config object intooptions
. -
Extracts the "direct linked ID" from the URL.
This is the
#annotations:<id>
that bouncer appends to the end of the document's URL when sending the user directly to an annotation in context.The Chrome extension or Via may already have extracted this annotation ID and placed it in one of the other forms of config above (a
"js-hypothesis-config"
element, probably) but sometimes the client itself still needs to extract it from the URL (not sure why). -
Returns the combined
options
object.
-
-
src/annotator/main.js
callsconfig(window)
passing it the globalwindow
object of the parent document, getting theoptions
object. -
main.js
passes theoptions
object to eitherPdfSidebar
(src/annotator/pdf-sidebar.coffee
) orSidebar
(src/annotator/sidebar.coffee
).Note: don't be fooled into thinking this is the AngularJS sidebar app. This
Sidebar
class is still in thesrc/annotator/
code that's injected into the parent document, it's not in the<iframe>
.Note: be sure to notice that the
PdfSidebar
class above is a subclass of theSidebar
class above which is a subclass ofHost
(the one insrc/annotator/host.coffee
, not the one insrc/sidebar/host.coffee
) which is a subclass ofGuest
which is a subclass ofAnnotator
(TODO: I'm not sure where theAnnotator
class is defined or whether it's the bottom of this class hierarchy). -
Host
JSON stringifies the wholeoptions
object exceptoptions.app
and sets the resulting JSON string as the value of a?config=<JSON_STRING>
parameter on the URL of the AngularJS sidebar app's<iframe>
element.<iframe src="https://hypothes.is/...?config=<JSON_STRING>">
This is how settings get passed from the
src/annotator/
code that's embedded in the parent document into thesrc/sidebar/
code which is the AngularJS sidebar app that runs in the<iframe>
. -
Now let's go over to the AngularJS sidebar app running in the iframe and see how that configures itself from those JSON-stringified options in the iframe src URL, plus multiple other sources!
sidebar/app.js
(which is the Browserify "entrypoint" file a.k.a the "top" / first JavaScript file that the browser runs when the iframe is injected) callsshared/settings.js
("the settings service", an AngularJS service, but also just a normal Browserify-requireable JavaScript function) to read the client configuration settings from the document:var settings = require('../shared/settings')(document);
Note: This is the second time that
shared/settings.js
has been called! It is first called by thesrc/annotator
code which passes in the parentdocument
, and now it's called by thesrc/sidebar
code which passes in the iframe'sdocument
. -
As it did before
settings()
finds all the JSON "scripts" of the form<script class="js-hypothesis-config">
in the given document (this time the iframe's document), merges them all together, and returns them as aconfig
object.Note: While the object that
shared/settings.js
returns is namedconfig
insideshared/settings.js
(e.g.return config;
) it is renamed tosettings
insidebar/app.js
:var settings = require('../shared/settings')(document);
-
Next,
sidebar/app.js
also callssidebar/host-config.js
.host-config
reads the JSON-stringified options object from the iframe src URL. It parses the JSON and returns the options object.sidebar/app.js
merges the config object fromsidebar/host-config.js
with the config object that it already gotshared/settings.js
:var hostPageConfig = require('./host-config'); Object.assign(settings, hostPageConfig(window));
Note: Inside
sidebar/host-config.js
the object that was calledoptions
(over in thesrc/annotator
code, before it got JSON stringified) is now renamed toconfig
. (Well, actually it was renamed to?config=<JSON_STRING>
as soon as it was appended to the URL.)Note: Although it is called
config
insidehost-config.js
this object gets merged into an object calledsettings
byapp.js
. Ultimately, in the end, at least in the AngularJS sidebar app, we end up with an object calledsettings
. -
Finally, sidebar/app.js sets this merged settings object as the "settings" value in AngularJS.
Any
src/sidebar
code can now depend onsettings
using AngularJS injection and receive this settings object. For example here'ssidebar/auth.js using the
settings` object.
What functional requirements does this configuration system fulfill, that any simpler replacement implementation would also need to fulfill?
-
Publisher sites that embed Hypothesis need to be able to configure certain settings of their embedded client.
-
openSidebar
(whether the sidebar opens automatically on page load) -
showHighlights
-
openLoginForm
This is currently done with
<script class="js-hypothesis-config" type="application/json">
tags in the parent page that embeds the client. -
-
Trusted partner sites that embed Hypothesis need to be able to provide an
onLogin
JavaScript callback function to their embedded instance of the client.This is currently done with a
window.hypothesisConfig()
function in the parent page that embeds the client. -
Trusted partner sites that embed Hypothesis need to be able to provide an authority (e.g. "partner.org") and icon to tell the client to use a publisher group instead of the usual Hypothesis groups.
The publisher example site does this using
window.hypothesisConfig
-
The client needs to be able to detect when a direct link annotation ID is/was in the URL fragment so it can open and show that annotation. This needs to be resilient to sites that delete or change the URL fragment.
This is currently done by bouncer putting a
#
fragment on the URL and then the Chrome extension or Via copying the#
fragment from the URL into a<script class="js-hypothesis-config" type="application/json">
in the parent page. -
The Hypothesis web service needs to be able to configure the instance of the client that it serves at
/app.html
.- googleAnalytics: our google analytics tracking id
- serviceUrl: the URL of the https://hypothes.is/ site
- release (version num)
- authDomain": "hypothes.is"
- websocketUrl": "wss://hypothes.is/ws",
- apiUrl": "https://hypothes.is/api/
- Sentry DSN
This is currently done using
<script class="js-hypothesis-config" type="application/json">
in the iframe's document (/app.html
). -
The Chrome extension needs to be able to configure the instance of the client that it contains and injects into pages. Looks like the same settings as for the web service above:
- apiUrl
- authDomain
- serviceUrl
- release
- websocketUrl
- sentryPublicDSN
- googleAnalytics
This is currently done using
<script class="js-hypothesis-config" type="application/json">
in the iframe's document (/app.html
). -
Dev instances of the Hypothesis web app embed the production client into some pages (e.g.
/docs/help
) but they configure that production client to use the dev web service instead of the production one (e.g. for logging in, and for the annotations API).This is currently done by:
-
Configuring the part of the production client code that runs in the parent page to inject an
<iframe>
whosesrc
islocalhost:5000/app.html
instead ofhypothes.is/app.html
. This configuration is done using thesidebarAppUrl
setting in a"js-hypothesis-config"
tag in the parent page (e.g. in thelocalhost:5000/docs/help
page) -
Configuring the part of the production client that runs in the
<iframe>
(aka the sidebar, the AngularJS app) to use the annotations API atlocalhost:5000/api
instead ofhypothes.is/api
. Once step 1 above has replacedhypothes.is/app.html
withlocalhost:5000/app.html
for the iframe contents, thenlocalhost:5000/app.html
contains a"js-hypothesis-config"
with anapiUrl
setting in it, and this configures the production client to use that API URL instead of the production one.
-
-
Two client instances in one page needs to be handled somehow. For example when the user visits a page that has an embedded Hypothesis client and then the user tries to launch their Chrome extension.
-
At least some configuration settings need to be passed from the parent document into the sidebar app iframe. This is true of, for example:
- Direct link annotation IDs
- Authority and icon for publisher groups
This is currently done by setting some serialized JSON as the value of a
?config=
param on the<iframe>
src
URL.
-
Any site can put a
<link type="application/annotator+html" src="...">
in its page and it look like this will cause Hypothesis (for example when launched by the bookmarklet, Via or the Chrome extension) to inject an<iframe>
with that<link>
shref
as the<iframe>
src
instead of the actualapp.html
URL. This provides an easy, high fidelity way to replace the sidebar with a malicious copy (for example the Chrome extension button will show as active as if you're using the actual Chrome extension).There seems to be no reason why the
<iframe>
src
URL needs to be dynamic in this way? -
Any site can put js-hypothesis-config objects in its page and the client will read them as configuration.
The
src/sidebar
code will only take certain whitelisted settings from the page (see whitelist above) but thesrc/annotator
code will read more, includingsidebarAppUrl
,assetsUrl
,onLogin
callback functions, etc.Some example potentially possible attacks:
- Providing a malicious
onLogin
callback, for example popup a window into which the user may type their username and password - Providing a malicious direct link annotation ID (for example to a spam annotation, the client will automatically focus on and show this annotation only)
- Providing a malicious sidebar app URL instead of https://hypothes.is/app.html
- Providing a malicious asset root URL from which the sidebar app will load its assets
- Providing a malicious
-
It's possible to have multiple js-hypothesis-config objects in one page. The client will read them all and merge them together. There seems to be no benefit to this ability to have multiple js-hypothesis-config objects, and it creates the possibility of users (people trying to embed Hypothesis in their sites) making a mistake and being confused when it doesn't work. For example you change the value of a setting in a js-hypothesis-config object, but it doesn't work because (unknown to you) it's being overridden by another js-hypothesis-config object further down the page.
-
From the point of view of someone wanting to embed Hypothesis in their site this API seems more difficult to use than it needs to be. You need to be able to insert a second
<script>
tag, but an"application/json"
one not a normal JavaScript one, into the page. You need to be able to make sure that tag is in the page before theembed.js
script runs. And that tag needs to contain valid JSON (which is all too easy to get wrong). Some IDEs may not even allow you to do this.
In general the whole thing is just way too complex and would benefti from being replaced by something much smaller, simpler and more contained, which would be easier to understand and to maintain. But here are some specific issues:
-
Use just one name, not settings, options, config, app_config, etc.
-
There are several different places where client configuration can be provided but it's not clear (not without considerable time digging) which configuration options can be set from which locations (is it all from all?) and which locations override others when they both set the same option.
-
There's no one place where you could look and find the high level recipe, in a continuous series of lines of code in one file, of how the client configures itself and from where (as narrated above). Discovering this recipe is a major undertaking, requiring significant time and effort.
-
Too many different files spread throughout the codebase are involved in client configuration.
It would be nice to have all settings in one place: one file that reads and merges all the different settings, and that everything else calls.
-
It would be nice to have only one means of inter-frame comms (bridge). Get rid of this
?config
param on the iframe src. -
host-config
is a bad name given that there are already twohost.coffee
files (andHost
classes) in the codebase but (I believe) "host" has a completely different intended meaning in those files.The general term for the stuff outside of an iframe seems to be parent page/window/document/frame, so I suggest we use parent instead of host.
-
The whole inheritance thing around PdfSidebar, Sidebar, Host, Guest and Annotator.
-
Need some way of preventing a bunch of settings (
sidebarAppUrl
,assetsUrl
,onLogin
,services
...) from being read from the parent page when it's the Chrome extension that's in use. These settings should be for embedded clients only. Even the more harmless settings (showHighlights
etc) should probably be ignored by the Chrome extension.Is there any reason why the Chrome extension should ever read settings from the parent page?
-
Just get rid of
<link type="application/annotator+html" src="...">
.If we want record whether a client instance has already booted in this page, and we're going to be using
"js-hypothesis-config"
objects anyway, then maybe we should add a"js-hypothesis-config"
option for this. Or else just add a boolean variable towindow
. -
src/annotator/config.js
should remain the main place where thesrc/annotator
code is configured.This function returns an object named
options
, it should be namedconfig
.But otherwise this seems fine.
-
src/shared/settings.js
should be renamed tosrc/shared/config.js
to match with other things that are named config.This is function that takes a
document
and returns aconfig
object by reading all the"js-hypothesis-config"
objects in the document. That's fine. -
Add a new file
src/sidebar/config.js
, so there will now be three matchingconfig.js
files:src/shared/config.js
, a shared helper functionsrc/annotator/config.js
, the ONE FILE that does all the config for thesrc/annotator
code, callssrc/shared/config.js
to do some of the worksrc/sidebar/config.js
, the ONE FILE that does all the config for thesrc/sidebar
code, callssrc/shared/config.js
to do some of the work
Move code that does config from
src/sidebar/app.js
intosrc/sidebar/config.js
.sidebar/app.js
just callssidebar/config.js
, same asannotator/main.js
just callsannotator/config.js
. -
Change it to
config
everywhere, they're allconfig.js
files that exportconfig()
functions that returnconfig
objects, when those objects are assigned to variables in other files (main.js
,app.js
, ...) those variables should be calledconfig
.There's a collision between a
config
variable and aconfig
function, for example inmain.js
:var config = require('./config.js'); config = config();
. A possible way around this would be to haveannotator/config.js
andsidebar/config.js
actually export config objects directly rather than exporting functions? -
Consider renaming
sidebar/app.js
tosidebar/main.js
, to matchannotator/main.js
-
Get rid of
src/sidebar/host-config.js
. This is the file that parses the JSON?config=
param on the<iframe>
'ssrc
. Get rid of the code (inhost.coffee
) that serializes and sets this URL param as well.Instead have
sidebar/config.js
, which is the ONE FILE that does all the config work for thesrc/sidebsr
code, just callsrc/annotator/config.js
over the bridge to get a config object. Have only one means of inter-frame communication: the bridge.This would also handily remove the whole sidebar / pdfsidebar / host / annotator family out of the config system.
-
src/sidebar/host-config.js
currently uses a whitelist of all the config settings that it will read from the parent document. Whitelists are a good idea - not only are they a simple way to protect against parent documents setting config settings that we don't want them to, they're also a convenient place to list and document all the config settings that exist. Use whitelists in bothannotator/config.js
andsidebar/config.js
(shared/config.js
is just a helper function and can remain dumb, expecting its callers to apply whitelisting to what it returns). -
Always use "parent" not "host" to refer to the "parent document" of the iframe.
-
Instead of
<script class="js-hypothes-config" type="application/json">
just use HTML attributes on theembed.js
script itself:<script src="/embed.js" data-open-login-form data-sidebar-app-url="https://..." ...>
From the point of view of the user trying to embed Hypothesis in their site this is much easier to do and less easy to get wrong (all you have to do is add attributes to the existing script tag).
This would also make it slightly more difficult for malicious sites to configure the client. Rather than just having a
"js-hypothesis-config"
in their page that we will dutifully read, instead they would have to intercept our<script>
tag injection and change its attributes before our script runs. Apparently they can still do this though, at least in the case of the bookmarklet, but we may be able to prevent it in the case of the browser extension.What about the config settings that we render (currently as further
"js-hypothesis-config"
objects) in the iframe's document rather than in the parent document? Perhaps they can just bedata-*
attrs on the<hypothesis-app>
element?The
onLogin
callback can't be done as an HTML attribute though, would still require thewindow.hypothesisConfig
function.The
services
array can't easily be done in this way because it's an array not a single value. There's no reason why we can't allowservices
things to be specified as single values in the (probably most common?) case of when there is only a single service though:<script src="/embed.js" data-authority="partner.org" data-grant-token="xxx" ...>
Then for the more advanced case, if someone wants to specify multiple services then they can use
window.hypothesisConfig
which we already have to support foronLogin
anyway. (Alternatively you can setdata-services
to theid
of a<datalist>
element in the page.)