seanh/how-the-client-is-configured.markdown

## how-the-client-is-configured.markdown

      
    Raw
  

              how-the-client-is-configured.markdown
            
          
    How the client is configured

There are several different files in the client source code that read
configuration (also sometimes called "options" or "settings") from different
places.
An additional difficulty is that the client consists of two separate components:


The code in src/annotator/ is JavaScript code that's injected directly
into the parent document and can read configuration from the document (for
example config objects as <script type="application/json"> elements in the
parent document).


The code in src/sidebar/ (which is the AngularJS sidebar app) is
JavaScript code that's in a different-origin <iframe> of the parent document
and cannot directly read config from the parent document (because of the
same origin policy).


TODO: Also probably Chrome extension code (separate repo) that reads its own
config from some other place.


Config settings need to be passed from the code in the parent document to the
code in the iframe. We'll see how this is done below.
This is how client configuration is processed on page load:


The parent document can contain JSON documents containing config objects.
If the parent document contains a
<script class="js-hypothes-config" type="application/json">
element then the client will read configuration from it at startup
(see below).
This is intended to be used by publisher sites that embed the Hypothesis
client and want to customize its configuration.
Note that the settings that the client will accept from parent documents
are whitelisted
because the parent document is untrusted. But also note that this whitelist
is currently only used by the AngularJS sidebar app code running in the
<iframe>, it's not used by the other half of the client code that runs in
the parent document (and that also reads config from the parent document,
and that creates the iframe!)
The browser extension uses this
when a direct link (a URL ending with #annotations:<ANNOTATION_ID>) is
visited. It reads the <ANNOTATION_ID> from the URL fragment and copies it
into a "js-hypothesis-config" object for the client to pick up.
This is because some websites change or remove the URL's # fragment before
the client starts up and has a chance to read the <ANNOTATION_ID> from the
fragment.
(Note that if you're not following a direct link, like if you just visit a
web page and activate the Chrome extension, the extension still injects an
empty {} "js-hypothesis-config" object into the page.)


The iframe's own document can also contain the same "js-hypothesis-config"
<script> elements and those will also be read by the client at startup as
well (see below).
For example the Hypothesis web service puts one of these JSON documents in
the page at view-source:https://hypothes.is/app.html to configure the
instance of the client that's served there:
 <script class="js-hypothesis-config" type="application/json">
   {
     "googleAnalytics": "<GOOGLE_ANALYTICS_TRACKING_ID>",
     "serviceUrl": "https://hypothes.is/",
     "release": "<DOCKER_HUB_TAG>",
     "authDomain": "hypothes.is",
     "websocketUrl": "wss://hypothes.is/ws",
     "apiUrl": "https://hypothes.is/api/",
     "raven": {
       "dsn": "<SENTRY_DSN>",
       "release": "<DOCKER_HUB_TAG>"
       }
     }
 </script>
Here's the line of code that inserts it.
The settings that the web service places in this config object come from
the web service's own configuration settings,
see h/views/client.py.
As well as inserting the "js-hypothesis-config" object into the parent
document (as explained above), the browser extension also
inserts one into the iframe's own document.


The parent document can contain a window.hypothesisConfig function that
returns a config object.
This is another way for a page that embeds Hypothesis to customize its
configuration. Unlike with the "js-hypothesis-config" objects, the
window.hypothesisConfig function can specify JavaScript callback functions
for the client to call, which is used for the onLogin callback.
window.hypothesisConfig = function () {
  return { ... };  // Config settings
};
There's a working example of this in the test publisher site.


shared/settings.js
exports a settings(document) function that finds all
<script "js-hypothesis-config"> elements in the given document, parses
them as JSON, and returns a config object.


src/annotator/config.js
exports a config(window) function that returns an options object by
following these steps:


Finds the <link type="application/annotator+html" src="..."> element in
the window.document of the given window and sets its href URL as
options.app.
Aside about <link type="application/annotator+html">:
This <link> element seems to be used for two different things:
First, the href of this <link> element is used as the src attribute for
the sidebar AngularJS app's <iframe>
when host.coffee injects the <iframe> into the parent document.
Second, boot.js looks for a <link type="application/annotator+html"> in the
page in detect the presence of another instance of Hypothesis already
in the page.
It aborts booting the Hypothesis client if there's already one in the
page.
I believe this is intended to have the Chrome extension abort on launch,
if there is already an embedded instance of Hypothesis in the page?
But it seems like there may be a race condition here between the Chrome
extension and embedded instance of Hypothesis?
Also it seems like, to thwart attacks based on sites embedding their own
malicious copies of the client and hoping users don't notice that they're
not using the real client, launching the Chrome extension should override
any existing embedded client in the page (but doing this overriding
robustly is probably difficult).


Calls the settings() function from shared/settings.js on the
window.document of the given window. Merges the returned settings
object into options.


Calls the window.hypothesisConfig function of the given window,
and merges the returned config object into options.


Extracts the "direct linked ID" from the URL.
This is the #annotations:<id> that bouncer appends to the
end of the document's URL when sending the user directly to an
annotation in context.
The Chrome extension or Via may already have extracted this
annotation ID and placed it in one of the other forms of config above
(a "js-hypothesis-config" element, probably) but sometimes the client
itself still needs to extract it from the URL (not sure why).


Returns the combined options object.


src/annotator/main.js calls config(window)
passing it the global window object of the parent document, getting the
options object.


main.js passes the options object to either
PdfSidebar
(src/annotator/pdf-sidebar.coffee)
or Sidebar
(src/annotator/sidebar.coffee).
Note: don't be fooled into thinking this is the AngularJS sidebar app.
This Sidebar class is still in the src/annotator/ code that's injected
into the parent document, it's not in the <iframe>.
Note: be sure to notice that the PdfSidebar class above is a subclass
of the Sidebar class above which is a subclass of
Host
(the one in src/annotator/host.coffee, not the one in src/sidebar/host.coffee)
which is a subclass of
Guest
which is a subclass of Annotator (TODO: I'm not sure where the
Annotator class is defined or whether it's the bottom of this class
hierarchy).


Host
JSON stringifies the whole options object except options.app and sets
the resulting JSON string as the value of a ?config=<JSON_STRING>
parameter on the URL of the AngularJS sidebar app's <iframe> element.
<iframe src="https://hypothes.is/...?config=<JSON_STRING>">
This is how settings get passed from the src/annotator/ code that's
embedded in the parent document into the src/sidebar/ code which is the
AngularJS sidebar app that runs in the <iframe>.


Now let's go over to the AngularJS sidebar app running in the iframe and see
how that configures itself from those JSON-stringified options in the
iframe src URL, plus multiple other sources!
sidebar/app.js
(which is the Browserify "entrypoint" file a.k.a the "top" / first JavaScript
file that the browser runs when the iframe is injected) calls
shared/settings.js
("the settings service", an AngularJS service, but also just a normal
Browserify-requireable JavaScript function) to read the client configuration
settings from the document:
var settings = require('../shared/settings')(document);
Note: This is the second time that shared/settings.js has been called!
It is first called by the src/annotator code which passes in the parent
document, and now it's called by the src/sidebar code which passes in
the iframe's document.


As it did before settings() finds all the JSON "scripts" of the form
<script class="js-hypothesis-config"> in the given document
(this time the iframe's document), merges them all together, and returns them
as a config object.
Note: While the object that shared/settings.js returns is named
config inside shared/settings.js (e.g. return config;) it is renamed
to settings in sidebar/app.js:
var settings = require('../shared/settings')(document);


Next, sidebar/app.js also calls sidebar/host-config.js.
host-config reads the JSON-stringified options object from the iframe src
URL. It parses the JSON and returns the options object.
sidebar/app.js
merges the config object from sidebar/host-config.js with the config object that it already got shared/settings.js:
var hostPageConfig = require('./host-config');
Object.assign(settings, hostPageConfig(window));
Note: Inside sidebar/host-config.js the object that was called
options (over in the src/annotator code, before it got JSON stringified)
is now renamed to config. (Well, actually it was renamed to
?config=<JSON_STRING> as soon as it was appended to the URL.)
Note: Although it is called config inside host-config.js this
object gets merged into an object called settings by app.js.
Ultimately, in the end, at least in the AngularJS sidebar app, we end up
with an object called settings.


Finally, sidebar/app.js sets this merged settings object as the "settings" value in AngularJS.
Any src/sidebar code can now depend on settings using AngularJS
injection and receive this settings object.
For example here's sidebar/auth.js using the settings` object.


Requirements

What functional requirements does this configuration system fulfill, that any
simpler replacement implementation would also need to fulfill?


Publisher sites that embed Hypothesis need to be able to configure certain
settings of their embedded client.


openSidebar (whether the sidebar opens automatically on page load)


showHighlights


openLoginForm


This is currently done with <script class="js-hypothesis-config" type="application/json">
tags in the parent page that embeds the client.


Trusted partner sites that embed Hypothesis need to be able to provide an
onLogin JavaScript callback function to their embedded instance of
the client.
This is currently done with a window.hypothesisConfig() function in the
parent page that embeds the client.


Trusted partner sites that embed Hypothesis need to be able to provide an
authority (e.g. "partner.org") and icon to tell the client to use a publisher
group instead of the usual Hypothesis groups.
The publisher example site does this using window.hypothesisConfig


The client needs to be able to detect when a direct link annotation ID is/was
in the URL fragment so it can open and show that annotation. This needs to
be resilient to sites that delete or change the URL fragment.
This is currently done by bouncer putting a # fragment on the URL and then
the Chrome extension or Via copying the # fragment from the URL into a
<script class="js-hypothesis-config" type="application/json">
in the parent page.


The Hypothesis web service needs to be able to configure the instance of
the client that it serves at /app.html.

googleAnalytics: our google analytics tracking id
serviceUrl: the URL of the https://hypothes.is/ site
release (version num)
authDomain": "hypothes.is"
websocketUrl": "wss://hypothes.is/ws",
apiUrl": "https://hypothes.is/api/
Sentry DSN

This is currently done using <script class="js-hypothesis-config" type="application/json">
in the iframe's document (/app.html).


The Chrome extension needs to be able to configure the instance of the client
that it contains and injects into pages. Looks like the same settings as for
the web service above:

apiUrl
authDomain
serviceUrl
release
websocketUrl
sentryPublicDSN
googleAnalytics

This is currently done using <script class="js-hypothesis-config" type="application/json">
in the iframe's document (/app.html).


Dev instances of the Hypothesis web app embed the production client into some
pages (e.g. /docs/help) but they configure that production client to use
the dev web service instead of the production one (e.g. for logging in, and
for the annotations API).
This is currently done by:


Configuring the part of the production client code that runs in the parent
page to inject an <iframe> whose src is localhost:5000/app.html
instead of hypothes.is/app.html. This configuration is done using the
sidebarAppUrl setting in a "js-hypothesis-config" tag in the parent
page (e.g. in the localhost:5000/docs/help page)


Configuring the part of the production client that runs in the <iframe>
(aka the sidebar, the AngularJS app) to use the annotations API at
localhost:5000/api instead of hypothes.is/api. Once step 1 above has
replaced hypothes.is/app.html with localhost:5000/app.html for the
iframe contents, then localhost:5000/app.html contains a "js-hypothesis-config"
with an apiUrl setting in it, and this configures the production client
to use that API URL instead of the production one.


Two client instances in one page needs to be handled somehow.
For example when the user visits a page that has an embedded Hypothesis client
and then the user tries to launch their Chrome extension.


At least some configuration settings need to be passed from the parent
document into the sidebar app iframe. This is true of, for example:

Direct link annotation IDs
Authority and icon for publisher groups

This is currently done by setting some serialized JSON as the value of a
?config= param on the <iframe> src URL.


Problems

Functional problems


Any site can put a <link type="application/annotator+html" src="..."> in
its page and it look like this will cause Hypothesis (for example when
launched by the bookmarklet, Via or the Chrome extension) to inject an
<iframe> with that <link>s href as the <iframe> src instead of the
actual app.html URL. This provides an easy, high fidelity way to replace
the sidebar with a malicious copy (for example the Chrome extension button
will show as active as if you're using the actual Chrome extension).
There seems to be no reason why the <iframe> src URL needs to be
dynamic in this way?


Any site can put js-hypothesis-config objects in its page and the client
will read them as configuration.
The src/sidebar code will only take certain whitelisted settings from the
page (see whitelist above) but the src/annotator code will read more,
including sidebarAppUrl, assetsUrl, onLogin callback functions, etc.
Some example potentially possible attacks:

Providing a malicious onLogin callback, for example popup a window into
which the user may type their username and password
Providing a malicious direct link annotation ID
(for example to a spam annotation, the client will automatically focus on
and show this annotation only)
Providing a malicious sidebar app URL instead of https://hypothes.is/app.html
Providing a malicious asset root URL from which the sidebar app will load
its assets


It's possible to have multiple js-hypothesis-config objects in one page.
The client will read them all and merge them together.
There seems to be no benefit to this ability to have multiple
js-hypothesis-config objects, and it creates the possibility of users (people
trying to embed Hypothesis in their sites) making
a mistake and being confused when it doesn't work. For example you change
the value of a setting in a js-hypothesis-config object, but it doesn't work
because (unknown to you) it's being overridden by another js-hypothesis-config
object further down the page.


From the point of view of someone wanting to embed Hypothesis in their site
this API seems more difficult to use than it needs to be. You need to be able
to insert a second <script> tag, but an "application/json" one not a
normal JavaScript one, into the page. You need to be able to make sure that
tag is in the page before the embed.js script runs. And that tag needs
to contain valid JSON (which is all too easy to get wrong). Some IDEs may
not even allow you to do this.


Code clarity and maintainability problems

In general the whole thing is just way too complex and would benefti from
being replaced by something much smaller, simpler and more contained, which
would be easier to understand and to maintain. But here are some specific
issues:


Use just one name, not settings, options, config, app_config, etc.


There are several different places where client configuration can be provided
but it's not clear (not without considerable time digging) which configuration
options can be set from which locations (is it all from all?) and which
locations override others when they both set the same option.


There's no one place where you could look and find the high level recipe,
in a continuous series of lines of code in one file,
of how the client configures itself and from where (as narrated above).
Discovering this recipe is a major undertaking, requiring significant time
and effort.


Too many different files spread throughout the codebase are involved in
client configuration.
It would be nice to have all settings in one place: one file that reads and
merges all the different settings, and that everything else calls.


It would be nice to have only one means of inter-frame comms (bridge).
Get rid of this ?config param on the iframe src.


host-config is a bad name given that there are already two host.coffee
files (and Host classes) in the codebase but (I believe) "host" has a
completely different intended meaning in those files.
The general term for the stuff outside of an iframe seems to be
parent page/window/document/frame, so I suggest we use parent instead of
host.


The whole inheritance thing around PdfSidebar, Sidebar, Host, Guest and
Annotator.


Suggestions


Need some way of preventing a bunch of settings (sidebarAppUrl, assetsUrl,
onLogin, services...) from being read from the parent page
when it's the Chrome extension that's in use. These settings should be
for embedded clients only. Even the more harmless settings (showHighlights
etc) should probably be ignored by the Chrome extension.
Is there any reason why the Chrome extension should ever read settings from
the parent page?


Just get rid of <link type="application/annotator+html" src="...">.
If we want record whether a client instance has already booted in this page,
and we're going to be using "js-hypothesis-config" objects anyway,
then maybe we should add a "js-hypothesis-config" option for this.
Or else just add a boolean variable to window.


src/annotator/config.js should remain the main place where the
src/annotator code is configured.
This function returns an object named options, it should be named config.
But otherwise this seems fine.


src/shared/settings.js should be renamed to src/shared/config.js to
match with other things that are named config.
This is function that takes a document and returns a config object by
reading all the "js-hypothesis-config" objects in the document.
That's fine.


Add a new file src/sidebar/config.js, so there will now be three matching
config.js files:

src/shared/config.js, a shared helper function
src/annotator/config.js, the ONE FILE that does all the config for the
src/annotator code, calls src/shared/config.js to do some of the work
src/sidebar/config.js, the ONE FILE that does all the config for the
src/sidebar code, calls src/shared/config.js to do some of the work

Move code that does config from src/sidebar/app.js into
src/sidebar/config.js. sidebar/app.js just calls sidebar/config.js,
same as annotator/main.js just calls annotator/config.js.


Change it to config everywhere, they're all config.js files that
export config() functions that return config objects, when those objects
are assigned to variables in other files (main.js, app.js, ...) those
variables should be called config.
There's a collision between a config variable and a config function,
for example in main.js: var config = require('./config.js'); config = config();.
A possible way around this would be to have annotator/config.js and
sidebar/config.js actually export config objects directly rather than
exporting functions?


Consider renaming sidebar/app.js to sidebar/main.js, to match annotator/main.js


Get rid of src/sidebar/host-config.js. This is the file that parses the
JSON ?config= param on the <iframe>'s src. Get rid of the code (in
host.coffee) that serializes and sets this URL param as well.
Instead have sidebar/config.js, which is the ONE FILE that does all the
config work for the src/sidebsr code, just call src/annotator/config.js
over the bridge to get a config object. Have only one means of inter-frame
communication: the bridge.
This would also handily remove the whole sidebar / pdfsidebar / host / annotator
family out of the config system.


src/sidebar/host-config.js currently uses a whitelist of all the config
settings that it will read from the parent document. Whitelists are a good
idea - not only are they a simple way to protect against parent documents
setting config settings that we don't want them to, they're also a convenient
place to list and document all the config settings that exist.
Use whitelists in both annotator/config.js and sidebar/config.js
(shared/config.js is just a helper function and can remain dumb, expecting
its callers to apply whitelisting to what it returns).


Always use "parent" not "host" to refer to the "parent document" of the
iframe.


Instead of <script class="js-hypothes-config" type="application/json">
just use HTML attributes on the embed.js script itself:
<script src="/embed.js"
        data-open-login-form
        data-sidebar-app-url="https://..."
        ...>
From the point of view of the user trying to embed Hypothesis in
their site this is much easier to do and less easy to get wrong (all you have
to do is add attributes to the existing script tag).
This would also make it slightly more difficult for malicious sites to
configure the client. Rather than just having a "js-hypothesis-config" in
their page that we will dutifully read, instead they would have to intercept
our <script> tag injection and change its attributes before our script
runs. Apparently they can still do this though, at least in the case of
the bookmarklet, but we may be able to prevent it in the case of the
browser extension.
What about the config settings that we render (currently as further
"js-hypothesis-config" objects) in the iframe's document rather than in
the parent document? Perhaps they can just be data-* attrs on the
<hypothesis-app> element?
The onLogin callback can't be done as an HTML attribute though, would still
require the window.hypothesisConfig function.
The services array can't easily be done in this way because it's an array
not a single value. There's no reason why we can't allow services things
to be specified as single values in the (probably most common?) case of when
there is only a single service though:
<script src="/embed.js"
        data-authority="partner.org"
        data-grant-token="xxx"
        ...>
Then for the more advanced case, if someone wants to specify multiple services
then they can use window.hypothesisConfig which we already have to support
for onLogin anyway. (Alternatively you can set data-services to the
id of a <datalist> element in the page.)