@mnot

mnot/privacy.md Secret

Last active October 31, 2018 17:50
User Data Controls in Web Browsers

This document is a proto-specification to sketch out the space; it started as a spec of Privacy Mode, but that turned out to be too aggressive (as explained below). It doesn't have a home yet; feedback encouraged (below, on Twitter to @mnot, or to mnot@mnot.net).

This document identifies types of commonly implemented user data controls in Web browsers, discusses the threat models that motivate them, and explains how Web platform components can specify interactions with them.

Introduction

One reason the Web has been so successful is that Web browsers act as an agent of the user; by controlling access to functionality and data on their behalf, users have reasonable assurances that their data is handled in predictable ways.

For example, thanks to the Same Origin Policy, we know that https://example.com/ cannot see what we do on https://secret.example.net/ -- at least without collusion between them. We know that our data can be cached locally, to provide a better experience and save network resources. We know that if a page is served over HTTPS, it has certain implications for how the network is used.

Over time, it has become apparent that these protections are not "one size fits all"; some users (and use cases) demand more careful handling of their data.

Browser vendors have met this demand by offering a variety of controls over user data. "Privacy" modes are now common, as are interfaces to clear cookies and other local state.

These user interfaces are not specified in standards -- for good reason. Experience has shown that committee specifications are too blunt and clumsy a tool for defining how humans are to interact with a computer, and flexibility in this area provides an opportunity for experimentation and incremental improvement.

That said, a number of issues remain:

  • Currently, there is no well-recognised way to specify that a new Web platform feature has potential exposure to these controls.

  • Web browsers have significantly varying approaches to control over user data. While this is to be expected and even encouraged to some degree (as explained above), too much variance risks user confusion, especially regarding such a complex topic.

  • When there is significant variance in how controls behave, it becomes more difficult for Web pages to gracefully degrade when they are used.

  • Because these controls are not backed by standards, there is a risk that they (or their use) will be seen as illegitimate by some parties.

This document seeks to make incremental progress in these areas by identifying the major types of controls for users' data on the Web in Section 2, and documenting common existing controls in Appendix A.

It does not require a browser to offer all or even any of these controls; neither does it specify how to present them to users. Instead, it defines when Web platform components should specify interaction with these controls in Section 3, and specifies how to do so in Section 4.

Types of User Data Controls

This section classifies major kinds of controls that browsers can offer to users over their data, and explains the threat models that they attempt to mitigate.

Site Data Controls

A site data control allows users to manage site-related data in the browser, in order to prevent a Web site from connecting their activity from one browsing session to another.

For example, if a Web site is able to set cookies on Bruce's browser, it can use them to track his behaviour. Site data controls provide a way for him to manage such site-related data in his browser.

Note that this does not include data that sites gather and communicate outside the browser. For example, a site could communicate about Bruce's behaviour to a third party separately; while this could potentially be a misuse of his data, it is not facilitated by or visible to the browser.

It also excludes cases where users willingly give data to a site; for example, by filling out an HTML form, or giving permission to use geolocation information, cameras or microphones.

Finally, it excludes user tracking through fingerprinting or other forms of Unsanctioned Tracking.

Typically, site data controls affect parts of the Web platform like:

Local Data Controls

A local data control allows users to manage browser-related data on the local system, in order to prevent disclosure of user data to someone who uses the browser or system that it runs on after the user has concluded their browsing session.

For example, suppose Bob uses a browser to shop for a gift for Mary, and then terminates his browsing session. If Mary is able to examine the filesystem or memory on the computer to find out what he bought, his data would be at risk. A local data control attempts to mitigate this threat.

Note that this presumes that Bob trusts his computer. If another party has access to and appropriate privilege on it beforehand, they could modify the operating system and/or the browser to retain Bob's data. There are no known ways to mitigate the risks of using an untrusted system.

By their nature, local data controls manage the types of data listed in site data controls, and also affect parts of the Web platform like:

Network Data Controls

A network data control allows users to manage how their browser uses the network, in order to prevent disclosure of their data to network components along the path between the browser and the origin server without their knowledge or consent.

For example, if Jane uses the Web to research an illness, and her network provider examines the data sent between Jane's computer and the Web to learn more about her, her data would be at risk of disclosure. A network data control attempts to mitigate this threat.

Note that this does not include network components that are acting on behalf of the user or the origin server; for example, an explicitly configured HTTP proxy, or a Content Delivery Network. However, it does include things like intercepting proxies, which are interposed without the user's knowledge or consent.

Typically, network data controls affect the use of parts of the Web platform like:

Should My Feature Specify Interaction With a User Data Control?

Web platform features that expose user data to Web sites SHOULD specify an interaction with site data controls. This includes data that is stored on behalf of the site by the browser (e.g., Cookies, LocalStorage, HTTP ETags), but does not include data about the browser itself (such as the User-Agent string), even though this can be used for unsanctioned tracking by sites.

Web platform features that create or potentially create user data on the local machine (whether in the browser or external to it) SHOULD specify an interaction with local data controls. This includes any feature that creates durable local state on the system (i.e., that which persists beyond the browsing session). It includes the browsers' copies of data created outside it (e.g., form data, geolocation data), but does not include the original data (which is usually not accessible to the browser).

Web platform features generally should not expose user data to the network, as explained in Privileged Contexts. Pre-existing platform features that cannot be updated to meet those requirements (e.g., because doing so would break too much of the Web) SHOULD specify an interaction with network data controls, if they are revised.
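
The three rules above can be read as a small decision procedure. The following is a minimal sketch of that reading, not part of the proto-specification; the `Feature` fields and the `controls_to_specify` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Control(Enum):
    SITE = "site data control"
    LOCAL = "local data control"
    NETWORK = "network data control"


@dataclass(frozen=True)
class Feature:
    # Hypothetical description of a Web platform feature; these field names
    # are assumptions for illustration, not taken from any specification.
    name: str
    # Stores or exposes user data to sites (e.g., Cookies, LocalStorage, ETags).
    # Data about the browser itself (e.g., User-Agent) does not count.
    exposes_user_data_to_sites: bool = False
    # Creates state that persists beyond the browsing session.
    creates_durable_local_state: bool = False
    # Pre-existing feature that exposes user data to the network and cannot
    # be updated to avoid doing so.
    legacy_network_exposure: bool = False


def controls_to_specify(feature: Feature) -> Set[Control]:
    """Return the control types a feature's specification should mention."""
    controls: Set[Control] = set()
    if feature.exposes_user_data_to_sites:
        controls.add(Control.SITE)
    if feature.creates_durable_local_state:
        controls.add(Control.LOCAL)
    if feature.legacy_network_exposure:
        controls.add(Control.NETWORK)
    return controls


# Usage: cookies are visible to sites and persist locally; the User-Agent
# string is data about the browser, so no control applies under these rules.
cookies = Feature("Cookies", exposes_user_data_to_sites=True,
                  creates_durable_local_state=True)
user_agent = Feature("User-Agent")
print(sorted(c.value for c in controls_to_specify(cookies)))
print(sorted(c.value for c in controls_to_specify(user_agent)))
```

Because the controls are not exclusive, the function returns a set rather than a single value.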

Specifying Interaction with User Data Controls

This section explains how Web specifications should specify potential interactions with user data controls.

Note that any number of controls might be relevant to a particular feature; they are not exclusive.

Specifying interaction with controls is not a replacement for carefully considered defaults for data sharing in the Web platform. See the Security and Privacy Questionnaire and Fingerprinting Guidance for more information.

When a Web platform specification identifies risk related to one of the threat models above, it SHOULD indicate this clearly, nominating the type of user data control that could be used. For example:

ExampleAPI keeps user data locally [ref]. Implementations that provide local data controls 
SHOULD allow management of ExampleAPI. 

If multiple types of controls are applicable, they SHOULD be enumerated:

ExampleAPI keeps user data locally [ref], and also exposes user data to sites [ref]. 
Implementations that provide local and/or site data controls SHOULD allow management
of ExampleAPI.

Specifications SHOULD NOT detail interaction with specific user data controls, because their nature tends to change over time.

Specifications MAY illustrate the effects of user data controls and suggest (but not require) interactions with them. For example:

ExampleAPI keeps user data locally [ref]. Implementations that provide local data controls
SHOULD allow management of ExampleAPI. When used, such controls remove ExampleAPI state
from permanent and volatile storage immediately, and might offer the selective ability to
remove state (e.g., by origin or domain).

Implementations SHOULD NOT disable functionality or APIs when a user data control is in effect; doing so is likely to break existing Web content, or force workarounds. For example, if LocalStorage is not available in privacy mode, Web applications that use it will not work. Instead, the user data created during private browsing ought to be discarded after the session is complete.
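
One way to picture this recommendation is a storage object that keeps the same interface in a private session, but whose contents are dropped when the session ends. The sketch below is illustrative only; `PersistentStore`, `PrivateSessionStore` and `remember_theme` are hypothetical names, and real browser storage is considerably more involved.

```python
class PersistentStore:
    """Stand-in for durable per-origin storage such as LocalStorage."""

    def __init__(self):
        self._data = {}

    def set_item(self, origin, key, value):
        self._data.setdefault(origin, {})[key] = value

    def get_item(self, origin, key):
        return self._data.get(origin, {}).get(key)


class PrivateSessionStore(PersistentStore):
    """Same interface as PersistentStore, so page code keeps working;
    the state only lives until the private session ends."""

    def end_session(self):
        # Discard everything created during the private session.
        self._data.clear()


def remember_theme(store, origin):
    """Hypothetical page code: it cannot tell which store it was given."""
    store.set_item(origin, "theme", "dark")
    return store.get_item(origin, "theme")


# Usage: the page code behaves identically in both modes.
normal = PersistentStore()
private = PrivateSessionStore()
remember_theme(normal, "https://example.com")
remember_theme(private, "https://example.com")
private.end_session()  # private state is gone; normal state is untouched
```

The design choice being illustrated is that the control acts on the lifetime of the data, not on the availability of the API.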

Implementations of user data controls are not required to act upon all data from features that are identified as interacting with them; for example, a "clear site data" facility may be selective, and choose not to delete a bookmark about a visited site, because removing it would surprise the user.

Appendix A: Common User Data Controls

There are a number of browser extensions that offer users greater control over various forms of data; for example, cookie tracking blockers and HTTPS Everywhere. In some cases, this leads to the creation of new browsers that focus on control of user data; e.g., TorBrowser.

Likewise, users can use network technologies like VPNs and proxies (including Tor over SOCKS) as network data controls.

Defining their behaviour is out of scope for Web platform specifications. However, there are two existing user data controls which are common enough for this specification to explain: "clearing site data" and "privacy mode".

In time, more user data controls might be identified.

Clearing Site Data

Many browsers have had a "clear cookies" function for some time that provides a way to manage local HTTP cookies. Over time, the Web platform has added new local state mechanisms, and these functions have correspondingly grown to allow management of all such state.

"Clearing Site Data" is primarily a local data control, but it also serves as a limited form of site data control, because it makes browser state unavailable both locally and to the sites visited.

It is characterised by the permanent removal of browser state. Browsers might allow finer-grained control over clearing site data; for example, by limiting the removed data to certain types, or limiting it to the data generated in a particular time period.
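
As a rough illustration of such finer-grained clearing, the sketch below filters a list of state records by type and by creation time. The `StateRecord` type, its `kind` labels and the `clear_site_data` function are assumptions made for this example; they do not describe any browser's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StateRecord:
    # Hypothetical record of one piece of browser state.
    origin: str
    kind: str          # illustrative labels, e.g. "cookie" or "cache"
    created: datetime


def clear_site_data(records: Iterable[StateRecord],
                    kinds: Optional[Iterable[str]] = None,
                    since: Optional[datetime] = None) -> List[StateRecord]:
    """Return the records that survive clearing.

    kinds: if given, only records of these kinds are removed.
    since: if given, only records created at or after this time are removed.
    With neither argument, everything is removed.
    """
    kind_set = None if kinds is None else set(kinds)

    def should_remove(record: StateRecord) -> bool:
        if kind_set is not None and record.kind not in kind_set:
            return False
        if since is not None and record.created < since:
            return False
        return True

    return [r for r in records if not should_remove(r)]


# Usage: clear only cookies created in the last hour of a sample timeline.
start = datetime(2016, 3, 1, 12, 0)
state = [
    StateRecord("https://example.com", "cookie", start),
    StateRecord("https://example.com", "cache", start + timedelta(hours=2)),
    StateRecord("https://example.net", "cookie", start + timedelta(hours=3)),
]
remaining = clear_site_data(state, kinds={"cookie"},
                            since=start + timedelta(hours=2))
```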

Privacy Mode

Many browsers have a so-called "Privacy Mode", although they use various names for it; e.g., "Incognito Window" (Chrome), "Private Window" (Safari and Firefox), and "InPrivate" (Edge).

Privacy Mode is primarily a local data control. It is characterised by the creation of a new browsing context whose local user data is destroyed when the session is terminated (e.g., by closing its window).

Many browsers also use Privacy Mode as a site data control, by starting the private browsing session without any local state. However, at least one implementation does not do this; instead, it begins a private browsing session with a copy of the user's existing state, destroying the copy when the session has ended.
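
The two starting behaviours described above can be contrasted in a few lines. This is a simplified sketch under the assumption that browser state can be modelled as a nested dictionary keyed by origin; `PrivateSession` and its `start_empty` flag are hypothetical names, not an API of any browser.

```python
import copy


class PrivateSession:
    """Models only the initial state and teardown of a private session."""

    def __init__(self, existing_state, start_empty=True):
        # start_empty=True: the session begins without any local state, so
        # Privacy Mode also acts as a site data control.
        # start_empty=False: the session begins with a copy of the user's
        # existing state; the original is never modified.
        self.state = {} if start_empty else copy.deepcopy(existing_state)

    def close(self):
        # Either way, the session's own state is destroyed at the end.
        self.state.clear()


# Usage: changes made in a copy-based session do not leak back.
profile = {"https://example.com": {"sid": "abc"}}
session = PrivateSession(profile, start_empty=False)
session.state["https://example.com"]["sid"] = "xyz"
session.close()
```

A deep copy is used so that nested per-origin state in the user's normal profile cannot be changed from within the private session.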

@twirl
twirl commented Mar 9, 2016

What about features which save user data on remote servers? At present there is the Push API, which allows passing a message body through a remote push server, and I can easily imagine more of them, like page translation or image recognition, if they are standardized.

@mnot
mnot commented Mar 16, 2016

WebPush requires that to be encrypted IIRC. I'm just about to upload a rewrite that makes that a bit clearer; would love to hear what you think.

@hadleybeeman
You might want to add a section on process/next steps for context... Something like:

State of the technology

This field is still actively in development. Each browser currently has its own mode to protect users against some subset of these threats. We also see privacy threats that are currently outside any browser's privacy mode, like fingerprinting. As new attacks develop and new specs continue to expand the threat surface area, these features should continue to evolve to protect users.

In due course, formally standardising privacy mode may help to set users' expectations across all browsers and provide developers with a consistent environment in which to build sites.

Or some such.