dzeber/data-collection.md

## data-collection.md

      
Data Collection for Unified Search v2

The goal for data collection is to submit all relevant data via the Shield pings, which avoids having to work with the raw Unified Telemetry (UT) pings.
Note: In the future this workaround should be unnecessary, since the Experiment CEP output plugin will redirect UT pings from profiles participating in any experiment (TxP, Shield, etc.) to a separate bucket.
Ping types

The v2 Unified Search experiment submits pings triggered by certain events, some of which are a standard part of Shield studies.
All of these have the `shield-study` docType. The event that triggered the ping is listed in the `payload.study_state` field.


| `payload.study_state` | When is this ping sent? |
| --- | --- |
| `install` | when the user enters the study (installing the add-on) |
| `end-of-study` | when the study expires and the user has not opted out (add-on is uninstalled, previous browser state restored) |
| `user-ended-study` | when the user opts out of the study (add-on is uninstalled or disabled, previous browser state restored) |
| `ineligible` | when it is determined that the user does not match the eligibility criteria required for the study; usually sent either instead of the `install` ping or right after it |
| `running` | at the start of a session or when the study is installed |
| `shutdown` | at the end of a session (with Telemetry data aggregated over the whole session) |
| `daily` | once per active day, at the time of the daily subsession split, but not at the end of a session (shutdown) |
| `daily-shutdown` | at the end of a session (with Telemetry data from the last subsession only) |


All of these pings follow the Common ping format, which includes among other fields the UT environment block. The payload object contains some standard Shield fields, such as study name, version and state, as well as any custom fields.
Standard Telemetry data

In order to avoid having to extract data from UT main pings, necessary portions of standard Telemetry are injected into certain Shield pings. This way, the Shield pings form the sole data source for the study analysis (aside from possibly looking up pre-period measurements in the main_summary derived dataset). Note that participants get Extended Telemetry enabled upon entering the study.
Shield pings that include this UT data contain the following fields in their payload (listed together with their most relevant contents), which are also fields in the standard main ping payload:

- `info`: session IDs and counters, subsession start date and length
- `histograms`: in particular, histograms relating to Awesomebar usage, such as `FX_URLBAR_SELECTED_RESULT_TYPE`
- `keyedHistograms`: `SEARCH_COUNTS`
- `simpleMeasurements`: `activeTicks` and `UITelemetry`, which contains counts of different types of search, including in-content search
- `processes`: scalars; also subprocess histograms (but probably not relevant here)
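
As an illustration of working with these fields, the sketch below totals `SEARCH_COUNTS` by search source from a single Shield ping payload. It is not part of the study code. It assumes the usual main-ping layout, in which each `SEARCH_COUNTS` key has the form `<engine>.<source>` and each entry carries a `sum` field; the function name `search_counts_by_source` is hypothetical.

```python
from collections import defaultdict


def search_counts_by_source(payload):
    """Total SEARCH_COUNTS per search source (e.g. 'urlbar', 'searchbar').

    Assumes keys look like '<engine>.<source>' and that each keyed
    histogram entry exposes its count in a 'sum' field.
    """
    totals = defaultdict(int)
    keyed = payload.get("keyedHistograms", {}).get("SEARCH_COUNTS", {})
    for key, hist in keyed.items():
        # Split on the last dot, since engine names may contain dots.
        _engine, _sep, source = key.rpartition(".")
        totals[source] += hist.get("sum", 0)
    return dict(totals)
```

Pings that carry no Telemetry data (for example `install` or `running`) simply yield an empty result.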

Reporting UT in Shield pings

The above data is injected into the Shield pings in the following way:

- Pings with `study_state == "daily"` are generated when a daily subsession split occurs, and contain Telemetry data from the previous subsession. These pings have `payload.info.reason == "daily"`.
- Pings with `study_state == "daily-shutdown"` are sent at session end, and contain the Telemetry data from the last subsession in that session. These have `payload.info.reason == "shutdown"`.
- Pings with `study_state == "shutdown"` are also sent at session end, and contain the saved-session Telemetry data, which is aggregated over the entire session. These have `payload.info.reason == "gather-payload"`.
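
The pairing between `study_state` and `payload.info.reason` described above can be used as a sanity check during analysis. The following is a minimal sketch under that reading; the names `EXPECTED_REASON` and `telemetry_scope` are hypothetical and not part of the add-on.

```python
# Expected payload.info.reason for each Telemetry-bearing study_state,
# as listed in the text above.
EXPECTED_REASON = {
    "daily": "daily",
    "daily-shutdown": "shutdown",
    "shutdown": "gather-payload",
}


def telemetry_scope(payload):
    """Return 'subsession', 'session', or None if the ping has no UT data.

    Raises ValueError when info.reason does not match the study_state.
    """
    state = payload.get("study_state")
    if state not in EXPECTED_REASON:
        return None  # install, running, ineligible, etc.
    reason = payload.get("info", {}).get("reason")
    if reason != EXPECTED_REASON[state]:
        raise ValueError(
            "study_state %r expects reason %r, got %r"
            % (state, EXPECTED_REASON[state], reason)
        )
    # Only the "shutdown" ping is aggregated over the whole session.
    return "session" if state == "shutdown" else "subsession"
```

Keeping the two scopes apart matters because summing `shutdown` pings together with `daily` and `daily-shutdown` pings would count the same activity twice.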

Caveats


- Some subsession data will not get reported at all under this scheme. If a subsession split occurs for a reason other than those listed above (e.g. `environment-change` or `aborted-session`), we will not see data from that subsession in the Shield pings, except in aggregate in the `shutdown` pings.
- Some measurements reset in each subsession, and some don't. Histograms reset in each subsession, and main ping histograms for a given session should add up to the saved-session value. However, the search counts in `UITelemetry` don't reset, and keep accumulating across the entire session.
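
Both caveats affect how counts should be combined. The sketch below shows one way to handle them, assuming the pings of a single session have already been ordered by subsession; the helper names are hypothetical.

```python
def per_subsession_counts(cumulative):
    """Turn cumulative counts (e.g. a UITelemetry search counter observed
    in successive pings of one session) into per-ping increments."""
    deltas = []
    previous = 0
    for value in cumulative:
        deltas.append(value - previous)
        previous = value
    return deltas


def unreported_count(subsession_sums, session_sum):
    """Difference between the saved-session value of a histogram sum and
    the total seen in the reported subsessions of the same session."""
    return session_sum - sum(subsession_sums)
```

For example, `per_subsession_counts([2, 5, 5, 9])` gives `[2, 3, 0, 4]`. If a subsession was never reported, its activity is folded into the next observed increment, and a non-zero `unreported_count` for a resetting histogram suggests that such a gap exists.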

Study data

As well as the standard UT and Shield data, the payload contains a number of other custom fields describing the user's experience during the study:


| Field | Description | Details |
| --- | --- | --- |
| `changesApplied` | has a treatment been applied? | always `false` on the `control` branch; always `true` on treatment branches in the one-phase design; `true` on treatment branches in the two-phase design during the second phase |
| `diagnostics.allWindowsClosed` | were all windows closed on shutdown? | should only be `true` for `"shutdown"` or `"daily-shutdown"` study states |
| `diagnostics.searchBarRemovedManually` | was the searchbar removed by the user after the experiment started? | can happen on any branch, but doesn't invalidate the `unified` or `minimal` branches |
| `diagnostics.searchBarRemovedByExperiment` | was the searchbar actually removed as a part of applying the treatment? | should be `true` iff the branch is `unified` or `minimal` |
| `diagnostics.searchBarWidth` | width of the searchbar in pixels | should only be non-zero for `control` and `oneoff` |
| `diagnostics.oneoff` | value of `browser.urlbar.oneOffSearches` | starts off `true` for everything but `control`, but may be changed by the user after the experiment has started |
| `diagnostics.suggestions` | value of `browser.urlbar.suggest.searches` | starts off `true` for everything but `control`, but may be changed by the user after the experiment has started |
| `diagnostics.maxRichResults` | value of `browser.urlbar.maxRichResults` | starts off as `6` for `minimal` and `10` for everything else, but may be changed by the user or Sync after the experiment has started |
| `firstrunRevision` | the first installed version of the add-on | if this is `1`, it's a direct update from the v1 study; many preferences may have been left in an unexpected state |
| `onboardingBranch` | did this user see the onboarding message about search suggestions in the URLbar? | may be `true` or `false` on any branch |
| `revision` | add-on version | has less granularity than `study_version`, may be redundant |
| `testing` | is this a test profile? | should be set to `true` manually when testing |
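
The expectations in the Details column lend themselves to an automated consistency check. The sketch below encodes a few of them; it is an illustration rather than the study's own validation logic, and the names `SEARCHBAR_REMOVED_BRANCHES` and `diagnostic_issues` are hypothetical.

```python
# Branches on which the experiment is expected to remove the searchbar.
SEARCHBAR_REMOVED_BRANCHES = {"unified", "minimal"}


def diagnostic_issues(payload):
    """Return a list of human-readable inconsistencies found in a payload."""
    issues = []
    branch = payload.get("branch")
    state = payload.get("study_state")
    diag = payload.get("diagnostics", {})
    expect_removed = branch in SEARCHBAR_REMOVED_BRANCHES

    # Should be true iff the branch is unified or minimal.
    if diag.get("searchBarRemovedByExperiment", False) != expect_removed:
        issues.append("searchBarRemovedByExperiment does not match branch")
    # Width should only be non-zero for control and oneoff.
    if expect_removed and diag.get("searchBarWidth", 0) != 0:
        issues.append("non-zero searchBarWidth on a searchbar-less branch")
    # changesApplied is always false on the control branch.
    if branch == "control" and payload.get("changesApplied"):
        issues.append("changesApplied is true on control")
    # allWindowsClosed should only be true at session end.
    if diag.get("allWindowsClosed") and state not in {"shutdown", "daily-shutdown"}:
        issues.append("allWindowsClosed is true outside a shutdown state")
    return issues
```

A non-empty result does not necessarily mean the ping is unusable, since users can change some of these settings themselves, but it flags profiles worth a closer look. Pings with `testing` set to `true` would normally be dropped before any such check.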


Sample payload

{
  "about": {
    "_src": "shield",
    "_v": 2
  },
  "branch": "unified", // "control", "oneoff", "unified", "minimal"
  "changesApplied": true,
  "diagnostics": {
    "allWindowsClosed": false,
    "searchBarRemovedManually": false,
    "searchBarRemovedByExperiment": true,
    "searchBarWidth": 0,
    "oneoff": true,
    "suggestions": true,
    "maxRichResults": 10
  },
  "firstrunRevision": 2,
  "onboardingBranch": true,
  "revision": 2,
  "study_name": "@unified-urlbar-shield-study",
  "study_state": "running",  // "install", "running", "daily", "daily-shutdown", "shutdown", "user-ended-study", "end-of-study", "ineligible"
  "study_version": "2.2.0",
  "testing": true,

  // The following are only included if the study_state is "daily", "daily-shutdown", or "shutdown".
  "info": {...},
  "histograms": {...},
  "keyedHistograms": {...},
  "processes": {
    "content": {...},
    "parent": {
      "scalars": {...},
      ...
    }
  },
  "simpleMeasurements": {
    "UITelemetry": {
      "toolbars": {
        "countableEvents": {
            "__DEFAULT__": {
              "search": {
                ...
              },
              "search-oneoff": {
                ...
              },
              ...
            },
            ...
        },
        ...
      },
      ...
    },
    ...
  }
}