@binford2k
Last active January 13, 2020 17:57
Our content teams and external module authors need these usage patterns to allocate developer resources efficiently. This schema describes the part of the dataset that will be made public for community use. We try to provide as much useful information as possible without exposing sensitive information about users' infrastructures.
# This schema represents a regular site check-in. The service collects far more data,
# but most of it remains in private datasets. The site_id is a hashed, anonymous value.
# It cannot be reversed into something identifiable, even by us. We're limiting the data
# in this table to minimize the ability to "fingerprint" a specific site. We explicitly
# do not reveal node count in any way.
#
# This table provides only a high-level overview of what users' codebases look like. It
# can be used to answer questions like:
# - Are the people using my module on Windows also using it on other platforms?
# - Does my module need to support a wide range of agent versions, or can I tell people
#   to pin it to old versions that support their systems?
# - How many people using my module are already using another module with resource
#   types I'd like to depend on?
#
- site_id # required as an index -- so that next time we check in we don't duplicate data
- timestamp
- platforms:
  - name
  - ratio # percentage of nodes at this site running this platform
- agents:
  - version # version of the puppet agent
  - ratio # percentage of nodes at this site running this version
- master_version # version of puppet server running
- modules_installed: # deduplicated union of all environments -- env names can be sensitive
  - name # only public modules that exist on the Forge are included
  - version
  - source # did the module come from the Forge or GitHub?
- skipped_modules # a count of how many private modules were ignored
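
As a rough illustration of the first question above, the sketch below models a few check-in rows as plain Python dictionaries and counts how many sites with a given module installed run Windows, and how many of those also run another platform. The row shape, the `platform_mix` helper, the module names, and every value are assumptions invented for this example; the real storage format and query interface are not specified here. Note that this table only records that a module is installed at a site, not which nodes it is enforced on.

```python
# Hypothetical check-in rows shaped like the schema above. All values are invented,
# and the site_id strings are readable placeholders rather than real hashes.
CHECKINS = [
    {
        "site_id": "site-a",
        "timestamp": "2020-01-06T00:00:00Z",
        "platforms": [{"name": "windows", "ratio": 25.0}, {"name": "CentOS", "ratio": 75.0}],
        "agents": [{"version": "6.0.0", "ratio": 100.0}],
        "master_version": "6.0.0",
        "modules_installed": [{"name": "acme-webserver", "version": "1.0.0", "source": "forge"}],
        "skipped_modules": 3,
    },
    {
        "site_id": "site-b",
        "timestamp": "2020-01-06T00:00:00Z",
        "platforms": [{"name": "windows", "ratio": 100.0}],
        "agents": [{"version": "6.0.0", "ratio": 100.0}],
        "master_version": "6.0.0",
        "modules_installed": [{"name": "acme-webserver", "version": "1.0.0", "source": "forge"}],
        "skipped_modules": 0,
    },
    {
        "site_id": "site-c",
        "timestamp": "2020-01-06T00:00:00Z",
        "platforms": [{"name": "CentOS", "ratio": 100.0}],
        "agents": [{"version": "5.0.0", "ratio": 100.0}],
        "master_version": "5.0.0",
        "modules_installed": [{"name": "acme-webserver", "version": "0.9.0", "source": "github"}],
        "skipped_modules": 1,
    },
    {
        "site_id": "site-d",
        "timestamp": "2020-01-06T00:00:00Z",
        "platforms": [{"name": "windows", "ratio": 50.0}, {"name": "Ubuntu", "ratio": 50.0}],
        "agents": [{"version": "6.0.0", "ratio": 100.0}],
        "master_version": "6.0.0",
        "modules_installed": [{"name": "acme-database", "version": "2.0.0", "source": "forge"}],
        "skipped_modules": 0,
    },
]


def platform_mix(rows, module_name, platform="windows"):
    """Return (sites with the module that run `platform`, those also running another platform)."""
    on_platform = 0
    also_elsewhere = 0
    for row in rows:
        installed = any(m["name"] == module_name for m in row["modules_installed"])
        names = {p["name"].lower() for p in row["platforms"]}
        if installed and platform.lower() in names:
            on_platform += 1
            if len(names) > 1:
                also_elsewhere += 1
    return on_platform, also_elsewhere


if __name__ == "__main__":
    print(platform_mix(CHECKINS, "acme-webserver"))
```

Because node counts are deliberately absent, the answer is expressed in sites only; the `ratio` values could refine it, but they are left unused here.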
# This table and the others like it are *not* populated on check-in. Instead, a weekly
# cron job performs data aggregation on privately stored datasets to calculate this
# aggregate information. It's utterly anonymous and aggregated.
#
# This table can answer questions like:
# - How many CentOS users are using my module?
# - How many sites have my module installed, but don't use it?
# - What versions of my module are in active use?
#
- timestamp
- module:
  - name
  - version
  - count # number of individual nodes on which classes from this module are enforced
- platform:
  - name
  - version
  - count # number of individual nodes of this platform on which classes from this module are enforced
- agent:
  - version
  - count # number of individual nodes running this agent version on which classes from this module are enforced
- site:
  - count # number of individual sites where this module is installed
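
The questions listed for this table could be answered along the following lines. This is a minimal sketch, assuming each row describes one module version from a single weekly run and that `platform` and `agent` hold lists of records; that layout, the helper names, and all values are invented for illustration and are not confirmed by the schema. The counts are nodes and sites, so "users" can only be approximated.

```python
# Hypothetical rows from one weekly aggregation run. All values are invented.
MODULE_ROWS = [
    {
        "timestamp": "2020-01-06T00:00:00Z",
        "module": {"name": "acme-webserver", "version": "1.0.0", "count": 120},
        "platform": [
            {"name": "CentOS", "version": "7", "count": 100},
            {"name": "windows", "version": "2016", "count": 20},
        ],
        "agent": [{"version": "6.0.0", "count": 120}],
        "site": {"count": 8},
    },
    {
        "timestamp": "2020-01-06T00:00:00Z",
        "module": {"name": "acme-webserver", "version": "2.0.0", "count": 40},
        "platform": [{"name": "CentOS", "version": "8", "count": 40}],
        "agent": [{"version": "6.1.0", "count": 40}],
        "site": {"count": 3},
    },
    {
        "timestamp": "2020-01-06T00:00:00Z",
        "module": {"name": "acme-webserver", "version": "0.9.0", "count": 0},
        "platform": [],
        "agent": [],
        "site": {"count": 2},
    },
]


def active_versions(rows, module_name):
    """Map each version enforced on at least one node to its node count."""
    return {
        row["module"]["version"]: row["module"]["count"]
        for row in rows
        if row["module"]["name"] == module_name and row["module"]["count"] > 0
    }


def nodes_on_platform(rows, module_name, platform_name):
    """Sum the nodes of one platform across all versions of the module."""
    total = 0
    for row in rows:
        if row["module"]["name"] != module_name:
            continue
        total += sum(p["count"] for p in row["platform"] if p["name"] == platform_name)
    return total


def installed_but_unused(rows, module_name):
    """List versions installed at one or more sites but enforced on no nodes."""
    return [
        row["module"]["version"]
        for row in rows
        if row["module"]["name"] == module_name
        and row["module"]["count"] == 0
        and row["site"]["count"] > 0
    ]


if __name__ == "__main__":
    print(active_versions(MODULE_ROWS, "acme-webserver"))
    print(nodes_on_platform(MODULE_ROWS, "acme-webserver", "CentOS"))
    print(installed_but_unused(MODULE_ROWS, "acme-webserver"))
```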
# This table and the others like it are *not* populated on check-in. Instead, a weekly
# cron job performs data aggregation on privately stored datasets to calculate this
# aggregate information. It's utterly anonymous and aggregated.
#
# This table can answer questions like:
# - Which components of my module are used the most?
# - Are there parts of my module that people don't use?
# - Are people enforcing parts of my module on platforms that I don't support?
#
- timestamp
- class:
  - name
  - count # number of individual nodes on which this class is enforced
- platform:
  - name
  - version
  - count # number of individual nodes of this platform on which this class is enforced
- agent:
  - version
  - count # number of individual nodes running this agent version on which this class is enforced
- site:
  - count # number of unique sites on which this class is enforced
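
To make the questions for this table concrete, here is a small sketch that lists classes nobody enforces and flags classes enforced on platforms the author does not support. It assumes one row per class from a single weekly run, with `platform` as a list of records; the class names, the helper functions, and all numbers are hypothetical.

```python
# Hypothetical rows from one weekly aggregation run. All values are invented.
CLASS_ROWS = [
    {
        "timestamp": "2020-01-06T00:00:00Z",
        "class": {"name": "webserver", "count": 150},
        "platform": [
            {"name": "CentOS", "version": "7", "count": 130},
            {"name": "windows", "version": "2016", "count": 20},
        ],
        "agent": [{"version": "6.0.0", "count": 150}],
        "site": {"count": 9},
    },
    {
        "timestamp": "2020-01-06T00:00:00Z",
        "class": {"name": "webserver::config", "count": 130},
        "platform": [{"name": "CentOS", "version": "7", "count": 130}],
        "agent": [{"version": "6.0.0", "count": 130}],
        "site": {"count": 7},
    },
    {
        "timestamp": "2020-01-06T00:00:00Z",
        "class": {"name": "webserver::tuning", "count": 0},
        "platform": [],
        "agent": [],
        "site": {"count": 0},
    },
]


def unused_classes(rows):
    """Return the names of classes that are enforced on no nodes, sorted."""
    return sorted(row["class"]["name"] for row in rows if row["class"]["count"] == 0)


def unsupported_usage(rows, supported_platforms):
    """Return (class name, platform name, node count) for platforms outside the supported set."""
    findings = []
    for row in rows:
        for platform in row["platform"]:
            if platform["name"] not in supported_platforms and platform["count"] > 0:
                findings.append((row["class"]["name"], platform["name"], platform["count"]))
    return findings


if __name__ == "__main__":
    print(unused_classes(CLASS_ROWS))
    print(unsupported_usage(CLASS_ROWS, {"CentOS"}))
```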
# This table and the others like it are *not* populated on check-in. Instead, a weekly
# cron job performs data aggregation on privately stored datasets to calculate this
# aggregate information. It's utterly anonymous and aggregated.
#
# This table can answer questions like:
# - Are other modules using my resource types?
# - What are the usage patterns of my defined types?
# - How many different sites are using this type?
#
- timestamp
- resource:
  - name
  - count # number of individual nodes on which this resource is enforced
- platform:
  - name
  - version
  - count # number of individual nodes of this platform on which this resource is enforced
- agent:
  - version
  - count # number of individual nodes running this agent version on which this resource is enforced
- site:
  - count # number of unique sites on which this resource is enforced
  - average # how many instances does the average site enforce (include median, std deviation, etc?)
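
The note on `average` leaves open whether to publish the median and standard deviation as well. The sketch below shows how the weekly job might derive all three from per-site instance counts using Python's standard `statistics` module. The `summarize_instances` helper, its output keys, and the sample counts are assumptions for illustration; the per-site counts themselves would live only in the private dataset.

```python
import statistics


def summarize_instances(per_site_counts):
    """Summarize how many instances of a resource each site enforces."""
    counts = list(per_site_counts)
    if not counts:
        raise ValueError("need at least one site to summarize")
    return {
        "average": statistics.mean(counts),
        "median": statistics.median(counts),
        # Population standard deviation: the counts cover every reporting site,
        # not a sample drawn from them.
        "stddev": statistics.pstdev(counts),
    }


if __name__ == "__main__":
    # Invented counts: four sites with similar usage, then one very large site.
    print(summarize_instances([2, 4, 4, 10]))
    print(summarize_instances([1, 1, 1, 1, 96]))
```

In the second call a single large site pulls the mean far above what a typical site enforces, which is one argument for publishing the median alongside the average.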