@mcritchlow
Last active February 27, 2018 16:56
Hyrax Analytics Modeling Options

Summary

The current model for Google Analytics statistics support in Hyrax stores each metric in a table shaped essentially like this:

  create_table "work_view_stats", force: :cascade do |t|
    t.datetime "date"
    t.integer "work_views"
    t.string "work_id"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.integer "user_id"
    t.index ["user_id"], name: "index_work_view_stats_on_user_id"
    t.index ["work_id"], name: "index_work_view_stats_on_work_id"
  end

In this format, each row stores a single metric from Google Analytics, in this case work_views. The other columns that identify a row are a date, a work_id (or file_id) and a user_id.

Currently, there are three analytics tables: work_view_stats, file_view_stats and file_download_stats. This implies a pattern of "one database table per metric", and the existing Hyrax::Statistic API assumes it. For example, several statistics presenters call directly into the to_flot method, assuming it returns a single array containing one date and one metric. That assumption cascades through the Hyrax::Statistic class.
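To make the single-metric assumption concrete, here is a minimal plain-Ruby sketch. It is not the actual Hyrax source: SingleMetricStat, its constructor arguments and the millisecond timestamp format are all assumptions made for illustration.

```ruby
require "date"

# Hypothetical stand-in for one row in a "one table per metric" model.
# cache_column names the only metric column this class knows about.
class SingleMetricStat
  def initialize(date:, cache_column:, attributes:)
    @date = date
    @cache_column = cache_column
    @attributes = attributes
  end

  # Returns a single [timestamp_in_ms, value] pair. Callers cannot ask
  # for any other metric, because the row only carries one.
  def to_flot
    [Time.utc(@date.year, @date.month, @date.day).to_i * 1000,
     @attributes.fetch(@cache_column)]
  end
end

stat = SingleMetricStat.new(
  date: Date.new(2018, 2, 26),
  cache_column: :work_views,
  attributes: { work_views: 12 }
)
point = stat.to_flot
```

Any presenter written against this shape expects exactly two elements per data point, which is why adding more metrics affects more than the database schema.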

In the current Analytics Sprint we need to track additional metrics, including:

  • Returning Visitors
  • Unique Visitors
  • Site-wide Unique Visitors
  • Site-wide Returning Visitors
  • Visibility
  • others?

The group also needs to support multiple remote analytics backends; for now, specifically Matomo.

This leaves our group needing to make a decision about the pattern for modeling these metrics. It seems we have (at least) three options:

  1. Continue the existing pattern of "one table per metric"
  2. Create new tables that are more inclusive. For example, a WorkStat table would include views, unique visitors, returning visitors, possibly visibility, etc.
  3. Update existing tables to support the new attributes. For example, WorkViewStat might have the new attributes added to it, be renamed via a migration, etc.

Some thoughts on each below:

One table per metric

Pros:

  • Follows existing pattern and as a result, most of the backend code would continue to work as-is
  • New presenters created could leverage the same query patterns as existing presenters

Cons:

  • Seems wasteful from a database modeling perspective. Combined with the existing rows with 0 entries, this adds up to a pretty large DB footprint over time.
  • Creates the need for several queries to be able to respond to a presenter saying "Give me all the statistics for Work 123 on February 26, 2018."
  • Site-wide metrics don't fit into this pattern where a user_id and work_id or page_id are used as primary attributes uniquely identifying a table row.
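The "several queries" cost can be sketched in plain Ruby, with in-memory hashes standing in for the per-metric tables. The unique_visitors and returning_visitors tables are hypothetical (only the view and download tables exist today), and each hash lookup below represents what would be a separate database query.

```ruby
require "date"

DAY = Date.new(2018, 2, 26)

# Each hash stands in for one per-metric table, keyed by [work_id, date].
METRIC_TABLES = {
  work_views: { ["123", DAY] => 12 },
  unique_visitors: { ["123", DAY] => 7 },    # hypothetical table
  returning_visitors: { ["123", DAY] => 3 }  # hypothetical table
}.freeze

# One lookup per table, merged into a single hash for the presenter.
# A missing row is reported as 0.
def stats_for(work_id, date)
  METRIC_TABLES.each_with_object({}) do |(metric, table), result|
    result[metric] = table.fetch([work_id, date], 0)
  end
end

all_stats = stats_for("123", DAY)
missing_stats = stats_for("999", DAY)
```

The number of lookups grows with the number of metrics, whereas a table holding all metrics for a work and date could answer the same question from one row.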

Brand new tables

Pros:

  • Allows for a potentially more ideal database model that more accurately maps to the needs of the front end query system
  • Should be more performant (how much is unknown)
  • Would allow modeling site-wide metrics differently (just using date as unique id/filter)
  • Would allow new remote caching code to be completely segregated from the existing Hyrax::Statistic class subclasses (could probably be a single delegation which we already have a hook in place for).

Cons:

  • Would require existing Hyrax users to completely rebuild their local "cache", at least eventually, in order to use the new reporting dashboard(s) and metrics.
  • Possibly a complicated migration path for users, which would require thorough testing and very good documentation.
  • The Hyrax::Statistic code will need to change to not rely on a single column for data, such as the to_flot execution path
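As a rough illustration of the change this implies, the sketch below models one row that holds several metrics and lets the caller name the metric it wants. WorkStat here is a plain Ruby Struct, not an ActiveRecord model; its column names and the metric argument to to_flot are assumptions, not an agreed design.

```ruby
require "date"

# Hypothetical metric columns for a combined work statistics row.
WORK_STAT_METRICS = %i[views unique_visitors returning_visitors].freeze

WorkStat = Struct.new(:work_id, :date, *WORK_STAT_METRICS, keyword_init: true) do
  # Returns a [timestamp_in_ms, value] pair for the requested metric,
  # rather than assuming the row has a single data column.
  def to_flot(metric)
    unless WORK_STAT_METRICS.include?(metric)
      raise ArgumentError, "unknown metric: #{metric}"
    end

    [Time.utc(date.year, date.month, date.day).to_i * 1000, public_send(metric)]
  end
end

row = WorkStat.new(
  work_id: "123",
  date: Date.new(2018, 2, 26),
  views: 12,
  unique_visitors: 7,
  returning_visitors: 3
)
views_point = row.to_flot(:views)
visitors_point = row.to_flot(:unique_visitors)
```

Each returned pair keeps the two-element shape existing presenters expect, so under this sketch the main change for callers would be passing a metric name.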

Merge/Update existing tables

Pros:

  • If done properly, the migration path may be slightly less involved for end users. This seems particularly true for the work_view_stats table becoming a single table to hold all work metrics
  • Allows for a potentially more ideal database model that more accurately maps to the needs of the front end query system
  • Should be more performant (how much is unknown)
  • ActiveRecord has good support for renaming tables via the rename_table transformation.

Cons:

  • There are already two file statistics tables. So there is the question of which to merge into, and whether ultimately that is better, or any different, than just creating a new table.
  • The Hyrax::Statistic code will still need to change to not rely on a single column for data, such as the to_flot execution path
  • Possibly a complicated migration path for users, which would require thorough testing and very good documentation.