The current model pattern for Google Analytics statistics support in Hyrax essentially follows this database schema:
```ruby
create_table "work_view_stats", force: :cascade do |t|
  t.datetime "date"
  t.integer "work_views"
  t.string "work_id"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
  t.integer "user_id"
  t.index ["user_id"], name: "index_work_view_stats_on_user_id"
  t.index ["work_id"], name: "index_work_view_stats_on_work_id"
end
```
In this format, each row stores a single metric from Google Analytics, in this case `work_views`. The other important uniquely identifying information is a `date`, a `work_id` (or `file_id`) and a `user_id`.
Currently, there are three existing analytics tables: `work_view_stats`, `file_view_stats` and `file_download_stats`. This implies a pattern of "one database table per metric", and the existing `Hyrax::Statistic` API assumes it. For example, several statistics presenters call the `to_flot` method directly and assume it returns a single array containing one date and one metric value. This assumption cascades through the `Hyrax::Statistic` class.
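For readers unfamiliar with that code path, here is a plain-Ruby approximation of the contract, loosely modeled on how we understand `Hyrax::Statistic` to work: a class-level `cache_column` names the one metric column, and `to_flot` pairs a millisecond timestamp with that column's value. `StatisticSketch` and `WorkViewStatSketch` are hypothetical stand-ins, not the real ActiveRecord models.

```ruby
require "date"

# Hypothetical stand-in for Hyrax::Statistic (not the real ActiveRecord class).
# Each subclass declares the single column that holds its metric.
class StatisticSketch
  class << self
    attr_accessor :cache_column
  end

  attr_reader :date

  def initialize(date:, **metrics)
    @date = date
    @metrics = metrics
  end

  # Flot expects [x, y] pairs; x is a JavaScript timestamp in milliseconds.
  def self.convert_date(date)
    date.to_time.to_i * 1000
  end

  # The contract presenters rely on: exactly one date and one metric value.
  def to_flot
    [self.class.convert_date(date), @metrics.fetch(self.class.cache_column)]
  end
end

# Hypothetical subclass mirroring the work_view_stats table.
class WorkViewStatSketch < StatisticSketch
  self.cache_column = :work_views
end

stat = WorkViewStatSketch.new(date: DateTime.new(2018, 2, 26), work_views: 12)
stat.to_flot # => [1519603200000, 12]
```

Because `cache_column` is a single symbol per class, a row carrying several metrics has no way to say which one `to_flot` should return.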
In the current Analytics Sprint we need to track additional metrics, including:
- Returning Visitors
- Unique Visitors
- Site-wide Unique Visitors
- Site-wide Returning Visitors
- Visibility
- others?
The group also needs to support multiple remote analytics backends: for now, specifically Matomo in addition to Google Analytics.
This leaves our group needing to decide on a pattern for modeling these metrics. We seem to have (at least) three options:
1. Continue the existing pattern of "one table per metric".
2. Create new, more inclusive tables. For example, `WorkStat` would include views, unique visitors, returning visitors, possibly visibility, etc.
3. Update existing tables to support the new attributes. For example, `WorkViewStat` might have the new attributes added to it, be renamed via a migration, etc.
Some thoughts on each option follow.
Option 1: Continue the existing "one table per metric" pattern
Pros:
- Follows existing pattern and as a result, most of the backend code would continue to work as-is
- New presenters created could leverage the same query patterns as existing presenters
Cons:
- Seems wasteful from a database modeling perspective. Combined with the existing rows that store `0` values, this adds up to a large database footprint over time.
- Requires several queries to answer a presenter asking, "Give me all the statistics for Work 123 on February 26, 2018."
- Site-wide metrics don't fit this pattern, in which a `user_id` and a `work_id` or `page_id` are the primary attributes uniquely identifying a table row.
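To make the query cost of option 1 concrete, the sketch below simulates three one-metric tables in memory and answers "all statistics for Work 123 on February 26, 2018". The `Row` struct, the `TABLES` hash and the metric names are hypothetical; each table scan stands in for one SQL query.

```ruby
require "date"

# Hypothetical in-memory stand-ins for one-metric-per-table storage.
# Each row is keyed by date, work_id and user_id and carries one metric.
Row = Struct.new(:date, :work_id, :user_id, :value)

TABLES = {
  work_views:         [Row.new(Date.new(2018, 2, 26), "123", 1, 40)],
  unique_visitors:    [Row.new(Date.new(2018, 2, 26), "123", 1, 25)],
  returning_visitors: [Row.new(Date.new(2018, 2, 26), "123", 1, 9)]
}.freeze

# Gathering every metric for one work and date touches every table,
# so the number of queries grows with the number of metrics tracked.
def stats_for(work_id, date)
  queries = 0
  stats = TABLES.each_with_object({}) do |(metric, rows), acc|
    queries += 1 # stands in for one SELECT against that metric's table
    row = rows.find { |r| r.work_id == work_id && r.date == date }
    acc[metric] = row ? row.value : 0
  end
  [stats, queries]
end

stats, queries = stats_for("123", Date.new(2018, 2, 26))
```

Here `stats` holds all three values, but only after `queries` reaches 3; adding the other sprint metrics would add one query each. A site-wide metric would not even have a natural `work_id` to put in `Row`.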
Option 2: Create new, more inclusive tables
Pros:
- Allows for a potentially better database model that more accurately maps to the needs of the front-end query system
- Should be more performant (by how much is unknown)
- Would allow modeling site-wide metrics differently (using just `date` as the unique id/filter)
- Would allow new remote caching code to be completely segregated from the existing `Hyrax::Statistic` subclasses (this could probably be a single delegation, which we already have a hook in place for)
Cons:
- Would require existing Hyrax users to completely rebuild their local "cache", at least eventually, to be able to use the new reporting dashboard(s) and metrics.
- Possibly a complicated migration path for users; it would require thorough testing and very well documented migration instructions.
- The `Hyrax::Statistic` code will need to change so it no longer relies on a single column for data, for example in the `to_flot` execution path
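One possible shape for that change, sketched in plain Ruby: `to_flot` keeps returning the same `[milliseconds, value]` pair, but the caller names the metric instead of relying on a class-level column. `WorkStatSketch`, its `METRICS` list and the default of `0` for missing values are assumptions for illustration only.

```ruby
require "date"

# Hypothetical multi-metric record; METRICS lists the columns it may chart.
class WorkStatSketch
  METRICS = %i[views unique_visitors returning_visitors].freeze

  attr_reader :date

  def initialize(date:, **values)
    @date = date
    @values = values
  end

  # Same [milliseconds, value] pair Flot expects, but the caller picks the
  # metric; unknown metric names are rejected, missing values default to 0.
  def to_flot(metric)
    raise ArgumentError, "unknown metric: #{metric}" unless METRICS.include?(metric)

    [date.to_time.to_i * 1000, @values.fetch(metric, 0)]
  end
end

stat = WorkStatSketch.new(date: DateTime.new(2018, 2, 26), views: 40, unique_visitors: 25)
stat.to_flot(:unique_visitors)    # => [1519603200000, 25]
stat.to_flot(:returning_visitors) # => [1519603200000, 0]
```

Existing presenters could keep working if a no-argument variant defaulted to the metric they already chart, though whether that is worth the extra indirection is part of the decision above.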
Option 3: Update existing tables to support the new attributes
Pros:
- If done properly, the migration path may be slightly less involved for end users. This seems particularly true for the `work_view_stats` table becoming a single table that holds all work metrics.
- Allows for a potentially better database model that more accurately maps to the needs of the front-end query system
- Should be more performant (by how much is unknown)
- ActiveRecord has good support for renaming tables via the `rename_table` migration method.
Cons:
- There are already two file statistics tables, so there is the question of which one to merge into, and whether that is ultimately better than, or any different from, just creating a new table.
- The `Hyrax::Statistic` code will still need to change so it no longer relies on a single column for data, for example in the `to_flot` execution path
- Possibly a complicated migration path for users; it would require thorough testing and very well documented migration instructions.