@mcritchlow
Last active February 15, 2018 16:06
Hyrax::Analytics API Ideas
module Hyrax
  module Analytics
    # @abstract Base class for Analytics services that support statistics needs in Hyrax.
    #   Implementing subclasses must define `.connection`, `.remote_statistics` and `.to_graph`
    #   (`.to_graph` is not stubbed here yet).
    class Base
      # Establish connection with the analytics service
      def self.connection
        # `name` is the class name; `self.class` would just be `Class` inside a class method
        raise NotImplementedError, "#{name}.connection is unimplemented."
      end

      # Used by Hyrax::Statistic to perform live queries against the remote service
      # @param start_date [DateTime]
      # @param object [ActiveFedora::Base??] probably a better type for this
      # @param query_type type TBD, see TODO below
      #
      # @return [Enumerable] of objects to cache locally in DB
      # TODO: decide best place for query types: pageview, download, returning visitors, new visitors,..
      def self.remote_statistics(start_date, object, query_type)
        raise NotImplementedError, "#{name}.remote_statistics is unimplemented."
      end

      # OR, define explicit query methods for each query?
      # Examples
      def self.pageviews(start_date, object)
        raise NotImplementedError, "#{name}.pageviews is unimplemented."
      end

      def self.downloads(start_date, object)
        raise NotImplementedError, "#{name}.downloads is unimplemented."
      end

      def self.visitors(start_date)
        raise NotImplementedError, "#{name}.visitors is unimplemented."
      end

      def self.returning_visitors(start_date)
        raise NotImplementedError, "#{name}.returning_visitors is unimplemented."
      end
    end
  end
end
require 'oauth2'
require 'signet/oauth_2/client'

module Hyrax
  module Analytics
    class GoogleAnalytics < Base
      def self.connection
        return unless config.valid?
        # an ||= to setup the GA connection using JSON
      end

      def self.visitors(start_date)
        # yanked as exemplar https://github.com/google/google-api-ruby-client/blob/master/samples/cli/lib/samples/analytics.rb
        dimensions = %w[ga:date]
        metrics = %w[ga:sessions ga:users ga:newUsers ga:percentNewSessions
                     ga:sessionDuration ga:avgSessionDuration]
        sort = %w[ga:date]
        # NOTE: `profile_id` is not defined anywhere in this sketch yet
        result = connection.get_ga_data("ga:#{profile_id}",
                                        start_date,
                                        String(Date.today),
                                        metrics.join(','),
                                        dimensions: dimensions.join(','),
                                        sort: sort.join(','))
        # Manipulate `result` to an agreed upon data structure
      end

      # BOILERPLATE CONFIG/AUTH STUFF BELOW THAT'S INCOMPLETE/TBD
      # Loads configuration options from config/analytics.yml. Expected structure:
      # `analytics:`
      # `  app_name: GOOGLE_OAUTH_APP_NAME`
      # `  app_version: GOOGLE_OAUTH_APP_VERSION`
      # `  privkey_path: GOOGLE_OAUTH_PRIVATE_KEY_PATH`
      # `  privkey_secret: GOOGLE_OAUTH_PRIVATE_KEY_SECRET`
      # `  client_email: GOOGLE_OAUTH_CLIENT_EMAIL`
      # @return [Config]
      def self.config
        @config ||= Config.load_from_yaml
      end
      private_class_method :config

      # placeholder as example of existing code
      class Config
        def self.load_from_yaml
          filename = Rails.root.join('config', 'analytics.yml')
          yaml = YAML.safe_load(File.read(filename))
          unless yaml
            Rails.logger.error("Unable to fetch any keys from #{filename}.")
            return new({})
          end
          new yaml.fetch('analytics')
        end

        REQUIRED_KEYS = %w[app_name app_version privkey_path privkey_secret client_email].freeze

        def initialize(config)
          @config = config
        end

        # @return [Boolean] are all the required values present?
        def valid?
          config_keys = @config.keys
          REQUIRED_KEYS.all? { |required| config_keys.include?(required) }
        end

        # Define a reader method for each required key
        REQUIRED_KEYS.each do |key|
          class_eval %{ def #{key}; @config.fetch('#{key}'); end }
        end
      end
    end
  end
end
require 'piwik'

module Hyrax
  module Analytics
    class Matomo < Base
      def self.connection
        return unless config.valid?
        # an ||= to setup Matomo
        # Piwik::PIWIK_URL = 'http://demo.piwik.org'
        # Piwik::PIWIK_TOKEN = 'anonymous'
        # site = Piwik::Site.load(config.site)
      end

      def self.visitors(start_date)
        result = Piwik::VisitsSummary.getVisits(idSite: config.site, period: :range, date: "#{start_date},#{Date.today}")
        # Manipulate `result` to an agreed upon data structure
      end

      # BOILERPLATE CONFIG/AUTH STUFF BELOW THAT'S INCOMPLETE/TBD
      # Loads configuration options from config/analytics.yml. Expected structure:
      # `analytics:`
      # `  site: MATOMO_SITE_ID`
      # `  url: MATOMO_URL`
      # `  token: MATOMO_TOKEN`
      # @return [Config]
      def self.config
        @config ||= Config.load_from_yaml
      end
      private_class_method :config

      # placeholder as example of existing code
      class Config
        def self.load_from_yaml
          filename = Rails.root.join('config', 'analytics.yml')
          yaml = YAML.safe_load(File.read(filename))
          unless yaml
            Rails.logger.error("Unable to fetch any keys from #{filename}.")
            return new({})
          end
          new yaml.fetch('analytics')
        end

        # Keys match the Matomo structure documented above; the GA key list
        # would leave `config.site` undefined and `valid?` always false.
        REQUIRED_KEYS = %w[site url token].freeze

        def initialize(config)
          @config = config
        end

        # @return [Boolean] are all the required values present?
        def valid?
          config_keys = @config.keys
          REQUIRED_KEYS.all? { |required| config_keys.include?(required) }
        end

        # Define a reader method for each required key
        REQUIRED_KEYS.each do |key|
          class_eval %{ def #{key}; @config.fetch('#{key}'); end }
        end
      end
    end
  end
end

Hyrax::Statistic

Documentation for the existing Hyrax::Statistic API and subclass implementations. Hopefully this could be used to create feature parity with Matomo and identify the core API needed for abstraction.

Refactoring Ideas

Goal: Basically keep the Statistic classes as-is, since they build and save the cached ActiveRecord stats. If we can get the Analytics class and adapter subclasses to handle authentication and query management, then we can expose a consistent API for the Statistic classes to keep using. Kind of like the QA gem.

Ideas for Analytics:

  • We need something like an adapter pattern
  • Needs to support handling remote_statistics queries from the Statistic class. The data format of results should be consistent so Statistic doesn't need to distinguish between GA and Matomo
  • Perhaps a ::Base class that's instantiated with a provided adapter (uses GA as default)
  • GA Adapter
    • Auth: Needs to support existing Analytics Config setup for GA authentication except for the Legato User thing
    • Query: Needs to support the Google ruby api gem OR Legato queries depending on direction
  • Matomo Adapter
    • Auth: Needs to support Matomo authentication, which seems a bit lighter-weight than GA
    • Query: Needs to support the matomo ruby API queries
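
One way the adapter idea above could look, as a minimal plain-Ruby sketch. Everything here is hypothetical: `AnalyticsSketch`, `Stat`, `Service` and the two fake adapters are illustrative names, and the hard-coded rows stand in for real GA or Matomo responses. The point is only that each adapter normalizes its own response into one shared shape, so the caller never has to know which service answered.

```ruby
require 'date'

# Hypothetical sketch; none of these names exist in Hyrax.
module AnalyticsSketch
  # The one result shape every adapter agrees to return.
  Stat = Struct.new(:date, :count)

  # Stand-in for a GA adapter. The rows are canned, not a real GA response.
  class FakeGoogleAdapter
    def pageviews(_start_date, _object)
      rows = [%w[20180214 12], %w[20180215 7]]
      rows.map { |date, count| Stat.new(Date.strptime(date, '%Y%m%d'), count.to_i) }
    end
  end

  # Stand-in for a Matomo adapter. Different raw shape, same normalized output.
  class FakeMatomoAdapter
    def pageviews(_start_date, _object)
      rows = { '2018-02-14' => 12, '2018-02-15' => 7 }
      rows.map { |date, count| Stat.new(Date.iso8601(date), count) }
    end
  end

  # Instantiated with a provided adapter; GA is the default.
  class Service
    def initialize(adapter = FakeGoogleAdapter.new)
      @adapter = adapter
    end

    def pageviews(start_date, object)
      @adapter.pageviews(start_date, object)
    end
  end
end
```

With this shape, `AnalyticsSketch::Service.new.pageviews(start, work)` and `AnalyticsSketch::Service.new(AnalyticsSketch::FakeMatomoAdapter.new).pageviews(start, work)` return the same list of `Stat` structs.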

Ideas for Statistics:

  • rename ga_statistics to remote_statistics
  • delegate remote_statistics to the Hyrax::Analytics adapter? Or just call it directly, since we'll likely be updating all the presenters that depend on this method already.
  • will need to pass parameters that would otherwise be in Hyrax::Download or Hyrax::PageView. Things like metrics, dimensions, etc.
  • cached_stats - stays same, rename "ga" things to be more generic
  • combined_stats - stays same, calls remote_statistics
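
A rough sketch of how the renamed pieces might fit together, assuming the flow is: read the local cache, work out where the cache ends, ask the adapter only for the missing dates, then cache what comes back. `StatisticSketch` is a hypothetical stand-in that uses a Hash where Hyrax uses ActiveRecord rows, and the adapter is just a callable here.

```ruby
require 'date'

# Hypothetical stand-in for Hyrax::Statistic; plain Ruby, no ActiveRecord.
class StatisticSketch
  class << self
    # Anything responding to call(start_date, object) and returning
    # [date, count] pairs. This is an assumption, not a Hyrax interface.
    attr_accessor :adapter
  end

  # Local cache, date => count (stands in for the cached stats table).
  def self.cache
    @cache ||= {}
  end

  # Generic replacement for ga_statistics: delegate to the adapter.
  def self.remote_statistics(start_date, object)
    adapter.call(start_date, object)
  end

  def self.combined_stats(object, start_date)
    # Only query the remote service for dates after the last cached one.
    last_cached = cache.keys.max
    remote_start = last_cached ? last_cached + 1 : start_date
    remote_statistics(remote_start, object).each { |date, count| cache[date] = count }
    cache.select { |date, _count| date >= start_date }.sort.to_h
  end
end
```

Because the adapter is injected, the caching logic stays untouched whichever service is behind it.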

Current API Expectations

In general, it appears that only the #statistics and #to_flot methods are relied upon by other (frontend) classes in Hyrax. This isn't to say other public methods aren't being used by applications, but this may help in determining what to expose and adhere to as a contract going forward:

  • Hyrax::UserStatImporter calls #statistics on the three subclasses below via #process_files and #process_works which call #extract_stats_for with a from parameter that has the class name as the value.
  • Hyrax::FileUsage#downloads calls FileDownloadStat#statistics
  • Hyrax::FileUsage#pageviews calls FileViewStat#statistics
  • Hyrax::WorkUsage#pageviews calls WorkViewStat#statistics
  • Hyrax::StatsUsagePresenter#to_flots calls #to_flot on the relevant Hyrax::Statistic class

Hyrax::Statistic (abstract class)

| Name | Property/Method | Parameters | Value/Returns | Notes |
| --- | --- | --- | --- | --- |
| cache_column | Class attribute | N/A | N/A | Implemented by subclasses |
| event_type | Class attribute | N/A | N/A | Implemented by subclasses |
| statistics_for | method | object | where(filter(object)) | Calls filter implementation of subclasses |
| build_for | method | object, attrs | new attrs.merge(filter(object)) | Calls filter implementation of subclasses |
| convert_date | method | date_time | date_time.to_datetime.to_i * 1000 | None |
| statistics | method | object, start_date, user_id=nil | Calls combined_stats by attaching cache_column and event_type to parameters | None |
| ga_statistics | method | start_date, file | profile.hyrax__pageview(sort: 'date', start_date: start_date).for_path(path) | Base implementation used by WorkViewStat and FileViewStat subclasses. Depends on Hyrax::Analytics.profile and Hyrax::Pageview for filter :for_path implementation. Hyrax::Pageview extends ::Legato::Model |
| cached_stats | method (private) | object, start_date, _method | { ga_start_date: ga_start_date, cached_stats: stats.to_a } | Called by combined_stats to query local DB |
| combined_stats | method (private) | object, start_date, object_method, ga_key, user_id=nil | A stats Hash with the combined cached and queried GA stats | Calls cached_stats, ga_statistics and build_for |
| to_flot | method | none | [self.class.convert_date(date), send(cache_column)] | Called by StatsUsagePresenter and directly by its subclasses WorkUsage and FileUsage |
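
To make the `convert_date` and `to_flot` rows concrete, here is a small plain-Ruby sketch. `StatRowSketch` is a hypothetical stand-in for one cached row. It uses `to_time.to_i` rather than `to_datetime.to_i` because, as far as I know, `DateTime#to_i` comes from ActiveSupport and this example avoids that dependency.

```ruby
require 'date'

# Hypothetical stand-in for one cached row of a Hyrax::Statistic subclass.
class StatRowSketch
  attr_reader :date, :views

  # In Hyrax this is a class attribute set by each subclass.
  def self.cache_column
    :views
  end

  # Epoch milliseconds. Plain-Ruby equivalent of the table's
  # `date_time.to_datetime.to_i * 1000`.
  def self.convert_date(date_time)
    date_time.to_time.to_i * 1000
  end

  def initialize(date, views)
    @date = date
    @views = views
  end

  # A [timestamp, value] pair, read from whichever column cache_column names.
  def to_flot
    [self.class.convert_date(date), send(self.class.cache_column)]
  end
end
```

For example, `StatRowSketch.new(Time.utc(2018, 2, 15), 42).to_flot` gives `[1518652800000, 42]`.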

FileDownloadStat

| Name | Property/Method | Parameters | Value/Returns | Notes |
| --- | --- | --- | --- | --- |
| cache_column | Class attribute | N/A | :downloads | N/A |
| event_type | Class attribute | N/A | :totalEvents | N/A |
| filter | method | file | { file_id: file.id } | Called by parent class |
| ga_statistics | method | start_date, file | profile.hyrax__download(sort: 'date', start_date: start_date, end_date: Date.yesterday).for_file(file.id) | Depends on Hyrax::Analytics.profile and Hyrax::Download for filter :for_file implementation. Hyrax::Download extends ::Legato::Model |

WorkViewStat

| Name | Property/Method | Parameters | Value/Returns | Notes |
| --- | --- | --- | --- | --- |
| cache_column | Class attribute | N/A | :work_views | N/A |
| event_type | Class attribute | N/A | :pageviews | N/A |
| filter | method | work | { work_id: work.id } | Called by parent class |
| ga_statistics | method | start_date, file | N/A | Uses parent implementation |

FileViewStat

| Name | Property/Method | Parameters | Value/Returns | Notes |
| --- | --- | --- | --- | --- |
| cache_column | Class attribute | N/A | :views | N/A |
| event_type | Class attribute | N/A | :pageviews | N/A |
| filter | method | file | { file_id: file.id } | Called by parent class |
| ga_statistics | method | start_date, file | N/A | Uses parent implementation |
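
The three subclass tables follow one pattern: each subclass supplies `cache_column`, `event_type` and `filter`, and the parent does the rest. A minimal sketch of that split, with hypothetical class names and a plain Hash where Hyrax builds an ActiveRecord object:

```ruby
# Hypothetical plain-Ruby stand-ins; the real classes are ActiveRecord models.
class StatisticBaseSketch
  def self.filter(_object)
    raise NotImplementedError, "#{name}.filter is unimplemented."
  end

  # Mirrors build_for's `attrs.merge(filter(object))`, minus the `new` call.
  def self.build_attributes(object, attrs)
    attrs.merge(filter(object))
  end
end

class FileViewStatSketch < StatisticBaseSketch
  def self.cache_column
    :views
  end

  def self.event_type
    :pageviews
  end

  def self.filter(file)
    { file_id: file.id }
  end
end

class WorkViewStatSketch < StatisticBaseSketch
  def self.cache_column
    :work_views
  end

  def self.event_type
    :pageviews
  end

  def self.filter(work)
    { work_id: work.id }
  end
end

# Minimal record with an id, standing in for a file set or work.
RecordSketch = Struct.new(:id)
```

Calling `FileViewStatSketch.build_attributes(RecordSketch.new('abc'), { views: 3 })` merges the subclass filter into the attributes, which is all the parent needs to know about the subclass.
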
@nestorw
nestorw commented Feb 15, 2018

Thank you for this!
I think one method for each query is the way to go. They can be called as needed and will keep the subclasses a bit more organized.

Now that I think about it, we might need both actually for backwards compatibility and determined by Flipflop