Skip to content

Instantly share code, notes, and snippets.

@yorzi
Forked from rhelsing/rails_exercise.rb
Last active September 16, 2023 02:29
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save yorzi/2edd05829351df05cb964bd6f21c275b to your computer and use it in GitHub Desktop.
Save yorzi/2edd05829351df05cb964bd6f21c275b to your computer and use it in GitHub Desktop.
Rails Exercise by Andy Wang
# Background:
# The following code works but I have intentionally introduced issues to make it inefficient.
# UserEvents are created for every interaction a user has in the system. It stores what employer, partner and
# controller they were on. When there were 10 users and only a little bit of traffic, this would run quickly.
# But now with 10,000 users and months of traffic - (millions of user events), this code would be very inefficient.
# Instructions:
# 1. Please make this code more efficient through refactoring so that it can run at a greater scale,
# along with adding some comments to improve the readability.
# 2. Please explain what the variable 'distinct_groups' will contain by the end of the execution and what
# this data would be useful for.
# 3. Send the resulting file as a gist, .rb file or zip package via email to ryan@zevobenefits.com
start_range = Date.today.beginning_of_month
end_range = Date.today.end_of_month
users = User.where(role: role)
# Initialize these as sets to avoid duplicates and make lookup more efficient
employers = Set.new
partners = Set.new
controller_resources = Set.new
distinct_groups = {}
# Use find_each to avoid that all users are loaded in memory, it only loads 1000 one time by default.
users.find_each do |user|
# Track the total time for each user
total_time = 0
# Track the last time we saw an event for this user
last_time = nil
# Use the range in the query to avoid loading unnecessary records into memory, this will improve the performance a lot
user_events = user.user_events.where(created_at: start_range..end_range).order(:created_at)
user_events.each do |event|
# Only need to get the user_name once per user
user_name = event.user_name unless user_name
if event.last_known_session.present?
employers << event.last_known_session["employer"]
partners << event.last_known_session["partner"]
end
if event.data["params"].present?
controller_resources << event.data["params"]["controller"]
end
if last_time.nil? || last_time + 10.minutes < event.created_at
last_time = event.created_at
else
total_time += event.created_at - last_time
last_time = event.created_at
end
end
distinct_groups[user_name] = total_time
end
# Note:
# - The variable `distinct_groups` will contain a hash where the keys are
# the user names and the values are the total time spent by that user on
# events within the specified date range. This could be useful for understanding
# user engagement during a month.
@yorzi
Copy link
Author

yorzi commented Sep 16, 2023

Several updates/improving after refactoring the code:

  1. Using AR's where to filter user events, it's on database level and it's faster than filtering items in Ruby.

  2. Using the find_each to fetch users, it loads users in batches to avoid unnecessary memory usage.

  3. Using Set instead of Array for employers, partners, and controller_resources, it avoids duplicated entries and makes later lookup more quicker.

  4. It calculates the total time directly, it's better summing items from the Array in the last step.

  5. If the created_at field in user_events is indexed, the date range query would be more faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment