Skip to content

Instantly share code, notes, and snippets.

@renancarvalhoo
Last active November 22, 2023 18:02
Show Gist options
  • Save renancarvalhoo/2f143665d881df74da5d1d737daf1953 to your computer and use it in GitHub Desktop.
Save renancarvalhoo/2f143665d881df74da5d1d737daf1953 to your computer and use it in GitHub Desktop.
First of all, I'd like to thank you for the technical test, it was something I really enjoyed and it refreshed my memory of some architectural things
-------------------------------------------------
# Staff Engineer coding assignment
# Brief
Acme Corp. has an API service that allows registered users to fetch any information about the company and its wide array of products. It quickly became a very popular service and, to provide fair service to all users, a 10,000 API requests per-user monthly limit was added. Not long after the change, users started to report that their requests were very slow or not returning at all.
Acme decided to ask the community for help. A task list was provided with the top issues affecting their API. They've also released parts of the code base they've identified as being problematic.
## Code
```ruby
class User < ApplicationRecord
has_many :hits
def count_hits
start = Time.now.beginning_of_month
hits = hits.where('created_at > ?', start).count
return hits
end
end
```
```ruby
class ApplicationController < ActionController::API
before_filter :user_quota
def user_quota
render json: { error: 'over quota' } if current_user.count_hits >= 10000
end
end
```
```sql
Table "public.hits"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
id | bigint | | not null |
user_id | integer | | not null |
endpoint | character varying | | not null |
created_at | timestamp without time zone | | not null |
Indexes:
"hits_pkey" PRIMARY KEY, btree (id)
"index_hits_on_user_id" btree (user_id)
```
# Tasks
## #1
Requests to Acme's API are timing out after 15 seconds. After some investigation in production, they've identified that the `User#count_hits` method's response time is around 500ms and is called 50 times per second (on average).
Solution:
Rate limiting is an interesting problem. If we were to consider the ActiveRecord query, I see two mistakes. First, we have an open-ended date range. It would be more accurate to have an ended_at to always determine the end of the month. Second, an index on created_at would improve performance.
Another technique is the reverse token method. Given the high volume of parallel requests, we could create a table with a set number of 10,000 tokens. Each time a request is received, one token is removed. This approach addresses the over-quota issue and also the timezone problem, as there's no need to constantly check dates, provided we have a reset counter at the start of each month.
Now, in an ideal world, instead of querying the database for each request to count the hits, we can store and update this count in Redis. This will significantly reduce the response time, as operations in Redis are much faster than database queries.
```
class RateLimiter
def initialize(user)
@user = user
@redis_key = "user:#{user.id}:hits_count"
initialize_hits_count
end
def decrement
$redis.decr(@redis_key)
end
def count
$redis.get(@redis_key).to_i
end
def reset_count
$redis.set(@redis_key, 10000)
end
def over_quota?
count <= 0
end
private
def initialize_hits_count
$redis.setnx(@redis_key, 10000)
end
end
```
```
class ApplicationController < ActionController::API
before_action :check_user_quota
private
def check_user_quota
rate_limiter = RateLimiter.new(current_user)
if rate_limiter.over_quota?
render json: { error: 'over quota' }, status: :forbidden
else
rate_limiter.decrement
end
end
end
```
To reset the count at the beginning of each month, you can create a Rake task or a job that runs daily to check and reset the count
```
namespace :rate_limit do
desc "Reset monthly rate limit"
task reset: :environment do
User.find_each do |user|
RateLimiter.new(user).reset_count
end
end
end
```
So I'd need to understand the business a bit more, because the hits table would only be used for reports or logs
### Objective
Make changes to the code provided so that the API response time is back to acceptable levels and users no longer see timeouts. Feel free to use additional gems, tools, methods, or techniques.
## #2
Users in Australia are complaining they still get some “over quota” errors from the API after their quota resets at the beginning of the month and after a few hours it resolves itself. A user provided the following access logs:
```
timestamp | url | response
2022-10-31T23:58:01+10:00 | https://acme.corp/api/g6az | { error: 'over quota' }
2022-10-31T23:59:17+10:00 | https://acme.corp/api/g6az | { error: 'over quota' }
2022-11-01T01:12:54+10:00 | https://acme.corp/api/g6az | { error: 'over quota' }
2022-11-01T01:43:11+10:00 | https://acme.corp/api/g6az | { error: 'over quota' }
2022-11-01T16:05:20+10:00 | https://acme.corp/api/g6az | { data: [{ ... }] }
2022-11-01T16:43:39+10:00 | https://acme.corp/api/g6az | { data: [{ ... }] }
```
Solution:
The Redis solution partially addresses the issue, but if we were to consider an implementation using ActiveRecord, firstly, Time.now does not take into account any timezone of the application. Ideally, Time.current should be used to consider the application's timezone. If handling timezones directly for each client, creating a zone field in the user table to determine each user's timezone would be beneficial. This allows for specific treatments for each user.
With a timezone defined for each user, and if we wanted to reset the counter at the beginning of each month, we could do something like this:
namespace :rate_limit do
desc "Reset monthly rate limit for users at the start of the month in their timezone"
task reset: :environment do
User.find_each do |user|
current_time_user_tz = Time.now.in_time_zone(user.time_zone)
if current_time_user_tz.hour == 0 && current_time_user_tz.min == 0 && current_time_user_tz.day == 1
RateLimiter.new(user).reset_count
end
end
end
end
### Objective
Describe the root cause of the issue and provide a fix for it.
## #3
Acme identified that some users have been able to make API requests over the monthly limit.
### Objective
Describe how can that be possible and provide a fix for it.
Reason:
Concurrency Issues: If multiple requests are processed simultaneously, there might be a race condition where the check for the limit occurs before the count is updated, allowing users to exceed their quota.
Delayed Counter Updates: If the system updates the hit counter asynchronously or with a delay, users might be able to make extra requests before the counter reflects the true number of requests.
Timezone Discrepancies: If the system resets the monthly limit based on a single timezone, users in different timezones might be able to make requests after their local month has ended but before the system has reset their count.
Solution:
Redis addresses this issue effectively, as we are decrementing the counter, and once it reaches zero, it will consistently return 'over quota.' Additionally, Redis is thread-safe, meaning its operations are atomic. This ensures that even with multiple concurrent requests, the count is accurately maintained. Thus, we would only need to manage the reset of the counters very well
## #4
Acme's development team has reported working with the code base is difficult due to accumulated technical debt and bad coding practices. They've asked the community to help them refactor the code so it's clean, readable, maintainable, and well-tested.
### Objective
Make changes to the code so it matches Acme's new code quality standards.
Solution:
The introduction of the new RateLimiter service significantly enhances the code quality of Acme's application by encapsulating the rate limiting logic into a dedicated, well-defined service. This approach aligns with Acme's new code quality standards in several key ways:
Modularity and Single Responsibility: The RateLimiter service is a distinct module with a single responsibility - managing the rate limiting functionality. This separation of concerns makes the codebase more organized and easier to maintain.
Readability and Clarity: By isolating the rate limiting logic, the RateLimiter service simplifies the overall code structure. Its clear and focused purpose makes it easier for developers to understand and work with the code.
Maintainability: With a dedicated service, any modifications or enhancements related to rate limiting can be made in one place, without affecting other parts of the codebase. This centralized approach reduces the risk of introducing bugs and makes the system more maintainable.
Testability: The RateLimiter service can be independently tested, ensuring that the rate limiting functionality works as expected. This contributes to a more robust and reliable application, as it allows for thorough testing of critical components.
Scalability and Performance: Utilizing Redis for handling the rate limiting counters not only improves performance but also ensures scalability. As Redis operations are atomic and thread-safe, it can handle high concurrency, which is essential for Acme's growing user base
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment