renancarvalhoo/staffengineer

## staffengineer
First of all, I'd like to thank you for the technical test, it was something I really enjoyed and it refreshed my memory of some architectural things


-------------------------------------------------
# Staff Engineer coding assignment

# Brief

Acme Corp. has an API service that allows registered users to fetch any information about the company and its wide array of products. It quickly became a very popular service and, to provide fair service to all users, a 10,000 API requests per-user monthly limit was added. Not long after the change, users started to report that their requests were very slow or not returning at all.

Acme decided to ask the community for help. A task list was provided with the top issues affecting their API. They've also released parts of the code base they've identified as being problematic.

## Code

```ruby
class User < ApplicationRecord
	has_many :hits

	def count_hits
		start = Time.now.beginning_of_month
		hits = hits.where('created_at > ?', start).count
		return hits
  end
end
```

```ruby
class ApplicationController < ActionController::API
	before_filter :user_quota

	def user_quota
		render json: { error: 'over quota' } if current_user.count_hits >= 10000
  end
end
```

```sql
Table "public.hits"
   Column   |            Type             | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
 id         | bigint                      |           | not null |
 user_id    | integer                     |           | not null |
 endpoint   | character varying           |           | not null |
 created_at | timestamp without time zone |           | not null |
Indexes:
    "hits_pkey" PRIMARY KEY, btree (id)
    "index_hits_on_user_id" btree (user_id)
```

# Tasks

## #1

Requests to Acme's API are timing out after 15 seconds. After some investigation in production, they've identified that the `User#count_hits` method's response time is around 500ms and is called 50 times per second (on average).

Solution:

Rate limiting is an interesting problem. If we were to consider the ActiveRecord query, I see two mistakes. First, we have an open-ended date range. It would be more accurate to have an ended_at to always determine the end of the month. Second, an index on created_at would improve performance.

Another technique is the reverse token method. Given the high volume of parallel requests, we could create a table with a set number of 10,000 tokens. Each time a request is received, one token is removed. This approach addresses the over-quota issue and also the timezone problem, as there's no need to constantly check dates, provided we have a reset counter at the start of each month.

Now, in an ideal world, instead of querying the database for each request to count the hits, we can store and update this count in Redis. This will significantly reduce the response time, as operations in Redis are much faster than database queries.

```
class RateLimiter
  def initialize(user)
    @user = user
    @redis_key = "user:#{user.id}:hits_count"
    initialize_hits_count
  end

  def decrement
    $redis.decr(@redis_key)
  end

  def count
    $redis.get(@redis_key).to_i
  end

  def reset_count
    $redis.set(@redis_key, 10000)
  end

  def over_quota?
    count <= 0
  end

  private

  def initialize_hits_count
    $redis.setnx(@redis_key, 10000)
  end
end
```

```
class ApplicationController < ActionController::API
  before_action :check_user_quota

  private

  def check_user_quota
    rate_limiter = RateLimiter.new(current_user)
    if rate_limiter.over_quota?
      render json: { error: 'over quota' }, status: :forbidden
    else
      rate_limiter.decrement
    end
  end
end
```

To reset the count at the beginning of each month, you can create a Rake task or a job that runs daily to check and reset the count

```
namespace :rate_limit do
  desc "Reset monthly rate limit"
  task reset: :environment do
    User.find_each do |user|
      RateLimiter.new(user).reset_count
    end
  end
end
```

So I'd need to understand the business a bit more, because the hits table would only be used for reports or logs

### Objective

Make changes to the code provided so that the API response time is back to acceptable levels and users no longer see timeouts. Feel free to use additional gems, tools, methods, or techniques.

## #2

Users in Australia are complaining they still get some “over quota” errors from the API after their quota resets at the beginning of the month and after a few hours it resolves itself. A user provided the following access logs:

```
timestamp                 | url                        | response
2022-10-31T23:58:01+10:00 | https://acme.corp/api/g6az | { error: 'over quota' }
2022-10-31T23:59:17+10:00 | https://acme.corp/api/g6az | { error: 'over quota' }
2022-11-01T01:12:54+10:00 | https://acme.corp/api/g6az | { error: 'over quota' }
2022-11-01T01:43:11+10:00 | https://acme.corp/api/g6az | { error: 'over quota' }
2022-11-01T16:05:20+10:00 | https://acme.corp/api/g6az | { data: [{ ... }] }
2022-11-01T16:43:39+10:00 | https://acme.corp/api/g6az | { data: [{ ... }] }
```

Solution:

The Redis solution partially addresses the issue, but if we were to consider an implementation using ActiveRecord, firstly, Time.now does not take into account any timezone of the application. Ideally, Time.current should be used to consider the application's timezone. If handling timezones directly for each client, creating a zone field in the user table to determine each user's timezone would be beneficial. This allows for specific treatments for each user.


With a timezone defined for each user, and if we wanted to reset the counter at the beginning of each month, we could do something like this:

namespace :rate_limit do
  desc "Reset monthly rate limit for users at the start of the month in their timezone"
  task reset: :environment do
    User.find_each do |user|
      current_time_user_tz = Time.now.in_time_zone(user.time_zone)
      if current_time_user_tz.hour == 0 && current_time_user_tz.min == 0 && current_time_user_tz.day == 1
        RateLimiter.new(user).reset_count
      end
    end
  end
end

### Objective

Describe the root cause of the issue and provide a fix for it.

## #3

Acme identified that some users have been able to make API requests over the monthly limit.


### Objective

Describe how can that be possible and provide a fix for it.

Reason:

Concurrency Issues: If multiple requests are processed simultaneously, there might be a race condition where the check for the limit occurs before the count is updated, allowing users to exceed their quota.

Delayed Counter Updates: If the system updates the hit counter asynchronously or with a delay, users might be able to make extra requests before the counter reflects the true number of requests.

Timezone Discrepancies: If the system resets the monthly limit based on a single timezone, users in different timezones might be able to make requests after their local month has ended but before the system has reset their count.

Solution:

Redis addresses this issue effectively, as we are decrementing the counter, and once it reaches zero, it will consistently return 'over quota.' Additionally, Redis is thread-safe, meaning its operations are atomic. This ensures that even with multiple concurrent requests, the count is accurately maintained. Thus, we would only need to manage the reset of the counters very well

## #4

Acme's development team has reported working with the code base is difficult due to accumulated technical debt and bad coding practices. They've asked the community to help them refactor the code so it's clean, readable, maintainable, and well-tested.

### Objective

Make changes to the code so it matches Acme's new code quality standards.

Solution:

The introduction of the new RateLimiter service significantly enhances the code quality of Acme's application by encapsulating the rate limiting logic into a dedicated, well-defined service. This approach aligns with Acme's new code quality standards in several key ways:

Modularity and Single Responsibility: The RateLimiter service is a distinct module with a single responsibility - managing the rate limiting functionality. This separation of concerns makes the codebase more organized and easier to maintain.

Readability and Clarity: By isolating the rate limiting logic, the RateLimiter service simplifies the overall code structure. Its clear and focused purpose makes it easier for developers to understand and work with the code.

Maintainability: With a dedicated service, any modifications or enhancements related to rate limiting can be made in one place, without affecting other parts of the codebase. This centralized approach reduces the risk of introducing bugs and makes the system more maintainable.

Testability: The RateLimiter service can be independently tested, ensuring that the rate limiting functionality works as expected. This contributes to a more robust and reliable application, as it allows for thorough testing of critical components.

Scalability and Performance: Utilizing Redis for handling the rate limiting counters not only improves performance but also ensures scalability. As Redis operations are atomic and thread-safe, it can handle high concurrency, which is essential for Acme's growing user base
	First of all, I'd like to thank you for the technical test, it was something I really enjoyed and it refreshed my memory of some architectural things


	-------------------------------------------------
	# Staff Engineer coding assignment

	# Brief

	Acme Corp. has an API service that allows registered users to fetch any information about the company and its wide array of products. It quickly became a very popular service and, to provide fair service to all users, a 10,000 API requests per-user monthly limit was added. Not long after the change, users started to report that their requests were very slow or not returning at all.

	Acme decided to ask the community for help. A task list was provided with the top issues affecting their API. They've also released parts of the code base they've identified as being problematic.

	## Code

	```ruby
	class User < ApplicationRecord
	has_many :hits

	def count_hits
	start = Time.now.beginning_of_month
	hits = hits.where('created_at > ?', start).count
	return hits
	end
	end
	```

	```ruby
	class ApplicationController < ActionController::API
	before_filter :user_quota

	def user_quota
	render json: { error: 'over quota' } if current_user.count_hits >= 10000
	end
	end
	```

	```sql
	Table "public.hits"
	Column \| Type \| Collation \| Nullable \| Default
	------------+-----------------------------+-----------+----------+---------
	id \| bigint \| \| not null \|
	user_id \| integer \| \| not null \|
	endpoint \| character varying \| \| not null \|
	created_at \| timestamp without time zone \| \| not null \|
	Indexes:
	"hits_pkey" PRIMARY KEY, btree (id)
	"index_hits_on_user_id" btree (user_id)
	```

	# Tasks

	## #1

	Requests to Acme's API are timing out after 15 seconds. After some investigation in production, they've identified that the `User#count_hits` method's response time is around 500ms and is called 50 times per second (on average).

	Solution:

	Rate limiting is an interesting problem. If we were to consider the ActiveRecord query, I see two mistakes. First, we have an open-ended date range. It would be more accurate to have an ended_at to always determine the end of the month. Second, an index on created_at would improve performance.

	Another technique is the reverse token method. Given the high volume of parallel requests, we could create a table with a set number of 10,000 tokens. Each time a request is received, one token is removed. This approach addresses the over-quota issue and also the timezone problem, as there's no need to constantly check dates, provided we have a reset counter at the start of each month.

	Now, in an ideal world, instead of querying the database for each request to count the hits, we can store and update this count in Redis. This will significantly reduce the response time, as operations in Redis are much faster than database queries.

	```
	class RateLimiter
	def initialize(user)
	@user = user
	@redis_key = "user:#{user.id}:hits_count"
	initialize_hits_count
	end

	def decrement
	$redis.decr(@redis_key)
	end

	def count
	$redis.get(@redis_key).to_i
	end

	def reset_count
	$redis.set(@redis_key, 10000)
	end

	def over_quota?
	count <= 0
	end

	private

	def initialize_hits_count
	$redis.setnx(@redis_key, 10000)
	end
	end
	```

	```
	class ApplicationController < ActionController::API
	before_action :check_user_quota

	private

	def check_user_quota
	rate_limiter = RateLimiter.new(current_user)
	if rate_limiter.over_quota?
	render json: { error: 'over quota' }, status: :forbidden
	else
	rate_limiter.decrement
	end
	end
	end
	```

	To reset the count at the beginning of each month, you can create a Rake task or a job that runs daily to check and reset the count

	```
	namespace :rate_limit do
	desc "Reset monthly rate limit"
	task reset: :environment do
	User.find_each do \|user\|
	RateLimiter.new(user).reset_count
	end
	end
	end
	```

	So I'd need to understand the business a bit more, because the hits table would only be used for reports or logs

	### Objective

	Make changes to the code provided so that the API response time is back to acceptable levels and users no longer see timeouts. Feel free to use additional gems, tools, methods, or techniques.

	## #2

	Users in Australia are complaining they still get some “over quota” errors from the API after their quota resets at the beginning of the month and after a few hours it resolves itself. A user provided the following access logs:

	```
	timestamp \| url \| response
	2022-10-31T23:58:01+10:00 \| https://acme.corp/api/g6az \| { error: 'over quota' }
	2022-10-31T23:59:17+10:00 \| https://acme.corp/api/g6az \| { error: 'over quota' }
	2022-11-01T01:12:54+10:00 \| https://acme.corp/api/g6az \| { error: 'over quota' }
	2022-11-01T01:43:11+10:00 \| https://acme.corp/api/g6az \| { error: 'over quota' }
	2022-11-01T16:05:20+10:00 \| https://acme.corp/api/g6az \| { data: [{ ... }] }
	2022-11-01T16:43:39+10:00 \| https://acme.corp/api/g6az \| { data: [{ ... }] }
	```

	Solution:

	The Redis solution partially addresses the issue, but if we were to consider an implementation using ActiveRecord, firstly, Time.now does not take into account any timezone of the application. Ideally, Time.current should be used to consider the application's timezone. If handling timezones directly for each client, creating a zone field in the user table to determine each user's timezone would be beneficial. This allows for specific treatments for each user.


	With a timezone defined for each user, and if we wanted to reset the counter at the beginning of each month, we could do something like this:

	namespace :rate_limit do
	desc "Reset monthly rate limit for users at the start of the month in their timezone"
	task reset: :environment do
	User.find_each do \|user\|
	current_time_user_tz = Time.now.in_time_zone(user.time_zone)
	if current_time_user_tz.hour == 0 && current_time_user_tz.min == 0 && current_time_user_tz.day == 1
	RateLimiter.new(user).reset_count
	end
	end
	end
	end

	### Objective

	Describe the root cause of the issue and provide a fix for it.

	## #3

	Acme identified that some users have been able to make API requests over the monthly limit.


	### Objective

	Describe how can that be possible and provide a fix for it.

	Reason:

	Concurrency Issues: If multiple requests are processed simultaneously, there might be a race condition where the check for the limit occurs before the count is updated, allowing users to exceed their quota.

	Delayed Counter Updates: If the system updates the hit counter asynchronously or with a delay, users might be able to make extra requests before the counter reflects the true number of requests.

	Timezone Discrepancies: If the system resets the monthly limit based on a single timezone, users in different timezones might be able to make requests after their local month has ended but before the system has reset their count.

	Solution:

	Redis addresses this issue effectively, as we are decrementing the counter, and once it reaches zero, it will consistently return 'over quota.' Additionally, Redis is thread-safe, meaning its operations are atomic. This ensures that even with multiple concurrent requests, the count is accurately maintained. Thus, we would only need to manage the reset of the counters very well

	## #4

	Acme's development team has reported working with the code base is difficult due to accumulated technical debt and bad coding practices. They've asked the community to help them refactor the code so it's clean, readable, maintainable, and well-tested.

	### Objective

	Make changes to the code so it matches Acme's new code quality standards.

	Solution:

	The introduction of the new RateLimiter service significantly enhances the code quality of Acme's application by encapsulating the rate limiting logic into a dedicated, well-defined service. This approach aligns with Acme's new code quality standards in several key ways:

	Modularity and Single Responsibility: The RateLimiter service is a distinct module with a single responsibility - managing the rate limiting functionality. This separation of concerns makes the codebase more organized and easier to maintain.

	Readability and Clarity: By isolating the rate limiting logic, the RateLimiter service simplifies the overall code structure. Its clear and focused purpose makes it easier for developers to understand and work with the code.

	Maintainability: With a dedicated service, any modifications or enhancements related to rate limiting can be made in one place, without affecting other parts of the codebase. This centralized approach reduces the risk of introducing bugs and makes the system more maintainable.

	Testability: The RateLimiter service can be independently tested, ensuring that the rate limiting functionality works as expected. This contributes to a more robust and reliable application, as it allows for thorough testing of critical components.

	Scalability and Performance: Utilizing Redis for handling the rate limiting counters not only improves performance but also ensures scalability. As Redis operations are atomic and thread-safe, it can handle high concurrency, which is essential for Acme's growing user base