Skip to content

Instantly share code, notes, and snippets.

@awsmsrc
Last active April 15, 2017 14:02
Show Gist options
  • Save awsmsrc/4dbb8b386e12738d549161b7ada7013a to your computer and use it in GitHub Desktop.
Save awsmsrc/4dbb8b386e12738d549161b7ada7013a to your computer and use it in GitHub Desktop.

1) The following query is really slow. What could be the reason? Propose some possible solutions. How to speed it up? Find all relations between tables base on the example code. If it requires, please rewrite the method.

def first_tag_history(tag, asin, marketplace_id)
  TagHistory.joins(product_tag: { product: :account }).where("
    product_tags.tag = :tag
    AND tag_histories.created_at >= CURDATE()
    AND products.asin = :asin
    AND accounts.marketplace_id = :marketplace
  ", {
    tag: tag,
    asin: asin,
    marketplace_id: marketplace_id
  }).first
end

The first steps would be to check if all the relationship indexes were set up.

When using first, Rails adds ORDER BY id ASC which means the database has to load all the data and order. Using this implicit ORDER may be intentional but is also probably a bad idea.

Using take instead of first does not add the ORDER clause and the database will stop executing as soon as it finds a valid result

created_at >= CURDATE() gets TagHistories from today into the future, is this intentional?

As a note, you could change the large string for Rails/arel style hashes which I personally prefer.

TagHistory is ambiguous without the class definition
ProductTag belongs_to Product
Product belongs_to Account

2) What's wrong with the following scope? Propose better solution. If it's possible, do it simpler.

class User < ActiveRecord::Base
  scope :find_by_age, ->(age) { where("users.age = #{age}") }
end

This is open to SQL injection and type mismatches. The following solution is safe from SQL injection

class User < ActiveRecord::Base
  scope :find_by_age, ->(age) { where("users.age = ?", age) }
end

This solution is nicer still

class User < ActiveRecord::Base
  scope :find_by_age, ->(age) { where(age:age) }
end

A further note I would add would be that in Rails find_by usually returns only one result. I would change the name of this scope to something like with_age and would even question the benefit of a scope at all as the where(age:age) is the roughly the same number of characters and reads fairly well.

3) Is the following code accurate? Analyze all aspects of the database and the good code quality practises. Describe the tables relations base on the method. Improve it if you think it's required.

def get_tags_from(country)
  ActiveRecord::Base.connection_pool.with_connection do |connection|
    tags = connection.exec_query("
      SELECT DISTINCT pk.tag
      FROM product_tags pk
        LEFT JOIN products p ON p.id = pk.product_id
        LEFT JOIN accounts a ON a.id = p.account_id
        LEFT JOIN marketplaces m ON m.id = a.marketplace_id
      WHERE pk.volume IS NULL
        AND m.keyword_tool_volume_country = #{ActiveRecord::Base.sanitize(country)}
    ").to_hash
  end
end

With regard to code quality practices, using ActiveRecord::Base.connection_pool.with_connection is in most cases a bad idea. It would be far better to use the ActiveRecord abstractions. The reasons for this are as wide ranging as mass developer comprehension, the ability to change database drivers and the testability and ability to assert the correct arguments were passed.

With regard to database performance the LEFT JOIN should be an INNER JOIN as it would only return product_tags that actually have an account, market place and product. This should be safe as all associated records are necessary to satisfy the where clause

A better example of the preceding code would be something like the following.

def get_tags_from(country)
    ProductTag.joins(
    products: { account: :marketplace }
  ).where(
    volume: nil, 
    marketplaces: {keyword_tool_volume_country: country}
  ).pluck("DISTINCT product_tags.tag")
end

I would infer that the associations would look like this

ProductTag belongs_to :product
Product belongs_to :account
Product has_many :product_tags
Account has_many :products
Account belongs_to :marketplace 

4)

a) What's the different between .joins() and .includes()?

joins is an INNER JOIN and includes is a LEFT JOIN, includes returns records even if relation doesn’t exist, e.g. if an author has no books yet, the author still returned

b) product_tags.tag is a string type column. What kind of consequences of the following solution you can find? How to improve the following code to avoid some negative consequences?

ProductTag.where(tag: params[:tag]).all.each do |product_tag|
  puts product_tag.product.name
end

A string column is better then a text column as it is stored as a varchar in MySQL which can be indexed, which makes lookups by tag more efficient. One negative consequance is that the tag size is limited but that is probably OK.

The above code loads all the product tags and then runs another query for each tag. This is very memory and CPU innefficient. An improved version of the above code would look something like the following

Product.joins(:product_tags).where(product_tag: { tag: tag }).pluck(:name).each{ |name| puts name }

ProductTag could be a join table between Product and Tag, you could share a discrete set of tags amongst the products and enforce uniquenes via id indexes. This also allows to globally change tag values, this may or may not be desirable depending on the domain

tag.products.pluck(:name).each {|name| puts name}

c) What does EXPLAIN do in MySQL?

EXPLAIN takes an SQL query as an argument and outputs the execution plan that MySQL will use which includes the select_type, keys and indexes which will be used to find the results.

d) What's the difference between OPTIMIZE and ANALYZE in MySQL?

OPTIMIZE TABLE Analyzes table, stores the key distribution for a table, reclaims the unused space and defragments the data file. ANALYZE TABLE Only analyzes table and stores the key distribution

e) How to recognize slow queries in MySQL?

If you have access to the server the mysql slow query log is the first port of call to identify slow queries.

Tooling such as new relic and data dog integrates directly with Rails and you can see slow queries and controller actions in there, this also helps diagnose when the database itself is not the problem.

f) What's sharding and when developers can use it?

Sharding is the process of splitting a database across several nodes. Developers should only use Sharding whenever necessary as it adds an entirely new class of problems. It is almost always better to pay for faster/better hardware in the early stages of scaling. There are several strategies available depending on the type of data with in the database. The simplest strategy is to have different tables on different databases. More useful is to split a table across multiple databases, if you never have to join data between customers you can have different customers on different nodes

g) What's the difference between a master and a slave machine?

With regard to databases the Master machine is used to write new data and is the “source of truth for the database”. The Slave machine is a read only replica of the master.

This provides two main benefits.

The first of which is that you can distribute reads between the nodes (or direct all reads to the slave) freeing up resources on the master for writing data. This is often a simpler solution then sharding and is particulalry effective for read heavy workloads however it is also beneficial for writes as the master doesn't have to waste CPU and IO responding to read requests.

It also adds some redundancy to the system. If the master node fails you can promote the slave.

Read Replicas are often hydrated asynchronously and therefore there is a small amount of latency in the replicas getting the up to date data, this is often miniscule and does not effect the system.

h) What're the gems you can use to scale from a single database to the master and slaves?

Octopus, DbCharmer, DataFabric, MultiDb

i) How to dump MySQL database to sql file?

mysqldump [options] > dump.sql

j) Let's say you have a username and password and you can log into the super.cashcowpro.com server. What do you need to do to start logging without password?

I'm not sure I fully understand the above question. If server means a rails application then I believe the following satisifies the question.

Rails.application.config.filter_parameters += [:password, :password_confirmation]

5) Please refactor and create some rspec test cases for the following code

With regard to refactoring code such as this, before extracting objects I normally want to know how the code is going to be used in a wider context. For something this simple, if there was no performance issues or bugs in the following code I would question refactoring at all, without further information.

I identified the following issues with the code

  • Not very readable from the outermost scope
  • Large if statement
  • Set up for multiple marketplaces but only one in use
  • N/A is used for multiple issues (invalid zero size, sizes that don't fit into the brackets, and invalid marketplaces)
  • Large function, in ruby objects are preferred

I have been working with Go lately and enjoyed extracting a more classical OO approach. If this was a desirable approach within the company, further extractions would be to extract a Marketplace type and specify which size tier classes are used for each market place. If extra marketplaces were not going to be used, I might just remove entirely until they are required and we have context of the scope of the problem. If the SizeTier classes were used in other contexts they could now be unit tested individually.

I also found an issue during testing whereby a "large" item with a low weight would end up with an "N/A" size tier, I didn't fix this as I didn't know if that was the desired effect.

class FBA
  def compute_size_tier(length, width, height, weight, marketplace)
    case marketplace
    when 'ATVPDKIKX0DER'
      SizeTier.find(width, height, length, weight).name
    else
      'N/A'
    end
  end
end

class SizeTier
  def initialize(dim1, dim2, dim3, weight)
    @length, @width, @height = [dim1, dim2, dim3].map(&:to_f).sort.reverse
    @weight = weight
  end

  def self.find(dim1, dim2, dim3, weight)
    [Na, SmlStand, LrgStand, SmlOver, MedOver, LrgOver, SplOver].map do |klass| 
      klass.new(dim1, dim2, dim3, weight) 
    end.find(&:valid?) || Na.new(dim1, dim2, dim3, weight)
  end

  def valid?
    raise NotImplemented
  end

  def name
    raise NotImplemented
  end

  private

  attr_reader :length, :width, :height, :weight

  def girth(width, height)
    (width + height) * 2
  end
end

class Na < SizeTier
  def valid?
    [length, width, height, weight].all?(&:zero?)
  end

  def name
    'N/A'
  end
end

class SmlStand < SizeTier
  def valid?
    length <= 15 && width <= 12 && height <= 0.75 && weight <= 14
  end

  def name
    'SML_STAND'
  end
end

class LrgStand < SizeTier
  def valid?
    length <= 18 && width <= 14 && height <= 8 && weight <= 20
  end

  def name
    'LRG_STAND'
  end
end

class SmlOver < SizeTier
  def valid?
    length <= 60 && width <= 30 && girth(width, height) + length <= 130 && weight <= 70
  end

  def name
    'SML_OVER'
  end
end

class MedOver < SizeTier
  def valid?
    length <= 108 && girth(width, height) + length <= 130 && weight <= 150
  end

  def name
    'MED_OVER'
  end
end

class LrgOver < SizeTier
  def valid?
    length <= 108 && girth(width, height) + length  <= 165 && weight <= 150
  end

  def name
    'LRG_OVER'
  end
end

class SplOver < SizeTier
  def valid?
    length > 108 && girth(width, height) + length  > 165 && weight > 150
  end

  def name
    'SPL_OVER'
  end
end

require 'rspec'

RSpec.describe FBA do
  subject(:instance) { described_class.new }

  describe '#compute_size_tier' do
    subject do
      instance.compute_size_tier(length, width, height, weight, marketplace)
    end

    context 'with ATVPDKIKX0DER marketplace' do
      let(:marketplace) { 'ATVPDKIKX0DER' }

      context 'SML_STAND' do
        let(:length) { 15 }
        let(:width) { 12 }
        let(:height) { 0.75 }
        let(:weight) { 14 }

        it { is_expected.to eql('SML_STAND') }
      end

      context 'LRG_STAND' do
        let(:length) { 18 }
        let(:width) { 12 }
        let(:height) { 0.75 }
        let(:weight) { 14 }

        it { is_expected.to eql('LRG_STAND') }
      end

      context 'SML_OVER' do
        let(:length) { 30 }
        let(:width) { 30 }
        let(:height) { 2.75 }
        let(:weight) { 14 }

        it { is_expected.to eql('SML_OVER') }
      end

      context 'MED_OVER' do
        let(:length) { 80 }
        let(:width) { 5 }
        let(:height) { 5 }
        let(:weight) { 14 }

        it { is_expected.to eql('MED_OVER') }
      end

      context 'LRG_OVER' do
        let(:length) { 100 }
        let(:width) { 10 }
        let(:height) { 10 }
        let(:weight) { 14 }

        it { is_expected.to eql('LRG_OVER') }
      end

      context 'SPL_OVER' do
        let(:length) { 200 }
        let(:width) { 200 }
        let(:height) { 200 }
        let(:weight) { 160 }

        it { is_expected.to eql('SPL_OVER') }
      end

      context 'all zeros' do
        let(:length) { 0 }
        let(:width) { 0 }
        let(:height) { 0 }
        let(:weight) { 0 }

        it { is_expected.to eql('N/A') }
      end
    end

    context 'invalid marketplace' do
      let(:marketplace) { 'INVALID' }
      let(:length) { 15 }
      let(:width) { 12 }
      let(:height) { 0.75 }
      let(:weight) { 14 }
      it { is_expected.to eql('N/A') }
    end
  end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment