Amazon Price Monitor System Requirement

Use Case

Users of this app are Amazon customers who want to know about price reductions for products in particular categories. A user first subscribes to the categories they are interested in, then receives emails listing the products whose prices have dropped. The user can also choose how often emails are sent; the default is once per day.

Architecture Design

![](/Users/NIC/Desktop/Screen Shot 2017-05-30 at 2.58.37 PM.png)

Component Design

Seed URL

Crawler

  • Distribute crawlers across categories
  • Publish crawled data to a separate RabbitMQ queue for each category
  • Fields to crawl: ProductId, Title, URL, Price, Category
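As a rough illustration of the crawler output, the sketch below models one crawled product and chooses a queue name from its category. The `products.<category>` naming scheme, the `price_cents` integer representation, and the JSON encoding are assumptions made for the example; the design above only states which fields are crawled and that each category has its own queue.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class Product:
    """One crawled item, with the fields listed in the Crawler section."""
    product_id: str
    title: str
    url: str
    price_cents: int  # assumption: price stored as integer cents
    category: str


def queue_name_for(category: str) -> str:
    """Map a category to its queue name (hypothetical naming scheme)."""
    slug = category.strip().lower().replace(" ", "_")
    return f"products.{slug}"


def to_message(product: Product) -> Tuple[str, bytes]:
    """Return (queue name, JSON body) ready to hand to a queue client."""
    body = json.dumps(asdict(product)).encode("utf-8")
    return queue_name_for(product.category), body
```

A crawler would pass the returned queue name and body to whatever RabbitMQ client it uses; the publishing call itself is left out here.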

Product Queue

  • Data: Product info
  • Implemented with RabbitMQ
  • Each category has its own queue

Price Monitor Server

  • Consume data from Product Queue
  • Compare each item against the key-value store to check whether it has been crawled before.
  • If the item exists in the key-value store and its price has changed, copy the value of the price field into oldPrice, then store the current price as price in the DB. If the product has not been crawled before, insert a new row for it into the database and a new key-value pair into the store.
  • After the comparison, if the price has dropped, put the product into the Reduced Item Queue
  • Pull model: in response to a user request, retrieve all reduced-price products in the requested category from the database
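The comparison steps above can be sketched as a single function. Plain dictionaries and a list stand in for the key-value store, the database, and the Reduced Item Queue, and the field names (`product_id`, `price_cents`, `old_price_cents`) are assumptions for illustration. The sketch also refreshes the key-value store when a price changes, which the design implies but does not state explicitly.

```python
from typing import Dict, List


def process_product(
    product: dict,
    kv_store: Dict[str, int],
    db: Dict[str, dict],
    reduced_queue: List[dict],
) -> str:
    """Apply the price-monitor rules to one item consumed from the Product Queue."""
    pid = product["product_id"]
    new_price = product["price_cents"]
    old_price = kv_store.get(pid)

    if old_price is None:
        # Not crawled before: insert a new row and a new key-value pair.
        db[pid] = {**product, "old_price_cents": None}
        kv_store[pid] = new_price
        return "inserted"

    if new_price == old_price:
        return "unchanged"

    # Price changed: move the previous price into oldPrice, store the new one.
    row = {**db.get(pid, product), "old_price_cents": old_price, "price_cents": new_price}
    db[pid] = row
    kv_store[pid] = new_price  # assumption: the store tracks the latest price

    if new_price < old_price:
        reduced_queue.append(row)
        return "reduced"
    return "increased"
```

Only a price drop reaches the reduced queue; an increase still updates both stores so that later comparisons use the latest price.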

Key-Value Store

  • Implemented with Memcached or Redis
  • Fields: Key: ProductId, Value: Price

Reduced Item Queue

  • Data: Product info

MySQL Database

  • Fields: ProductId, Title, URL, Price, oldPrice, Category, Flag
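One possible shape for this table, together with the pull-model query, is sketched below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that the example runs on its own. The column types, the integer-cents prices, and the `flag` default are assumptions, since the design does not say what Flag means.

```python
import sqlite3

# Columns mirror the fields listed above; types are illustrative guesses.
SCHEMA = """
CREATE TABLE product (
    product_id TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    url        TEXT NOT NULL,
    price      INTEGER NOT NULL,   -- current price in cents (assumption)
    old_price  INTEGER,            -- NULL until the price first changes
    category   TEXT NOT NULL,
    flag       INTEGER NOT NULL DEFAULT 0  -- meaning not specified in the design
)
"""


def reduced_products(conn: sqlite3.Connection, category: str) -> list:
    """Pull model: all products in a category whose price is below oldPrice."""
    cursor = conn.execute(
        "SELECT product_id, title, url, price, old_price FROM product "
        "WHERE category = ? AND old_price IS NOT NULL AND price < old_price "
        "ORDER BY product_id",
        (category,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO product (product_id, title, url, price, old_price, category) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("P1", "Kettle", "https://example.com/p1", 800, 1000, "kitchen"),
        ("P2", "Toaster", "https://example.com/p2", 1500, 1200, "kitchen"),
        ("P3", "Lamp", "https://example.com/p3", 700, 900, "home"),
        ("P4", "Mug", "https://example.com/p4", 500, None, "kitchen"),
    ],
)
kitchen_deals = reduced_products(conn, "kitchen")
```

With this sample data, only P1 qualifies for the kitchen category: P2 went up in price and P4 has no previous price recorded.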

Push Server

  • Consume product info from the Reduced Item Queue and send it to the client by email
  • Email frequency is either user-customized or the default
  • Combined into the Price Monitor Server
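A minimal sketch of how the push step might build one digest email per subscriber from the reduced items. The `subscriptions` mapping (email address to subscribed categories), the sender address, the item field names, and the message wording are all hypothetical. Actual delivery and the per-user sending frequency are left out.

```python
from collections import defaultdict
from email.message import EmailMessage
from typing import Dict, List


def build_digests(
    reduced_items: List[dict],
    subscriptions: Dict[str, List[str]],
    sender: str = "alerts@example.com",  # placeholder address
) -> List[EmailMessage]:
    """Group reduced items by category and build one email per subscriber."""
    by_category = defaultdict(list)
    for item in reduced_items:
        by_category[item["category"]].append(item)

    messages = []
    for address, categories in subscriptions.items():
        lines = []
        for category in categories:
            for item in by_category.get(category, []):
                old = item["old_price_cents"] / 100
                new = item["price_cents"] / 100
                lines.append(f'{item["title"]}: {old:.2f} -> {new:.2f} ({item["url"]})')
        if not lines:
            continue  # nothing dropped in this user's categories
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = address
        msg["Subject"] = f"{len(lines)} price drop(s) in your categories"
        msg.set_content("\n".join(lines))
        messages.append(msg)
    return messages
```

Subscribers with no reduced items in their categories get no message at all, which avoids sending empty digests.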

Bottlenecks

Crawler + queue

  • Run multiple crawler + queue pairs

Price monitor server

  • Use multiple servers to consume data from the queues

Capacity estimation

  • Key-Value store

Number of Amazon products: 480 million. Key: 16 bytes; value: 4 bytes. (16 + 4) * 480,000,000 = 9,600,000,000 bytes = 9.6 GB. 9.6 GB of memory is not a problem for our system.

  • MySQL

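Number of Amazon products: 480 million. Size of one product: title: 50 bytes, price: 4 bytes, URL: 100 bytes, last price: 4 bytes, category: 20 bytes. The listed fields sum to 178 bytes, rounded up to about 200 bytes per row to leave room for the remaining fields. 200 * 480,000,000 = 96,000,000,000 bytes = 96 GB. 96 GB of disk is not a problem for our system.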
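The two estimates can be re-checked with a few lines of arithmetic. The figures (480 million products, 16-byte key, 4-byte value, about 200 bytes per row) are the ones used in this section; the variable names are only for illustration.

```python
PRODUCTS = 480_000_000

# Key-value store: 16-byte ProductId key plus 4-byte price value.
kv_bytes = (16 + 4) * PRODUCTS

# MySQL: title 50 + price 4 + url 100 + last price 4 + category 20 = 178 bytes,
# rounded up to roughly 200 bytes per row.
listed_field_bytes = 50 + 4 + 100 + 4 + 20
row_bytes = 200
db_bytes = row_bytes * PRODUCTS


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 10**9


print(f"key-value store: {to_gb(kv_bytes):.1f} GB")  # 9.6 GB
print(f"MySQL table:     {to_gb(db_bytes):.1f} GB")  # 96.0 GB
```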
