The users of this app are Amazon customers who want to know when prices drop for products in particular categories. A user first subscribes to the categories they are interested in, then receives an email listing the products in those categories whose prices have dropped. Users can also choose how often the emails are sent; the default is once per day.
![](/Users/NIC/Desktop/Screen Shot 2017-05-30 at 2.58.37 PM.png)
- Distribute crawlers across the different categories
- Send the crawled data to a different RabbitMQ queue for each category
- Fields to crawl: `ProductId`, `Title`, `URL`, `Price`, `Category`
- Data: Product info
- Use RabbitMQ to implement the queue
- Each category has its own queue
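The per-category routing described above can be sketched as follows. This is only an illustration under stated assumptions: the `product.<category>` naming scheme, the `queue_name` and `publish` helpers, and the sample products are hypothetical, and an in-memory `queue.Queue` per category stands in for real RabbitMQ queues.

```python
import json
import queue
from collections import defaultdict

# In-memory stand-in for the broker: one FIFO queue per category.
# With RabbitMQ, each entry would be a separately declared queue.
product_queues = defaultdict(queue.Queue)


def queue_name(category: str) -> str:
    """Hypothetical naming scheme: 'product.<category-slug>'."""
    return "product." + category.strip().lower().replace(" ", "_")


def publish(product: dict) -> str:
    """Serialize a crawled product and route it to its category's queue."""
    name = queue_name(product["Category"])
    product_queues[name].put(json.dumps(product))
    return name


# Sample data (made up for illustration).
publish({"ProductId": "B000TEST01", "Title": "USB cable",
         "URL": "https://example.com/p/B000TEST01",
         "Price": 799, "Category": "Electronics"})
publish({"ProductId": "B000TEST02", "Title": "Paperback",
         "URL": "https://example.com/p/B000TEST02",
         "Price": 1250, "Category": "Books"})
```

Keeping one queue per category lets consumers be scaled per category, at the cost of managing more queues.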
- Consume data from `Product Queue`
- Look each item up in the key-value store to check whether it has been crawled before
- If the item exists in the key-value store and its price has changed, copy the value of the `price` field into `oldPrice`, then update the current price in the DB as `price`
- If the product has not been crawled before, insert a new row for it into the database and a new key-value pair into the store
- After comparing with the key-value store, if the price has dropped, put the product into `Reduced Item Queue`
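A minimal sketch of the compare-and-update steps above, under several assumptions: plain dicts stand in for the key-value store and the database, a `deque` stands in for `Reduced Item Queue`, prices are integer cents, and the `process_product` function with its return labels is hypothetical. The sketch also refreshes the key-value entry on every price change, which the steps imply but do not state.

```python
from collections import deque


def process_product(product, kv_store, db, reduced_queue):
    """Apply the compare-and-update steps to one crawled product.

    kv_store: dict ProductId -> last seen price (key-value store stand-in)
    db: dict ProductId -> row dict (database table stand-in)
    reduced_queue: deque (Reduced Item Queue stand-in)
    Returns 'new', 'unchanged', 'increased', or 'reduced'.
    """
    pid, price = product["ProductId"], product["Price"]

    if pid not in kv_store:
        # Never crawled before: new row and new key-value pair.
        db[pid] = {**product, "oldPrice": None}
        kv_store[pid] = price
        return "new"

    last = kv_store[pid]
    if price == last:
        return "unchanged"

    # Price changed: move price into oldPrice, then store the new price.
    row = db[pid]
    row["oldPrice"] = row["Price"]
    row["Price"] = price
    kv_store[pid] = price  # assumption: keep the store in sync with the DB

    if price < last:
        reduced_queue.append(dict(row))
        return "reduced"
    return "increased"


# Usage with made-up data.
store, table, reduced = {}, {}, deque()
item = {"ProductId": "P1", "Title": "USB cable",
        "URL": "https://example.com/p/P1", "Price": 1000,
        "Category": "Electronics"}
process_product(item, store, table, reduced)                  # first crawl
process_product({**item, "Price": 800}, store, table, reduced)  # price drop
```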
- Pull model: in response to a user request, retrieve all reduced-price products in the requested category from the database
- Use `memcached` or `Redis` to implement the key-value store
- Fields: key: `ProductId`, value: `price`
- Data: Product info
- Fields: `ProductId`, `Title`, `URL`, `Price`, `oldPrice`, `Category`, `Flag`
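The product table and the pull-model lookup could look roughly like the sketch below. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for MySQL. The table name `product`, the column types, the integer-cents price representation, and the `reduced_products` helper are assumptions. The design does not say what `Flag` means, so the query detects reductions by comparing `Price` with `oldPrice` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute("""
    CREATE TABLE product (
        ProductId TEXT PRIMARY KEY,
        Title     TEXT NOT NULL,
        URL       TEXT NOT NULL,
        Price     INTEGER NOT NULL,   -- assumed: price in cents
        oldPrice  INTEGER,            -- NULL until the price first changes
        Category  TEXT NOT NULL,
        Flag      INTEGER DEFAULT 0   -- meaning not specified in the design
    )
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("P1", "USB cable", "https://example.com/p/P1", 799, 999, "Electronics", 0),
        ("P2", "HDMI cable", "https://example.com/p/P2", 1299, 1099, "Electronics", 0),
        ("P3", "Paperback", "https://example.com/p/P3", 900, 1250, "Books", 0),
    ],
)


def reduced_products(conn, category):
    """Pull model: return the rows in `category` whose price has dropped."""
    return conn.execute(
        "SELECT ProductId, Title, URL, Price, oldPrice FROM product "
        "WHERE Category = ? AND oldPrice IS NOT NULL AND Price < oldPrice "
        "ORDER BY ProductId",
        (category,),
    ).fetchall()


rows = reduced_products(conn, "Electronics")  # only P1 dropped in price
```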
- Consume product info from `Reduced Item Queue` and send it to the client by email
- The email frequency is either customized by the user or the default
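One way to handle the custom-versus-default frequency is sketched below. The `is_due` and `build_email` helpers, the subject line, and the body layout are hypothetical; only the once-per-day default comes from the design. Actual delivery (for example over SMTP) is left out.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage

DEFAULT_INTERVAL = timedelta(days=1)  # the design's default: once per day


def is_due(last_sent, now, interval=None):
    """Return True if the user should get an email now.

    last_sent: datetime of the previous email, or None if never sent
    interval: the user's custom frequency, or None to use the default
    """
    if interval is None:
        interval = DEFAULT_INTERVAL
    return last_sent is None or now - last_sent >= interval


def build_email(to_addr, products):
    """Build (but do not send) an email listing reduced-price products."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"{len(products)} price drop(s) in your categories"
    lines = [
        f"{p['Title']}: {p['oldPrice']} -> {p['Price']} ({p['URL']})"
        for p in products
    ]
    msg.set_content("\n".join(lines))
    return msg


# Usage with made-up data.
now = datetime(2017, 5, 30, 15, 0)
if is_due(now - timedelta(days=2), now):
    email = build_email("user@example.com", [
        {"Title": "USB cable", "oldPrice": 999, "Price": 799,
         "URL": "https://example.com/p/P1"},
    ])
```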
- Combined into `Price Monitor Server`
- Use multiple crawler + queue pairs
- Use multiple servers to handle the data from the queues
- Key-value store
  - Number of Amazon products: 480 million
  - Key: 16 bytes, value: 4 bytes
  - (16 + 4) * 480,000,000 = 9,600,000,000 bytes = 9.6 GB
  - 9.6 GB of memory is not a problem for our system.
- MySQL
  - Number of Amazon products: 480 million
  - Size of one product: title: 50 bytes, price: 4 bytes, URL: 100 bytes, last price: 4 bytes, category: 20 bytes (178 bytes in total, rounded up to 200)
  - 200 * 480,000,000 = 96,000,000,000 bytes = 96 GB
  - 96 GB of disk is not a problem for our system.
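The two storage estimates above can be checked with a few lines of arithmetic. The constant names are illustrative. The listed row fields sum to 178 bytes, so the 200 bytes used in the estimate is treated here as a rounded per-row budget, and GB means 10^9 bytes, matching the figures above.

```python
PRODUCTS = 480_000_000

# Key-value store: 16-byte key + 4-byte value per product.
KV_ENTRY_BYTES = 16 + 4

# Database row: field sizes as listed in the estimate.
ROW_FIELD_BYTES = {
    "title": 50,
    "price": 4,
    "url": 100,
    "last price": 4,
    "category": 20,
}
ROW_BUDGET_BYTES = 200  # assumed to be the 178-byte sum rounded up


def gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (10^9 bytes)."""
    return n_bytes / 10**9


kv_total = PRODUCTS * KV_ENTRY_BYTES
db_total = PRODUCTS * ROW_BUDGET_BYTES

print(f"key-value store: {gb(kv_total)} GB")
print(f"row fields sum:  {sum(ROW_FIELD_BYTES.values())} bytes")
print(f"database:        {gb(db_total)} GB")
```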