Skip to content

Instantly share code, notes, and snippets.

@quasiben
Last active July 22, 2020 18:29
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save quasiben/f410e7b3e0467c8569d8eaf2dee43d25 to your computer and use it in GitHub Desktop.
Save quasiben/f410e7b3e0467c8569d8eaf2dee43d25 to your computer and use it in GitHub Desktop.
Query 01
Find top 100 products that are sold together frequently in given stores. Only products in certain categories
sold in specific stores are considered, and "sold together frequently" means at least 50 customers bought
these productstogether in a transaction.
Query 02
Find the top 30 products that are mostly viewed together with a given product in online store. Note that
the order of products viewed does not matter, and "viewed together" relates to a web_clickstreams
click_session of a known user with a session timeout of 60min.If the duration between two click of a user is
greater then the session timeout, a new session begins. With a session timeout of 60min.
Query 03
For a given product get a top 30 list sorted by number of views in descending order of the last 5 products
that are mostly viewed before the product was purchased online. For the viewed products, consider only
products in certain item categories and viewed within 10days before the purchase date.
Query 04
Web_clickstream shopping cart abandonment analysis: For users who added products in their shopping
carts but did not check out in the online store during their session, find the average number of pages they
visited during their sessions. A "session" relates to a click_session of a known user with a session time-out
of 60min.If the duration between two clicks of a user is greater then the session time-out, a new session
begins.
Query 05
Build a model using logistic regression for a visitor to an online store: based on existing users online
activities (interest in items of different categories) and demographics. This model will be used to predict if
the visitor is interested in a given item category. Output the precision, accuracy and confusion matrix of
model.
Note: no need to actually classify existing users, as it will be later used to predict interests of unknown
visitors.
Query 06
Identifies customers shifting their purchase habit from store to web sales. Find customers who spend in
relation more money in the second year following a given year in the web_sales channel then in the store
sales channel. Report customers details: first name, last name, their country of origin, login name and email
address) and identify if they are preferred customer, for the top 100 customers with the highest increase in
their second year web purchase ratio.
Query 07
List top 10 states in descending order with at least 10 customers who during a given month bought
products with the price tag at least 20% higher than the average price of products in the same category.
Query 08
For online sales, compare the total sales monetary amount in which customers checked online reviews
before making the purchase and that of sales in which customers did not read reviews. Consider only
online sales for a specific category in a given year.
Query 09
Aggregate total amount of sold items over different given types of combinations of customers based on
selected groups of marital status, education status, sales price and different combinations of state and sales
profit.
Query 10
For all products, extract sentences from its product reviews that contain positive or negative sentiment and
display for each item the sentiment polarity of the extracted sentences (POS OR NEG) and the sentence and
word in sentence leading to this classification.
Query 11
For a given product, measure the correlation of sentiments, including the number of reviews and average
review ratings, on product monthly revenues within a given time frame.
Query 12
Find all customers who viewed items of a given category on the web in a given month and year that was
followed by an instore purchase of an item from the same category in the three consecutive months.
Query 13
Display customers with both store and web sales in consecutive years for whom the increase in web sales
exceeds the increase in store sales for a specified year.
Query 14
What is the ratio between the number of items sold over the internet in the morning (7 to 8am) to the
number of items sold in the evening (7 to 8pm) of customers with a specified number of dependents.
Consider onlywebsites with a high amount of content.
Query 15
Find the categories with flat or declining sales for in store purchases during a given year for a given store.
Query 16
Compute the impact of an item price change on the store sales by computing the total sales for items in a
30 day period before and after the price change. Group the items by location of warehouse where they
were delivered from.
Query 17
Find the ratio of items sold with and without promotions in a given month and year. Only items in certain
categories sold to customers living in a specific time zone are considered.
Query 18
Identify the stores with flat or declining sales in 4 consecutive months, check if there are any negative
reviews regarding these stores available online.
Query 19
Retrieve the items with the highest number of returns where the number of returns was approximately
equivalent across all store and web channels (within a tolerance of +/ 10%), within the week ending given
dates. Analyse the online reviews for these items to see if there are any negative reviews.
Query 20
Customer segmentation for return analysis: Customers are separated along the following dimensions:
return frequency, return order ratio (total number of orders partially or fully returned versus the total
number of orders), return item ratio (total number of items returned versus the number of items
purchased), return amount ration (total monetary amount of items returned versus the amount purchased),
return order ratio. Consider the store returns during a given year for the computation.
Query 21
Get all items that were sold in stores in a given month and year and which were returned in the next 6
months and repurchased by the returning customer afterwards through the web sales channel in the
following three years. For those items, compute the total quantity sold through the store, the quantity
returned and the quantity purchased through the web. Group this information by item and store.
Query 22
For all items whose price was changed on a given date, compute the percentage change in inventory
between the 30day period BEFORE the price change and the 30day period AFTER the change. Group this
information by warehouse.
Query 23
This Query contains multiple, related iterations: Iteration 1: Calculate the coefficient of variation and mean
of every item and warehouse of the given and the consecutive month. Iteration 2: Find items that had a
coefficient of variation of 1.3 or larger in the given and the consecutive month
Query 24
For a given product, measure the effect of competitor's prices on products' instore and online sales.
Compute the crossprice elasticity of demand for a given product.
Query 25
Customer segmentation analysis: Customers are separated along the following key shopping dimensions:
recency of last visit, frequency of visits and monetary amount. Use the store and online purchase data
during a given year to compute. After model of separation is build, report for the analysed customers to
which "group" they where assigned.
Query 26
Cluster customers into book buddies/club groups based on their in store book purchasing histories. After
model of separation is build, report for the analysed customers to which "group" they where assigned.
Query 27
For a given product, find "competitor" company names in the product reviews. Display review id,
product id, "competitor’s" company name and the related sentence from the online review
Query 28
Build text classifier for online review sentiment classification (Positive, Negative, Neutral), using 90% of
available reviews for training and the remaining 10% for testing. Display classifier accuracy on testing data
and classification result for the 10% testing data: <reviewSK>,<originalRating>,<classificationResult>.
Query 29
Perform category affinity analysis for products purchased together online. Note that the order of products
viewed does not matter,
Query 30
Perform category affinity analysis for products viewed together online. Note that the order of products
viewed does not matter, and "viewed together" relates to a click_session of a user with a session timeout of
60min. If the duration between two clicks of a user is greater then the session timeout, a new session
begins.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment