RIL

Personalization - RIL

Introduction

The key idea is to use the opinions and behaviors of users to suggest and personalize relevant and interesting content for them.

E.g., let's say there are two users: U1 with a history of buying Samsung products, and U2 with a history of buying Apple products. When they search for "phone", U1 should see Samsung phones at the top of the results and U2 should see iPhones at the top.

What we want is to take the user embedding and the product embeddings of the products in the search result, calculate which product embeddings are closer to that particular user, and then, based on the similarity score, boost the popularity score of those products.

What do I mean by embeddings? We put the product/user data into some model and project it into an N-dimensional space. A product is then represented by an N-dimensional vector, the coordinates of that point in the space. The model is trained such that two products which are similar to each other end up close to each other: the embedding of Nike shoes will be closer to that of Adidas shoes than to that of a cell phone.
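As a toy illustration of this "closeness" (the 3-dimensional vectors below are made up; real embeddings are learned and much higher-dimensional), cosine similarity is one way to measure it:

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Illustrative values only -- not real learned embeddings.
nike_shoe = np.array([0.9, 0.1, 0.2])
adidas_shoe = np.array([0.8, 0.2, 0.1])
cell_phone = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(nike_shoe, adidas_shoe))  # high (~0.99)
print(cosine_similarity(nike_shoe, cell_phone))   # low (~0.30)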


  • The data used here is: Amazon Product Data
  • Here’s an example of what a JSON entry for a single product looks like (we’re interested in the related field).

[Screenshot: example product JSON entry with the related field]

  1. Relations between different products are extracted in the following form.

[Screenshot: extracted product-pair relationships]

  2. Converting relationships into a single score. Weights are assigned based on the relationship (bought together = 1.2, also bought = 1.0, also viewed = 0.5) and summed across all unique product pairs.

  3. To handle cold-start for products with no user interaction, we create edges (l4-category = 0.2, brand = 0.1) between products with the same l4-category and between products with the same brand (see the sketch below).

[Screenshot: product-pair edges and weights]
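A minimal sketch of steps 2 and 3, assuming the extracted relationships are available as (product_a, product_b, relationship) tuples (the names below are illustrative, not the actual pipeline's):

from collections import defaultdict

# Relationship weights (steps 2-3); the last two are cold-start edges.
RELATIONSHIP_WEIGHTS = {
    "bought_together": 1.2,
    "also_bought": 1.0,
    "also_viewed": 0.5,
    "same_l4_category": 0.2,
    "same_brand": 0.1,
}

def pair_scores(relationships):
    # Sum the relationship weights for each unique (undirected) product pair.
    scores = defaultdict(float)
    for a, b, rel in relationships:
        pair = tuple(sorted((a, b)))
        scores[pair] += RELATIONSHIP_WEIGHTS[rel]
    return dict(scores)

relationships = [
    ("P1", "P2", "bought_together"),
    ("P1", "P2", "also_viewed"),
    ("P2", "P3", "same_brand"),  # P3 is a cold-start product
]
print(pair_scores(relationships))  # {('P1', 'P2'): 1.7, ('P2', 'P3'): 0.1}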

  4. Create a graph (networkx), remove duplicates, add negative samples, and create a train-val split.
    4.1. Create an adjacency matrix.

    [Screenshot: adjacency matrix]

    4.2. The adjacency matrix is then converted to a transition matrix, in which each row sums to 1.0: each row holds the probabilities of that vertex transitioning to the other vertices.

    [Screenshot: transition matrix]

    4.3. The transition matrix is then converted into dictionary form for lookup: each key is a node, and the value is another dictionary mapping the adjacent nodes to their transition probabilities.
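A rough sketch of step 4, assuming the pair scores above are used as edge weights (the helper name is made up):

import networkx as nx
import numpy as np

def build_transition_dict(pair_scores):
    # Weighted graph -> adjacency matrix (4.1) -> row-normalised transition matrix (4.2) -> lookup dict (4.3).
    G = nx.Graph()
    for (a, b), w in pair_scores.items():
        G.add_edge(a, b, weight=w)

    nodes = list(G.nodes())
    adjacency = nx.to_numpy_array(G, nodelist=nodes, weight="weight")
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)  # each row sums to 1.0

    return {
        nodes[i]: {nodes[j]: transition[i, j]
                   for j in range(len(nodes)) if transition[i, j] > 0}
        for i in range(len(nodes))
    }

transition_dict = build_transition_dict({("P1", "P2"): 1.7, ("P2", "P3"): 0.1})
print(transition_dict["P2"])  # {'P1': 0.944..., 'P3': 0.055...}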

  5. Create random walk sequences from the graph.
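Step 5, sketched as a weighted random walk over the transition dictionary (walk length and walks per node are placeholder values):

import random

def random_walk(transition_dict, start_node, walk_length=10):
    # The next node is sampled with the probability stored in the transition dict.
    walk = [start_node]
    for _ in range(walk_length - 1):
        neighbours = transition_dict.get(walk[-1])
        if not neighbours:
            break
        nodes, probs = zip(*neighbours.items())
        walk.append(random.choices(nodes, weights=probs, k=1)[0])
    return walk

# Toy transition dict; in practice this comes from step 4.
transition_dict = {
    "P1": {"P2": 1.0},
    "P2": {"P1": 0.94, "P3": 0.06},
    "P3": {"P2": 1.0},
}

# A few walks per node form the training corpus for step 6.
walks = [random_walk(transition_dict, node) for node in transition_dict for _ in range(10)]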

  6. Product embeddings are then learned via representation learning (word2vec skip-gram, implemented with gensim), doing away with the need for labels.

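Step 6, sketched with gensim (the hyperparameter values here are placeholders, not the tuned ones):

from gensim.models import Word2Vec

# Toy walks; in practice these are the random-walk sequences from step 5.
walks = [["P1", "P2", "P3", "P2"], ["P2", "P1", "P2", "P3"]]

model = Word2Vec(
    sentences=walks,   # each walk is a "sentence" of product-id "tokens"
    vector_size=64,    # embedding dimensionality
    window=5,
    min_count=1,       # keep every product id, even rare ones
    sg=1,              # skip-gram
    workers=4,
)

p1_embedding = model.wv["P1"]          # 64-dimensional product embedding
similar = model.wv.most_similar("P1")  # nearest products in embedding space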

  7. For a given user U1 who interacted with products P1, P2, …, Pn, we calculate the user embedding by aggregating the product embeddings of P1, P2, …, Pn.

  8. Now that we have the user embedding and some search results (product-ids with popularity scores), we boost the popularity score of products whose cosine similarity with the user embedding is high, giving them higher preference based on the user's interaction history.
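Steps 7 and 8 as a rough sketch, assuming a simple mean aggregation and a multiplicative boost (the actual aggregation and boosting formula may differ):

import numpy as np

def user_embedding(interacted_product_ids, model):
    # Aggregate (here: average) the embeddings of the products the user interacted with.
    vectors = [model.wv[p] for p in interacted_product_ids if p in model.wv]
    return np.mean(vectors, axis=0)

def boost_results(search_results, user_vec, model, boost=0.3, threshold=0.5):
    # search_results: list of (product_id, popularity_score) pairs.
    reranked = []
    for product_id, popularity in search_results:
        score = popularity
        if product_id in model.wv:
            vec = model.wv[product_id]
            sim = np.dot(user_vec, vec) / (np.linalg.norm(user_vec) * np.linalg.norm(vec))
            if sim > threshold:
                score = popularity * (1 + boost * sim)  # boost products close to the user
        reranked.append((product_id, score))
    return sorted(reranked, key=lambda x: x[1], reverse=True)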

Requirement:

  • We receive logs of user interaction events, such as Add to Cart, Product View, Transaction, etc., each with a Product-id and User-id.

What to do Next:

  • Try adding more side information (SI): description, image, price, etc.
  • Try a directed graph.
    • Product-pair relationships can be asymmetric (people who buy phones may also buy a phone case, but not vice versa).
  • Evaluation of Product Embeddings.
  • Evaluation of Personalization results.

References:

Billion-scale Commodity Embedding for E-commerce Recommendation

Overall

  1. Use the product-pairs and associated relationships to create a graph
  2. Generate sequences from the graph (via random walk)
  3. Learn product embeddings based on the sequences (via word2vec)
  4. Recommend products based on embedding similarity (e.g., cosine similarity, dot product)

[Screenshot: overall pipeline diagram]

Need for Product Embeddings

............

Word2Vec

Word2Vec is a model that embeds words in a lower-dimensional vector space using a shallow neural network. The result is a set of word-vectors where vectors close together in vector space have similar meanings based on context, and word-vectors distant to each other have differing meanings. For example, strong and powerful would be close together and strong and Paris would be relatively far. There are two versions of this model:

  • Skip-grams (SG)
  • Continuous-bag-of-words (CBOW)

The Word2Vec skip-gram model, for example, takes in pairs (word1, word2) generated by moving a window across text data, and trains a 1-hidden-layer neural network on the synthetic task of, given an input word, predicting the probability distribution of words near the input. A virtual one-hot encoding of words goes through a ‘projection layer’ to the hidden layer; these projection weights are later interpreted as the word embeddings. So if the hidden layer has 300 neurons, the network gives us 300-dimensional word embeddings.
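For intuition, a small sketch of how the skip-gram (word1, word2) training pairs are generated by sliding a window over a token sequence:

def skipgram_pairs(tokens, window=2):
    # Build (input_word, context_word) pairs for the skip-gram training task.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]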

Continuous-bag-of-words Word2vec is very similar to the skip-gram model. It is also a 1-hidden-layer neural network. The synthetic training task now uses the average of multiple input context words, rather than a single word as in skip-gram, to predict the center word. Again, the projection weights that turn one-hot words into averageable vectors, of the same width as the hidden layer, are interpreted as the word embeddings.

Training Parameters

  • min_count: used to prune the internal dictionary. Words that appear only once or twice in a billion-word corpus are probably uninteresting typos or garbage. In addition, there’s not enough data to do any meaningful training on those words, so it’s best to ignore them. The default is min_count=5.
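For example, with gensim (4.x API), rare tokens are dropped from the vocabulary:

from gensim.models import Word2Vec

sentences = [["apple", "banana", "apple"], ["apple", "cherry"]]

# "banana" and "cherry" appear only once, so min_count=2 prunes them.
model = Word2Vec(sentences, vector_size=10, min_count=2)
print(model.wv.key_to_index)  # {'apple': 0}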

Evaluating the embeddings

Word2Vec training is an unsupervised task, so there’s no good way to objectively evaluate the result; evaluation depends on your end application.

Google has released a test set of about 20,000 syntactic and semantic examples, following the “A is to B as C is to D” task. For example, a syntactic analogy of the comparative type is bad:worse;good:?. There are a total of nine types of syntactic comparisons in the dataset, such as plural nouns and nouns of opposite meaning. The semantic questions contain five types of semantic analogies, such as capital cities (Paris:France;Tokyo:?) or family members (brother:sister;dad:?).

Gensim supports the same evaluation set, in exactly the same format:

from gensim.test.utils import datapath
score, sections = model.wv.evaluate_word_analogies(datapath('questions-words.txt'))

Training Loss Computation

The parameter compute_loss can be used to toggle computation of loss while training the Word2Vec model.
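For example (a minimal sketch; the toy sentences are placeholders):

from gensim.models import Word2Vec

sentences = [["hello", "world"], ["machine", "learning"]]

model = Word2Vec(
    sentences,
    vector_size=50,
    min_count=1,
    compute_loss=True,  # track the training loss
)
print(model.get_latest_training_loss())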

Reference:

  • Skip-gram and CBOW (7min): link
  • Word2vec: link
  • word2vec gensim doc link

Deep Walk

The ability of graph data structures to represent complex interactions has led to new ways to analyze entities defined by their co-interactions. While these analyses are powerful at finding different structures within communities, they lack the ability to encode aspects of the graph for input into conventional machine learning algorithms.

With DeepWalk, co-interactions within graphs can be captured and encoded by simple neural networks into embeddings consumable by conventional ML algorithms.

Here we have a network of products on an e-commerce website. The nodes in the graph are the products and the edges are the interactions that connect two nodes. We connect two products based on user interaction (bought together, also bought, also viewed).

Let’s say a user buys an iPhone (P1) and buys an iPhone charger (P2) with it; we then connect these nodes to capture the interaction between the two products. Once the nodes are connected, when we later generate product embeddings (Word2Vec), the embeddings will learn this interaction, and the similarity between the embeddings of P1 and P2 will be high. This can be used to boost the popularity score of the charger when a new user searches for an iPhone. This would not be the case for Samsung phones, as there was no edge between Samsung phones and their corresponding charger, because their chargers are not sold with the phone (an assumption based on user interaction).

A problem with this is that it only considers products with lots of user interaction; newer products don’t have interactions associated with them, so they are not connected to other products. To handle this cold-start problem, side information is added (brand, l4-category, etc.): we connect all the Nike products, or all the phone chargers, amongst each other to suggest a similarity between these products (this time with lower edge weights).

DeepWalk utilizes random walks through the graph to reveal latent patterns in the network; these patterns are then learned and encoded by a neural network to yield the final embeddings. The random paths are generated as follows:

Starting from the target root node, randomly select a neighbor of that node (in our case the probability of selecting a neighbor depends on the weight of that edge) and add it to the path. Weights are assigned based on the relationship (bought together = 1.2, also bought = 1.0, also viewed = 0.5, l4-category = 0.2, brand = 0.1). Then randomly choose a neighbor of that node, and continue the walk until the desired number of steps has been taken. This repeated sampling of network paths yields lists of product-ids. These IDs are then treated as if they were tokens in a sentence, and the embedding space is learned from them using a Word2Vec model.

Collections

A collection refers to a curated grouping of products that are categorized and displayed together based on a common theme, attribute, or purpose. Collections serve as a way to organize and present products in a more meaningful and coherent manner to enhance the shopping experience for customers.
