Research-Papers-Recommendation-Engine

In implicit feedback datasets, non-interaction of a user with an item does not necessarily indicate that an item is irrelevant for the user. Thus, evaluation measures computed on the observed feedback may not accurately reflect performance on the complete data.

Many collaborative filtering approaches attempt to identify user preferences based on explicit feedback such as user ratings. However, implicit feedback, in which a user's preferences are expressed through item interactions such as views or purchases, is often more common than explicit feedback.

In both explicit and implicit feedback systems, the presence of missing data poses a challenge to the evaluation of a recommendation system.

  • In explicit feedback datasets, ratings can be Missing-not-at-Random (MNAR), so systems trained only on observed ratings may give biased predictions.
  • On the other hand, in implicit feedback datasets, non-interaction of a user with an item does not necessarily indicate that the item is irrelevant for the user.
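The MNAR effect in the first bullet can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true ratings are uniform on 1 to 5, and the observation probabilities in `OBSERVE_PROB` are hypothetical values chosen so that higher ratings are more likely to be recorded. None of these numbers come from the source.

```python
import random

# Hypothetical assumption: the chance that a rating is observed grows with its value.
OBSERVE_PROB = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4, 5: 0.5}


def simulate_mnar(num_ratings=10000, seed=0):
    """Return (mean of all true ratings, mean of the observed ratings only)."""
    rng = random.Random(seed)
    # Complete data: every user-item rating, most of which will never be seen.
    true_ratings = [rng.randint(1, 5) for _ in range(num_ratings)]
    # Observed data: a rating is kept with a probability that depends on its value (MNAR).
    observed = [r for r in true_ratings if rng.random() < OBSERVE_PROB[r]]
    true_mean = sum(true_ratings) / len(true_ratings)
    observed_mean = sum(observed) / len(observed)
    return true_mean, observed_mean


true_mean, observed_mean = simulate_mnar()
print(f"mean over complete data: {true_mean:.2f}")
print(f"mean over observed data: {observed_mean:.2f}")
```

Under these assumed probabilities the observed mean sits well above the mean of the complete data, which is the kind of bias a model trained only on observed ratings can inherit.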

If we view unobserved but relevant user-item pairs as missing data, then measures which do not take the missing data mechanism into consideration may also exhibit bias when evaluated on the complete data.

To address the MNAR problem in explicit feedback systems, a missing data model was proposed. Under that model, the Top-N recall and Area-under-the-Top-N-curve (ATOP) measures evaluated on the observed data were shown to provide an unbiased estimate of performance on the complete data.
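A minimal sketch of the two measures for a single user follows. It assumes one common formulation: Top-N recall is the fraction of relevant items that appear in the top N of the ranking, and ATOP is the mean normalized rank of the relevant items (1.0 for the top position, 0.0 for the bottom). The function names and the toy ranking are illustrative, not taken from the source.

```python
def top_n_recall(ranked_items, relevant_items, n):
    """Fraction of the relevant items that appear among the first n ranked items."""
    relevant = set(relevant_items)
    if not relevant:
        raise ValueError("relevant_items must not be empty")
    hits = sum(1 for item in ranked_items[:n] if item in relevant)
    return hits / len(relevant)


def atop(ranked_items, relevant_items):
    """Mean normalized rank of the relevant items (assumed ATOP formulation)."""
    total = len(ranked_items)
    relevant = set(relevant_items)
    if total < 2 or not relevant:
        raise ValueError("need at least two ranked items and one relevant item")
    # 0-based position of every item in the ranking.
    position = {item: idx for idx, item in enumerate(ranked_items)}
    # Normalized rank: 1.0 for the first position, 0.0 for the last.
    nranks = [(total - 1 - position[item]) / (total - 1) for item in relevant]
    return sum(nranks) / len(nranks)


ranking = ["a", "b", "c", "d", "e"]   # best to worst, for one user
relevant = {"a", "d"}                 # observed relevant items for that user
print(top_n_recall(ranking, relevant, n=2))  # 0.5
print(atop(ranking, relevant))               # 0.625
```

In the toy example, only `a` lands in the top 2, so recall is 1/2, and the normalized ranks of `a` and `d` are 1.0 and 0.25, which average to 0.625.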

Because resources such as time and money are scarce, users may not be able to consume every item in the item set I that interests them. We therefore assume that each user has a partially observed prior relevant set P+ ⊆ I, which contains all items in I that are relevant to the user.

We can then view O+ ⊆ P+ as the subset of items that the user has chosen to consume (i.e., interacted with enough for the item to be identified as relevant). In our model, we assume that the observed items O+ are a simple random sample of unknown size drawn from P+. Equivalently, for a given user u, each item in P+ has the same (but unknown) probability of being in O+. One could argue that such a missing data model is unrealistic in real-world settings. However, selecting test-set items uniformly from O+ to evaluate implicit feedback methods is common practice.
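The sampling assumption can be checked numerically. The sketch below fixes a ranking and a prior relevant set P+, repeatedly draws O+ by keeping each item of P+ with the same probability, and compares the average Top-N recall measured on O+ with the recall on the full P+. The item IDs, `keep_prob`, and the number of trials are hypothetical choices for illustration only.

```python
import random


def recall_at_n(ranked_items, relevant_items, n):
    """Fraction of the relevant items that appear among the first n ranked items."""
    top_n = set(ranked_items[:n])
    return sum(1 for item in relevant_items if item in top_n) / len(relevant_items)


def mean_observed_recall(ranked_items, prior_relevant, n, keep_prob, trials, seed=0):
    """Average recall@n over random draws of O+ from P+ (empty draws are skipped)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Each item of P+ enters O+ with the same probability keep_prob.
        observed = [item for item in sorted(prior_relevant) if rng.random() < keep_prob]
        if observed:
            estimates.append(recall_at_n(ranked_items, observed, n))
    return sum(estimates) / len(estimates)


ranking = list(range(100))                        # item 0 is ranked first
p_plus = {0, 1, 2, 3, 4, 20, 30, 40, 50, 60}      # hypothetical prior relevant set
complete = recall_at_n(ranking, sorted(p_plus), n=10)
observed = mean_observed_recall(ranking, p_plus, n=10, keep_prob=0.3, trials=5000)
print(f"recall@10 on P+: {complete:.3f}")
print(f"mean recall@10 on sampled O+: {observed:.3f}")
```

Under this sampling scheme the two numbers should agree closely, which matches the claim that recall measured on O+ estimates recall on P+ without bias. If items were instead kept with unequal probabilities, the agreement would no longer be guaranteed.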
