Pro Tips and Exercises

Week 1

  • Creating multiple fields is also helpful if you want to use a field for autocompletion, search-as-you-type, or joins.
  • If you only need a field for ranking, you might consider using the Rank Feature field and query for improved performance.
  • Build more sophisticated queries by combining and grouping different types of queries with constructs like the “bool” query.
  • Mappings: Take an iterative approach to field mappings. That is, start by indexing the data using a subset of the content and the default settings. Then look at what OpenSearch guessed for the mappings and modify those values according to the requirements above and your own insights.
  • Look for additional fields we could either search on or leverage in our query, e.g., manufacturer or color, that satisfy the user intent.
  • Use a function query to incorporate popularity into search (see the sketch after this list).
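A minimal sketch of the mapping and query tips above, assuming a local OpenSearch instance and the opensearch-py client; the products index and its name/popularity fields are illustrative, not taken from the course data:

```python
from opensearchpy import OpenSearch

# Connect to a local cluster (host/port are placeholders).
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Multi-fields on "name" (full-text, exact keyword, search-as-you-type)
# plus a rank_feature field so popularity can boost ranking efficiently.
mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword"},
                    "suggest": {"type": "search_as_you_type"},
                },
            },
            "popularity": {"type": "rank_feature"},
        }
    }
}
client.indices.create(index="products", body=mapping)

# A bool query: require a text match, and let the rank_feature query
# nudge popular products upward.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"name": "beats headphones"}}],
            "should": [{"rank_feature": {"field": "popularity"}}],
        }
    }
}
resp = client.search(index="products", body=query)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["name"])
```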

Exercises

  • Add in some of your own “multi-fields” to index the content in different ways using the Field Mapping settings
  • Index some different data types that we didn’t try out, such as latitude and longitude. How would you model searching for which stores have which books in our data set? (A sketch follows this list.)
  • Try out some more sophisticated queries that combine several different query types, filters and aggregations.
  • [TODO] Support paging through results
  • Create OpenSearch/Kibana dashboards to view and navigate the two indexes
  • [TODO] Spell checking and auto-completion of queries
  • [TODO] Implement an algorithm for handling no results or low quality queries using query rewriting techniques.
  • [TODO] Implement an algorithm for doing query synonym expansion (hint: your search analyzer can be different from your content analyzer) or pseudo-relevance feedback/”more like this”
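A minimal sketch of the geo and paging exercises, again assuming opensearch-py; the stores index and its fields are hypothetical:

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# A geo_point field models latitude/longitude for each store.
client.indices.create(
    index="stores",
    body={"mappings": {"properties": {
        "name": {"type": "keyword"},
        "location": {"type": "geo_point"},
    }}},
)

# Find stores within 25 miles of a point, paging 10 results at a time
# with from/size (search_after is the better choice for deep paging).
page_size = 10
for page in range(3):
    resp = client.search(
        index="stores",
        body={
            "from": page * page_size,
            "size": page_size,
            "query": {"bool": {"filter": [{
                "geo_distance": {
                    "distance": "25mi",
                    "location": {"lat": 42.36, "lon": -71.06},
                }
            }]}},
        },
    )
    for hit in resp["hits"]["hits"]:
        print(page, hit["_source"]["name"])
```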

Week 2

  • Keep test/QA queries out of any logs that you use for analytics, let alone for training machine learning models.

  • Isolate queries automatically generated by the application from analytics and from pipelines that collect training data.

  • Detect queries coming from crawlers or bots and try to block or filter them out.
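A minimal sketch of filtering such traffic out of a query log before analytics; the log entry format, field names, and patterns are illustrative assumptions:

```python
import re

# Hypothetical query-log entries.
log_entries = [
    {"query": "beats headphones", "user_agent": "Mozilla/5.0"},
    {"query": "ipod", "user_agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
    {"query": "test_qa_query_123", "user_agent": "python-requests/2.28"},
]

BOT_PATTERN = re.compile(r"bot|crawler|spider|curl|python-requests", re.IGNORECASE)
TEST_PATTERN = re.compile(r"^test_|_qa_", re.IGNORECASE)

def keep_for_analytics(entry):
    """Keep only human-issued, non-test queries for analytics and training data."""
    if BOT_PATTERN.search(entry["user_agent"]):
        return False
    if TEST_PATTERN.search(entry["query"]):
        return False
    return True

clean_log = [e for e in log_entries if keep_for_analytics(e)]
print(clean_log)  # only the "beats headphones" entry survives
```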

  • It is important to actually look at your query logs!

  • Collecting Relevance Judgements Exercise

    • Issue each of the top popular queries.
    • For each of the above queries, write down the following in your notes:
    • The query.
    • A sentence or two summarizing what you think the searcher is looking for. For example, if the query is “Beats”, that summary might be “Beats by Dr. Dre headphones” or “Over the ear headphones by Dr. Dre”.
    • For each of the query’s top 10 search results, write down:
      1. The Product Id and Name
      2. “Relevant” if you think this result reasonably satisfies the searcher’s intent; “Not Relevant” if it does not.
      3. Whether this result is in the list of docs that are most clicked on for that query, based on query log aggregation (a terms aggregation on queries with a sub-aggregation on the top clicked docs for each query; see the sketch after this list).
    • The position of the first (top-most) result you judged as relevant, if there is one.
    • Where did you agree and disagree on your relevance judgements? Why?
    • What fraction of the results did you each return as relevant for each of the four queries?
    • What were the ranks of the first relevant results?
    • How does the search engine’s performance compare to your expectations?
    • What kinds of mistakes did the search engine make? Do you know why?
    • How do your results compare to the top results from the aggregated query logs?
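A minimal sketch of the query-log aggregation mentioned above (terms aggregation on the query with a sub-aggregation on clicked docs), assuming opensearch-py and a hypothetical query_logs index with query and clicked_doc_id fields:

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

body = {
    "size": 0,  # we only need the aggregations, not the hits
    "aggs": {
        "top_queries": {
            "terms": {"field": "query.keyword", "size": 10},
            "aggs": {
                "top_clicked_docs": {
                    "terms": {"field": "clicked_doc_id", "size": 10}
                }
            },
        }
    },
}
resp = client.search(index="query_logs", body=body)
for q_bucket in resp["aggregations"]["top_queries"]["buckets"]:
    print(q_bucket["key"], q_bucket["doc_count"])
    for d_bucket in q_bucket["top_clicked_docs"]["buckets"]:
        print("   ", d_bucket["key"], d_bucket["doc_count"])
```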
  • These are binary judgments of “relevant” and “not relevant”. Many people use graded relevance (e.g., a scale of 3, 4, or 5 grades of relevance). Binary judgments are certainly the simplest.

  • Measuring Relevance

    • Precision works best with explicit human judgments, especially since a lot of relevant results don’t receive clicks or other engagement.
    • Recall matters most in areas like research and eDiscovery, where the cost of missing a relevant result is much higher than the cost of slogging through an irrelevant result. In contrast, most consumer applications, like ecommerce, focus on precision.
    • MRR works with both implicit and explicit judgments. For queries with no relevant or clicked docs, treat the position as infinity, so the reciprocal rank is zero. MRR is especially suited to searches where there is a single right answer, such as known-item search (a search for an item whose author or title is known, as distinguished from exploratory search, in which the searcher is unfamiliar with the domain of their search goal, unsure how to achieve it, and/or unsure what the goal is).
    • DCG rewards highly relevant results appearing at the top of the list. It is more sensitive than the above metrics, but it is also more complex. Although it is designed for explicit graded human judgments, it can also be used with implicit judgments if grades are based on behavior (e.g., a conversion is worth 10 times as much as a click). It is popular among companies whose revenue is highly sensitive to small changes in search quality. (A sketch computing these metrics follows this list.)
    • Make sure to collect behavioral data for implicit judgments. Don’t just log clicks on search results; log all the results shown to the user. This lets you collect negative examples that give insights beyond just knowing the CTR. If storage doesn’t allow that, then log a representative random sample of sessions/users.
    • Establish an ongoing query triage process. On a regular cadence (such as monthly) – as well as when you make any major changes or notice any sudden changes in business metrics – have your team each judge the results for the 50 most frequent queries, as well as for a random sample of less frequent queries. Encourage individual engineers to perform this same exercise on a smaller scale as part of their development process.
    • https://www.elastic.co/guide/en/elasticsearch/reference/7.10/search-rank-eval.html
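A minimal sketch of these metrics in plain Python over one query’s judged result list; the judgment values are made up for illustration (the rank-eval API linked above can compute similar metrics server-side):

```python
import math

def precision_at_k(judgments, k=10):
    """judgments: 0/1 relevance labels in ranked order."""
    return sum(judgments[:k]) / k

def reciprocal_rank(judgments):
    """1/rank of the first relevant result; 0 if none (position treated as infinity)."""
    for rank, rel in enumerate(judgments, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def dcg(grades, k=10):
    """Discounted cumulative gain over graded labels (higher grade = more relevant)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(grades[:k], start=1))

# Binary judgments for one query's top 10 results.
judged = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
print(precision_at_k(judged))   # 0.3
print(reciprocal_rank(judged))  # 0.5 (first relevant result at rank 2)

# Graded judgments (0-3 scale) for the same results.
print(dcg([0, 3, 0, 0, 2, 0, 0, 0, 1, 0]))
```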
  • Query-independent signals tend to be less about relevance and more about desirability: they indicate which relevant results users actually want. Since desirability drives clicks and purchases just as much as relevance, query-independent signals make a big difference.

  • Query-Independent

    • Document metadata like price, margin, popularity, etc. You’ll generally want to normalize these values if they have a large range.
    • Document quality or authority, as measured by PageRank, number of inbound links, external resources, a separate machine learning system, etc.
    • Document length or the length of a particular field, like title or URL. Anomalous sizes (too short or too long) often serve as negative signals.
    • Global engagement data, such as clicks or CTR. To the extent that results only show up when they are relevant, global CTR can be an excellent signal.
  • Query-Dependent

    • Match score for the document or for a particular field like title. The score can be as simple as the number of matching tokens, or something more complex like token weighting (e.g., tf-idf or BM25) or cosine similarity using an embedding.
    • Query-specific engagement data: like global engagement, but specific to that query. Since this can be sparse, it’s helpful to aggressively normalize queries with an analyzer, or even to group tail queries by tokens, category, etc.
  • Query-dependent signals are good for determining relevance, while query-independent signals are good for determining desirability. This suggests a strategy of using query-dependent signals to determine the set of relevant results, and then using query-independent signals to sort the relevant results by their desirability (sketched below).
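A minimal sketch of that strategy, assuming opensearch-py: the text match runs in filter context (query-dependent, selects the relevant set without scoring) and plain numeric fields supply the query-independent sort (the index and field names are illustrative):

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

body = {
    # Query-dependent: the text match only decides membership in the
    # relevant set (filter context contributes no score).
    "query": {"bool": {"filter": [{
        "multi_match": {
            "query": "beats headphones",
            "fields": ["name", "shortDescription"],
        }
    }]}},
    # Query-independent: order the relevant set by desirability signals
    # (assumed numeric fields such as click_count and regularPrice).
    "sort": [
        {"click_count": {"order": "desc", "missing": "_last"}},
        {"regularPrice": {"order": "asc"}},
    ],
}
resp = client.search(index="products", body=body)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["name"])
```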

Common techniques

  • Analyzers
  • Field & Document Boosting: a query-dependent signal to emphasize matching fields (title, product name)
  • Content Understanding
  • Synonyms
  • Auto phrasing
  • Query Understanding
  • LTR
  • Pseudo relevance feedback (run the original query, rank the docs, assume the top 4 are pseudo-relevant, construct a new query representation from them, run it, and compare the rankings; see the sketch after this list)
  • Experimentation (offline analysis/online experiments)
  • Manual overrides (for specific queries/docs)
  • User experience (typeahead/autocomplete, faceted search, result snippets, and spelling correction; pay attention to design details too)
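A minimal sketch of pseudo relevance feedback as described above, assuming opensearch-py and a hypothetical products index with a text name field; a real implementation would use the index analyzer or term vectors rather than naive whitespace tokenization:

```python
from collections import Counter
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

def pseudo_relevance_feedback(query_text, index="products", field="name",
                              top_k=4, expand_terms=5):
    """Run the original query, treat the top_k hits as pseudo-relevant,
    expand the query with their most frequent new terms, and rerun it."""
    original = client.search(
        index=index,
        body={"size": 10, "query": {"match": {field: query_text}}},
    )
    query_tokens = set(query_text.lower().split())
    # Count candidate expansion terms from the pseudo-relevant docs.
    counts = Counter()
    for hit in original["hits"]["hits"][:top_k]:
        for token in hit["_source"].get(field, "").lower().split():
            if token not in query_tokens:
                counts[token] += 1
    expansion = [term for term, _ in counts.most_common(expand_terms)]
    expanded = query_text + " " + " ".join(expansion)
    reranked = client.search(
        index=index,
        body={"size": 10, "query": {"match": {field: expanded}}},
    )
    return original, reranked

# Run the original and expanded queries and compare the rankings.
orig, rerun = pseudo_relevance_feedback("beats")
print([h["_id"] for h in orig["hits"]["hits"]])
print([h["_id"] for h in rerun["hits"]["hits"]])
```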