Manikandan Sivanesan (manisnesan)

manisnesan / 021_caikit_tutorial.ipynb
Created October 11, 2023 19:06
021_caikit_tutorial.ipynb
manisnesan / pro-tips.md
Last active February 27, 2022 18:28
Pro Tips and Exercises

Week 1

  • Creating multiple fields is also helpful if you want to use a field for autocompletion, search-as-you-type, or joins.
  • If you only need a field for ranking, you might consider using the Rank Feature field and query for improved performance.
  • Build more sophisticated queries by combining and grouping different query types, for example with the “bool” query.
  • Mappings: Take an iterative approach to field mappings. That is, start by indexing a subset of the content with the default settings. Then check what OpenSearch guessed for the mappings and modify those values according to the requirements above and your own insights.
  • Look for additional fields that we could either search or leverage in the query to satisfy the user intent, e.g. manufacturer or color.
  • Using function query to impleme
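The tips above on multi-fields and the “bool” query can be sketched as a request body built in Python. This is a minimal sketch, not the course's solution: the field names `name`, `manufacturer.keyword`, and `color.keyword` are assumptions (a `keyword` sub-field is one common way to expose a second, exact-match version of a text field), and the helper `build_bool_query` is hypothetical.

```python
import json


def build_bool_query(text, manufacturer=None, color=None):
    """Build an OpenSearch-style bool query as a plain dict.

    Field names are illustrative assumptions, not taken from a real index.
    """
    bool_clause = {
        # "must" clauses have to match and contribute to the relevance score.
        "must": [{"match": {"name": text}}],
        # "filter" clauses restrict the result set without affecting the score.
        "filter": [],
    }
    if manufacturer:
        bool_clause["filter"].append({"term": {"manufacturer.keyword": manufacturer}})
    if color:
        bool_clause["filter"].append({"term": {"color.keyword": color}})
    return {"query": {"bool": bool_clause}}


body = build_bool_query("laptop", manufacturer="Dell")
print(json.dumps(body, indent=2))
```

The printed JSON could be pasted into a `_search` request; splitting scoring clauses (`must`) from non-scoring ones (`filter`) keeps the extra fields from distorting the ranking.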
manisnesan / altering_results.dev
Created February 22, 2022 01:03
Altering results using scripting and function scores
## Altering results using Painless - https://www.elastic.co/guide/en/elasticsearch/painless/7.10/painless-walkthrough.html
### Script score on price for doc 1
### In these results, doc_a gets a score of price (5.99) + 1 (the match_all score) = 6.99. Every other document gets a score of 1 (the match_all score) + 1 (the non-“doc_a” case in the script) = 2.
POST searchml_test/_search
{
"query": {"match_all": {}},
"rescore": {
manisnesan / rescoring.dev
Created February 22, 2022 00:15
Rescoring examples
# Rescoring
## Delete the index
DELETE /searchml_test
## Create our index
PUT /searchml_test
{
"settings": {
manisnesan / getting-started-with-java-in-jupyter.ipynb
Created February 6, 2022 00:01
Running Java in Jupyter Notebook

Introduction

For some general background, read the Introduction of Büttcher et al.'s IR textbook: in particular, 1.1, 1.2, and 1.4.

  • 1.4 Test Collection
    • 1.4.1 TREC Tasks - TREC (Text REtrieval Conference) is a series of experimental evaluation efforts conducted annually. It has included tracks devoted to enterprise search, genomic information retrieval, legal discovery, e-mail spam filtering, and blog search, and it provides reusable test collections for validating improvements.
  • IR applications: 1) Web search, desktop search, intranet search, and site search 2) Text clustering and categorization 3) Summarization 4) Text extraction 5) Topic detection 6) Expert search systems, which identify the members who are experts 7) Question answering 8) Multimedia IR: video, image, music, speech
  • IR System Architecture
  • Performance Evaluation
  • Efficiency: 1) Latency 2) Throughput 3) Space
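To make the first two efficiency measures concrete, here is a rough sketch of timing a search function in Python. Everything in it is an assumption for illustration: `toy_search` is a hypothetical stand-in for a real IR system, and `measure` is a hypothetical helper. Space is not measured here.

```python
import time


def toy_search(query, docs=("information retrieval", "web search", "text clustering")):
    # Hypothetical stand-in for a real search system: linear scan over a tiny corpus.
    return [doc for doc in docs if query in doc]


def measure(search_fn, queries):
    """Return mean per-query latency (seconds) and throughput (queries per second)."""
    latencies = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        search_fn(query)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        # Latency: how long a single query takes, averaged over all queries.
        "mean_latency_s": sum(latencies) / len(latencies),
        # Throughput: how many queries are completed per unit of time.
        "throughput_qps": len(queries) / elapsed if elapsed > 0 else float("inf"),
    }


stats = measure(toy_search, ["search", "text", "retrieval"])
print(stats)
```

With sequential queries, as here, throughput is roughly the inverse of mean latency; the two diverge once a system serves queries concurrently.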
manisnesan / 04_modeling-question-answering.ipynb
Created October 30, 2021 23:20
04_modeling-question-answering.ipynb
manisnesan / awesome-benchmarks.md
Last active October 30, 2021 18:14
Awesome Benchmarks

Benchmarks

Natural Language Understanding Systems

  • GLUE: The General Language Understanding Evaluation (GLUE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems. It comprises nine sentence- or sentence-pair language understanding tasks.

Multitask Challenge for NLP

  • decaNLP: Natural Language Decathlon, a new benchmark for studying general NLP models that can perform a variety of complex, natural language tasks. By requiring a single system to perform ten disparate natural language tasks, decaNLP offers a unique setting for multitask, transfer, and continual learning.
manisnesan / mlops-basic.md
Last active October 24, 2021 23:27
MLOps Landing Page

MLOps Basics

Configurations

  • Use Hydra to add configurations to Python code.
  • Loading a simple config (YAML) file using OmegaConf and Hydra
  • Overriding configurations at runtime
  • Splitting the configuration across multiple files
  • Variable Interpolation
  • How to run the model with different parameter combinations
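The ideas in the list above (runtime overrides, variable interpolation, and running several parameter combinations) can be illustrated with a plain-Python sketch. This is deliberately not Hydra's or OmegaConf's implementation or API: it only mimics the concepts using the standard library, and all names in it (`BASE_CONFIG`, `apply_overrides`, `resolve`, `sweep`) are hypothetical. It also keeps override values as strings and does no type parsing.

```python
import copy
import itertools
import re

# Assumed example configuration; "${model.name}" refers to another key.
BASE_CONFIG = {
    "model": {"name": "bert-base", "lr": 0.001},
    "run_name": "${model.name}-run",
}


def apply_overrides(config, overrides):
    """Apply 'a.b=value' style overrides to a deep copy of the config."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value  # values stay strings in this sketch
    return result


def lookup(config, dotted_key):
    """Follow a dotted path such as 'model.name' through nested dicts."""
    node = config
    for key in dotted_key.split("."):
        node = node[key]
    return node


def resolve(config, root=None):
    """Replace ${a.b} references in string values with the value at that path."""
    root = config if root is None else root
    resolved = {}
    for key, value in config.items():
        if isinstance(value, dict):
            resolved[key] = resolve(value, root)
        elif isinstance(value, str):
            resolved[key] = re.sub(
                r"\$\{([^}]+)\}", lambda m: str(lookup(root, m.group(1))), value
            )
        else:
            resolved[key] = value
    return resolved


def sweep(config, grid):
    """Yield one resolved config per combination of the values in the grid."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        overrides = [f"{k}={v}" for k, v in zip(keys, values)]
        yield resolve(apply_overrides(config, overrides))


cfg = resolve(apply_overrides(BASE_CONFIG, ["model.name=distilbert"]))
runs = list(sweep(BASE_CONFIG, {"model.lr": [0.001, 0.01],
                                "model.name": ["bert-base", "distilbert"]}))
print(cfg["run_name"], len(runs))
```

Resolving interpolations after applying overrides means a derived value such as `run_name` follows whatever model name the run actually uses.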