linkerlin/jina-ai.ipynb

## jina-ai.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU",
    "gpuClass": "standard"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/linkerlin/bd27168101a5ec0775086e2b7d4741ae/jina-ai.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# ⏰ Install & Import Dependencies"
      ],
      "metadata": {
        "id": "VZGABxkAge3q"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "lkyGStg_gZKY",
        "outputId": "1d5c3f2c-d0da-42c4-9162-2a5ada537a8c"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting docarray\n",
            "  Downloading docarray-0.16.5.tar.gz (641 kB)\n",
            "\u001b[K     |████████████████████████████████| 641 kB 15.0 MB/s \n",
            "\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from docarray) (1.21.6)\n",
            "Collecting rich>=12.0.0\n",
            "  Downloading rich-12.5.1-py3-none-any.whl (235 kB)\n",
            "\u001b[K     |████████████████████████████████| 235 kB 67.0 MB/s \n",
            "\u001b[?25hRequirement already satisfied: pygments<3.0.0,>=2.6.0 in /usr/local/lib/python3.7/dist-packages (from rich>=12.0.0->docarray) (2.6.1)\n",
            "Collecting commonmark<0.10.0,>=0.9.0\n",
            "  Downloading commonmark-0.9.1-py2.py3-none-any.whl (51 kB)\n",
            "\u001b[K     |████████████████████████████████| 51 kB 8.6 MB/s \n",
            "\u001b[?25hRequirement already satisfied: typing-extensions<5.0,>=4.0.0 in /usr/local/lib/python3.7/dist-packages (from rich>=12.0.0->docarray) (4.1.1)\n",
            "Building wheels for collected packages: docarray\n",
            "  Building wheel for docarray (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for docarray: filename=docarray-0.16.5-py3-none-any.whl size=693606 sha256=171ec80ed7b03d48ef6805080b7542371d91b9cb4decd82580d131774e0c95be\n",
            "  Stored in directory: /root/.cache/pip/wheels/03/f6/c0/fc82dc37bab5edfd37220de5689e9bc5667c2bbc290374a1d4\n",
            "Successfully built docarray\n",
            "Installing collected packages: commonmark, rich, docarray\n",
            "Successfully installed commonmark-0.9.1 docarray-0.16.5 rich-12.5.1\n"
          ]
        }
      ],
      "source": [
        "!pip install docarray"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Importing necessary dependencies\n",
        "from docarray import Document, DocumentArray"
      ],
      "metadata": {
        "id": "eRDNOSTFg_ps"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 🪡 Data Pre-processing"
      ],
      "metadata": {
        "id": "GqA2yqFIh4rv"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from docarray import Document, DocumentArray\n",
        "# uri=\"https://www.gutenberg.org/files/1342/1342-0.txt\"\n",
        "uri=\"https://basoss.oss-ap-southeast-1.aliyuncs.com/ebooks/Pride_and_Prejudice.txt\"\n",
        "doc = Document(uri=uri).load_uri_to_text()\n",
        "\n",
        "\n",
        "# split the book into one Document per non-empty line\n",
        "docs = DocumentArray(Document(text = s.strip()) for s in doc.text.split('\\n') if s.strip())"
      ],
      "metadata": {
        "id": "JWQKqkrDhm4L"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 🏗 Generate Vector Embeddings \n",
        "\n",
        "We use **feature hashing** to generate the vector embeddings because it is fast and space-efficient. It works by applying a hash function to each feature and using the resulting hash value as an index into a fixed-size vector."
      ],
      "metadata": {
        "id": "_TJIHs6eiLrw"
      }
    },
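    {
      "cell_type": "markdown",
      "source": [
        "The idea can be sketched in plain Python. This is a minimal illustration of the hashing trick, not DocArray's actual implementation: the helper `hash_features`, the whitespace tokenizer, the MD5 hash, and the 256-dimension default are all assumptions made for the example.\n",
        "\n",
        "```python\n",
        "import hashlib\n",
        "\n",
        "def hash_features(text, n_dim=256):\n",
        "    # Hypothetical helper: fixed-length count vector, one slot per hash bucket.\n",
        "    vec = [0] * n_dim\n",
        "    for token in text.lower().split():\n",
        "        # Hash the token and fold the hash value into the range [0, n_dim).\n",
        "        digest = hashlib.md5(token.encode('utf-8')).hexdigest()\n",
        "        vec[int(digest, 16) % n_dim] += 1\n",
        "    return vec\n",
        "\n",
        "vec = hash_features('she likes the young man')\n",
        "print(len(vec), sum(vec))  # vector length and number of hashed tokens\n",
        "```\n",
        "\n",
        "Because the vector length is fixed up front, no vocabulary has to be stored; the price is that distinct tokens can collide in the same slot."
      ],
      "metadata": {}
    },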
    {
      "cell_type": "code",
      "source": [
        "# apply feature hashing to embed the DocumentArray\n",
        "docs.apply(lambda doc: doc.embed_feature_hashing())"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 305
        },
        "id": "4glBnUHBiAwp",
        "outputId": "a1056c04-36d0-41ac-8005-97674ac10429"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "\n"
            ],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
              "</pre>\n"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "╭────────────────── Documents Summary ───────────────────╮\n",
              "│                                                        │\n",
              "│   Type                   DocumentArrayInMemory         │\n",
              "│   Length                 \u001b[1;36m12153\u001b[0m                         │\n",
              "│   Homogenous Documents   \u001b[3;92mTrue\u001b[0m                          │\n",
              "│   Common Attributes      \u001b[1m(\u001b[0m\u001b[32m'id'\u001b[0m, \u001b[32m'text'\u001b[0m, \u001b[32m'embedding'\u001b[0m\u001b[1m)\u001b[0m   │\n",
              "│   Multimodal dataclass   \u001b[3;91mFalse\u001b[0m                         │\n",
              "│                                                        │\n",
              "╰────────────────────────────────────────────────────────╯\n",
              "╭────────────────────── Attributes Summary ───────────────────────╮\n",
              "│                                                                 │\n",
              "│  \u001b[1m \u001b[0m\u001b[1mAttribute\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mData type   \u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1m#Unique values\u001b[0m\u001b[1m \u001b[0m \u001b[1m \u001b[0m\u001b[1mHas empty value\u001b[0m\u001b[1m \u001b[0m  │\n",
              "│  ─────────────────────────────────────────────────────────────  │\n",
              "│   embedding   \u001b[1m(\u001b[0m\u001b[32m'ndarray'\u001b[0m,\u001b[1m)\u001b[0m   \u001b[1;36m12153\u001b[0m            \u001b[3;91mFalse\u001b[0m             │\n",
              "│   id          \u001b[1m(\u001b[0m\u001b[32m'str'\u001b[0m,\u001b[1m)\u001b[0m       \u001b[1;36m12153\u001b[0m            \u001b[3;91mFalse\u001b[0m             │\n",
              "│   text        \u001b[1m(\u001b[0m\u001b[32m'str'\u001b[0m,\u001b[1m)\u001b[0m       \u001b[1;36m12062\u001b[0m            \u001b[3;91mFalse\u001b[0m             │\n",
              "│                                                                 │\n",
              "╰─────────────────────────────────────────────────────────────────╯\n"
            ],
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">╭────────────────── Documents Summary ───────────────────╮\n",
              "│                                                        │\n",
              "│   Type                   DocumentArrayInMemory         │\n",
              "│   Length                 <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">12153</span>                         │\n",
              "│   Homogenous Documents   <span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span>                          │\n",
              "│   Common Attributes      <span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'id'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'text'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'embedding'</span><span style=\"font-weight: bold\">)</span>   │\n",
              "│   Multimodal dataclass   <span style=\"color: #ff0000; text-decoration-color: #ff0000; font-style: italic\">False</span>                         │\n",
              "│                                                        │\n",
              "╰────────────────────────────────────────────────────────╯\n",
              "╭────────────────────── Attributes Summary ───────────────────────╮\n",
              "│                                                                 │\n",
              "│  <span style=\"font-weight: bold\"> Attribute </span> <span style=\"font-weight: bold\"> Data type    </span> <span style=\"font-weight: bold\"> #Unique values </span> <span style=\"font-weight: bold\"> Has empty value </span>  │\n",
              "│  ─────────────────────────────────────────────────────────────  │\n",
              "│   embedding   <span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'ndarray'</span>,<span style=\"font-weight: bold\">)</span>   <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">12153</span>            <span style=\"color: #ff0000; text-decoration-color: #ff0000; font-style: italic\">False</span>             │\n",
              "│   id          <span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'str'</span>,<span style=\"font-weight: bold\">)</span>       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">12153</span>            <span style=\"color: #ff0000; text-decoration-color: #ff0000; font-style: italic\">False</span>             │\n",
              "│   text        <span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'str'</span>,<span style=\"font-weight: bold\">)</span>       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">12062</span>            <span style=\"color: #ff0000; text-decoration-color: #ff0000; font-style: italic\">False</span>             │\n",
              "│                                                                 │\n",
              "╰─────────────────────────────────────────────────────────────────╯\n",
              "</pre>\n"
            ]
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 🪄 Querying the Data \n",
        "\n",
        "Let's match the query sentence \"**she likes the young man**\" against the lines of Pride and Prejudice and see which ones come back."
      ],
      "metadata": {
        "id": "JdV6P4vQiciB"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# query sentence \n",
        "query = (Document(text=\"she likes the young man\").embed_feature_hashing().match(docs, limit=3, exclude_self=True, \n",
        "metric=\"jaccard\", use_scipy=True))"
      ],
      "metadata": {
        "id": "hJIctI21ibak"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# fetch the output\n",
        "output = query.matches[:, ('text', 'scores__jaccard')]"
      ],
      "metadata": {
        "id": "5IZXv3rRijY6"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# print the results: row 1 lists the matched texts, row 2 their Jaccard scores\n",
        "num=0\n",
        "for i in (output):\n",
        "  num+=1\n",
        "  print(num,i)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iF7nVdn0kChe",
        "outputId": "bb002d86-58d2-4dd2-9875-5b68bb80c14d"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "1 ['turned her eyes on the daughter, she could almost have joined in', 'young man.', 'condescension, expressed what she felt on the occasion; when it']\n",
            "2 [{'value': 0.6666666666666666}, {'value': 0.6666666666666666}, {'value': 0.6666666666666666}]\n"
          ]
        }
      ]
    },
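    {
      "cell_type": "markdown",
      "source": [
        "The `value` fields above are Jaccard scores. With `use_scipy=True` the metric is presumably computed by SciPy, whose `jaccard` is a dissimilarity, so a lower value would mean a closer match; treat that reading as an assumption. Below is a minimal set-based sketch of the measure, using a hypothetical `jaccard_distance` helper on word sets rather than on the hashed embeddings, so its numbers are not expected to match the scores above.\n",
        "\n",
        "```python\n",
        "def jaccard_distance(a, b):\n",
        "    # One minus (size of intersection / size of union); 0 means identical sets, 1 means disjoint sets.\n",
        "    a, b = set(a), set(b)\n",
        "    if not a and not b:\n",
        "        return 0.0\n",
        "    return 1.0 - len(a & b) / len(a | b)\n",
        "\n",
        "query_words = 'she likes the young man'.split()\n",
        "line_words = 'the young man'.split()\n",
        "print(jaccard_distance(query_words, line_words))\n",
        "```\n",
        "\n",
        "Here three of the five distinct words are shared, so the distance is one minus three fifths."
      ],
      "metadata": {}
    },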
    {
      "cell_type": "markdown",
      "source": [
        "# Next Steps\n",
        "\n",
        "### Building a real-world application\n",
        "\n",
        "In a future notebook we'll use **[Jina's neural search framework](https://github.com/jina-ai/jina/)** and **[Jina Hub Executors](https://hub.jina.ai)** to build a [real-world fashion search engine](http://examples.jina.ai/fashion) with only a few lines of code.\n",
        "\n",
        "![](https://github.com/alexcg1/jina-multimodal-fashion-search/raw/main/demo.gif)"
      ],
      "metadata": {
        "id": "IGSPWBYVllzM"
      }
    }
  ]
}