Find synonyms with word vectors
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"toc": true
},
"source": [
"<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
"<div class=\"toc\"><ul class=\"toc-item\"></ul></div>"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"%reset -fs"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"import gensim.downloader as api\n",
"from gensim.models import KeyedVectors\n",
"from typing import Type"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"# Load pre-trained word-vectors from gensim-data\n",
"word_vectors = api.load(\"glove-wiki-gigaword-100\")"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"def print_synonyms(word_vectors: KeyedVectors, seed_word: str, seed_word_antonym: str, n_synonyms: int = 5) -> None:\n",
"    \"\"\"Print nearest neighbors of seed_word while trying to control for co-occurrence of antonyms/opposites.\"\"\"\n",
"    # The antonym is passed as a negative example so the query moves away from it.\n",
"    result = word_vectors.most_similar(positive=[seed_word], negative=[seed_word_antonym], topn=n_synonyms)\n",
"    print(f\"The synonyms for '{seed_word}':\")\n",
"    for word, score in result:\n",
"        print(f\"{word:<15}: {score:.3f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try an easy example"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The synonyms for 'good':\n",
"excellent      : 0.480\n",
"versatile      : 0.442\n",
"rapport        : 0.434\n",
"ideal          : 0.426\n",
"indispensable  : 0.411\n"
]
}
],
"source": [
"print_synonyms(word_vectors, seed_word=\"good\", seed_word_antonym=\"bad\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"List of common opposites: \n",
"https://www.enchantedlearning.com/wordlist/opposites.shtml"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try something harder"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The synonyms for 'opaque':\n",
"organic-rich   : 0.555\n",
"lampblack      : 0.555\n",
"khazakstan     : 0.549\n",
"oozy           : 0.541\n",
"a.c.e.         : 0.537\n"
]
}
],
"source": [
"print_synonyms(word_vectors, seed_word=\"opaque\", seed_word_antonym=\"clear\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's explore gender bias by swapping the seed word and its opposite"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The synonyms for 'boy':\n",
"groundskeeper  : 0.442\n",
"amstrad        : 0.434\n",
"junkin         : 0.419\n",
"technicals     : 0.403\n",
"beathard       : 0.398\n"
]
}
],
"source": [
"print_synonyms(word_vectors, seed_word=\"boy\", seed_word_antonym=\"girl\")"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The synonyms for 'girl':\n",
"pavlova        : 0.506\n",
"sondergaard    : 0.486\n",
"wilhelmine     : 0.483\n",
"comăneci       : 0.481\n",
"tallchief      : 0.474\n"
]
}
],
"source": [
"print_synonyms(word_vectors, seed_word=\"girl\", seed_word_antonym=\"boy\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": false,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}
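
The `print_synonyms` helper in the notebook above hands the seed word to `most_similar` as a positive example and its opposite as a negative example. As far as I understand gensim's behavior, that amounts to ranking the vocabulary by cosine similarity to the unit-normalized seed vector minus the unit-normalized antonym vector, with the input words excluded from the results. The sketch below illustrates that idea in plain Python. The function name `most_similar_sketch` and the two-dimensional `toy_vectors` are invented for illustration and are not part of gensim or the GloVe model.

```python
import math
from typing import Dict, List, Tuple


def unit(v: List[float]) -> List[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def most_similar_sketch(
    vectors: Dict[str, List[float]],
    positive: List[str],
    negative: List[str],
    topn: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for the positive/negative query (an assumption, not gensim's code)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    # Positive words contribute with weight +1, negative words with weight -1.
    weighted = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    for word, weight in weighted:
        for i, x in enumerate(unit(vectors[word])):
            query[i] += weight * x
    query = unit(query)

    # Score every other word by cosine similarity to the query vector.
    seeds = set(positive) | set(negative)
    scores = [
        (word, sum(q * x for q, x in zip(query, unit(vec))))
        for word, vec in vectors.items()
        if word not in seeds
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Invented 2-D vectors: the second axis loosely plays the role of sentiment.
toy_vectors = {
    "good": [1.0, 1.0],
    "bad": [1.0, -1.0],
    "excellent": [0.5, 1.0],
    "terrible": [0.5, -1.0],
    "weather": [1.0, 0.0],
}

ranked = most_similar_sketch(toy_vectors, positive=["good"], negative=["bad"])
for word, score in ranked:
    print(f"{word:<15}: {score:.3f}")
```

In this toy setup the component shared by "good" and "bad" cancels, so the query points along the axis on which they differ. With real embeddings that difference direction can be dominated by rare tokens, which may be one reason the "opaque" and "boy" queries above return such unusual neighbors.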