@rlangone
Last active January 5, 2020 11:40
Code for the post "Sentiment analysis using word, sub-word and character embedding" on https://amethix.com/blog/
# load libraries
from gensim.models import KeyedVectors
import os
import requests
import gzip
import shutil
# download the pretrained embedding matrix built by Google into the current working directory
cwd = os.getcwd()
file_id = '0B7XkCwpI5KDYNlNUTTlSS21pQmM'
file_name_compressed = 'GoogleNews-vectors-negative300.bin.gz'
destination = os.path.join(cwd, file_name_compressed)
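# helper functions for downloading a large file from Google Drive
def download_file_from_google_drive(file_id, destination):
    # Code from https://stackoverflow.com/a/39225039
    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(URL, params={'id': file_id}, stream=True)
    # for large files Drive first answers with a "can't scan for viruses" warning;
    # the token from its download_warning cookie is needed to fetch the actual file
    token = get_confirm_token(response)
    if token:
        params = {'id': file_id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)
    # fail loudly on HTTP errors instead of writing an error page to disk
    response.raise_for_status()
    save_response_content(response, destination)

def get_confirm_token(response):
    # return the confirmation token if Drive asked for one, else None
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value
    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768
    # stream the response to disk in chunks instead of holding it all in memory
    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)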
# download file
download_file_from_google_drive(file_id, destination)
# decompress the .gz archive (gensim can also read the .bin.gz file directly)
file_name = 'GoogleNews-vectors-negative300.bin'
with gzip.open(destination, 'rb') as f_in, open(file_name, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)
# load the embedding matrix (3 million words and phrases, 300 dimensions each)
model = KeyedVectors.load_word2vec_format(file_name, binary=True)
# example 1: get the vector representation of the word apple (a 300-dimensional array)
apple_embedding = model['apple']
# example 2: compute the cosine similarity between the words king and queen
print(model.similarity('king', 'queen'))
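`model.similarity` returns the cosine similarity of the two word vectors: their dot product divided by the product of their Euclidean norms, so the score lies between -1 and 1 and ignores vector length. The sketch below spells that formula out in plain Python. It assumes made-up 3-dimensional toy vectors (the real GoogleNews vectors have 300 dimensions), and `cosine_similarity` is a hypothetical helper name, not part of gensim.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# toy vectors invented for illustration only (NOT taken from the model)
king = [0.8, 0.3, 0.1]
queen = [0.7, 0.4, 0.2]
banana = [-0.1, 0.2, 0.9]

print(round(cosine_similarity(king, queen), 3))   # similar direction, high score
print(round(cosine_similarity(king, banana), 3))  # different direction, low score
```

With the real model loaded, `cosine_similarity(model['king'], model['queen'])` should agree with `model.similarity('king', 'queen')` up to floating-point rounding.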
urbs456 commented Sep 1, 2019

Hello,

If I copy and paste your code into Google Colab and run it, I do not get any printed results.

Do you have any idea why this is the case?

Kind regards,
Markus

rlangone (author) commented Sep 2, 2019

Hi Markus,
it probably has to do with how you set up your Drive; please see:
https://towardsdatascience.com/getting-started-with-google-colab-f2fff97f594c
I hope this helps!
Kind regards,
Rocco
