gallir / text.py
Last active June 19, 2024 20:03
Very fast function to compute the cosine similarity between two short texts when word counts are not needed (i.e. a binary bag of words). It also works pretty well with weird non-ASCII characters.
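With a binary bag of words, each text becomes a set of distinct words. The cosine similarity of two 0/1 vectors then reduces to the size of the intersection divided by the square root of the product of the set sizes: |A ∩ B| / sqrt(|A| · |B|). The following is a minimal, standard-library-only sketch of that idea. It is not the gist's actual implementation. `tokenize` and `binary_cosine` are hypothetical names. Accents are stripped with `unicodedata` NFKD folding instead of `unidecode`. Unlike `unidecode`, this folding drops characters that have no ASCII decomposition (e.g. CJK) rather than transliterating them.

```python
import math
import re
import unicodedata


def tokenize(text: str) -> set[str]:
    """Return the set of distinct lowercase words in text (binary bag of words)."""
    # Assumption: NFKD + ASCII "ignore" stands in for unidecode here, so accented
    # letters are folded ("ó" -> "o") but non-decomposable characters are dropped.
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return set(re.findall(r"\w+", folded.lower()))


def binary_cosine(a: str, b: str) -> float:
    """Cosine similarity of two texts as 0/1 word vectors: |A & B| / sqrt(|A| * |B|)."""
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a or not words_b:
        # An empty text has a zero vector; define its similarity as 0.
        return 0.0
    return len(words_a & words_b) / math.sqrt(len(words_a) * len(words_b))


if __name__ == "__main__":
    print(binary_cosine("Canción de invierno", "cancion de verano"))  # prints 0.6666666666666666
```

Because only set membership matters, the cost is a hash-set intersection rather than building and multiplying count vectors. That is why skipping word counts makes this fast for short texts.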
from unidecode import unidecode
import re
import sys
import inflection
import numpy as np
import math
from collections import defaultdict
# Cosine similarity: own, faster implementation, inspired by
# https://towardsdatascience.com/calculating-string-similarity-in-python-276e18a7d33a