@manmohan24nov
Last active February 22, 2021 07:22
>>> from nltk.tokenize import word_tokenize
>>> text_data = "Life is what happens when you're busy making other plans."
>>> duplicate_data = "what happens when you're busy"
>>> original_tokens = word_tokenize(text_data)
>>> duplicate_tokens = word_tokenize(duplicate_data)
>>> # Convert all the characters to lower case because this method is case sensitive.
>>> original_tokens = [token.lower() for token in original_tokens]
>>> duplicate_tokens = [token.lower() for token in duplicate_tokens]
>>> original_trigrams = []
>>> for i in range(len(original_tokens) - 2):
...     t = (original_tokens[i], original_tokens[i + 1], original_tokens[i + 2])
...     original_trigrams.append(t)
...
>>> # The original sentence is now split into groups of three consecutive tokens (trigrams); the shorter text's trigrams are compared against these.
>>> original_trigrams
[('life', 'is', 'what'), ('is', 'what', 'happens'), ('what', 'happens', 'when'), ('happens', 'when', 'you'), ('when', 'you', "'re"), ('you', "'re", 'busy'), ("'re", 'busy', 'making'), ('busy', 'making', 'other'), ('making', 'other', 'plans'), ('other', 'plans', '.')]
>>> s = 0
>>> duplicate_trigrams = []
>>> for i in range(len(duplicate_tokens) - 2):
...     t = (duplicate_tokens[i], duplicate_tokens[i + 1], duplicate_tokens[i + 2])
...     duplicate_trigrams.append(t)
...     if t in original_trigrams:
...         s += 1
...
>>> duplicate_trigrams
[('what', 'happens', 'when'), ('happens', 'when', 'you'), ('when', 'you', "'re"), ('you', "'re", 'busy')]
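>>> # Jaccard = |intersection| / |union|, where |union| = |A| + |B| - |intersection| (assumes no repeated trigrams).
>>> jaccard = s / (len(original_trigrams) + len(duplicate_trigrams) - s)
>>> jaccard_coeff = round(jaccard, 2)
>>> print("Jaccard coefficient:", jaccard_coeff)
Jaccard coefficient: 0.4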
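
The same trigram comparison can also be sketched with Python sets, which handle repeated trigrams and make the standard Jaccard definition, |A ∩ B| / |A ∪ B|, explicit. The helper names `trigrams` and `jaccard_similarity` are hypothetical and not part of the original snippet; the functions take already-tokenized, lower-cased lists such as `original_tokens` and `duplicate_tokens`.

```python
def trigrams(tokens):
    """Return the set of consecutive 3-token tuples in a token list."""
    # zip stops at the shortest slice, so lists with fewer than 3 tokens
    # yield an empty set.
    return set(zip(tokens, tokens[1:], tokens[2:]))


def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard similarity of two token lists, measured on their trigram sets."""
    a = trigrams(tokens_a)
    b = trigrams(tokens_b)
    union = a | b
    if not union:
        # Assumption: two inputs with no trigrams at all are scored 0.0.
        return 0.0
    return len(a & b) / len(union)
```

Calling `jaccard_similarity(original_tokens, duplicate_tokens)` scores the two token lists from the session; because sets discard duplicates, a trigram that occurs several times in one text is counted once.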