@jrjames83
Last active August 1, 2020 15:21
import string
# Some random document
document = """BigQuery sure makes life easier for data scientists. You can query data for insights, build high quality ML models and easily interface with other Google Cloud services."""
# Remove punctuation
doc_wo_punct = document.translate(str.maketrans('', '', string.punctuation))
# Some keywords we'd like to extract
keywords = ["bigquery", "ML", "insights", "SQL", "analysis"]
# Get terms from the document
doc_terms = [x.lower().strip() for x in doc_wo_punct.split()]
# Get terms from the document which occur in our keywords list
# A set lookup would be O(1); the list scan below is just illustrative
doc_terms_from_keywords = [x for x in doc_terms if x in [j.lower() for j in keywords]]
print(doc_terms_from_keywords)
# >> ['bigquery', 'insights', 'ml']
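
The comment above notes that a set lookup is O(1). As a minimal sketch of that variant, the steps can be wrapped in a function that lowercases the keywords once into a set. The function name `extract_keywords` and the sample sentence are hypothetical, introduced here only for illustration.

```python
import string


def extract_keywords(document, keywords):
    """Return the lowercased document terms that match a keyword, in document order."""
    # Lowercase the keywords once; membership tests on a set are O(1) on average
    keyword_set = {k.lower() for k in keywords}
    # Remove punctuation, then split on whitespace
    cleaned = document.translate(str.maketrans('', '', string.punctuation))
    return [term for term in (w.lower() for w in cleaned.split()) if term in keyword_set]


# Hypothetical sample input, not taken from the original document
sample = "BigQuery makes SQL analysis easy."
print(extract_keywords(sample, ["bigquery", "ML", "insights", "SQL", "analysis"]))
# >> ['bigquery', 'sql', 'analysis']
```

Like the list-based version, this keeps duplicates and preserves the order in which terms appear in the document; only the membership test changes.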