@LinqLover
Created January 30, 2021 18:58
Extract frequently searched Linguee terms
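import sys
from urllib.parse import unquote_plus

import pandas as pd

# Matches Linguee search URLs; captures the TLD, the language pair, and the raw query.
PATTERN = r'https:\/\/www\.linguee\.(?P<tld>\w{2,})/(?P<from>\w+)-(?P<to>\w+)/search.*[?&]query=(?P<query>[^&]+)'


def extract_linguee_terms(urls: pd.Series) -> pd.Series:
    # URLs that are not Linguee searches yield NaN rows, which are dropped.
    queries = urls.str.extract(PATTERN).dropna()
    # Turn '+' into spaces, then decode percent-escapes (an encoded %2B stays a literal '+').
    queries['query'] = queries['query'].apply(unquote_plus)
    return queries['query']


if __name__ == "__main__":
    # Use the CSV path given on the command line, falling back to history.csv.
    try:
        file = sys.argv[1]
    except IndexError:
        file = 'history.csv'
    hist = pd.read_csv(file)
    terms = extract_linguee_terms(hist['url'])
    # Print each term once, in order of first appearance, with its search count.
    # Series.items() replaces iteritems(), which was removed in pandas 2.0.
    for term, count in terms.value_counts()[terms.unique()].items():
        print(f"{term} ({count})")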
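
To see what PATTERN captures without needing a real history file, here is a minimal standard-library sketch. It assumes that `re.search` behaves like the `str.extract` call in the script for a single URL. The sample URLs and the `parse_url` helper are made up for illustration and are not part of the gist.

```python
import re
from urllib.parse import unquote_plus

# Same pattern as in the script.
PATTERN = r'https:\/\/www\.linguee\.(?P<tld>\w{2,})/(?P<from>\w+)-(?P<to>\w+)/search.*[?&]query=(?P<query>[^&]+)'

# Hypothetical sample URLs, invented for this example.
SAMPLE_URLS = [
    'https://www.linguee.com/english-german/search?source=auto&query=to+be+fond+of',
    'https://www.linguee.de/deutsch-englisch/search?query=Z%C3%BCndschl%C3%BCssel',
    'https://www.example.com/search?query=ignored',
]


def parse_url(url):
    """Return (tld, from, to, decoded query), or None if the URL is not a Linguee search."""
    match = re.search(PATTERN, url)
    if match is None:
        return None
    return (match['tld'], match['from'], match['to'], unquote_plus(match['query']))


parsed = [parse_url(url) for url in SAMPLE_URLS]
for result in parsed:
    print(result)
```

The first two URLs yield the language pair and the decoded search term; the third is not a Linguee URL and yields None, which corresponds to the rows removed by dropna() in the script.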
@LinqLover
Author

history.csv can be exported from a Chromium-based browser using the Export Chrome History extension or a comparable one. The script reads only the url column of that file.
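
As a rough illustration of the input the script expects, the sketch below counts terms from an in-memory CSV using only the standard library. Only the `url` column name comes from the script; the other columns, the sample rows, and the `count_terms` helper are assumptions, and the actual export layout may differ between extensions.

```python
import csv
import io
import re
from collections import Counter
from urllib.parse import unquote_plus

# Same pattern as in the script.
PATTERN = r'https:\/\/www\.linguee\.(?P<tld>\w{2,})/(?P<from>\w+)-(?P<to>\w+)/search.*[?&]query=(?P<query>[^&]+)'

# Hypothetical export: only `url` is required; `title` and `visit_time` are assumed names.
SAMPLE_CSV = """title,url,visit_time
a,https://www.linguee.com/english-german/search?source=auto&query=ubiquitous,2021-01-30
b,https://www.linguee.com/english-german/search?source=auto&query=to+thrive,2021-01-30
c,https://www.linguee.com/english-german/search?source=auto&query=ubiquitous,2021-01-30
d,https://github.com/,2021-01-30
"""


def count_terms(csv_text):
    """Count decoded Linguee search terms, keeping the order of first appearance."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = re.search(PATTERN, row['url'])
        if match:
            counts[unquote_plus(match['query'])] += 1
    return counts


counts = count_terms(SAMPLE_CSV)
for term, count in counts.items():
    print(f"{term} ({count})")
```

A Counter keeps keys in insertion order, so the output lists each term in the order it was first searched, like the value_counts()[terms.unique()] expression in the script.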
