@haridsv
Created July 2, 2023 13:16
Extracts entries that contain the specified keywords from the Google Bookmarks export HTML file.

What is this tool?

When Google discontinued (EOLed) the Bookmarks feature, it allowed users to export all their bookmarks into a single large HTML file. The file is quite usable as it is: you can open it in a browser and search for keywords. However, I have been using Google Keep to organize my bookmarks by category, and extracting the relevant bookmarks from this large file is a pain, mainly because the file groups bookmarks by tag, so each bookmark repeats once for every tag I had given it.

This Python script is a quick attempt to extract entries without duplicates. It uses BeautifulSoup to find all the entries that contain the specified keyword and writes them to the console in a format suitable for copying and pasting into Google Keep.
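
The de-duplication idea can be sketched on its own, without any HTML parsing. The following is a minimal illustration, assuming each entry has already been reduced to a (link, title, description) tuple. The `dedupe` helper and the sample URLs are hypothetical and are not part of the script.

```python
def dedupe(entries):
    """Merge (link, title, description) tuples by link, keeping the first title."""
    merged = {}
    for link, title, desc in entries:
        # The first occurrence of a link creates the record; repeats only add descriptions.
        record = merged.setdefault(link, {"title": title, "descs": set()})
        if desc:
            record["descs"].add(desc)
    return merged


if __name__ == "__main__":
    # Hypothetical sample: the same link appears twice, as it would under two tags.
    sample = [
        ("https://example.com/a", "Page A", "a note about android"),
        ("https://example.com/a", "Page A", "a note about android"),
        ("https://example.com/b", "Page B", None),
    ]
    for link, record in dedupe(sample).items():
        print(f"{record['title']}: {link}")
        for desc in sorted(record["descs"]):
            print(f"- {desc}")
        print()
```

Keying the dictionary by link is what collapses the repeats, and storing descriptions in a set avoids printing the same description twice.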

How to use?

  1. Install dependencies:
pip install bs4 html5lib click
  2. Download and save the Python script, then view the help:
$ python GoogleBookmarks.py --help
Usage: GoogleBookmarks.py [OPTIONS]

Options:
  -f, --file TEXT     The path to the bookmarks file, e.g.,
                      GoogleBookmarks.html  [required]
  -k, --keyword TEXT  The keyword to look for  [required]
  --help              Show this message and exit.
  3. Run it:
$ python GoogleBookmarks.py -f GoogleBookmarks.html -k android
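
The script compiles the keyword into a case-insensitive regular expression, so a character such as `.` in a keyword acts as regex syntax and not as a literal character. The sketch below shows the difference using only the standard `re` module. The `keyword_pattern` helper and its `literal` parameter are hypothetical and are not options of the script.

```python
import re


def keyword_pattern(keyword, literal=False):
    """Build a case-insensitive pattern; escape the keyword when literal=True."""
    body = re.escape(keyword) if literal else keyword
    return re.compile(body, re.IGNORECASE)


if __name__ == "__main__":
    # Hypothetical bookmark titles used only for this demonstration.
    titles = ["Node.js guide", "nodeXjs typo", "Android tips"]
    for literal in (False, True):
        pattern = keyword_pattern("node.js", literal=literal)
        matched = [title for title in titles if pattern.search(title)]
        print(f"literal={literal}: {matched}")
```

For plain words such as `android` the two modes behave the same; the distinction only matters when the keyword contains regex metacharacters.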

Tested with the following versions:

  • Python 3.10.12
  • beautifulsoup4==4.12.2
  • html5lib==1.1
  • click==8.0.1
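
To compare a local environment against the versions listed above, something like the following may help. It is a small sketch using the standard `importlib.metadata` module; the `installed_versions` helper and the `TESTED` mapping are names made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
TESTED = {"beautifulsoup4": "4.12.2", "html5lib": "1.1", "click": "8.0.1"}


def installed_versions(packages):
    """Return {package: installed version, or None when it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, current in installed_versions(TESTED).items():
        print(f"{name}: installed={current}, tested={TESTED[name]}")
```
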
import re

import click
from bs4 import BeautifulSoup

# Maps each link (href) to its link text and the set of descriptions seen so far.
bookmarks = dict()


def add_bookmark(linknode, descnode):
    """Record a bookmark once per link, merging descriptions of repeated entries."""
    link = linknode.get("href")
    linktext = linknode.next_element.text.strip()
    desc = descnode and descnode.text.strip() or None
    bookmark = bookmarks.get(link)
    if bookmark and desc:
        bookmark["desc"].add(desc)
    elif not bookmark:
        bookmarks[link] = dict(linktext=linktext, desc=desc and set([desc]) or set())


@click.command()
@click.option(
    "--file",
    "-f",
    required=True,
    help="The path to the bookmarks file, e.g., GoogleBookmarks.html",
)
@click.option(
    "--keyword",
    "-k",
    required=True,
    help="The keyword to look for",
)
def main(file, keyword):
    with open(file, "r") as fp:
        soup = BeautifulSoup(fp, 'html5lib')
    # Case-insensitive search over all text nodes; the keyword is used as a regex.
    matches = soup.find_all(string=re.compile(f"(?i){keyword}"))
    for ele in matches:
        linknode = descnode = None
        parent_ele = ele.parent
        if parent_ele.name == "a":
            # The keyword matched the link text; the description, if any, is the next <dd>.
            linknode = parent_ele
            nextnode = linknode.parent.next_sibling
            # getattr guards against a missing sibling or a plain text node.
            descnode = nextnode if getattr(nextnode, "name", None) == "dd" else None
            add_bookmark(linknode, descnode)
        else:
            # The keyword matched a description; the link is the first node
            # inside the previous sibling.
            descnode = parent_ele
            prev = parent_ele.previous_sibling
            link_sibling = prev.next_element if prev is not None else None
            if getattr(link_sibling, "name", None) == "a":
                linknode = link_sibling
                add_bookmark(linknode, descnode)
    for link, bookmark in bookmarks.items():
        prefix = f"{bookmark['linktext']}: " if bookmark["linktext"] else ""
        print(f"{prefix}{link}")
        for desc in bookmark["desc"]:
            print(f"- {desc}")
        # Blank line after each bookmark.
        print()


if __name__ == "__main__":
    main()