@maxdemaio
Last active September 9, 2023 02:27
bookmark links extract
import re
# Replace html_content with your HTML content
# This is the HTML format of the exported bookmarks
html_content = """
<DT><H3 ADD_DATE="1664237946" LAST_MODIFIED="1693619091">Portfolio Inspo</H3>
<DL><p>
<DT><A HREF="https://ddiu.io/" ADD_DATE="1664237973">Diu</A>
<DT><A HREF="https://www.cnblogs.com/okup" ADD_DATE="1664238125">Gonfei - 博客园</A>
</DL></p>
"""
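# Capture the value of each HREF attribute. The non-greedy (.*?) stops at the
# first closing quote. The match is case-sensitive, so it expects uppercase
# HREF as in the sample above.
pattern = r'HREF="(.*?)"'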
# Find all matches of the pattern in the HTML content
matches = re.findall(pattern, html_content)
# Print each extracted URL (the text captured between the double quotes)
for match in matches:
    print(match)
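
The regex above only matches uppercase, double-quoted `HREF` attributes. As a possible alternative, the sketch below uses the standard library's `html.parser.HTMLParser`, which lowercases tag and attribute names and so handles `HREF` and `href` alike. The names `BookmarkLinkParser` and `extract_links` are made up for this illustration and are not part of the original gist.

```python
from html.parser import HTMLParser


class BookmarkLinkParser(HTMLParser):
    """Collects (url, title) pairs from anchor tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (url, title) pairs
        self._href = None  # href of the anchor currently being read
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (url, title) tuples found in the given HTML."""
    parser = BookmarkLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `extract_links(html_content)` on the sample above should give each bookmark's URL together with its title, which the regex version does not capture.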