@gingerbeardman
Last active February 23, 2022 21:36
import browser bookmarks.html FAST using a regex with optional named groups
<?php
// href and title are required
// everything else is an optional named group, using format \s*(ATTR="(?P<attr>.*?)")?
$pattern = '|<DT><A HREF="(?P<href>.*?)"\s*(ADD_DATE="(?P<add_date>.*?)")?\s*(LAST_MODIFIED="(?P<last_modified>.*?)")?\s*(ICON_URI="(?P<icon_uri>.*?)")?\s*(ICON="(?P<icon>.*?)")?\s*(PRIVATE="(?P<private>.*?)")?\s*(TOREAD="(?P<toread>.*?)")?\s*(TAGS="(?P<tags>.*?)")?>(?P<title>.*?)</A>|'; // title is named so it survives the is_string key filter below
$string = file_get_contents("bookmarks.html");
preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER); // all pattern matches get their own sub-array
$array_filtered = array_filter($matches, "is_string", ARRAY_FILTER_USE_KEY); // keep only named arrays (our groups)
echo count($array_filtered['href']); // show count
// print_r($array_filtered); // print everything

gingerbeardman commented Feb 23, 2022

Tested with exported bookmarks HTML from:

  • Chrome
  • Firefox
  • Opera
  • Safari
  • Pinboard
  • Linkding
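
For exports from tools not listed above, a quick sanity check may help. Because every group uses the lazy `.*?`, an anchor carrying an attribute the pattern does not know about still matches, but the unknown attribute ends up inside the `href` capture. A minimal sketch that flags such entries, assuming a shortened pattern and a made-up sample (the `find_suspect_hrefs` helper and the `FOO` attribute are hypothetical):

```php
<?php
// Hypothetical helper: returns captured hrefs that contain a double quote,
// a sign that an unrecognised attribute was absorbed into the href group.
function find_suspect_hrefs(string $html, string $pattern): array
{
    if (preg_match_all($pattern, $html, $matches) === false) {
        return []; // the regex engine reported an error
    }
    return array_values(array_filter(
        $matches['href'],
        fn (string $href): bool => strpos($href, '"') !== false
    ));
}

// Shortened pattern for illustration; the full pattern behaves the same way.
$pattern = '|<DT><A HREF="(?P<href>.*?)"\s*(ADD_DATE="(?P<add_date>.*?)")?\s*(TAGS="(?P<tags>.*?)")?>(?P<title>.*?)</A>|';

// Made-up sample: FOO is not an attribute the pattern knows about.
$html = '<DT><A HREF="https://example.com/" ADD_DATE="1645651200">Example</A>' . "\n"
      . '<DT><A HREF="https://example.org/" FOO="bar">Unknown attribute</A>' . "\n";

$suspects = find_suspect_hrefs($html, $pattern);
print_r($suspects); // lists the one href that swallowed FOO="bar"
```

If the list comes back non-empty, adding another optional named group for the missing attribute is probably the simplest fix.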

Supports attributes:

  • HREF
  • ADD_DATE
  • LAST_MODIFIED
  • ICON_URI
  • ICON
  • PRIVATE
  • TOREAD
  • TAGS
  • Title (the link text between the opening tag and </A>)
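
The list above maps directly onto the pattern, so the pattern could also be assembled from an array instead of written out by hand. A rough sketch, not the gist's actual code: `build_bookmark_pattern` is a hypothetical helper, the sample line is made up, and the wrapper groups are non-capturing `(?:...)` here, a small departure from the original format. It also uses `PREG_SET_ORDER` to get one array per bookmark rather than one array per attribute.

```php
<?php
// Attribute names from the list above, in the order the pattern expects them.
$attributes = ['ADD_DATE', 'LAST_MODIFIED', 'ICON_URI', 'ICON', 'PRIVATE', 'TOREAD', 'TAGS'];

// Hypothetical helper: assembles the pattern, one optional named group per attribute.
function build_bookmark_pattern(array $attributes): string
{
    $optional = '';
    foreach ($attributes as $attr) {
        $name = strtolower($attr); // group names are the lowercased attribute names
        $optional .= '\s*(?:' . $attr . '="(?P<' . $name . '>.*?)")?';
    }
    return '|<DT><A HREF="(?P<href>.*?)"' . $optional . '>(?P<title>.*?)</A>|';
}

// Made-up sample line in the Netscape bookmark format.
$html = '<DT><A HREF="https://example.com/" ADD_DATE="1645651200" TAGS="php,regex">Example</A>';

// PREG_SET_ORDER yields one array per bookmark instead of one array per group.
preg_match_all(build_bookmark_pattern($attributes), $html, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    // Keep only the named groups, as in the gist.
    $bookmark = array_filter($row, 'is_string', ARRAY_FILTER_USE_KEY);
    echo $bookmark['title'], ' -> ', $bookmark['href'], ' [', $bookmark['tags'] ?? '', "]\n";
}
```

Attributes missing from a given bookmark come back as empty strings, so the per-bookmark arrays can be checked with `=== ''` before being stored.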
