@sbrl
Created November 26, 2018 19:38
Converter for pirate/bookmark-archiver that turns a plain-text list of URLs into bookmark-style HTML that it can read.
#!/bin/bash
set -o errexit
set -o nounset
##############
# This program converts a plain-text list of urls to the
# bookmark-archiver HTML format.
#
# Requirements: curl, xidel
# Usage:
# ./plaintext-convert.sh [{filename}]
#
# The filename is optional. If given, that file is read;
# otherwise the list is read from stdin.
#
# Examples:
# ./plaintext-convert.sh <path/to/file >list.html
# ./plaintext-convert.sh urls.txt >urls.html
#
##############
date=$(date +%s)
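# Pull every http(s) URL out of the input (the given file, or stdin if none)
grep -E --only-matching 'https?://[^ "\*]*' <"${1:-/dev/stdin}" | while read -r pageurl; do
	echo "[info] Processing $pageurl" >&2
	# Give each entry a distinct, increasing timestamp
	date=$(( date + 1 ))
	# Extract webpage title. The "|| true" stops errexit from aborting the
	# whole run when a single page cannot be fetched or parsed.
	pagetitle=$(curl "${pageurl}" -Ss | xidel --data - --css "title" --quiet) || true
	# Fall back to the URL itself when no title was found
	if [ -z "$pagetitle" ]; then pagetitle="$pageurl"; fi
	echo "<dt><a href=\"$pageurl\" add_date=\"$date\">$pagetitle</a></dt>"
done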
echo "[info] Completed" >&2;
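
The script above writes page titles and URLs into the HTML unescaped, so a title containing `&`, `<` or `"` could produce malformed output. One possible way to guard against that is sketched below. The `html_escape` function is a hypothetical helper, not part of the original script, and it assumes that `sed` is available.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): escape the characters
# that are special in HTML text and in double-quoted attribute values.
html_escape() {
	# "&" must be replaced first so the entities added afterwards are not
	# escaped a second time. In a sed replacement, "\&" is a literal "&".
	printf '%s\n' "$1" | sed \
		-e 's/&/\&amp;/g' \
		-e 's/</\&lt;/g' \
		-e 's/>/\&gt;/g' \
		-e 's/"/\&quot;/g'
}

# Example values standing in for what the loop would have collected
pageurl='https://example.com/?a=1&b=2'
pagetitle='Tom & "Jerry" <3'
echo "<dt><a href=\"$(html_escape "$pageurl")\" add_date=\"0\">$(html_escape "$pagetitle")</a></dt>"
```

Inside the loop, the final `echo` could call `html_escape` on `$pageurl` and `$pagetitle` in the same way, leaving the rest of the script unchanged.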