@ecampidoglio
Last active October 24, 2018 12:48
A Bash script that downloads the HTML source from a list of URLs and converts it to Markdown. The source file must contain each URL to download from on a separate line. The actual conversion is done by the awesome API available at http://heckyesmarkdown.com
#!/bin/bash
# htmltomd
# Downloads the HTML source from a list of URLs and converts it to Markdown.
# The source file must contain each URL to download from on a separate line.
# The actual conversion is done by the awesome API available at
# http://heckyesmarkdown.com
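urlsFile=$1
outputDir=${2:-.} # Defaults to the current directory
if [ -z "$urlsFile" ]; then
  echo "Usage: $0 urls-file [output-dir]" >&2
  exit 1
fi
# Read one URL per line. -r keeps backslashes literal, and the || clause
# still processes a final line that lacks a trailing newline.
while IFS= read -r url || [ -n "$url" ]
do
  [ -z "$url" ] && continue # Skip blank lines
  # Strip the scheme and host, then replace the remaining slashes with dashes
  mdFileName=$(echo "${url#*/*.*/}" | sed "s/\//-/g")
  echo "${url} => ${outputDir}/${mdFileName}.md"
  curl --progress-bar --data-urlencode "u=${url}" --data "read=1&md=1" http://heckyesmarkdown.com/go/ -o "${outputDir}/${mdFileName}.md"
done < "$urlsFile"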
# Example:
# htmltomd.sh urls.txt ./posts
# http://foo.com/2014/02/24/bar => ./posts/2014-02-24-bar.md
# http://foo.com/2014/02/25/baz => ./posts/2014-02-25-baz.md
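
The file name mapping shown in the example can also be done without `echo` and `sed`. The following is a minimal sketch, not part of the original script: `url_to_md_name` is a hypothetical helper name, and it assumes URLs shaped like the ones above (a scheme, a host containing a dot, then a path). It additionally drops one trailing slash, which the original script does not do.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): derives the Markdown
# file name from a URL using only Bash parameter expansion.
url_to_md_name() {
  local url=$1
  local path=${url#*/*.*/}       # Drop the shortest prefix ending in "<host>/", e.g. "http://foo.com/"
  path=${path%/}                 # Drop a single trailing slash, if present
  printf '%s\n' "${path//\//-}"  # Replace every remaining "/" with "-"
}

url_to_md_name "http://foo.com/2014/02/24/bar"   # prints 2014-02-24-bar
url_to_md_name "http://foo.com/2014/02/25/baz/"  # prints 2014-02-25-baz
```

Keeping the mapping in a function makes it possible to check the names a URL list would produce before any request is sent to the conversion API.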