@Stantheman
Last active August 29, 2015 14:12
Use XSLT to parse HTML instead of one-liners
# links.html came from:
# http://taint.org/2014/12/27/235802a.html
ubuntu:~/xslt$ xsltproc --html transform.xslt links.html
http://www.newyorker.com/business/currency/airlines-want-you-to-suffer
http://hackaday.com/2014/12/25/writing-a-virtual-machine-in-excel/
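<!-- transform.xslt: print the href of every "deliciouslink" anchor in the entry content, one URL per line -->
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <!-- Emit plain text rather than XML -->
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//div[@class='entry-content']/ul//li/p/a[@class='deliciouslink']">
      <xsl:value-of select="@href"/>
      <!-- Newline after each URL -->
      <xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>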