@wololock
Last active August 29, 2015 14:07
Simple Jsoup example: skip anchor-only and empty links
@Grab(group='org.jsoup', module='jsoup', version='1.8.1')
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
def html = '''
<div id="something">
<a href="#home">This will be skipped</a>
<a href="/page/test#header">This one is ok</a>
<a href="aboutus.html#friends">This is also ok</a>
<a href="#top">This one is not ok</a>
<a href="">Empty link that needs to be skipped</a>
<a href="/">Home page link, needs to be skipped</a>
<a href="a">This is ok</a>
<a href="ab">This is ok</a>
</div>
'''
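Document document = Jsoup.parse(html)
// 1st select: drop links whose href starts with '#' (pure in-page anchors)
// 2nd select: keep hrefs matching ^/?[^/]+ (an optional leading slash followed
// by at least one non-slash character), which rejects "" and "/"
assert document.select('a:not([href^=#])')
        .select('a[href~=^/?[^/]+]')
        .size() == 4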
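The two chained selectors can also be checked without Jsoup. This is a minimal plain-Groovy sketch that applies the same two rules to the sample href values; `keepHref` is a hypothetical helper, not part of Jsoup, and it assumes Jsoup applies the `~=` regex as a partial (find-style) match, which is what the expected count of 4 implies (a full match would reject "/page/test#header").

```groovy
// Hypothetical helper (not a Jsoup API): mirrors the two selector conditions
// on plain href strings so each keep/skip decision can be inspected directly.
boolean keepHref(String href) {
    // a:not([href^=#]) -> reject pure in-page anchors such as "#top"
    if (href.startsWith('#')) return false
    // a[href~=^/?[^/]+] -> assumed partial match, so use Matcher.find()
    return (href =~ /^\/?[^\/]+/).find()
}

// The href values from the sample HTML, in document order
def hrefs = ['#home', '/page/test#header', 'aboutus.html#friends',
             '#top', '', '/', 'a', 'ab']
def kept = hrefs.findAll { keepHref(it) }
assert kept == ['/page/test#header', 'aboutus.html#friends', 'a', 'ab']
println kept
```

The empty href fails because `[^/]+` needs at least one character, and "/" fails because after the optional slash nothing is left to match.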