@michalmela
Last active January 31, 2020 09:06
[groovy scraping] Web scraping with Groovy & TagSoup
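// Grab TagSoup so that real-world, possibly malformed HTML can be parsed like XML.
// Note: Groovy 4+ requires `import groovy.xml.XmlSlurper`.
@Grapes( @Grab('org.ccil.cowan.tagsoup:tagsoup:1.2') )
def PARSER = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser())

// Request 14 pages: categoryPage takes the values 0, 100, ..., 1300.
(0..13).each { page ->
    new URL("https://www.diki.pl/slownik-angielskiego/?q=c%3Aslownik-terminow-prawnicznych&categoryPage=${page * 100}").withReader { reader ->
        def document = PARSER.parse(reader)
        // Depth-first search for every element whose class attribute is exactly 'fentry'.
        document.'**'.findAll { it['@class'] == 'fentry' }.each { fentry ->
            // Strip the "...znacze..." fragment, drop newlines, and collapse runs of spaces.
            println(fentry.toString().replaceAll('...znacze.?.?.?', '').replaceAll('\n', '').replaceAll(' +', ' '))
        }
    }
}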