@rhzs
Last active August 8, 2018 00:56
A very simple Java/Groovy example using the Jsoup library: a crawler for Zalora Indonesia (e-commerce site)
// Download the Jsoup library via Grape
@Grab('org.jsoup:jsoup:1.7.1')
// Connect to Zalora and fetch the page via Jsoup
def doc = org.jsoup.Jsoup.connect("http://www.zalora.co.id/women/pakaian/dress/").get()
// Because the page content is loaded via AJAX, we can't simply select the product elements with CSS selectors.
// Instead, read the inline <script> elements and pull out the embedded settings.
def script = doc.select("script")
def p = java.util.regex.Pattern.compile(/(?is)app\.settings =(.*)app\.i18n/)
def m = p.matcher(script.html())
while (m.find()) {
    println m.group() // the whole match; all products are in the JSON data under the 'initialData' key
}
// Author: Rheza Satria (2015), Semarang Indonesia
// Very simple Java/Groovy and Jsoup example: crawler for Zalora Indonesia (e-commerce site)
//
// Purpose:
//   Zalora Indonesia web and data crawler for the product catalog
// Run in terminal:
//   groovy zalora_jsoup.groovy
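
The script above prints the whole regex match, including the `app.settings =` and `app.i18n` markers. As a rough sketch of how the JSON text alone might be isolated, the plain-Java example below uses only `java.util.regex`. The class name `SettingsExtractor`, the method `extractSettings`, and the sample script string are hypothetical and made up for illustration; the real page's inline script may be laid out differently, and the lazy capture is a swapped-in choice, not what the original script does.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SettingsExtractor {
    // Same idea as the script's pattern, but with a lazy capture so that
    // only the text between the two markers ends up in group 1.
    private static final Pattern SETTINGS =
            Pattern.compile("(?is)app\\.settings\\s*=(.*?)app\\.i18n");

    // Returns the text assigned to app.settings, or empty if the markers are absent.
    static Optional<String> extractSettings(String scriptText) {
        Matcher m = SETTINGS.matcher(scriptText);
        if (!m.find()) {
            return Optional.empty();
        }
        String body = m.group(1).trim();
        // Drop a trailing semicolon left over from the JavaScript assignment.
        if (body.endsWith(";")) {
            body = body.substring(0, body.length() - 1).trim();
        }
        return Optional.of(body);
    }

    public static void main(String[] args) {
        // Hypothetical inline script content, not taken from the real site.
        String sample = "app.settings = {\"initialData\": {\"docs\": []}};\napp.i18n = {};";
        System.out.println(extractSettings(sample).orElse("no match"));
    }
}
```

The returned string could then be handed to a JSON parser to read the `initialData` key; in the Groovy script, the equivalent change would be to print `m.group(1)` instead of `m.group()`.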