@emres
Created October 31, 2014 14:59
Parallelized article downloader in Scala
import rapture._
import core._, io._, net._, uri._, codec._
import scala.util.matching.Regex
import java.nio.file.{Paths, Files}
import java.nio.charset.StandardCharsets

object ArticleDownloader extends App {
  import encodings.`UTF-8`

  // Fetch the chronological news index as a single string.
  val src = uri"http://www.standaard.be/nieuws/chronologisch".slurp[Char]

  // Capture every article link up to its closing quote; the dots are
  // escaped so that they match literally.
  val pattern = new Regex("(http://www\\.standaard\\.be/cnt/dmf.*?)\"")
  val urls = for (x <- pattern.findAllIn(src).matchData.toList) yield x.group(1)

  // let us take advantage of Parallel Collections, simply by
  // adding a .par to our list
  for (url <- urls.par) {
    val contents = Http.parse(url).slurp[Char]
    // Name each file after the last path segment of its URL.
    val fileName = "/home/emre/tmp/" + url.split("/").last
    // Encode explicitly as UTF-8 instead of relying on the platform default.
    Files.write(Paths.get(fileName), contents.getBytes(StandardCharsets.UTF_8))
  }
}
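
The link extraction and file naming steps can be tried without any network access or the rapture library. The following is a minimal sketch using only the Scala standard library; the object name `LinkExtraction` and the helpers `extractUrls` and `fileNameFor` are hypothetical names introduced for illustration and are not part of the original gist.

```scala
import scala.util.matching.Regex

// Hypothetical helper object, for illustration only.
object LinkExtraction {
  // Pattern for article links; the dots are escaped so they match literally,
  // and the capture group stops before the closing double quote.
  val pattern: Regex = new Regex("(http://www\\.standaard\\.be/cnt/dmf.*?)\"")

  // Returns every captured article URL in the order it appears in the page.
  def extractUrls(html: String): List[String] =
    pattern.findAllMatchIn(html).map(_.group(1)).toList

  // The file name is whatever follows the last slash in the URL.
  def fileNameFor(url: String): String = url.split("/").last
}
```

Keeping these two steps as pure functions makes them easy to check against a saved copy of the index page before pointing the downloader at the live site.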