@xuwei-k
Created June 30, 2011 08:38
Script to download all PDFs from the Scala language page

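This script parses the HTML of the papers list page on the official Scala site and automatically downloads everything that looks like a PDF file. That is all it does.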
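To show what "looks like a PDF file" means in practice, here is a minimal sketch of the extraction step on its own. The regular expression is the one used in the script; the HTML fragment and the `example.com` URLs are made up for illustration, and `ExtractDemo` is a hypothetical name.

```scala
object ExtractDemo {
  // Same pattern as the script: lazily match an http:// URL up to the first ".pdf".
  val pdfUrl = """(http://[\w\.\~\-\/\?\&\+\=\:\@\%\#]*?\.pdf)""".r

  // Hypothetical HTML fragment, made up for illustration.
  val sampleHtml =
    """<a href="http://example.com/papers/a.pdf">A</a>
      |<a href="http://example.com/index.html">Home</a>
      |<a href="http://example.com/docs/b.pdf">B</a>""".stripMargin

  // Every match in document order; the .html link is skipped.
  val urls: List[String] = pdfUrl.findAllIn(sampleHtml).toList

  // The local file name is the last path segment of each URL.
  val names: List[String] = urls.map(_.split("/").last)

  def main(args: Array[String]): Unit =
    urls.zip(names).foreach { case (u, n) => println(s"$u -> $n") }
}
```

Because the pattern only accepts absolute `http://` URLs, relative links and `https://` links on a page would not be picked up by this approach.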

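import java.io.{BufferedInputStream, FileOutputStream}
import java.net.URL
import scala.io.Source

object Main {
  def main(args: Array[String]): Unit = {
    // Download every PDF linked from the Scala papers page into the current directory.
    getURLList("http://www.scala-lang.org/node/143").foreach { url =>
      save(url, url.split("/").last)
    }
  }

  // Fetch the page and return every absolute http:// URL that ends in ".pdf".
  def getURLList(url: String): List[String] = {
    val source = Source.fromURL(url, "UTF-8")
    val html = try source.mkString finally source.close()
    val reg = """(http://[\w\.\~\-\/\?\&\+\=\:\@\%\#]*?\.pdf)""".r
    reg.findAllIn(html).toList
  }

  // Copy the bytes at `url` into a local file named `fileName`.
  def save(url: String, fileName: String): Unit = {
    val in = new BufferedInputStream(new URL(url).openStream)
    try {
      val out = new FileOutputStream(fileName)
      try {
        val b = new Array[Byte](8192)
        var i = in.read(b)
        while (i >= 0) {
          out.write(b, 0, i)
          i = in.read(b)
        }
      } finally out.close()
    } finally in.close()
  }
}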
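As an alternative to the hand-written read/write loop, the copy step could be delegated to `java.nio.file.Files.copy`. This is only a sketch of that substitution, not part of the original script; `CopyDemo` and `saveStream` are hypothetical names, and the demo copies from an in-memory stream instead of a real URL so it runs without network access.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.file.{Files, Path, StandardCopyOption}

object CopyDemo {
  // Hypothetical helper: write everything from `in` to `target`,
  // replacing any existing file, and return the number of bytes copied.
  def saveStream(in: InputStream, target: Path): Long =
    try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()

  def main(args: Array[String]): Unit = {
    val target = Files.createTempFile("copy-demo", ".pdf")
    val in = new ByteArrayInputStream("%PDF-".getBytes("US-ASCII"))
    val written = saveStream(in, target)
    println(s"wrote $written bytes to $target")
    Files.delete(target)
  }
}
```

In the script, the stream passed to such a helper would be the one returned by `new URL(url).openStream`.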