@kmizu
Forked from xuwei-k/README.md
Created September 8, 2012 02:55
Script to download all PDFs from the Scala language page

This simply parses the HTML of the papers list page on the official Scala site and automatically downloads everything that looks like a PDF file.
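The link-extraction step can be tried offline before running the full script. The following is a minimal sketch that applies the same regular expression as the script to an inline HTML string; the object name `ExtractPdfLinksSketch`, the helper `extract`, and the `example.org` URLs are made up for illustration and are not part of the original script.

```scala
object ExtractPdfLinksSketch {
  // Same pattern as the script: an http:// URL built from URL-safe characters, ending in ".pdf".
  val PdfLink = """(http://[\w\.\~\-\/\?\&\+\=\:\@\%\#]*?\.pdf)""".r

  // Return every match in document order (duplicates are kept).
  def extract(html: String): List[String] = PdfLink.findAllIn(html).toList

  def main(args: Array[String]): Unit = {
    // Hypothetical HTML fragment, not taken from the real page.
    val sample =
      """<a href="http://example.org/papers/a.pdf">A</a>
        |<a href="http://example.org/index.html">home</a>
        |<a href="http://example.org/docs/b.pdf">B</a>""".stripMargin
    extract(sample).foreach(println)
  }
}
```

Because the pattern requires a literal `http://` prefix, relative links and `https://` links are not picked up. The match is also lazy, so it stops at the first `.pdf` it meets, which is why the result is "everything that looks like a PDF" rather than a verified list of PDF files.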

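import java.io.{BufferedInputStream, BufferedOutputStream, FileOutputStream, IOException}
import java.net.URL
import scala.io.Source

object Main {
  def main(args: Array[String]): Unit = {
    // Download every PDF linked from the papers page; a failure on one URL does not stop the rest.
    getURLList("http://www.scala-lang.org/node/143").foreach { url =>
      try {
        save(url, url.split("/").last)
      } catch {
        case e: IOException =>
          System.err.println("Failed to save: " + url)
          e.printStackTrace()
      }
    }
  }

  // Fetch the page and return every absolute http:// URL in it that ends in ".pdf".
  def getURLList(url: String): List[String] = {
    val source = Source.fromURL(url, "UTF-8")
    val html = try source.mkString finally source.close()
    val reg = """(http://[\w\.\~\-\/\?\&\+\=\:\@\%\#]*?\.pdf)""".r
    reg.findAllIn(html).toList
  }

  // Copy the bytes at `url` into `fileName` in the current directory.
  def save(url: String, fileName: String): Unit = {
    System.err.println("Downloading " + url)
    val in = new BufferedInputStream(new URL(url).openStream)
    try {
      // Use a plain byte stream; PrintStream is meant for text output.
      val out = new BufferedOutputStream(new FileOutputStream(fileName))
      try {
        val b = new Array[Byte](8192)
        var i = in.read(b)
        while (i >= 0) {
          out.write(b, 0, i)
          i = in.read(b)
        }
      } finally out.close()
    } finally in.close()
    System.err.println("Done")
  }
}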
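As an alternative to the hand-written copy loop in `save`, the standard library's `java.nio.file.Files.copy` can stream the response straight to disk. This is a sketch only, assuming Java 7 or later; `SaveSketch` and `fileNameOf` are hypothetical names introduced here, and unlike the original it overwrites an existing file explicitly via `REPLACE_EXISTING`.

```scala
import java.net.URL
import java.nio.file.{Files, Paths, StandardCopyOption}

object SaveSketch {
  // Derive a local file name from the last path segment, as the script does.
  def fileNameOf(url: String): String = url.split("/").last

  // Stream the content at `url` into `fileName`; returns the number of bytes written.
  def save(url: String, fileName: String): Long = {
    val in = new URL(url).openStream()
    try Files.copy(in, Paths.get(fileName), StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
  }
}
```

A call such as `SaveSketch.save(url, SaveSketch.fileNameOf(url))` would stand in for `save(url, url.split("/").last)` in `main`. Note that taking the last path segment as the file name means two PDFs with the same name in different directories end up in the same local file.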