@bubnenkoff
Last active August 29, 2015 14:13
Extract URLs from web page
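import std.net.curl, std.stdio;
import std.algorithm, std.regex;

void main() {
    // Fetch the page and print the href value of every anchor tag.
    // [^>]* keeps the match inside one tag; [^"]* stops at the first
    // closing quote, so each capture is a single URL.
    get("http://www.stroustrup.com/C++.html")
        .matchAll(`<a[^>]*?href="([^"]*)"`)
        .map!(m => m[1])
        .each!writeln();
}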
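The capture group is the delicate part of a regex-based approach: a greedy `(.*)` runs to the last double quote on the line, so two links on one line can collapse into a single bogus match, whereas `([^"]*)` stops at the first closing quote. The sketch below shows this on an in-memory string, without any network access. The helper name `extractHrefs` and the sample markup are assumptions made for illustration; they are not part of the original snippet.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, regex;

// Hypothetical helper (not in the original): collect href values from an HTML string.
string[] extractHrefs(string html)
{
    // [^>]*? keeps the match inside one tag; [^"]* stops at the first closing quote.
    auto re = regex(`<a\s[^>]*?href="([^"]*)"`);
    return html.matchAll(re).map!(m => m[1]).array;
}

void main()
{
    import std.stdio : writeln;

    // Two links on a single line, each with an extra attribute (made-up sample).
    enum sample = `<a href="index.html" class="nav">Home</a> <a id="x" href="papers.html">Papers</a>`;
    writeln(extractHrefs(sample)); // prints ["index.html", "papers.html"]
}
```

Regular expressions remain a rough tool for HTML: this pattern ignores single-quoted and unquoted attribute values, for example, which is one reason a DOM parser is the more robust choice.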
---
import arsd.dom;
import std.net.curl;
import std.stdio, std.algorithm;

void main() {
    auto document = new Document(cast(string)
        get("http://www.stroustrup.com/C++.html"));
    writeln(document.querySelectorAll("a[href]").map!(a => a.href));
}
---
prints:
[snip ... "http://www.morganstanley.com/",
"http://www.cs.columbia.edu/", "http://www.cse.tamu.edu",
"index.html", "C++.html", "bs_faq.html", "bs_faq2.html",
"C++11FAQ.html", "papers.html", "4th.html", "Tour.html",
"programming.html", "dne.html", "bio.html", "interviews.html",
"applications.html", "glossary.html", "compilers.html"]
Or perhaps better yet:
import arsd.dom;
import std.net.curl;
import std.stdio;

void main() {
    auto document = new Document(cast(string)
        get("http://www.stroustrup.com/C++.html"));
    foreach (a; document.querySelectorAll("a[href]"))
        writeln(a.href);
}