@kui
Last active September 9, 2019 04:15
a web scraping script with Dart and html5lib
import 'dart:io';
import 'dart:async';
import 'package:html5lib/parser.dart';
import 'package:html5lib/dom.dart';

main() {
  final url = 'http://comic-walker.com/';
  getHtml(url).then((document) {
    // page title
    print(document.querySelector('title').text);

    // Newer comics
    document.querySelectorAll('#bookList > li').forEach((e) {
      print(e.querySelector('.list_bookName').text);
    });
  });
}

/// fetch and parse the HTML from [url]
Future<Document> getHtml(String url) =>
    new HttpClient()
        .getUrl(Uri.parse(url))
        .then((req) => req.close())
        .then((res) => res
            .asyncExpand((bytes) => new Stream.fromIterable(bytes))
            .toList())
        .then((bytes) => parse(bytes, sourceUrl: url));

mloureiro commented Jan 27, 2018

The html5lib package is now named just html (source)
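
For readers on a current Dart SDK, here is a minimal sketch of the same script with the renamed package and async/await. It assumes `package:html` is listed in `pubspec.yaml`, that the page is UTF-8 encoded, and that the `#bookList` and `.list_bookName` selectors from the gist still match the site's markup (they may not). The `bookNames` helper is a hypothetical name introduced here for illustration; it is not part of the original gist.

```dart
import 'dart:convert';
import 'dart:io';

import 'package:html/dom.dart';
import 'package:html/parser.dart' show parse;

/// Hypothetical helper: collects the text of each `.list_bookName`
/// element found under `#bookList > li`, skipping items without one.
List<String> bookNames(Document document) => document
    .querySelectorAll('#bookList > li')
    .map((li) => li.querySelector('.list_bookName')?.text)
    .whereType<String>()
    .toList();

/// Fetches [url] and parses the response body as HTML.
Future<Document> getHtml(String url) async {
  final client = HttpClient();
  try {
    final request = await client.getUrl(Uri.parse(url));
    final response = await request.close();
    // Assumption: the response body is UTF-8 encoded.
    final body = await response.transform(utf8.decoder).join();
    return parse(body, sourceUrl: url);
  } finally {
    // Release the underlying connections once the body has been read.
    client.close();
  }
}

Future<void> main() async {
  final document = await getHtml('http://comic-walker.com/');
  // Page title (null if the page has no <title> element).
  print(document.querySelector('title')?.text);
  // Newer comics.
  bookNames(document).forEach(print);
}
```

Splitting the extraction into `bookNames` keeps the selector logic separate from the network call, so it can be tried against an HTML string passed to `parse` without fetching anything.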


bawantha commented Sep 9, 2019

Does this provide client-side scraping?