@merrilymeredith
Created December 20, 2018 01:58
Markdown dump of every post under one tag on a particular Tumblr blog, with each post's other tags saved too.
#!/usr/bin/env perl
use 5.014;    # the /r flag on y/// used below needs Perl 5.14 or later
use warnings;
use strict;
use utf8::all;
use HTML::FormatMarkdown;
use Mojolicious 8.02;
use Mojo::URL;
use Mojo::UserAgent;
use Mojo::Util;

# Shorthand constructor for Mojo::URL objects.
sub u { Mojo::URL->new($_[0]) }
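my $blog      = "some-blog";    # Tumblr subdomain to read from
my $tag       = "a tag";        # tag to dump; spaces become hyphens in the URL
my $separator = '=' x 60;       # printed before each post

my $url = u("http://${blog}.tumblr.com/tagged/" . $tag =~ y/ /-/r);
my $ua  = Mojo::UserAgent->new;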
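# Walk the tag listing page by page, following the "next" link until there is none.
while ($url) {
    my $dom = $ua->get($url)->result->dom;

    $dom->find('article.post')->each(sub {
        # Skip posts without a content block instead of dying on undef.
        my $content = $_->at('div.post-content') or return;
        say $separator;
        say get_tags($_) . "\n";
        say HTML::FormatMarkdown->format_string(
            $content->content,
            rightmargin => 9999999,    # effectively disables line wrapping
        ) . "\n\n";
    });

    if (my $next = $dom->at('a#next')) {
        $url = u($next->attr('href'))->base($url)->to_abs;
    }
    else {
        undef $url;
    }
}
exit;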
# Return a post's tags as a comma-separated string, leaving out the tag being dumped.
sub get_tags {
    my ($dom) = @_;
    $dom->find('a.tag')
        ->map('content')
        ->map(\&Mojo::Util::html_unescape)
        ->grep(sub { $_ ne $tag })
        ->join(', ');
}