@tokubass
Created December 14, 2011 12:32
Removes ruby (furigana) annotations from aozora.gr.jp text files
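#!/usr/bin/env perl
# Strip ruby (furigana) annotations from the .txt files given as arguments.
# For each input FILE.txt, writes FILE_NoRubi.txt to the current directory.
use strict;
use warnings;
use utf8;
use File::Basename;

for my $argv (@ARGV) {
    my $file_name = basename($argv, '.txt');
    my $out_name  = $file_name . '_NoRubi.txt';
    # Input and output are both Shift_JIS, as in the original script.
    open my $fh_in,  "<:encoding(shiftjis)", $argv     or die "$argv : $!";
    open my $fh_out, ">:encoding(shiftjis)", $out_name or die "$out_name : $!";
    while (<$fh_in>) {
        # Delete 《reading》 spans and the bar that marks where the ruby base starts.
        # Assumption: the bar may be ASCII "|" or full-width "｜", so match both.
        s{ 《[^》]+》 | [|｜] }{}gx;
        print {$fh_out} $_;
    }
    close $fh_in;
    close $fh_out or die "$out_name : $!";
}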