@azumakuniyuki
Last active May 2, 2024 07:14
Find multibyte characters in the specified files
#!/usr/bin/env perl
# /usr/bin/grep on macOS does not search for multibyte characters reliably, so work around it in Perl
use strict;
use warnings;
use IO::File;

die sprintf("Usage: %s file1 [file2 [file3 ...]]\n", $0) unless @ARGV;

for my $fn ( @ARGV ) {
    # Skip files that do not exist, are not readable, or are empty
    next unless -f $fn;
    next unless -r $fn;
    next unless -s $fn;

    my $io = IO::File->new($fn, 'r') || next;
    my $la = sprintf("- %s", $fn);  # Label printed above the matching lines
    my $ln = 0;                     # Current line number
    my $cv = [];                    # Matching lines found in this file

    while( my $e = $io->getline ) {
        # Find any multibyte characters
        $ln++;
        chomp $e; next unless length $e;
        # Skip lines made up entirely of 7-bit ASCII bytes (0x00-0x7f)
        next if $e =~ /\A[\x00-\x7f]+\z/;
        push @$cv, sprintf("  - %04d: %s", $ln, $e);
    }
    $io->close;

    # Print the file name only when at least one line matched
    next unless @$cv;
    printf("%s\n", $la);
    printf("%s\n", $_ ) for @$cv;
}
@azumakuniyuki

% ~/bin/find-multibyte-characters README.md LICENSE Makefile utf8-flag.pl LICENSE
- README.md
  - 0023: - [**README-JA(日本語)**](README-JA.md)
  - 0450: * [README-JA.md - README.md in Japanese(日本語)](https://github.com/sisimai/p5-sisimai/blob/master/README-JA.md)
- utf8-flag.pl
  - 0005: my $p1 = 'ネコちゃん';
  - 0006: my $p2 = 'ネコちゃん'; utf8::decode $p2;
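
The detection in the script comes down to a byte-range test on each line. Below is a minimal sketch of that test in isolation, assuming a hypothetical helper named `has_non_ascii` that is not part of the gist. It treats the input as raw bytes, which is what a line read without a decoding layer looks like.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns 1 when the
# string contains at least one byte outside the 7-bit ASCII range, else 0.
sub has_non_ascii {
    my $line = shift;
    return 0 unless defined $line && length $line;
    return $line =~ /[^\x00-\x7f]/ ? 1 : 0;
}

# "\xc3\xa9" is the UTF-8 byte sequence for U+00E9, written as raw bytes so
# the sample does not depend on the encoding of this source file.
for my $sample ( 'plain ASCII text', "caf\xc3\xa9", '0' ) {
    printf("%-20s => %s\n", $sample, has_non_ascii($sample) ? 'non-ASCII' : 'ASCII only');
}
```

Because the check works on bytes rather than decoded characters, it flags any UTF-8 encoded non-ASCII character, but it would also flag bytes from other 8-bit encodings such as Latin-1.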
