@umbalaconmeogia
Created May 2, 2022 03:34
Convert a text file with furigana in parentheses (mainly text exported from MS Word) to an HTML file.
<?php
/**
* Convert a text file that has furigana in parentheses into HTML that uses <ruby> tags.
*
* Syntax
* ```shell
* php furiganaTxt2Html.php <input.txt> <output.html>
* ```
*
* Example:
* ```shell
* php furiganaTxt2Html.php input.txt output.html
* ```
* where input.txt is as below
* ```txt
* 私(わたし)は今⽇(きょう)から、新(あたら)しい⼈⽣(じんせい)を始(はじ)める。
* ```
* the output.html is as below
* ```html
* <ruby>私<rt>わたし</rt></ruby>は<ruby>今⽇<rt>きょう</rt></ruby>から、<ruby>新<rt>あたら</rt></ruby>しい<ruby>⼈⽣<rt>じんせい</rt></ruby>を<ruby>始<rt>はじ</rt></ruby>める。
* ```
*/
// Get $input, $output from command line parameters.
if (count($argv) != 3) {
    // Report usage on STDERR and exit with a non-zero status so callers can detect the failure.
    fwrite(STDERR, "Wrong parameters.\nSyntax:\n  php furiganaTxt2Html.php <input.txt> <output.html>\n");
    exit(1);
}
list(, $input, $output) = $argv;
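// Open $input, $output files.
$fin = fopen($input, 'r');
$fout = fopen($output, 'w');
if ($fin === false || $fout === false) {
    // fopen() returns false on failure; stop here instead of passing false to fgets()/fputs().
    fwrite(STDERR, "Cannot open input or output file.\n");
    exit(1);
}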
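// Read input file, process and write to output file.
fputs($fout, "<!DOCTYPE html>\n");
// Declare the encoding so that browsers do not have to guess it for the Japanese text.
fputs($fout, "<html>\n<head>\n<meta charset=\"UTF-8\">\n</head>\n<body>\n");
while (($row = fgets($fin)) !== false) {
    $row = rubyFormat($row);
    fputs($fout, $row);
}
fputs($fout, "</body>\n</html>\n");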
// Close files.
fclose($fin);
fclose($fout);
echo "DONE.\n";
/**
* @param string $row Text to be converted to use <ruby> tag.
* @return string Formatted HTML.
*/
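function rubyFormat($row)
{
    // Group 1: a run of kanji or digits (the base text).
    // Group 2: the reading inside the full-width parentheses that follow it.
    $pattern = "/([\p{Han}0-90-9]+)\(([^)]+)\)/u"; // About \p{Han}: https://tama-san.com/kanji-regex/
    $replacement = '<ruby>$1<rt>$2</rt></ruby>';
    // Escape HTML special characters first so that '<' or '&' in the text cannot break the markup.
    $row = htmlspecialchars(trim($row), ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    $row = preg_replace($pattern, $replacement, $row);
    return "<p>$row</p>\n";
}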
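
To see what the pattern in rubyFormat() captures before it is rewritten into <ruby> markup, the following is a minimal sketch that runs a pattern of the same shape through preg_match_all(). The variable names ($pattern, $text, $matches) and the sample sentence are illustrative assumptions, not part of the original script.

```php
<?php
// Same shape as the pattern in rubyFormat(): a run of kanji or digits,
// followed by a reading enclosed in full-width parentheses.
$pattern = '/([\p{Han}0-9]+)\(([^)]+)\)/u';

// Illustrative sample text (an assumption for this sketch).
$text = '私(わたし)は今日(きょう)から';

// PREG_SET_ORDER returns one array per match: [full match, base text, reading].
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as [, $base, $reading]) {
    echo "$base => $reading\n";
}
```

Each pair printed here corresponds to the $1 and $2 that the replacement string places inside <ruby> and <rt>. Kana such as は are not part of \p{Han}, so they end the base text and stay outside the <ruby> element.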