Gist umbalaconmeogia/96b80a97165e37cde330234a33975dc0, created May 2, 2022 03:34
Convert a text file with furigana in parentheses (typically exported from MS Word as plain text) to an HTML file.
<?php
/**
 * Convert a text file that has furigana in parentheses into HTML <ruby> tags.
 *
 * Syntax:
 * ```shell
 * php furiganaTxt2Html.php <input.txt> <output.html>
 * ```
 *
 * Example:
 * ```shell
 * php furiganaTxt2Html.php input.txt output.html
 * ```
 * where input.txt is as below:
 * ```txt
 * 私(わたし)は今⽇(きょう)から、新(あたら)しい⼈⽣(じんせい)を始(はじ)める。
 * ```
 * The converted line in output.html (wrapped in a <p> element) is as below:
 * ```html
 * <ruby>私<rt>わたし</rt></ruby>は<ruby>今⽇<rt>きょう</rt></ruby>から、<ruby>新<rt>あたら</rt></ruby>しい<ruby>⼈⽣<rt>じんせい</rt></ruby>を<ruby>始<rt>はじ</rt></ruby>める。
 * ```
 */
// Get $input, $output from command line parameters.
if (count($argv) !== 3) {
    fwrite(STDERR, "Wrong parameters.\nSyntax:\n  php furiganaTxt2Html.php <input.txt> <output.html>\n");
    exit(1);
}
list(, $input, $output) = $argv;
// Open $input, $output files and stop if either one cannot be opened.
$fin = fopen($input, 'r');
$fout = fopen($output, 'w');
if ($fin === false || $fout === false) {
    fwrite(STDERR, "Cannot open the input or output file.\n");
    exit(1);
}
// Read the input file line by line, convert each line and write it to the output file.
// The charset declaration assumes the input is UTF-8, which the /u regex in rubyFormat() also requires.
fputs($fout, "<!DOCTYPE html>\n");
fputs($fout, "<html>\n<head>\n<meta charset=\"UTF-8\">\n</head>\n<body>\n");
while (($row = fgets($fin)) !== false) {
    $row = rubyFormat($row);
    fputs($fout, $row);
}
fputs($fout, "</body>\n</html>\n");
// Close files.
fclose($fin);
fclose($fout);
echo "DONE.\n";
/**
 * Wrap each "kanji(reading)" pair in a <ruby> tag and put the line in a <p> element.
 *
 * @param string $row Text to be converted to use <ruby> tag.
 * @return string Formatted HTML.
 */
function rubyFormat($row)
{
    // $1: one or more kanji or digits; $2: the reading between ASCII parentheses.
    // About \p{Han}: https://tama-san.com/kanji-regex/
    $pattern = '/([\p{Han}0-9]+)\(([^)]+)\)/u';
    $replacement = '<ruby>$1<rt>$2</rt></ruby>';
    $row = preg_replace($pattern, $replacement, $row);
    $row = trim($row);
    $row = "<p>$row</p>\n";
    return $row;
}
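
The script above matches only ASCII parentheses and writes the input text into the HTML unescaped. Text exported from MS Word may use full-width parentheses （ ） or contain characters such as `<` and `&`. The following is a minimal sketch of a variant that handles both cases. The function name `rubyFormatWide`, the full-width digit range and the escaping step are assumptions for illustration and are not part of the original gist.

```php
<?php
/**
 * Hypothetical variant of rubyFormat() (not part of the original gist).
 * Accepts ASCII and full-width parentheses and escapes HTML special characters.
 *
 * @param string $row Text to be converted to use <ruby> tag.
 * @return string Formatted HTML.
 */
function rubyFormatWide(string $row): string
{
    // Escape &, < and > first so that they are displayed literally in the browser.
    $row = htmlspecialchars(trim($row), ENT_NOQUOTES, 'UTF-8');
    // $1: kanji or digits (ASCII or full-width); $2: reading up to a closing parenthesis.
    // Opening and closing parentheses may each be ASCII or full-width.
    $pattern = '/([\p{Han}0-9０-９]+)[(（]([^)）]+)[)）]/u';
    // preg_replace() returns null on error (for example invalid UTF-8); keep the text unchanged then.
    $row = preg_replace($pattern, '<ruby>$1<rt>$2</rt></ruby>', $row) ?? $row;
    return "<p>$row</p>\n";
}

// Usage example with both kinds of parentheses in one line.
echo rubyFormatWide('漢字（かんじ）を学(まな)ぶ。');
```

Because the opening and closing classes are independent, a mismatched pair such as `(` followed by `）` is also accepted. This is a deliberate simplification in this sketch.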