@kpengboy
Last active June 19, 2022 10:30
Fix engNET2016eb formatting

The version of the NET from ebible.org has some formatting problems. Most notably, stray |strong="Hxxx" strings are appended to certain words, where they should instead be Strong's annotations on those words (<w lemma="strong:Hxxx">word</w>). I wrote a Perl script to fix all of these formatting problems.

Instructions as of version 32.19 (2022-06-13).

  1. Download the engNET2016eb modules.zip from ebible.org and unzip it somewhere; run the remaining commands from that directory.
  2. Save the following script in sub.pl:
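use strict;
use warnings;

# Filter: reads mod2imp output on standard input and prints the fixed
# IMP text on standard output.

# The *_repl subs below are called from the /e substitutions in the main
# loop and read that substitution's capture variables ($1, $2, ...)
# directly. When the optional |strong="..." group matched, its Strong's
# number is moved into a <w lemma="strong:..."> wrapper.
sub divine_name_repl {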
	if (defined $2) {
		return "<seg><divineName><w lemma=\"strong:${3}\">$1</w></divineName></seg>";
	} else {
		return "<seg><divineName>$1</divineName></seg>";
	}
}

sub ot_passage_repl {
	if (defined $3) {
		return "<seg type=\"otPassage\"><w lemma=\"strong:${4}\">$1$2</w></seg>";
	} else {
		return "<seg type=\"otPassage\">$1$2</seg>";
	}
}

sub italic_repl {
	if (defined $2) {
		return "<hi type=\"italic\"><w lemma=\"strong:${3}\">$1</w></hi>";
	} else {
		return "<hi type=\"italic\">$1</hi>";
	}
}

while (<>) {
	# Fix |strong= after a <seg><divineName>
	s!<seg><divineName>Lord</divineName></seg>\|strong="H(4347|1897)"!<seg><divineName>Lord</divineName></seg>!g;
	s!<seg><divineName>(.*?)</divineName></seg>(\|strong="(\w+)")?!divine_name_repl!ge;
	s!<seg><divineName>Lord</divineName></seg>’s\|strong="H5227"!<seg><divineName><w lemma="strong:H3068">Lord’s</w></divineName></seg>!g;
	s!<seg><divineName>Lord</divineName></seg>’s\|strong="(\w+)"!<seg><divineName><w lemma="strong:$1">Lord’s</w></divineName></seg>!g;

	# Fix |strong= after a <seg type="otPassage">
	s!<seg type="otPassage">(.*?)</seg>([*:]?)(\|strong="(\w+)")?!ot_passage_repl!ge;

	# Fix |strong= after <hi type="italic">
	s!<hi type="italic">‘arbeh</hi>-locust\|strong="H1501"!<hi type="italic">‘arbeh</hi>-<w lemma="strong:H1501">locust</w>!;
	s!<hi type="italic">(.*?)</hi>(\|strong="(\w+)")!italic_repl!ge;

	# Fix |strong= after barewords
	s!([^> \t\r\n\f]+)\|strong="(\w+)"!<w lemma="strong:$2">$1</w>!g;

	# Fix |strong= after a space
	s!(<seg><divineName>Lord</divineName></seg> )?<seg><divineName>Lord</divineName></seg> \|strong="H3068"!<seg><divineName><w lemma="strong:H3068">Lord</w></divineName></seg>!;
	s!<seg type="otPassage"> <hi type="italic">the God</hi></seg> \|strong="G2316"!<seg type="otPassage"><hi type="italic">the <w lemma="strong:G2316">God</w></hi></seg>!;
	s!<hi type="italic">peres</hi> \|strong="H6537"!<hi type="italic"><w lemma="strong:H6537">peres</w></hi>!;
	s! \|strong="\w+"!!g;

	print;
}
  3. Run mkdir new
  4. Run mod2imp engNET2016eb | perl sub.pl > new/engNET2016ebmod.imp
  5. Run mkdir new/mods.d && cp mods.d/engNET2016eb.conf new/mods.d/engNET2016ebmod.conf
  6. Run cd new
  7. Edit mods.d/engNET2016ebmod.conf:
    1. Replace engNET2016eb with engNET2016ebmod wherever it occurs
    2. Set the Abbreviation= to NETmod (to distinguish it from the vanilla distribution)
  8. Run mkdir -p modules/texts/ztext/engNET2016ebmod/
  9. Run imp2vs engNET2016ebmod.imp -z z -o modules/texts/ztext/engNET2016ebmod/
  10. Create a zipfile containing the mods.d and modules directories.
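The command-line steps above can be collected into one shell script. This is a sketch under stated assumptions, not a tested part of the gist: it assumes the SWORD utilities mod2imp and imp2vs and the zip command are on PATH, that it runs in the unzipped module directory next to mods.d/ and sub.pl, and that the original .conf already contains an Abbreviation= line. The function names, the script name build.sh, and the output name engNET2016ebmod.zip are illustrative choices; the manual .conf edit is done with sed.

```shell
#!/bin/sh
# Sketch of the build steps. Assumes mod2imp, imp2vs and zip are on PATH and
# that this runs in the unzipped module directory (next to mods.d/ and sub.pl).
# rename_conf, build and engNET2016ebmod.zip are illustrative names only.
set -eu

# Performs the manual .conf edit: renames the module everywhere and sets a
# distinct abbreviation. Assumes the original .conf has an Abbreviation= line.
rename_conf() {
  sed -e 's/engNET2016eb/engNET2016ebmod/g' \
      -e 's/^Abbreviation=.*/Abbreviation=NETmod/' "$1" > "$2"
}

build() {
  mkdir new
  mod2imp engNET2016eb | perl sub.pl > new/engNET2016ebmod.imp
  mkdir new/mods.d
  # Copy and edit in one go instead of cp followed by a manual edit.
  rename_conf mods.d/engNET2016eb.conf new/mods.d/engNET2016ebmod.conf
  cd new
  mkdir -p modules/texts/ztext/engNET2016ebmod/
  imp2vs engNET2016ebmod.imp -z z -o modules/texts/ztext/engNET2016ebmod/
  # Put mods.d/ and modules/ into a single zip file next to new/.
  zip -r ../engNET2016ebmod.zip mods.d modules
}

# Run the build only when asked explicitly: sh build.sh build
if [ "${1:-}" = build ]; then
  build
fi
```

Running `sh build.sh build` performs the whole build; running it without the argument only defines the functions, which makes it easy to try rename_conf on a copy of the .conf first.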