
Extract MathML from a GIF image generated by MathType (Updated)

extractMathMLfromGIF.pl
# Reference: http://stackoverflow.com/questions/6599781/how-can-i-convert-mathtype-equation-into-mathml-format
# Thanks afwings
 
################################################################################################
# GIF Image Files
#
# MathML text is embedded into a GIF file as an Application Extension Record,
# which consists of a 14-byte header (Application Extension Descriptor),
# followed by the MathML data. The header contains:
#
# Byte Introducer = 0x21;
# Byte ExtensionLabel = 0xFF;
# Byte BlockSize = 0x0B;
# Byte ApplicationId[8] = "MathType";
# Byte AuthenticationCode[3] = "003";
#
# The data follows this header and is written as a series of blocks, each containing at most 255 bytes.
# Each block starts with a single-byte count, followed by that many bytes of data.
# The end is marked by a block with length 0.
#
# The header is unique enough that the easiest way to extract the data might be to
# scan the file for the 14-byte header, then expect the MathML data blocks to follow.
# Properly decoding the GIF records isn't that hard either, but it requires
# reading the GIF specification.
#
################################################################################################
 
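use strict;
use warnings;

# Assumption: the GIF path is passed as the first command-line argument.
my $inputFile = shift @ARGV or die "Usage: $0 <file.gif>\n";
my $math = ReadFromFile($inputFile);

# Read the whole file as raw bytes (GIF is a binary format).
sub ReadFromFile
{
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    local $/;
    my $data = <$fh>;
    close($fh);
    return $data;
}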
 
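my $mathTypeId = "MathType003";
# Introducer, extension label, one block-size byte, then the 11-byte application ID.
# $math is assumed to hold raw bytes, so "." with /s matches exactly one byte
# (the old \C escape was removed in Perl 5.24).
$math =~ m/\x21\xFF(.)\Q$mathTypeId\E(.*)/s
    or die "No MathType application extension found.\n";
my $blockSize = ord($1);
my $remain = $2;
 
length($mathTypeId) == $blockSize || print "Block size unmatched!\n";
 
# Each data block starts with a one-byte length; a zero length ends the data.
my $pos = 0;
$blockSize = ord(substr($remain, $pos, 1));
 
my $result = "";
while ($blockSize != 0)
{
    # The length byte plus the block itself must fit in what is left of the file.
    if ($pos + 1 + $blockSize > length($remain))
    {
        print "\nBlock size is NOT correct.\n";
        last;
    }
    $result .= substr($remain, $pos + 1, $blockSize);
    $pos += 1 + $blockSize;
    $blockSize = ord(substr($remain, $pos, 1));
}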
 
print "RESULT: $result\n";
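
The header comment mentions that properly decoding the GIF records is an alternative to scanning for the 14-byte header. Below is a minimal sketch of that approach, assuming the standard GIF87a/GIF89a record layout. The sub names read_sub_blocks, color_table_size and extract_mathtype_data are hypothetical, and the sample at the bottom is a hand-built synthetic file, not real MathType output.

```perl
use strict;
use warnings;

# Read one chain of GIF data sub-blocks starting at $pos.
# Returns the concatenated payload and the position just past the terminator.
sub read_sub_blocks {
    my ($gif, $pos) = @_;
    my $data = '';
    while (1) {
        die "Truncated sub-block chain\n" if $pos >= length($gif);
        my $len = ord(substr($gif, $pos, 1));
        $pos++;
        last if $len == 0;                      # zero-length block ends the chain
        die "Truncated sub-block\n" if $pos + $len > length($gif);
        $data .= substr($gif, $pos, $len);
        $pos += $len;
    }
    return ($data, $pos);
}

# Size in bytes of the color table described by a packed-fields byte
# (bit 7 = table present, bits 0-2 = size exponent minus one).
sub color_table_size {
    my ($packed) = @_;
    return ($packed & 0x80) ? 3 * (2 ** (($packed & 0x07) + 1)) : 0;
}

# Walk the GIF records and return the MathType payload, or undef if absent.
sub extract_mathtype_data {
    my ($gif) = @_;
    die "Not a GIF file\n" unless $gif =~ /\AGIF8[79]a/;
    die "Truncated GIF header\n" if length($gif) < 13;
    # 6-byte signature + 7-byte logical screen descriptor; packed byte at offset 10.
    my $pos = 13 + color_table_size(ord(substr($gif, 10, 1)));
    while ($pos < length($gif)) {
        my $type = ord(substr($gif, $pos, 1));
        $pos++;
        if ($type == 0x3B) {                    # trailer: end of file
            last;
        }
        elsif ($type == 0x2C) {                 # image descriptor
            die "Truncated image descriptor\n" if $pos + 9 > length($gif);
            my $packed = ord(substr($gif, $pos + 8, 1));
            $pos += 9 + color_table_size($packed);
            $pos++;                             # skip LZW minimum code size
            (undef, $pos) = read_sub_blocks($gif, $pos);
        }
        elsif ($type == 0x21) {                 # extension
            die "Truncated extension\n" if $pos >= length($gif);
            my $label = ord(substr($gif, $pos, 1));
            $pos++;
            # Application extension whose first block is the 11-byte MathType ID.
            if ($label == 0xFF && substr($gif, $pos, 12) eq "\x0BMathType003") {
                my ($data) = read_sub_blocks($gif, $pos + 12);
                return $data;
            }
            (undef, $pos) = read_sub_blocks($gif, $pos);
        }
        else {
            die sprintf("Unexpected block type 0x%02X at offset %d\n", $type, $pos - 1);
        }
    }
    return undef;
}

# Synthetic example: header, screen descriptor without a global color table,
# one MathType application extension, then the trailer.
my $payload = '<math><mi>x</mi></math>';
my $gif = "GIF89a" . pack('vvCCC', 1, 1, 0, 0, 0)
        . "\x21\xFF\x0BMathType003"
        . chr(length($payload)) . $payload . "\x00"
        . "\x3B";
print extract_mathtype_data($gif), "\n";
```

Compared with the pattern match used in the script, this walk cannot be fooled by the same byte sequence appearing inside compressed image data, at the cost of more code.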
