This drives me crazy, and I see something like it in almost every literary document corpus:
<blockquote>
<i>
<q data-eblatype="startmarker" data-eblasegid="38073">[</q>
Act I, Scene 3
<q data-eblatype="endmarker" data-eblasegid="38073">]</q>
</i>
</blockquote>
The intent here is obvious: they're trying to mark parts of the text with an ID. But instead of just using XML, which is already hierarchical, to wrap a single element around the intended span, like this:
<blockquote>
<i data-eblasegid="38073">
<q>[</q>Act I, Scene 3<q>]</q>
</i>
</blockquote>
... they insert pairs of start/end marker tags, which means the consumer of the data has to implement a little parser to sort the text into buckets by ID. Also, why the square brackets at all?
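To make the difference concrete, here is a rough Python sketch using the standard library's `xml.etree.ElementTree`. The function names `segments_from_wrapped` and `segments_from_milestones` are made up for illustration. I'm also assuming that the marker elements themselves (and their `[`/`]` text) are not part of a segment, and that surrounding whitespace can be stripped.

```python
import xml.etree.ElementTree as ET

# Milestone form, as in the first snippet above.
MILESTONES = """<blockquote>
<i>
<q data-eblatype="startmarker" data-eblasegid="38073">[</q>
Act I, Scene 3
<q data-eblatype="endmarker" data-eblasegid="38073">]</q>
</i>
</blockquote>"""

# Wrapped form, as in the second snippet.
WRAPPED = """<blockquote>
<i data-eblasegid="38073">
<q>[</q>Act I, Scene 3<q>]</q>
</i>
</blockquote>"""


def segments_from_wrapped(root):
    """Wrapped form: each segment is one element, so the tree already groups its text."""
    return {
        el.get("data-eblasegid"): "".join(el.itertext()).strip()
        for el in root.iter()
        if el.get("data-eblasegid") is not None
    }


def segments_from_milestones(root):
    """Milestone form: walk the document in order and track which IDs are open."""
    buckets = {}      # segment ID -> list of text chunks
    open_counts = {}  # segment ID -> number of start markers not yet closed

    def emit(text):
        # Append a chunk of character data to every segment that is currently open.
        if text:
            for seg_id in open_counts:
                buckets[seg_id].append(text)

    def walk(elem):
        kind = elem.get("data-eblatype")
        seg_id = elem.get("data-eblasegid")
        if kind == "startmarker":
            buckets.setdefault(seg_id, [])
            open_counts[seg_id] = open_counts.get(seg_id, 0) + 1
        elif kind == "endmarker":
            if open_counts.get(seg_id):
                open_counts[seg_id] -= 1
                if open_counts[seg_id] == 0:
                    del open_counts[seg_id]
        else:
            # Marker text ("[" / "]") is skipped; ordinary element text is kept.
            emit(elem.text)
            for child in elem:
                walk(child)
                emit(child.tail)  # text following a child, including after a marker

    walk(root)
    return {seg_id: "".join(chunks).strip() for seg_id, chunks in buckets.items()}


print(segments_from_wrapped(ET.fromstring(WRAPPED)))        # {'38073': '[Act I, Scene 3]'}
print(segments_from_milestones(ET.fromstring(MILESTONES)))  # {'38073': 'Act I, Scene 3'}
```

The wrapped version is a single pass over elements that already contain their text. The milestone version has to track open IDs by hand, which is exactly the little parser every consumer ends up writing. Counting open markers per ID is my own assumption about how repeated start markers should behave.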
The markers are definitely not necessary. Some other decisions on the format are outlined here. Most importantly, segments (the entities mapped by ID) can potentially overlap one another, which conflicts with XML's requirement that elements nest properly, as far as I know.
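The overlap point is easy to check. This Python sketch (again `xml.etree.ElementTree`; the helper `is_well_formed`, the element names `segA`/`segB`, and both sample strings are hypothetical) shows that wrapping two overlapping segments in elements is rejected by the parser, while the same overlap written as empty start/end markers parses fine:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the standard-library XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical segments A ("a b") and B ("b c") that overlap on "b".
# Written as elements, <segB> opens inside <segA> but closes after it,
# so the tags don't nest: not well-formed XML.
overlapping_elements = "<p><segA>a <segB>b</segA> c</segB></p>"

# The same overlap written as empty start/end markers nests trivially.
overlapping_markers = (
    '<p><q data-eblatype="startmarker" data-eblasegid="A"/>a '
    '<q data-eblatype="startmarker" data-eblasegid="B"/>b'
    '<q data-eblatype="endmarker" data-eblasegid="A"/> c'
    '<q data-eblatype="endmarker" data-eblasegid="B"/></p>'
)

print(is_well_formed(overlapping_elements))  # False
print(is_well_formed(overlapping_markers))   # True
```

That is the usual trade-off: milestone markers give up direct tree-based access in exchange for being able to encode overlap at all.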