The importAsNode(InputStream, boolean) method of the XmlDomNodeImporterImpl class parses the provided InputStream into an XML document using a DocumentBuilder. Under the hood, the InputStream is wrapped in an InputSource whose encoding is unknown: we don't create the InputSource ourselves, and no explicit encoding is specified for it using the setEncoding method. When the input stream contains umlauts encoded in ISO-8859-1, the parser (the built-in Xerces of the Oracle/Sun JRE) incorrectly attempts to read them as UTF-8; see the bug report for the parser behavior.
Going by the API documentation for the InputSource class, the solution is either to use an InputSource with an underlying character stream, or to specify the encoding for the byte stream.
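Both options from the InputSource documentation could look roughly like the sketch below. It is only an illustration and not the actual XmlDomNodeImporterImpl code: the class name EncodingAwareParsing, the two helper methods, and the sample document are hypothetical, and it assumes the caller already knows the encoding of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the code base discussed here.
public class EncodingAwareParsing {

    // Option 1: keep the byte stream, but tell the parser which encoding to use.
    public static Document parseWithEncoding(InputStream in, Charset charset) throws Exception {
        InputSource source = new InputSource(in);
        source.setEncoding(charset.name());
        return newBuilder().parse(source);
    }

    // Option 2: decode the bytes ourselves and hand the parser a character stream.
    public static Document parseWithReader(InputStream in, Charset charset) throws Exception {
        InputSource source = new InputSource(new InputStreamReader(in, charset));
        return newBuilder().parse(source);
    }

    private static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    public static void main(String[] args) throws Exception {
        // Sample input: an umlaut encoded as ISO-8859-1, with no XML declaration.
        byte[] xml = "<name>M\u00fcller</name>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parseWithEncoding(new ByteArrayInputStream(xml), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Either way, the encoding has to reach the point where the InputSource is created, so the caller must be able to supply it.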
This would require introduction of anothe