
@terrancesnyder
Created December 24, 2012 00:25
Bug with Pentaho output servlet
// Kettle JavaScript step: write straight to the servlet response
var out = _step_.getTrans().getServletPrintWriter();
out.println("<H1>");
out.println("Hello World!");
out.println("ときょ 東京 コーヒー");
out.println("</H1>");
@terrancesnyder

The above yields an incorrect response: the Japanese characters are turned into ?????, as if UTF-8 encoding were not set on the print writer in Pentaho's output stream. I tried setting an environment variable, but it still failed.

<H1>
Hello World!
???????????
</H1>

This should yield:

<H1>
Hello World!
ときょ 東京 コーヒー
</H1>
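The '?' substitution is what Java's character encoders do when a character cannot be represented in the target charset. A minimal sketch of the effect, assuming the servlet writer was falling back to ISO-8859-1 (the historical servlet default; which charset Kettle's writer really used is not confirmed here). `LossyEncodingDemo` and `roundTrip` are hypothetical names, and a `ByteArrayOutputStream` stands in for the HTTP response:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {
    // Writes text through a PrintWriter backed by the given charset and
    // returns the resulting bytes decoded with that same charset.
    static String roundTrip(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(bytes, charset));
        out.print(text);
        out.flush();
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String line = "Hello World! 東京";
        // ISO-8859-1 has no code for 東 or 京, so the encoder substitutes '?'.
        System.out.println(roundTrip(line, StandardCharsets.ISO_8859_1)); // Hello World! ??
        // UTF-8 can represent every character, so the text survives intact.
        System.out.println(roundTrip(line, StandardCharsets.UTF_8).equals(line)); // true
    }
}
```

Each unmappable character becomes exactly one '?', which is why the Japanese line collapses into a run of question marks while the ASCII lines are untouched.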

Logged a defect with Pentaho:
http://jira.pentaho.com/browse/PDI-6123

@terrancesnyder

I would expect the following to be the correct way to create a UTF-8-aware print writer:

// Get a writer that encodes the response body as UTF-8
res.setContentType("text/html; charset=UTF-8");
PrintWriter out = new PrintWriter(new OutputStreamWriter(res.getOutputStream(), "UTF-8"), true);
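The servlet snippet above needs a container to run. The sketch below keeps only the `java.io` part: a hypothetical `utf8Writer` helper wraps any byte stream in a UTF-8 `PrintWriter`, with a `ByteArrayOutputStream` standing in for `res.getOutputStream()`. Passing `StandardCharsets.UTF_8` instead of the string name avoids the checked `UnsupportedEncodingException`. The `Content-Type` header with `charset=UTF-8` is still needed so the client decodes the bytes correctly.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8WriterSketch {
    // Wraps a raw byte stream (e.g. a servlet output stream) in a PrintWriter
    // that encodes characters as UTF-8; autoFlush=true flushes on println.
    static PrintWriter utf8Writer(OutputStream raw) {
        return new PrintWriter(new OutputStreamWriter(raw, StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        // Stand-in for the response body; no servlet container involved.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        PrintWriter out = utf8Writer(body);
        out.print("東京");
        out.flush();
        // Each of the two kanji takes three bytes in UTF-8.
        System.out.println(body.size()); // 6
    }
}
```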

@terrancesnyder

Found a hack to work around this problem. The print writer returned by the call is a Jetty HttpConnection$1 (an anonymous inner class), which has a private field, this$0, pointing to the enclosing HttpConnection object. Using some hacky reflection, we can grab the HttpConnection object, force the content type, and ask it for a print writer that actually encodes UTF-8.

// hack to get access to the raw print writer from Jetty
var out = _step_.getTrans().getServletPrintWriter();
// the writer is an anonymous inner class (HttpConnection$1);
// this$0 is its hidden reference to the enclosing HttpConnection
var f = out.getClass().getDeclaredField("this$0");
f.setAccessible(true);

// force the response to UTF-8
var httpConnection = f.get(out);
httpConnection.getResponse().setContentType("application/octet-stream; charset=UTF-8");
httpConnection.getResponse().addHeader("Content-Disposition", "attachment;filename=out.txt");

// ask the connection for a writer that encodes as UTF-8
// (kanji, hiragana, katakana and romanji are assumed to be fields of the incoming row)
out = httpConnection.getPrintWriter("UTF-8");
out.println("kanji = " + kanji);
out.println("hiragana = " + hiragana);
out.println("katakana = " + katakana);
out.println("romanji = " + romanji);
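The `this$0` trick relies on how javac compiles anonymous inner classes: each one gets a synthetic field holding its enclosing instance. The sketch below reproduces that with a hypothetical `FakeConnection` class standing in for Jetty's `HttpConnection`; nothing here calls Jetty. Note that `this$0` is a javac implementation detail rather than part of the language spec, and since JDK 18 javac omits the field when the inner class never uses its enclosing instance, so the anonymous class here deliberately touches outer state.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.lang.reflect.Field;

public class EnclosingFieldDemo {
    // Hypothetical stand-in for Jetty's HttpConnection: it hands out an
    // anonymous PrintWriter subclass, like the HttpConnection$1 writer.
    static class FakeConnection {
        int linesWritten = 0;

        PrintWriter getPrintWriter() {
            return new PrintWriter(new StringWriter()) {
                @Override
                public void println(String s) {
                    linesWritten++; // uses outer state, so javac keeps this$0
                    super.println(s);
                }
            };
        }
    }

    // Recovers the enclosing instance from javac's synthetic this$0 field.
    static Object enclosingInstance(Object inner) throws ReflectiveOperationException {
        Field f = inner.getClass().getDeclaredField("this$0");
        f.setAccessible(true);
        return f.get(inner);
    }

    public static void main(String[] args) throws Exception {
        FakeConnection conn = new FakeConnection();
        PrintWriter out = conn.getPrintWriter();
        Object recovered = enclosingInstance(out);
        System.out.println(recovered == conn);        // true
        System.out.println(out.getClass().getName()); // EnclosingFieldDemo$FakeConnection$1
    }
}
```

Because the field name and its presence depend on the compiler, a hack like this can break when the library is rebuilt with a different javac, which is one more reason a proper fix in the writer setup is preferable.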

@mjmaax
mjmaax commented Mar 1, 2014

Hello

I have the same problem with the Kettle (Pentaho Data Integration) graphical interface.
I am trying to extract data from XML files and insert it into a database table.
To do this, I wrote a transformation containing two linked steps:

  • step 1: extract data from XML
  • step 2: insert data into a table

I mapped the fields of the stream (coming from the XML) to the columns of the table. When I execute the transformation, the records are inserted into the table, but all Japanese characters are replaced by '?', just like in your output above.
Since I am using the graphical interface, I have no idea how to apply your hack to fix this problem. Do you have any ideas?

Thanks
