@masonmark
Created October 4, 2011 01:53
EUC-encoded Japanese issue debugging
# encoding: utf-8
#
# This is a test script to illustrate a problem with
# the PhantomJS 1.3 static Mac binary.
#
# With the binary download, this script fails: the
# Japanese text in the EUC-JP-encoded test page is
# garbled in the console output, in the page content
# saved to disk, and in the rendered image.
#
# Installing the Qt 4.7.4 SDK and then building PhantomJS
# 1.3 myself causes the script to succeed, and the
# correct Japanese to appear in the console, the content
# written to disk, and the rendered image.
#
# Hopefully this will be useful for reproducing the issue.
#
# I am using Mac OS X 10.7.1.
#
# (mason 2011-10-04)
p = new WebPage()
addr = "http://masonmark.com/stuff/euc/"
p.open addr, (status)->
  console.log "☆ LOAD STATUS: #{status}"
  x = p.evaluate ->
    document.title
  console.log "☆ PAGE TITLE:#{x}"
  console.log "☆ PAGE CONTENT: #{p.content}"
  fs = require "fs"
  fs.write "/Users/Shared/euc-test-page-content.txt", p.content, "w"
  p.render "/Users/Shared/euc-test-page-content.png"
  # 1. Manually check that euc-test-page-content.txt
  #    contains UTF-8 encoded HTML in Japanese.
  # 2. Manually check that euc-test-page-content.png
  #    contains an image showing rendered Japanese.
  if p.content.indexOf("日本語") is -1
    console.log "★★★★★ FAILED: the page content does not contain '日本語'"
    phantom.exit 5
  else
    console.log "☆☆☆☆☆ SUCCESS: w00t w00t"
    phantom.exit 0
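The first manual check in the script's comments (that the saved text file holds UTF-8 encoded Japanese) could also be automated. Below is a minimal Python sketch of such a check. The helper name `contains_utf8_text` is hypothetical, and it assumes the file was written to the path used in the script.

```python
from pathlib import Path


def contains_utf8_text(path, needle="日本語"):
    """Return True if the file is valid UTF-8 and contains `needle`."""
    data = Path(path).read_bytes()
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Raw EUC-JP bytes, for example, are not valid UTF-8.
        return False
    return needle in text
```

For the test script above, this would be called as `contains_utf8_text("/Users/Shared/euc-test-page-content.txt")`.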
[mason@IT-PC-MACPRO-01 dna2]$ echo 'First let us use the PhantomJS 1.3 binary download:'
First let us use the PhantomJS 1.3 binary download:
[mason@IT-PC-MACPRO-01 dna2]$ phantomjs euc_test.coffee
☆ LOAD STATUS: success
☆ PAGE TITLE:ÆüËܸì - Japanese test
☆ PAGE CONTENT: <html><head>
<meta http-equiv="Content-Type" content="text/html; charset=euc-jp">
<title>ÆüËܸì - Japanese test</title>
</head><body>
<p>ÆüËܸì¤Î¥Æ¥­¥¹¥È¡£</p>
<p>The text above should be in Japanese.</p>
</body></html>
★★★★★ FAILED: the page content does not contain '日本語'
[mason@IT-PC-MACPRO-01 dna2]$
[mason@IT-PC-MACPRO-01 dna2]$ echo 'Now let us try with a freshly compiled 1.3 phantom:'
Now let us try with a freshly compiled 1.3 phantom:
[mason@IT-PC-MACPRO-01 dna2]$ /Users/mason/Code/third_party_code/phantomjs/bin/phantomjs euc_test.coffee
☆ LOAD STATUS: success
☆ PAGE TITLE:日本語 - Japanese test
☆ PAGE CONTENT: <html><head>
<meta http-equiv="Content-Type" content="text/html; charset=euc-jp">
<title>日本語 - Japanese test</title>
</head><body>
<p>日本語のテキスト。</p>
<p>The text above should be in Japanese.</p>
</body></html>
☆☆☆☆☆ SUCCESS: w00t w00t
[mason@IT-PC-MACPRO-01 dna2]$
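The garbled title in the first run is consistent with the page's EUC-JP bytes being decoded with a single-byte Western encoding instead of EUC-JP. Here is a minimal Python sketch of that hypothesis. The choice of ISO-8859-1 (Latin-1) is an assumption inferred from the log output, not something confirmed about the binary, and the function names `misdecode` and `repair` are illustrative.

```python
def misdecode(text: str) -> str:
    """Encode text as EUC-JP, then wrongly decode the bytes as Latin-1."""
    return text.encode("euc_jp").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo misdecode(): recover the raw bytes and decode them as EUC-JP."""
    return garbled.encode("latin-1").decode("euc_jp")


garbled_title = misdecode("日本語")
print(garbled_title)          # the mis-decoded form of the title text
print(repair(garbled_title))  # round-trips back to the original text
```

Under this assumption, `misdecode("日本語")` yields the same six characters that appear in the failing run's PAGE TITLE line, which suggests the binary download may lack a working EUC-JP decoder rather than corrupting the data itself.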