Date: Thu, 01 Jan 1970 00:00:00 +0000 (UTC)
From: ABC <abc@example.com>
To: XYZ <xyz@example.com>
Subject:
=?UTF-8?Q?=E7=BB=9F=E4=B8=80=E7=A0=81?=
=?UTF-8?Q?=E5=8F=98=E6=88=90=E4=B9=B1=E7=A0=81?=
=?us-ascii?Q?_removing_this_part_fixes_the_mojibake?=
MIME-Version: 1.0
Content-Type: text/plain;
email body

lionel-rowe commented Feb 28, 2022

To reproduce, download the file and open it in Outlook, which displays the mangled subject line "g; d8". For comparison, Gmail and other webmail clients display the correct full subject line: "统一码变成乱码 removing this part fixes the mojibake".

In addition, forwarding or replying to such an email from Outlook permanently corrupts the subject line: the original can no longer be recovered, even in webmail clients.

Note that reordering the encoded lines so that the us-ascii one comes first also fixes the problem. This suggests that Outlook is simply grepping for the name of the last encoding specified and treating it as the encoding for the entire subject line.
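
For comparison, here is a minimal sketch of per-word decoding, where each encoded word is decoded with its own declared charset. It uses Python's standard `email.header.decode_header`; the variable names (`raw_subject`, `subject`) and the explicit folding whitespace are illustrative assumptions, not taken from the original file.

```python
from email.header import decode_header

# The three encoded words from the Subject header, written as a folded
# header value (continuation lines start with a space).
raw_subject = (
    "=?UTF-8?Q?=E7=BB=9F=E4=B8=80=E7=A0=81?=\r\n"
    " =?UTF-8?Q?=E5=8F=98=E6=88=90=E4=B9=B1=E7=A0=81?=\r\n"
    " =?us-ascii?Q?_removing_this_part_fixes_the_mojibake?="
)

# decode_header returns (bytes, charset) pairs; whitespace between adjacent
# encoded words is dropped, and each chunk keeps its own charset.
chunks = decode_header(raw_subject)

# Decode every chunk with the charset it declared, not the last one seen.
subject = "".join(
    data.decode(charset or "ascii") for data, charset in chunks
)

print(subject)
```

This reproduces the subject line that webmail clients show, which is consistent with the problem being on Outlook's side rather than in the header itself.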


lionel-rowe commented Feb 28, 2022

Explanation of the corrupted "g; d8" subject line:

Because it assumes us-ascii encoding, Outlook thinks it can safely ignore the most-significant bit of each byte. The first 6 bytes, which should be e7 bb 9f e4 b8 80, are thus interpreted as 67 3b 1f 64 38 00 instead. The final 00, the null byte, is interpreted as terminating the string, so everything after it is truncated. The remaining bytes, 67 3b 1f 64 38, correspond to the characters g ; \x1f d 8, of which \x1f is a control character that gets rendered as a space.
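
The arithmetic above can be checked with a short sketch. This only models the hypothesized behavior (clear the high bit of every byte, then stop at the first null byte). It is not Outlook's actual code, and the function name `simulate_7bit_misdecode` is made up for illustration.

```python
def simulate_7bit_misdecode(data: bytes) -> str:
    """Model the hypothesized bug: treat UTF-8 bytes as 7-bit ASCII."""
    # Clear the most-significant bit of every byte.
    stripped = bytes(b & 0x7F for b in data)
    # Treat the first null byte as a string terminator.
    truncated = stripped.split(b"\x00", 1)[0]
    return truncated.decode("ascii")


utf8_bytes = "统一码变成乱码".encode("utf-8")
mangled = simulate_7bit_misdecode(utf8_bytes)

print(utf8_bytes[:6].hex(" "))      # first six bytes of the UTF-8 subject
print(repr(mangled))                # control character shown escaped
print(mangled.replace("\x1f", " ")) # as displayed, with \x1f as a space
```

Under these assumptions the result is `g;\x1fd8`, which matches the "g; d8" shown by Outlook once the control character is rendered as a space.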
