@z80oolong
Last active May 17, 2019 10:46
Displaying East Asian Ambiguous Characters at full width in mutt 1.9.0 and later

Overview

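In mutt 1.9.0 and later, one problem has still not been adequately resolved. East Asian Ambiguous Characters are East Asian characters whose Unicode East_Asian_Width property is A (Ambiguous), such as the symbols "◎" and "★" and the box-drawing characters. In a Japanese environment, mutt cannot give these characters the correct display width.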

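The diff files mutt-x.y.z-fix.diff (where x.y.z is the version number of a stable release) and mutt-HEAD-xxxxxxxx-fix.diff (where xxxxxxxx is a GitLab commit revision) are reworked from patches for mutt 1.5.23 published on the "mutt patches" page of ttkzw's site. These are the patch by Takashi Takizawa (滝澤隆史) that displays East Asian Ambiguous Characters at width 2, the same width as kanji and full-width kana, and the patches by Takashi Takizawa and Yukinori Yoshida (吉田行範) that add various extended features. The patches have been reapplied to mutt 1.9.0 and to the mutt repository on GitLab, fixed where needed, and combined into a single diff file.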

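The patches reapplied to the stable and GitLab versions of mutt to create mutt-x.y.z-fix.diff and mutt-HEAD-xxxxxxxx-fix.diff are the five recorded in the PATCHES file that the diff adds: patch-1.5.23.tt+yy.delete_prefix.1, patch-1.5.23.tt.create_rfc2047_params.1, patch-1.5.23.tt.sanitize_ja.1, patch-1.5.23.tt.cjk_width_tree_chars.1 and patch-1.5.23.tt.wcwidth.1.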

In addition, for environments such as Debian noroot where creating hard links is not permitted, the diff includes a change that avoids calling the link(2) system call.

Applying the diff and compiling the source code

Stable releases (mutt 1.9.0 and later)

First, obtain the source code of a stable release of mutt 1.9.0 or later and extract it into a suitable directory. Next, obtain mutt-x.y.z-fix.diff.

Then, from the directory containing the extracted source code, apply mutt-x.y.z-fix.diff as follows:

 $ patch -p1 < /path/to/diff/mutt-x.y.z-fix.diff
 (where /path/to/diff is the path of the directory containing mutt-x.y.z-fix.diff)

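After applying the diff, build and install mutt, passing --enable-cjk-ambiguous-width to ./configure as shown below. The resulting mutt displays East Asian Ambiguous Characters at the same width as full-width characters once cjk_width is set in .muttrc, as described below.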

 $ ./configure --enable-cjk-ambiguous-width ... # (specify other options as needed)
 $ make
 $ make install

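To keep mutt from calling the link(2) system call, add the flag -DNO_USE_HARDLINK to the CFLAGS environment variable when building. As the mx.c and main.c changes in the diff show, this also disables dotlocking (mutt_dotlock) and makes mutt lock mailboxes with fcntl() and flock() instead.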

 $ /usr/bin/env CFLAGS="-DNO_USE_HARDLINK" ./configure --enable-cjk-ambiguous-width ... # (specify other options as needed)
 $ make
 $ make install

mutt-1.9.0-fix.diff can also be applied in the same way to mutt 1.9.1 through mutt 1.9.4. Likewise, the diff files listed below can be applied to the stable releases of mutt shown next to each.

mutt on GitLab

For mutt on GitLab, use mutt-HEAD-xxxxxxxx-fix.diff. As with the stable releases above, first obtain the mutt repository from GitLab with git clone, then apply mutt-HEAD-xxxxxxxx-fix.diff as follows:

 $ patch -p1 < /path/to/diff/mutt-HEAD-xxxxxxxx-fix.diff
 (where /path/to/diff is the path of the directory containing mutt-HEAD-xxxxxxxx-fix.diff)

After applying the diff, the build procedure is almost the same as above. However, the GitLab repository does not include a generated ./configure script, so run ./prepare instead of ./configure to build the source code.

 $ ./prepare --enable-cjk-ambiguous-width ... # (specify other options as needed)
 $ /usr/bin/env CFLAGS="-DNO_USE_HARDLINK" ./prepare --enable-cjk-ambiguous-width ... # (alternatively, set CFLAGS like this to avoid calling link(2))
 $ make
 $ make install

Options in the .muttrc configuration file

To make mutt compiled as above display East Asian Ambiguous Characters at the same width as full-width characters, add the following setting to mutt's configuration file, .muttrc:

...
set cjk_width=yes
...

For the settings of the other extended features, see the "mutt patches" page on ttkzw's site.

Acknowledgments

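First of all, I would like to express my heartfelt thanks to Takashi Takizawa and Yukinori Yoshida. mutt-1.9.0-fix.diff is based on the patches they created for mutt 1.5.23: the patch that displays East Asian Ambiguous Characters at width 2, like kanji and full-width kana, and the patches that add various extended features. I also sincerely thank everyone who has worked on the Japanese localization of mutt.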

Finally, I sincerely thank Michael Elkins, the original developer of mutt, the other mutt developers, and everyone else who has been involved with mutt in any way.

Addenda and notices

Addendum as of 2018/03/23

Added mutt-HEAD-e250c602-fix.diff, which corresponds to e250c602, the HEAD commit of mutt on GitLab as of 2018/02/23. Accordingly, mutt-HEAD-338019b3-fix.diff has been removed. Thank you for your understanding.

Addendum as of 2018/10/08

Added mutt-1.10.1-fix.diff, mutt-1.10.0-fix.diff and mutt-1.9.5-fix.diff, which correspond to 1.10.1, the latest stable release of mutt, and to the other stable releases 1.10.0 and 1.9.5.

Also added mutt-HEAD-f3e0742a-fix.diff, which corresponds to f3e0742a, the HEAD commit of mutt on GitLab as of 2018/10/08.

Accordingly, mutt-HEAD-e250c602-fix.diff has been removed. Thank you for your understanding.

Addendum as of 2018/11/07

Added mutt-HEAD-c9ab8553-fix.diff, which corresponds to c9ab8553, the HEAD commit of mutt on GitLab as of 2018/11/07.

Accordingly, mutt-HEAD-f3e0742a-fix.diff has been removed. Thank you for your understanding.

Addendum as of 2019/01/24

Added mutt-HEAD-be40128c-fix.diff, which corresponds to be40128c, the HEAD commit of mutt on GitLab as of 2019/01/24.

Also added mutt-1.11.0-fix.diff, mutt-1.11.2-fix.diff and mutt-1.9.5-fix.diff, which correspond to 1.11.2, the latest stable release of mutt, and to other stable releases such as 1.11.0.

Accordingly, mutt-HEAD-c9ab8553-fix.diff has been removed. Thank you for your understanding.

Addendum as of 2019/03/07

Added mutt-HEAD-2366e3d6-fix.diff, which corresponds to 2366e3d6, the HEAD commit of mutt on GitLab as of 2019/03/07.

Accordingly, mutt-HEAD-be40128c-fix.diff has been removed. Thank you for your understanding.

diff --git a/PATCHES b/PATCHES
index e69de29..17743fd 100644
--- a/PATCHES
+++ b/PATCHES
@@ -0,0 +1,5 @@
+patch-1.5.23.tt+yy.delete_prefix.1
+patch-1.5.23.tt.create_rfc2047_params.1
+patch-1.5.23.tt.sanitize_ja.1
+patch-1.5.23.tt.cjk_width_tree_chars.1
+patch-1.5.23.tt.wcwidth.1
diff --git a/charset.c b/charset.c
index 2411f2c..6a5cbd4 100644
--- a/charset.c
+++ b/charset.c
@@ -481,6 +481,9 @@ int mutt_convert_string (char **ps, const char *from, const char *to, int flags)
if (!s || !*s)
return 0;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars (s, mutt_strlen(s), 0);
+
if (to && from && (cd = mutt_iconv_open (to, from, flags)) != (iconv_t)-1)
{
int len;
@@ -677,3 +680,188 @@ int mutt_check_charset (const char *s, int strict)
return -1;
}
+
+/*
+ * mutt_sanitize_ja_chars()
+ * Adapted by TAKIZAWA Takashi <taki@cyber.email.ne.jp>
+ *
+ * - It replaces undefined KANJI characters to GETA mark.
+ * - It replaces character of 'JIS X 0201 kana' to '?'.
+ * - If $charset is EUC-JP, it replaces third character 'J' of
+ * escape sequence switching to 'JIS X 0201 latin' to 'B' indicating
+ * 'US-ASCII'.
+ * - If $charset is Shift_JIS, it replaces third character 'B' of
+ * escape sequence switching to 'US-ASCII' to 'J' indicating
+ * 'JIS X 0201 latin'.
+ */
+
+#define ASCII 0
+#define JISX0201LATIN 1
+#define JISX0201KANA 2
+#define JISX0208 3
+#define OTHER_CS 4
+
+void mutt_sanitize_ja_chars(char *s, size_t len, int keep_state)
+{
+ static int cs = ASCII;
+ static int kanji_cont = 0;
+ static int illegal_kanji = 0;
+ static int es = 0;
+ static char pes = '\0';
+ static char ascii_3rd_char = 'B';
+ static char jisx0201_3rd_char = 'J';
+
+ char *p = s;
+ char *p1 = NULL;
+ unsigned char c;
+
+ if (!keep_state || *p == 0x1b) /* consideration about mbstate's buffer */
+ {
+ if (!ascii_strcasecmp (Charset, "euc-jp"))
+ jisx0201_3rd_char = 'B';
+ else if (!ascii_strcasecmp (Charset, "shift_jis"))
+ ascii_3rd_char = 'J';
+ cs = ASCII;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ es = 0;
+ pes = '\0';
+ }
+
+ for (;p - s < len;p++)
+ {
+ if (es == 0)
+ {
+ if (*p == 0x1b)
+ es++;
+ else
+ {
+ switch (cs)
+ {
+ case ASCII:
+ case JISX0201LATIN:
+ break;
+ case JISX0201KANA:
+ *p = '?';
+ break;
+ case JISX0208:
+ /* replace ku-ten code from 9 to 15 and 85 or more to "GETA MARK" */
+ c = (unsigned char)*p;
+ if (! kanji_cont)
+ {
+ if ((size_t)(p - s + 1) == len)
+ return; /* the last character is a primary byte of KANJI */
+ if (c <= 0x20 || (c >= 0x29 && c <= 0x2f)
+ || (c >= 0x75 && c <= 0xa0))
+ illegal_kanji = 1;
+ kanji_cont = 1;
+ p1 = p;
+ }
+ else
+ {
+ if (c <= 0x20 || c >= 0x7f)
+ illegal_kanji = 1;
+ if (illegal_kanji && p1)
+ *p1 = 0x22, *p = 0x2e;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ }
+ break;
+ }
+ }
+ }
+ else if (es == 1)
+ {
+ if (*p == '$' || (*p >= '(' && *p <= '/' && *p != ','))
+ {
+ es++;
+ pes = *p;
+ }
+ else
+ {
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else if (es == 2)
+ {
+ if (pes == '(')
+ {
+ switch (*p)
+ {
+ case 'B':
+ cs = ASCII, *p = ascii_3rd_char;
+ break;
+ case 'J':
+ cs = JISX0201LATIN, *p = jisx0201_3rd_char;
+ break;
+ case 'I':
+ /* ready to replace character to '?' */
+ cs = JISX0201KANA, *p = ascii_3rd_char;
+ break;
+ default:
+ cs = OTHER_CS;
+ }
+ es = 0;
+ }
+ else if (pes == '$')
+ {
+ switch (*p)
+ {
+ case '@': /* JIS X 0208-1978 */
+ case 'B': /* JIS X 0208-1983 */
+ cs = JISX0208;
+ es = 0;
+ break;
+ case 'A':
+ cs = OTHER_CS; /* GB 2312 */
+ es = 0;
+ break;
+ case '(':
+ case ')':
+ case '*':
+ case '+':
+ case '-':
+ case '.':
+ case '/':
+ es++;
+ break;
+ default:
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+ else /* es == 3 */
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+}
+
+int mutt_copy_bytes_sanitize_ja (FILE *in, FILE *out, size_t size)
+{
+ char buf[2048];
+ size_t chunk;
+
+ mutt_sanitize_ja_chars (NULL, 0, 0);
+ while (size > 0)
+ {
+ chunk = (size > sizeof (buf)) ? sizeof (buf) : size;
+ if ((chunk = fread (buf, 1, chunk, in)) < 1)
+ break;
+ mutt_sanitize_ja_chars (buf, chunk, 1);
+ if (fwrite (buf, 1, chunk, out) != chunk)
+ return (-1);
+ size -= chunk;
+ }
+
+ return 0;
+}
+
diff --git a/charset.h b/charset.h
index 54891f0..d67b209 100644
--- a/charset.h
+++ b/charset.h
@@ -36,6 +36,9 @@ int iconv_close (iconv_t);
int mutt_convert_string (char **, const char *, const char *, int);
+void mutt_sanitize_ja_chars (char *, size_t, int);
+int mutt_copy_bytes_sanitize_ja (FILE *, FILE *, size_t);
+
iconv_t mutt_iconv_open (const char *, const char *, int);
size_t mutt_iconv (iconv_t, ICONV_CONST char **, size_t *, char **, size_t *, ICONV_CONST char **, const char *);
diff --git a/configure.ac b/configure.ac
index 4155018..f14a841 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1415,6 +1415,16 @@ fi
dnl -- locales --
+AC_ARG_ENABLE(cjk-ambiguous-width, AC_HELP_STRING([--enable-cjk-ambiguous-width], [ Enable East Asian Ambiguous characters support (using own wcwidth)]),
+ [ if test "x$enableval" = "xyes" ; then
+ cjk_width=yes
+ fi
+ ])
+if test "x$cjk_width" = "xyes" ; then
+ AC_DEFINE(USE_CJK_WIDTH,1,[ Define if you want to support East Asian Ambiguous class. ])
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+fi
+
AC_CHECK_HEADERS(wchar.h)
AC_CACHE_CHECK([for wchar_t], mutt_cv_wchar_t,
@@ -1485,7 +1495,10 @@ fi
if test $wc_funcs = yes; then
AC_DEFINE(HAVE_WC_FUNCS,1,[ Define if you are using the system's wchar_t functions. ])
else
- MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o"
+ if test "x$cjk_width" != "xyes"; then
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+ fi
fi
AC_CACHE_CHECK([for nl_langinfo and CODESET], mutt_cv_langinfo_codeset,
diff --git a/curs_lib.c b/curs_lib.c
index eef8dd2..db3ac2e 100644
--- a/curs_lib.c
+++ b/curs_lib.c
@@ -1177,7 +1177,14 @@ void mutt_format_string (char *dest, size_t destlen,
wc = replacement_char ();
}
if (arboreal && wc < MUTT_TREE_MAX)
- w = 1; /* hack */
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w = wcwidth (TreeChars[wc]);
+ else
+#endif
+ w = 1;
+ }
else
{
#ifdef HAVE_ISWBLANK
@@ -1406,10 +1413,12 @@ int mutt_strwidth (const char *s)
int w;
size_t k, n;
mbstate_t mbstate;
+ int arboreal;
if (!s) return 0;
n = mutt_strlen (s);
+ arboreal = ((unsigned char) s[0] < MUTT_TREE_MAX) ? 1 : 0;
memset (&mbstate, 0, sizeof (mbstate));
for (w=0; n && (k = mbrtowc (&wc, s, n, &mbstate)); s += k, n -= k)
@@ -1421,9 +1430,21 @@ int mutt_strwidth (const char *s)
k = (k == (size_t)(-1)) ? 1 : n;
wc = replacement_char ();
}
- if (!IsWPrint (wc))
- wc = '?';
- w += wcwidth (wc);
+ if (wc < MUTT_TREE_MAX && arboreal && k == 1)
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w += wcwidth (TreeChars[wc]);
+ else
+#endif
+ w++;
+ }
+ else
+ {
+ if (!IsWPrint (wc))
+ wc = '?';
+ w += wcwidth (wc);
+ }
}
return w;
}
diff --git a/doc/makedoc-defs.h b/doc/makedoc-defs.h
index 78a4ebc..dd872ba 100644
--- a/doc/makedoc-defs.h
+++ b/doc/makedoc-defs.h
@@ -31,10 +31,10 @@
# ifndef USE_SOCKET
# define USE_SOCKET
# endif
-# ifndef USE_DOTLOCK
+# if !defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
# define USE_DOTLOCK
# endif
-# ifndef DL_STANDALONE
+# if !defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define DL_STANDALONE
# endif
# ifndef USE_HCACHE
diff --git a/dotlock.c b/dotlock.c
index 5bf0348..5d87850 100644
--- a/dotlock.c
+++ b/dotlock.c
@@ -52,13 +52,13 @@
#include <getopt.h>
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# include "reldate.h"
#endif
#define MAXLINKS 1024 /* maximum link depth */
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define LONG_STRING 1024
# define MAXLOCKATTEMPT 5
@@ -96,7 +96,7 @@ extern int snprintf (char *, size_t, const char *, ...);
static int DotlockFlags;
static int Retry = MAXLOCKATTEMPT;
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static char *Hostname;
#endif
@@ -110,7 +110,7 @@ static int dotlock_prepare (char *, size_t, const char *, int fd);
static int dotlock_check_stats (struct stat *, struct stat *);
static int dotlock_dispatch (const char *, int fd);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int dotlock_init_privs (void);
static void usage (const char *);
#endif
@@ -130,7 +130,7 @@ static int dotlock_unlink (const char *);
static int dotlock_lock (const char *);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
#define check_flags(a) if (a & DL_FL_ACTIONS) usage (argv[0])
@@ -327,7 +327,7 @@ END_PRIVILEGED (void)
#endif
}
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
/*
* Usage information.
diff --git a/globals.h b/globals.h
index 2aa96f9..4a51586 100644
--- a/globals.h
+++ b/globals.h
@@ -24,7 +24,7 @@ WHERE CONTEXT *Context;
WHERE char Errorbuf[STRING];
WHERE char AttachmentMarker[STRING];
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
WHERE char *MuttDotlock;
#endif
@@ -300,9 +300,31 @@ const char * const Months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
const char * const BodyTypes[] = { "x-unknown", "audio", "application", "image", "message", "model", "multipart", "text", "video" };
const char * const BodyEncodings[] = { "x-unknown", "7bit", "8bit", "quoted-printable", "base64", "binary", "x-uuencoded" };
+#ifdef USE_CJK_WIDTH
+const wchar_t TreeChars[] =
+{
+ 0xFEFF, /* not used */
+ 0x2514, /* M_TREE_LLCORNER WACS_LLCORNER */
+ 0x250C, /* M_TREE_ULCORNER WACS_ULCORNER */
+ 0x251C, /* M_TREE_LTEE WACS_LTEE */
+ 0x2500, /* M_TREE_HLINE WACS_HLINE */
+ 0x2502, /* M_TREE_VLINE WACS_VLINE */
+ 0x0020, /* M_TREE_SPACE */
+ 0x003E, /* M_TREE_RARROW */
+ 0x002A, /* M_TREE_STAR fake thread indicator */
+ 0x0026, /* M_TREE_HIDDEN */
+ 0x003D, /* M_TREE_EQUALS */
+ 0x252C, /* M_TREE_TTEE WACS_TTEE */
+ 0x2534, /* M_TREE_BTEE WACS_BTEE */
+ 0x003F /* M_TREE_MISSING */
+};
+#endif /* USE_CJK_WIDTH */
#else
extern const char * const Weekdays[];
extern const char * const Months[];
+#ifdef USE_CJK_WIDTH
+extern const wchar_t TreeChars[];
+#endif /* USE_CJK_WIDTH */
#endif
#ifdef MAIN_C
diff --git a/handler.c b/handler.c
index 7ce53f9..ab69527 100644
--- a/handler.c
+++ b/handler.c
@@ -100,6 +100,9 @@ static void mutt_convert_to_state(iconv_t cd, char *bufi, size_t *l, STATE *s)
return;
}
+ if (option (OPTSANITIZEJACHARS) && memchr (bufi, 0x1b, *l))
+ mutt_sanitize_ja_chars (bufi, *l, 1);
+
ib = bufi, ibl = *l;
for (;;)
{
@@ -1312,6 +1315,7 @@ static int autoview_handler (BODY *a, STATE *s)
int piped = FALSE;
pid_t thepid;
int rc = 0;
+ char *charset;
snprintf (type, sizeof (type), "%s/%s", TYPE (a), a->subtype);
rfc1524_mailcap_lookup (a, type, entry, MUTT_AUTOVIEW);
@@ -1342,6 +1346,10 @@ static int autoview_handler (BODY *a, STATE *s)
return -1;
}
+ charset = mutt_get_parameter ("charset", a->parameter);
+ if (charset && option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (charset,"iso-2022-jp", 11))
+ mutt_copy_bytes_sanitize_ja (s->fpin, fpin, a->length);
+ else
mutt_copy_bytes (s->fpin, fpin, a->length);
if(!piped)
diff --git a/hdrline.c b/hdrline.c
index ea76e83..c3a4d92 100644
--- a/hdrline.c
+++ b/hdrline.c
@@ -272,6 +272,7 @@ hdr_format_str (char *dest,
#define THREAD_NEW (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 1)
#define THREAD_OLD (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 2)
size_t len;
+ char *subj;
hdr = hfi->hdr;
ctx = hfi->ctx;
@@ -590,6 +591,7 @@ hdr_format_str (char *dest,
subj = apply_subject_mods(hdr->env);
else
subj = hdr->env->subject;
+ if (option (OPTDELETEPREFIX)) subj = hdr->env->real_subj;
if (flags & MUTT_FORMAT_TREE && !hdr->collapsed)
{
if (flags & MUTT_FORMAT_FORCESUBJ)
diff --git a/init.h b/init.h
index 6c44394..bb51bb0 100644
--- a/init.h
+++ b/init.h
@@ -440,6 +440,31 @@ struct option_t MuttVars[] = {
** this variable is \fIunset\fP, no check for new mail is performed
** while the mailbox is open.
*/
+#ifdef USE_CJK_WIDTH
+ { "cjk_width", DT_BOOL, R_NONE, OPTCJKWIDTH, 0 },
+ /*
+ ** .pp
+ ** When this option is set, characters in the East Asian Ambiguous (A)
+ ** category as defined in Unicode Technical Report #11 have a column
+ ** width of 2. Otherwise, they have a column width of 1.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect in UTF-8 encoding.
+ */
+ { "cjk_width_tree_chars", DT_BOOL, R_NONE, OPTCJKWIDTHTREECHARS, 0 },
+ /*
+ ** .pp
+ ** If \fIset\fP, Mutt will use the result of $cjk_width as a column
+ ** width of WACS characters when displaying thread and attachment trees.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect in UTF-8 encoding.
+ */
+#endif
{ "collapse_unread", DT_BOOL, R_NONE, OPTCOLLAPSEUNREAD, 1 },
/*
** .pp
@@ -645,6 +670,17 @@ struct option_t MuttVars[] = {
** If \fI``no''\fP, never attempt to verify cryptographic signatures.
** (Crypto only)
*/
+ { "create_rfc2047_parameters", DT_BOOL, R_NONE, OPTCREATERFC2047PARAMS, 0 },
+ /*
+ ** .pp
+ ** When this variable is set, Mutt will add the following RFC-2047-encoded
+ ** MIME parameter to the Content-Type header field as the attachment filename:
+ ** name="=?iso-2022-jp?B?GyRCO244MxsoQi50eHQ=?="
+ ** .pp
+ ** Note: this use of RFC 2047's encoding is explicitly prohibited
+ ** by the standard. Set this variable only if the recipients' mailers
+ ** cannot parse RFC 2231 parameters.
+ */
{ "date_format", DT_STR, R_MENU, UL &DateFmt, UL "!%a, %b %d, %Y at %I:%M:%S%p %Z" },
/*
** .pp
@@ -696,6 +732,19 @@ struct option_t MuttVars[] = {
** If this option is \fIset\fP, mutt's received-attachments menu will not show the subparts of
** individual messages in a multipart/digest. To see these subparts, press ``v'' on that menu.
*/
+ { "delete_prefix", DT_BOOL, R_NONE, OPTDELETEPREFIX, 0 },
+ /*
+ ** .pp
+ ** If set, prefix in Subject: field generated by some mailing lists
+ ** (something like "Subject: [foo-ML:0012] real-subject") can be deleted
+ ** when displaying in index-mode and editing in message reply.
+ ** The deletion pattern can be configured with the $$delete_regexp variable.
+ */
+ { "delete_regexp", DT_RX, R_NONE, UL &DeleteRegexp, UL "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)" },
+ /*
+ ** .pp
+ ** The regular expression used by the $$delete_prefix option.
+ */
{ "display_filter", DT_PATH, R_PAGER, UL &DisplayFilter, UL "" },
/*
** .pp
@@ -703,7 +752,7 @@ struct option_t MuttVars[] = {
** is viewed it is passed as standard input to $$display_filter, and the
** filtered message is read from the standard output.
*/
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
{ "dotlock_program", DT_PATH, R_NONE, UL &MuttDotlock, UL BINDIR "/mutt_dotlock" },
/*
** .pp
@@ -2756,6 +2805,28 @@ struct option_t MuttVars[] = {
** that mutt \fIgenerates\fP this kind of encoding. Instead, mutt will
** unconditionally use the encoding specified in RFC2231.
*/
+ { "sanitize_ja_chars", DT_BOOL, R_NONE, OPTSANITIZEJACHARS, 0 },
+ /*
+ ** .pp
+ ** When set, Japanese "platform dependent characters" (illegal
+ ** characters for iso-2022-jp charset; mainly used by MS-Windows
+ ** mailers) are replaced with a special character, the GETA mark ('ESC $$ B " .
+ ** ESC ( B' in iso-2022-jp), and JIS X 0201 kana characters
+ ** (only for "ESC ( I" cases) are also replaced with "?" to
+ ** prevent garbage characters. JIS X 0201 kana characters are
+ ** not substituted if they appear in 8bit form.
+ ** .pp
+ ** This fixes another Japanese encoding issue. In case $$charset
+ ** is set to "EUC-JP", which does not contain JIS X 0201 roman
+ ** character set, the JIS X 0201 roman part of received messages
+ ** encoded in iso-2022-jp can not be converted to EUC-JP.
+ ** On the other hand, the ASCII part can not be converted to
+ ** Shift_JIS, which does not contain ASCII character set. Thus,
+ ** the converted characters are garbled in these cases. When this
+ ** option is set, the JIS X 0201 roman escape sequence and the
+ ** ASCII escape sequence are replaced appropriately to prevent
+ ** the output from being garbled.
+ */
{ "save_address", DT_BOOL, R_NONE, OPTSAVEADDRESS, 0 },
/*
** .pp
diff --git a/lib.c b/lib.c
index 583d2ff..1f61b39 100644
--- a/lib.c
+++ b/lib.c
@@ -445,6 +445,10 @@ int safe_symlink(const char *oldpath, const char *newpath)
int safe_rename (const char *src, const char *target)
{
+#ifdef NO_USE_HARDLINK
+ /* Android (since 6.0) does not support hardlinks. */
+ return rename(src, target);
+#else
struct stat ssb, tsb;
if (!src || !target)
@@ -537,6 +541,7 @@ int safe_rename (const char *src, const char *target)
return 0;
+#endif /* NO_USE_HARDLINK */
}
diff --git a/main.c b/main.c
index de5a538..17c5524 100644
--- a/main.c
+++ b/main.c
@@ -267,25 +267,25 @@ static void show_version (void)
"-USE_SETGID "
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
"+USE_DOTLOCK "
#else
"-USE_DOTLOCK "
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
"+DL_STANDALONE "
#else
"-DL_STANDALONE "
#endif
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
"+USE_FCNTL "
#else
"-USE_FCNTL "
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
"+USE_FLOCK "
#else
"-USE_FLOCK "
@@ -452,6 +452,12 @@ static void show_version (void)
"-LOCALES_HACK "
#endif
+#ifdef USE_CJK_WIDTH
+ "+USE_CJK_WIDTH "
+#else
+ "-USE_CJK_WIDTH "
+#endif
+
#ifdef HAVE_WC_FUNCS
"+HAVE_WC_FUNCS "
#else
diff --git a/mbyte.c b/mbyte.c
index 0eedaa7..8032bd3 100644
--- a/mbyte.c
+++ b/mbyte.c
@@ -17,7 +17,7 @@
*/
/*
- * Japanese support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
+ * CJK support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
*/
#if HAVE_CONFIG_H
@@ -37,8 +37,8 @@
#endif
int Charset_is_utf8 = 0;
+static int charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
-static int charset_is_ja = 0;
static iconv_t charset_to_utf8 = (iconv_t)(-1);
static iconv_t charset_from_utf8 = (iconv_t)(-1);
#endif
@@ -50,8 +50,8 @@ void mutt_set_charset (char *charset)
mutt_canonical_charset (buffer, sizeof (buffer), charset);
Charset_is_utf8 = 0;
+ charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
- charset_is_ja = 0;
if (charset_to_utf8 != (iconv_t)(-1))
{
iconv_close (charset_to_utf8);
@@ -66,12 +66,18 @@ void mutt_set_charset (char *charset)
if (mutt_is_utf8 (buffer))
Charset_is_utf8 = 1;
-#ifndef HAVE_WC_FUNCS
- else if (!ascii_strcasecmp(buffer, "euc-jp") || !ascii_strcasecmp(buffer, "shift_jis")
- || !ascii_strcasecmp(buffer, "cp932") || !ascii_strcasecmp(buffer, "eucJP-ms"))
+ else if (!ascii_strcasecmp (buffer, "gb2312") ||
+ !ascii_strcasecmp (buffer, "gb18030") ||
+ !ascii_strcasecmp (buffer, "big5") ||
+ !ascii_strcasecmp (buffer, "euc-tw") ||
+ !ascii_strcasecmp (buffer, "EUC-JP") ||
+ !ascii_strcasecmp (buffer, "eucJP-ms") ||
+ !ascii_strcasecmp (buffer, "Shift_JIS") ||
+ !ascii_strcasecmp (buffer, "cp932") ||
+ !ascii_strcasecmp (buffer, "euc-kr"))
{
- charset_is_ja = 1;
-
+ charset_is_cjk = 1;
+#ifndef HAVE_WC_FUNCS
/* Note flags=0 to skip charset-hooks: User masters the $charset
* name, and we are sure of our "utf-8" constant. So there is no
* possibility of wrong name that we would want to try to correct
@@ -80,24 +86,68 @@ void mutt_set_charset (char *charset)
*/
charset_to_utf8 = mutt_iconv_open ("utf-8", charset, 0);
charset_from_utf8 = mutt_iconv_open (charset, "utf-8", 0);
- }
#endif
+ }
#if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(ENABLE_NLS)
bind_textdomain_codeset(PACKAGE, buffer);
#endif
}
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+/*
+ * For systems that don't have a correctly functioning wcwidth(),
+ * we provide our own wcwidth().
+ * Furthermore, this wcwidth() enables change of character-cell width of
+ * the East Asian Ambiguous class by using $cjk_width.
+ * The function which most systems have cannot do it.
+ * Please read the comment of wcwidth.c about the East Asian Ambiguous
+ * class for details.
+ */
+int wcwidth_ucs(wchar_t ucs);
+int wcwidth_cjk(wchar_t ucs);
+
+int wcwidth (wchar_t wc)
+{
+ if (!Charset_is_utf8)
+ {
+ if (!charset_is_cjk)
+ {
+ /* 8-bit case */
+ if (!wc)
+ return 0;
+ else if ((0 <= wc && wc < 256) && IsPrint (wc))
+ return 1;
+ else
+ return -1;
+ }
+ else
+ {
+ /* CJK */
+ return wcwidth_cjk (wc);
+ }
+ }
+ else {
+#ifdef USE_CJK_WIDTH
+ if (option (OPTCJKWIDTH))
+ return wcwidth_cjk (wc);
+#endif /* USE_CJK_WIDTH */
+ return wcwidth_ucs (wc);
+ }
+}
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
+
+
#ifndef HAVE_WC_FUNCS
/*
* For systems that don't have them, we provide here our own
- * implementations of wcrtomb(), mbrtowc(), iswprint() and wcwidth().
+ * implementations of wcrtomb(), mbrtowc() and iswprint().
* Instead of using the locale, as these functions normally would,
* we use Mutt's Charset variable. We support 3 types of charset:
* (1) For 8-bit charsets, wchar_t uses the same encoding as char.
* (2) For UTF-8, wchar_t uses UCS.
- * (3) For stateless Japanese encodings, we use UCS and convert
+ * (3) For stateless CJK encodings, we use UCS and convert
* via UTF-8 using iconv.
* Unfortunately, we can't handle non-stateless encodings.
*/
@@ -256,7 +306,7 @@ size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
int iswprint (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return ((0x20 <= wc && wc < 0x7f) || 0xa0 <= wc);
else
return (0 <= wc && wc < 256) ? IsPrint (wc) : 0;
@@ -264,7 +314,7 @@ int iswprint (wint_t wc)
int iswspace (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return (9 <= wc && wc <= 13) || wc == 32;
else
return (0 <= wc && wc < 256) ? isspace (wc) : 0;
@@ -347,7 +397,7 @@ static int iswalpha_ucs (wint_t wc)
wint_t towupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? toupper (wc) : wc;
@@ -355,7 +405,7 @@ wint_t towupper (wint_t wc)
wint_t towlower (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towlower_ucs (wc);
else
return (0 <= wc && wc < 256) ? tolower (wc) : wc;
@@ -363,7 +413,7 @@ wint_t towlower (wint_t wc)
int iswalnum (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalnum_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalnum (wc) : 0;
@@ -371,7 +421,7 @@ int iswalnum (wint_t wc)
int iswalpha (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalpha_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalpha (wc) : 0;
@@ -379,58 +429,12 @@ int iswalpha (wint_t wc)
int iswupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? isupper (wc) : 0;
}
-/*
- * l10n for Japanese:
- * Symbols, Greek and Cyrillic in JIS X 0208, Japanese Kanji
- * Character Set, have a column width of 2.
- */
-int wcwidth_ja (wchar_t ucs)
-{
- if (ucs >= 0x3021)
- return -1; /* continue with the normal check */
- /* a rough range for quick check */
- if ((ucs >= 0x00a1 && ucs <= 0x00fe) || /* Latin-1 Supplement */
- (ucs >= 0x0391 && ucs <= 0x0451) || /* Greek and Cyrillic */
- (ucs >= 0x2010 && ucs <= 0x266f) || /* Symbols */
- (ucs >= 0x3000 && ucs <= 0x3020)) /* CJK Symbols and Punctuation */
- return 2;
- else
- return -1;
-}
-
-int wcwidth_ucs(wchar_t ucs);
-
-int wcwidth (wchar_t wc)
-{
- if (!Charset_is_utf8)
- {
- if (!charset_is_ja)
- {
- /* 8-bit case */
- if (!wc)
- return 0;
- else if ((0 <= wc && wc < 256) && IsPrint (wc))
- return 1;
- else
- return -1;
- }
- else
- {
- /* Japanese */
- int k = wcwidth_ja (wc);
- if (k != -1)
- return k;
- }
- }
- return wcwidth_ucs (wc);
-}
-
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps)
{
static wchar_t mbstate;
diff --git a/mbyte.h b/mbyte.h
index 9c58c9e..224cafb 100644
--- a/mbyte.h
+++ b/mbyte.h
@@ -8,6 +8,12 @@
# ifdef HAVE_WCTYPE_H
# include <wctype.h>
# endif
+# ifdef USE_CJK_WIDTH
+#ifdef wcwidth
+# undef wcwidth
+#endif
+int wcwidth (wchar_t wc);
+# endif /* USE_CJK_WIDTH */
# endif
# ifndef HAVE_WC_FUNCS
@@ -32,6 +38,9 @@
#ifdef iswupper
# undef iswupper
#endif
+#ifdef wcwidth
+# undef wcwidth
+#endif
size_t wcrtomb (char *s, wchar_t wc, mbstate_t *ps);
size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int iswprint (wint_t wc);
@@ -44,7 +53,6 @@ wint_t towlower (wint_t wc);
int wcwidth (wchar_t wc);
# endif /* !HAVE_WC_FUNCS */
-
void mutt_set_charset (char *charset);
extern int Charset_is_utf8;
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps);
diff --git a/mutt.h b/mutt.h
index 663beb4..6a0d9a0 100644
--- a/mutt.h
+++ b/mutt.h
@@ -362,10 +362,16 @@ enum
OPTBROWSERABBRMAILBOXES,
OPTCHECKMBOXSIZE,
OPTCHECKNEW,
+#ifdef USE_CJK_WIDTH
+ OPTCJKWIDTH,
+ OPTCJKWIDTHTREECHARS,
+#endif /* USE_CJK_WIDTH */
OPTCOLLAPSEUNREAD,
OPTCONFIRMAPPEND,
OPTCONFIRMCREATE,
+ OPTCREATERFC2047PARAMS,
OPTDELETEUNTAG,
+ OPTDELETEPREFIX,
OPTDIGESTCOLLAPSE,
OPTDUPTHREADS,
OPTEDITHDRS,
@@ -465,6 +471,7 @@ enum
OPTREVNAME,
OPTREVREAL,
OPTRFC2047PARAMS,
+ OPTSANITIZEJACHARS,
OPTSAVEADDRESS,
OPTSAVEEMPTY,
OPTSAVENAME,
diff --git a/mutt_regex.h b/mutt_regex.h
index f10ecbe..3cd36e0 100644
--- a/mutt_regex.h
+++ b/mutt_regex.h
@@ -52,5 +52,6 @@ WHERE REGEXP QuoteRegexp;
WHERE REGEXP ReplyRegexp;
WHERE REGEXP Smileys;
WHERE REGEXP GecosMask;
+WHERE REGEXP DeleteRegexp;
#endif /* MUTT_REGEX_H */
diff --git a/mx.c b/mx.c
index a015bf4..ea416b3 100644
--- a/mx.c
+++ b/mx.c
@@ -47,7 +47,7 @@
#include "buffy.h"
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
#include "dotlock.h"
#endif
@@ -95,13 +95,13 @@ struct mx_ops* mx_get_ops (int magic)
#define mutt_is_spool(s) (mutt_strcmp (Spoolfile, s) == 0)
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
/* parameters:
* path - file to lock
* retry - should retry if unable to lock?
*/
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int invoke_dotlock (const char *path, int dummy, int flags, int retry)
{
@@ -181,14 +181,14 @@ static int undotlock_file (const char *path, int fd)
*/
int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
-#if defined (USE_FCNTL) || defined (USE_FLOCK)
+#if defined (USE_FCNTL) || defined (USE_FLOCK) || defined(NO_USE_HARDLINK)
int count;
int attempt;
struct stat sb = { 0 }, prev_sb = { 0 }; /* silence gcc warnings */
#endif
int r = 0;
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock lck;
memset (&lck, 0, sizeof (struct flock));
@@ -227,7 +227,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
count = 0;
attempt = 0;
while (flock (fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
@@ -261,7 +261,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FLOCK */
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (r == 0 && dot)
r = dotlock_file (path, fd, timeout);
#endif /* USE_DOTLOCK */
@@ -270,12 +270,12 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
/* release any other locks obtained in this routine */
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
lck.l_type = F_UNLCK;
fcntl (fd, F_SETLK, &lck);
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif /* USE_FLOCK */
}
@@ -285,7 +285,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
int mx_unlock_file (const char *path, int fd, int dot)
{
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock unlockit = { F_UNLCK, 0, 0, 0, 0 };
memset (&unlockit, 0, sizeof (struct flock));
@@ -294,11 +294,11 @@ int mx_unlock_file (const char *path, int fd, int dot)
fcntl (fd, F_SETLK, &unlockit);
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (dot)
undotlock_file (path, fd);
#endif
@@ -309,7 +309,7 @@ int mx_unlock_file (const char *path, int fd, int dot)
static void mx_unlink_empty (const char *path)
{
int fd;
-#ifndef USE_DOTLOCK
+#if !defined(USE_DOTLOCK) || defined(NO_USE_HARDLINK)
struct stat sb;
#endif
@@ -322,7 +322,7 @@ static void mx_unlink_empty (const char *path)
return;
}
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
invoke_dotlock (path, fd, DL_FL_UNLINK, 1);
#else
if (fstat (fd, &sb) == 0 && sb.st_size == 0)
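With NO_USE_HARDLINK defined, the mx.c hunks above compile in both the fcntl(2) and the flock(2) branches of mx_lock_file() and mx_unlock_file() and leave dotlocking out, so no link(2) call is made. The following is a minimal sketch of that lock sequence. It assumes a system that provides both fcntl(2) record locks and the BSD flock(2) call (Linux and the BSDs do; flock(2) is not part of POSIX). The helper names are hypothetical and not part of mutt, and unlike mx_lock_file() the sketch neither retries nor times out.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: take a non-blocking fcntl(2) record lock and then a
 * flock(2) lock on fd, in the same order as mx_lock_file().
 * Returns 0 on success, -1 if either lock could not be obtained. */
static int lock_fd_without_link (int fd, int excl)
{
  struct flock lck;

  assert (fd >= 0);
  memset (&lck, 0, sizeof (lck));    /* l_start = l_len = 0: whole file */
  lck.l_type = excl ? F_WRLCK : F_RDLCK;
  lck.l_whence = SEEK_SET;

  if (fcntl (fd, F_SETLK, &lck) == -1)
    return -1;

  if (flock (fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
  {
    /* release the fcntl lock so that no half-taken lock is left behind */
    lck.l_type = F_UNLCK;
    fcntl (fd, F_SETLK, &lck);
    return -1;
  }
  return 0;
}

/* Hypothetical helper: release both locks, as mx_unlock_file() does. */
static void unlock_fd_without_link (int fd)
{
  struct flock unlockit;

  memset (&unlockit, 0, sizeof (unlockit));
  unlockit.l_type = F_UNLCK;
  unlockit.l_whence = SEEK_SET;
  fcntl (fd, F_SETLK, &unlockit);
  flock (fd, LOCK_UN);
}
```

Both kinds of lock are advisory, so they only protect the mailbox against other programs that take the same kind of lock.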
diff --git a/parse.c b/parse.c
index 0ae5594..745d2fc 100644
--- a/parse.c
+++ b/parse.c
@@ -1453,6 +1453,18 @@ ENVELOPE *mutt_read_rfc822_header (FILE *f, HEADER *hdr, short user_hdrs,
e->real_subj = e->subject + pmatch[0].rm_eo;
else
e->real_subj = e->subject;
+ if (option (OPTDELETEPREFIX))
+ {
+ /* If this option is set, mutt strips a prefix such as [prefix],
+ * [prefix:number] or [prefix number] from the Subject line, then
+ * any reply marker matched by ReplyRegexp that follows it. */
+ if (regexec (DeleteRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ {
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ if (regexec (ReplyRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ }
+ }
}
if (hdr->received < 0)
diff --git a/rfc2047.c b/rfc2047.c
index 8506425..e907b25 100644
--- a/rfc2047.c
+++ b/rfc2047.c
@@ -62,6 +62,9 @@ static size_t convert_string (ICONV_CONST char *f, size_t flen,
size_t obl, n;
int e;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp",
+11))
+ mutt_sanitize_ja_chars ((char *) f, flen, 0);
cd = mutt_iconv_open (to, from, 0);
if (cd == (iconv_t)(-1))
return (size_t)(-1);
diff --git a/sendlib.c b/sendlib.c
index ef42854..4503693 100644
--- a/sendlib.c
+++ b/sendlib.c
@@ -348,6 +348,30 @@ int mutt_write_mime_header (BODY *a, FILE *f)
}
}
+ if (a->use_disp && option (OPTCREATERFC2047PARAMS))
+ {
+ if(!(fn = a->d_filename))
+ fn = a->filename;
+
+ if (fn)
+ {
+ char *tmp;
+
+ /* Strip off the leading path... */
+ if ((t = strrchr (fn, '/')))
+ t++;
+ else
+ t = fn;
+
+ buffer[0] = 0;
+ tmp = safe_strdup (t);
+ rfc2047_encode_string (&tmp);
+ rfc822_cat (buffer, sizeof (buffer), tmp, MimeSpecials);
+ FREE (&tmp);
+ fprintf (f, ";\n\tname=%s", buffer);
+ }
+ }
+
fputc ('\n', f);
if (a->description)
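When create_rfc2047_parameters is set, the sendlib.c hunk above strips the directory part of the attachment's file name, RFC 2047-encodes it with rfc2047_encode_string() and writes the result as a name= parameter of Content-Type. The following is a simplified sketch of what such a parameter value looks like. It assumes the file name is already UTF-8 and always produces a single "B" encoded-word, whereas mutt's rfc2047_encode_string() picks the charset and encoding itself and splits long words; the quoting that rfc822_cat() applies with MimeSpecials is also omitted. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char B64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Write the base64 encoding of len bytes from in into out (NUL-terminated).
 * out must have room for 4 * ((len + 2) / 3) + 1 bytes. */
static void base64_encode (const unsigned char *in, size_t len, char *out)
{
  size_t i;

  for (i = 0; i + 2 < len; i += 3)
  {
    *out++ = B64[in[i] >> 2];
    *out++ = B64[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
    *out++ = B64[((in[i + 1] & 0x0f) << 2) | (in[i + 2] >> 6)];
    *out++ = B64[in[i + 2] & 0x3f];
  }
  if (i < len)                          /* 1 or 2 trailing bytes */
  {
    *out++ = B64[in[i] >> 2];
    if (i + 1 < len)
    {
      *out++ = B64[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
      *out++ = B64[(in[i + 1] & 0x0f) << 2];
    }
    else
    {
      *out++ = B64[(in[i] & 0x03) << 4];
      *out++ = '=';
    }
    *out++ = '=';
  }
  *out = '\0';
}

/* Hypothetical helper: strip the directory part of path (as the sendlib.c
 * hunk does with strrchr) and wrap the basename in one RFC 2047 "B"
 * encoded-word.  Returns 0 on success, -1 if a buffer is too small. */
static int encode_filename_param (const char *path, char *buf, size_t buflen)
{
  const char *t;
  char b64[256];
  size_t n;

  assert (path != NULL && buf != NULL);
  t = strrchr (path, '/');
  t = t ? t + 1 : path;
  n = strlen (t);
  if (4 * ((n + 2) / 3) + 1 > sizeof (b64))
    return -1;
  base64_encode ((const unsigned char *) t, n, b64);
  if ((size_t) snprintf (buf, buflen, "=?UTF-8?B?%s?=", b64) >= buflen)
    return -1;
  return 0;
}
```

For example, /home/user/a.txt yields =?UTF-8?B?YS50eHQ=?=, which mutt would then write quoted as the value of name=.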
diff --git a/wcwidth.c b/wcwidth.c
index 0b94d73..85a1397 100644
--- a/wcwidth.c
+++ b/wcwidth.c
@@ -5,6 +5,51 @@
* http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
* http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
*
+ * In fixed-width output devices, Latin characters all occupy a single
+ * "cell" position of equal width, whereas ideographic CJK characters
+ * occupy two such cells. Interoperability between terminal-line
+ * applications and (teletype-style) character terminals using the
+ * UTF-8 encoding requires agreement on which character should advance
+ * the cursor by how many cell positions. No established formal
+ * standards exist at present on which Unicode character shall occupy
+ * how many cell positions on character terminals. These routines are
+ * a first attempt of defining such behavior based on simple rules
+ * applied to data provided by the Unicode Consortium.
+ *
+ * For some graphical characters, the Unicode standard explicitly
+ * defines a character-cell width via the definition of the East Asian
+ * FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
+ * In all these cases, there is no ambiguity about which width a
+ * terminal shall use. For characters in the East Asian Ambiguous (A)
+ * class, the width choice depends purely on a preference of backward
+ * compatibility with either historic CJK or Western practice.
+ * Choosing single-width for these characters is easy to justify as
+ * the appropriate long-term solution, as the CJK practice of
+ * displaying these characters as double-width comes from historic
+ * implementation simplicity (8-bit encoded characters were displayed
+ * single-width and 16-bit ones double-width, even for Greek,
+ * Cyrillic, etc.) and not any typographic considerations.
+ *
+ * Much less clear is the choice of width for the Not East Asian
+ * (Neutral) class. Existing practice does not dictate a width for any
+ * of these characters. It would nevertheless make sense
+ * typographically to allocate two character cells to characters such
+ * as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
+ * represented adequately with a single-width glyph. The following
+ * routines at present merely assign a single-cell width to all
+ * neutral characters, in the interest of simplicity. This is not
+ * entirely satisfactory and should be reconsidered before
+ * establishing a formal standard in this area. At the moment, the
+ * decision which Not East Asian (Neutral) characters should be
+ * represented by double-width glyphs cannot yet be answered by
+ * applying a simple rule from the Unicode database content. Setting
+ * up a proper standard for the behavior of UTF-8 character terminals
+ * will require a careful analysis not only of each Unicode character,
+ * but also of each presentation form, something the author of these
+ * routines has avoided to do so far.
+ *
+ * http://www.unicode.org/unicode/reports/tr11/
+ *
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
@@ -24,12 +69,34 @@
# include "config.h"
#endif
-#ifndef HAVE_WC_FUNCS
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+#include <wchar.h>
-#include "mutt.h"
-#include "mbyte.h"
+struct interval {
+ wchar_t first;
+ wchar_t last;
+};
+
+/* auxiliary function for binary search in interval table */
+static int bisearch(wchar_t ucs, const struct interval *table, int max) {
+ int min = 0;
+ int mid;
+
+ if (ucs < table[0].first || ucs > table[max].last)
+ return 0;
+ while (max >= min) {
+ mid = (min + max) / 2;
+ if (ucs > table[mid].last)
+ min = mid + 1;
+ else if (ucs < table[mid].first)
+ max = mid - 1;
+ else
+ return 1;
+ }
+
+ return 0;
+}
-#include <ctype.h>
/* The following two functions define the column width of an ISO 10646
* character as follows:
@@ -67,62 +134,56 @@ int wcwidth_ucs(wchar_t ucs)
{
/* sorted list of non-overlapping intervals of non-spacing characters */
/* generated by "uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B c" */
- static const struct interval {
- wchar_t first;
- wchar_t last;
- } combining[] = {
- { 0x0300, 0x036f }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
- { 0x0591, 0x05bd }, { 0x05bf, 0x05bf }, { 0x05c1, 0x05c2 },
- { 0x05c4, 0x05c5 }, { 0x05c7, 0x05c7 }, { 0x0600, 0x0603 },
- { 0x0610, 0x0615 }, { 0x064b, 0x065e }, { 0x0670, 0x0670 },
- { 0x06d6, 0x06e4 }, { 0x06e7, 0x06e8 }, { 0x06ea, 0x06ed },
- { 0x070f, 0x070f }, { 0x0711, 0x0711 }, { 0x0730, 0x074a },
- { 0x07a6, 0x07b0 }, { 0x07eb, 0x07f3 }, { 0x0901, 0x0902 },
- { 0x093c, 0x093c }, { 0x0941, 0x0948 }, { 0x094d, 0x094d },
+ static const struct interval combining[] = {
+ { 0x0300, 0x036F }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
+ { 0x0591, 0x05BD }, { 0x05BF, 0x05BF }, { 0x05C1, 0x05C2 },
+ { 0x05C4, 0x05C5 }, { 0x05C7, 0x05C7 }, { 0x0600, 0x0603 },
+ { 0x0610, 0x0615 }, { 0x064B, 0x065E }, { 0x0670, 0x0670 },
+ { 0x06D6, 0x06E4 }, { 0x06E7, 0x06E8 }, { 0x06EA, 0x06ED },
+ { 0x070F, 0x070F }, { 0x0711, 0x0711 }, { 0x0730, 0x074A },
+ { 0x07A6, 0x07B0 }, { 0x07EB, 0x07F3 }, { 0x0901, 0x0902 },
+ { 0x093C, 0x093C }, { 0x0941, 0x0948 }, { 0x094D, 0x094D },
{ 0x0951, 0x0954 }, { 0x0962, 0x0963 }, { 0x0981, 0x0981 },
- { 0x09bc, 0x09bc }, { 0x09c1, 0x09c4 }, { 0x09cd, 0x09cd },
- { 0x09e2, 0x09e3 }, { 0x0a01, 0x0a02 }, { 0x0a3c, 0x0a3c },
- { 0x0a41, 0x0a42 }, { 0x0a47, 0x0a48 }, { 0x0a4b, 0x0a4d },
- { 0x0a70, 0x0a71 }, { 0x0a81, 0x0a82 }, { 0x0abc, 0x0abc },
- { 0x0ac1, 0x0ac5 }, { 0x0ac7, 0x0ac8 }, { 0x0acd, 0x0acd },
- { 0x0ae2, 0x0ae3 }, { 0x0b01, 0x0b01 }, { 0x0b3c, 0x0b3c },
- { 0x0b3f, 0x0b3f }, { 0x0b41, 0x0b43 }, { 0x0b4d, 0x0b4d },
- { 0x0b56, 0x0b56 }, { 0x0b82, 0x0b82 }, { 0x0bc0, 0x0bc0 },
- { 0x0bcd, 0x0bcd }, { 0x0c3e, 0x0c40 }, { 0x0c46, 0x0c48 },
- { 0x0c4a, 0x0c4d }, { 0x0c55, 0x0c56 }, { 0x0cbc, 0x0cbc },
- { 0x0cbf, 0x0cbf }, { 0x0cc6, 0x0cc6 }, { 0x0ccc, 0x0ccd },
- { 0x0ce2, 0x0ce3 }, { 0x0d41, 0x0d43 }, { 0x0d4d, 0x0d4d },
- { 0x0dca, 0x0dca }, { 0x0dd2, 0x0dd4 }, { 0x0dd6, 0x0dd6 },
- { 0x0e31, 0x0e31 }, { 0x0e34, 0x0e3a }, { 0x0e47, 0x0e4e },
- { 0x0eb1, 0x0eb1 }, { 0x0eb4, 0x0eb9 }, { 0x0ebb, 0x0ebc },
- { 0x0ec8, 0x0ecd }, { 0x0f18, 0x0f19 }, { 0x0f35, 0x0f35 },
- { 0x0f37, 0x0f37 }, { 0x0f39, 0x0f39 }, { 0x0f71, 0x0f7e },
- { 0x0f80, 0x0f84 }, { 0x0f86, 0x0f87 }, { 0x0f90, 0x0f97 },
- { 0x0f99, 0x0fbc }, { 0x0fc6, 0x0fc6 }, { 0x102d, 0x1030 },
+ { 0x09BC, 0x09BC }, { 0x09C1, 0x09C4 }, { 0x09CD, 0x09CD },
+ { 0x09E2, 0x09E3 }, { 0x0A01, 0x0A02 }, { 0x0A3C, 0x0A3C },
+ { 0x0A41, 0x0A42 }, { 0x0A47, 0x0A48 }, { 0x0A4B, 0x0A4D },
+ { 0x0A70, 0x0A71 }, { 0x0A81, 0x0A82 }, { 0x0ABC, 0x0ABC },
+ { 0x0AC1, 0x0AC5 }, { 0x0AC7, 0x0AC8 }, { 0x0ACD, 0x0ACD },
+ { 0x0AE2, 0x0AE3 }, { 0x0B01, 0x0B01 }, { 0x0B3C, 0x0B3C },
+ { 0x0B3F, 0x0B3F }, { 0x0B41, 0x0B43 }, { 0x0B4D, 0x0B4D },
+ { 0x0B56, 0x0B56 }, { 0x0B82, 0x0B82 }, { 0x0BC0, 0x0BC0 },
+ { 0x0BCD, 0x0BCD }, { 0x0C3E, 0x0C40 }, { 0x0C46, 0x0C48 },
+ { 0x0C4A, 0x0C4D }, { 0x0C55, 0x0C56 }, { 0x0CBC, 0x0CBC },
+ { 0x0CBF, 0x0CBF }, { 0x0CC6, 0x0CC6 }, { 0x0CCC, 0x0CCD },
+ { 0x0CE2, 0x0CE3 }, { 0x0D41, 0x0D43 }, { 0x0D4D, 0x0D4D },
+ { 0x0DCA, 0x0DCA }, { 0x0DD2, 0x0DD4 }, { 0x0DD6, 0x0DD6 },
+ { 0x0E31, 0x0E31 }, { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E },
+ { 0x0EB1, 0x0EB1 }, { 0x0EB4, 0x0EB9 }, { 0x0EBB, 0x0EBC },
+ { 0x0EC8, 0x0ECD }, { 0x0F18, 0x0F19 }, { 0x0F35, 0x0F35 },
+ { 0x0F37, 0x0F37 }, { 0x0F39, 0x0F39 }, { 0x0F71, 0x0F7E },
+ { 0x0F80, 0x0F84 }, { 0x0F86, 0x0F87 }, { 0x0F90, 0x0F97 },
+ { 0x0F99, 0x0FBC }, { 0x0FC6, 0x0FC6 }, { 0x102D, 0x1030 },
{ 0x1032, 0x1032 }, { 0x1036, 0x1037 }, { 0x1039, 0x1039 },
- { 0x1058, 0x1059 }, { 0x1160, 0x11ff }, { 0x135f, 0x135f },
+ { 0x1058, 0x1059 }, { 0x1160, 0x11FF }, { 0x135F, 0x135F },
{ 0x1712, 0x1714 }, { 0x1732, 0x1734 }, { 0x1752, 0x1753 },
- { 0x1772, 0x1773 }, { 0x17b4, 0x17b5 }, { 0x17b7, 0x17bd },
- { 0x17c6, 0x17c6 }, { 0x17c9, 0x17d3 }, { 0x17dd, 0x17dd },
- { 0x180b, 0x180d }, { 0x18a9, 0x18a9 }, { 0x1920, 0x1922 },
- { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193b },
- { 0x1a17, 0x1a18 }, { 0x1b00, 0x1b03 }, { 0x1b34, 0x1b34 },
- { 0x1b36, 0x1b3a }, { 0x1b3c, 0x1b3c }, { 0x1b42, 0x1b42 },
- { 0x1b6b, 0x1b73 }, { 0x1dc0, 0x1dca }, { 0x1dfe, 0x1dff },
- { 0x200b, 0x200f }, { 0x202a, 0x202e }, { 0x2060, 0x2063 },
- { 0x206a, 0x206f }, { 0x20d0, 0x20ef }, { 0x302a, 0x302f },
- { 0x3099, 0x309a }, { 0xa806, 0xa806 }, { 0xa80b, 0xa80b },
- { 0xa825, 0xa826 }, { 0xfb1e, 0xfb1e }, { 0xfe00, 0xfe0f },
- { 0xfe20, 0xfe23 }, { 0xfeff, 0xfeff }, { 0xfff9, 0xfffb },
- { 0x10a01, 0x10a03 }, { 0x10a05, 0x10a06 }, { 0x10a0c, 0x10a0f },
- { 0x10a38, 0x10a3a }, { 0x10a3f, 0x10a3f }, { 0x1d167, 0x1d169 },
- { 0x1d173, 0x1d182 }, { 0x1d185, 0x1d18b }, { 0x1d1aa, 0x1d1ad },
- { 0x1d242, 0x1d244 }, { 0xe0001, 0xe0001 }, { 0xe0020, 0xe007f },
- { 0xe0100, 0xe01ef }
+ { 0x1772, 0x1773 }, { 0x17B4, 0x17B5 }, { 0x17B7, 0x17BD },
+ { 0x17C6, 0x17C6 }, { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD },
+ { 0x180B, 0x180D }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 },
+ { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193B },
+ { 0x1A17, 0x1A18 }, { 0x1B00, 0x1B03 }, { 0x1B34, 0x1B34 },
+ { 0x1B36, 0x1B3A }, { 0x1B3C, 0x1B3C }, { 0x1B42, 0x1B42 },
+ { 0x1B6B, 0x1B73 }, { 0x1DC0, 0x1DCA }, { 0x1DFE, 0x1DFF },
+ { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2063 },
+ { 0x206A, 0x206F }, { 0x20D0, 0x20EF }, { 0x302A, 0x302F },
+ { 0x3099, 0x309A }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B },
+ { 0xA825, 0xA826 }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F },
+ { 0xFE20, 0xFE23 }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB },
+ { 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }, { 0x10A0C, 0x10A0F },
+ { 0x10A38, 0x10A3A }, { 0x10A3F, 0x10A3F }, { 0x1D167, 0x1D169 },
+ { 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
+ { 0x1D242, 0x1D244 }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F },
+ { 0xE0100, 0xE01EF }
};
- int min = 0;
- int max = sizeof(combining) / sizeof(struct interval) - 1;
- int mid;
/* test for 8-bit control characters */
if (ucs == 0)
@@ -130,20 +191,10 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0))
return -1;
- /* first quick check for Latin-1 etc. characters */
- if (ucs < combining[0].first)
- return 1;
-
/* binary search in table of non-spacing characters */
- while (max >= min) {
- mid = (min + max) / 2;
- if (combining[mid].last < ucs)
- min = mid + 1;
- else if (combining[mid].first > ucs)
- max = mid - 1;
- else if (combining[mid].first <= ucs && combining[mid].last >= ucs)
- return 0;
- }
+ if (bisearch(ucs, combining,
+ sizeof(combining) / sizeof(struct interval) - 1))
+ return 0;
/* if we arrive here, ucs is not a combining or C0/C1 control character */
@@ -151,7 +202,7 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 0x1100)
return 1;
- return 1 +
+ return 1 +
(ucs >= 0x1100 &&
(ucs <= 0x115f || /* Hangul Jamo init. consonants */
ucs == 0x2329 || ucs == 0x232a ||
@@ -167,15 +218,120 @@ int wcwidth_ucs(wchar_t ucs)
(ucs >= 0x30000 && ucs <= 0x3fffd)));
}
-#endif /* !HAVE_WC_FUNCS */
+#if 0 /* original */
+int wcswidth_ucs(const wchar_t *pwcs, size_t n)
+{
+ int w, width = 0;
+
+ for (;*pwcs && n-- > 0; pwcs++)
+ if ((w = wcwidth_ucs(*pwcs)) < 0)
+ return -1;
+ else
+ width += w;
+
+ return width;
+}
+#endif
+
+/*
+ * The following functions are the same as wcwidth_ucs() and
+ * wcswidth_ucs(), except that spacing characters in the East Asian
+ * Ambiguous (A) category as defined in Unicode Technical Report #11
+ * have a column width of 2. This variant might be useful for users of
+ * CJK legacy encodings who want to migrate to UCS without changing
+ * the traditional terminal character-width behaviour. It is not
+ * otherwise recommended for general use.
+ */
+/*
+ * In addition to the explanation mentioned above,
+ * several characters in the East Asian Narrow (Na) and Not East Asian
+ * (Neutral) categories as defined in Unicode Technical Report #11
+ * actually have a column width of 2 in CJK legacy encodings.
+ */
+int wcwidth_cjk(wchar_t ucs)
+{
+ /* sorted list of non-overlapping intervals of East Asian Ambiguous
+ * characters, generated by "uniset +WIDTH-A -cat=Me -cat=Mn -cat=Cf c" */
+ static const struct interval ambiguous[] = {
+ { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 },
+ { 0x00AA, 0x00AA }, { 0x00AE, 0x00AE }, { 0x00B0, 0x00B4 },
+ { 0x00B6, 0x00BA }, { 0x00BC, 0x00BF }, { 0x00C6, 0x00C6 },
+ { 0x00D0, 0x00D0 }, { 0x00D7, 0x00D8 }, { 0x00DE, 0x00E1 },
+ { 0x00E6, 0x00E6 }, { 0x00E8, 0x00EA }, { 0x00EC, 0x00ED },
+ { 0x00F0, 0x00F0 }, { 0x00F2, 0x00F3 }, { 0x00F7, 0x00FA },
+ { 0x00FC, 0x00FC }, { 0x00FE, 0x00FE }, { 0x0101, 0x0101 },
+ { 0x0111, 0x0111 }, { 0x0113, 0x0113 }, { 0x011B, 0x011B },
+ { 0x0126, 0x0127 }, { 0x012B, 0x012B }, { 0x0131, 0x0133 },
+ { 0x0138, 0x0138 }, { 0x013F, 0x0142 }, { 0x0144, 0x0144 },
+ { 0x0148, 0x014B }, { 0x014D, 0x014D }, { 0x0152, 0x0153 },
+ { 0x0166, 0x0167 }, { 0x016B, 0x016B }, { 0x01CE, 0x01CE },
+ { 0x01D0, 0x01D0 }, { 0x01D2, 0x01D2 }, { 0x01D4, 0x01D4 },
+ { 0x01D6, 0x01D6 }, { 0x01D8, 0x01D8 }, { 0x01DA, 0x01DA },
+ { 0x01DC, 0x01DC }, { 0x0251, 0x0251 }, { 0x0261, 0x0261 },
+ { 0x02C4, 0x02C4 }, { 0x02C7, 0x02C7 }, { 0x02C9, 0x02CB },
+ { 0x02CD, 0x02CD }, { 0x02D0, 0x02D0 }, { 0x02D8, 0x02DB },
+ { 0x02DD, 0x02DD }, { 0x02DF, 0x02DF }, { 0x0391, 0x03A1 },
+ { 0x03A3, 0x03A9 }, { 0x03B1, 0x03C1 }, { 0x03C3, 0x03C9 },
+ { 0x0401, 0x0401 }, { 0x0410, 0x044F }, { 0x0451, 0x0451 },
+ { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
+ { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
+ { 0x2030, 0x2030 }, { 0x2032, 0x2033 }, { 0x2035, 0x2035 },
+ { 0x203B, 0x203B }, { 0x203E, 0x203E }, { 0x2074, 0x2074 },
+ { 0x207F, 0x207F }, { 0x2081, 0x2084 }, { 0x20AC, 0x20AC },
+ { 0x2103, 0x2103 }, { 0x2105, 0x2105 }, { 0x2109, 0x2109 },
+ { 0x2113, 0x2113 }, { 0x2116, 0x2116 }, { 0x2121, 0x2122 },
+ { 0x2126, 0x2126 }, { 0x212B, 0x212B }, { 0x2153, 0x2154 },
+ { 0x215B, 0x215E }, { 0x2160, 0x216B }, { 0x2170, 0x2179 },
+ { 0x2190, 0x2199 }, { 0x21B8, 0x21B9 }, { 0x21D2, 0x21D2 },
+ { 0x21D4, 0x21D4 }, { 0x21E7, 0x21E7 }, { 0x2200, 0x2200 },
+ { 0x2202, 0x2203 }, { 0x2207, 0x2208 }, { 0x220B, 0x220B },
+ { 0x220F, 0x220F }, { 0x2211, 0x2211 }, { 0x2215, 0x2215 },
+ { 0x221A, 0x221A }, { 0x221D, 0x2220 }, { 0x2223, 0x2223 },
+ { 0x2225, 0x2225 }, { 0x2227, 0x222C }, { 0x222E, 0x222E },
+ { 0x2234, 0x2237 }, { 0x223C, 0x223D }, { 0x2248, 0x2248 },
+ { 0x224C, 0x224C }, { 0x2252, 0x2252 }, { 0x2260, 0x2261 },
+ { 0x2264, 0x2267 }, { 0x226A, 0x226B }, { 0x226E, 0x226F },
+ { 0x2282, 0x2283 }, { 0x2286, 0x2287 }, { 0x2295, 0x2295 },
+ { 0x2299, 0x2299 }, { 0x22A5, 0x22A5 }, { 0x22BF, 0x22BF },
+ { 0x2312, 0x2312 }, { 0x2460, 0x24E9 }, { 0x24EB, 0x254B },
+ { 0x2550, 0x2573 }, { 0x2580, 0x258F }, { 0x2592, 0x2595 },
+ { 0x25A0, 0x25A1 }, { 0x25A3, 0x25A9 }, { 0x25B2, 0x25B3 },
+ { 0x25B6, 0x25B7 }, { 0x25BC, 0x25BD }, { 0x25C0, 0x25C1 },
+ { 0x25C6, 0x25C8 }, { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 },
+ { 0x25E2, 0x25E5 }, { 0x25EF, 0x25EF }, { 0x2605, 0x2606 },
+ { 0x2609, 0x2609 }, { 0x260E, 0x260F }, { 0x2614, 0x2615 },
+ { 0x261C, 0x261C }, { 0x261E, 0x261E }, { 0x2640, 0x2640 },
+ { 0x2642, 0x2642 }, { 0x2660, 0x2661 }, { 0x2663, 0x2665 },
+ { 0x2667, 0x266A }, { 0x266C, 0x266D }, { 0x266F, 0x266F },
+ { 0x273D, 0x273D }, { 0x2776, 0x277F }, { 0xE000, 0xF8FF },
+ { 0xFFFD, 0xFFFD }, { 0xF0000, 0xFFFFD }, { 0x100000, 0x10FFFD }
+ };
+
+ /* For Japanese legacy encodings, the following characters are added. */
+ static const struct interval legacy_ja[] = {
+ { 0x00A2, 0x00A3 }, { 0x00A5, 0x00A6 }, { 0x00AC, 0x00AC },
+ { 0x00AF, 0x00AF }, { 0x2212, 0x2212 }
+ };
+
+ /* binary search in the East Asian Ambiguous and Japanese legacy tables */
+ if (bisearch(ucs, ambiguous,
+ sizeof(ambiguous) / sizeof(struct interval) - 1))
+ return 2;
+ if (bisearch(ucs, legacy_ja,
+ sizeof(legacy_ja) / sizeof(struct interval) - 1))
+ return 2;
+
+ return wcwidth_ucs(ucs);
+}
+
#if 0 /* original */
-int wcswidth(const wchar_t *pwcs, size_t n)
+int wcswidth_cjk(const wchar_t *pwcs, size_t n)
{
int w, width = 0;
for (;*pwcs && n-- > 0; pwcs++)
- if ((w = wcwidth(*pwcs)) < 0)
+ if ((w = wcwidth_cjk(*pwcs)) < 0)
return -1;
else
width += w;
@@ -183,3 +339,4 @@ int wcswidth(const wchar_t *pwcs, size_t n)
return width;
}
#endif
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
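The wcwidth.c changes factor the interval search into bisearch() and add wcwidth_cjk(), which returns 2 for characters in the East Asian Ambiguous table (for example U+25CE "◎" and U+2605 "★") or in the Japanese legacy table before falling back to wcwidth_ucs(). The sketch below shows the same lookup order in a self-contained form. Its tables are small illustrative subsets rather than the full tables above, its wide ranges are a simplification of the ones in wcwidth_ucs(), and it handles neither combining characters (width 0) nor the legacy_ja table.

```c
#include <assert.h>
#include <wchar.h>

struct interval { wchar_t first; wchar_t last; };

/* Binary search: nonzero if ucs lies in one of the sorted, non-overlapping
 * intervals table[0..max], as in the bisearch() added by the patch. */
static int in_table (wchar_t ucs, const struct interval *table, int max)
{
  int min = 0, mid;

  assert (table != NULL);
  if (max < 0 || ucs < table[0].first || ucs > table[max].last)
    return 0;
  while (max >= min)
  {
    mid = (min + max) / 2;
    if (ucs > table[mid].last)
      min = mid + 1;
    else if (ucs < table[mid].first)
      max = mid - 1;
    else
      return 1;
  }
  return 0;
}

#define TABLE_MAX(t) ((int) (sizeof (t) / sizeof ((t)[0])) - 1)

/* Column width with East Asian Ambiguous characters treated as wide.
 * Both tables are small illustrative subsets, not the full tables above. */
static int width_cjk_sketch (wchar_t ucs)
{
  static const struct interval ambiguous_subset[] = {
    { 0x00A7, 0x00A8 }, { 0x00B0, 0x00B4 }, { 0x0391, 0x03A1 },
    { 0x2500, 0x254B }, { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 },
    { 0x2605, 0x2606 }
  };
  static const struct interval wide_subset[] = {
    { 0x1100, 0x115F }, { 0x3000, 0x303E }, { 0x3041, 0x33FF },
    { 0x4E00, 0x9FFF }, { 0xAC00, 0xD7A3 }, { 0xFF00, 0xFF60 }
  };

  if (ucs == 0)
    return 0;
  if (ucs < 32 || (ucs >= 0x7F && ucs < 0xA0))
    return -1;                                  /* C0/C1 controls */
  if (in_table (ucs, ambiguous_subset, TABLE_MAX (ambiguous_subset)))
    return 2;                                   /* e.g. U+25CE, U+2605 */
  if (in_table (ucs, wide_subset, TABLE_MAX (wide_subset)))
    return 2;                                   /* CJK, kana, Hangul, fullwidth */
  return 1;
}
```

Checking the Ambiguous table first and only then the regular wide ranges mirrors wcwidth_cjk(), which consults its own tables before delegating to wcwidth_ucs().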
diff --git a/PATCHES b/PATCHES
index e69de29..17743fd 100644
--- a/PATCHES
+++ b/PATCHES
@@ -0,0 +1,5 @@
+patch-1.5.23.tt+yy.delete_prefix.1
+patch-1.5.23.tt.create_rfc2047_params.1
+patch-1.5.23.tt.sanitize_ja.1
+patch-1.5.23.tt.cjk_width_tree_chars.1
+patch-1.5.23.tt.wcwidth.1
diff --git a/charset.c b/charset.c
index 2411f2c..6a5cbd4 100644
--- a/charset.c
+++ b/charset.c
@@ -481,6 +481,9 @@ int mutt_convert_string (char **ps, const char *from, const char *to, int flags)
if (!s || !*s)
return 0;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars (s, mutt_strlen(s), 0);
+
if (to && from && (cd = mutt_iconv_open (to, from, flags)) != (iconv_t)-1)
{
int len;
@@ -677,3 +680,188 @@ int mutt_check_charset (const char *s, int strict)
return -1;
}
+
+/*
+ * mutt_sanitize_ja_chars()
+ * Adapted by TAKIZAWA Takashi <taki@cyber.email.ne.jp>
+ *
+ * - It replaces undefined KANJI characters with the GETA mark.
+ * - It replaces 'JIS X 0201 kana' characters with '?'.
+ * - If $charset is EUC-JP, it replaces the third character 'J' of the
+ * escape sequence switching to 'JIS X 0201 latin' with 'B', indicating
+ * 'US-ASCII'.
+ * - If $charset is Shift_JIS, it replaces the third character 'B' of the
+ * escape sequence switching to 'US-ASCII' with 'J', indicating
+ * 'JIS X 0201 latin'.
+ */
+
+#define ASCII 0
+#define JISX0201LATIN 1
+#define JISX0201KANA 2
+#define JISX0208 3
+#define OTHER_CS 4
+
+void mutt_sanitize_ja_chars(char *s, size_t len, int keep_state)
+{
+ static int cs = ASCII;
+ static int kanji_cont = 0;
+ static int illegal_kanji = 0;
+ static int es = 0;
+ static char pes = '\0';
+ static char ascii_3rd_char = 'B';
+ static char jisx0201_3rd_char = 'J';
+
+ char *p = s;
+ char *p1 = NULL;
+ unsigned char c;
+
+ if (!keep_state || *p == 0x1b) /* consideration about mbstate's buffer */
+ {
+ if (!ascii_strcasecmp (Charset, "euc-jp"))
+ jisx0201_3rd_char = 'B';
+ else if (!ascii_strcasecmp (Charset, "shift_jis"))
+ ascii_3rd_char = 'J';
+ cs = ASCII;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ es = 0;
+ pes = '\0';
+ }
+
+ for (;p - s < len;p++)
+ {
+ if (es == 0)
+ {
+ if (*p == 0x1b)
+ es++;
+ else
+ {
+ switch (cs)
+ {
+ case ASCII:
+ case JISX0201LATIN:
+ break;
+ case JISX0201KANA:
+ *p = '?';
+ break;
+ case JISX0208:
+ /* replace ku-ten code from 9 to 15 and 85 or more to "GETA MARK" */
+ c = (unsigned char)*p;
+ if (! kanji_cont)
+ {
+ if ((size_t)(p - s + 1) == len)
+ return; /* the last character is a primary byte of KANJI */
+ if (c <= 0x20 || (c >= 0x29 && c <= 0x2f)
+ || (c >= 0x75 && c <= 0xa0))
+ illegal_kanji = 1;
+ kanji_cont = 1;
+ p1 = p;
+ }
+ else
+ {
+ if (c <= 0x20 || c >= 0x7f)
+ illegal_kanji = 1;
+ if (illegal_kanji && p1)
+ *p1 = 0x22, *p = 0x2e;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ }
+ break;
+ }
+ }
+ }
+ else if (es == 1)
+ {
+ if (*p == '$' || (*p >= '(' && *p <= '/' && *p != ','))
+ {
+ es++;
+ pes = *p;
+ }
+ else
+ {
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else if (es == 2)
+ {
+ if (pes == '(')
+ {
+ switch (*p)
+ {
+ case 'B':
+ cs = ASCII, *p = ascii_3rd_char;
+ break;
+ case 'J':
+ cs = JISX0201LATIN, *p = jisx0201_3rd_char;
+ break;
+ case 'I':
+ /* ready to replace character to '?' */
+ cs = JISX0201KANA, *p = ascii_3rd_char;
+ break;
+ default:
+ cs = OTHER_CS;
+ }
+ es = 0;
+ }
+ else if (pes == '$')
+ {
+ switch (*p)
+ {
+ case '@': /* JIS X 0208-1978 */
+ case 'B': /* JIS X 0208-1983 */
+ cs = JISX0208;
+ es = 0;
+ break;
+ case 'A':
+ cs = OTHER_CS; /* GB 2312 */
+ es = 0;
+ break;
+ case '(':
+ case ')':
+ case '*':
+ case '+':
+ case '-':
+ case '.':
+ case '/':
+ es++;
+ break;
+ default:
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+ else /* es == 3 */
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+}
+
+int mutt_copy_bytes_sanitize_ja (FILE *in, FILE *out, size_t size)
+{
+ char buf[2048];
+ size_t chunk;
+
+ mutt_sanitize_ja_chars (NULL, 0, 0);
+ while (size > 0)
+ {
+ chunk = (size > sizeof (buf)) ? sizeof (buf) : size;
+ if ((chunk = fread (buf, 1, chunk, in)) < 1)
+ break;
+ mutt_sanitize_ja_chars (buf, chunk, 1);
+ if (fwrite (buf, 1, chunk, out) != chunk)
+ return (-1);
+ size -= chunk;
+ }
+
+ return 0;
+}
+
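mutt_sanitize_ja_chars() above walks ISO-2022-JP text with a small escape-sequence state machine. The sketch below isolates one of its rules: text designated as JIS X 0201 katakana by ESC ( I is replaced with '?', and the designator byte 'I' is rewritten so that the converter sees a switch to US-ASCII instead. mutt writes ascii_3rd_char there, which becomes 'J' when $charset is Shift_JIS; the sketch always writes 'B'. It is a simplification: it keeps no state between calls, does not check JIS X 0208 ku-ten ranges, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified version of the JIS X 0201 kana rule in
 * mutt_sanitize_ja_chars(): bytes after ESC ( I become '?', and the
 * designator 'I' is rewritten to 'B' (US-ASCII).  Other designations
 * are left untouched and end the kana run. */
static void sanitize_jisx0201_kana (char *s, size_t len)
{
  int in_kana = 0;
  size_t i;

  assert (s != NULL || len == 0);
  for (i = 0; i < len; i++)
  {
    if (s[i] == 0x1b)
    {
      if (i + 2 >= len)
        break;                  /* truncated escape sequence: leave it */
      in_kana = (s[i + 1] == '(' && s[i + 2] == 'I');
      if (in_kana)
        s[i + 2] = 'B';         /* ESC ( I  becomes  ESC ( B */
      /* ESC $ ( x style designations are four bytes, the others three */
      i += (s[i + 1] == '$' && s[i + 2] >= '(' && s[i + 2] <= '/') ? 3 : 2;
    }
    else if (in_kana)
      s[i] = '?';
  }
}
```

Unlike this sketch, the real function keeps its state in static variables when called with keep_state set, so an escape sequence split across two buffers (as in mutt_copy_bytes_sanitize_ja()) is still recognized.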
diff --git a/charset.h b/charset.h
index 54891f0..d67b209 100644
--- a/charset.h
+++ b/charset.h
@@ -36,6 +36,9 @@ int iconv_close (iconv_t);
int mutt_convert_string (char **, const char *, const char *, int);
+void mutt_sanitize_ja_chars (char *, size_t, int);
+int mutt_copy_bytes_sanitize_ja (FILE *, FILE *, size_t);
+
iconv_t mutt_iconv_open (const char *, const char *, int);
size_t mutt_iconv (iconv_t, ICONV_CONST char **, size_t *, char **, size_t *, ICONV_CONST char **, const char *);
diff --git a/configure.ac b/configure.ac
index 4155018..f14a841 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1415,6 +1415,16 @@ fi
dnl -- locales --
+AC_ARG_ENABLE(cjk-ambiguous-width, AC_HELP_STRING([--enable-cjk-ambiguous-width], [ Enable East Asian Ambiguous characters support (using own wcwidth)]),
+ [ if test "x$enableval" = "xyes" ; then
+ cjk_width=yes
+ fi
+ ])
+if test "x$cjk_width" = "xyes" ; then
+ AC_DEFINE(USE_CJK_WIDTH,1,[ Define if you want to support East Asian Ambiguous class. ])
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+fi
+
AC_CHECK_HEADERS(wchar.h)
AC_CACHE_CHECK([for wchar_t], mutt_cv_wchar_t,
@@ -1485,7 +1495,10 @@ fi
if test $wc_funcs = yes; then
AC_DEFINE(HAVE_WC_FUNCS,1,[ Define if you are using the system's wchar_t functions. ])
else
- MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o"
+ if test "x$cjk_width" != "xyes"; then
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+ fi
fi
AC_CACHE_CHECK([for nl_langinfo and CODESET], mutt_cv_langinfo_codeset,
diff --git a/curs_lib.c b/curs_lib.c
index eef8dd2..db3ac2e 100644
--- a/curs_lib.c
+++ b/curs_lib.c
@@ -1177,7 +1177,14 @@ void mutt_format_string (char *dest, size_t destlen,
wc = replacement_char ();
}
if (arboreal && wc < MUTT_TREE_MAX)
- w = 1; /* hack */
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w = wcwidth (TreeChars[wc]);
+ else
+#endif
+ w = 1;
+ }
else
{
#ifdef HAVE_ISWBLANK
@@ -1406,10 +1413,12 @@ int mutt_strwidth (const char *s)
int w;
size_t k, n;
mbstate_t mbstate;
+ int arboreal;
if (!s) return 0;
n = mutt_strlen (s);
+ arboreal = ((unsigned char) s[0] < MUTT_TREE_MAX) ? 1 : 0;
memset (&mbstate, 0, sizeof (mbstate));
for (w=0; n && (k = mbrtowc (&wc, s, n, &mbstate)); s += k, n -= k)
@@ -1421,9 +1430,21 @@ int mutt_strwidth (const char *s)
k = (k == (size_t)(-1)) ? 1 : n;
wc = replacement_char ();
}
- if (!IsWPrint (wc))
- wc = '?';
- w += wcwidth (wc);
+ if (wc < MUTT_TREE_MAX && arboreal && k == 1)
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w += wcwidth (TreeChars[wc]);
+ else
+#endif
+ w++;
+ }
+ else
+ {
+ if (!IsWPrint (wc))
+ wc = '?';
+ w += wcwidth (wc);
+ }
}
return w;
}
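When cjk_width_tree_chars is set in a UTF-8 locale (and $ascii_chars is unset), the curs_lib.c hunks above measure each thread-tree marker byte by the width of the corresponding TreeChars[] entry from globals.h instead of counting it as one column. The box-drawing characters in that table (U+2500 to U+2534) all lie in the East Asian Ambiguous range U+24EB..U+254B, so a CJK-width wcwidth() makes them two columns wide. The sketch below shows the effect on a tree prefix; tree_char_width() is a hypothetical stand-in for wcwidth() that knows only that range.

```c
#include <assert.h>
#include <wchar.h>

/* Same values as the TreeChars[] table added to globals.h; the index is the
 * tree marker byte (MUTT_TREE_LLCORNER = 1 ... MUTT_TREE_MISSING = 13). */
static const wchar_t tree_chars[] = {
  0xFEFF, 0x2514, 0x250C, 0x251C, 0x2500, 0x2502, 0x0020,
  0x003E, 0x002A, 0x0026, 0x003D, 0x252C, 0x2534, 0x003F
};

/* Hypothetical stand-in for a CJK-width wcwidth(): only the East Asian
 * Ambiguous range U+24EB..U+254B, which holds every box-drawing character
 * in tree_chars[], is widened; everything else counts as one column. */
static int tree_char_width (wchar_t wc, int cjk_tree)
{
  return (cjk_tree && wc >= 0x24EB && wc <= 0x254B) ? 2 : 1;
}

/* Column width of a tree prefix built from marker bytes, in the spirit of
 * the arboreal branch added to mutt_strwidth(). */
static int tree_prefix_width (const char *s, int cjk_tree)
{
  int w = 0;

  assert (s != NULL);
  for (; *s; s++)
  {
    unsigned char c = (unsigned char) *s;

    if (c < sizeof (tree_chars) / sizeof (tree_chars[0]))
      w += tree_char_width (tree_chars[c], cjk_tree);
    else
      w += 1;       /* not a marker byte: assume an ASCII character */
  }
  return w;
}
```

For a prefix made of LTEE, HLINE and RARROW markers, the width grows from 3 to 5 columns, which is why mutt_format_string() and mutt_strwidth() must agree on the same rule.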
diff --git a/doc/makedoc-defs.h b/doc/makedoc-defs.h
index 78a4ebc..dd872ba 100644
--- a/doc/makedoc-defs.h
+++ b/doc/makedoc-defs.h
@@ -31,10 +31,10 @@
# ifndef USE_SOCKET
# define USE_SOCKET
# endif
-# ifndef USE_DOTLOCK
+# if !defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
# define USE_DOTLOCK
# endif
-# ifndef DL_STANDALONE
+# if !defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define DL_STANDALONE
# endif
# ifndef USE_HCACHE
diff --git a/dotlock.c b/dotlock.c
index 5bf0348..5d87850 100644
--- a/dotlock.c
+++ b/dotlock.c
@@ -52,13 +52,13 @@
#include <getopt.h>
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# include "reldate.h"
#endif
#define MAXLINKS 1024 /* maximum link depth */
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define LONG_STRING 1024
# define MAXLOCKATTEMPT 5
@@ -96,7 +96,7 @@ extern int snprintf (char *, size_t, const char *, ...);
static int DotlockFlags;
static int Retry = MAXLOCKATTEMPT;
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static char *Hostname;
#endif
@@ -110,7 +110,7 @@ static int dotlock_prepare (char *, size_t, const char *, int fd);
static int dotlock_check_stats (struct stat *, struct stat *);
static int dotlock_dispatch (const char *, int fd);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int dotlock_init_privs (void);
static void usage (const char *);
#endif
@@ -130,7 +130,7 @@ static int dotlock_unlink (const char *);
static int dotlock_lock (const char *);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
#define check_flags(a) if (a & DL_FL_ACTIONS) usage (argv[0])
@@ -327,7 +327,7 @@ END_PRIVILEGED (void)
#endif
}
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
/*
* Usage information.
diff --git a/globals.h b/globals.h
index 2aa96f9..4a51586 100644
--- a/globals.h
+++ b/globals.h
@@ -24,7 +24,7 @@ WHERE CONTEXT *Context;
WHERE char Errorbuf[STRING];
WHERE char AttachmentMarker[STRING];
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
WHERE char *MuttDotlock;
#endif
@@ -300,9 +300,31 @@ const char * const Months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
const char * const BodyTypes[] = { "x-unknown", "audio", "application", "image", "message", "model", "multipart", "text", "video" };
const char * const BodyEncodings[] = { "x-unknown", "7bit", "8bit", "quoted-printable", "base64", "binary", "x-uuencoded" };
+#ifdef USE_CJK_WIDTH
+const wchar_t TreeChars[] =
+{
+ 0xFEFF, /* not used */
+ 0x2514, /* MUTT_TREE_LLCORNER WACS_LLCORNER */
+ 0x250C, /* MUTT_TREE_ULCORNER WACS_ULCORNER */
+ 0x251C, /* MUTT_TREE_LTEE WACS_LTEE */
+ 0x2500, /* MUTT_TREE_HLINE WACS_HLINE */
+ 0x2502, /* MUTT_TREE_VLINE WACS_VLINE */
+ 0x0020, /* MUTT_TREE_SPACE */
+ 0x003E, /* MUTT_TREE_RARROW */
+ 0x002A, /* MUTT_TREE_STAR fake thread indicator */
+ 0x0026, /* MUTT_TREE_HIDDEN */
+ 0x003D, /* MUTT_TREE_EQUALS */
+ 0x252C, /* MUTT_TREE_TTEE WACS_TTEE */
+ 0x2534, /* MUTT_TREE_BTEE WACS_BTEE */
+ 0x003F /* MUTT_TREE_MISSING */
+};
+#endif /* USE_CJK_WIDTH */
#else
extern const char * const Weekdays[];
extern const char * const Months[];
+#ifdef USE_CJK_WIDTH
+extern const wchar_t TreeChars[];
+#endif /* USE_CJK_WIDTH */
#endif
#ifdef MAIN_C
diff --git a/handler.c b/handler.c
index 7ce53f9..ab69527 100644
--- a/handler.c
+++ b/handler.c
@@ -100,6 +100,9 @@ static void mutt_convert_to_state(iconv_t cd, char *bufi, size_t *l, STATE *s)
return;
}
+ if (option (OPTSANITIZEJACHARS) && memchr (bufi, 0x1b, *l))
+ mutt_sanitize_ja_chars (bufi, *l, 1);
+
ib = bufi, ibl = *l;
for (;;)
{
@@ -1312,6 +1315,7 @@ static int autoview_handler (BODY *a, STATE *s)
int piped = FALSE;
pid_t thepid;
int rc = 0;
+ char *charset;
snprintf (type, sizeof (type), "%s/%s", TYPE (a), a->subtype);
rfc1524_mailcap_lookup (a, type, entry, MUTT_AUTOVIEW);
@@ -1342,6 +1346,10 @@ static int autoview_handler (BODY *a, STATE *s)
return -1;
}
+ charset = mutt_get_parameter ("charset", a->parameter);
+ if (charset && option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (charset,"iso-2022-jp", 11))
+ mutt_copy_bytes_sanitize_ja (s->fpin, fpin, a->length);
+ else
mutt_copy_bytes (s->fpin, fpin, a->length);
if(!piped)
diff --git a/hdrline.c b/hdrline.c
index ea76e83..c3a4d92 100644
--- a/hdrline.c
+++ b/hdrline.c
@@ -272,6 +272,7 @@ hdr_format_str (char *dest,
#define THREAD_NEW (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 1)
#define THREAD_OLD (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 2)
size_t len;
+ char *subj;
hdr = hfi->hdr;
ctx = hfi->ctx;
@@ -590,6 +591,7 @@ hdr_format_str (char *dest,
subj = apply_subject_mods(hdr->env);
else
subj = hdr->env->subject;
+ subj = option (OPTDELETEPREFIX) ? hdr->env->real_subj : hdr->env->subject;
if (flags & MUTT_FORMAT_TREE && !hdr->collapsed)
{
if (flags & MUTT_FORMAT_FORCESUBJ)
diff --git a/init.h b/init.h
index e632ab9..eb8fd8a 100644
--- a/init.h
+++ b/init.h
@@ -440,6 +440,31 @@ struct option_t MuttVars[] = {
** this variable is \fIunset\fP, no check for new mail is performed
** while the mailbox is open.
*/
+#ifdef USE_CJK_WIDTH
+ { "cjk_width", DT_BOOL, R_NONE, OPTCJKWIDTH, 0 },
+ /*
+ ** .pp
+ ** When this option is set, characters in the East Asian Ambiguous (A)
+ ** category as defined in Unicode Technical Report #11 have a column
+ ** width of 2. Otherwise, they have a column width of 1.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect in UTF-8 encoding.
+ */
+ { "cjk_width_tree_chars", DT_BOOL, R_NONE, OPTCJKWIDTHTREECHARS, 0 },
+ /*
+ ** .pp
+ ** If \fIset\fP, Mutt will use the result of $cjk_width as a column
+ ** width of WACS characters when displaying thread and attachment trees.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect in UTF-8 encoding.
+ */
+#endif
{ "collapse_unread", DT_BOOL, R_NONE, OPTCOLLAPSEUNREAD, 1 },
/*
** .pp
@@ -645,6 +670,17 @@ struct option_t MuttVars[] = {
** If \fI``no''\fP, never attempt to verify cryptographic signatures.
** (Crypto only)
*/
+ { "create_rfc2047_parameters", DT_BOOL, R_NONE, OPTCREATERFC2047PARAMS, 0 },
+ /*
+ ** .pp
+ ** When this variable is set, Mutt will add an RFC 2047-encoded "name"
+ ** parameter carrying the attachment filename to the Content-Type header field, e.g.:
+ ** name="=?iso-2022-jp?B?GyRCO244MxsoQi50eHQ=?="
+ ** .pp
+ ** Note: this use of RFC 2047 encoding is explicitly prohibited
+ ** by the standard. Set this variable only if the recipients' mailers
+ ** cannot parse RFC 2231 parameters.
+ */
{ "date_format", DT_STR, R_MENU, UL &DateFmt, UL "!%a, %b %d, %Y at %I:%M:%S%p %Z" },
/*
** .pp
@@ -696,6 +732,19 @@ struct option_t MuttVars[] = {
** If this option is \fIset\fP, mutt's received-attachments menu will not show the subparts of
** individual messages in a multipart/digest. To see these subparts, press ``v'' on that menu.
*/
+ { "delete_prefix", DT_BOOL, R_NONE, OPTDELETEPREFIX, 0 },
+ /*
+ ** .pp
+ ** If set, the prefix that some mailing lists add to the Subject: field
+ ** (something like "Subject: [foo-ML:0012] real-subject") is deleted
+ ** when subjects are displayed in the index and when editing a reply.
+ ** The deletion pattern can be configured with the $$delete_regexp variable.
+ */
+ { "delete_regexp", DT_RX, R_NONE, UL &DeleteRegexp, UL "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)" },
+ /*
+ ** .pp
+ ** The regular expression used by the $$delete_prefix option.
+ */
{ "display_filter", DT_PATH, R_PAGER, UL &DisplayFilter, UL "" },
/*
** .pp
@@ -703,7 +752,7 @@ struct option_t MuttVars[] = {
** is viewed it is passed as standard input to $$display_filter, and the
** filtered message is read from the standard output.
*/
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
{ "dotlock_program", DT_PATH, R_NONE, UL &MuttDotlock, UL BINDIR "/mutt_dotlock" },
/*
** .pp
@@ -2773,6 +2822,28 @@ struct option_t MuttVars[] = {
** that mutt \fIgenerates\fP this kind of encoding. Instead, mutt will
** unconditionally use the encoding specified in RFC2231.
*/
+ { "sanitize_ja_chars", DT_BOOL, R_NONE, OPTSANITIZEJACHARS, 0 },
+ /*
+ ** .pp
+ ** When set, Japanese "platform dependent characters" (characters
+ ** that are illegal in the iso-2022-jp charset, mainly produced by MS-Windows
+ ** mailers) are replaced with a special character, the GETA mark ('ESC $$ B " .
+ ** ESC ( B' in iso-2022-jp), and JIS X 0201 kana characters
+ ** (only in the "ESC ) I" case) are replaced with "?" to
+ ** prevent garbled output. JIS X 0201 kana characters are
+ ** not replaced if they appear in 8-bit form.
+ ** .pp
+ ** This also fixes another Japanese encoding issue. When $$charset
+ ** is set to "EUC-JP", which does not contain the JIS X 0201 Roman
+ ** character set, the JIS X 0201 Roman parts of received messages
+ ** encoded in iso-2022-jp cannot be converted to EUC-JP.
+ ** Likewise, when $$charset is "Shift_JIS", which does not contain
+ ** the ASCII character set, the ASCII parts cannot be converted, so
+ ** the converted text is garbled in both cases. When this
+ ** option is set, the JIS X 0201 Roman escape sequence and the
+ ** ASCII escape sequence are replaced appropriately to prevent
+ ** the output from being garbled.
+ */
{ "save_address", DT_BOOL, R_NONE, OPTSAVEADDRESS, 0 },
/*
** .pp
diff --git a/lib.c b/lib.c
index 583d2ff..1f61b39 100644
--- a/lib.c
+++ b/lib.c
@@ -445,6 +445,10 @@ int safe_symlink(const char *oldpath, const char *newpath)
int safe_rename (const char *src, const char *target)
{
+#ifdef NO_USE_HARDLINK
+ /* Android (since 6.0) does not support hardlinks. */
+ return rename(src, target);
+#else
struct stat ssb, tsb;
if (!src || !target)
@@ -537,6 +541,7 @@ int safe_rename (const char *src, const char *target)
return 0;
+#endif /* NO_USE_HARDLINK */
}
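The `NO_USE_HARDLINK` branch above replaces the `link(2)`-based body of `safe_rename()` with a plain `rename(2)`. One behavioural difference is worth keeping in mind: `link(2)` fails with `EEXIST` when the target already exists, whereas `rename(2)` silently replaces it. The following is a heavily simplified sketch of the two code paths side by side; the name `sketch_safe_rename` is hypothetical, and mutt's real function additionally falls back to `rename(2)` on cross-device links (`EXDEV`) and verifies the result with `stat()`.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>   /* rename() */
#include <unistd.h>  /* link(), unlink() */

/* Hypothetical, simplified stand-in for mutt's safe_rename(). */
int sketch_safe_rename(const char *src, const char *target)
{
  if (!src || !target)
    return -1;
#ifdef NO_USE_HARDLINK
  /* rename(2) atomically replaces an existing target. */
  return rename(src, target);
#else
  /* link(2) never overwrites: it fails with EEXIST if target exists. */
  if (link(src, target) != 0)
    return -1;
  /* Both names now refer to the same inode; drop the old name. */
  return unlink(src);
#endif
}
```

Building with `CFLAGS="-DNO_USE_HARDLINK"` selects the first branch, so no hard link is ever created.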
diff --git a/main.c b/main.c
index de5a538..17c5524 100644
--- a/main.c
+++ b/main.c
@@ -267,25 +267,25 @@ static void show_version (void)
"-USE_SETGID "
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
"+USE_DOTLOCK "
#else
"-USE_DOTLOCK "
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
"+DL_STANDALONE "
#else
"-DL_STANDALONE "
#endif
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
"+USE_FCNTL "
#else
"-USE_FCNTL "
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
"+USE_FLOCK "
#else
"-USE_FLOCK "
@@ -452,6 +452,12 @@ static void show_version (void)
"-LOCALES_HACK "
#endif
+#ifdef USE_CJK_WIDTH
+ "+USE_CJK_WIDTH "
+#else
+ "-USE_CJK_WIDTH "
+#endif
+
#ifdef HAVE_WC_FUNCS
"+HAVE_WC_FUNCS "
#else
diff --git a/mbyte.c b/mbyte.c
index 0eedaa7..8032bd3 100644
--- a/mbyte.c
+++ b/mbyte.c
@@ -17,7 +17,7 @@
*/
/*
- * Japanese support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
+ * CJK support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
*/
#if HAVE_CONFIG_H
@@ -37,8 +37,8 @@
#endif
int Charset_is_utf8 = 0;
+static int charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
-static int charset_is_ja = 0;
static iconv_t charset_to_utf8 = (iconv_t)(-1);
static iconv_t charset_from_utf8 = (iconv_t)(-1);
#endif
@@ -50,8 +50,8 @@ void mutt_set_charset (char *charset)
mutt_canonical_charset (buffer, sizeof (buffer), charset);
Charset_is_utf8 = 0;
+ charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
- charset_is_ja = 0;
if (charset_to_utf8 != (iconv_t)(-1))
{
iconv_close (charset_to_utf8);
@@ -66,12 +66,18 @@ void mutt_set_charset (char *charset)
if (mutt_is_utf8 (buffer))
Charset_is_utf8 = 1;
-#ifndef HAVE_WC_FUNCS
- else if (!ascii_strcasecmp(buffer, "euc-jp") || !ascii_strcasecmp(buffer, "shift_jis")
- || !ascii_strcasecmp(buffer, "cp932") || !ascii_strcasecmp(buffer, "eucJP-ms"))
+ else if (!ascii_strcasecmp (buffer, "gb2312") ||
+ !ascii_strcasecmp (buffer, "gb18030") ||
+ !ascii_strcasecmp (buffer, "big5") ||
+ !ascii_strcasecmp (buffer, "euc-tw") ||
+ !ascii_strcasecmp (buffer, "EUC-JP") ||
+ !ascii_strcasecmp (buffer, "eucJP-ms") ||
+ !ascii_strcasecmp (buffer, "Shift_JIS") ||
+ !ascii_strcasecmp (buffer, "cp932") ||
+ !ascii_strcasecmp (buffer, "euc-kr"))
{
- charset_is_ja = 1;
-
+ charset_is_cjk = 1;
+#ifndef HAVE_WC_FUNCS
/* Note flags=0 to skip charset-hooks: User masters the $charset
* name, and we are sure of our "utf-8" constant. So there is no
* possibility of wrong name that we would want to try to correct
@@ -80,24 +86,68 @@ void mutt_set_charset (char *charset)
*/
charset_to_utf8 = mutt_iconv_open ("utf-8", charset, 0);
charset_from_utf8 = mutt_iconv_open (charset, "utf-8", 0);
- }
#endif
+ }
#if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(ENABLE_NLS)
bind_textdomain_codeset(PACKAGE, buffer);
#endif
}
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+/*
+ * For systems that don't have a correctly functioning wcwidth(),
+ * we provide our own wcwidth().
+ * Furthermore, this wcwidth() allows the character-cell width of
+ * the East Asian Ambiguous class to be changed via $cjk_width,
+ * which the wcwidth() provided by most systems cannot do.
+ * See the comment in wcwidth.c about the East Asian Ambiguous
+ * class for details.
+ */
+int wcwidth_ucs(wchar_t ucs);
+int wcwidth_cjk(wchar_t ucs);
+
+int wcwidth (wchar_t wc)
+{
+ if (!Charset_is_utf8)
+ {
+ if (!charset_is_cjk)
+ {
+ /* 8-bit case */
+ if (!wc)
+ return 0;
+ else if ((0 <= wc && wc < 256) && IsPrint (wc))
+ return 1;
+ else
+ return -1;
+ }
+ else
+ {
+ /* CJK */
+ return wcwidth_cjk (wc);
+ }
+ }
+ else {
+#ifdef USE_CJK_WIDTH
+ if (option (OPTCJKWIDTH))
+ return wcwidth_cjk (wc);
+#endif /* USE_CJK_WIDTH */
+ return wcwidth_ucs (wc);
+ }
+}
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
+
+
#ifndef HAVE_WC_FUNCS
/*
* For systems that don't have them, we provide here our own
- * implementations of wcrtomb(), mbrtowc(), iswprint() and wcwidth().
+ * implementations of wcrtomb(), mbrtowc() and iswprint().
* Instead of using the locale, as these functions normally would,
* we use Mutt's Charset variable. We support 3 types of charset:
* (1) For 8-bit charsets, wchar_t uses the same encoding as char.
* (2) For UTF-8, wchar_t uses UCS.
- * (3) For stateless Japanese encodings, we use UCS and convert
+ * (3) For stateless CJK encodings, we use UCS and convert
* via UTF-8 using iconv.
* Unfortunately, we can't handle non-stateless encodings.
*/
@@ -256,7 +306,7 @@ size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
int iswprint (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return ((0x20 <= wc && wc < 0x7f) || 0xa0 <= wc);
else
return (0 <= wc && wc < 256) ? IsPrint (wc) : 0;
@@ -264,7 +314,7 @@ int iswprint (wint_t wc)
int iswspace (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return (9 <= wc && wc <= 13) || wc == 32;
else
return (0 <= wc && wc < 256) ? isspace (wc) : 0;
@@ -347,7 +397,7 @@ static int iswalpha_ucs (wint_t wc)
wint_t towupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? toupper (wc) : wc;
@@ -355,7 +405,7 @@ wint_t towupper (wint_t wc)
wint_t towlower (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towlower_ucs (wc);
else
return (0 <= wc && wc < 256) ? tolower (wc) : wc;
@@ -363,7 +413,7 @@ wint_t towlower (wint_t wc)
int iswalnum (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalnum_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalnum (wc) : 0;
@@ -371,7 +421,7 @@ int iswalnum (wint_t wc)
int iswalpha (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalpha_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalpha (wc) : 0;
@@ -379,58 +429,12 @@ int iswalpha (wint_t wc)
int iswupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? isupper (wc) : 0;
}
-/*
- * l10n for Japanese:
- * Symbols, Greek and Cyrillic in JIS X 0208, Japanese Kanji
- * Character Set, have a column width of 2.
- */
-int wcwidth_ja (wchar_t ucs)
-{
- if (ucs >= 0x3021)
- return -1; /* continue with the normal check */
- /* a rough range for quick check */
- if ((ucs >= 0x00a1 && ucs <= 0x00fe) || /* Latin-1 Supplement */
- (ucs >= 0x0391 && ucs <= 0x0451) || /* Greek and Cyrillic */
- (ucs >= 0x2010 && ucs <= 0x266f) || /* Symbols */
- (ucs >= 0x3000 && ucs <= 0x3020)) /* CJK Symbols and Punctuation */
- return 2;
- else
- return -1;
-}
-
-int wcwidth_ucs(wchar_t ucs);
-
-int wcwidth (wchar_t wc)
-{
- if (!Charset_is_utf8)
- {
- if (!charset_is_ja)
- {
- /* 8-bit case */
- if (!wc)
- return 0;
- else if ((0 <= wc && wc < 256) && IsPrint (wc))
- return 1;
- else
- return -1;
- }
- else
- {
- /* Japanese */
- int k = wcwidth_ja (wc);
- if (k != -1)
- return k;
- }
- }
- return wcwidth_ucs (wc);
-}
-
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps)
{
static wchar_t mbstate;
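The new `wcwidth()` in mbyte.c chooses a width function from the current charset: 8-bit charsets use `IsPrint()`, the legacy CJK charsets listed in `mutt_set_charset()` always use `wcwidth_cjk()`, and UTF-8 uses `wcwidth_cjk()` only when `$cjk_width` is set, otherwise `wcwidth_ucs()`. The sketch below reproduces that dispatch in self-contained form, under stated assumptions: the charset is passed as a hypothetical `enum cs_kind` instead of mutt's globals, the ambiguous table is a short excerpt of East Asian Ambiguous ranges (Greek capitals, box drawing, ◎, ★), the wide-character test keeps only the CJK-to-Yi range from `wcwidth_ucs()`, and combining characters are not handled.

```c
#include <ctype.h>
#include <stdbool.h>
#include <wchar.h>

struct interval { wchar_t first, last; };

/* Hypothetical stand-in for mutt's Charset_is_utf8 / charset_is_cjk. */
enum cs_kind { CS_8BIT, CS_CJK_LEGACY, CS_UTF8 };

/* Same binary search as bisearch() in the patched wcwidth.c. */
static int in_table(wchar_t ucs, const struct interval *t, int max)
{
  int min = 0;
  if (ucs < t[0].first || ucs > t[max].last)
    return 0;
  while (max >= min) {
    int mid = (min + max) / 2;
    if (ucs > t[mid].last)
      min = mid + 1;
    else if (ucs < t[mid].first)
      max = mid - 1;
    else
      return 1;
  }
  return 0;
}

/* Excerpt only: a few sorted East Asian Ambiguous (A) ranges. */
static const struct interval ambiguous_excerpt[] = {
  { 0x00A7, 0x00A8 }, { 0x0391, 0x03A1 }, { 0x2500, 0x254B },
  { 0x25CE, 0x25D1 }, { 0x2605, 0x2606 }
};
#define N_AMBIG ((int)(sizeof ambiguous_excerpt / sizeof ambiguous_excerpt[0]))

/* Reduced wcwidth_ucs(): controls -1, one wide range 2, else 1. */
static int width_ucs(wchar_t wc)
{
  if (wc == 0)
    return 0;
  if (wc < 32 || (wc >= 0x7f && wc < 0xa0))
    return -1;
  if (wc >= 0x2E80 && wc <= 0xA4CF && wc != 0x303F)
    return 2;  /* CJK ... Yi, as in wcwidth_ucs() */
  return 1;
}

/* Like wcwidth_cjk(): ambiguous characters take 2 columns. */
static int width_cjk(wchar_t wc)
{
  if (in_table(wc, ambiguous_excerpt, N_AMBIG - 1))
    return 2;
  return width_ucs(wc);
}

int sketch_wcwidth(wchar_t wc, enum cs_kind cs, bool cjk_width)
{
  switch (cs) {
  case CS_8BIT:
    if (!wc)
      return 0;
    return (wc >= 0 && wc < 256 && isprint((int)wc)) ? 1 : -1;
  case CS_CJK_LEGACY:
    return width_cjk(wc);
  case CS_UTF8:
  default:
    return cjk_width ? width_cjk(wc) : width_ucs(wc);
  }
}
```

In mutt itself the charset flags are set by `mutt_set_charset()` and the boolean corresponds to `option (OPTCJKWIDTH)`.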
diff --git a/mbyte.h b/mbyte.h
index 9c58c9e..224cafb 100644
--- a/mbyte.h
+++ b/mbyte.h
@@ -8,6 +8,12 @@
# ifdef HAVE_WCTYPE_H
# include <wctype.h>
# endif
+# ifdef USE_CJK_WIDTH
+#ifdef wcwidth
+# undef wcwidth
+#endif
+int wcwidth (wchar_t wc);
+# endif /* USE_CJK_WIDTH */
# endif
# ifndef HAVE_WC_FUNCS
@@ -32,6 +38,9 @@
#ifdef iswupper
# undef iswupper
#endif
+#ifdef wcwidth
+# undef wcwidth
+#endif
size_t wcrtomb (char *s, wchar_t wc, mbstate_t *ps);
size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int iswprint (wint_t wc);
@@ -44,7 +53,6 @@ wint_t towlower (wint_t wc);
int wcwidth (wchar_t wc);
# endif /* !HAVE_WC_FUNCS */
-
void mutt_set_charset (char *charset);
extern int Charset_is_utf8;
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps);
diff --git a/mutt.h b/mutt.h
index 7f49c11..b592bce 100644
--- a/mutt.h
+++ b/mutt.h
@@ -362,10 +362,16 @@ enum
OPTBROWSERABBRMAILBOXES,
OPTCHECKMBOXSIZE,
OPTCHECKNEW,
+#ifdef USE_CJK_WIDTH
+ OPTCJKWIDTH,
+ OPTCJKWIDTHTREECHARS,
+#endif /* USE_CJK_WIDTH */
OPTCOLLAPSEUNREAD,
OPTCONFIRMAPPEND,
OPTCONFIRMCREATE,
+ OPTCREATERFC2047PARAMS,
OPTDELETEUNTAG,
+ OPTDELETEPREFIX,
OPTDIGESTCOLLAPSE,
OPTDUPTHREADS,
OPTEDITHDRS,
@@ -465,6 +471,7 @@ enum
OPTREVNAME,
OPTREVREAL,
OPTRFC2047PARAMS,
+ OPTSANITIZEJACHARS,
OPTSAVEADDRESS,
OPTSAVEEMPTY,
OPTSAVENAME,
diff --git a/mutt_regex.h b/mutt_regex.h
index f10ecbe..3cd36e0 100644
--- a/mutt_regex.h
+++ b/mutt_regex.h
@@ -52,5 +52,6 @@ WHERE REGEXP QuoteRegexp;
WHERE REGEXP ReplyRegexp;
WHERE REGEXP Smileys;
WHERE REGEXP GecosMask;
+WHERE REGEXP DeleteRegexp;
#endif /* MUTT_REGEX_H */
diff --git a/mx.c b/mx.c
index a015bf4..ea416b3 100644
--- a/mx.c
+++ b/mx.c
@@ -47,7 +47,7 @@
#include "buffy.h"
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
#include "dotlock.h"
#endif
@@ -95,13 +95,13 @@ struct mx_ops* mx_get_ops (int magic)
#define mutt_is_spool(s) (mutt_strcmp (Spoolfile, s) == 0)
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
/* parameters:
* path - file to lock
* retry - should retry if unable to lock?
*/
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int invoke_dotlock (const char *path, int dummy, int flags, int retry)
{
@@ -181,14 +181,14 @@ static int undotlock_file (const char *path, int fd)
*/
int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
-#if defined (USE_FCNTL) || defined (USE_FLOCK)
+#if defined (USE_FCNTL) || defined (USE_FLOCK) || defined(NO_USE_HARDLINK)
int count;
int attempt;
struct stat sb = { 0 }, prev_sb = { 0 }; /* silence gcc warnings */
#endif
int r = 0;
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock lck;
memset (&lck, 0, sizeof (struct flock));
@@ -227,7 +227,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
count = 0;
attempt = 0;
while (flock (fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
@@ -261,7 +261,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FLOCK */
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (r == 0 && dot)
r = dotlock_file (path, fd, timeout);
#endif /* USE_DOTLOCK */
@@ -270,12 +270,12 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
/* release any other locks obtained in this routine */
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
lck.l_type = F_UNLCK;
fcntl (fd, F_SETLK, &lck);
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif /* USE_FLOCK */
}
@@ -285,7 +285,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
int mx_unlock_file (const char *path, int fd, int dot)
{
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock unlockit = { F_UNLCK, 0, 0, 0, 0 };
memset (&unlockit, 0, sizeof (struct flock));
@@ -294,11 +294,11 @@ int mx_unlock_file (const char *path, int fd, int dot)
fcntl (fd, F_SETLK, &unlockit);
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (dot)
undotlock_file (path, fd);
#endif
@@ -309,7 +309,7 @@ int mx_unlock_file (const char *path, int fd, int dot)
static void mx_unlink_empty (const char *path)
{
int fd;
-#ifndef USE_DOTLOCK
+#if !defined(USE_DOTLOCK) || defined(NO_USE_HARDLINK)
struct stat sb;
#endif
@@ -322,7 +322,7 @@ static void mx_unlink_empty (const char *path)
return;
}
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
invoke_dotlock (path, fd, DL_FL_UNLINK, 1);
#else
if (fstat (fd, &sb) == 0 && sb.st_size == 0)
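With `NO_USE_HARDLINK`, the conditionals above switch dot-locking off (mutt_dotlock relies on `link(2)` to create its lock file) and compile in both the `fcntl()` and the `flock()` branches of `mx_lock_file()` and `mx_unlock_file()`, regardless of `USE_FCNTL` and `USE_FLOCK`. A minimal sketch of that lock and unlock order follows; the function names are hypothetical, and the retry loop and timeout handling of the real `mx_lock_file()` are left out.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>     /* fcntl(), struct flock */
#include <string.h>    /* memset() */
#include <sys/file.h>  /* flock(), LOCK_* */
#include <unistd.h>    /* SEEK_SET */

/* Take an fcntl() record lock, then a flock() lock, without blocking. */
int sketch_lock(int fd, int excl)
{
  struct flock lck;
  memset(&lck, 0, sizeof lck);
  lck.l_type = excl ? F_WRLCK : F_RDLCK;
  lck.l_whence = SEEK_SET;  /* l_start = l_len = 0: the whole file */
  if (fcntl(fd, F_SETLK, &lck) == -1)
    return -1;
  if (flock(fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1) {
    /* Release the fcntl() lock obtained above, as mx_lock_file() does. */
    lck.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lck);
    return -1;
  }
  return 0;
}

/* Drop both locks, mirroring mx_unlock_file() without dot-locking. */
int sketch_unlock(int fd)
{
  struct flock lck;
  memset(&lck, 0, sizeof lck);
  lck.l_type = F_UNLCK;
  lck.l_whence = SEEK_SET;
  fcntl(fd, F_SETLK, &lck);
  return flock(fd, LOCK_UN);
}
```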
diff --git a/parse.c b/parse.c
index 0ae5594..745d2fc 100644
--- a/parse.c
+++ b/parse.c
@@ -1453,6 +1453,18 @@ ENVELOPE *mutt_read_rfc822_header (FILE *f, HEADER *hdr, short user_hdrs,
e->real_subj = e->subject + pmatch[0].rm_eo;
else
e->real_subj = e->subject;
+ if (option (OPTDELETEPREFIX))
+ {
+ /* If this option is set, strip a list prefix such as [prefix],
+ * [prefix:number] or [prefix number] from the subject, then any reply prefix after it.
+ */
+ if (regexec (DeleteRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ {
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ if (regexec (ReplyRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ }
+ }
}
if (hdr->received < 0)
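When `$delete_prefix` is set, the hunk above matches `DeleteRegexp` against `real_subj` (from which a leading "Re:" has already been removed) and, only if that matched, applies `ReplyRegexp` once more to the remainder, so "[foo-ML:0012] Re: topic" becomes "topic". The sketch below reproduces those two steps with POSIX `regcomp()`/`regexec()`. It uses the patch's default `$delete_regexp`; the reply pattern is a simplified stand-in and an assumption, not mutt's default `$reply_regexp`, and `strip_list_prefix` is a hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/* Default value of $delete_regexp from the patch (C string escaping). */
#define DELETE_RX "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)"
/* Simplified stand-in for $reply_regexp (assumption, not mutt's default). */
#define REPLY_RX "^(re|aw):[ \t]*"

/* Return a pointer into subj just past any list prefix and the reply
 * prefix that follows it; return subj unchanged if nothing matched. */
const char *strip_list_prefix(const char *subj)
{
  regex_t del, rep;
  regmatch_t m[1];
  const char *p = subj;

  if (regcomp(&del, DELETE_RX, REG_EXTENDED) != 0)
    return subj;
  if (regcomp(&rep, REPLY_RX, REG_EXTENDED | REG_ICASE) != 0) {
    regfree(&del);
    return subj;
  }
  if (regexec(&del, p, 1, m, 0) == 0) {
    p += m[0].rm_eo;
    /* As in parse.c, the reply regexp is only retried after a prefix. */
    if (regexec(&rep, p, 1, m, 0) == 0)
      p += m[0].rm_eo;
  }
  regfree(&del);
  regfree(&rep);
  return p;
}
```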
diff --git a/rfc2047.c b/rfc2047.c
index 8506425..e907b25 100644
--- a/rfc2047.c
+++ b/rfc2047.c
@@ -62,6 +62,9 @@ static size_t convert_string (ICONV_CONST char *f, size_t flen,
size_t obl, n;
int e;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp",
+11))
+ mutt_sanitize_ja_chars ((char *) f, flen, 0);
cd = mutt_iconv_open (to, from, 0);
if (cd == (iconv_t)(-1))
return (size_t)(-1);
diff --git a/sendlib.c b/sendlib.c
index ef42854..4503693 100644
--- a/sendlib.c
+++ b/sendlib.c
@@ -348,6 +348,30 @@ int mutt_write_mime_header (BODY *a, FILE *f)
}
}
+ if (a->use_disp && option (OPTCREATERFC2047PARAMS))
+ {
+ if(!(fn = a->d_filename))
+ fn = a->filename;
+
+ if (fn)
+ {
+ char *tmp;
+
+ /* Strip off the leading path... */
+ if ((t = strrchr (fn, '/')))
+ t++;
+ else
+ t = fn;
+
+ buffer[0] = 0;
+ tmp = safe_strdup (t);
+ rfc2047_encode_string (&tmp);
+ rfc822_cat (buffer, sizeof (buffer), tmp, MimeSpecials);
+ FREE (&tmp);
+ fprintf (f, ";\n\tname=%s", buffer);
+ }
+ }
+
fputc ('\n', f);
if (a->description)
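With `$create_rfc2047_parameters` set, `mutt_write_mime_header()` appends a `name=` parameter built by `rfc2047_encode_string()`: the filename's bytes in the outgoing charset wrapped in an RFC 2047 encoded-word of the form `=?charset?B?base64?=`. The hypothetical helper below sketches only that "B" wrapping step for input already in the target charset; it performs no charset conversion and does not split long input into several encoded-words, which RFC 2047 requires beyond 75 characters. The test feeds it the ISO-2022-JP bytes obtained by decoding the example in the option's documentation and checks that the same encoded-word comes back.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char B64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: wrap n bytes in one RFC 2047 "B" encoded-word.
 * Returns 0 on success, -1 if out is too small. */
int rfc2047_b_word(const unsigned char *in, size_t n, const char *charset,
                   char *out, size_t outlen)
{
  /* "=?" + charset + "?B?" + base64 + "?=" + NUL */
  size_t need = strlen(charset) + 7 + 4 * ((n + 2) / 3) + 1;
  size_t i, o;

  if (outlen < need)
    return -1;
  o = (size_t)snprintf(out, outlen, "=?%s?B?", charset);
  for (i = 0; i + 2 < n; i += 3) {  /* full 3-byte groups */
    unsigned v = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
    out[o++] = B64[(v >> 18) & 63];
    out[o++] = B64[(v >> 12) & 63];
    out[o++] = B64[(v >> 6) & 63];
    out[o++] = B64[v & 63];
  }
  if (i < n) {  /* 1 or 2 trailing bytes, padded with '=' */
    unsigned v = in[i] << 16;
    if (i + 1 < n)
      v |= in[i + 1] << 8;
    out[o++] = B64[(v >> 18) & 63];
    out[o++] = B64[(v >> 12) & 63];
    out[o++] = (i + 1 < n) ? B64[(v >> 6) & 63] : '=';
    out[o++] = '=';
  }
  out[o++] = '?';
  out[o++] = '=';
  out[o] = '\0';
  return 0;
}
```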
diff --git a/wcwidth.c b/wcwidth.c
index 0b94d73..85a1397 100644
--- a/wcwidth.c
+++ b/wcwidth.c
@@ -5,6 +5,51 @@
* http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
* http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
*
+ * In fixed-width output devices, Latin characters all occupy a single
+ * "cell" position of equal width, whereas ideographic CJK characters
+ * occupy two such cells. Interoperability between terminal-line
+ * applications and (teletype-style) character terminals using the
+ * UTF-8 encoding requires agreement on which character should advance
+ * the cursor by how many cell positions. No established formal
+ * standards exist at present on which Unicode character shall occupy
+ * how many cell positions on character terminals. These routines are
+ * a first attempt of defining such behavior based on simple rules
+ * applied to data provided by the Unicode Consortium.
+ *
+ * For some graphical characters, the Unicode standard explicitly
+ * defines a character-cell width via the definition of the East Asian
+ * FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
+ * In all these cases, there is no ambiguity about which width a
+ * terminal shall use. For characters in the East Asian Ambiguous (A)
+ * class, the width choice depends purely on a preference of backward
+ * compatibility with either historic CJK or Western practice.
+ * Choosing single-width for these characters is easy to justify as
+ * the appropriate long-term solution, as the CJK practice of
+ * displaying these characters as double-width comes from historic
+ * implementation simplicity (8-bit encoded characters were displayed
+ * single-width and 16-bit ones double-width, even for Greek,
+ * Cyrillic, etc.) and not any typographic considerations.
+ *
+ * Much less clear is the choice of width for the Not East Asian
+ * (Neutral) class. Existing practice does not dictate a width for any
+ * of these characters. It would nevertheless make sense
+ * typographically to allocate two character cells to characters such
+ * as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
+ * represented adequately with a single-width glyph. The following
+ * routines at present merely assign a single-cell width to all
+ * neutral characters, in the interest of simplicity. This is not
+ * entirely satisfactory and should be reconsidered before
+ * establishing a formal standard in this area. At the moment, the
+ * decision which Not East Asian (Neutral) characters should be
+ * represented by double-width glyphs cannot yet be answered by
+ * applying a simple rule from the Unicode database content. Setting
+ * up a proper standard for the behavior of UTF-8 character terminals
+ * will require a careful analysis not only of each Unicode character,
+ * but also of each presentation form, something the author of these
+ * routines has avoided to do so far.
+ *
+ * http://www.unicode.org/unicode/reports/tr11/
+ *
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
@@ -24,12 +69,34 @@
# include "config.h"
#endif
-#ifndef HAVE_WC_FUNCS
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+#include <wchar.h>
-#include "mutt.h"
-#include "mbyte.h"
+struct interval {
+ wchar_t first;
+ wchar_t last;
+};
+
+/* auxiliary function for binary search in interval table */
+static int bisearch(wchar_t ucs, const struct interval *table, int max) {
+ int min = 0;
+ int mid;
+
+ if (ucs < table[0].first || ucs > table[max].last)
+ return 0;
+ while (max >= min) {
+ mid = (min + max) / 2;
+ if (ucs > table[mid].last)
+ min = mid + 1;
+ else if (ucs < table[mid].first)
+ max = mid - 1;
+ else
+ return 1;
+ }
+
+ return 0;
+}
-#include <ctype.h>
/* The following two functions define the column width of an ISO 10646
* character as follows:
@@ -67,62 +134,56 @@ int wcwidth_ucs(wchar_t ucs)
{
/* sorted list of non-overlapping intervals of non-spacing characters */
/* generated by "uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B c" */
- static const struct interval {
- wchar_t first;
- wchar_t last;
- } combining[] = {
- { 0x0300, 0x036f }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
- { 0x0591, 0x05bd }, { 0x05bf, 0x05bf }, { 0x05c1, 0x05c2 },
- { 0x05c4, 0x05c5 }, { 0x05c7, 0x05c7 }, { 0x0600, 0x0603 },
- { 0x0610, 0x0615 }, { 0x064b, 0x065e }, { 0x0670, 0x0670 },
- { 0x06d6, 0x06e4 }, { 0x06e7, 0x06e8 }, { 0x06ea, 0x06ed },
- { 0x070f, 0x070f }, { 0x0711, 0x0711 }, { 0x0730, 0x074a },
- { 0x07a6, 0x07b0 }, { 0x07eb, 0x07f3 }, { 0x0901, 0x0902 },
- { 0x093c, 0x093c }, { 0x0941, 0x0948 }, { 0x094d, 0x094d },
+ static const struct interval combining[] = {
+ { 0x0300, 0x036F }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
+ { 0x0591, 0x05BD }, { 0x05BF, 0x05BF }, { 0x05C1, 0x05C2 },
+ { 0x05C4, 0x05C5 }, { 0x05C7, 0x05C7 }, { 0x0600, 0x0603 },
+ { 0x0610, 0x0615 }, { 0x064B, 0x065E }, { 0x0670, 0x0670 },
+ { 0x06D6, 0x06E4 }, { 0x06E7, 0x06E8 }, { 0x06EA, 0x06ED },
+ { 0x070F, 0x070F }, { 0x0711, 0x0711 }, { 0x0730, 0x074A },
+ { 0x07A6, 0x07B0 }, { 0x07EB, 0x07F3 }, { 0x0901, 0x0902 },
+ { 0x093C, 0x093C }, { 0x0941, 0x0948 }, { 0x094D, 0x094D },
{ 0x0951, 0x0954 }, { 0x0962, 0x0963 }, { 0x0981, 0x0981 },
- { 0x09bc, 0x09bc }, { 0x09c1, 0x09c4 }, { 0x09cd, 0x09cd },
- { 0x09e2, 0x09e3 }, { 0x0a01, 0x0a02 }, { 0x0a3c, 0x0a3c },
- { 0x0a41, 0x0a42 }, { 0x0a47, 0x0a48 }, { 0x0a4b, 0x0a4d },
- { 0x0a70, 0x0a71 }, { 0x0a81, 0x0a82 }, { 0x0abc, 0x0abc },
- { 0x0ac1, 0x0ac5 }, { 0x0ac7, 0x0ac8 }, { 0x0acd, 0x0acd },
- { 0x0ae2, 0x0ae3 }, { 0x0b01, 0x0b01 }, { 0x0b3c, 0x0b3c },
- { 0x0b3f, 0x0b3f }, { 0x0b41, 0x0b43 }, { 0x0b4d, 0x0b4d },
- { 0x0b56, 0x0b56 }, { 0x0b82, 0x0b82 }, { 0x0bc0, 0x0bc0 },
- { 0x0bcd, 0x0bcd }, { 0x0c3e, 0x0c40 }, { 0x0c46, 0x0c48 },
- { 0x0c4a, 0x0c4d }, { 0x0c55, 0x0c56 }, { 0x0cbc, 0x0cbc },
- { 0x0cbf, 0x0cbf }, { 0x0cc6, 0x0cc6 }, { 0x0ccc, 0x0ccd },
- { 0x0ce2, 0x0ce3 }, { 0x0d41, 0x0d43 }, { 0x0d4d, 0x0d4d },
- { 0x0dca, 0x0dca }, { 0x0dd2, 0x0dd4 }, { 0x0dd6, 0x0dd6 },
- { 0x0e31, 0x0e31 }, { 0x0e34, 0x0e3a }, { 0x0e47, 0x0e4e },
- { 0x0eb1, 0x0eb1 }, { 0x0eb4, 0x0eb9 }, { 0x0ebb, 0x0ebc },
- { 0x0ec8, 0x0ecd }, { 0x0f18, 0x0f19 }, { 0x0f35, 0x0f35 },
- { 0x0f37, 0x0f37 }, { 0x0f39, 0x0f39 }, { 0x0f71, 0x0f7e },
- { 0x0f80, 0x0f84 }, { 0x0f86, 0x0f87 }, { 0x0f90, 0x0f97 },
- { 0x0f99, 0x0fbc }, { 0x0fc6, 0x0fc6 }, { 0x102d, 0x1030 },
+ { 0x09BC, 0x09BC }, { 0x09C1, 0x09C4 }, { 0x09CD, 0x09CD },
+ { 0x09E2, 0x09E3 }, { 0x0A01, 0x0A02 }, { 0x0A3C, 0x0A3C },
+ { 0x0A41, 0x0A42 }, { 0x0A47, 0x0A48 }, { 0x0A4B, 0x0A4D },
+ { 0x0A70, 0x0A71 }, { 0x0A81, 0x0A82 }, { 0x0ABC, 0x0ABC },
+ { 0x0AC1, 0x0AC5 }, { 0x0AC7, 0x0AC8 }, { 0x0ACD, 0x0ACD },
+ { 0x0AE2, 0x0AE3 }, { 0x0B01, 0x0B01 }, { 0x0B3C, 0x0B3C },
+ { 0x0B3F, 0x0B3F }, { 0x0B41, 0x0B43 }, { 0x0B4D, 0x0B4D },
+ { 0x0B56, 0x0B56 }, { 0x0B82, 0x0B82 }, { 0x0BC0, 0x0BC0 },
+ { 0x0BCD, 0x0BCD }, { 0x0C3E, 0x0C40 }, { 0x0C46, 0x0C48 },
+ { 0x0C4A, 0x0C4D }, { 0x0C55, 0x0C56 }, { 0x0CBC, 0x0CBC },
+ { 0x0CBF, 0x0CBF }, { 0x0CC6, 0x0CC6 }, { 0x0CCC, 0x0CCD },
+ { 0x0CE2, 0x0CE3 }, { 0x0D41, 0x0D43 }, { 0x0D4D, 0x0D4D },
+ { 0x0DCA, 0x0DCA }, { 0x0DD2, 0x0DD4 }, { 0x0DD6, 0x0DD6 },
+ { 0x0E31, 0x0E31 }, { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E },
+ { 0x0EB1, 0x0EB1 }, { 0x0EB4, 0x0EB9 }, { 0x0EBB, 0x0EBC },
+ { 0x0EC8, 0x0ECD }, { 0x0F18, 0x0F19 }, { 0x0F35, 0x0F35 },
+ { 0x0F37, 0x0F37 }, { 0x0F39, 0x0F39 }, { 0x0F71, 0x0F7E },
+ { 0x0F80, 0x0F84 }, { 0x0F86, 0x0F87 }, { 0x0F90, 0x0F97 },
+ { 0x0F99, 0x0FBC }, { 0x0FC6, 0x0FC6 }, { 0x102D, 0x1030 },
{ 0x1032, 0x1032 }, { 0x1036, 0x1037 }, { 0x1039, 0x1039 },
- { 0x1058, 0x1059 }, { 0x1160, 0x11ff }, { 0x135f, 0x135f },
+ { 0x1058, 0x1059 }, { 0x1160, 0x11FF }, { 0x135F, 0x135F },
{ 0x1712, 0x1714 }, { 0x1732, 0x1734 }, { 0x1752, 0x1753 },
- { 0x1772, 0x1773 }, { 0x17b4, 0x17b5 }, { 0x17b7, 0x17bd },
- { 0x17c6, 0x17c6 }, { 0x17c9, 0x17d3 }, { 0x17dd, 0x17dd },
- { 0x180b, 0x180d }, { 0x18a9, 0x18a9 }, { 0x1920, 0x1922 },
- { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193b },
- { 0x1a17, 0x1a18 }, { 0x1b00, 0x1b03 }, { 0x1b34, 0x1b34 },
- { 0x1b36, 0x1b3a }, { 0x1b3c, 0x1b3c }, { 0x1b42, 0x1b42 },
- { 0x1b6b, 0x1b73 }, { 0x1dc0, 0x1dca }, { 0x1dfe, 0x1dff },
- { 0x200b, 0x200f }, { 0x202a, 0x202e }, { 0x2060, 0x2063 },
- { 0x206a, 0x206f }, { 0x20d0, 0x20ef }, { 0x302a, 0x302f },
- { 0x3099, 0x309a }, { 0xa806, 0xa806 }, { 0xa80b, 0xa80b },
- { 0xa825, 0xa826 }, { 0xfb1e, 0xfb1e }, { 0xfe00, 0xfe0f },
- { 0xfe20, 0xfe23 }, { 0xfeff, 0xfeff }, { 0xfff9, 0xfffb },
- { 0x10a01, 0x10a03 }, { 0x10a05, 0x10a06 }, { 0x10a0c, 0x10a0f },
- { 0x10a38, 0x10a3a }, { 0x10a3f, 0x10a3f }, { 0x1d167, 0x1d169 },
- { 0x1d173, 0x1d182 }, { 0x1d185, 0x1d18b }, { 0x1d1aa, 0x1d1ad },
- { 0x1d242, 0x1d244 }, { 0xe0001, 0xe0001 }, { 0xe0020, 0xe007f },
- { 0xe0100, 0xe01ef }
+ { 0x1772, 0x1773 }, { 0x17B4, 0x17B5 }, { 0x17B7, 0x17BD },
+ { 0x17C6, 0x17C6 }, { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD },
+ { 0x180B, 0x180D }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 },
+ { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193B },
+ { 0x1A17, 0x1A18 }, { 0x1B00, 0x1B03 }, { 0x1B34, 0x1B34 },
+ { 0x1B36, 0x1B3A }, { 0x1B3C, 0x1B3C }, { 0x1B42, 0x1B42 },
+ { 0x1B6B, 0x1B73 }, { 0x1DC0, 0x1DCA }, { 0x1DFE, 0x1DFF },
+ { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2063 },
+ { 0x206A, 0x206F }, { 0x20D0, 0x20EF }, { 0x302A, 0x302F },
+ { 0x3099, 0x309A }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B },
+ { 0xA825, 0xA826 }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F },
+ { 0xFE20, 0xFE23 }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB },
+ { 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }, { 0x10A0C, 0x10A0F },
+ { 0x10A38, 0x10A3A }, { 0x10A3F, 0x10A3F }, { 0x1D167, 0x1D169 },
+ { 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
+ { 0x1D242, 0x1D244 }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F },
+ { 0xE0100, 0xE01EF }
};
- int min = 0;
- int max = sizeof(combining) / sizeof(struct interval) - 1;
- int mid;
/* test for 8-bit control characters */
if (ucs == 0)
@@ -130,20 +191,10 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0))
return -1;
- /* first quick check for Latin-1 etc. characters */
- if (ucs < combining[0].first)
- return 1;
-
/* binary search in table of non-spacing characters */
- while (max >= min) {
- mid = (min + max) / 2;
- if (combining[mid].last < ucs)
- min = mid + 1;
- else if (combining[mid].first > ucs)
- max = mid - 1;
- else if (combining[mid].first <= ucs && combining[mid].last >= ucs)
- return 0;
- }
+ if (bisearch(ucs, combining,
+ sizeof(combining) / sizeof(struct interval) - 1))
+ return 0;
/* if we arrive here, ucs is not a combining or C0/C1 control character */
@@ -151,7 +202,7 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 0x1100)
return 1;
- return 1 +
+ return 1 +
(ucs >= 0x1100 &&
(ucs <= 0x115f || /* Hangul Jamo init. consonants */
ucs == 0x2329 || ucs == 0x232a ||
@@ -167,15 +218,120 @@ int wcwidth_ucs(wchar_t ucs)
(ucs >= 0x30000 && ucs <= 0x3fffd)));
}
-#endif /* !HAVE_WC_FUNCS */
+#if 0 /* original */
+int wcswidth_ucs(const wchar_t *pwcs, size_t n)
+{
+ int w, width = 0;
+
+ for (;*pwcs && n-- > 0; pwcs++)
+ if ((w = wcwidth_ucs(*pwcs)) < 0)
+ return -1;
+ else
+ width += w;
+
+ return width;
+}
+#endif
+
+/*
+ * The following functions are the same as wcwidth_ucs() and
+ * wcswidth_ucs(), except that spacing characters in the East Asian
+ * Ambiguous (A) category as defined in Unicode Technical Report #11
+ * have a column width of 2. This variant might be useful for users of
+ * CJK legacy encodings who want to migrate to UCS without changing
+ * the traditional terminal character-width behaviour. It is not
+ * otherwise recommended for general use.
+ */
+/*
+ * In addition to the explanation mentioned above,
+ * several characters in the East Asian Narrow (Na) and Not East Asian
+ * (Neutral) category as defined in Unicode Technical Report #11
+ * actually have a column width of 2 in CJK legacy encodings.
+ */
+int wcwidth_cjk(wchar_t ucs)
+{
+ /* sorted list of non-overlapping intervals of East Asian Ambiguous
+ * characters, generated by "uniset +WIDTH-A -cat=Me -cat=Mn -cat=Cf c" */
+ static const struct interval ambiguous[] = {
+ { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 },
+ { 0x00AA, 0x00AA }, { 0x00AE, 0x00AE }, { 0x00B0, 0x00B4 },
+ { 0x00B6, 0x00BA }, { 0x00BC, 0x00BF }, { 0x00C6, 0x00C6 },
+ { 0x00D0, 0x00D0 }, { 0x00D7, 0x00D8 }, { 0x00DE, 0x00E1 },
+ { 0x00E6, 0x00E6 }, { 0x00E8, 0x00EA }, { 0x00EC, 0x00ED },
+ { 0x00F0, 0x00F0 }, { 0x00F2, 0x00F3 }, { 0x00F7, 0x00FA },
+ { 0x00FC, 0x00FC }, { 0x00FE, 0x00FE }, { 0x0101, 0x0101 },
+ { 0x0111, 0x0111 }, { 0x0113, 0x0113 }, { 0x011B, 0x011B },
+ { 0x0126, 0x0127 }, { 0x012B, 0x012B }, { 0x0131, 0x0133 },
+ { 0x0138, 0x0138 }, { 0x013F, 0x0142 }, { 0x0144, 0x0144 },
+ { 0x0148, 0x014B }, { 0x014D, 0x014D }, { 0x0152, 0x0153 },
+ { 0x0166, 0x0167 }, { 0x016B, 0x016B }, { 0x01CE, 0x01CE },
+ { 0x01D0, 0x01D0 }, { 0x01D2, 0x01D2 }, { 0x01D4, 0x01D4 },
+ { 0x01D6, 0x01D6 }, { 0x01D8, 0x01D8 }, { 0x01DA, 0x01DA },
+ { 0x01DC, 0x01DC }, { 0x0251, 0x0251 }, { 0x0261, 0x0261 },
+ { 0x02C4, 0x02C4 }, { 0x02C7, 0x02C7 }, { 0x02C9, 0x02CB },
+ { 0x02CD, 0x02CD }, { 0x02D0, 0x02D0 }, { 0x02D8, 0x02DB },
+ { 0x02DD, 0x02DD }, { 0x02DF, 0x02DF }, { 0x0391, 0x03A1 },
+ { 0x03A3, 0x03A9 }, { 0x03B1, 0x03C1 }, { 0x03C3, 0x03C9 },
+ { 0x0401, 0x0401 }, { 0x0410, 0x044F }, { 0x0451, 0x0451 },
+ { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
+ { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
+ { 0x2030, 0x2030 }, { 0x2032, 0x2033 }, { 0x2035, 0x2035 },
+ { 0x203B, 0x203B }, { 0x203E, 0x203E }, { 0x2074, 0x2074 },
+ { 0x207F, 0x207F }, { 0x2081, 0x2084 }, { 0x20AC, 0x20AC },
+ { 0x2103, 0x2103 }, { 0x2105, 0x2105 }, { 0x2109, 0x2109 },
+ { 0x2113, 0x2113 }, { 0x2116, 0x2116 }, { 0x2121, 0x2122 },
+ { 0x2126, 0x2126 }, { 0x212B, 0x212B }, { 0x2153, 0x2154 },
+ { 0x215B, 0x215E }, { 0x2160, 0x216B }, { 0x2170, 0x2179 },
+ { 0x2190, 0x2199 }, { 0x21B8, 0x21B9 }, { 0x21D2, 0x21D2 },
+ { 0x21D4, 0x21D4 }, { 0x21E7, 0x21E7 }, { 0x2200, 0x2200 },
+ { 0x2202, 0x2203 }, { 0x2207, 0x2208 }, { 0x220B, 0x220B },
+ { 0x220F, 0x220F }, { 0x2211, 0x2211 }, { 0x2215, 0x2215 },
+ { 0x221A, 0x221A }, { 0x221D, 0x2220 }, { 0x2223, 0x2223 },
+ { 0x2225, 0x2225 }, { 0x2227, 0x222C }, { 0x222E, 0x222E },
+ { 0x2234, 0x2237 }, { 0x223C, 0x223D }, { 0x2248, 0x2248 },
+ { 0x224C, 0x224C }, { 0x2252, 0x2252 }, { 0x2260, 0x2261 },
+ { 0x2264, 0x2267 }, { 0x226A, 0x226B }, { 0x226E, 0x226F },
+ { 0x2282, 0x2283 }, { 0x2286, 0x2287 }, { 0x2295, 0x2295 },
+ { 0x2299, 0x2299 }, { 0x22A5, 0x22A5 }, { 0x22BF, 0x22BF },
+ { 0x2312, 0x2312 }, { 0x2460, 0x24E9 }, { 0x24EB, 0x254B },
+ { 0x2550, 0x2573 }, { 0x2580, 0x258F }, { 0x2592, 0x2595 },
+ { 0x25A0, 0x25A1 }, { 0x25A3, 0x25A9 }, { 0x25B2, 0x25B3 },
+ { 0x25B6, 0x25B7 }, { 0x25BC, 0x25BD }, { 0x25C0, 0x25C1 },
+ { 0x25C6, 0x25C8 }, { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 },
+ { 0x25E2, 0x25E5 }, { 0x25EF, 0x25EF }, { 0x2605, 0x2606 },
+ { 0x2609, 0x2609 }, { 0x260E, 0x260F }, { 0x2614, 0x2615 },
+ { 0x261C, 0x261C }, { 0x261E, 0x261E }, { 0x2640, 0x2640 },
+ { 0x2642, 0x2642 }, { 0x2660, 0x2661 }, { 0x2663, 0x2665 },
+ { 0x2667, 0x266A }, { 0x266C, 0x266D }, { 0x266F, 0x266F },
+ { 0x273D, 0x273D }, { 0x2776, 0x277F }, { 0xE000, 0xF8FF },
+ { 0xFFFD, 0xFFFD }, { 0xF0000, 0xFFFFD }, { 0x100000, 0x10FFFD }
+ };
+
+ /* For Japanese legacy encodings, the following characters are added. */
+ static const struct interval legacy_ja[] = {
+ { 0x00A2, 0x00A3 }, { 0x00A5, 0x00A6 }, { 0x00AC, 0x00AC },
+ { 0x00AF, 0x00AF }, { 0x2212, 0x2212 }
+ };
+
+ /* binary search in the tables of wide (ambiguous and legacy Japanese) characters */
+ if (bisearch(ucs, ambiguous,
+ sizeof(ambiguous) / sizeof(struct interval) - 1))
+ return 2;
+ if (bisearch(ucs, legacy_ja,
+ sizeof(legacy_ja) / sizeof(struct interval) - 1))
+ return 2;
+
+ return wcwidth_ucs(ucs);
+}
+
#if 0 /* original */
-int wcswidth(const wchar_t *pwcs, size_t n)
+int wcswidth_cjk(const wchar_t *pwcs, size_t n)
{
int w, width = 0;
for (;*pwcs && n-- > 0; pwcs++)
- if ((w = wcwidth(*pwcs)) < 0)
+ if ((w = wcwidth_cjk(*pwcs)) < 0)
return -1;
else
width += w;
@@ -183,3 +339,4 @@ int wcswidth(const wchar_t *pwcs, size_t n)
return width;
}
#endif
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
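The width rule added to wcwidth.c comes down to this: look the code point up in the ambiguous[] and legacy_ja[] interval tables, return 2 on a hit, and otherwise fall back to wcwidth_ucs(). Below is a minimal standalone sketch of that lookup. It assumes that bisearch() is the Markus Kuhn-style helper already present in wcwidth.c (the patch calls it, but this hunk does not show it). Only a handful of table entries are included, and the fallback is simplified to "width 1", so it is not a drop-in replacement.

```c
#include <stddef.h>
#include <wchar.h>

/* A closed interval [first, last] of code points, as used by wcwidth.c. */
struct interval {
  wchar_t first;
  wchar_t last;
};

/* Binary search for ucs in a sorted table of non-overlapping intervals.
 * max is the index of the last element (table size minus 1), matching
 * the "sizeof (...) / sizeof (struct interval) - 1" calls in the patch. */
static int bisearch (wchar_t ucs, const struct interval *table, int max)
{
  int min = 0;

  if (ucs < table[0].first || ucs > table[max].last)
    return 0;
  while (max >= min)
  {
    int mid = (min + max) / 2;
    if (ucs > table[mid].last)
      min = mid + 1;
    else if (ucs < table[mid].first)
      max = mid - 1;
    else
      return 1;
  }
  return 0;
}

/* A few intervals taken from the ambiguous[] table in the patch
 * (the box-drawing entry is a sub-range of { 0x24EB, 0x254B }). */
static const struct interval ambiguous_excerpt[] = {
  { 0x00A1, 0x00A1 }, { 0x0391, 0x03A1 }, { 0x2500, 0x254B },
  { 0x25CE, 0x25D1 }, { 0x2605, 0x2606 }
};

/* Simplified wcwidth_cjk(): width 2 for the listed ambiguous characters.
 * ASSUMPTION: every other non-NUL character gets width 1, standing in
 * for the real fallback to wcwidth_ucs(). */
static int sketch_wcwidth_cjk (wchar_t ucs)
{
  if (bisearch (ucs, ambiguous_excerpt,
                (int) (sizeof (ambiguous_excerpt) / sizeof (struct interval)) - 1))
    return 2;
  return ucs == 0 ? 0 : 1;
}
```

With this sketch, ◎ (U+25CE), ★ (U+2605) and the box-drawing line U+2500 come out as two columns, while 'A' stays at one. The binary search keeps the lookup logarithmic in the table size, which matters because wcwidth() is called for every character drawn on screen.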
diff --git a/PATCHES b/PATCHES
index e69de29..17743fd 100644
--- a/PATCHES
+++ b/PATCHES
@@ -0,0 +1,5 @@
+patch-1.5.23.tt+yy.delete_prefix.1
+patch-1.5.23.tt.create_rfc2047_params.1
+patch-1.5.23.tt.sanitize_ja.1
+patch-1.5.23.tt.cjk_width_tree_chars.1
+patch-1.5.23.tt.wcwidth.1
diff --git a/charset.c b/charset.c
index 2411f2c..6a5cbd4 100644
--- a/charset.c
+++ b/charset.c
@@ -481,6 +481,9 @@ int mutt_convert_string (char **ps, const char *from, const char *to, int flags)
if (!s || !*s)
return 0;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars (s, mutt_strlen(s), 0);
+
if (to && from && (cd = mutt_iconv_open (to, from, flags)) != (iconv_t)-1)
{
int len;
@@ -677,3 +680,188 @@ int mutt_check_charset (const char *s, int strict)
return -1;
}
+
+/*
+ * mutt_sanitize_ja_chars()
+ * Adapted by TAKIZAWA Takashi <taki@cyber.email.ne.jp>
+ *
+ * - It replaces undefined KANJI characters with the GETA mark.
+ * - It replaces 'JIS X 0201 kana' characters with '?'.
+ * - If $charset is EUC-JP, it replaces the third character 'J' of the
+ * escape sequence switching to 'JIS X 0201 latin' with 'B', which
+ * indicates 'US-ASCII'.
+ * - If $charset is Shift_JIS, it replaces the third character 'B' of the
+ * escape sequence switching to 'US-ASCII' with 'J', which indicates
+ * 'JIS X 0201 latin'.
+ */
+
+#define ASCII 0
+#define JISX0201LATIN 1
+#define JISX0201KANA 2
+#define JISX0208 3
+#define OTHER_CS 4
+
+void mutt_sanitize_ja_chars(char *s, size_t len, int keep_state)
+{
+ static int cs = ASCII;
+ static int kanji_cont = 0;
+ static int illegal_kanji = 0;
+ static int es = 0;
+ static char pes = '\0';
+ static char ascii_3rd_char = 'B';
+ static char jisx0201_3rd_char = 'J';
+
+ char *p = s;
+ char *p1 = NULL;
+ unsigned char c;
+
+ if (!keep_state || *p == 0x1b) /* consideration about mbstate's buffer */
+ {
+ if (!ascii_strcasecmp (Charset, "euc-jp"))
+ jisx0201_3rd_char = 'B';
+ else if (!ascii_strcasecmp (Charset, "shift_jis"))
+ ascii_3rd_char = 'J';
+ cs = ASCII;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ es = 0;
+ pes = '\0';
+ }
+
+ for (;p - s < len;p++)
+ {
+ if (es == 0)
+ {
+ if (*p == 0x1b)
+ es++;
+ else
+ {
+ switch (cs)
+ {
+ case ASCII:
+ case JISX0201LATIN:
+ break;
+ case JISX0201KANA:
+ *p = '?';
+ break;
+ case JISX0208:
+ /* replace characters in ku (row) 9 to 15 and 85 or above with the GETA mark */
+ c = (unsigned char)*p;
+ if (! kanji_cont)
+ {
+ if ((size_t)(p - s + 1) == len)
+ return; /* the last byte is the first byte of a KANJI character */
+ if (c <= 0x20 || (c >= 0x29 && c <= 0x2f)
+ || (c >= 0x75 && c <= 0xa0))
+ illegal_kanji = 1;
+ kanji_cont = 1;
+ p1 = p;
+ }
+ else
+ {
+ if (c <= 0x20 || c >= 0x7f)
+ illegal_kanji = 1;
+ if (illegal_kanji && p1)
+ *p1 = 0x22, *p = 0x2e;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ }
+ break;
+ }
+ }
+ }
+ else if (es == 1)
+ {
+ if (*p == '$' || (*p >= '(' && *p <= '/' && *p != ','))
+ {
+ es++;
+ pes = *p;
+ }
+ else
+ {
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else if (es == 2)
+ {
+ if (pes == '(')
+ {
+ switch (*p)
+ {
+ case 'B':
+ cs = ASCII, *p = ascii_3rd_char;
+ break;
+ case 'J':
+ cs = JISX0201LATIN, *p = jisx0201_3rd_char;
+ break;
+ case 'I':
+ /* following characters will be replaced with '?' */
+ cs = JISX0201KANA, *p = ascii_3rd_char;
+ break;
+ default:
+ cs = OTHER_CS;
+ }
+ es = 0;
+ }
+ else if (pes == '$')
+ {
+ switch (*p)
+ {
+ case '@': /* JIS X 0208-1978 */
+ case 'B': /* JIS X 0208-1983 */
+ cs = JISX0208;
+ es = 0;
+ break;
+ case 'A':
+ cs = OTHER_CS; /* GB 2312 */
+ es = 0;
+ break;
+ case '(':
+ case ')':
+ case '*':
+ case '+':
+ case '-':
+ case '.':
+ case '/':
+ es++;
+ break;
+ default:
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+ else /* es == 3 */
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+}
+
+int mutt_copy_bytes_sanitize_ja (FILE *in, FILE *out, size_t size)
+{
+ char buf[2048];
+ size_t chunk;
+
+ mutt_sanitize_ja_chars (NULL, 0, 0);
+ while (size > 0)
+ {
+ chunk = (size > sizeof (buf)) ? sizeof (buf) : size;
+ if ((chunk = fread (buf, 1, chunk, in)) < 1)
+ break;
+ mutt_sanitize_ja_chars (buf, chunk, 1);
+ if (fwrite (buf, 1, chunk, out) != chunk)
+ return (-1);
+ size -= chunk;
+ }
+
+ return 0;
+}
+
diff --git a/charset.h b/charset.h
index 54891f0..d67b209 100644
--- a/charset.h
+++ b/charset.h
@@ -36,6 +36,9 @@ int iconv_close (iconv_t);
int mutt_convert_string (char **, const char *, const char *, int);
+void mutt_sanitize_ja_chars (char *, size_t, int);
+int mutt_copy_bytes_sanitize_ja (FILE *, FILE *, size_t);
+
iconv_t mutt_iconv_open (const char *, const char *, int);
size_t mutt_iconv (iconv_t, ICONV_CONST char **, size_t *, char **, size_t *, ICONV_CONST char **, const char *);
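The GETA-mark substitution in mutt_sanitize_ja_chars() relies on how 7-bit JIS X 0208 is encoded. The first byte of a two-byte character is the row (ku) number plus 0x20, so rows 9 to 15 correspond to first bytes 0x29 to 0x2F, and rows 85 and above start at 0x75. The stateless sketch below applies the same byte tests to a buffer. It assumes the buffer contains only JIS X 0208 byte pairs, with no escape sequences and no ASCII. The helper names are hypothetical and are not part of the patch.

```c
#include <stddef.h>

/* Nonzero if the 7-bit JIS X 0208 pair (c1, c2) fails the same tests
 * that mutt_sanitize_ja_chars() applies: first byte in ku 9-15, ku 85+
 * (or otherwise out of range), or a second byte outside 0x21-0x7E. */
static int jis_pair_is_illegal (unsigned char c1, unsigned char c2)
{
  if (c1 <= 0x20 || (c1 >= 0x29 && c1 <= 0x2f) || (c1 >= 0x75 && c1 <= 0xa0))
    return 1;
  return c2 <= 0x20 || c2 >= 0x7f;
}

/* Replace every illegal pair in buf with the GETA mark (0x22 0x2E).
 * ASSUMPTION: buf holds raw JIS X 0208 pairs only, with no escapes. */
static void replace_with_geta (unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i + 1 < len; i += 2)
    if (jis_pair_is_illegal (buf[i], buf[i + 1]))
    {
      buf[i] = 0x22;
      buf[i + 1] = 0x2e;
    }
}
```

For example, ① in the NEC special-character row 13 (pair 0x2D 0x21, a vendor extension often sent by Windows mailers) becomes 0x22 0x2E (〓). あ (0x24 0x22) and 亜 (0x30 0x21) are left untouched.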
diff --git a/configure.ac b/configure.ac
index d8aebe3..46e41eb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1464,6 +1464,16 @@ fi
dnl -- locales --
+AC_ARG_ENABLE(cjk-ambiguous-width, AC_HELP_STRING([--enable-cjk-ambiguous-width], [ Enable East Asian Ambiguous characters support (using own wcwidth)]),
+ [ if test "x$enableval" = "xyes" ; then
+ cjk_width=yes
+ fi
+ ])
+if test "x$cjk_width" = "xyes" ; then
+ AC_DEFINE(USE_CJK_WIDTH,1,[ Define if you want to support East Asian Ambiguous class. ])
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+fi
+
AC_CHECK_HEADERS(wchar.h)
AC_CACHE_CHECK([for wchar_t], mutt_cv_wchar_t,
@@ -1534,7 +1544,10 @@ fi
if test $wc_funcs = yes; then
AC_DEFINE(HAVE_WC_FUNCS,1,[ Define if you are using the system's wchar_t functions. ])
else
- MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o"
+ if test "x$cjk_width" != "xyes"; then
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+ fi
fi
AC_CACHE_CHECK([for nl_langinfo and CODESET], mutt_cv_langinfo_codeset,
diff --git a/curs_lib.c b/curs_lib.c
index 5472f45..ad12f4a 100644
--- a/curs_lib.c
+++ b/curs_lib.c
@@ -1220,7 +1220,14 @@ void mutt_format_string (char *dest, size_t destlen,
wc = replacement_char ();
}
if (arboreal && wc < MUTT_TREE_MAX)
- w = 1; /* hack */
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w = wcwidth (TreeChars[wc]);
+ else
+#endif
+ w = 1;
+ }
else
{
#ifdef HAVE_ISWBLANK
@@ -1449,10 +1456,12 @@ int mutt_strwidth (const char *s)
int w;
size_t k, n;
mbstate_t mbstate;
+ int arboreal;
if (!s) return 0;
n = mutt_strlen (s);
+ arboreal = (s[0] < MUTT_TREE_MAX) ? 1 : 0;
memset (&mbstate, 0, sizeof (mbstate));
for (w=0; n && (k = mbrtowc (&wc, s, n, &mbstate)); s += k, n -= k)
@@ -1464,9 +1473,21 @@ int mutt_strwidth (const char *s)
k = (k == (size_t)(-1)) ? 1 : n;
wc = replacement_char ();
}
- if (!IsWPrint (wc))
- wc = '?';
- w += wcwidth (wc);
+ if (wc < MUTT_TREE_MAX && arboreal && k == 1)
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w += wcwidth (TreeChars[wc]);
+ else
+#endif
+ w++;
+ }
+ else
+ {
+ if (!IsWPrint (wc))
+ wc = '?';
+ w += wcwidth (wc);
+ }
}
return w;
}
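With USE_CJK_WIDTH, a thread-tree code below MUTT_TREE_MAX is no longer always counted as one column. If $charset is UTF-8, $cjk_width_tree_chars is set and $ascii_chars is unset, its width is taken from wcwidth() of the corresponding TreeChars[] entry. The sketch below models only that decision. tree_char_width() and the stub width function are hypothetical. The stub assumes $cjk_width is set, so box-drawing characters (U+2500 to U+254B, East Asian Ambiguous) are two columns wide.

```c
#include <wchar.h>

/* First entries of the TreeChars[] table added to globals.h by this patch
 * (index 0 is unused; 1..5 are the corner, tee and line characters). */
static const wchar_t tree_chars_excerpt[] = {
  0xFEFF, 0x2514, 0x250C, 0x251C, 0x2500, 0x2502
};

/* Stub for the patched wcwidth() with $cjk_width set: box-drawing
 * characters are treated as wide, everything else as narrow. */
static int stub_wcwidth_cjk (wchar_t wc)
{
  return (wc >= 0x2500 && wc <= 0x254B) ? 2 : 1;
}

/* Column width of tree code `code` (an index into tree_chars_excerpt),
 * mirroring the USE_CJK_WIDTH branch in mutt_format_string() and
 * mutt_strwidth(). */
static int tree_char_width (int code, int charset_is_utf8,
                            int cjk_width_tree_chars, int ascii_chars)
{
  if (charset_is_utf8 && cjk_width_tree_chars && !ascii_chars)
    return stub_wcwidth_cjk (tree_chars_excerpt[code]);
  return 1; /* the original "hack": one column per tree character */
}
```

This matters when the terminal draws box-drawing characters two columns wide. If mutt still counted them as one column each, the computed line width would no longer match what is actually displayed.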
diff --git a/doc/makedoc-defs.h b/doc/makedoc-defs.h
index 78a4ebc..dd872ba 100644
--- a/doc/makedoc-defs.h
+++ b/doc/makedoc-defs.h
@@ -31,10 +31,10 @@
# ifndef USE_SOCKET
# define USE_SOCKET
# endif
-# ifndef USE_DOTLOCK
+# if !defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
# define USE_DOTLOCK
# endif
-# ifndef DL_STANDALONE
+# if !defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define DL_STANDALONE
# endif
# ifndef USE_HCACHE
diff --git a/dotlock.c b/dotlock.c
index 5bf0348..5d87850 100644
--- a/dotlock.c
+++ b/dotlock.c
@@ -52,13 +52,13 @@
#include <getopt.h>
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# include "reldate.h"
#endif
#define MAXLINKS 1024 /* maximum link depth */
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define LONG_STRING 1024
# define MAXLOCKATTEMPT 5
@@ -96,7 +96,7 @@ extern int snprintf (char *, size_t, const char *, ...);
static int DotlockFlags;
static int Retry = MAXLOCKATTEMPT;
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static char *Hostname;
#endif
@@ -110,7 +110,7 @@ static int dotlock_prepare (char *, size_t, const char *, int fd);
static int dotlock_check_stats (struct stat *, struct stat *);
static int dotlock_dispatch (const char *, int fd);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int dotlock_init_privs (void);
static void usage (const char *);
#endif
@@ -130,7 +130,7 @@ static int dotlock_unlink (const char *);
static int dotlock_lock (const char *);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
#define check_flags(a) if (a & DL_FL_ACTIONS) usage (argv[0])
@@ -327,7 +327,7 @@ END_PRIVILEGED (void)
#endif
}
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
/*
* Usage information.
diff --git a/globals.h b/globals.h
index cecb46d..276c360 100644
--- a/globals.h
+++ b/globals.h
@@ -24,7 +24,7 @@ WHERE CONTEXT *Context;
WHERE char Errorbuf[STRING];
WHERE char AttachmentMarker[STRING];
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
WHERE char *MuttDotlock;
#endif
@@ -303,9 +303,31 @@ const char * const Months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
const char * const BodyTypes[] = { "x-unknown", "audio", "application", "image", "message", "model", "multipart", "text", "video" };
const char * const BodyEncodings[] = { "x-unknown", "7bit", "8bit", "quoted-printable", "base64", "binary", "x-uuencoded" };
+#ifdef USE_CJK_WIDTH
+const wchar_t TreeChars[] =
+{
+ 0xFEFF, /* not used */
+ 0x2514, /* M_TREE_LLCORNER WACS_LLCORNER */
+ 0x250C, /* M_TREE_ULCORNER WACS_ULCORNER */
+ 0x251C, /* M_TREE_LTEE WACS_LTEE */
+ 0x2500, /* M_TREE_HLINE WACS_HLINE */
+ 0x2502, /* M_TREE_VLINE WACS_VLINE */
+ 0x0020, /* M_TREE_SPACE */
+ 0x003E, /* M_TREE_RARROW */
+ 0x002A, /* M_TREE_STAR fake thread indicator */
+ 0x0026, /* M_TREE_HIDDEN */
+ 0x003D, /* M_TREE_EQUALS */
+ 0x252C, /* M_TREE_TTEE WACS_TTEE */
+ 0x2534, /* M_TREE_BTEE WACS_BTEE */
+ 0x003F /* M_TREE_MISSING */
+};
+#endif /* USE_CJK_WIDTH */
#else
extern const char * const Weekdays[];
extern const char * const Months[];
+#ifdef USE_CJK_WIDTH
+extern const wchar_t TreeChars[];
+#endif /* USE_CJK_WIDTH */
#endif
#ifdef MAIN_C
diff --git a/handler.c b/handler.c
index 7ce53f9..ab69527 100644
--- a/handler.c
+++ b/handler.c
@@ -100,6 +100,9 @@ static void mutt_convert_to_state(iconv_t cd, char *bufi, size_t *l, STATE *s)
return;
}
+ if (option (OPTSANITIZEJACHARS) && strchr (bufi, 0x1b))
+ mutt_sanitize_ja_chars (bufi, *l, 1);
+
ib = bufi, ibl = *l;
for (;;)
{
@@ -1312,6 +1315,7 @@ static int autoview_handler (BODY *a, STATE *s)
int piped = FALSE;
pid_t thepid;
int rc = 0;
+ char *charset;
snprintf (type, sizeof (type), "%s/%s", TYPE (a), a->subtype);
rfc1524_mailcap_lookup (a, type, entry, MUTT_AUTOVIEW);
@@ -1342,6 +1346,10 @@ static int autoview_handler (BODY *a, STATE *s)
return -1;
}
+ charset = mutt_get_parameter ("charset", a->parameter);
+ if (charset && option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (charset,"iso-2022-jp", 11))
+ mutt_copy_bytes_sanitize_ja (s->fpin, fpin, a->length);
+ else
mutt_copy_bytes (s->fpin, fpin, a->length);
if(!piped)
diff --git a/hdrline.c b/hdrline.c
index 5e79d32..85df51e 100644
--- a/hdrline.c
+++ b/hdrline.c
@@ -272,6 +272,7 @@ hdr_format_str (char *dest,
#define THREAD_NEW (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 1)
#define THREAD_OLD (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 2)
size_t len;
+ char *subj;
hdr = hfi->hdr;
ctx = hfi->ctx;
@@ -590,6 +591,7 @@ hdr_format_str (char *dest,
subj = apply_subject_mods(hdr->env);
else
subj = hdr->env->subject;
+ if (option (OPTDELETEPREFIX)) subj = hdr->env->real_subj;
if (flags & MUTT_FORMAT_TREE && !hdr->collapsed)
{
if (flags & MUTT_FORMAT_FORCESUBJ)
diff --git a/init.h b/init.h
index 3fca38c..9046fb1 100644
--- a/init.h
+++ b/init.h
@@ -442,6 +442,31 @@ struct option_t MuttVars[] = {
** this variable is \fIunset\fP, no check for new mail is performed
** while the mailbox is open.
*/
+#ifdef USE_CJK_WIDTH
+ { "cjk_width", DT_BOOL, R_NONE, OPTCJKWIDTH, 0 },
+ /*
+ ** .pp
+ ** When this option is set, characters in the East Asian Ambiguous (A)
+ ** category as defined in Unicode Technical Report #11 have a column
+ ** width of 2. Otherwise, they have a column width of 1.
+ ** This might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect when $$charset is UTF-8.
+ */
+ { "cjk_width_tree_chars", DT_BOOL, R_NONE, OPTCJKWIDTHTREECHARS, 0 },
+ /*
+ ** .pp
+ ** If \fIset\fP, Mutt will use the width given by $$cjk_width as the column
+ ** width of the WACS characters used to draw thread and attachment trees.
+ ** This might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect when $$charset is UTF-8 and $$ascii_chars is unset.
+ */
+#endif
{ "collapse_unread", DT_BOOL, R_NONE, OPTCOLLAPSEUNREAD, 1 },
/*
** .pp
@@ -647,6 +672,17 @@ struct option_t MuttVars[] = {
** If \fI``no''\fP, never attempt to verify cryptographic signatures.
** (Crypto only)
*/
+ { "create_rfc2047_parameters", DT_BOOL, R_NONE, OPTCREATERFC2047PARAMS, 0 },
+ /*
+ ** .pp
+ ** When this variable is set, Mutt will add an RFC 2047-encoded "name"
+ ** MIME parameter, such as the following, to the Content-Type header field:
+ ** name="=?iso-2022-jp?B?GyRCO244MxsoQi50eHQ=?="
+ ** .pp
+ ** Note: this use of RFC 2047 encoding is explicitly prohibited
+ ** by the standard. Set this variable only if the recipients' mail
+ ** clients cannot parse RFC 2231 parameters.
+ */
{ "date_format", DT_STR, R_MENU, UL &DateFmt, UL "!%a, %b %d, %Y at %I:%M:%S%p %Z" },
/*
** .pp
@@ -698,6 +734,19 @@ struct option_t MuttVars[] = {
** If this option is \fIset\fP, mutt's received-attachments menu will not show the subparts of
** individual messages in a multipart/digest. To see these subparts, press ``v'' on that menu.
*/
+ { "delete_prefix", DT_BOOL, R_NONE, OPTDELETEPREFIX, 0 },
+ /*
+ ** .pp
+ ** If set, the prefix that some mailing lists add to the Subject: field
+ ** (something like "Subject: [foo-ML:0012] real-subject") is removed
+ ** when displaying the index and when editing a reply.
+ ** The pattern to delete is configured by the $$delete_regexp variable.
+ */
+ { "delete_regexp", DT_RX, R_NONE, UL &DeleteRegexp, UL "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)" },
+ /*
+ ** .pp
+ ** A regular expression used by the $$delete_prefix option.
+ */
{ "display_filter", DT_PATH, R_PAGER, UL &DisplayFilter, UL "" },
/*
** .pp
@@ -705,7 +754,7 @@ struct option_t MuttVars[] = {
** is viewed it is passed as standard input to $$display_filter, and the
** filtered message is read from the standard output.
*/
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
{ "dotlock_program", DT_PATH, R_NONE, UL &MuttDotlock, UL BINDIR "/mutt_dotlock" },
/*
** .pp
@@ -2828,6 +2877,28 @@ struct option_t MuttVars[] = {
** that mutt \fIgenerates\fP this kind of encoding. Instead, mutt will
** unconditionally use the encoding specified in RFC2231.
*/
+ { "sanitize_ja_chars", DT_BOOL, R_NONE, OPTSANITIZEJACHARS, 0 },
+ /*
+ ** .pp
+ ** When set, Japanese "platform dependent characters" (characters that
+ ** are illegal in the iso-2022-jp charset; mainly used by MS-Windows
+ ** mailers) are replaced with a special character, the GETA mark ('ESC $$ B " .
+ ** ESC ( B' in iso-2022-jp), and JIS X 0201 kana characters
+ ** (only in the "ESC ( I" case) are also replaced with "?" to
+ ** prevent garbled output. JIS X 0201 kana characters are
+ ** not replaced if they appear in 8-bit form.
+ ** .pp
+ ** This also fixes another Japanese encoding issue. When $$charset
+ ** is set to "EUC-JP", which does not contain the JIS X 0201 roman
+ ** character set, the JIS X 0201 roman part of received messages
+ ** encoded in iso-2022-jp cannot be converted to EUC-JP.
+ ** Likewise, the ASCII part cannot be converted to
+ ** Shift_JIS, which does not contain the ASCII character set. Thus,
+ ** the converted characters are garbled in these cases. When this
+ ** option is set, the JIS X 0201 roman escape sequence and the
+ ** ASCII escape sequence are replaced appropriately to prevent
+ ** the output from being garbled.
+ */
{ "save_address", DT_BOOL, R_NONE, OPTSAVEADDRESS, 0 },
/*
** .pp
diff --git a/lib.c b/lib.c
index 345d3be..9d86412 100644
--- a/lib.c
+++ b/lib.c
@@ -445,6 +445,10 @@ int safe_symlink(const char *oldpath, const char *newpath)
int safe_rename (const char *src, const char *target)
{
+#ifdef NO_USE_HARDLINK
+ /* Android (since 6.0) does not support hardlinks. */
+ return rename(src, target);
+#else
struct stat ssb, tsb;
int link_errno;
@@ -569,6 +573,7 @@ success:
return 0;
+#endif /* NO_USE_HARDLINK */
}
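With NO_USE_HARDLINK, safe_rename() becomes a plain rename(2). One behavioural difference is worth keeping in mind. The link(2)-based code normally fails when target already exists, because link(2) reports EEXIST. rename(2), by contrast, atomically replaces an existing target. If the no-overwrite behaviour matters, a check along the following lines could be added. rename_no_clobber() is a hypothetical helper, not part of mutt, and its check is racy, as noted in the comments.

```c
#define _POSIX_C_SOURCE 200809L /* for lstat() in strict C modes */
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical alternative to the NO_USE_HARDLINK branch of safe_rename():
 * move src to target without link(2) but, like link(2), refuse to
 * replace an existing target.
 * CAUTION: lstat() followed by rename() is not atomic; another process
 * may create target in between, and it would then be silently replaced. */
static int rename_no_clobber (const char *src, const char *target)
{
  struct stat sb;

  if (lstat (target, &sb) == 0)
  {
    errno = EEXIST;
    return -1;
  }
  if (errno != ENOENT)
    return -1; /* e.g. a permission problem on the target's directory */
  return rename (src, target);
}
```

Whether this extra check is worth the race window depends on how mutt uses safe_rename() at each call site. The patch itself opts for the simpler plain rename(2).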
diff --git a/main.c b/main.c
index 10fc525..0dc0687 100644
--- a/main.c
+++ b/main.c
@@ -271,25 +271,25 @@ static void show_version (void)
"-USE_SETGID "
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
"+USE_DOTLOCK "
#else
"-USE_DOTLOCK "
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
"+DL_STANDALONE "
#else
"-DL_STANDALONE "
#endif
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
"+USE_FCNTL "
#else
"-USE_FCNTL "
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
"+USE_FLOCK "
#else
"-USE_FLOCK "
@@ -456,6 +456,12 @@ static void show_version (void)
"-LOCALES_HACK "
#endif
+#ifdef USE_CJK_WIDTH
+ "+USE_CJK_WIDTH "
+#else
+ "-USE_CJK_WIDTH "
+#endif
+
#ifdef HAVE_WC_FUNCS
"+HAVE_WC_FUNCS "
#else
diff --git a/mbyte.c b/mbyte.c
index b4df70a..cf28d70 100644
--- a/mbyte.c
+++ b/mbyte.c
@@ -17,7 +17,7 @@
*/
/*
- * Japanese support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
+ * CJK support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
*/
#if HAVE_CONFIG_H
@@ -37,8 +37,8 @@
#endif
int Charset_is_utf8 = 0;
+static int charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
-static int charset_is_ja = 0;
static iconv_t charset_to_utf8 = (iconv_t)(-1);
static iconv_t charset_from_utf8 = (iconv_t)(-1);
#endif
@@ -50,8 +50,8 @@ void mutt_set_charset (char *charset)
mutt_canonical_charset (buffer, sizeof (buffer), charset);
Charset_is_utf8 = 0;
+ charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
- charset_is_ja = 0;
if (charset_to_utf8 != (iconv_t)(-1))
{
iconv_close (charset_to_utf8);
@@ -66,12 +66,18 @@ void mutt_set_charset (char *charset)
if (mutt_is_utf8 (buffer))
Charset_is_utf8 = 1;
-#ifndef HAVE_WC_FUNCS
- else if (!ascii_strcasecmp(buffer, "euc-jp") || !ascii_strcasecmp(buffer, "shift_jis")
- || !ascii_strcasecmp(buffer, "cp932") || !ascii_strcasecmp(buffer, "eucJP-ms"))
+ else if (!ascii_strcasecmp (buffer, "gb2312") ||
+ !ascii_strcasecmp (buffer, "gb18030") ||
+ !ascii_strcasecmp (buffer, "big5") ||
+ !ascii_strcasecmp (buffer, "euc-tw") ||
+ !ascii_strcasecmp (buffer, "EUC-JP") ||
+ !ascii_strcasecmp (buffer, "eucJP-ms") ||
+ !ascii_strcasecmp (buffer, "Shift_JIS") ||
+ !ascii_strcasecmp (buffer, "cp932") ||
+ !ascii_strcasecmp (buffer, "euc-kr"))
{
- charset_is_ja = 1;
-
+ charset_is_cjk = 1;
+#ifndef HAVE_WC_FUNCS
/* Note flags=0 to skip charset-hooks: User masters the $charset
* name, and we are sure of our "utf-8" constant. So there is no
* possibility of wrong name that we would want to try to correct
@@ -80,24 +86,68 @@ void mutt_set_charset (char *charset)
*/
charset_to_utf8 = mutt_iconv_open ("utf-8", charset, 0);
charset_from_utf8 = mutt_iconv_open (charset, "utf-8", 0);
- }
#endif
+ }
#if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(ENABLE_NLS)
bind_textdomain_codeset(PACKAGE, buffer);
#endif
}
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+/*
+ * For systems that lack a correctly functioning wcwidth(),
+ * we provide our own wcwidth().
+ * In addition, this wcwidth() lets the user change the character-cell
+ * width of the East Asian Ambiguous class via $cjk_width, which the
+ * wcwidth() supplied by most systems cannot do.
+ * See the comment in wcwidth.c about the East Asian Ambiguous
+ * class for details.
+ */
+int wcwidth_ucs(wchar_t ucs);
+int wcwidth_cjk(wchar_t ucs);
+
+int wcwidth (wchar_t wc)
+{
+ if (!Charset_is_utf8)
+ {
+ if (!charset_is_cjk)
+ {
+ /* 8-bit case */
+ if (!wc)
+ return 0;
+ else if ((0 <= wc && wc < 256) && IsPrint (wc))
+ return 1;
+ else
+ return -1;
+ }
+ else
+ {
+ /* CJK */
+ return wcwidth_cjk (wc);
+ }
+ }
+ else {
+#ifdef USE_CJK_WIDTH
+ if (option (OPTCJKWIDTH))
+ return wcwidth_cjk (wc);
+#endif /* USE_CJK_WIDTH */
+ return wcwidth_ucs (wc);
+ }
+}
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
+
+
#ifndef HAVE_WC_FUNCS
/*
* For systems that don't have them, we provide here our own
- * implementations of wcrtomb(), mbrtowc(), iswprint() and wcwidth().
+ * implementations of wcrtomb(), mbrtowc() and iswprint().
* Instead of using the locale, as these functions normally would,
* we use Mutt's Charset variable. We support 3 types of charset:
* (1) For 8-bit charsets, wchar_t uses the same encoding as char.
* (2) For UTF-8, wchar_t uses UCS.
- * (3) For stateless Japanese encodings, we use UCS and convert
+ * (3) For stateless CJK encodings, we use UCS and convert
* via UTF-8 using iconv.
* Unfortunately, we can't handle non-stateless encodings.
*/
@@ -256,7 +306,7 @@ size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
int iswprint (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return ((0x20 <= wc && wc < 0x7f) || 0xa0 <= wc);
else
return (0 <= wc && wc < 256) ? IsPrint (wc) : 0;
@@ -264,7 +314,7 @@ int iswprint (wint_t wc)
int iswspace (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return (9 <= wc && wc <= 13) || wc == 32;
else
return (0 <= wc && wc < 256) ? isspace (wc) : 0;
@@ -347,7 +397,7 @@ static int iswalpha_ucs (wint_t wc)
wint_t towupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? toupper (wc) : wc;
@@ -355,7 +405,7 @@ wint_t towupper (wint_t wc)
wint_t towlower (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towlower_ucs (wc);
else
return (0 <= wc && wc < 256) ? tolower (wc) : wc;
@@ -363,7 +413,7 @@ wint_t towlower (wint_t wc)
int iswalnum (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalnum_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalnum (wc) : 0;
@@ -371,7 +421,7 @@ int iswalnum (wint_t wc)
int iswalpha (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalpha_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalpha (wc) : 0;
@@ -379,58 +429,12 @@ int iswalpha (wint_t wc)
int iswupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? isupper (wc) : 0;
}
-/*
- * l10n for Japanese:
- * Symbols, Greek and Cyrillic in JIS X 0208, Japanese Kanji
- * Character Set, have a column width of 2.
- */
-int wcwidth_ja (wchar_t ucs)
-{
- if (ucs >= 0x3021)
- return -1; /* continue with the normal check */
- /* a rough range for quick check */
- if ((ucs >= 0x00a1 && ucs <= 0x00fe) || /* Latin-1 Supplement */
- (ucs >= 0x0391 && ucs <= 0x0451) || /* Greek and Cyrillic */
- (ucs >= 0x2010 && ucs <= 0x266f) || /* Symbols */
- (ucs >= 0x3000 && ucs <= 0x3020)) /* CJK Symbols and Punctuation */
- return 2;
- else
- return -1;
-}
-
-int wcwidth_ucs(wchar_t ucs);
-
-int wcwidth (wchar_t wc)
-{
- if (!Charset_is_utf8)
- {
- if (!charset_is_ja)
- {
- /* 8-bit case */
- if (!wc)
- return 0;
- else if ((0 <= wc && wc < 256) && IsPrint (wc))
- return 1;
- else
- return -1;
- }
- else
- {
- /* Japanese */
- int k = wcwidth_ja (wc);
- if (k != -1)
- return k;
- }
- }
- return wcwidth_ucs (wc);
-}
-
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps)
{
static wchar_t mbstate;
diff --git a/mbyte.h b/mbyte.h
index 9c58c9e..224cafb 100644
--- a/mbyte.h
+++ b/mbyte.h
@@ -8,6 +8,12 @@
# ifdef HAVE_WCTYPE_H
# include <wctype.h>
# endif
+# ifdef USE_CJK_WIDTH
+#ifdef wcwidth
+# undef wcwidth
+#endif
+int wcwidth (wchar_t wc);
+# endif /* USE_CJK_WIDTH */
# endif
# ifndef HAVE_WC_FUNCS
@@ -32,6 +38,9 @@
#ifdef iswupper
# undef iswupper
#endif
+#ifdef wcwidth
+# undef wcwidth
+#endif
size_t wcrtomb (char *s, wchar_t wc, mbstate_t *ps);
size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int iswprint (wint_t wc);
@@ -44,7 +53,6 @@ wint_t towlower (wint_t wc);
int wcwidth (wchar_t wc);
# endif /* !HAVE_WC_FUNCS */
-
void mutt_set_charset (char *charset);
extern int Charset_is_utf8;
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps);
diff --git a/mutt.h b/mutt.h
index d80face..fd2adc4 100644
--- a/mutt.h
+++ b/mutt.h
@@ -372,10 +372,16 @@ enum
OPTBROWSERABBRMAILBOXES,
OPTCHECKMBOXSIZE,
OPTCHECKNEW,
+#ifdef USE_CJK_WIDTH
+ OPTCJKWIDTH,
+ OPTCJKWIDTHTREECHARS,
+#endif /* USE_CJK_WIDTH */
OPTCOLLAPSEUNREAD,
OPTCONFIRMAPPEND,
OPTCONFIRMCREATE,
+ OPTCREATERFC2047PARAMS,
OPTDELETEUNTAG,
+ OPTDELETEPREFIX,
OPTDIGESTCOLLAPSE,
OPTDUPTHREADS,
OPTEDITHDRS,
@@ -477,6 +483,7 @@ enum
OPTREVNAME,
OPTREVREAL,
OPTRFC2047PARAMS,
+ OPTSANITIZEJACHARS,
OPTSAVEADDRESS,
OPTSAVEEMPTY,
OPTSAVENAME,
diff --git a/mutt_regex.h b/mutt_regex.h
index f10ecbe..3cd36e0 100644
--- a/mutt_regex.h
+++ b/mutt_regex.h
@@ -52,5 +52,6 @@ WHERE REGEXP QuoteRegexp;
WHERE REGEXP ReplyRegexp;
WHERE REGEXP Smileys;
WHERE REGEXP GecosMask;
+WHERE REGEXP DeleteRegexp;
#endif /* MUTT_REGEX_H */
diff --git a/mx.c b/mx.c
index 9d311ef..cd4d185 100644
--- a/mx.c
+++ b/mx.c
@@ -47,7 +47,7 @@
#include "buffy.h"
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
#include "dotlock.h"
#endif
@@ -95,13 +95,13 @@ struct mx_ops* mx_get_ops (int magic)
#define mutt_is_spool(s) (mutt_strcmp (Spoolfile, s) == 0)
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
/* parameters:
* path - file to lock
* retry - should retry if unable to lock?
*/
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int invoke_dotlock (const char *path, int dummy, int flags, int retry)
{
@@ -181,14 +181,14 @@ static int undotlock_file (const char *path, int fd)
*/
int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
-#if defined (USE_FCNTL) || defined (USE_FLOCK)
+#if defined (USE_FCNTL) || defined (USE_FLOCK) || defined(NO_USE_HARDLINK)
int count;
int attempt;
struct stat sb = { 0 }, prev_sb = { 0 }; /* silence gcc warnings */
#endif
int r = 0;
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock lck;
memset (&lck, 0, sizeof (struct flock));
@@ -227,7 +227,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
count = 0;
attempt = 0;
while (flock (fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
@@ -261,7 +261,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FLOCK */
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (r == 0 && dot)
r = dotlock_file (path, fd, timeout);
#endif /* USE_DOTLOCK */
@@ -270,12 +270,12 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
/* release any other locks obtained in this routine */
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
lck.l_type = F_UNLCK;
fcntl (fd, F_SETLK, &lck);
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif /* USE_FLOCK */
}
@@ -285,7 +285,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
int mx_unlock_file (const char *path, int fd, int dot)
{
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock unlockit = { F_UNLCK, 0, 0, 0, 0 };
memset (&unlockit, 0, sizeof (struct flock));
@@ -294,11 +294,11 @@ int mx_unlock_file (const char *path, int fd, int dot)
fcntl (fd, F_SETLK, &unlockit);
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (dot)
undotlock_file (path, fd);
#endif
@@ -309,7 +309,7 @@ int mx_unlock_file (const char *path, int fd, int dot)
static void mx_unlink_empty (const char *path)
{
int fd;
-#ifndef USE_DOTLOCK
+#if !defined(USE_DOTLOCK) || defined(NO_USE_HARDLINK)
struct stat sb;
#endif
@@ -322,7 +322,7 @@ static void mx_unlink_empty (const char *path)
return;
}
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
invoke_dotlock (path, fd, DL_FL_UNLINK, 1);
#else
if (fstat (fd, &sb) == 0 && sb.st_size == 0)
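With `NO_USE_HARDLINK` defined, the patched `mx_lock_file()` takes only fcntl(2) and flock(2) locks and never creates a dotlock file, so link(2) is never called. The sketch below shows that locking pattern on an already open descriptor. It assumes a POSIX system that also provides flock(2), a BSD/Linux extension. `lock_fd_without_hardlink` and `unlock_fd_without_hardlink` are hypothetical names, not mutt functions, and the retry and timeout loop of the real `mx_lock_file()` is omitted.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: lock fd with fcntl(2) and flock(2) only, never
 * calling link(2). Makes a single non-blocking attempt (no retries).
 * Returns 0 on success, -1 if either lock cannot be taken. */
int lock_fd_without_hardlink(int fd, int excl)
{
  struct flock lck;

  memset(&lck, 0, sizeof lck);
  lck.l_type = excl ? F_WRLCK : F_RDLCK;
  lck.l_whence = SEEK_SET; /* l_start = l_len = 0 covers the whole file */
  if (fcntl(fd, F_SETLK, &lck) == -1)
    return -1;

  if (flock(fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1) {
    /* drop the fcntl lock again, as mx_lock_file() does on failure */
    lck.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lck);
    return -1;
  }
  return 0;
}

/* Release both locks taken by lock_fd_without_hardlink(). */
void unlock_fd_without_hardlink(int fd)
{
  struct flock lck;

  memset(&lck, 0, sizeof lck);
  lck.l_type = F_UNLCK;
  lck.l_whence = SEEK_SET;
  fcntl(fd, F_SETLK, &lck);
  flock(fd, LOCK_UN);
}
```

An exclusive fcntl lock needs a descriptor opened for writing, so open the mailbox with `O_RDWR` before calling the helper with `excl` set.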
diff --git a/parse.c b/parse.c
index 0ae5594..745d2fc 100644
--- a/parse.c
+++ b/parse.c
@@ -1453,6 +1453,18 @@ ENVELOPE *mutt_read_rfc822_header (FILE *f, HEADER *hdr, short user_hdrs,
e->real_subj = e->subject + pmatch[0].rm_eo;
else
e->real_subj = e->subject;
+ if (option (OPTDELETEPREFIX))
+ {
+ /* if this option is set, mutt deletes a leading string such as [prefix],
+ * [prefix:number] or [prefix number] from the Subject line.
+ */
+ if (regexec (DeleteRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ {
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ if (regexec (ReplyRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ }
+ }
}
if (hdr->received < 0)
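The parse.c hunk advances `e->real_subj` past the match of `DeleteRegexp` and then past a second `ReplyRegexp` match, so a subject like `[foo-ML:0012] Re: real-subject` reduces to `real-subject`. The sketch below reproduces that pointer-advancing logic with POSIX `regcomp`/`regexec`. Both patterns are illustrative assumptions: the default values of mutt's `$delete_regexp` and `$reply_regexp` are not shown in this section, and the real code also applies `ReplyRegexp` once before the prefix check. `strip_list_prefix` is a hypothetical name.

```c
#include <regex.h>

/* Hypothetical helper. Returns a pointer into subj (no copy), the same
 * way parse.c advances e->real_subj by pmatch[0].rm_eo. */
const char *strip_list_prefix(const char *subj)
{
  regex_t prefix_rx, reply_rx;
  regmatch_t pmatch[1];
  const char *p = subj;

  /* assumed pattern: a leading "[...]" tag plus trailing blanks */
  if (regcomp(&prefix_rx, "^\\[[^]]*][ \t]*", REG_EXTENDED) != 0)
    return subj;
  /* assumed pattern: a leading "Re:" plus trailing blanks */
  if (regcomp(&reply_rx, "^[Rr][Ee]:[ \t]*", REG_EXTENDED) != 0) {
    regfree(&prefix_rx);
    return subj;
  }

  if (regexec(&prefix_rx, p, 1, pmatch, 0) == 0) {
    p += pmatch[0].rm_eo;
    /* as in the patch, look for "Re:" again only after a prefix was removed */
    if (regexec(&reply_rx, p, 1, pmatch, 0) == 0)
      p += pmatch[0].rm_eo;
  }

  regfree(&prefix_rx);
  regfree(&reply_rx);
  return p;
}
```

Returning a pointer into the original string mirrors the patch, which never copies the subject and only moves `real_subj` forward.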
diff --git a/rfc2047.c b/rfc2047.c
index 8506425..e907b25 100644
--- a/rfc2047.c
+++ b/rfc2047.c
@@ -62,6 +62,9 @@ static size_t convert_string (ICONV_CONST char *f, size_t flen,
size_t obl, n;
int e;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp",
+11))
+ mutt_sanitize_ja_chars ((char *) f, flen, 0);
cd = mutt_iconv_open (to, from, 0);
if (cd == (iconv_t)(-1))
return (size_t)(-1);
diff --git a/sendlib.c b/sendlib.c
index 2128c94..2c45e68 100644
--- a/sendlib.c
+++ b/sendlib.c
@@ -348,6 +348,30 @@ int mutt_write_mime_header (BODY *a, FILE *f)
}
}
+ if (a->use_disp && option (OPTCREATERFC2047PARAMS))
+ {
+ if(!(fn = a->d_filename))
+ fn = a->filename;
+
+ if (fn)
+ {
+ char *tmp;
+
+ /* Strip off the leading path... */
+ if ((t = strrchr (fn, '/')))
+ t++;
+ else
+ t = fn;
+
+ buffer[0] = 0;
+ tmp = safe_strdup (t);
+ rfc2047_encode_string (&tmp);
+ rfc822_cat (buffer, sizeof (buffer), tmp, MimeSpecials);
+ FREE (&tmp);
+ fprintf (f, ";\n\tname=%s", buffer);
+ }
+ }
+
fputc ('\n', f);
if (a->description)
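When `$create_rfc2047_parameters` is set, the sendlib.c hunk strips the directory part of the attachment's filename, RFC 2047-encodes the result and writes it as a quoted `name=` parameter. The sketch below shows that flow under a simplifying assumption: the filename bytes are treated as UTF-8 and wrapped in a `=?utf-8?B?...?=` encoded word, whereas mutt's `rfc2047_encode_string()` converts to `$send_charset` first (for example ISO-2022-JP, as in the documentation example) and `rfc822_cat()` quotes only when needed. `format_name_param` and `base64_encode` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Encode len bytes of in as base64 into out (NUL-terminated).
 * out must hold at least 4 * ((len + 2) / 3) + 1 bytes. */
static void base64_encode(const unsigned char *in, size_t len, char *out)
{
  static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t i;
  char *o = out;

  for (i = 0; i + 2 < len; i += 3) {
    *o++ = tbl[in[i] >> 2];
    *o++ = tbl[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
    *o++ = tbl[((in[i + 1] & 0x0f) << 2) | (in[i + 2] >> 6)];
    *o++ = tbl[in[i + 2] & 0x3f];
  }
  if (i < len) { /* one or two trailing bytes need padding */
    *o++ = tbl[in[i] >> 2];
    if (i + 1 < len) {
      *o++ = tbl[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
      *o++ = tbl[(in[i + 1] & 0x0f) << 2];
    } else {
      *o++ = tbl[(in[i] & 0x03) << 4];
      *o++ = '=';
    }
    *o++ = '=';
  }
  *o = '\0';
}

/* Hypothetical helper: write ";\n\tname=\"=?utf-8?B?...?=\"" for the
 * basename of path into buf. Returns the snprintf() result, or -1 if
 * the name is too long for the local buffer. */
int format_name_param(const char *path, char *buf, size_t buflen)
{
  const char *t = strrchr(path, '/');
  char b64[256];
  size_t n;

  t = t ? t + 1 : path; /* strip off the leading path, as the patch does */
  n = strlen(t);
  if (4 * ((n + 2) / 3) + 1 > sizeof b64)
    return -1;
  base64_encode((const unsigned char *) t, n, b64);
  return snprintf(buf, buflen, ";\n\tname=\"=?utf-8?B?%s?=\"", b64);
}
```

As the option's own documentation notes, RFC 2047 forbids encoded words inside MIME parameters, so output like this is only a compatibility aid for mailers that cannot read RFC 2231 parameters.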
diff --git a/wcwidth.c b/wcwidth.c
index 0b94d73..85a1397 100644
--- a/wcwidth.c
+++ b/wcwidth.c
@@ -5,6 +5,51 @@
* http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
* http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
*
+ * In fixed-width output devices, Latin characters all occupy a single
+ * "cell" position of equal width, whereas ideographic CJK characters
+ * occupy two such cells. Interoperability between terminal-line
+ * applications and (teletype-style) character terminals using the
+ * UTF-8 encoding requires agreement on which character should advance
+ * the cursor by how many cell positions. No established formal
+ * standards exist at present on which Unicode character shall occupy
+ * how many cell positions on character terminals. These routines are
+ * a first attempt of defining such behavior based on simple rules
+ * applied to data provided by the Unicode Consortium.
+ *
+ * For some graphical characters, the Unicode standard explicitly
+ * defines a character-cell width via the definition of the East Asian
+ * FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
+ * In all these cases, there is no ambiguity about which width a
+ * terminal shall use. For characters in the East Asian Ambiguous (A)
+ * class, the width choice depends purely on a preference of backward
+ * compatibility with either historic CJK or Western practice.
+ * Choosing single-width for these characters is easy to justify as
+ * the appropriate long-term solution, as the CJK practice of
+ * displaying these characters as double-width comes from historic
+ * implementation simplicity (8-bit encoded characters were displayed
+ * single-width and 16-bit ones double-width, even for Greek,
+ * Cyrillic, etc.) and not any typographic considerations.
+ *
+ * Much less clear is the choice of width for the Not East Asian
+ * (Neutral) class. Existing practice does not dictate a width for any
+ * of these characters. It would nevertheless make sense
+ * typographically to allocate two character cells to characters such
+ * as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
+ * represented adequately with a single-width glyph. The following
+ * routines at present merely assign a single-cell width to all
+ * neutral characters, in the interest of simplicity. This is not
+ * entirely satisfactory and should be reconsidered before
+ * establishing a formal standard in this area. At the moment, the
+ * decision which Not East Asian (Neutral) characters should be
+ * represented by double-width glyphs cannot yet be answered by
+ * applying a simple rule from the Unicode database content. Setting
+ * up a proper standard for the behavior of UTF-8 character terminals
+ * will require a careful analysis not only of each Unicode character,
+ * but also of each presentation form, something the author of these
+ * routines has avoided to do so far.
+ *
+ * http://www.unicode.org/unicode/reports/tr11/
+ *
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
@@ -24,12 +69,34 @@
# include "config.h"
#endif
-#ifndef HAVE_WC_FUNCS
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+#include <wchar.h>
-#include "mutt.h"
-#include "mbyte.h"
+struct interval {
+ wchar_t first;
+ wchar_t last;
+};
+
+/* auxiliary function for binary search in interval table */
+static int bisearch(wchar_t ucs, const struct interval *table, int max) {
+ int min = 0;
+ int mid;
+
+ if (ucs < table[0].first || ucs > table[max].last)
+ return 0;
+ while (max >= min) {
+ mid = (min + max) / 2;
+ if (ucs > table[mid].last)
+ min = mid + 1;
+ else if (ucs < table[mid].first)
+ max = mid - 1;
+ else
+ return 1;
+ }
+
+ return 0;
+}
-#include <ctype.h>
/* The following two functions define the column width of an ISO 10646
* character as follows:
@@ -67,62 +134,56 @@ int wcwidth_ucs(wchar_t ucs)
{
/* sorted list of non-overlapping intervals of non-spacing characters */
/* generated by "uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B c" */
- static const struct interval {
- wchar_t first;
- wchar_t last;
- } combining[] = {
- { 0x0300, 0x036f }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
- { 0x0591, 0x05bd }, { 0x05bf, 0x05bf }, { 0x05c1, 0x05c2 },
- { 0x05c4, 0x05c5 }, { 0x05c7, 0x05c7 }, { 0x0600, 0x0603 },
- { 0x0610, 0x0615 }, { 0x064b, 0x065e }, { 0x0670, 0x0670 },
- { 0x06d6, 0x06e4 }, { 0x06e7, 0x06e8 }, { 0x06ea, 0x06ed },
- { 0x070f, 0x070f }, { 0x0711, 0x0711 }, { 0x0730, 0x074a },
- { 0x07a6, 0x07b0 }, { 0x07eb, 0x07f3 }, { 0x0901, 0x0902 },
- { 0x093c, 0x093c }, { 0x0941, 0x0948 }, { 0x094d, 0x094d },
+ static const struct interval combining[] = {
+ { 0x0300, 0x036F }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
+ { 0x0591, 0x05BD }, { 0x05BF, 0x05BF }, { 0x05C1, 0x05C2 },
+ { 0x05C4, 0x05C5 }, { 0x05C7, 0x05C7 }, { 0x0600, 0x0603 },
+ { 0x0610, 0x0615 }, { 0x064B, 0x065E }, { 0x0670, 0x0670 },
+ { 0x06D6, 0x06E4 }, { 0x06E7, 0x06E8 }, { 0x06EA, 0x06ED },
+ { 0x070F, 0x070F }, { 0x0711, 0x0711 }, { 0x0730, 0x074A },
+ { 0x07A6, 0x07B0 }, { 0x07EB, 0x07F3 }, { 0x0901, 0x0902 },
+ { 0x093C, 0x093C }, { 0x0941, 0x0948 }, { 0x094D, 0x094D },
{ 0x0951, 0x0954 }, { 0x0962, 0x0963 }, { 0x0981, 0x0981 },
- { 0x09bc, 0x09bc }, { 0x09c1, 0x09c4 }, { 0x09cd, 0x09cd },
- { 0x09e2, 0x09e3 }, { 0x0a01, 0x0a02 }, { 0x0a3c, 0x0a3c },
- { 0x0a41, 0x0a42 }, { 0x0a47, 0x0a48 }, { 0x0a4b, 0x0a4d },
- { 0x0a70, 0x0a71 }, { 0x0a81, 0x0a82 }, { 0x0abc, 0x0abc },
- { 0x0ac1, 0x0ac5 }, { 0x0ac7, 0x0ac8 }, { 0x0acd, 0x0acd },
- { 0x0ae2, 0x0ae3 }, { 0x0b01, 0x0b01 }, { 0x0b3c, 0x0b3c },
- { 0x0b3f, 0x0b3f }, { 0x0b41, 0x0b43 }, { 0x0b4d, 0x0b4d },
- { 0x0b56, 0x0b56 }, { 0x0b82, 0x0b82 }, { 0x0bc0, 0x0bc0 },
- { 0x0bcd, 0x0bcd }, { 0x0c3e, 0x0c40 }, { 0x0c46, 0x0c48 },
- { 0x0c4a, 0x0c4d }, { 0x0c55, 0x0c56 }, { 0x0cbc, 0x0cbc },
- { 0x0cbf, 0x0cbf }, { 0x0cc6, 0x0cc6 }, { 0x0ccc, 0x0ccd },
- { 0x0ce2, 0x0ce3 }, { 0x0d41, 0x0d43 }, { 0x0d4d, 0x0d4d },
- { 0x0dca, 0x0dca }, { 0x0dd2, 0x0dd4 }, { 0x0dd6, 0x0dd6 },
- { 0x0e31, 0x0e31 }, { 0x0e34, 0x0e3a }, { 0x0e47, 0x0e4e },
- { 0x0eb1, 0x0eb1 }, { 0x0eb4, 0x0eb9 }, { 0x0ebb, 0x0ebc },
- { 0x0ec8, 0x0ecd }, { 0x0f18, 0x0f19 }, { 0x0f35, 0x0f35 },
- { 0x0f37, 0x0f37 }, { 0x0f39, 0x0f39 }, { 0x0f71, 0x0f7e },
- { 0x0f80, 0x0f84 }, { 0x0f86, 0x0f87 }, { 0x0f90, 0x0f97 },
- { 0x0f99, 0x0fbc }, { 0x0fc6, 0x0fc6 }, { 0x102d, 0x1030 },
+ { 0x09BC, 0x09BC }, { 0x09C1, 0x09C4 }, { 0x09CD, 0x09CD },
+ { 0x09E2, 0x09E3 }, { 0x0A01, 0x0A02 }, { 0x0A3C, 0x0A3C },
+ { 0x0A41, 0x0A42 }, { 0x0A47, 0x0A48 }, { 0x0A4B, 0x0A4D },
+ { 0x0A70, 0x0A71 }, { 0x0A81, 0x0A82 }, { 0x0ABC, 0x0ABC },
+ { 0x0AC1, 0x0AC5 }, { 0x0AC7, 0x0AC8 }, { 0x0ACD, 0x0ACD },
+ { 0x0AE2, 0x0AE3 }, { 0x0B01, 0x0B01 }, { 0x0B3C, 0x0B3C },
+ { 0x0B3F, 0x0B3F }, { 0x0B41, 0x0B43 }, { 0x0B4D, 0x0B4D },
+ { 0x0B56, 0x0B56 }, { 0x0B82, 0x0B82 }, { 0x0BC0, 0x0BC0 },
+ { 0x0BCD, 0x0BCD }, { 0x0C3E, 0x0C40 }, { 0x0C46, 0x0C48 },
+ { 0x0C4A, 0x0C4D }, { 0x0C55, 0x0C56 }, { 0x0CBC, 0x0CBC },
+ { 0x0CBF, 0x0CBF }, { 0x0CC6, 0x0CC6 }, { 0x0CCC, 0x0CCD },
+ { 0x0CE2, 0x0CE3 }, { 0x0D41, 0x0D43 }, { 0x0D4D, 0x0D4D },
+ { 0x0DCA, 0x0DCA }, { 0x0DD2, 0x0DD4 }, { 0x0DD6, 0x0DD6 },
+ { 0x0E31, 0x0E31 }, { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E },
+ { 0x0EB1, 0x0EB1 }, { 0x0EB4, 0x0EB9 }, { 0x0EBB, 0x0EBC },
+ { 0x0EC8, 0x0ECD }, { 0x0F18, 0x0F19 }, { 0x0F35, 0x0F35 },
+ { 0x0F37, 0x0F37 }, { 0x0F39, 0x0F39 }, { 0x0F71, 0x0F7E },
+ { 0x0F80, 0x0F84 }, { 0x0F86, 0x0F87 }, { 0x0F90, 0x0F97 },
+ { 0x0F99, 0x0FBC }, { 0x0FC6, 0x0FC6 }, { 0x102D, 0x1030 },
{ 0x1032, 0x1032 }, { 0x1036, 0x1037 }, { 0x1039, 0x1039 },
- { 0x1058, 0x1059 }, { 0x1160, 0x11ff }, { 0x135f, 0x135f },
+ { 0x1058, 0x1059 }, { 0x1160, 0x11FF }, { 0x135F, 0x135F },
{ 0x1712, 0x1714 }, { 0x1732, 0x1734 }, { 0x1752, 0x1753 },
- { 0x1772, 0x1773 }, { 0x17b4, 0x17b5 }, { 0x17b7, 0x17bd },
- { 0x17c6, 0x17c6 }, { 0x17c9, 0x17d3 }, { 0x17dd, 0x17dd },
- { 0x180b, 0x180d }, { 0x18a9, 0x18a9 }, { 0x1920, 0x1922 },
- { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193b },
- { 0x1a17, 0x1a18 }, { 0x1b00, 0x1b03 }, { 0x1b34, 0x1b34 },
- { 0x1b36, 0x1b3a }, { 0x1b3c, 0x1b3c }, { 0x1b42, 0x1b42 },
- { 0x1b6b, 0x1b73 }, { 0x1dc0, 0x1dca }, { 0x1dfe, 0x1dff },
- { 0x200b, 0x200f }, { 0x202a, 0x202e }, { 0x2060, 0x2063 },
- { 0x206a, 0x206f }, { 0x20d0, 0x20ef }, { 0x302a, 0x302f },
- { 0x3099, 0x309a }, { 0xa806, 0xa806 }, { 0xa80b, 0xa80b },
- { 0xa825, 0xa826 }, { 0xfb1e, 0xfb1e }, { 0xfe00, 0xfe0f },
- { 0xfe20, 0xfe23 }, { 0xfeff, 0xfeff }, { 0xfff9, 0xfffb },
- { 0x10a01, 0x10a03 }, { 0x10a05, 0x10a06 }, { 0x10a0c, 0x10a0f },
- { 0x10a38, 0x10a3a }, { 0x10a3f, 0x10a3f }, { 0x1d167, 0x1d169 },
- { 0x1d173, 0x1d182 }, { 0x1d185, 0x1d18b }, { 0x1d1aa, 0x1d1ad },
- { 0x1d242, 0x1d244 }, { 0xe0001, 0xe0001 }, { 0xe0020, 0xe007f },
- { 0xe0100, 0xe01ef }
+ { 0x1772, 0x1773 }, { 0x17B4, 0x17B5 }, { 0x17B7, 0x17BD },
+ { 0x17C6, 0x17C6 }, { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD },
+ { 0x180B, 0x180D }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 },
+ { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193B },
+ { 0x1A17, 0x1A18 }, { 0x1B00, 0x1B03 }, { 0x1B34, 0x1B34 },
+ { 0x1B36, 0x1B3A }, { 0x1B3C, 0x1B3C }, { 0x1B42, 0x1B42 },
+ { 0x1B6B, 0x1B73 }, { 0x1DC0, 0x1DCA }, { 0x1DFE, 0x1DFF },
+ { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2063 },
+ { 0x206A, 0x206F }, { 0x20D0, 0x20EF }, { 0x302A, 0x302F },
+ { 0x3099, 0x309A }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B },
+ { 0xA825, 0xA826 }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F },
+ { 0xFE20, 0xFE23 }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB },
+ { 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }, { 0x10A0C, 0x10A0F },
+ { 0x10A38, 0x10A3A }, { 0x10A3F, 0x10A3F }, { 0x1D167, 0x1D169 },
+ { 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
+ { 0x1D242, 0x1D244 }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F },
+ { 0xE0100, 0xE01EF }
};
- int min = 0;
- int max = sizeof(combining) / sizeof(struct interval) - 1;
- int mid;
/* test for 8-bit control characters */
if (ucs == 0)
@@ -130,20 +191,10 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0))
return -1;
- /* first quick check for Latin-1 etc. characters */
- if (ucs < combining[0].first)
- return 1;
-
/* binary search in table of non-spacing characters */
- while (max >= min) {
- mid = (min + max) / 2;
- if (combining[mid].last < ucs)
- min = mid + 1;
- else if (combining[mid].first > ucs)
- max = mid - 1;
- else if (combining[mid].first <= ucs && combining[mid].last >= ucs)
- return 0;
- }
+ if (bisearch(ucs, combining,
+ sizeof(combining) / sizeof(struct interval) - 1))
+ return 0;
/* if we arrive here, ucs is not a combining or C0/C1 control character */
@@ -151,7 +202,7 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 0x1100)
return 1;
- return 1 +
+ return 1 +
(ucs >= 0x1100 &&
(ucs <= 0x115f || /* Hangul Jamo init. consonants */
ucs == 0x2329 || ucs == 0x232a ||
@@ -167,15 +218,120 @@ int wcwidth_ucs(wchar_t ucs)
(ucs >= 0x30000 && ucs <= 0x3fffd)));
}
-#endif /* !HAVE_WC_FUNCS */
+#if 0 /* original */
+int wcswidth_ucs(const wchar_t *pwcs, size_t n)
+{
+ int w, width = 0;
+
+ for (;*pwcs && n-- > 0; pwcs++)
+ if ((w = wcwidth_ucs(*pwcs)) < 0)
+ return -1;
+ else
+ width += w;
+
+ return width;
+}
+#endif
+
+/*
+ * The following functions are the same as wcwidth_ucs() and
+ * wcswidth_ucs(), except that spacing characters in the East Asian
+ * Ambiguous (A) category as defined in Unicode Technical Report #11
+ * have a column width of 2. This variant might be useful for users of
+ * CJK legacy encodings who want to migrate to UCS without changing
+ * the traditional terminal character-width behaviour. It is not
+ * otherwise recommended for general use.
+ */
+/*
+ * In addition to the explanation mentioned above,
+ * several characters in the East Asian Narrow (Na) and Not East Asian
+ * (Neutral) category as defined in Unicode Technical Report #11
+ * actually have a column width of 2 in CJK legacy encodings.
+ */
+int wcwidth_cjk(wchar_t ucs)
+{
+ /* sorted list of non-overlapping intervals of East Asian Ambiguous
+ * characters, generated by "uniset +WIDTH-A -cat=Me -cat=Mn -cat=Cf c" */
+ static const struct interval ambiguous[] = {
+ { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 },
+ { 0x00AA, 0x00AA }, { 0x00AE, 0x00AE }, { 0x00B0, 0x00B4 },
+ { 0x00B6, 0x00BA }, { 0x00BC, 0x00BF }, { 0x00C6, 0x00C6 },
+ { 0x00D0, 0x00D0 }, { 0x00D7, 0x00D8 }, { 0x00DE, 0x00E1 },
+ { 0x00E6, 0x00E6 }, { 0x00E8, 0x00EA }, { 0x00EC, 0x00ED },
+ { 0x00F0, 0x00F0 }, { 0x00F2, 0x00F3 }, { 0x00F7, 0x00FA },
+ { 0x00FC, 0x00FC }, { 0x00FE, 0x00FE }, { 0x0101, 0x0101 },
+ { 0x0111, 0x0111 }, { 0x0113, 0x0113 }, { 0x011B, 0x011B },
+ { 0x0126, 0x0127 }, { 0x012B, 0x012B }, { 0x0131, 0x0133 },
+ { 0x0138, 0x0138 }, { 0x013F, 0x0142 }, { 0x0144, 0x0144 },
+ { 0x0148, 0x014B }, { 0x014D, 0x014D }, { 0x0152, 0x0153 },
+ { 0x0166, 0x0167 }, { 0x016B, 0x016B }, { 0x01CE, 0x01CE },
+ { 0x01D0, 0x01D0 }, { 0x01D2, 0x01D2 }, { 0x01D4, 0x01D4 },
+ { 0x01D6, 0x01D6 }, { 0x01D8, 0x01D8 }, { 0x01DA, 0x01DA },
+ { 0x01DC, 0x01DC }, { 0x0251, 0x0251 }, { 0x0261, 0x0261 },
+ { 0x02C4, 0x02C4 }, { 0x02C7, 0x02C7 }, { 0x02C9, 0x02CB },
+ { 0x02CD, 0x02CD }, { 0x02D0, 0x02D0 }, { 0x02D8, 0x02DB },
+ { 0x02DD, 0x02DD }, { 0x02DF, 0x02DF }, { 0x0391, 0x03A1 },
+ { 0x03A3, 0x03A9 }, { 0x03B1, 0x03C1 }, { 0x03C3, 0x03C9 },
+ { 0x0401, 0x0401 }, { 0x0410, 0x044F }, { 0x0451, 0x0451 },
+ { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
+ { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
+ { 0x2030, 0x2030 }, { 0x2032, 0x2033 }, { 0x2035, 0x2035 },
+ { 0x203B, 0x203B }, { 0x203E, 0x203E }, { 0x2074, 0x2074 },
+ { 0x207F, 0x207F }, { 0x2081, 0x2084 }, { 0x20AC, 0x20AC },
+ { 0x2103, 0x2103 }, { 0x2105, 0x2105 }, { 0x2109, 0x2109 },
+ { 0x2113, 0x2113 }, { 0x2116, 0x2116 }, { 0x2121, 0x2122 },
+ { 0x2126, 0x2126 }, { 0x212B, 0x212B }, { 0x2153, 0x2154 },
+ { 0x215B, 0x215E }, { 0x2160, 0x216B }, { 0x2170, 0x2179 },
+ { 0x2190, 0x2199 }, { 0x21B8, 0x21B9 }, { 0x21D2, 0x21D2 },
+ { 0x21D4, 0x21D4 }, { 0x21E7, 0x21E7 }, { 0x2200, 0x2200 },
+ { 0x2202, 0x2203 }, { 0x2207, 0x2208 }, { 0x220B, 0x220B },
+ { 0x220F, 0x220F }, { 0x2211, 0x2211 }, { 0x2215, 0x2215 },
+ { 0x221A, 0x221A }, { 0x221D, 0x2220 }, { 0x2223, 0x2223 },
+ { 0x2225, 0x2225 }, { 0x2227, 0x222C }, { 0x222E, 0x222E },
+ { 0x2234, 0x2237 }, { 0x223C, 0x223D }, { 0x2248, 0x2248 },
+ { 0x224C, 0x224C }, { 0x2252, 0x2252 }, { 0x2260, 0x2261 },
+ { 0x2264, 0x2267 }, { 0x226A, 0x226B }, { 0x226E, 0x226F },
+ { 0x2282, 0x2283 }, { 0x2286, 0x2287 }, { 0x2295, 0x2295 },
+ { 0x2299, 0x2299 }, { 0x22A5, 0x22A5 }, { 0x22BF, 0x22BF },
+ { 0x2312, 0x2312 }, { 0x2460, 0x24E9 }, { 0x24EB, 0x254B },
+ { 0x2550, 0x2573 }, { 0x2580, 0x258F }, { 0x2592, 0x2595 },
+ { 0x25A0, 0x25A1 }, { 0x25A3, 0x25A9 }, { 0x25B2, 0x25B3 },
+ { 0x25B6, 0x25B7 }, { 0x25BC, 0x25BD }, { 0x25C0, 0x25C1 },
+ { 0x25C6, 0x25C8 }, { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 },
+ { 0x25E2, 0x25E5 }, { 0x25EF, 0x25EF }, { 0x2605, 0x2606 },
+ { 0x2609, 0x2609 }, { 0x260E, 0x260F }, { 0x2614, 0x2615 },
+ { 0x261C, 0x261C }, { 0x261E, 0x261E }, { 0x2640, 0x2640 },
+ { 0x2642, 0x2642 }, { 0x2660, 0x2661 }, { 0x2663, 0x2665 },
+ { 0x2667, 0x266A }, { 0x266C, 0x266D }, { 0x266F, 0x266F },
+ { 0x273D, 0x273D }, { 0x2776, 0x277F }, { 0xE000, 0xF8FF },
+ { 0xFFFD, 0xFFFD }, { 0xF0000, 0xFFFFD }, { 0x100000, 0x10FFFD }
+ };
+
+ /* For Japanese legacy encodings, the following characters are added. */
+ static const struct interval legacy_ja[] = {
+ { 0x00A2, 0x00A3 }, { 0x00A5, 0x00A6 }, { 0x00AC, 0x00AC },
+ { 0x00AF, 0x00AF }, { 0x2212, 0x2212 }
+ };
+
+ /* binary search in the East Asian Ambiguous and Japanese legacy tables */
+ if (bisearch(ucs, ambiguous,
+ sizeof(ambiguous) / sizeof(struct interval) - 1))
+ return 2;
+ if (bisearch(ucs, legacy_ja,
+ sizeof(legacy_ja) / sizeof(struct interval) - 1))
+ return 2;
+
+ return wcwidth_ucs(ucs);
+}
+
#if 0 /* original */
-int wcswidth(const wchar_t *pwcs, size_t n)
+int wcswidth_cjk(const wchar_t *pwcs, size_t n)
{
int w, width = 0;
for (;*pwcs && n-- > 0; pwcs++)
- if ((w = wcwidth(*pwcs)) < 0)
+ if ((w = wcwidth_cjk(*pwcs)) < 0)
return -1;
else
width += w;
@@ -183,3 +339,4 @@ int wcswidth(const wchar_t *pwcs, size_t n)
return width;
}
#endif
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
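The core of `wcwidth_cjk()` is two `bisearch()` lookups: a code point found in the East Asian Ambiguous table or in the Japanese legacy table is given width 2, and anything else falls back to `wcwidth_ucs()`. The self-contained sketch below keeps the patch's `bisearch()` but uses only an excerpt of the ambiguous table, and replaces `wcwidth_ucs()` with a simplified hypothetical `wcwidth_simple()` that ignores combining characters. It uses `uint32_t` instead of `wchar_t` so the code does not depend on the platform's `wchar_t` size.

```c
#include <stdint.h>

struct interval { uint32_t first, last; };

/* Binary search in a sorted table of non-overlapping intervals;
 * max is the index of the last entry (same convention as the patch). */
static int bisearch(uint32_t ucs, const struct interval *table, int max)
{
  int min = 0, mid;

  if (ucs < table[0].first || ucs > table[max].last)
    return 0;
  while (max >= min) {
    mid = (min + max) / 2;
    if (ucs > table[mid].last)
      min = mid + 1;
    else if (ucs < table[mid].first)
      max = mid - 1;
    else
      return 1;
  }
  return 0;
}

/* Excerpt of the patch's ambiguous[] table (NOT the complete list). */
static const struct interval ambiguous_excerpt[] = {
  { 0x00A1, 0x00A1 }, { 0x00B0, 0x00B4 }, { 0x0391, 0x03A1 },
  { 0x03A3, 0x03A9 }, { 0x2010, 0x2010 }, { 0x2460, 0x24E9 },
  { 0x24EB, 0x254B }, { 0x2550, 0x2573 }, { 0x25A0, 0x25A1 },
  { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 }, { 0x2605, 0x2606 }
};

/* The patch's legacy_ja[] table, copied in full. */
static const struct interval legacy_ja[] = {
  { 0x00A2, 0x00A3 }, { 0x00A5, 0x00A6 }, { 0x00AC, 0x00AC },
  { 0x00AF, 0x00AF }, { 0x2212, 0x2212 }
};

#define LAST(t) ((int) (sizeof(t) / sizeof((t)[0])) - 1)

/* Hypothetical stand-in for wcwidth_ucs(): controls, then the main
 * wide ranges from Markus Kuhn's wcwidth; combining marks are ignored. */
static int wcwidth_simple(uint32_t ucs)
{
  if (ucs == 0)
    return 0;
  if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0))
    return -1;
  if ((ucs >= 0x1100 && ucs <= 0x115f) ||
      (ucs >= 0x2e80 && ucs <= 0xa4cf && ucs != 0x303f) ||
      (ucs >= 0xac00 && ucs <= 0xd7a3) ||
      (ucs >= 0xf900 && ucs <= 0xfaff) ||
      (ucs >= 0xff00 && ucs <= 0xff60) ||
      (ucs >= 0xffe0 && ucs <= 0xffe6))
    return 2;
  return 1;
}

int wcwidth_cjk_sketch(uint32_t ucs)
{
  if (bisearch(ucs, ambiguous_excerpt, LAST(ambiguous_excerpt)))
    return 2;
  if (bisearch(ucs, legacy_ja, LAST(legacy_ja)))
    return 2;
  return wcwidth_simple(ucs);
}
```

With this lookup order, "◎" (U+25CE), "★" (U+2605) and the box-drawing characters report width 2, while plain ASCII stays at width 1.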
diff --git a/PATCHES b/PATCHES
index e69de29..17743fd 100644
--- a/PATCHES
+++ b/PATCHES
@@ -0,0 +1,5 @@
+patch-1.5.23.tt+yy.delete_prefix.1
+patch-1.5.23.tt.create_rfc2047_params.1
+patch-1.5.23.tt.sanitize_ja.1
+patch-1.5.23.tt.cjk_width_tree_chars.1
+patch-1.5.23.tt.wcwidth.1
diff --git a/charset.c b/charset.c
index 2411f2c..6a5cbd4 100644
--- a/charset.c
+++ b/charset.c
@@ -481,6 +481,9 @@ int mutt_convert_string (char **ps, const char *from, const char *to, int flags)
if (!s || !*s)
return 0;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars (s, mutt_strlen(s), 0);
+
if (to && from && (cd = mutt_iconv_open (to, from, flags)) != (iconv_t)-1)
{
int len;
@@ -677,3 +680,188 @@ int mutt_check_charset (const char *s, int strict)
return -1;
}
+
+/*
+ * mutt_sanitize_ja_chars()
+ * Adapted by TAKIZAWA Takashi <taki@cyber.email.ne.jp>
+ *
+ * - It replaces undefined KANJI characters with the GETA mark.
+ * - It replaces 'JIS X 0201 kana' characters with '?'.
+ * - If $charset is EUC-JP, it replaces the third character 'J' of the
+ * escape sequence switching to 'JIS X 0201 latin' with 'B', indicating
+ * 'US-ASCII'.
+ * - If $charset is Shift_JIS, it replaces the third character 'B' of the
+ * escape sequence switching to 'US-ASCII' with 'J', indicating
+ * 'JIS X 0201 latin'.
+ */
+
+#define ASCII 0
+#define JISX0201LATIN 1
+#define JISX0201KANA 2
+#define JISX0208 3
+#define OTHER_CS 4
+
+void mutt_sanitize_ja_chars(char *s, size_t len, int keep_state)
+{
+ static int cs = ASCII;
+ static int kanji_cont = 0;
+ static int illegal_kanji = 0;
+ static int es = 0;
+ static char pes = '\0';
+ static char ascii_3rd_char = 'B';
+ static char jisx0201_3rd_char = 'J';
+
+ char *p = s;
+ char *p1 = NULL;
+ unsigned char c;
+
+ if (!keep_state || *p == 0x1b) /* consideration about mbstate's buffer */
+ {
+ if (!ascii_strcasecmp (Charset, "euc-jp"))
+ jisx0201_3rd_char = 'B';
+ else if (!ascii_strcasecmp (Charset, "shift_jis"))
+ ascii_3rd_char = 'J';
+ cs = ASCII;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ es = 0;
+ pes = '\0';
+ }
+
+ for (;p - s < len;p++)
+ {
+ if (es == 0)
+ {
+ if (*p == 0x1b)
+ es++;
+ else
+ {
+ switch (cs)
+ {
+ case ASCII:
+ case JISX0201LATIN:
+ break;
+ case JISX0201KANA:
+ *p = '?';
+ break;
+ case JISX0208:
+ /* replace ku-ten codes 9 to 15 and 85 or more with "GETA MARK" */
+ c = (unsigned char)*p;
+ if (! kanji_cont)
+ {
+ if ((size_t)(p - s + 1) == len)
+ return; /* the last character is a primary byte of KANJI */
+ if (c <= 0x20 || (c >= 0x29 && c <= 0x2f)
+ || (c >= 0x75 && c <= 0xa0))
+ illegal_kanji = 1;
+ kanji_cont = 1;
+ p1 = p;
+ }
+ else
+ {
+ if (c <= 0x20 || c >= 0x7f)
+ illegal_kanji = 1;
+ if (illegal_kanji && p1)
+ *p1 = 0x22, *p = 0x2e;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ }
+ break;
+ }
+ }
+ }
+ else if (es == 1)
+ {
+ if (*p == '$' || (*p >= '(' && *p <= '/' && *p != ','))
+ {
+ es++;
+ pes = *p;
+ }
+ else
+ {
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else if (es == 2)
+ {
+ if (pes == '(')
+ {
+ switch (*p)
+ {
+ case 'B':
+ cs = ASCII, *p = ascii_3rd_char;
+ break;
+ case 'J':
+ cs = JISX0201LATIN, *p = jisx0201_3rd_char;
+ break;
+ case 'I':
+ /* switch to US-ASCII; subsequent kana bytes are replaced with '?' */
+ cs = JISX0201KANA, *p = ascii_3rd_char;
+ break;
+ default:
+ cs = OTHER_CS;
+ }
+ es = 0;
+ }
+ else if (pes == '$')
+ {
+ switch (*p)
+ {
+ case '@': /* JIS X 0208-1978 */
+ case 'B': /* JIS X 0208-1983 */
+ cs = JISX0208;
+ es = 0;
+ break;
+ case 'A':
+ cs = OTHER_CS; /* GB 2312 */
+ es = 0;
+ break;
+ case '(':
+ case ')':
+ case '*':
+ case '+':
+ case '-':
+ case '.':
+ case '/':
+ es++;
+ break;
+ default:
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+ else /* es == 3 */
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+}
+
+int mutt_copy_bytes_sanitize_ja (FILE *in, FILE *out, size_t size)
+{
+ char buf[2048];
+ size_t chunk;
+
+ mutt_sanitize_ja_chars (NULL, 0, 0);
+ while (size > 0)
+ {
+ chunk = (size > sizeof (buf)) ? sizeof (buf) : size;
+ if ((chunk = fread (buf, 1, chunk, in)) < 1)
+ break;
+ mutt_sanitize_ja_chars (buf, chunk, 1);
+ if (fwrite (buf, 1, chunk, out) != chunk)
+ return (-1);
+ size -= chunk;
+ }
+
+ return 0;
+}
+
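One job of `mutt_sanitize_ja_chars()` is to neutralize half-width katakana in ISO-2022-JP text: on `ESC ( I` it rewrites the final byte to `ascii_3rd_char` (`B` by default) and replaces every following byte with `?` until the next designation. The sketch below isolates just that behavior for a single, complete buffer. It is a simplification and an assumption-laden illustration: it keeps no state across calls (the real function uses static variables for `keep_state`), and it does not handle JIS X 0208 GETA replacement or the EUC-JP/Shift_JIS `J`/`B` swaps. `sanitize_jisx0201_kana` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical, simplified helper: in an ISO-2022-JP buffer, rewrite
 * "ESC ( I" (designate JIS X 0201 katakana) to "ESC ( B" (US-ASCII)
 * and replace every katakana byte that follows it with '?'. */
void sanitize_jisx0201_kana(char *s, size_t len)
{
  int in_kana = 0;
  size_t i;

  for (i = 0; i < len; i++) {
    if (s[i] == 0x1b && i + 2 < len && s[i + 1] == '(') {
      /* "ESC ( F" designates a single-byte set; F = 'I' is katakana */
      in_kana = (s[i + 2] == 'I');
      if (in_kana)
        s[i + 2] = 'B';
      i += 2; /* skip the rest of the escape sequence */
    } else if (s[i] == 0x1b) {
      in_kana = 0; /* any other escape sequence leaves katakana mode */
    } else if (in_kana) {
      s[i] = '?';
    }
  }
}
```

Rewriting the designation to US-ASCII keeps the result a valid ISO-2022-JP stream, so a later iconv(3) conversion does not fail on the replaced bytes.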
diff --git a/charset.h b/charset.h
index 54891f0..d67b209 100644
--- a/charset.h
+++ b/charset.h
@@ -36,6 +36,9 @@ int iconv_close (iconv_t);
int mutt_convert_string (char **, const char *, const char *, int);
+void mutt_sanitize_ja_chars (char *, size_t, int);
+int mutt_copy_bytes_sanitize_ja (FILE *, FILE *, size_t);
+
iconv_t mutt_iconv_open (const char *, const char *, int);
size_t mutt_iconv (iconv_t, ICONV_CONST char **, size_t *, char **, size_t *, ICONV_CONST char **, const char *);
diff --git a/configure.ac b/configure.ac
index d8aebe3..46e41eb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1464,6 +1464,16 @@ fi
dnl -- locales --
+AC_ARG_ENABLE(cjk-ambiguous-width, AC_HELP_STRING([--enable-cjk-ambiguous-width], [ Enable East Asian Ambiguous characters support (using own wcwidth)]),
+ [ if test "x$enableval" = "xyes" ; then
+ cjk_width=yes
+ fi
+ ])
+if test "x$cjk_width" = "xyes" ; then
+ AC_DEFINE(USE_CJK_WIDTH,1,[ Define if you want to support East Asian Ambiguous class. ])
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+fi
+
AC_CHECK_HEADERS(wchar.h)
AC_CACHE_CHECK([for wchar_t], mutt_cv_wchar_t,
@@ -1534,7 +1544,10 @@ fi
if test $wc_funcs = yes; then
AC_DEFINE(HAVE_WC_FUNCS,1,[ Define if you are using the system's wchar_t functions. ])
else
- MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o"
+ if test "x$cjk_width" != "xyes"; then
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+ fi
fi
AC_CACHE_CHECK([for nl_langinfo and CODESET], mutt_cv_langinfo_codeset,
diff --git a/curs_lib.c b/curs_lib.c
index 5472f45..ad12f4a 100644
--- a/curs_lib.c
+++ b/curs_lib.c
@@ -1220,7 +1220,14 @@ void mutt_format_string (char *dest, size_t destlen,
wc = replacement_char ();
}
if (arboreal && wc < MUTT_TREE_MAX)
- w = 1; /* hack */
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w = wcwidth (TreeChars[wc]);
+ else
+#endif
+ w = 1;
+ }
else
{
#ifdef HAVE_ISWBLANK
@@ -1449,10 +1456,12 @@ int mutt_strwidth (const char *s)
int w;
size_t k, n;
mbstate_t mbstate;
+ int arboreal;
if (!s) return 0;
n = mutt_strlen (s);
+ arboreal = ((unsigned char) s[0] < MUTT_TREE_MAX) ? 1 : 0;
memset (&mbstate, 0, sizeof (mbstate));
for (w=0; n && (k = mbrtowc (&wc, s, n, &mbstate)); s += k, n -= k)
@@ -1464,9 +1473,21 @@ int mutt_strwidth (const char *s)
k = (k == (size_t)(-1)) ? 1 : n;
wc = replacement_char ();
}
- if (!IsWPrint (wc))
- wc = '?';
- w += wcwidth (wc);
+ if (wc < MUTT_TREE_MAX && arboreal && k == 1)
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w += wcwidth (TreeChars[wc]);
+ else
+#endif
+ w++;
+ }
+ else
+ {
+ if (!IsWPrint (wc))
+ wc = '?';
+ w += wcwidth (wc);
+ }
}
return w;
}
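With `$cjk_width_tree_chars` set in a UTF-8 locale, `mutt_strwidth()` and `mutt_format_string()` no longer count each thread-tree placeholder byte as one column. Instead they look up the real character in `TreeChars[]` and take its width. The sketch below shows the effect on a tree prefix. It copies the `TreeChars[]` code points from the globals.h hunk, assumes `MUTT_TREE_MAX` is 14 (the table's size), and uses a hypothetical width rule: the box-drawing ranges that appear in the patch's ambiguous table take 2 columns when CJK width is on. `tree_prefix_width` and `tree_char_width` are hypothetical names.

```c
#include <stddef.h>

#define TREE_MAX 14 /* assumption: equals MUTT_TREE_MAX for this table */

/* Same code points as the TreeChars[] table added to globals.h;
 * index 0 is unused. */
static const unsigned int tree_chars[TREE_MAX] = {
  0xFEFF, 0x2514, 0x250C, 0x251C, 0x2500, 0x2502, 0x0020,
  0x003E, 0x002A, 0x0026, 0x003D, 0x252C, 0x2534, 0x003F
};

/* Hypothetical width rule: box-drawing characters listed as East Asian
 * Ambiguous in the patch take 2 cells when cjk_width is on. */
static int tree_char_width(unsigned int ucs, int cjk_width)
{
  if (cjk_width && ((ucs >= 0x2500 && ucs <= 0x254B) ||
                    (ucs >= 0x2550 && ucs <= 0x2573)))
    return 2;
  return 1;
}

/* Column width of a string whose bytes 1..TREE_MAX-1 are tree
 * placeholders; any other byte is assumed to be single-cell ASCII. */
int tree_prefix_width(const char *s, int cjk_width)
{
  int w = 0;

  for (; *s; s++) {
    unsigned char c = (unsigned char) *s;
    if (c > 0 && c < TREE_MAX)
      w += tree_char_width(tree_chars[c], cjk_width);
    else
      w++;
  }
  return w;
}
```

For example, the prefix LLCORNER + HLINE + RARROW measures 3 columns normally and 5 with CJK width, which is why the index columns stay aligned only if mutt and the terminal agree on this setting.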
diff --git a/doc/makedoc-defs.h b/doc/makedoc-defs.h
index 78a4ebc..dd872ba 100644
--- a/doc/makedoc-defs.h
+++ b/doc/makedoc-defs.h
@@ -31,10 +31,10 @@
# ifndef USE_SOCKET
# define USE_SOCKET
# endif
-# ifndef USE_DOTLOCK
+# if !defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
# define USE_DOTLOCK
# endif
-# ifndef DL_STANDALONE
+# if !defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define DL_STANDALONE
# endif
# ifndef USE_HCACHE
diff --git a/dotlock.c b/dotlock.c
index 5bf0348..5d87850 100644
--- a/dotlock.c
+++ b/dotlock.c
@@ -52,13 +52,13 @@
#include <getopt.h>
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# include "reldate.h"
#endif
#define MAXLINKS 1024 /* maximum link depth */
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define LONG_STRING 1024
# define MAXLOCKATTEMPT 5
@@ -96,7 +96,7 @@ extern int snprintf (char *, size_t, const char *, ...);
static int DotlockFlags;
static int Retry = MAXLOCKATTEMPT;
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static char *Hostname;
#endif
@@ -110,7 +110,7 @@ static int dotlock_prepare (char *, size_t, const char *, int fd);
static int dotlock_check_stats (struct stat *, struct stat *);
static int dotlock_dispatch (const char *, int fd);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int dotlock_init_privs (void);
static void usage (const char *);
#endif
@@ -130,7 +130,7 @@ static int dotlock_unlink (const char *);
static int dotlock_lock (const char *);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
#define check_flags(a) if (a & DL_FL_ACTIONS) usage (argv[0])
@@ -327,7 +327,7 @@ END_PRIVILEGED (void)
#endif
}
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
/*
* Usage information.
diff --git a/globals.h b/globals.h
index cecb46d..276c360 100644
--- a/globals.h
+++ b/globals.h
@@ -24,7 +24,7 @@ WHERE CONTEXT *Context;
WHERE char Errorbuf[STRING];
WHERE char AttachmentMarker[STRING];
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
WHERE char *MuttDotlock;
#endif
@@ -303,9 +303,31 @@ const char * const Months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
const char * const BodyTypes[] = { "x-unknown", "audio", "application", "image", "message", "model", "multipart", "text", "video" };
const char * const BodyEncodings[] = { "x-unknown", "7bit", "8bit", "quoted-printable", "base64", "binary", "x-uuencoded" };
+#ifdef USE_CJK_WIDTH
+const wchar_t TreeChars[] =
+{
+ 0xFEFF, /* not used */
+ 0x2514, /* MUTT_TREE_LLCORNER WACS_LLCORNER */
+ 0x250C, /* MUTT_TREE_ULCORNER WACS_ULCORNER */
+ 0x251C, /* MUTT_TREE_LTEE WACS_LTEE */
+ 0x2500, /* MUTT_TREE_HLINE WACS_HLINE */
+ 0x2502, /* MUTT_TREE_VLINE WACS_VLINE */
+ 0x0020, /* MUTT_TREE_SPACE */
+ 0x003E, /* MUTT_TREE_RARROW */
+ 0x002A, /* MUTT_TREE_STAR fake thread indicator */
+ 0x0026, /* MUTT_TREE_HIDDEN */
+ 0x003D, /* MUTT_TREE_EQUALS */
+ 0x252C, /* MUTT_TREE_TTEE WACS_TTEE */
+ 0x2534, /* MUTT_TREE_BTEE WACS_BTEE */
+ 0x003F /* MUTT_TREE_MISSING */
+};
+#endif /* USE_CJK_WIDTH */
#else
extern const char * const Weekdays[];
extern const char * const Months[];
+#ifdef USE_CJK_WIDTH
+extern const wchar_t TreeChars[];
+#endif /* USE_CJK_WIDTH */
#endif
#ifdef MAIN_C
diff --git a/handler.c b/handler.c
index c332e1f..0ed8615 100644
--- a/handler.c
+++ b/handler.c
@@ -100,6 +100,9 @@ static void mutt_convert_to_state(iconv_t cd, char *bufi, size_t *l, STATE *s)
return;
}
+ if (option (OPTSANITIZEJACHARS) && memchr (bufi, 0x1b, *l))
+ mutt_sanitize_ja_chars (bufi, *l, 1);
+
ib = bufi, ibl = *l;
for (;;)
{
@@ -1312,6 +1315,7 @@ static int autoview_handler (BODY *a, STATE *s)
int piped = FALSE;
pid_t thepid;
int rc = 0;
+ char *charset;
snprintf (type, sizeof (type), "%s/%s", TYPE (a), a->subtype);
rfc1524_mailcap_lookup (a, type, entry, MUTT_AUTOVIEW);
@@ -1342,6 +1346,10 @@ static int autoview_handler (BODY *a, STATE *s)
return -1;
}
+ charset = mutt_get_parameter ("charset", a->parameter);
+ if (charset && option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (charset,"iso-2022-jp", 11))
+ mutt_copy_bytes_sanitize_ja (s->fpin, fpin, a->length);
+ else
mutt_copy_bytes (s->fpin, fpin, a->length);
if(!piped)
diff --git a/hdrline.c b/hdrline.c
index 5e79d32..85df51e 100644
--- a/hdrline.c
+++ b/hdrline.c
@@ -272,6 +272,7 @@ hdr_format_str (char *dest,
#define THREAD_NEW (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 1)
#define THREAD_OLD (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 2)
size_t len;
+ char *subj;
hdr = hfi->hdr;
ctx = hfi->ctx;
@@ -590,6 +591,7 @@ hdr_format_str (char *dest,
subj = apply_subject_mods(hdr->env);
else
subj = hdr->env->subject;
+ if (option (OPTDELETEPREFIX)) subj = hdr->env->real_subj;
if (flags & MUTT_FORMAT_TREE && !hdr->collapsed)
{
if (flags & MUTT_FORMAT_FORCESUBJ)
diff --git a/init.h b/init.h
index 1258507..3ece61c 100644
--- a/init.h
+++ b/init.h
@@ -442,6 +442,31 @@ struct option_t MuttVars[] = {
** this variable is \fIunset\fP, no check for new mail is performed
** while the mailbox is open.
*/
+#ifdef USE_CJK_WIDTH
+ { "cjk_width", DT_BOOL, R_NONE, OPTCJKWIDTH, 0 },
+ /*
+ ** .pp
+ ** When this option is set, characters in the East Asian Ambiguous (A)
+ ** category as defined in Unicode Technical Report #11 have a column
+ ** width of 2. Otherwise, they have a column width of 1.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect with UTF-8 encoding.
+ */
+ { "cjk_width_tree_chars", DT_BOOL, R_NONE, OPTCJKWIDTHTREECHARS, 0 },
+ /*
+ ** .pp
+ ** If \fIset\fP, Mutt will use the result of $cjk_width as a column
+ ** width of WACS characters when displaying thread and attachment trees.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect with UTF-8 encoding.
+ */
+#endif
{ "collapse_unread", DT_BOOL, R_NONE, OPTCOLLAPSEUNREAD, 1 },
/*
** .pp
@@ -647,6 +672,17 @@ struct option_t MuttVars[] = {
** If \fI``no''\fP, never attempt to verify cryptographic signatures.
** (Crypto only)
*/
+ { "create_rfc2047_parameters", DT_BOOL, R_NONE, OPTCREATERFC2047PARAMS, 0 },
+ /*
+ ** .pp
+ ** When this variable is set, Mutt also writes the attachment's file name
+ ** to the Content-Type header field as an RFC 2047-encoded name parameter:
+ ** name="=?iso-2022-jp?B?GyRCO244MxsoQi50eHQ=?="
+ ** .pp
+ ** Note: this use of RFC 2047 encoding is explicitly prohibited
+ ** by the standard. Set this variable only if your recipients'
+ ** mailers cannot parse RFC 2231 parameters.
+ */
{ "date_format", DT_STR, R_MENU, UL &DateFmt, UL "!%a, %b %d, %Y at %I:%M:%S%p %Z" },
/*
** .pp
@@ -698,6 +734,19 @@ struct option_t MuttVars[] = {
** If this option is \fIset\fP, mutt's received-attachments menu will not show the subparts of
** individual messages in a multipart/digest. To see these subparts, press ``v'' on that menu.
*/
+ { "delete_prefix", DT_BOOL, R_NONE, OPTDELETEPREFIX, 0 },
+ /*
+ ** .pp
+ ** If set, the prefix that some mailing lists add to the Subject: field
+ ** (something like "Subject: [foo-ML:0012] real-subject") is deleted
+ ** when displaying the index and when editing the subject of a reply.
+ ** The deletion pattern is configured by the $$delete_regexp variable.
+ */
+ { "delete_regexp", DT_RX, R_NONE, UL &DeleteRegexp, UL "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)" },
+ /*
+ ** .pp
+ ** The regular expression used by $$delete_prefix to match the prefix to delete.
+ */
{ "display_filter", DT_PATH, R_PAGER, UL &DisplayFilter, UL "" },
/*
** .pp
@@ -705,7 +754,7 @@ struct option_t MuttVars[] = {
** is viewed it is passed as standard input to $$display_filter, and the
** filtered message is read from the standard output.
*/
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
{ "dotlock_program", DT_PATH, R_NONE, UL &MuttDotlock, UL BINDIR "/mutt_dotlock" },
/*
** .pp
@@ -2828,6 +2877,28 @@ struct option_t MuttVars[] = {
** that mutt \fIgenerates\fP this kind of encoding. Instead, mutt will
** unconditionally use the encoding specified in RFC2231.
*/
+ { "sanitize_ja_chars", DT_BOOL, R_NONE, OPTSANITIZEJACHARS, 0 },
+ /*
+ ** .pp
+ ** When set, Japanese "platform dependent characters" (characters that
+ ** are illegal in the iso-2022-jp charset, mainly produced by MS-Windows
+ ** mailers) are replaced with a special character, the GETA mark ('ESC $$ B " .
+ ** ESC ( B' in iso-2022-jp), and JIS X 0201 kana characters
+ ** (only in the "ESC ) I" case) are replaced with "?" to
+ ** prevent garbage characters. JIS X 0201 kana characters are
+ ** not replaced if they appear in 8bit form.
+ ** .pp
+ ** This also fixes another Japanese encoding issue. When $$charset
+ ** is set to "EUC-JP", which does not contain the JIS X 0201 roman
+ ** character set, the JIS X 0201 roman part of received messages
+ ** encoded in iso-2022-jp cannot be converted to EUC-JP.
+ ** Likewise, when $$charset is "Shift_JIS", which does not contain the
+ ** ASCII character set, the ASCII part cannot be converted. Thus,
+ ** the converted characters are garbled in these cases. When this
+ ** option is set, the JIS X 0201 roman escape sequence and the
+ ** ASCII escape sequence are replaced appropriately to prevent
+ ** the output from being garbled.
+ */
{ "save_address", DT_BOOL, R_NONE, OPTSAVEADDRESS, 0 },
/*
** .pp
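The sanitize_ja_chars description above says that the JIS X 0201 roman escape sequence is replaced so that iso-2022-jp text converts cleanly to EUC-JP. The patch's own mutt_sanitize_ja_chars() is not part of this section, so the following is only a minimal C sketch of that one step, under the assumption that it amounts to rewriting the 3-byte designation ESC ( J (JIS X 0201 roman) as ESC ( B (ASCII) in place. The name replace_jis_roman_escape() is hypothetical, and the sketch does not attempt the GETA-mark or kana substitutions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the patch's mutt_sanitize_ja_chars():
 * rewrite every "ESC ( J" (designate JIS X 0201 roman into G0)
 * as "ESC ( B" (designate ASCII into G0).  Both sequences are
 * 3 bytes long, so the buffer can be edited in place.
 * Returns the number of escape sequences replaced. */
size_t replace_jis_roman_escape(char *buf, size_t len)
{
    size_t i, count = 0;

    for (i = 0; i + 2 < len; i++) {
        if (buf[i] == 0x1b && buf[i + 1] == '(' && buf[i + 2] == 'J') {
            buf[i + 2] = 'B';
            count++;
            i += 2;  /* skip the rest of this escape sequence */
        }
    }
    return count;
}
```

Because the two designations have the same length, no reallocation or shifting is needed, which is why an in-place rewrite is a natural fit here.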
diff --git a/lib.c b/lib.c
index 345d3be..9d86412 100644
--- a/lib.c
+++ b/lib.c
@@ -445,6 +445,10 @@ int safe_symlink(const char *oldpath, const char *newpath)
int safe_rename (const char *src, const char *target)
{
+#ifdef NO_USE_HARDLINK
+ /* Android (since 6.0) does not support hardlinks. */
+ return rename(src, target);
+#else
struct stat ssb, tsb;
int link_errno;
@@ -569,6 +573,7 @@ success:
return 0;
+#endif /* NO_USE_HARDLINK */
}
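With NO_USE_HARDLINK defined, safe_rename() above skips the link(2)-based path entirely and calls rename(2). A minimal sketch of the same compile-time switch follows; move_file() is a hypothetical name, and its link(2) branch is a simplified stand-in for mutt's real code, which has several additional filesystem-specific fallbacks. One behavioural difference worth knowing: rename(2) silently replaces an existing target, whereas link(2) fails with EEXIST, so the NO_USE_HARDLINK build loses that protection.

```c
#include <stdio.h>   /* rename() */
#include <unistd.h>  /* link(), unlink() */

/* Hypothetical sketch of the NO_USE_HARDLINK switch in safe_rename().
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
int move_file(const char *src, const char *target)
{
#ifdef NO_USE_HARDLINK
    /* No hard links available (e.g. Android 6.0+, Debian noroot):
     * rename(2) is atomic but overwrites an existing target. */
    return rename(src, target);
#else
    /* link(2) fails with EEXIST instead of clobbering the target. */
    if (link(src, target) != 0)
        return -1;
    /* Both names now refer to the same file; drop the old one. */
    return unlink(src);
#endif
}
```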
diff --git a/main.c b/main.c
index e52c134..63f742a 100644
--- a/main.c
+++ b/main.c
@@ -271,25 +271,25 @@ static void show_version (void)
"-USE_SETGID "
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
"+USE_DOTLOCK "
#else
"-USE_DOTLOCK "
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
"+DL_STANDALONE "
#else
"-DL_STANDALONE "
#endif
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
"+USE_FCNTL "
#else
"-USE_FCNTL "
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
"+USE_FLOCK "
#else
"-USE_FLOCK "
@@ -456,6 +456,12 @@ static void show_version (void)
"-LOCALES_HACK "
#endif
+#ifdef USE_CJK_WIDTH
+ "+USE_CJK_WIDTH "
+#else
+ "-USE_CJK_WIDTH "
+#endif
+
#ifdef HAVE_WC_FUNCS
"+HAVE_WC_FUNCS "
#else
diff --git a/mbyte.c b/mbyte.c
index b4df70a..cf28d70 100644
--- a/mbyte.c
+++ b/mbyte.c
@@ -17,7 +17,7 @@
*/
/*
- * Japanese support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
+ * CJK support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
*/
#if HAVE_CONFIG_H
@@ -37,8 +37,8 @@
#endif
int Charset_is_utf8 = 0;
+static int charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
-static int charset_is_ja = 0;
static iconv_t charset_to_utf8 = (iconv_t)(-1);
static iconv_t charset_from_utf8 = (iconv_t)(-1);
#endif
@@ -50,8 +50,8 @@ void mutt_set_charset (char *charset)
mutt_canonical_charset (buffer, sizeof (buffer), charset);
Charset_is_utf8 = 0;
+ charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
- charset_is_ja = 0;
if (charset_to_utf8 != (iconv_t)(-1))
{
iconv_close (charset_to_utf8);
@@ -66,12 +66,18 @@ void mutt_set_charset (char *charset)
if (mutt_is_utf8 (buffer))
Charset_is_utf8 = 1;
-#ifndef HAVE_WC_FUNCS
- else if (!ascii_strcasecmp(buffer, "euc-jp") || !ascii_strcasecmp(buffer, "shift_jis")
- || !ascii_strcasecmp(buffer, "cp932") || !ascii_strcasecmp(buffer, "eucJP-ms"))
+ else if (!ascii_strcasecmp (buffer, "gb2312") ||
+ !ascii_strcasecmp (buffer, "gb18030") ||
+ !ascii_strcasecmp (buffer, "big5") ||
+ !ascii_strcasecmp (buffer, "euc-tw") ||
+ !ascii_strcasecmp (buffer, "EUC-JP") ||
+ !ascii_strcasecmp (buffer, "eucJP-ms") ||
+ !ascii_strcasecmp (buffer, "Shift_JIS") ||
+ !ascii_strcasecmp (buffer, "cp932") ||
+ !ascii_strcasecmp (buffer, "euc-kr"))
{
- charset_is_ja = 1;
-
+ charset_is_cjk = 1;
+#ifndef HAVE_WC_FUNCS
/* Note flags=0 to skip charset-hooks: User masters the $charset
* name, and we are sure of our "utf-8" constant. So there is no
* possibility of wrong name that we would want to try to correct
@@ -80,24 +86,68 @@ void mutt_set_charset (char *charset)
*/
charset_to_utf8 = mutt_iconv_open ("utf-8", charset, 0);
charset_from_utf8 = mutt_iconv_open (charset, "utf-8", 0);
- }
#endif
+ }
#if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(ENABLE_NLS)
bind_textdomain_codeset(PACKAGE, buffer);
#endif
}
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+/*
+ * For systems that don't have a correctly functioning wcwidth(),
+ * we provide our own wcwidth().
+ * Furthermore, this wcwidth() can switch the character-cell width of
+ * the East Asian Ambiguous class according to $cjk_width, which the
+ * wcwidth() provided by most systems cannot do.
+ * Please read the comment in wcwidth.c about the East Asian Ambiguous
+ * class for details.
+ */
+int wcwidth_ucs(wchar_t ucs);
+int wcwidth_cjk(wchar_t ucs);
+
+int wcwidth (wchar_t wc)
+{
+ if (!Charset_is_utf8)
+ {
+ if (!charset_is_cjk)
+ {
+ /* 8-bit case */
+ if (!wc)
+ return 0;
+ else if ((0 <= wc && wc < 256) && IsPrint (wc))
+ return 1;
+ else
+ return -1;
+ }
+ else
+ {
+ /* CJK */
+ return wcwidth_cjk (wc);
+ }
+ }
+ else {
+#ifdef USE_CJK_WIDTH
+ if (option (OPTCJKWIDTH))
+ return wcwidth_cjk (wc);
+#endif /* USE_CJK_WIDTH */
+ return wcwidth_ucs (wc);
+ }
+}
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
+
+
#ifndef HAVE_WC_FUNCS
/*
* For systems that don't have them, we provide here our own
- * implementations of wcrtomb(), mbrtowc(), iswprint() and wcwidth().
+ * implementations of wcrtomb(), mbrtowc() and iswprint().
* Instead of using the locale, as these functions normally would,
* we use Mutt's Charset variable. We support 3 types of charset:
* (1) For 8-bit charsets, wchar_t uses the same encoding as char.
* (2) For UTF-8, wchar_t uses UCS.
- * (3) For stateless Japanese encodings, we use UCS and convert
+ * (3) For stateless CJK encodings, we use UCS and convert
* via UTF-8 using iconv.
* Unfortunately, we can't handle non-stateless encodings.
*/
@@ -256,7 +306,7 @@ size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
int iswprint (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return ((0x20 <= wc && wc < 0x7f) || 0xa0 <= wc);
else
return (0 <= wc && wc < 256) ? IsPrint (wc) : 0;
@@ -264,7 +314,7 @@ int iswprint (wint_t wc)
int iswspace (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return (9 <= wc && wc <= 13) || wc == 32;
else
return (0 <= wc && wc < 256) ? isspace (wc) : 0;
@@ -347,7 +397,7 @@ static int iswalpha_ucs (wint_t wc)
wint_t towupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? toupper (wc) : wc;
@@ -355,7 +405,7 @@ wint_t towupper (wint_t wc)
wint_t towlower (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towlower_ucs (wc);
else
return (0 <= wc && wc < 256) ? tolower (wc) : wc;
@@ -363,7 +413,7 @@ wint_t towlower (wint_t wc)
int iswalnum (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalnum_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalnum (wc) : 0;
@@ -371,7 +421,7 @@ int iswalnum (wint_t wc)
int iswalpha (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalpha_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalpha (wc) : 0;
@@ -379,58 +429,12 @@ int iswalpha (wint_t wc)
int iswupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? isupper (wc) : 0;
}
-/*
- * l10n for Japanese:
- * Symbols, Greek and Cyrillic in JIS X 0208, Japanese Kanji
- * Character Set, have a column width of 2.
- */
-int wcwidth_ja (wchar_t ucs)
-{
- if (ucs >= 0x3021)
- return -1; /* continue with the normal check */
- /* a rough range for quick check */
- if ((ucs >= 0x00a1 && ucs <= 0x00fe) || /* Latin-1 Supplement */
- (ucs >= 0x0391 && ucs <= 0x0451) || /* Greek and Cyrillic */
- (ucs >= 0x2010 && ucs <= 0x266f) || /* Symbols */
- (ucs >= 0x3000 && ucs <= 0x3020)) /* CJK Symbols and Punctuation */
- return 2;
- else
- return -1;
-}
-
-int wcwidth_ucs(wchar_t ucs);
-
-int wcwidth (wchar_t wc)
-{
- if (!Charset_is_utf8)
- {
- if (!charset_is_ja)
- {
- /* 8-bit case */
- if (!wc)
- return 0;
- else if ((0 <= wc && wc < 256) && IsPrint (wc))
- return 1;
- else
- return -1;
- }
- else
- {
- /* Japanese */
- int k = wcwidth_ja (wc);
- if (k != -1)
- return k;
- }
- }
- return wcwidth_ucs (wc);
-}
-
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps)
{
static wchar_t mbstate;
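The new wcwidth() dispatches on the charset and on $cjk_width, and wcwidth_cjk() returns 2 for code points found, by binary search, in the East Asian Ambiguous interval table of wcwidth.c. The sketch below reproduces that lookup in miniature. Its table is a tiny excerpt of ambiguous[] (covering U+00A1, the range containing box drawing U+2500, the range containing ◎ U+25CE, and ★ U+2605), and the names sketch_width() and the cjk_width parameter are hypothetical. For brevity it returns 1 for everything outside the excerpt instead of falling back to wcwidth_ucs(), so it does not handle control, combining, or wide characters.

```c
#include <wchar.h>

struct interval {
    wchar_t first;
    wchar_t last;
};

/* Excerpt of the East Asian Ambiguous table in wcwidth.c (sorted). */
static const struct interval ambiguous_excerpt[] = {
    { 0x00A1, 0x00A1 },  /* INVERTED EXCLAMATION MARK */
    { 0x24EB, 0x254B },  /* includes box drawing from U+2500 */
    { 0x25CE, 0x25D1 },  /* includes BULLSEYE U+25CE */
    { 0x2605, 0x2606 },  /* BLACK STAR, WHITE STAR */
};

/* Binary search over a sorted, non-overlapping interval table,
 * same shape as bisearch() in the patch; max is the last index. */
static int in_table(wchar_t ucs, const struct interval *table, int max)
{
    int min = 0;

    if (ucs < table[0].first || ucs > table[max].last)
        return 0;
    while (max >= min) {
        int mid = (min + max) / 2;
        if (ucs > table[mid].last)
            min = mid + 1;
        else if (ucs < table[mid].first)
            max = mid - 1;
        else
            return 1;
    }
    return 0;
}

/* Hypothetical: cjk_width plays the role of the $cjk_width option.
 * Returns 2 for excerpted ambiguous characters when it is set,
 * otherwise 1 (no control/combining/wide handling in this sketch). */
int sketch_width(wchar_t ucs, int cjk_width)
{
    int max = (int)(sizeof ambiguous_excerpt / sizeof ambiguous_excerpt[0]) - 1;

    if (cjk_width && in_table(ucs, ambiguous_excerpt, max))
        return 2;
    return 1;
}
```

A sorted interval table plus binary search keeps each lookup at O(log n) comparisons, which matters because wcwidth() is called for every character drawn on screen.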
diff --git a/mbyte.h b/mbyte.h
index 9c58c9e..224cafb 100644
--- a/mbyte.h
+++ b/mbyte.h
@@ -8,6 +8,12 @@
# ifdef HAVE_WCTYPE_H
# include <wctype.h>
# endif
+# ifdef USE_CJK_WIDTH
+#  ifdef wcwidth
+#   undef wcwidth
+#  endif
+int wcwidth (wchar_t wc);
+# endif /* USE_CJK_WIDTH */
# endif
# ifndef HAVE_WC_FUNCS
@@ -32,6 +38,9 @@
#ifdef iswupper
# undef iswupper
#endif
+#ifdef wcwidth
+# undef wcwidth
+#endif
size_t wcrtomb (char *s, wchar_t wc, mbstate_t *ps);
size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int iswprint (wint_t wc);
@@ -44,7 +53,6 @@ wint_t towlower (wint_t wc);
int wcwidth (wchar_t wc);
# endif /* !HAVE_WC_FUNCS */
-
void mutt_set_charset (char *charset);
extern int Charset_is_utf8;
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps);
diff --git a/mutt.h b/mutt.h
index d80face..fd2adc4 100644
--- a/mutt.h
+++ b/mutt.h
@@ -372,10 +372,16 @@ enum
OPTBROWSERABBRMAILBOXES,
OPTCHECKMBOXSIZE,
OPTCHECKNEW,
+#ifdef USE_CJK_WIDTH
+ OPTCJKWIDTH,
+ OPTCJKWIDTHTREECHARS,
+#endif /* USE_CJK_WIDTH */
OPTCOLLAPSEUNREAD,
OPTCONFIRMAPPEND,
OPTCONFIRMCREATE,
+ OPTCREATERFC2047PARAMS,
OPTDELETEUNTAG,
+ OPTDELETEPREFIX,
OPTDIGESTCOLLAPSE,
OPTDUPTHREADS,
OPTEDITHDRS,
@@ -477,6 +483,7 @@ enum
OPTREVNAME,
OPTREVREAL,
OPTRFC2047PARAMS,
+ OPTSANITIZEJACHARS,
OPTSAVEADDRESS,
OPTSAVEEMPTY,
OPTSAVENAME,
diff --git a/mutt_regex.h b/mutt_regex.h
index f10ecbe..3cd36e0 100644
--- a/mutt_regex.h
+++ b/mutt_regex.h
@@ -52,5 +52,6 @@ WHERE REGEXP QuoteRegexp;
WHERE REGEXP ReplyRegexp;
WHERE REGEXP Smileys;
WHERE REGEXP GecosMask;
+WHERE REGEXP DeleteRegexp;
#endif /* MUTT_REGEX_H */
diff --git a/mx.c b/mx.c
index 9d311ef..cd4d185 100644
--- a/mx.c
+++ b/mx.c
@@ -47,7 +47,7 @@
#include "buffy.h"
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
#include "dotlock.h"
#endif
@@ -95,13 +95,13 @@ struct mx_ops* mx_get_ops (int magic)
#define mutt_is_spool(s) (mutt_strcmp (Spoolfile, s) == 0)
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
/* parameters:
* path - file to lock
* retry - should retry if unable to lock?
*/
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int invoke_dotlock (const char *path, int dummy, int flags, int retry)
{
@@ -181,14 +181,14 @@ static int undotlock_file (const char *path, int fd)
*/
int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
-#if defined (USE_FCNTL) || defined (USE_FLOCK)
+#if defined (USE_FCNTL) || defined (USE_FLOCK) || defined(NO_USE_HARDLINK)
int count;
int attempt;
struct stat sb = { 0 }, prev_sb = { 0 }; /* silence gcc warnings */
#endif
int r = 0;
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock lck;
memset (&lck, 0, sizeof (struct flock));
@@ -227,7 +227,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
count = 0;
attempt = 0;
while (flock (fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
@@ -261,7 +261,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FLOCK */
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (r == 0 && dot)
r = dotlock_file (path, fd, timeout);
#endif /* USE_DOTLOCK */
@@ -270,12 +270,12 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
/* release any other locks obtained in this routine */
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
lck.l_type = F_UNLCK;
fcntl (fd, F_SETLK, &lck);
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif /* USE_FLOCK */
}
@@ -285,7 +285,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
int mx_unlock_file (const char *path, int fd, int dot)
{
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock unlockit = { F_UNLCK, 0, 0, 0, 0 };
memset (&unlockit, 0, sizeof (struct flock));
@@ -294,11 +294,11 @@ int mx_unlock_file (const char *path, int fd, int dot)
fcntl (fd, F_SETLK, &unlockit);
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (dot)
undotlock_file (path, fd);
#endif
@@ -309,7 +309,7 @@ int mx_unlock_file (const char *path, int fd, int dot)
static void mx_unlink_empty (const char *path)
{
int fd;
-#ifndef USE_DOTLOCK
+#if !defined(USE_DOTLOCK) || defined(NO_USE_HARDLINK)
struct stat sb;
#endif
@@ -322,7 +322,7 @@ static void mx_unlink_empty (const char *path)
return;
}
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
invoke_dotlock (path, fd, DL_FL_UNLINK, 1);
#else
if (fstat (fd, &sb) == 0 && sb.st_size == 0)
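Under NO_USE_HARDLINK the mx.c changes disable dot-locking (which relies on link(2)) and force both fcntl(2) and flock(2) locking on. The following is a simplified sketch of that lock order with hypothetical names and no retry or timeout loop (mutt's mx_lock_file() retries and watches the file size). It takes a non-blocking fcntl() lock, then a flock() lock, and releases the first one if the second fails, mirroring the "release any other locks" step above.

```c
#include <fcntl.h>     /* fcntl(), struct flock */
#include <string.h>    /* memset() */
#include <sys/file.h>  /* flock() */
#include <unistd.h>    /* SEEK_SET */

/* Hypothetical: lock fd exclusively (excl != 0) or shared, without
 * dot-locking.  Returns 0 on success, -1 if either lock is busy. */
int lock_without_dotlock(int fd, int excl)
{
    struct flock lck;

    memset(&lck, 0, sizeof lck);
    lck.l_type = excl ? F_WRLCK : F_RDLCK;
    lck.l_whence = SEEK_SET;  /* l_start = l_len = 0: whole file */

    if (fcntl(fd, F_SETLK, &lck) == -1)
        return -1;

    if (flock(fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1) {
        /* Release the fcntl lock so we never hold only half a lock. */
        lck.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &lck);
        return -1;
    }
    return 0;
}

/* Hypothetical counterpart of mx_unlock_file() without dot-locking. */
void unlock_without_dotlock(int fd)
{
    struct flock unlockit;

    memset(&unlockit, 0, sizeof unlockit);
    unlockit.l_type = F_UNLCK;
    unlockit.l_whence = SEEK_SET;
    fcntl(fd, F_SETLK, &unlockit);
    flock(fd, LOCK_UN);
}
```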
diff --git a/parse.c b/parse.c
index cc85785..e28f18f 100644
--- a/parse.c
+++ b/parse.c
@@ -1455,6 +1455,18 @@ ENVELOPE *mutt_read_rfc822_header (FILE *f, HEADER *hdr, short user_hdrs,
e->real_subj = e->subject + pmatch[0].rm_eo;
else
e->real_subj = e->subject;
+ if (option (OPTDELETEPREFIX))
+ {
+ /* If this option is set, mutt deletes a leading tag such as [prefix],
+ * [prefix:number] or [prefix number] from the Subject line.
+ */
+ if (regexec (DeleteRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ {
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ if (regexec (ReplyRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ }
+ }
}
if (hdr->received < 0)
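When delete_prefix is set, the parse.c change above matches DeleteRegexp against the subject (after the usual reply-prefix stripping), advances real_subj past the match, and then strips a second reply prefix such as "Re: " that may follow the mailing-list tag. The sketch below applies the default delete_regexp value with POSIX regcomp()/regexec() to show what gets removed. strip_list_prefix() is a hypothetical name, compiling with REG_EXTENDED is an assumption about how mutt builds the pattern, and the second reply-prefix pass is omitted.

```c
#include <regex.h>

/* Default value of delete_regexp from init.h (same C string literal). */
static const char *default_delete_regexp = "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)";

/* Hypothetical: return a pointer just past the mailing-list prefix,
 * or subj itself if the pattern does not match or fails to compile. */
const char *strip_list_prefix(const char *subj)
{
    regex_t rx;
    regmatch_t pmatch[1];
    const char *result = subj;

    if (regcomp(&rx, default_delete_regexp, REG_EXTENDED) != 0)
        return subj;
    if (regexec(&rx, subj, 1, pmatch, 0) == 0)
        result = subj + pmatch[0].rm_eo;  /* same arithmetic as parse.c */
    regfree(&rx);
    return result;
}
```

Like parse.c, the sketch returns a pointer into the original string rather than a copy, so the stripped subject costs no allocation.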
diff --git a/rfc2047.c b/rfc2047.c
index 8506425..e907b25 100644
--- a/rfc2047.c
+++ b/rfc2047.c
@@ -62,6 +62,9 @@ static size_t convert_string (ICONV_CONST char *f, size_t flen,
size_t obl, n;
int e;
+ if (option (OPTSANITIZEJACHARS) &&
+     !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars ((char *) f, flen, 0);
cd = mutt_iconv_open (to, from, 0);
if (cd == (iconv_t)(-1))
return (size_t)(-1);
diff --git a/sendlib.c b/sendlib.c
index 2128c94..2c45e68 100644
--- a/sendlib.c
+++ b/sendlib.c
@@ -348,6 +348,30 @@ int mutt_write_mime_header (BODY *a, FILE *f)
}
}
+ if (a->use_disp && option (OPTCREATERFC2047PARAMS))
+ {
+ if(!(fn = a->d_filename))
+ fn = a->filename;
+
+ if (fn)
+ {
+ char *tmp;
+
+ /* Strip off the leading path... */
+ if ((t = strrchr (fn, '/')))
+ t++;
+ else
+ t = fn;
+
+ buffer[0] = 0;
+ tmp = safe_strdup (t);
+ rfc2047_encode_string (&tmp);
+ rfc822_cat (buffer, sizeof (buffer), tmp, MimeSpecials);
+ FREE (&tmp);
+ fprintf (f, ";\n\tname=%s", buffer);
+ }
+ }
+
fputc ('\n', f);
if (a->description)
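With create_rfc2047_parameters set, the sendlib.c change above strips the directory part of the attachment's file name, RFC 2047-encodes it, and writes it as an extra name= parameter. The sketch below only shows the shape of such a B-encoded word. encode_name_param() is a hypothetical helper that always uses the utf-8 label and plain base64, whereas mutt's rfc2047_encode_string() chooses the charset and encoding itself (for example iso-2022-jp, as in the create_rfc2047_parameters documentation).

```c
#include <stddef.h>
#include <string.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical: write "=?utf-8?B?<base64 of basename>?=" into out.
 * Returns 0 on success, -1 if out is too small. */
int encode_name_param(const char *path, char *out, size_t outlen)
{
    /* Strip off the leading path, as sendlib.c does with strrchr(). */
    const char *t = strrchr(path, '/');
    const unsigned char *s = (const unsigned char *)(t ? t + 1 : path);
    size_t n = strlen((const char *)s);
    size_t need = 10 + 4 * ((n + 2) / 3) + 3;  /* prefix, data, "?=", NUL */
    char *p = out;
    size_t i;

    if (need > outlen)
        return -1;
    memcpy(p, "=?utf-8?B?", 10);
    p += 10;
    for (i = 0; i + 2 < n; i += 3) {  /* full 3-byte groups */
        p[0] = b64[s[i] >> 2];
        p[1] = b64[((s[i] & 0x03) << 4) | (s[i + 1] >> 4)];
        p[2] = b64[((s[i + 1] & 0x0f) << 2) | (s[i + 2] >> 6)];
        p[3] = b64[s[i + 2] & 0x3f];
        p += 4;
    }
    if (n - i == 1) {  /* one leftover byte: two pad characters */
        p[0] = b64[s[i] >> 2];
        p[1] = b64[(s[i] & 0x03) << 4];
        p[2] = '=';
        p[3] = '=';
        p += 4;
    } else if (n - i == 2) {  /* two leftover bytes: one pad character */
        p[0] = b64[s[i] >> 2];
        p[1] = b64[((s[i] & 0x03) << 4) | (s[i + 1] >> 4)];
        p[2] = b64[(s[i + 1] & 0x0f) << 2];
        p[3] = '=';
        p += 4;
    }
    memcpy(p, "?=", 3);  /* copies the terminating NUL too */
    return 0;
}
```

For example, "/tmp/test.txt" becomes "=?utf-8?B?dGVzdC50eHQ=?=". The base64 tail matches the ".txt" part of the iso-2022-jp example in the documentation, because ".txt" is plain ASCII in both charsets.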
diff --git a/wcwidth.c b/wcwidth.c
index 0b94d73..85a1397 100644
--- a/wcwidth.c
+++ b/wcwidth.c
@@ -5,6 +5,51 @@
* http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
* http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
*
+ * In fixed-width output devices, Latin characters all occupy a single
+ * "cell" position of equal width, whereas ideographic CJK characters
+ * occupy two such cells. Interoperability between terminal-line
+ * applications and (teletype-style) character terminals using the
+ * UTF-8 encoding requires agreement on which character should advance
+ * the cursor by how many cell positions. No established formal
+ * standards exist at present on which Unicode character shall occupy
+ * how many cell positions on character terminals. These routines are
+ * a first attempt of defining such behavior based on simple rules
+ * applied to data provided by the Unicode Consortium.
+ *
+ * For some graphical characters, the Unicode standard explicitly
+ * defines a character-cell width via the definition of the East Asian
+ * FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
+ * In all these cases, there is no ambiguity about which width a
+ * terminal shall use. For characters in the East Asian Ambiguous (A)
+ * class, the width choice depends purely on a preference of backward
+ * compatibility with either historic CJK or Western practice.
+ * Choosing single-width for these characters is easy to justify as
+ * the appropriate long-term solution, as the CJK practice of
+ * displaying these characters as double-width comes from historic
+ * implementation simplicity (8-bit encoded characters were displayed
+ * single-width and 16-bit ones double-width, even for Greek,
+ * Cyrillic, etc.) and not any typographic considerations.
+ *
+ * Much less clear is the choice of width for the Not East Asian
+ * (Neutral) class. Existing practice does not dictate a width for any
+ * of these characters. It would nevertheless make sense
+ * typographically to allocate two character cells to characters such
+ * as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
+ * represented adequately with a single-width glyph. The following
+ * routines at present merely assign a single-cell width to all
+ * neutral characters, in the interest of simplicity. This is not
+ * entirely satisfactory and should be reconsidered before
+ * establishing a formal standard in this area. At the moment, the
+ * decision which Not East Asian (Neutral) characters should be
+ * represented by double-width glyphs cannot yet be answered by
+ * applying a simple rule from the Unicode database content. Setting
+ * up a proper standard for the behavior of UTF-8 character terminals
+ * will require a careful analysis not only of each Unicode character,
+ * but also of each presentation form, something the author of these
+ * routines has avoided to do so far.
+ *
+ * http://www.unicode.org/unicode/reports/tr11/
+ *
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
@@ -24,12 +69,34 @@
# include "config.h"
#endif
-#ifndef HAVE_WC_FUNCS
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+#include <wchar.h>
-#include "mutt.h"
-#include "mbyte.h"
+struct interval {
+ wchar_t first;
+ wchar_t last;
+};
+
+/* auxiliary function for binary search in interval table */
+static int bisearch(wchar_t ucs, const struct interval *table, int max) {
+ int min = 0;
+ int mid;
+
+ if (ucs < table[0].first || ucs > table[max].last)
+ return 0;
+ while (max >= min) {
+ mid = (min + max) / 2;
+ if (ucs > table[mid].last)
+ min = mid + 1;
+ else if (ucs < table[mid].first)
+ max = mid - 1;
+ else
+ return 1;
+ }
+
+ return 0;
+}
-#include <ctype.h>
/* The following two functions define the column width of an ISO 10646
* character as follows:
@@ -67,62 +134,56 @@ int wcwidth_ucs(wchar_t ucs)
{
/* sorted list of non-overlapping intervals of non-spacing characters */
/* generated by "uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B c" */
- static const struct interval {
- wchar_t first;
- wchar_t last;
- } combining[] = {
- { 0x0300, 0x036f }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
- { 0x0591, 0x05bd }, { 0x05bf, 0x05bf }, { 0x05c1, 0x05c2 },
- { 0x05c4, 0x05c5 }, { 0x05c7, 0x05c7 }, { 0x0600, 0x0603 },
- { 0x0610, 0x0615 }, { 0x064b, 0x065e }, { 0x0670, 0x0670 },
- { 0x06d6, 0x06e4 }, { 0x06e7, 0x06e8 }, { 0x06ea, 0x06ed },
- { 0x070f, 0x070f }, { 0x0711, 0x0711 }, { 0x0730, 0x074a },
- { 0x07a6, 0x07b0 }, { 0x07eb, 0x07f3 }, { 0x0901, 0x0902 },
- { 0x093c, 0x093c }, { 0x0941, 0x0948 }, { 0x094d, 0x094d },
+ static const struct interval combining[] = {
+ { 0x0300, 0x036F }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
+ { 0x0591, 0x05BD }, { 0x05BF, 0x05BF }, { 0x05C1, 0x05C2 },
+ { 0x05C4, 0x05C5 }, { 0x05C7, 0x05C7 }, { 0x0600, 0x0603 },
+ { 0x0610, 0x0615 }, { 0x064B, 0x065E }, { 0x0670, 0x0670 },
+ { 0x06D6, 0x06E4 }, { 0x06E7, 0x06E8 }, { 0x06EA, 0x06ED },
+ { 0x070F, 0x070F }, { 0x0711, 0x0711 }, { 0x0730, 0x074A },
+ { 0x07A6, 0x07B0 }, { 0x07EB, 0x07F3 }, { 0x0901, 0x0902 },
+ { 0x093C, 0x093C }, { 0x0941, 0x0948 }, { 0x094D, 0x094D },
{ 0x0951, 0x0954 }, { 0x0962, 0x0963 }, { 0x0981, 0x0981 },
- { 0x09bc, 0x09bc }, { 0x09c1, 0x09c4 }, { 0x09cd, 0x09cd },
- { 0x09e2, 0x09e3 }, { 0x0a01, 0x0a02 }, { 0x0a3c, 0x0a3c },
- { 0x0a41, 0x0a42 }, { 0x0a47, 0x0a48 }, { 0x0a4b, 0x0a4d },
- { 0x0a70, 0x0a71 }, { 0x0a81, 0x0a82 }, { 0x0abc, 0x0abc },
- { 0x0ac1, 0x0ac5 }, { 0x0ac7, 0x0ac8 }, { 0x0acd, 0x0acd },
- { 0x0ae2, 0x0ae3 }, { 0x0b01, 0x0b01 }, { 0x0b3c, 0x0b3c },
- { 0x0b3f, 0x0b3f }, { 0x0b41, 0x0b43 }, { 0x0b4d, 0x0b4d },
- { 0x0b56, 0x0b56 }, { 0x0b82, 0x0b82 }, { 0x0bc0, 0x0bc0 },
- { 0x0bcd, 0x0bcd }, { 0x0c3e, 0x0c40 }, { 0x0c46, 0x0c48 },
- { 0x0c4a, 0x0c4d }, { 0x0c55, 0x0c56 }, { 0x0cbc, 0x0cbc },
- { 0x0cbf, 0x0cbf }, { 0x0cc6, 0x0cc6 }, { 0x0ccc, 0x0ccd },
- { 0x0ce2, 0x0ce3 }, { 0x0d41, 0x0d43 }, { 0x0d4d, 0x0d4d },
- { 0x0dca, 0x0dca }, { 0x0dd2, 0x0dd4 }, { 0x0dd6, 0x0dd6 },
- { 0x0e31, 0x0e31 }, { 0x0e34, 0x0e3a }, { 0x0e47, 0x0e4e },
- { 0x0eb1, 0x0eb1 }, { 0x0eb4, 0x0eb9 }, { 0x0ebb, 0x0ebc },
- { 0x0ec8, 0x0ecd }, { 0x0f18, 0x0f19 }, { 0x0f35, 0x0f35 },
- { 0x0f37, 0x0f37 }, { 0x0f39, 0x0f39 }, { 0x0f71, 0x0f7e },
- { 0x0f80, 0x0f84 }, { 0x0f86, 0x0f87 }, { 0x0f90, 0x0f97 },
- { 0x0f99, 0x0fbc }, { 0x0fc6, 0x0fc6 }, { 0x102d, 0x1030 },
+ { 0x09BC, 0x09BC }, { 0x09C1, 0x09C4 }, { 0x09CD, 0x09CD },
+ { 0x09E2, 0x09E3 }, { 0x0A01, 0x0A02 }, { 0x0A3C, 0x0A3C },
+ { 0x0A41, 0x0A42 }, { 0x0A47, 0x0A48 }, { 0x0A4B, 0x0A4D },
+ { 0x0A70, 0x0A71 }, { 0x0A81, 0x0A82 }, { 0x0ABC, 0x0ABC },
+ { 0x0AC1, 0x0AC5 }, { 0x0AC7, 0x0AC8 }, { 0x0ACD, 0x0ACD },
+ { 0x0AE2, 0x0AE3 }, { 0x0B01, 0x0B01 }, { 0x0B3C, 0x0B3C },
+ { 0x0B3F, 0x0B3F }, { 0x0B41, 0x0B43 }, { 0x0B4D, 0x0B4D },
+ { 0x0B56, 0x0B56 }, { 0x0B82, 0x0B82 }, { 0x0BC0, 0x0BC0 },
+ { 0x0BCD, 0x0BCD }, { 0x0C3E, 0x0C40 }, { 0x0C46, 0x0C48 },
+ { 0x0C4A, 0x0C4D }, { 0x0C55, 0x0C56 }, { 0x0CBC, 0x0CBC },
+ { 0x0CBF, 0x0CBF }, { 0x0CC6, 0x0CC6 }, { 0x0CCC, 0x0CCD },
+ { 0x0CE2, 0x0CE3 }, { 0x0D41, 0x0D43 }, { 0x0D4D, 0x0D4D },
+ { 0x0DCA, 0x0DCA }, { 0x0DD2, 0x0DD4 }, { 0x0DD6, 0x0DD6 },
+ { 0x0E31, 0x0E31 }, { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E },
+ { 0x0EB1, 0x0EB1 }, { 0x0EB4, 0x0EB9 }, { 0x0EBB, 0x0EBC },
+ { 0x0EC8, 0x0ECD }, { 0x0F18, 0x0F19 }, { 0x0F35, 0x0F35 },
+ { 0x0F37, 0x0F37 }, { 0x0F39, 0x0F39 }, { 0x0F71, 0x0F7E },
+ { 0x0F80, 0x0F84 }, { 0x0F86, 0x0F87 }, { 0x0F90, 0x0F97 },
+ { 0x0F99, 0x0FBC }, { 0x0FC6, 0x0FC6 }, { 0x102D, 0x1030 },
{ 0x1032, 0x1032 }, { 0x1036, 0x1037 }, { 0x1039, 0x1039 },
- { 0x1058, 0x1059 }, { 0x1160, 0x11ff }, { 0x135f, 0x135f },
+ { 0x1058, 0x1059 }, { 0x1160, 0x11FF }, { 0x135F, 0x135F },
{ 0x1712, 0x1714 }, { 0x1732, 0x1734 }, { 0x1752, 0x1753 },
- { 0x1772, 0x1773 }, { 0x17b4, 0x17b5 }, { 0x17b7, 0x17bd },
- { 0x17c6, 0x17c6 }, { 0x17c9, 0x17d3 }, { 0x17dd, 0x17dd },
- { 0x180b, 0x180d }, { 0x18a9, 0x18a9 }, { 0x1920, 0x1922 },
- { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193b },
- { 0x1a17, 0x1a18 }, { 0x1b00, 0x1b03 }, { 0x1b34, 0x1b34 },
- { 0x1b36, 0x1b3a }, { 0x1b3c, 0x1b3c }, { 0x1b42, 0x1b42 },
- { 0x1b6b, 0x1b73 }, { 0x1dc0, 0x1dca }, { 0x1dfe, 0x1dff },
- { 0x200b, 0x200f }, { 0x202a, 0x202e }, { 0x2060, 0x2063 },
- { 0x206a, 0x206f }, { 0x20d0, 0x20ef }, { 0x302a, 0x302f },
- { 0x3099, 0x309a }, { 0xa806, 0xa806 }, { 0xa80b, 0xa80b },
- { 0xa825, 0xa826 }, { 0xfb1e, 0xfb1e }, { 0xfe00, 0xfe0f },
- { 0xfe20, 0xfe23 }, { 0xfeff, 0xfeff }, { 0xfff9, 0xfffb },
- { 0x10a01, 0x10a03 }, { 0x10a05, 0x10a06 }, { 0x10a0c, 0x10a0f },
- { 0x10a38, 0x10a3a }, { 0x10a3f, 0x10a3f }, { 0x1d167, 0x1d169 },
- { 0x1d173, 0x1d182 }, { 0x1d185, 0x1d18b }, { 0x1d1aa, 0x1d1ad },
- { 0x1d242, 0x1d244 }, { 0xe0001, 0xe0001 }, { 0xe0020, 0xe007f },
- { 0xe0100, 0xe01ef }
+ { 0x1772, 0x1773 }, { 0x17B4, 0x17B5 }, { 0x17B7, 0x17BD },
+ { 0x17C6, 0x17C6 }, { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD },
+ { 0x180B, 0x180D }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 },
+ { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193B },
+ { 0x1A17, 0x1A18 }, { 0x1B00, 0x1B03 }, { 0x1B34, 0x1B34 },
+ { 0x1B36, 0x1B3A }, { 0x1B3C, 0x1B3C }, { 0x1B42, 0x1B42 },
+ { 0x1B6B, 0x1B73 }, { 0x1DC0, 0x1DCA }, { 0x1DFE, 0x1DFF },
+ { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2063 },
+ { 0x206A, 0x206F }, { 0x20D0, 0x20EF }, { 0x302A, 0x302F },
+ { 0x3099, 0x309A }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B },
+ { 0xA825, 0xA826 }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F },
+ { 0xFE20, 0xFE23 }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB },
+ { 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }, { 0x10A0C, 0x10A0F },
+ { 0x10A38, 0x10A3A }, { 0x10A3F, 0x10A3F }, { 0x1D167, 0x1D169 },
+ { 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
+ { 0x1D242, 0x1D244 }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F },
+ { 0xE0100, 0xE01EF }
};
- int min = 0;
- int max = sizeof(combining) / sizeof(struct interval) - 1;
- int mid;
/* test for 8-bit control characters */
if (ucs == 0)
@@ -130,20 +191,10 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0))
return -1;
- /* first quick check for Latin-1 etc. characters */
- if (ucs < combining[0].first)
- return 1;
-
/* binary search in table of non-spacing characters */
- while (max >= min) {
- mid = (min + max) / 2;
- if (combining[mid].last < ucs)
- min = mid + 1;
- else if (combining[mid].first > ucs)
- max = mid - 1;
- else if (combining[mid].first <= ucs && combining[mid].last >= ucs)
- return 0;
- }
+ if (bisearch(ucs, combining,
+ sizeof(combining) / sizeof(struct interval) - 1))
+ return 0;
/* if we arrive here, ucs is not a combining or C0/C1 control character */
@@ -151,7 +202,7 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 0x1100)
return 1;
- return 1 +
+ return 1 +
(ucs >= 0x1100 &&
(ucs <= 0x115f || /* Hangul Jamo init. consonants */
ucs == 0x2329 || ucs == 0x232a ||
@@ -167,15 +218,120 @@ int wcwidth_ucs(wchar_t ucs)
(ucs >= 0x30000 && ucs <= 0x3fffd)));
}
-#endif /* !HAVE_WC_FUNCS */
+#if 0 /* original */
+int wcswidth_ucs(const wchar_t *pwcs, size_t n)
+{
+ int w, width = 0;
+
+ for (;*pwcs && n-- > 0; pwcs++)
+ if ((w = wcwidth_ucs(*pwcs)) < 0)
+ return -1;
+ else
+ width += w;
+
+ return width;
+}
+#endif
+
+/*
+ * The following functions are the same as wcwidth_ucs() and
+ * wcswidth_ucs(), except that spacing characters in the East Asian
+ * Ambiguous (A) category as defined in Unicode Technical Report #11
+ * have a column width of 2. This variant might be useful for users of
+ * CJK legacy encodings who want to migrate to UCS without changing
+ * the traditional terminal character-width behaviour. It is not
+ * otherwise recommended for general use.
+ */
+/*
+ * In addition to the explanation mentioned above,
+ * several characters in the East Asian Narrow (Na) and Not East Asian
+ * (Neutral) category as defined in Unicode Technical Report #11
+ * actually have a column width of 2 in CJK legacy encodings.
+ */
+int wcwidth_cjk(wchar_t ucs)
+{
+ /* sorted list of non-overlapping intervals of East Asian Ambiguous
+ * characters, generated by "uniset +WIDTH-A -cat=Me -cat=Mn -cat=Cf c" */
+ static const struct interval ambiguous[] = {
+ { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 },
+ { 0x00AA, 0x00AA }, { 0x00AE, 0x00AE }, { 0x00B0, 0x00B4 },
+ { 0x00B6, 0x00BA }, { 0x00BC, 0x00BF }, { 0x00C6, 0x00C6 },
+ { 0x00D0, 0x00D0 }, { 0x00D7, 0x00D8 }, { 0x00DE, 0x00E1 },
+ { 0x00E6, 0x00E6 }, { 0x00E8, 0x00EA }, { 0x00EC, 0x00ED },
+ { 0x00F0, 0x00F0 }, { 0x00F2, 0x00F3 }, { 0x00F7, 0x00FA },
+ { 0x00FC, 0x00FC }, { 0x00FE, 0x00FE }, { 0x0101, 0x0101 },
+ { 0x0111, 0x0111 }, { 0x0113, 0x0113 }, { 0x011B, 0x011B },
+ { 0x0126, 0x0127 }, { 0x012B, 0x012B }, { 0x0131, 0x0133 },
+ { 0x0138, 0x0138 }, { 0x013F, 0x0142 }, { 0x0144, 0x0144 },
+ { 0x0148, 0x014B }, { 0x014D, 0x014D }, { 0x0152, 0x0153 },
+ { 0x0166, 0x0167 }, { 0x016B, 0x016B }, { 0x01CE, 0x01CE },
+ { 0x01D0, 0x01D0 }, { 0x01D2, 0x01D2 }, { 0x01D4, 0x01D4 },
+ { 0x01D6, 0x01D6 }, { 0x01D8, 0x01D8 }, { 0x01DA, 0x01DA },
+ { 0x01DC, 0x01DC }, { 0x0251, 0x0251 }, { 0x0261, 0x0261 },
+ { 0x02C4, 0x02C4 }, { 0x02C7, 0x02C7 }, { 0x02C9, 0x02CB },
+ { 0x02CD, 0x02CD }, { 0x02D0, 0x02D0 }, { 0x02D8, 0x02DB },
+ { 0x02DD, 0x02DD }, { 0x02DF, 0x02DF }, { 0x0391, 0x03A1 },
+ { 0x03A3, 0x03A9 }, { 0x03B1, 0x03C1 }, { 0x03C3, 0x03C9 },
+ { 0x0401, 0x0401 }, { 0x0410, 0x044F }, { 0x0451, 0x0451 },
+ { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
+ { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
+ { 0x2030, 0x2030 }, { 0x2032, 0x2033 }, { 0x2035, 0x2035 },
+ { 0x203B, 0x203B }, { 0x203E, 0x203E }, { 0x2074, 0x2074 },
+ { 0x207F, 0x207F }, { 0x2081, 0x2084 }, { 0x20AC, 0x20AC },
+ { 0x2103, 0x2103 }, { 0x2105, 0x2105 }, { 0x2109, 0x2109 },
+ { 0x2113, 0x2113 }, { 0x2116, 0x2116 }, { 0x2121, 0x2122 },
+ { 0x2126, 0x2126 }, { 0x212B, 0x212B }, { 0x2153, 0x2154 },
+ { 0x215B, 0x215E }, { 0x2160, 0x216B }, { 0x2170, 0x2179 },
+ { 0x2190, 0x2199 }, { 0x21B8, 0x21B9 }, { 0x21D2, 0x21D2 },
+ { 0x21D4, 0x21D4 }, { 0x21E7, 0x21E7 }, { 0x2200, 0x2200 },
+ { 0x2202, 0x2203 }, { 0x2207, 0x2208 }, { 0x220B, 0x220B },
+ { 0x220F, 0x220F }, { 0x2211, 0x2211 }, { 0x2215, 0x2215 },
+ { 0x221A, 0x221A }, { 0x221D, 0x2220 }, { 0x2223, 0x2223 },
+ { 0x2225, 0x2225 }, { 0x2227, 0x222C }, { 0x222E, 0x222E },
+ { 0x2234, 0x2237 }, { 0x223C, 0x223D }, { 0x2248, 0x2248 },
+ { 0x224C, 0x224C }, { 0x2252, 0x2252 }, { 0x2260, 0x2261 },
+ { 0x2264, 0x2267 }, { 0x226A, 0x226B }, { 0x226E, 0x226F },
+ { 0x2282, 0x2283 }, { 0x2286, 0x2287 }, { 0x2295, 0x2295 },
+ { 0x2299, 0x2299 }, { 0x22A5, 0x22A5 }, { 0x22BF, 0x22BF },
+ { 0x2312, 0x2312 }, { 0x2460, 0x24E9 }, { 0x24EB, 0x254B },
+ { 0x2550, 0x2573 }, { 0x2580, 0x258F }, { 0x2592, 0x2595 },
+ { 0x25A0, 0x25A1 }, { 0x25A3, 0x25A9 }, { 0x25B2, 0x25B3 },
+ { 0x25B6, 0x25B7 }, { 0x25BC, 0x25BD }, { 0x25C0, 0x25C1 },
+ { 0x25C6, 0x25C8 }, { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 },
+ { 0x25E2, 0x25E5 }, { 0x25EF, 0x25EF }, { 0x2605, 0x2606 },
+ { 0x2609, 0x2609 }, { 0x260E, 0x260F }, { 0x2614, 0x2615 },
+ { 0x261C, 0x261C }, { 0x261E, 0x261E }, { 0x2640, 0x2640 },
+ { 0x2642, 0x2642 }, { 0x2660, 0x2661 }, { 0x2663, 0x2665 },
+ { 0x2667, 0x266A }, { 0x266C, 0x266D }, { 0x266F, 0x266F },
+ { 0x273D, 0x273D }, { 0x2776, 0x277F }, { 0xE000, 0xF8FF },
+ { 0xFFFD, 0xFFFD }, { 0xF0000, 0xFFFFD }, { 0x100000, 0x10FFFD }
+ };
+
+ /* For Japanese legacy encodings, the following characters are added. */
+ static const struct interval legacy_ja[] = {
+ { 0x00A2, 0x00A3 }, { 0x00A5, 0x00A6 }, { 0x00AC, 0x00AC },
+ { 0x00AF, 0x00AF }, { 0x2212, 0x2212 }
+ };
+
+ /* binary search in the tables of characters that are wide in CJK contexts */
+ if (bisearch(ucs, ambiguous,
+ sizeof(ambiguous) / sizeof(struct interval) - 1))
+ return 2;
+ if (bisearch(ucs, legacy_ja,
+ sizeof(legacy_ja) / sizeof(struct interval) - 1))
+ return 2;
+
+ return wcwidth_ucs(ucs);
+}
+
#if 0 /* original */
-int wcswidth(const wchar_t *pwcs, size_t n)
+int wcswidth_cjk(const wchar_t *pwcs, size_t n)
{
int w, width = 0;
for (;*pwcs && n-- > 0; pwcs++)
- if ((w = wcwidth(*pwcs)) < 0)
+ if ((w = wcwidth_cjk(*pwcs)) < 0)
return -1;
else
width += w;
@@ -183,3 +339,4 @@ int wcswidth(const wchar_t *pwcs, size_t n)
return width;
}
#endif
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e3e6a1d
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,90 @@
+# This .gitignore was converted from .hgignore.
+
+# autoconf products
+aclocal.m4
+autom4te.cache/
+compile
+Makefile.in
+contrib/Makefile.in
+doc/Makefile.in
+imap/Makefile.in
+m4/Makefile.in
+po/Makefile.in
+config.h
+config.h.in
+config.h.in~
+config.log
+config.status
+#configure
+hcachever.sh
+muttbug.sh
+stamp-h1
+doc/Muttrc
+doc/instdoc.sh
+po/POTFILES
+config.guess
+config.sub
+depcomp
+install-sh
+missing
+mkinstalldirs
+
+# built objects
+flea
+hcversion.h
+keymap_alldefs.h
+keymap_defs.h
+doc/makedoc
+mutt
+mutt_dotlock
+mutt_dotlock.c
+mutt_md5
+patchlist.c
+conststrings.c
+pgpewrap
+pgpring
+reldate.h
+smime_keys
+txt2c
+stamp-doc-rc
+doc/instdoc
+doc/manual.txt
+doc/manual.xml
+doc/manual.aux
+doc/manual.log
+doc/manual.out
+doc/manual.tex
+doc/manual.pdf
+doc/mutt.1
+doc/muttrc.man
+doc/pgpewrap.1
+doc/pgpring.1
+doc/*.html
+doc/stamp-*
+doc/smime_keys.1
+po/mutt.pot
+
+# xcode droppings
+build/
+.xcodeproj/
+
+# eclipse
+.cproject
+.object
+.settings/
+
+.deps
+Makefile
+GPATH
+GRTAGS
+GTAGS
+TAGS
+cscope.*
+*.swp
+*.o
+*.gmo
+*.orig
+*.rej
+*.a
+.gdb_history
+*~
diff --git a/PATCHES b/PATCHES
index e69de29..17743fd 100644
--- a/PATCHES
+++ b/PATCHES
@@ -0,0 +1,5 @@
+patch-1.5.23.tt+yy.delete_prefix.1
+patch-1.5.23.tt.create_rfc2047_params.1
+patch-1.5.23.tt.sanitize_ja.1
+patch-1.5.23.tt.cjk_width_tree_chars.1
+patch-1.5.23.tt.wcwidth.1
diff --git a/charset.c b/charset.c
index 2411f2c..6a5cbd4 100644
--- a/charset.c
+++ b/charset.c
@@ -481,6 +481,9 @@ int mutt_convert_string (char **ps, const char *from, const char *to, int flags)
if (!s || !*s)
return 0;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars (s, mutt_strlen(s), 0);
+
if (to && from && (cd = mutt_iconv_open (to, from, flags)) != (iconv_t)-1)
{
int len;
@@ -677,3 +680,188 @@ int mutt_check_charset (const char *s, int strict)
return -1;
}
+
+/*
+ * mutt_sanitize_ja_chars()
+ * Adapted by TAKIZAWA Takashi <taki@cyber.email.ne.jp>
+ *
+ * - It replaces undefined KANJI characters with the GETA mark.
+ * - It replaces 'JIS X 0201 kana' characters with '?'.
+ * - If $charset is EUC-JP, it replaces the third character 'J' of the
+ * escape sequence that switches to 'JIS X 0201 latin' with 'B',
+ * which indicates 'US-ASCII'.
+ * - If $charset is Shift_JIS, it replaces the third character 'B' of the
+ * escape sequence that switches to 'US-ASCII' with 'J', which
+ * indicates 'JIS X 0201 latin'.
+ */
+
+#define ASCII 0
+#define JISX0201LATIN 1
+#define JISX0201KANA 2
+#define JISX0208 3
+#define OTHER_CS 4
+
+void mutt_sanitize_ja_chars(char *s, size_t len, int keep_state)
+{
+ static int cs = ASCII;
+ static int kanji_cont = 0;
+ static int illegal_kanji = 0;
+ static int es = 0;
+ static char pes = '\0';
+ static char ascii_3rd_char = 'B';
+ static char jisx0201_3rd_char = 'J';
+
+ char *p = s;
+ char *p1 = NULL;
+ unsigned char c;
+
+ if (!keep_state || *p == 0x1b) /* consideration about mbstate's buffer */
+ {
+ if (!ascii_strcasecmp (Charset, "euc-jp"))
+ jisx0201_3rd_char = 'B';
+ else if (!ascii_strcasecmp (Charset, "shift_jis"))
+ ascii_3rd_char = 'J';
+ cs = ASCII;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ es = 0;
+ pes = '\0';
+ }
+
+ for (;p - s < len;p++)
+ {
+ if (es == 0)
+ {
+ if (*p == 0x1b)
+ es++;
+ else
+ {
+ switch (cs)
+ {
+ case ASCII:
+ case JISX0201LATIN:
+ break;
+ case JISX0201KANA:
+ *p = '?';
+ break;
+ case JISX0208:
+ /* replace ku-ten code from 9 to 15 and 85 or more to "GETA MARK" */
+ c = (unsigned char)*p;
+ if (! kanji_cont)
+ {
+ if ((size_t)(p - s + 1) == len)
+ return; /* the last character is a primary byte of KANJI */
+ if (c <= 0x20 || (c >= 0x29 && c <= 0x2f)
+ || (c >= 0x75 && c <= 0xa0))
+ illegal_kanji = 1;
+ kanji_cont = 1;
+ p1 = p;
+ }
+ else
+ {
+ if (c <= 0x20 || c >= 0x7f)
+ illegal_kanji = 1;
+ if (illegal_kanji && p1)
+ *p1 = 0x22, *p = 0x2e;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ }
+ break;
+ }
+ }
+ }
+ else if (es == 1)
+ {
+ if (*p == '$' || (*p >= '(' && *p <= '/' && *p != ','))
+ {
+ es++;
+ pes = *p;
+ }
+ else
+ {
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else if (es == 2)
+ {
+ if (pes == '(')
+ {
+ switch (*p)
+ {
+ case 'B':
+ cs = ASCII, *p = ascii_3rd_char;
+ break;
+ case 'J':
+ cs = JISX0201LATIN, *p = jisx0201_3rd_char;
+ break;
+ case 'I':
+ /* ready to replace character to '?' */
+ cs = JISX0201KANA, *p = ascii_3rd_char;
+ break;
+ default:
+ cs = OTHER_CS;
+ }
+ es = 0;
+ }
+ else if (pes == '$')
+ {
+ switch (*p)
+ {
+ case '@': /* JIS X 0208-1978 */
+ case 'B': /* JIS X 0208-1983 */
+ cs = JISX0208;
+ es = 0;
+ break;
+ case 'A':
+ cs = OTHER_CS; /* GB 2312 */
+ es = 0;
+ break;
+ case '(':
+ case ')':
+ case '*':
+ case '+':
+ case '-':
+ case '.':
+ case '/':
+ es++;
+ break;
+ default:
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+ else /* es == 3 */
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+}
+
+int mutt_copy_bytes_sanitize_ja (FILE *in, FILE *out, size_t size)
+{
+ char buf[2048];
+ size_t chunk;
+
+ mutt_sanitize_ja_chars (NULL, 0, 0);
+ while (size > 0)
+ {
+ chunk = (size > sizeof (buf)) ? sizeof (buf) : size;
+ if ((chunk = fread (buf, 1, chunk, in)) < 1)
+ break;
+ mutt_sanitize_ja_chars (buf, chunk, 1);
+ if (fwrite (buf, 1, chunk, out) != chunk)
+ return (-1);
+ size -= chunk;
+ }
+
+ return 0;
+}
+
diff --git a/charset.h b/charset.h
index 54891f0..d67b209 100644
--- a/charset.h
+++ b/charset.h
@@ -36,6 +36,9 @@ int iconv_close (iconv_t);
int mutt_convert_string (char **, const char *, const char *, int);
+void mutt_sanitize_ja_chars (char *, size_t, int);
+int mutt_copy_bytes_sanitize_ja (FILE *, FILE *, size_t);
+
iconv_t mutt_iconv_open (const char *, const char *, int);
size_t mutt_iconv (iconv_t, ICONV_CONST char **, size_t *, char **, size_t *, ICONV_CONST char **, const char *);
diff --git a/configure b/configure
index 3e406dc..f9e0db3 100755
--- a/configure
+++ b/configure
@@ -828,6 +828,7 @@ with_libiconv_prefix
enable_nls
with_included_gettext
with_idn
+enable_cjk_ambiguous_width
with_wc_funcs
enable_doc
enable_full_doc
@@ -1498,6 +1499,9 @@ Optional Features:
--enable-hcache Enable header caching
--disable-iconv Disable iconv support
--disable-nls Do not use Native Language Support
+ --enable-cjk-ambiguous-width
+ Enable East Asian Ambiguous characters support
+ (using own wcwidth)
--disable-doc Do not build the documentation
--disable-full-doc Omit disabled variables
@@ -12836,6 +12840,21 @@ fi
fi
+# Check whether --enable-cjk-ambiguous-width was given.
+if test "${enable_cjk_ambiguous_width+set}" = set; then :
+ enableval=$enable_cjk_ambiguous_width; if test "x$enableval" = "xyes" ; then
+ cjk_width=yes
+ fi
+
+fi
+
+if test "x$cjk_width" = "xyes" ; then
+
+$as_echo "#define USE_CJK_WIDTH 1" >>confdefs.h
+
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+fi
+
for ac_header in wchar.h
do :
ac_fn_c_check_header_mongrel "$LINENO" "wchar.h" "ac_cv_header_wchar_h" "$ac_includes_default"
@@ -13064,7 +13083,10 @@ if test $wc_funcs = yes; then
$as_echo "#define HAVE_WC_FUNCS 1" >>confdefs.h
else
- MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o"
+ if test "x$cjk_width" != "xyes"; then
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+ fi
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for nl_langinfo and CODESET" >&5
diff --git a/configure.ac b/configure.ac
index a9c3206..64f7239 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1327,6 +1327,16 @@ fi
dnl -- locales --
+AC_ARG_ENABLE(cjk-ambiguous-width, AC_HELP_STRING([--enable-cjk-ambiguous-width], [ Enable East Asian Ambiguous characters support (using own wcwidth)]),
+ [ if test "x$enableval" = "xyes" ; then
+ cjk_width=yes
+ fi
+ ])
+if test "x$cjk_width" = "xyes" ; then
+ AC_DEFINE(USE_CJK_WIDTH,1,[ Define if you want to support East Asian Ambiguous class. ])
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+fi
+
AC_CHECK_HEADERS(wchar.h)
AC_CACHE_CHECK([for wchar_t], mutt_cv_wchar_t,
@@ -1397,7 +1407,10 @@ fi
if test $wc_funcs = yes; then
AC_DEFINE(HAVE_WC_FUNCS,1,[ Define if you are using the system's wchar_t functions. ])
else
- MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o"
+ if test "x$cjk_width" != "xyes"; then
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+ fi
fi
AC_CACHE_CHECK([for nl_langinfo and CODESET], mutt_cv_langinfo_codeset,
diff --git a/curs_lib.c b/curs_lib.c
index 1a2bb9e..07357ed 100644
--- a/curs_lib.c
+++ b/curs_lib.c
@@ -1083,7 +1083,14 @@ void mutt_format_string (char *dest, size_t destlen,
wc = replacement_char ();
}
if (arboreal && wc < MUTT_TREE_MAX)
- w = 1; /* hack */
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w = wcwidth (TreeChars[wc]);
+ else
+#endif
+ w = 1;
+ }
else
{
#ifdef HAVE_ISWBLANK
@@ -1312,10 +1319,12 @@ int mutt_strwidth (const char *s)
int w;
size_t k, n;
mbstate_t mbstate;
+ int arboreal;
if (!s) return 0;
n = mutt_strlen (s);
+ arboreal = (s[0] < MUTT_TREE_MAX) ? 1 : 0;
memset (&mbstate, 0, sizeof (mbstate));
for (w=0; n && (k = mbrtowc (&wc, s, n, &mbstate)); s += k, n -= k)
@@ -1327,9 +1336,21 @@ int mutt_strwidth (const char *s)
k = (k == (size_t)(-1)) ? 1 : n;
wc = replacement_char ();
}
- if (!IsWPrint (wc))
- wc = '?';
- w += wcwidth (wc);
+ if (wc < MUTT_TREE_MAX && arboreal && k == 1)
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w += wcwidth (TreeChars[wc]);
+ else
+#endif
+ w++;
+ }
+ else
+ {
+ if (!IsWPrint (wc))
+ wc = '?';
+ w += wcwidth (wc);
+ }
}
return w;
}
diff --git a/doc/makedoc-defs.h b/doc/makedoc-defs.h
index 2da29f4..10d8b9f 100644
--- a/doc/makedoc-defs.h
+++ b/doc/makedoc-defs.h
@@ -31,10 +31,10 @@
# ifndef USE_SOCKET
# define USE_SOCKET
# endif
-# ifndef USE_DOTLOCK
+# if !defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
# define USE_DOTLOCK
# endif
-# ifndef DL_STANDALONE
+# if !defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define DL_STANDALONE
# endif
# ifndef USE_HCACHE
diff --git a/dotlock.c b/dotlock.c
index 5bf0348..5d87850 100644
--- a/dotlock.c
+++ b/dotlock.c
@@ -52,13 +52,13 @@
#include <getopt.h>
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# include "reldate.h"
#endif
#define MAXLINKS 1024 /* maximum link depth */
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define LONG_STRING 1024
# define MAXLOCKATTEMPT 5
@@ -96,7 +96,7 @@ extern int snprintf (char *, size_t, const char *, ...);
static int DotlockFlags;
static int Retry = MAXLOCKATTEMPT;
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static char *Hostname;
#endif
@@ -110,7 +110,7 @@ static int dotlock_prepare (char *, size_t, const char *, int fd);
static int dotlock_check_stats (struct stat *, struct stat *);
static int dotlock_dispatch (const char *, int fd);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int dotlock_init_privs (void);
static void usage (const char *);
#endif
@@ -130,7 +130,7 @@ static int dotlock_unlink (const char *);
static int dotlock_lock (const char *);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
#define check_flags(a) if (a & DL_FL_ACTIONS) usage (argv[0])
@@ -327,7 +327,7 @@ END_PRIVILEGED (void)
#endif
}
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
/*
* Usage information.
diff --git a/globals.h b/globals.h
index 9634691..4437372 100644
--- a/globals.h
+++ b/globals.h
@@ -24,7 +24,7 @@ WHERE CONTEXT *Context;
WHERE char Errorbuf[STRING];
WHERE char AttachmentMarker[STRING];
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
WHERE char *MuttDotlock;
#endif
@@ -298,9 +298,31 @@ const char * const Months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
const char * const BodyTypes[] = { "x-unknown", "audio", "application", "image", "message", "model", "multipart", "text", "video" };
const char * const BodyEncodings[] = { "x-unknown", "7bit", "8bit", "quoted-printable", "base64", "binary", "x-uuencoded" };
+#ifdef USE_CJK_WIDTH
+const wchar_t TreeChars[] =
+{
+ 0xFEFF, /* not used */
+ 0x2514, /* M_TREE_LLCORNER WACS_LLCORNER */
+ 0x250C, /* M_TREE_ULCORNER WACS_ULCORNER */
+ 0x251C, /* M_TREE_LTEE WACS_LTEE */
+ 0x2500, /* M_TREE_HLINE WACS_HLINE */
+ 0x2502, /* M_TREE_VLINE WACS_VLINE */
+ 0x0020, /* M_TREE_SPACE */
+ 0x003E, /* M_TREE_RARROW */
+ 0x002A, /* M_TREE_STAR fake thread indicator */
+ 0x0026, /* M_TREE_HIDDEN */
+ 0x003D, /* M_TREE_EQUALS */
+ 0x252C, /* M_TREE_TTEE WACS_TTEE */
+ 0x2534, /* M_TREE_BTEE WACS_BTEE */
+ 0x003F /* M_TREE_MISSING */
+};
+#endif /* USE_CJK_WIDTH */
#else
extern const char * const Weekdays[];
extern const char * const Months[];
+#ifdef USE_CJK_WIDTH
+extern const wchar_t TreeChars[];
+#endif /* USE_CJK_WIDTH */
#endif
#ifdef MAIN_C
diff --git a/handler.c b/handler.c
index 7ce53f9..ab69527 100644
--- a/handler.c
+++ b/handler.c
@@ -100,6 +100,9 @@ static void mutt_convert_to_state(iconv_t cd, char *bufi, size_t *l, STATE *s)
return;
}
+ if (option (OPTSANITIZEJACHARS) && memchr (bufi, 0x1b, *l))
+ mutt_sanitize_ja_chars (bufi, *l, 1);
+
ib = bufi, ibl = *l;
for (;;)
{
@@ -1312,6 +1315,7 @@ static int autoview_handler (BODY *a, STATE *s)
int piped = FALSE;
pid_t thepid;
int rc = 0;
+ char *charset;
snprintf (type, sizeof (type), "%s/%s", TYPE (a), a->subtype);
rfc1524_mailcap_lookup (a, type, entry, MUTT_AUTOVIEW);
@@ -1342,6 +1346,10 @@ static int autoview_handler (BODY *a, STATE *s)
return -1;
}
+ charset = mutt_get_parameter ("charset", a->parameter);
+ if (charset && option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (charset,"iso-2022-jp", 11))
+ mutt_copy_bytes_sanitize_ja (s->fpin, fpin, a->length);
+ else
mutt_copy_bytes (s->fpin, fpin, a->length);
if(!piped)
diff --git a/hdrline.c b/hdrline.c
index ba118bf..2e6a10b 100644
--- a/hdrline.c
+++ b/hdrline.c
@@ -272,6 +272,7 @@ hdr_format_str (char *dest,
#define THREAD_NEW (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 1)
#define THREAD_OLD (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 2)
size_t len;
+ char *subj;
hdr = hfi->hdr;
ctx = hfi->ctx;
@@ -590,6 +591,7 @@ hdr_format_str (char *dest,
subj = apply_subject_mods(hdr->env);
else
subj = hdr->env->subject;
+ if (option (OPTDELETEPREFIX)) subj = hdr->env->real_subj;
if (flags & MUTT_FORMAT_TREE && !hdr->collapsed)
{
if (flags & MUTT_FORMAT_FORCESUBJ)
diff --git a/init.h b/init.h
index 035752f..8ba243b 100644
--- a/init.h
+++ b/init.h
@@ -401,6 +401,31 @@ struct option_t MuttVars[] = {
** this variable is \fIunset\fP, no check for new mail is performed
** while the mailbox is open.
*/
+#ifdef USE_CJK_WIDTH
+ { "cjk_width", DT_BOOL, R_NONE, OPTCJKWIDTH, 0 },
+ /*
+ ** .pp
+ ** When this option is set, characters in the East Asian Ambiguous (A)
+ ** category as defined in Unicode Technical Report #11 have a column
+ ** width of 2. Otherwise, they have a column width of 1.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect with the UTF-8 encoding.
+ */
+ { "cjk_width_tree_chars", DT_BOOL, R_NONE, OPTCJKWIDTHTREECHARS, 0 },
+ /*
+ ** .pp
+ ** If \fIset\fP, Mutt will use the column width determined by $cjk_width
+ ** for WACS characters when displaying thread and attachment trees.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only takes effect with the UTF-8 encoding.
+ */
+#endif
{ "collapse_unread", DT_BOOL, R_NONE, OPTCOLLAPSEUNREAD, 1 },
/*
** .pp
@@ -606,6 +631,17 @@ struct option_t MuttVars[] = {
** If \fI``no''\fP, never attempt to verify cryptographic signatures.
** (Crypto only)
*/
+ { "create_rfc2047_parameters", DT_BOOL, R_NONE, OPTCREATERFC2047PARAMS, 0 },
+ /*
+ ** .pp
+ ** When this variable is set, Mutt also adds an RFC 2047-encoded MIME
+ ** parameter holding the attachment's filename to the Content-Type header, e.g.:
+ ** name="=?iso-2022-jp?B?GyRCO244MxsoQi50eHQ=?="
+ ** .pp
+ ** Note: this use of RFC 2047's encoding is explicitly prohibited
+ ** by the standard. Set this variable only if the recipients' mail
+ ** clients cannot parse RFC 2231 parameters.
+ */
{ "date_format", DT_STR, R_MENU, UL &DateFmt, UL "!%a, %b %d, %Y at %I:%M:%S%p %Z" },
/*
** .pp
@@ -657,6 +693,19 @@ struct option_t MuttVars[] = {
** If this option is \fIset\fP, mutt's received-attachments menu will not show the subparts of
** individual messages in a multipart/digest. To see these subparts, press ``v'' on that menu.
*/
+ { "delete_prefix", DT_BOOL, R_NONE, OPTDELETEPREFIX, 0 },
+ /*
+ ** .pp
+ ** If set, the prefix that some mailing lists add to the Subject: field
+ ** (for example "Subject: [foo-ML:0012] real-subject") is deleted
+ ** when messages are displayed in the index and when replies are edited.
+ ** The pattern to delete is configured with the $$delete_regexp variable.
+ */
+ { "delete_regexp", DT_RX, R_NONE, UL &DeleteRegexp, UL "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)" },
+ /*
+ ** .pp
+ ** A regular expression used by $$delete_prefix to match the prefix to delete.
+ */
{ "display_filter", DT_PATH, R_PAGER, UL &DisplayFilter, UL "" },
/*
** .pp
@@ -664,7 +713,7 @@ struct option_t MuttVars[] = {
** is viewed it is passed as standard input to $$display_filter, and the
** filtered message is read from the standard output.
*/
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
{ "dotlock_program", DT_PATH, R_NONE, UL &MuttDotlock, UL BINDIR "/mutt_dotlock" },
/*
** .pp
@@ -2695,6 +2744,28 @@ struct option_t MuttVars[] = {
** that mutt \fIgenerates\fP this kind of encoding. Instead, mutt will
** unconditionally use the encoding specified in RFC2231.
*/
+ { "sanitize_ja_chars", DT_BOOL, R_NONE, OPTSANITIZEJACHARS, 0 },
+ /*
+ ** .pp
+ ** When set, Japanese "platform dependent characters" (characters
+ ** that are illegal in the iso-2022-jp charset, mainly produced by
+ ** MS-Windows mailers) are replaced with a special character, the GETA mark
+ ** ('ESC $$ B " . ESC ( B' in iso-2022-jp), and JIS X 0201 kana characters
+ ** (only in the "ESC ( I" case) are replaced with "?" to
+ ** prevent garbled output. JIS X 0201 kana characters are
+ ** not replaced if they appear in 8-bit form.
+ ** .pp
+ ** This also fixes another Japanese encoding issue. If $$charset
+ ** is set to "EUC-JP", which does not contain the JIS X 0201 roman
+ ** character set, the JIS X 0201 roman parts of received messages
+ ** encoded in iso-2022-jp cannot be converted to EUC-JP.
+ ** Similarly, the ASCII parts cannot be converted to
+ ** Shift_JIS, which does not contain the ASCII character set, so
+ ** the converted text is garbled in both cases. When this
+ ** option is set, the JIS X 0201 roman escape sequence and the
+ ** ASCII escape sequence are rewritten appropriately so that
+ ** the output is not garbled.
+ */
{ "save_address", DT_BOOL, R_NONE, OPTSAVEADDRESS, 0 },
/*
** .pp
diff --git a/lib.c b/lib.c
index 224232b..abf8d3f 100644
--- a/lib.c
+++ b/lib.c
@@ -445,6 +445,10 @@ int safe_symlink(const char *oldpath, const char *newpath)
int safe_rename (const char *src, const char *target)
{
+#ifdef NO_USE_HARDLINK
+ /* Android (since 6.0) does not support hardlinks. */
+ return rename(src, target);
+#else
struct stat ssb, tsb;
if (!src || !target)
@@ -537,6 +541,7 @@ int safe_rename (const char *src, const char *target)
return 0;
+#endif /* NO_USE_HARDLINK */
}
diff --git a/main.c b/main.c
index edb36e0..d791aea 100644
--- a/main.c
+++ b/main.c
@@ -260,25 +260,25 @@ static void show_version (void)
"-USE_SETGID "
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
"+USE_DOTLOCK "
#else
"-USE_DOTLOCK "
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
"+DL_STANDALONE "
#else
"-DL_STANDALONE "
#endif
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
"+USE_FCNTL "
#else
"-USE_FCNTL "
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
"+USE_FLOCK "
#else
"-USE_FLOCK "
@@ -439,6 +439,12 @@ static void show_version (void)
"-LOCALES_HACK "
#endif
+#ifdef USE_CJK_WIDTH
+ "+USE_CJK_WIDTH "
+#else
+ "-USE_CJK_WIDTH "
+#endif
+
#ifdef HAVE_WC_FUNCS
"+HAVE_WC_FUNCS "
#else
diff --git a/mbyte.c b/mbyte.c
index 0eedaa7..8032bd3 100644
--- a/mbyte.c
+++ b/mbyte.c
@@ -17,7 +17,7 @@
*/
/*
- * Japanese support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
+ * CJK support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
*/
#if HAVE_CONFIG_H
@@ -37,8 +37,8 @@
#endif
int Charset_is_utf8 = 0;
+static int charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
-static int charset_is_ja = 0;
static iconv_t charset_to_utf8 = (iconv_t)(-1);
static iconv_t charset_from_utf8 = (iconv_t)(-1);
#endif
@@ -50,8 +50,8 @@ void mutt_set_charset (char *charset)
mutt_canonical_charset (buffer, sizeof (buffer), charset);
Charset_is_utf8 = 0;
+ charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
- charset_is_ja = 0;
if (charset_to_utf8 != (iconv_t)(-1))
{
iconv_close (charset_to_utf8);
@@ -66,12 +66,18 @@ void mutt_set_charset (char *charset)
if (mutt_is_utf8 (buffer))
Charset_is_utf8 = 1;
-#ifndef HAVE_WC_FUNCS
- else if (!ascii_strcasecmp(buffer, "euc-jp") || !ascii_strcasecmp(buffer, "shift_jis")
- || !ascii_strcasecmp(buffer, "cp932") || !ascii_strcasecmp(buffer, "eucJP-ms"))
+ else if (!ascii_strcasecmp (buffer, "gb2312") ||
+ !ascii_strcasecmp (buffer, "gb18030") ||
+ !ascii_strcasecmp (buffer, "big5") ||
+ !ascii_strcasecmp (buffer, "euc-tw") ||
+ !ascii_strcasecmp (buffer, "EUC-JP") ||
+ !ascii_strcasecmp (buffer, "eucJP-ms") ||
+ !ascii_strcasecmp (buffer, "Shift_JIS") ||
+ !ascii_strcasecmp (buffer, "cp932") ||
+ !ascii_strcasecmp (buffer, "euc-kr"))
{
- charset_is_ja = 1;
-
+ charset_is_cjk = 1;
+#ifndef HAVE_WC_FUNCS
/* Note flags=0 to skip charset-hooks: User masters the $charset
* name, and we are sure of our "utf-8" constant. So there is no
* possibility of wrong name that we would want to try to correct
@@ -80,24 +86,68 @@ void mutt_set_charset (char *charset)
*/
charset_to_utf8 = mutt_iconv_open ("utf-8", charset, 0);
charset_from_utf8 = mutt_iconv_open (charset, "utf-8", 0);
- }
#endif
+ }
#if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(ENABLE_NLS)
bind_textdomain_codeset(PACKAGE, buffer);
#endif
}
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+/*
+ * For systems whose wcwidth() is missing or does not work correctly,
+ * we provide our own wcwidth().
+ * In addition, this wcwidth() allows the cell width of characters in
+ * the East Asian Ambiguous class to be changed with $cjk_width, which
+ * the wcwidth() found on most systems cannot do.
+ * See the comment about the East Asian Ambiguous class in wcwidth.c
+ * for details.
+ */
+int wcwidth_ucs(wchar_t ucs);
+int wcwidth_cjk(wchar_t ucs);
+
+int wcwidth (wchar_t wc)
+{
+ if (!Charset_is_utf8)
+ {
+ if (!charset_is_cjk)
+ {
+ /* 8-bit case */
+ if (!wc)
+ return 0;
+ else if ((0 <= wc && wc < 256) && IsPrint (wc))
+ return 1;
+ else
+ return -1;
+ }
+ else
+ {
+ /* CJK */
+ return wcwidth_cjk (wc);
+ }
+ }
+ else {
+#ifdef USE_CJK_WIDTH
+ if (option (OPTCJKWIDTH))
+ return wcwidth_cjk (wc);
+#endif /* USE_CJK_WIDTH */
+ return wcwidth_ucs (wc);
+ }
+}
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
+
+
#ifndef HAVE_WC_FUNCS
/*
* For systems that don't have them, we provide here our own
- * implementations of wcrtomb(), mbrtowc(), iswprint() and wcwidth().
+ * implementations of wcrtomb(), mbrtowc() and iswprint().
* Instead of using the locale, as these functions normally would,
* we use Mutt's Charset variable. We support 3 types of charset:
* (1) For 8-bit charsets, wchar_t uses the same encoding as char.
* (2) For UTF-8, wchar_t uses UCS.
- * (3) For stateless Japanese encodings, we use UCS and convert
+ * (3) For stateless CJK encodings, we use UCS and convert
* via UTF-8 using iconv.
* Unfortunately, we can't handle non-stateless encodings.
*/
@@ -256,7 +306,7 @@ size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
int iswprint (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return ((0x20 <= wc && wc < 0x7f) || 0xa0 <= wc);
else
return (0 <= wc && wc < 256) ? IsPrint (wc) : 0;
@@ -264,7 +314,7 @@ int iswprint (wint_t wc)
int iswspace (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return (9 <= wc && wc <= 13) || wc == 32;
else
return (0 <= wc && wc < 256) ? isspace (wc) : 0;
@@ -347,7 +397,7 @@ static int iswalpha_ucs (wint_t wc)
wint_t towupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? toupper (wc) : wc;
@@ -355,7 +405,7 @@ wint_t towupper (wint_t wc)
wint_t towlower (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towlower_ucs (wc);
else
return (0 <= wc && wc < 256) ? tolower (wc) : wc;
@@ -363,7 +413,7 @@ wint_t towlower (wint_t wc)
int iswalnum (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalnum_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalnum (wc) : 0;
@@ -371,7 +421,7 @@ int iswalnum (wint_t wc)
int iswalpha (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalpha_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalpha (wc) : 0;
@@ -379,58 +429,12 @@ int iswalpha (wint_t wc)
int iswupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? isupper (wc) : 0;
}
-/*
- * l10n for Japanese:
- * Symbols, Greek and Cyrillic in JIS X 0208, Japanese Kanji
- * Character Set, have a column width of 2.
- */
-int wcwidth_ja (wchar_t ucs)
-{
- if (ucs >= 0x3021)
- return -1; /* continue with the normal check */
- /* a rough range for quick check */
- if ((ucs >= 0x00a1 && ucs <= 0x00fe) || /* Latin-1 Supplement */
- (ucs >= 0x0391 && ucs <= 0x0451) || /* Greek and Cyrillic */
- (ucs >= 0x2010 && ucs <= 0x266f) || /* Symbols */
- (ucs >= 0x3000 && ucs <= 0x3020)) /* CJK Symbols and Punctuation */
- return 2;
- else
- return -1;
-}
-
-int wcwidth_ucs(wchar_t ucs);
-
-int wcwidth (wchar_t wc)
-{
- if (!Charset_is_utf8)
- {
- if (!charset_is_ja)
- {
- /* 8-bit case */
- if (!wc)
- return 0;
- else if ((0 <= wc && wc < 256) && IsPrint (wc))
- return 1;
- else
- return -1;
- }
- else
- {
- /* Japanese */
- int k = wcwidth_ja (wc);
- if (k != -1)
- return k;
- }
- }
- return wcwidth_ucs (wc);
-}
-
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps)
{
static wchar_t mbstate;
diff --git a/mbyte.h b/mbyte.h
index 9c58c9e..224cafb 100644
--- a/mbyte.h
+++ b/mbyte.h
@@ -8,6 +8,12 @@
# ifdef HAVE_WCTYPE_H
# include <wctype.h>
# endif
+# ifdef USE_CJK_WIDTH
+#ifdef wcwidth
+# undef wcwidth
+#endif
+int wcwidth (wchar_t wc);
+# endif /* USE_CJK_WIDTH */
# endif
# ifndef HAVE_WC_FUNCS
@@ -32,6 +38,9 @@
#ifdef iswupper
# undef iswupper
#endif
+#ifdef wcwidth
+# undef wcwidth
+#endif
size_t wcrtomb (char *s, wchar_t wc, mbstate_t *ps);
size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int iswprint (wint_t wc);
@@ -44,7 +53,6 @@ wint_t towlower (wint_t wc);
int wcwidth (wchar_t wc);
# endif /* !HAVE_WC_FUNCS */
-
void mutt_set_charset (char *charset);
extern int Charset_is_utf8;
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps);
diff --git a/mutt.h b/mutt.h
index 54a807f..a623f6c 100644
--- a/mutt.h
+++ b/mutt.h
@@ -344,10 +344,16 @@ enum
OPTBRAILLEFRIENDLY,
OPTCHECKMBOXSIZE,
OPTCHECKNEW,
+#ifdef USE_CJK_WIDTH
+ OPTCJKWIDTH,
+ OPTCJKWIDTHTREECHARS,
+#endif /* USE_CJK_WIDTH */
OPTCOLLAPSEUNREAD,
OPTCONFIRMAPPEND,
OPTCONFIRMCREATE,
+ OPTCREATERFC2047PARAMS,
OPTDELETEUNTAG,
+ OPTDELETEPREFIX,
OPTDIGESTCOLLAPSE,
OPTDUPTHREADS,
OPTEDITHDRS,
@@ -447,6 +453,7 @@ enum
OPTREVNAME,
OPTREVREAL,
OPTRFC2047PARAMS,
+ OPTSANITIZEJACHARS,
OPTSAVEADDRESS,
OPTSAVEEMPTY,
OPTSAVENAME,
diff --git a/mutt_dotlock.c b/mutt_dotlock.c
index 5bf0348..5d87850 100644
--- a/mutt_dotlock.c
+++ b/mutt_dotlock.c
@@ -52,13 +52,13 @@
#include <getopt.h>
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# include "reldate.h"
#endif
#define MAXLINKS 1024 /* maximum link depth */
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define LONG_STRING 1024
# define MAXLOCKATTEMPT 5
@@ -96,7 +96,7 @@ extern int snprintf (char *, size_t, const char *, ...);
static int DotlockFlags;
static int Retry = MAXLOCKATTEMPT;
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static char *Hostname;
#endif
@@ -110,7 +110,7 @@ static int dotlock_prepare (char *, size_t, const char *, int fd);
static int dotlock_check_stats (struct stat *, struct stat *);
static int dotlock_dispatch (const char *, int fd);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int dotlock_init_privs (void);
static void usage (const char *);
#endif
@@ -130,7 +130,7 @@ static int dotlock_unlink (const char *);
static int dotlock_lock (const char *);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
#define check_flags(a) if (a & DL_FL_ACTIONS) usage (argv[0])
@@ -327,7 +327,7 @@ END_PRIVILEGED (void)
#endif
}
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
/*
* Usage information.
diff --git a/mutt_regex.h b/mutt_regex.h
index 1cc4a3e..d12b346 100644
--- a/mutt_regex.h
+++ b/mutt_regex.h
@@ -51,5 +51,6 @@ WHERE REGEXP QuoteRegexp;
WHERE REGEXP ReplyRegexp;
WHERE REGEXP Smileys;
WHERE REGEXP GecosMask;
+WHERE REGEXP DeleteRegexp;
#endif /* MUTT_REGEX_H */
diff --git a/mx.c b/mx.c
index 5d6782a..6f55e36 100644
--- a/mx.c
+++ b/mx.c
@@ -47,7 +47,7 @@
#include "buffy.h"
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
#include "dotlock.h"
#endif
@@ -95,13 +95,13 @@ struct mx_ops* mx_get_ops (int magic)
#define mutt_is_spool(s) (mutt_strcmp (Spoolfile, s) == 0)
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
/* parameters:
* path - file to lock
* retry - should retry if unable to lock?
*/
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int invoke_dotlock (const char *path, int dummy, int flags, int retry)
{
@@ -181,14 +181,14 @@ static int undotlock_file (const char *path, int fd)
*/
int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
-#if defined (USE_FCNTL) || defined (USE_FLOCK)
+#if defined (USE_FCNTL) || defined (USE_FLOCK) || defined(NO_USE_HARDLINK)
int count;
int attempt;
struct stat sb = { 0 }, prev_sb = { 0 }; /* silence gcc warnings */
#endif
int r = 0;
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock lck;
memset (&lck, 0, sizeof (struct flock));
@@ -227,7 +227,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
count = 0;
attempt = 0;
while (flock (fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
@@ -261,7 +261,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FLOCK */
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (r == 0 && dot)
r = dotlock_file (path, fd, timeout);
#endif /* USE_DOTLOCK */
@@ -270,12 +270,12 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
/* release any other locks obtained in this routine */
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
lck.l_type = F_UNLCK;
fcntl (fd, F_SETLK, &lck);
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif /* USE_FLOCK */
}
@@ -285,7 +285,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
int mx_unlock_file (const char *path, int fd, int dot)
{
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock unlockit = { F_UNLCK, 0, 0, 0, 0 };
memset (&unlockit, 0, sizeof (struct flock));
@@ -294,11 +294,11 @@ int mx_unlock_file (const char *path, int fd, int dot)
fcntl (fd, F_SETLK, &unlockit);
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (dot)
undotlock_file (path, fd);
#endif
@@ -309,7 +309,7 @@ int mx_unlock_file (const char *path, int fd, int dot)
static void mx_unlink_empty (const char *path)
{
int fd;
-#ifndef USE_DOTLOCK
+#if !defined(USE_DOTLOCK) || defined(NO_USE_HARDLINK)
struct stat sb;
#endif
@@ -322,7 +322,7 @@ static void mx_unlink_empty (const char *path)
return;
}
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
invoke_dotlock (path, fd, DL_FL_UNLINK, 1);
#else
if (fstat (fd, &sb) == 0 && sb.st_size == 0)
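When built with `-DNO_USE_HARDLINK`, the `mx.c` hunks above skip the dot-lock path, which depends on link(2). They compile both the fcntl(2) and the flock(2) branches of `mx_lock_file()` and `mx_unlock_file()` instead. The following is a minimal sketch of that combination, not the patch's code. The names `lock_without_hardlink()` and `unlock_without_hardlink()` are hypothetical, and the retry and timeout loop of `mx_lock_file()` is omitted.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: take an fcntl() record lock and then a flock() lock
 * on an open mailbox descriptor, without creating any dot-lock (hard link).
 * The descriptor must be open for writing when excl is non-zero. */
static int lock_without_hardlink(int fd, int excl)
{
  struct flock lck;

  memset(&lck, 0, sizeof(lck));
  lck.l_type = excl ? F_WRLCK : F_RDLCK;
  lck.l_whence = SEEK_SET;             /* l_start = 0, l_len = 0: whole file */

  if (fcntl(fd, F_SETLK, &lck) == -1)
    return -1;

  if (flock(fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
  {
    /* release the fcntl lock obtained above, as mx_lock_file() does */
    lck.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lck);
    return -1;
  }
  return 0;
}

/* Hypothetical counterpart of mx_unlock_file() without the dot-lock step. */
static void unlock_without_hardlink(int fd)
{
  struct flock lck;

  memset(&lck, 0, sizeof(lck));
  lck.l_type = F_UNLCK;
  lck.l_whence = SEEK_SET;
  fcntl(fd, F_SETLK, &lck);
  flock(fd, LOCK_UN);
}
```

On Linux, flock(2) and fcntl(2) locks are normally independent (see flock(2) for the NFS exception), so a single process can hold both on the same descriptor.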
diff --git a/parse.c b/parse.c
index 0ae5594..745d2fc 100644
--- a/parse.c
+++ b/parse.c
@@ -1453,6 +1453,18 @@ ENVELOPE *mutt_read_rfc822_header (FILE *f, HEADER *hdr, short user_hdrs,
e->real_subj = e->subject + pmatch[0].rm_eo;
else
e->real_subj = e->subject;
+ if (option (OPTDELETEPREFIX))
+ {
+ /* If this option is set, mutt deletes strings such as [prefix],
+ * [prefix:number] and [prefix number] from the Subject line.
+ */
+ if (regexec (DeleteRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ {
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ if (regexec (ReplyRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ }
+ }
}
if (hdr->received < 0)
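The `parse.c` hunk applies its patterns in a fixed order. Stock mutt first strips a reply prefix with `ReplyRegexp` to obtain `e->real_subj`. The patch then strips a list tag with `DeleteRegexp` and applies `ReplyRegexp` once more. As a result, `e->real_subj` for a subject like `[ml:123] Re: hello` becomes `hello`.

Below is a minimal POSIX regex sketch of that order. Both patterns and the name `strip_list_prefix()` are assumptions, since the default value of `DeleteRegexp` is not part of this excerpt. mutt compiles its regexps once when reading the configuration; this sketch recompiles them on every call for brevity.

```c
#include <regex.h>

/* Hypothetical sketch of the delete-prefix flow in parse.c.  Both patterns
 * are assumptions: a reply prefix such as "Re: " or "Re[2]: ", and a list
 * tag such as "[ml]", "[ml:123]" or "[ml 123]". */
static const char *strip_list_prefix(const char *subj)
{
  regex_t reply_rx, del_rx;
  regmatch_t m[1];
  const char *p = subj;

  if (regcomp(&reply_rx, "^(re|aw)(\\[[0-9]+\\])*:[ \t]*",
              REG_EXTENDED | REG_ICASE) != 0)
    return subj;
  if (regcomp(&del_rx, "^\\[[^]]+\\][ \t]*", REG_EXTENDED) != 0)
  {
    regfree(&reply_rx);
    return subj;
  }

  /* stock mutt: real subject = subject minus a leading reply prefix */
  if (regexec(&reply_rx, p, 1, m, 0) == 0)
    p += m[0].rm_eo;

  /* patch: drop the list tag, then any reply prefix that followed it */
  if (regexec(&del_rx, p, 1, m, 0) == 0)
  {
    p += m[0].rm_eo;
    if (regexec(&reply_rx, p, 1, m, 0) == 0)
      p += m[0].rm_eo;
  }

  regfree(&reply_rx);
  regfree(&del_rx);
  return p;
}
```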
diff --git a/rfc2047.c b/rfc2047.c
index 8506425..e907b25 100644
--- a/rfc2047.c
+++ b/rfc2047.c
@@ -62,6 +62,9 @@ static size_t convert_string (ICONV_CONST char *f, size_t flen,
size_t obl, n;
int e;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp",
+11))
+ mutt_sanitize_ja_chars ((char *) f, flen, 0);
cd = mutt_iconv_open (to, from, 0);
if (cd == (iconv_t)(-1))
return (size_t)(-1);
diff --git a/sendlib.c b/sendlib.c
index d09b8ce..cf0df92 100644
--- a/sendlib.c
+++ b/sendlib.c
@@ -348,6 +348,30 @@ int mutt_write_mime_header (BODY *a, FILE *f)
}
}
+ if (a->use_disp && option (OPTCREATERFC2047PARAMS))
+ {
+ if(!(fn = a->d_filename))
+ fn = a->filename;
+
+ if (fn)
+ {
+ char *tmp;
+
+ /* Strip off the leading path... */
+ if ((t = strrchr (fn, '/')))
+ t++;
+ else
+ t = fn;
+
+ buffer[0] = 0;
+ tmp = safe_strdup (t);
+ rfc2047_encode_string (&tmp);
+ rfc822_cat (buffer, sizeof (buffer), tmp, MimeSpecials);
+ FREE (&tmp);
+ fprintf (f, ";\n\tname=%s", buffer);
+ }
+ }
+
fputc ('\n', f);
if (a->description)
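When `OPTCREATERFC2047PARAMS` is set, the `sendlib.c` hunk writes an extra `name=` parameter. Its value is the attachment's file name without its directory, encoded by `rfc2047_encode_string()` and then quoted by `rfc822_cat()`, because `=` and `?` are MIME specials. RFC 2047 (section 5) does not allow encoded-words in MIME parameters, so this is a compatibility aid for mail clients that still expect it, not standard behaviour.

The sketch below simplifies the encoding step under these assumptions:
* It always uses UTF-8 with B (base64) encoding.
* It also encodes pure-ASCII names, which a real encoder would normally leave alone.
* It does not split encoded-words longer than 75 characters.

`b64_encode()` and `make_name_param()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Minimal base64 (RFC 4648) encoder; out must hold 4 * ((len + 2) / 3) + 1 bytes. */
static void b64_encode(const unsigned char *in, size_t len, char *out)
{
  static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t i;

  for (i = 0; i + 2 < len; i += 3)
  {
    *out++ = tbl[in[i] >> 2];
    *out++ = tbl[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
    *out++ = tbl[((in[i + 1] & 0x0f) << 2) | (in[i + 2] >> 6)];
    *out++ = tbl[in[i + 2] & 0x3f];
  }
  if (i < len)                         /* one or two bytes left: pad with '=' */
  {
    *out++ = tbl[in[i] >> 2];
    if (i + 1 < len)
    {
      *out++ = tbl[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
      *out++ = tbl[(in[i + 1] & 0x0f) << 2];
    }
    else
    {
      *out++ = tbl[(in[i] & 0x03) << 4];
      *out++ = '=';
    }
    *out++ = '=';
  }
  *out = '\0';
}

/* Hypothetical sketch: format ";\n\tname=\"=?UTF-8?B?...?=\"" for a path.
 * Returns 0 on success, -1 if the name or the output does not fit. */
static int make_name_param(const char *path, char *buf, size_t buflen)
{
  const char *base = strrchr(path, '/');
  char enc[256];
  size_t n;
  int w;

  base = base ? base + 1 : path;       /* strip the leading path */
  n = strlen(base);
  if (4 * ((n + 2) / 3) + 1 > sizeof(enc))
    return -1;                         /* too long for this sketch */
  b64_encode((const unsigned char *) base, n, enc);
  w = snprintf(buf, buflen, ";\n\tname=\"=?UTF-8?B?%s?=\"", enc);
  return (w < 0 || (size_t) w >= buflen) ? -1 : 0;
}
```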
diff --git a/wcwidth.c b/wcwidth.c
index 0b94d73..85a1397 100644
--- a/wcwidth.c
+++ b/wcwidth.c
@@ -5,6 +5,51 @@
* http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
* http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
*
+ * In fixed-width output devices, Latin characters all occupy a single
+ * "cell" position of equal width, whereas ideographic CJK characters
+ * occupy two such cells. Interoperability between terminal-line
+ * applications and (teletype-style) character terminals using the
+ * UTF-8 encoding requires agreement on which character should advance
+ * the cursor by how many cell positions. No established formal
+ * standards exist at present on which Unicode character shall occupy
+ * how many cell positions on character terminals. These routines are
+ * a first attempt of defining such behavior based on simple rules
+ * applied to data provided by the Unicode Consortium.
+ *
+ * For some graphical characters, the Unicode standard explicitly
+ * defines a character-cell width via the definition of the East Asian
+ * FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
+ * In all these cases, there is no ambiguity about which width a
+ * terminal shall use. For characters in the East Asian Ambiguous (A)
+ * class, the width choice depends purely on a preference of backward
+ * compatibility with either historic CJK or Western practice.
+ * Choosing single-width for these characters is easy to justify as
+ * the appropriate long-term solution, as the CJK practice of
+ * displaying these characters as double-width comes from historic
+ * implementation simplicity (8-bit encoded characters were displayed
+ * single-width and 16-bit ones double-width, even for Greek,
+ * Cyrillic, etc.) and not any typographic considerations.
+ *
+ * Much less clear is the choice of width for the Not East Asian
+ * (Neutral) class. Existing practice does not dictate a width for any
+ * of these characters. It would nevertheless make sense
+ * typographically to allocate two character cells to characters such
+ * as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
+ * represented adequately with a single-width glyph. The following
+ * routines at present merely assign a single-cell width to all
+ * neutral characters, in the interest of simplicity. This is not
+ * entirely satisfactory and should be reconsidered before
+ * establishing a formal standard in this area. At the moment, the
+ * decision which Not East Asian (Neutral) characters should be
+ * represented by double-width glyphs cannot yet be answered by
+ * applying a simple rule from the Unicode database content. Setting
+ * up a proper standard for the behavior of UTF-8 character terminals
+ * will require a careful analysis not only of each Unicode character,
+ * but also of each presentation form, something the author of these
+ * routines has avoided to do so far.
+ *
+ * http://www.unicode.org/unicode/reports/tr11/
+ *
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
@@ -24,12 +69,34 @@
# include "config.h"
#endif
-#ifndef HAVE_WC_FUNCS
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+#include <wchar.h>
-#include "mutt.h"
-#include "mbyte.h"
+struct interval {
+ wchar_t first;
+ wchar_t last;
+};
+
+/* auxiliary function for binary search in interval table */
+static int bisearch(wchar_t ucs, const struct interval *table, int max) {
+ int min = 0;
+ int mid;
+
+ if (ucs < table[0].first || ucs > table[max].last)
+ return 0;
+ while (max >= min) {
+ mid = (min + max) / 2;
+ if (ucs > table[mid].last)
+ min = mid + 1;
+ else if (ucs < table[mid].first)
+ max = mid - 1;
+ else
+ return 1;
+ }
+
+ return 0;
+}
-#include <ctype.h>
/* The following two functions define the column width of an ISO 10646
* character as follows:
@@ -67,62 +134,56 @@ int wcwidth_ucs(wchar_t ucs)
{
/* sorted list of non-overlapping intervals of non-spacing characters */
/* generated by "uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B c" */
- static const struct interval {
- wchar_t first;
- wchar_t last;
- } combining[] = {
- { 0x0300, 0x036f }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
- { 0x0591, 0x05bd }, { 0x05bf, 0x05bf }, { 0x05c1, 0x05c2 },
- { 0x05c4, 0x05c5 }, { 0x05c7, 0x05c7 }, { 0x0600, 0x0603 },
- { 0x0610, 0x0615 }, { 0x064b, 0x065e }, { 0x0670, 0x0670 },
- { 0x06d6, 0x06e4 }, { 0x06e7, 0x06e8 }, { 0x06ea, 0x06ed },
- { 0x070f, 0x070f }, { 0x0711, 0x0711 }, { 0x0730, 0x074a },
- { 0x07a6, 0x07b0 }, { 0x07eb, 0x07f3 }, { 0x0901, 0x0902 },
- { 0x093c, 0x093c }, { 0x0941, 0x0948 }, { 0x094d, 0x094d },
+ static const struct interval combining[] = {
+ { 0x0300, 0x036F }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
+ { 0x0591, 0x05BD }, { 0x05BF, 0x05BF }, { 0x05C1, 0x05C2 },
+ { 0x05C4, 0x05C5 }, { 0x05C7, 0x05C7 }, { 0x0600, 0x0603 },
+ { 0x0610, 0x0615 }, { 0x064B, 0x065E }, { 0x0670, 0x0670 },
+ { 0x06D6, 0x06E4 }, { 0x06E7, 0x06E8 }, { 0x06EA, 0x06ED },
+ { 0x070F, 0x070F }, { 0x0711, 0x0711 }, { 0x0730, 0x074A },
+ { 0x07A6, 0x07B0 }, { 0x07EB, 0x07F3 }, { 0x0901, 0x0902 },
+ { 0x093C, 0x093C }, { 0x0941, 0x0948 }, { 0x094D, 0x094D },
{ 0x0951, 0x0954 }, { 0x0962, 0x0963 }, { 0x0981, 0x0981 },
- { 0x09bc, 0x09bc }, { 0x09c1, 0x09c4 }, { 0x09cd, 0x09cd },
- { 0x09e2, 0x09e3 }, { 0x0a01, 0x0a02 }, { 0x0a3c, 0x0a3c },
- { 0x0a41, 0x0a42 }, { 0x0a47, 0x0a48 }, { 0x0a4b, 0x0a4d },
- { 0x0a70, 0x0a71 }, { 0x0a81, 0x0a82 }, { 0x0abc, 0x0abc },
- { 0x0ac1, 0x0ac5 }, { 0x0ac7, 0x0ac8 }, { 0x0acd, 0x0acd },
- { 0x0ae2, 0x0ae3 }, { 0x0b01, 0x0b01 }, { 0x0b3c, 0x0b3c },
- { 0x0b3f, 0x0b3f }, { 0x0b41, 0x0b43 }, { 0x0b4d, 0x0b4d },
- { 0x0b56, 0x0b56 }, { 0x0b82, 0x0b82 }, { 0x0bc0, 0x0bc0 },
- { 0x0bcd, 0x0bcd }, { 0x0c3e, 0x0c40 }, { 0x0c46, 0x0c48 },
- { 0x0c4a, 0x0c4d }, { 0x0c55, 0x0c56 }, { 0x0cbc, 0x0cbc },
- { 0x0cbf, 0x0cbf }, { 0x0cc6, 0x0cc6 }, { 0x0ccc, 0x0ccd },
- { 0x0ce2, 0x0ce3 }, { 0x0d41, 0x0d43 }, { 0x0d4d, 0x0d4d },
- { 0x0dca, 0x0dca }, { 0x0dd2, 0x0dd4 }, { 0x0dd6, 0x0dd6 },
- { 0x0e31, 0x0e31 }, { 0x0e34, 0x0e3a }, { 0x0e47, 0x0e4e },
- { 0x0eb1, 0x0eb1 }, { 0x0eb4, 0x0eb9 }, { 0x0ebb, 0x0ebc },
- { 0x0ec8, 0x0ecd }, { 0x0f18, 0x0f19 }, { 0x0f35, 0x0f35 },
- { 0x0f37, 0x0f37 }, { 0x0f39, 0x0f39 }, { 0x0f71, 0x0f7e },
- { 0x0f80, 0x0f84 }, { 0x0f86, 0x0f87 }, { 0x0f90, 0x0f97 },
- { 0x0f99, 0x0fbc }, { 0x0fc6, 0x0fc6 }, { 0x102d, 0x1030 },
+ { 0x09BC, 0x09BC }, { 0x09C1, 0x09C4 }, { 0x09CD, 0x09CD },
+ { 0x09E2, 0x09E3 }, { 0x0A01, 0x0A02 }, { 0x0A3C, 0x0A3C },
+ { 0x0A41, 0x0A42 }, { 0x0A47, 0x0A48 }, { 0x0A4B, 0x0A4D },
+ { 0x0A70, 0x0A71 }, { 0x0A81, 0x0A82 }, { 0x0ABC, 0x0ABC },
+ { 0x0AC1, 0x0AC5 }, { 0x0AC7, 0x0AC8 }, { 0x0ACD, 0x0ACD },
+ { 0x0AE2, 0x0AE3 }, { 0x0B01, 0x0B01 }, { 0x0B3C, 0x0B3C },
+ { 0x0B3F, 0x0B3F }, { 0x0B41, 0x0B43 }, { 0x0B4D, 0x0B4D },
+ { 0x0B56, 0x0B56 }, { 0x0B82, 0x0B82 }, { 0x0BC0, 0x0BC0 },
+ { 0x0BCD, 0x0BCD }, { 0x0C3E, 0x0C40 }, { 0x0C46, 0x0C48 },
+ { 0x0C4A, 0x0C4D }, { 0x0C55, 0x0C56 }, { 0x0CBC, 0x0CBC },
+ { 0x0CBF, 0x0CBF }, { 0x0CC6, 0x0CC6 }, { 0x0CCC, 0x0CCD },
+ { 0x0CE2, 0x0CE3 }, { 0x0D41, 0x0D43 }, { 0x0D4D, 0x0D4D },
+ { 0x0DCA, 0x0DCA }, { 0x0DD2, 0x0DD4 }, { 0x0DD6, 0x0DD6 },
+ { 0x0E31, 0x0E31 }, { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E },
+ { 0x0EB1, 0x0EB1 }, { 0x0EB4, 0x0EB9 }, { 0x0EBB, 0x0EBC },
+ { 0x0EC8, 0x0ECD }, { 0x0F18, 0x0F19 }, { 0x0F35, 0x0F35 },
+ { 0x0F37, 0x0F37 }, { 0x0F39, 0x0F39 }, { 0x0F71, 0x0F7E },
+ { 0x0F80, 0x0F84 }, { 0x0F86, 0x0F87 }, { 0x0F90, 0x0F97 },
+ { 0x0F99, 0x0FBC }, { 0x0FC6, 0x0FC6 }, { 0x102D, 0x1030 },
{ 0x1032, 0x1032 }, { 0x1036, 0x1037 }, { 0x1039, 0x1039 },
- { 0x1058, 0x1059 }, { 0x1160, 0x11ff }, { 0x135f, 0x135f },
+ { 0x1058, 0x1059 }, { 0x1160, 0x11FF }, { 0x135F, 0x135F },
{ 0x1712, 0x1714 }, { 0x1732, 0x1734 }, { 0x1752, 0x1753 },
- { 0x1772, 0x1773 }, { 0x17b4, 0x17b5 }, { 0x17b7, 0x17bd },
- { 0x17c6, 0x17c6 }, { 0x17c9, 0x17d3 }, { 0x17dd, 0x17dd },
- { 0x180b, 0x180d }, { 0x18a9, 0x18a9 }, { 0x1920, 0x1922 },
- { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193b },
- { 0x1a17, 0x1a18 }, { 0x1b00, 0x1b03 }, { 0x1b34, 0x1b34 },
- { 0x1b36, 0x1b3a }, { 0x1b3c, 0x1b3c }, { 0x1b42, 0x1b42 },
- { 0x1b6b, 0x1b73 }, { 0x1dc0, 0x1dca }, { 0x1dfe, 0x1dff },
- { 0x200b, 0x200f }, { 0x202a, 0x202e }, { 0x2060, 0x2063 },
- { 0x206a, 0x206f }, { 0x20d0, 0x20ef }, { 0x302a, 0x302f },
- { 0x3099, 0x309a }, { 0xa806, 0xa806 }, { 0xa80b, 0xa80b },
- { 0xa825, 0xa826 }, { 0xfb1e, 0xfb1e }, { 0xfe00, 0xfe0f },
- { 0xfe20, 0xfe23 }, { 0xfeff, 0xfeff }, { 0xfff9, 0xfffb },
- { 0x10a01, 0x10a03 }, { 0x10a05, 0x10a06 }, { 0x10a0c, 0x10a0f },
- { 0x10a38, 0x10a3a }, { 0x10a3f, 0x10a3f }, { 0x1d167, 0x1d169 },
- { 0x1d173, 0x1d182 }, { 0x1d185, 0x1d18b }, { 0x1d1aa, 0x1d1ad },
- { 0x1d242, 0x1d244 }, { 0xe0001, 0xe0001 }, { 0xe0020, 0xe007f },
- { 0xe0100, 0xe01ef }
+ { 0x1772, 0x1773 }, { 0x17B4, 0x17B5 }, { 0x17B7, 0x17BD },
+ { 0x17C6, 0x17C6 }, { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD },
+ { 0x180B, 0x180D }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 },
+ { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193B },
+ { 0x1A17, 0x1A18 }, { 0x1B00, 0x1B03 }, { 0x1B34, 0x1B34 },
+ { 0x1B36, 0x1B3A }, { 0x1B3C, 0x1B3C }, { 0x1B42, 0x1B42 },
+ { 0x1B6B, 0x1B73 }, { 0x1DC0, 0x1DCA }, { 0x1DFE, 0x1DFF },
+ { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2063 },
+ { 0x206A, 0x206F }, { 0x20D0, 0x20EF }, { 0x302A, 0x302F },
+ { 0x3099, 0x309A }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B },
+ { 0xA825, 0xA826 }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F },
+ { 0xFE20, 0xFE23 }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB },
+ { 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }, { 0x10A0C, 0x10A0F },
+ { 0x10A38, 0x10A3A }, { 0x10A3F, 0x10A3F }, { 0x1D167, 0x1D169 },
+ { 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
+ { 0x1D242, 0x1D244 }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F },
+ { 0xE0100, 0xE01EF }
};
- int min = 0;
- int max = sizeof(combining) / sizeof(struct interval) - 1;
- int mid;
/* test for 8-bit control characters */
if (ucs == 0)
@@ -130,20 +191,10 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0))
return -1;
- /* first quick check for Latin-1 etc. characters */
- if (ucs < combining[0].first)
- return 1;
-
/* binary search in table of non-spacing characters */
- while (max >= min) {
- mid = (min + max) / 2;
- if (combining[mid].last < ucs)
- min = mid + 1;
- else if (combining[mid].first > ucs)
- max = mid - 1;
- else if (combining[mid].first <= ucs && combining[mid].last >= ucs)
- return 0;
- }
+ if (bisearch(ucs, combining,
+ sizeof(combining) / sizeof(struct interval) - 1))
+ return 0;
/* if we arrive here, ucs is not a combining or C0/C1 control character */
@@ -151,7 +202,7 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 0x1100)
return 1;
- return 1 +
+ return 1 +
(ucs >= 0x1100 &&
(ucs <= 0x115f || /* Hangul Jamo init. consonants */
ucs == 0x2329 || ucs == 0x232a ||
@@ -167,15 +218,120 @@ int wcwidth_ucs(wchar_t ucs)
(ucs >= 0x30000 && ucs <= 0x3fffd)));
}
-#endif /* !HAVE_WC_FUNCS */
+#if 0 /* original */
+int wcswidth_ucs(const wchar_t *pwcs, size_t n)
+{
+ int w, width = 0;
+
+ for (;*pwcs && n-- > 0; pwcs++)
+ if ((w = wcwidth_ucs(*pwcs)) < 0)
+ return -1;
+ else
+ width += w;
+
+ return width;
+}
+#endif
+
+/*
+ * The following functions are the same as wcwidth_ucs() and
+ * wcswidth_ucs(), except that spacing characters in the East Asian
+ * Ambiguous (A) category as defined in Unicode Technical Report #11
+ * have a column width of 2. This variant might be useful for users of
+ * CJK legacy encodings who want to migrate to UCS without changing
+ * the traditional terminal character-width behaviour. It is not
+ * otherwise recommended for general use.
+ */
+/*
+ * In addition to the explanation mentioned above,
+ * several characters in the East Asian Narrow (Na) and Not East Asian
+ * (Neutral) category as defined in Unicode Technical Report #11
+ * actually have a column width of 2 in CJK legacy encodings.
+ */
+int wcwidth_cjk(wchar_t ucs)
+{
+ /* sorted list of non-overlapping intervals of East Asian Ambiguous
+ * characters, generated by "uniset +WIDTH-A -cat=Me -cat=Mn -cat=Cf c" */
+ static const struct interval ambiguous[] = {
+ { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 },
+ { 0x00AA, 0x00AA }, { 0x00AE, 0x00AE }, { 0x00B0, 0x00B4 },
+ { 0x00B6, 0x00BA }, { 0x00BC, 0x00BF }, { 0x00C6, 0x00C6 },
+ { 0x00D0, 0x00D0 }, { 0x00D7, 0x00D8 }, { 0x00DE, 0x00E1 },
+ { 0x00E6, 0x00E6 }, { 0x00E8, 0x00EA }, { 0x00EC, 0x00ED },
+ { 0x00F0, 0x00F0 }, { 0x00F2, 0x00F3 }, { 0x00F7, 0x00FA },
+ { 0x00FC, 0x00FC }, { 0x00FE, 0x00FE }, { 0x0101, 0x0101 },
+ { 0x0111, 0x0111 }, { 0x0113, 0x0113 }, { 0x011B, 0x011B },
+ { 0x0126, 0x0127 }, { 0x012B, 0x012B }, { 0x0131, 0x0133 },
+ { 0x0138, 0x0138 }, { 0x013F, 0x0142 }, { 0x0144, 0x0144 },
+ { 0x0148, 0x014B }, { 0x014D, 0x014D }, { 0x0152, 0x0153 },
+ { 0x0166, 0x0167 }, { 0x016B, 0x016B }, { 0x01CE, 0x01CE },
+ { 0x01D0, 0x01D0 }, { 0x01D2, 0x01D2 }, { 0x01D4, 0x01D4 },
+ { 0x01D6, 0x01D6 }, { 0x01D8, 0x01D8 }, { 0x01DA, 0x01DA },
+ { 0x01DC, 0x01DC }, { 0x0251, 0x0251 }, { 0x0261, 0x0261 },
+ { 0x02C4, 0x02C4 }, { 0x02C7, 0x02C7 }, { 0x02C9, 0x02CB },
+ { 0x02CD, 0x02CD }, { 0x02D0, 0x02D0 }, { 0x02D8, 0x02DB },
+ { 0x02DD, 0x02DD }, { 0x02DF, 0x02DF }, { 0x0391, 0x03A1 },
+ { 0x03A3, 0x03A9 }, { 0x03B1, 0x03C1 }, { 0x03C3, 0x03C9 },
+ { 0x0401, 0x0401 }, { 0x0410, 0x044F }, { 0x0451, 0x0451 },
+ { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
+ { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
+ { 0x2030, 0x2030 }, { 0x2032, 0x2033 }, { 0x2035, 0x2035 },
+ { 0x203B, 0x203B }, { 0x203E, 0x203E }, { 0x2074, 0x2074 },
+ { 0x207F, 0x207F }, { 0x2081, 0x2084 }, { 0x20AC, 0x20AC },
+ { 0x2103, 0x2103 }, { 0x2105, 0x2105 }, { 0x2109, 0x2109 },
+ { 0x2113, 0x2113 }, { 0x2116, 0x2116 }, { 0x2121, 0x2122 },
+ { 0x2126, 0x2126 }, { 0x212B, 0x212B }, { 0x2153, 0x2154 },
+ { 0x215B, 0x215E }, { 0x2160, 0x216B }, { 0x2170, 0x2179 },
+ { 0x2190, 0x2199 }, { 0x21B8, 0x21B9 }, { 0x21D2, 0x21D2 },
+ { 0x21D4, 0x21D4 }, { 0x21E7, 0x21E7 }, { 0x2200, 0x2200 },
+ { 0x2202, 0x2203 }, { 0x2207, 0x2208 }, { 0x220B, 0x220B },
+ { 0x220F, 0x220F }, { 0x2211, 0x2211 }, { 0x2215, 0x2215 },
+ { 0x221A, 0x221A }, { 0x221D, 0x2220 }, { 0x2223, 0x2223 },
+ { 0x2225, 0x2225 }, { 0x2227, 0x222C }, { 0x222E, 0x222E },
+ { 0x2234, 0x2237 }, { 0x223C, 0x223D }, { 0x2248, 0x2248 },
+ { 0x224C, 0x224C }, { 0x2252, 0x2252 }, { 0x2260, 0x2261 },
+ { 0x2264, 0x2267 }, { 0x226A, 0x226B }, { 0x226E, 0x226F },
+ { 0x2282, 0x2283 }, { 0x2286, 0x2287 }, { 0x2295, 0x2295 },
+ { 0x2299, 0x2299 }, { 0x22A5, 0x22A5 }, { 0x22BF, 0x22BF },
+ { 0x2312, 0x2312 }, { 0x2460, 0x24E9 }, { 0x24EB, 0x254B },
+ { 0x2550, 0x2573 }, { 0x2580, 0x258F }, { 0x2592, 0x2595 },
+ { 0x25A0, 0x25A1 }, { 0x25A3, 0x25A9 }, { 0x25B2, 0x25B3 },
+ { 0x25B6, 0x25B7 }, { 0x25BC, 0x25BD }, { 0x25C0, 0x25C1 },
+ { 0x25C6, 0x25C8 }, { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 },
+ { 0x25E2, 0x25E5 }, { 0x25EF, 0x25EF }, { 0x2605, 0x2606 },
+ { 0x2609, 0x2609 }, { 0x260E, 0x260F }, { 0x2614, 0x2615 },
+ { 0x261C, 0x261C }, { 0x261E, 0x261E }, { 0x2640, 0x2640 },
+ { 0x2642, 0x2642 }, { 0x2660, 0x2661 }, { 0x2663, 0x2665 },
+ { 0x2667, 0x266A }, { 0x266C, 0x266D }, { 0x266F, 0x266F },
+ { 0x273D, 0x273D }, { 0x2776, 0x277F }, { 0xE000, 0xF8FF },
+ { 0xFFFD, 0xFFFD }, { 0xF0000, 0xFFFFD }, { 0x100000, 0x10FFFD }
+ };
+
+ /* For Japanese legacy encodings, the following characters are added. */
+ static const struct interval legacy_ja[] = {
+ { 0x00A2, 0x00A3 }, { 0x00A5, 0x00A6 }, { 0x00AC, 0x00AC },
+ { 0x00AF, 0x00AF }, { 0x2212, 0x2212 }
+ };
+
+ /* binary search in the tables of wide (ambiguous and legacy_ja) characters */
+ if (bisearch(ucs, ambiguous,
+ sizeof(ambiguous) / sizeof(struct interval) - 1))
+ return 2;
+ if (bisearch(ucs, legacy_ja,
+ sizeof(legacy_ja) / sizeof(struct interval) - 1))
+ return 2;
+
+ return wcwidth_ucs(ucs);
+}
+
#if 0 /* original */
-int wcswidth(const wchar_t *pwcs, size_t n)
+int wcswidth_cjk(const wchar_t *pwcs, size_t n)
{
int w, width = 0;
for (;*pwcs && n-- > 0; pwcs++)
- if ((w = wcwidth(*pwcs)) < 0)
+ if ((w = wcwidth_cjk(*pwcs)) < 0)
return -1;
else
width += w;
@@ -183,3 +339,4 @@ int wcswidth(const wchar_t *pwcs, size_t n)
return width;
}
#endif
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
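`wcwidth_cjk()` looks up the code point in the sorted `ambiguous` and `legacy_ja` interval tables with `bisearch()`. It returns 2 on a hit and otherwise defers to `wcwidth_ucs()`.

The self-contained sketch below performs the same lookup using a few ranges copied from, or contained in, the `ambiguous` table. These ranges cover the box-drawing block and the "◎" (U+25CE) and "★" (U+2605) symbols. `in_table()`, `width_cjk_sketch()` and the single-width fallback are simplifications for illustration, not the patch's code.

```c
#include <wchar.h>

struct interval { wchar_t first, last; };

/* Same idea as bisearch(): is ucs inside one of the sorted,
 * non-overlapping intervals table[0..max]? */
static int in_table(wchar_t ucs, const struct interval *table, int max)
{
  int min = 0;

  if (max < 0 || ucs < table[0].first || ucs > table[max].last)
    return 0;
  while (max >= min)
  {
    int mid = (min + max) / 2;
    if (ucs > table[mid].last)
      min = mid + 1;
    else if (ucs < table[mid].first)
      max = mid - 1;
    else
      return 1;
  }
  return 0;
}

/* A few ranges taken from the patch's "ambiguous" table, for illustration. */
static const struct interval ambiguous_excerpt[] = {
  { 0x00A7, 0x00A8 }, { 0x2500, 0x254B }, { 0x25CB, 0x25CB },
  { 0x25CE, 0x25D1 }, { 0x2605, 0x2606 }
};

#define NELEMS(a) (int) (sizeof(a) / sizeof((a)[0]))

/* Ambiguous characters take two cells; everything else falls back to one
 * cell here (the real code falls back to wcwidth_ucs() instead). */
static int width_cjk_sketch(wchar_t ucs)
{
  if (in_table(ucs, ambiguous_excerpt, NELEMS(ambiguous_excerpt) - 1))
    return 2;
  return 1;
}
```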
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e3e6a1d
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,90 @@
+# This .gitignore was converted from .hgignore.
+
+# autoconf products
+aclocal.m4
+autom4te.cache/
+compile
+Makefile.in
+contrib/Makefile.in
+doc/Makefile.in
+imap/Makefile.in
+m4/Makefile.in
+po/Makefile.in
+config.h
+config.h.in
+config.h.in~
+config.log
+config.status
+#configure
+hcachever.sh
+muttbug.sh
+stamp-h1
+doc/Muttrc
+doc/instdoc.sh
+po/POTFILES
+config.guess
+config.sub
+depcomp
+install-sh
+missing
+mkinstalldirs
+
+# built objects
+flea
+hcversion.h
+keymap_alldefs.h
+keymap_defs.h
+doc/makedoc
+mutt
+mutt_dotlock
+mutt_dotlock.c
+mutt_md5
+patchlist.c
+conststrings.c
+pgpewrap
+pgpring
+reldate.h
+smime_keys
+txt2c
+stamp-doc-rc
+doc/instdoc
+doc/manual.txt
+doc/manual.xml
+doc/manual.aux
+doc/manual.log
+doc/manual.out
+doc/manual.tex
+doc/manual.pdf
+doc/mutt.1
+doc/muttrc.man
+doc/pgpewrap.1
+doc/pgpring.1
+doc/*.html
+doc/stamp-*
+doc/smime_keys.1
+po/mutt.pot
+
+# xcode droppings
+build/
+.xcodeproj/
+
+# eclipse
+.cproject
+.object
+.settings/
+
+.deps
+Makefile
+GPATH
+GRTAGS
+GTAGS
+TAGS
+cscope.*
+*.swp
+*.o
+*.gmo
+*.orig
+*.rej
+*.a
+.gdb_history
+*~
diff --git a/PATCHES b/PATCHES
index e69de29..17743fd 100644
--- a/PATCHES
+++ b/PATCHES
@@ -0,0 +1,5 @@
+patch-1.5.23.tt+yy.delete_prefix.1
+patch-1.5.23.tt.create_rfc2047_params.1
+patch-1.5.23.tt.sanitize_ja.1
+patch-1.5.23.tt.cjk_width_tree_chars.1
+patch-1.5.23.tt.wcwidth.1
diff --git a/charset.c b/charset.c
index 2411f2c..6a5cbd4 100644
--- a/charset.c
+++ b/charset.c
@@ -481,6 +481,9 @@ int mutt_convert_string (char **ps, const char *from, const char *to, int flags)
if (!s || !*s)
return 0;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars (s, mutt_strlen(s), 0);
+
if (to && from && (cd = mutt_iconv_open (to, from, flags)) != (iconv_t)-1)
{
int len;
@@ -677,3 +680,188 @@ int mutt_check_charset (const char *s, int strict)
return -1;
}
+
+/*
+ * mutt_sanitize_ja_chars()
+ * Adapted by TAKIZAWA Takashi <taki@cyber.email.ne.jp>
+ *
+ * - It replaces undefined KANJI characters with the GETA mark.
+ * - It replaces 'JIS X 0201 kana' characters with '?'.
+ * - If $charset is EUC-JP, it replaces the third character 'J' of the
+ * escape sequence switching to 'JIS X 0201 latin' with 'B', which
+ * designates 'US-ASCII'.
+ * - If $charset is Shift_JIS, it replaces the third character 'B' of the
+ * escape sequence switching to 'US-ASCII' with 'J', which designates
+ * 'JIS X 0201 latin'.
+ */
+
+#define ASCII 0
+#define JISX0201LATIN 1
+#define JISX0201KANA 2
+#define JISX0208 3
+#define OTHER_CS 4
+
+void mutt_sanitize_ja_chars(char *s, size_t len, int keep_state)
+{
+ static int cs = ASCII;
+ static int kanji_cont = 0;
+ static int illegal_kanji = 0;
+ static int es = 0;
+ static char pes = '\0';
+ static char ascii_3rd_char = 'B';
+ static char jisx0201_3rd_char = 'J';
+
+ char *p = s;
+ char *p1 = NULL;
+ unsigned char c;
+
+ if (!keep_state || *p == 0x1b) /* consideration about mbstate's buffer */
+ {
+ if (!ascii_strcasecmp (Charset, "euc-jp"))
+ jisx0201_3rd_char = 'B';
+ else if (!ascii_strcasecmp (Charset, "shift_jis"))
+ ascii_3rd_char = 'J';
+ cs = ASCII;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ es = 0;
+ pes = '\0';
+ }
+
+ for (;p - s < len;p++)
+ {
+ if (es == 0)
+ {
+ if (*p == 0x1b)
+ es++;
+ else
+ {
+ switch (cs)
+ {
+ case ASCII:
+ case JISX0201LATIN:
+ break;
+ case JISX0201KANA:
+ *p = '?';
+ break;
+ case JISX0208:
+ /* replace kanji in rows (ku) 9 to 15 and 85 or above with "GETA MARK" */
+ c = (unsigned char)*p;
+ if (! kanji_cont)
+ {
+ if ((size_t)(p - s + 1) == len)
+ return; /* the last character is a primary byte of KANJI */
+ if (c <= 0x20 || (c >= 0x29 && c <= 0x2f)
+ || (c >= 0x75 && c <= 0xa0))
+ illegal_kanji = 1;
+ kanji_cont = 1;
+ p1 = p;
+ }
+ else
+ {
+ if (c <= 0x20 || c >= 0x7f)
+ illegal_kanji = 1;
+ if (illegal_kanji && p1)
+ *p1 = 0x22, *p = 0x2e;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ }
+ break;
+ }
+ }
+ }
+ else if (es == 1)
+ {
+ if (*p == '$' || (*p >= '(' && *p <= '/' && *p != ','))
+ {
+ es++;
+ pes = *p;
+ }
+ else
+ {
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else if (es == 2)
+ {
+ if (pes == '(')
+ {
+ switch (*p)
+ {
+ case 'B':
+ cs = ASCII, *p = ascii_3rd_char;
+ break;
+ case 'J':
+ cs = JISX0201LATIN, *p = jisx0201_3rd_char;
+ break;
+ case 'I':
+ /* ready to replace character to '?' */
+ cs = JISX0201KANA, *p = ascii_3rd_char;
+ break;
+ default:
+ cs = OTHER_CS;
+ }
+ es = 0;
+ }
+ else if (pes == '$')
+ {
+ switch (*p)
+ {
+ case '@': /* JIS X 0208-1978 */
+ case 'B': /* JIS X 0208-1983 */
+ cs = JISX0208;
+ es = 0;
+ break;
+ case 'A':
+ cs = OTHER_CS; /* GB 2312 */
+ es = 0;
+ break;
+ case '(':
+ case ')':
+ case '*':
+ case '+':
+ case '-':
+ case '.':
+ case '/':
+ es++;
+ break;
+ default:
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+ else /* es == 3 */
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+}
+
+int mutt_copy_bytes_sanitize_ja (FILE *in, FILE *out, size_t size)
+{
+ char buf[2048];
+ size_t chunk;
+
+ mutt_sanitize_ja_chars (NULL, 0, 0);
+ while (size > 0)
+ {
+ chunk = (size > sizeof (buf)) ? sizeof (buf) : size;
+ if ((chunk = fread (buf, 1, chunk, in)) < 1)
+ break;
+ mutt_sanitize_ja_chars (buf, chunk, 1);
+ if (fwrite (buf, 1, chunk, out) != chunk)
+ return (-1);
+ size -= chunk;
+ }
+
+ return 0;
+}
+
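In the JIS X 0208 state, `mutt_sanitize_ja_chars()` treats a byte pair as undefined in either of two cases:
* The first byte falls in rows 9 to 15 or 85 and above, where vendor extensions such as NEC's row 13 special characters live.
* The second byte is outside 0x21 to 0x7E.

It then overwrites the pair with GETA MARK (row 2, cell 14, bytes 0x22 0x2E). The sketch below isolates that rule without the escape-sequence state machine. It assumes a buffer that contains only 7-bit JIS X 0208 byte pairs. `replace_undefined_kanji()` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical, simplified sketch of the JIS X 0208 branch of
 * mutt_sanitize_ja_chars(): buf holds JIS X 0208 byte pairs only (no
 * escape sequences); a trailing odd byte is left untouched. */
static void replace_undefined_kanji(unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i + 1 < len; i += 2)
  {
    unsigned char hi = buf[i], lo = buf[i + 1];
    /* same byte ranges as the patch's checks on the first and second byte */
    int bad = hi <= 0x20 || (hi >= 0x29 && hi <= 0x2f)
              || (hi >= 0x75 && hi <= 0xa0)
              || lo <= 0x20 || lo >= 0x7f;

    if (bad)
    {
      buf[i] = 0x22;                   /* GETA MARK: row 2, cell 14 */
      buf[i + 1] = 0x2e;
    }
  }
}
```

For example, 0x30 0x21 (the kanji 亜, row 16) is kept, while a row 13 pair such as 0x2D 0x21 becomes 0x22 0x2E.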
diff --git a/charset.h b/charset.h
index 54891f0..d67b209 100644
--- a/charset.h
+++ b/charset.h
@@ -36,6 +36,9 @@ int iconv_close (iconv_t);
int mutt_convert_string (char **, const char *, const char *, int);
+void mutt_sanitize_ja_chars (char *, size_t, int);
+int mutt_copy_bytes_sanitize_ja (FILE *, FILE *, size_t);
+
iconv_t mutt_iconv_open (const char *, const char *, int);
size_t mutt_iconv (iconv_t, ICONV_CONST char **, size_t *, char **, size_t *, ICONV_CONST char **, const char *);
diff --git a/configure b/configure
index de9054a..20e67fc 100755
--- a/configure
+++ b/configure
@@ -828,6 +828,7 @@ with_libiconv_prefix
enable_nls
with_included_gettext
with_idn
+enable_cjk_ambiguous_width
with_wc_funcs
enable_doc
enable_full_doc
@@ -1498,6 +1499,9 @@ Optional Features:
--enable-hcache Enable header caching
--disable-iconv Disable iconv support
--disable-nls Do not use Native Language Support
+ --enable-cjk-ambiguous-width
+ Enable East Asian Ambiguous characters support
+ (using own wcwidth)
--disable-doc Do not build the documentation
--disable-full-doc Omit disabled variables
@@ -12920,6 +12924,21 @@ fi
fi
+# Check whether --enable-cjk-ambiguous-width was given.
+if test "${enable_cjk_ambiguous_width+set}" = set; then :
+ enableval=$enable_cjk_ambiguous_width; if test "x$enableval" = "xyes" ; then
+ cjk_width=yes
+ fi
+
+fi
+
+if test "x$cjk_width" = "xyes" ; then
+
+$as_echo "#define USE_CJK_WIDTH 1" >>confdefs.h
+
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+fi
+
for ac_header in wchar.h
do :
ac_fn_c_check_header_mongrel "$LINENO" "wchar.h" "ac_cv_header_wchar_h" "$ac_includes_default"
@@ -13148,7 +13167,10 @@ if test $wc_funcs = yes; then
$as_echo "#define HAVE_WC_FUNCS 1" >>confdefs.h
else
- MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o"
+ if test "x$cjk_width" != "xyes"; then
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+ fi
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for nl_langinfo and CODESET" >&5
diff --git a/configure.ac b/configure.ac
index 616177f..f4fc1fe 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1330,6 +1330,16 @@ fi
dnl -- locales --
+AC_ARG_ENABLE(cjk-ambiguous-width, AC_HELP_STRING([--enable-cjk-ambiguous-width], [ Enable East Asian Ambiguous characters support (using own wcwidth)]),
+ [ if test "x$enableval" = "xyes" ; then
+ cjk_width=yes
+ fi
+ ])
+if test "x$cjk_width" = "xyes" ; then
+ AC_DEFINE(USE_CJK_WIDTH,1,[ Define if you want to support East Asian Ambiguous class. ])
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+fi
+
AC_CHECK_HEADERS(wchar.h)
AC_CACHE_CHECK([for wchar_t], mutt_cv_wchar_t,
@@ -1400,7 +1410,10 @@ fi
if test $wc_funcs = yes; then
AC_DEFINE(HAVE_WC_FUNCS,1,[ Define if you are using the system's wchar_t functions. ])
else
- MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o"
+ if test "x$cjk_width" != "xyes"; then
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+ fi
fi
AC_CACHE_CHECK([for nl_langinfo and CODESET], mutt_cv_langinfo_codeset,
diff --git a/curs_lib.c b/curs_lib.c
index ecac6d9..88ab6d1 100644
--- a/curs_lib.c
+++ b/curs_lib.c
@@ -1083,7 +1083,14 @@ void mutt_format_string (char *dest, size_t destlen,
wc = replacement_char ();
}
if (arboreal && wc < MUTT_TREE_MAX)
- w = 1; /* hack */
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w = wcwidth (TreeChars[wc]);
+ else
+#endif
+ w = 1;
+ }
else
{
#ifdef HAVE_ISWBLANK
@@ -1312,10 +1319,12 @@ int mutt_strwidth (const char *s)
int w;
size_t k, n;
mbstate_t mbstate;
+ int arboreal;
if (!s) return 0;
n = mutt_strlen (s);
+ arboreal = ((unsigned char) s[0] < MUTT_TREE_MAX) ? 1 : 0;
memset (&mbstate, 0, sizeof (mbstate));
for (w=0; n && (k = mbrtowc (&wc, s, n, &mbstate)); s += k, n -= k)
@@ -1327,9 +1336,21 @@ int mutt_strwidth (const char *s)
k = (k == (size_t)(-1)) ? 1 : n;
wc = replacement_char ();
}
- if (!IsWPrint (wc))
- wc = '?';
- w += wcwidth (wc);
+ if (wc < MUTT_TREE_MAX && arboreal && k == 1)
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w += wcwidth (TreeChars[wc]);
+ else
+#endif
+ w++;
+ }
+ else
+ {
+ if (!IsWPrint (wc))
+ wc = '?';
+ w += wcwidth (wc);
+ }
}
return w;
}
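With this change, mutt_format_string() and mutt_strwidth() count each thread-tree placeholder byte (a value below MUTT_TREE_MAX) as one column. The exception is when the charset is UTF-8, $cjk_width_tree_chars is set and $ascii_chars is unset. In that case the width is wcwidth() of the matching TreeChars[] entry, so a box-drawing character counts as two columns exactly when $cjk_width is also set. The standalone sketch below illustrates that rule under stated assumptions. All sk_* names are hypothetical, the enum and table cover only three tree codes, and sk_wcwidth() is a stand-in that only knows that U+2500..U+254B (box drawing) is East Asian Ambiguous.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical subset of mutt's tree placeholder codes (they start at 1). */
enum { SK_TREE_LLCORNER = 1, SK_TREE_HLINE, SK_TREE_RARROW, SK_TREE_MAX };

/* Code point drawn for each placeholder, indexed like TreeChars[]. */
static const wchar_t sk_tree_chars[SK_TREE_MAX] = {
  0xFEFF, /* index 0 is unused */
  0x2514, /* BOX DRAWINGS LIGHT UP AND RIGHT */
  0x2500, /* BOX DRAWINGS LIGHT HORIZONTAL */
  0x003E  /* '>' */
};

/* Stand-in for wcwidth(): only U+2500..U+254B is treated as Ambiguous. */
static int sk_wcwidth (wchar_t wc, int cjk_width)
{
  if (wc >= 0x2500 && wc <= 0x254B)
    return cjk_width ? 2 : 1;
  return 1;
}

/* Column width of a run of tree placeholder codes. */
static int sk_tree_width (const unsigned char *codes, size_t n,
                          int cjk_width, int cjk_width_tree_chars)
{
  int w = 0;

  for (size_t i = 0; i < n; i++)
  {
    assert (codes[i] > 0 && codes[i] < SK_TREE_MAX);
    if (cjk_width_tree_chars)
      w += sk_wcwidth (sk_tree_chars[codes[i]], cjk_width);
    else
      w += 1;  /* unpatched behaviour: every tree character is one column */
  }
  return w;
}
```

For the prefix "└─>" this gives 5 columns when both options are set and 3 columns otherwise.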
diff --git a/doc/makedoc-defs.h b/doc/makedoc-defs.h
index 2da29f4..10d8b9f 100644
--- a/doc/makedoc-defs.h
+++ b/doc/makedoc-defs.h
@@ -31,10 +31,10 @@
# ifndef USE_SOCKET
# define USE_SOCKET
# endif
-# ifndef USE_DOTLOCK
+# if !defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
# define USE_DOTLOCK
# endif
-# ifndef DL_STANDALONE
+# if !defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define DL_STANDALONE
# endif
# ifndef USE_HCACHE
diff --git a/dotlock.c b/dotlock.c
index 5bf0348..5d87850 100644
--- a/dotlock.c
+++ b/dotlock.c
@@ -52,13 +52,13 @@
#include <getopt.h>
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# include "reldate.h"
#endif
#define MAXLINKS 1024 /* maximum link depth */
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define LONG_STRING 1024
# define MAXLOCKATTEMPT 5
@@ -96,7 +96,7 @@ extern int snprintf (char *, size_t, const char *, ...);
static int DotlockFlags;
static int Retry = MAXLOCKATTEMPT;
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static char *Hostname;
#endif
@@ -110,7 +110,7 @@ static int dotlock_prepare (char *, size_t, const char *, int fd);
static int dotlock_check_stats (struct stat *, struct stat *);
static int dotlock_dispatch (const char *, int fd);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int dotlock_init_privs (void);
static void usage (const char *);
#endif
@@ -130,7 +130,7 @@ static int dotlock_unlink (const char *);
static int dotlock_lock (const char *);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
#define check_flags(a) if (a & DL_FL_ACTIONS) usage (argv[0])
@@ -327,7 +327,7 @@ END_PRIVILEGED (void)
#endif
}
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
/*
* Usage information.
diff --git a/globals.h b/globals.h
index 9634691..4437372 100644
--- a/globals.h
+++ b/globals.h
@@ -24,7 +24,7 @@ WHERE CONTEXT *Context;
WHERE char Errorbuf[STRING];
WHERE char AttachmentMarker[STRING];
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
WHERE char *MuttDotlock;
#endif
@@ -298,9 +298,31 @@ const char * const Months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
const char * const BodyTypes[] = { "x-unknown", "audio", "application", "image", "message", "model", "multipart", "text", "video" };
const char * const BodyEncodings[] = { "x-unknown", "7bit", "8bit", "quoted-printable", "base64", "binary", "x-uuencoded" };
+#ifdef USE_CJK_WIDTH
+const wchar_t TreeChars[] =
+{
+ 0xFEFF, /* not used */
+ 0x2514, /* MUTT_TREE_LLCORNER WACS_LLCORNER */
+ 0x250C, /* MUTT_TREE_ULCORNER WACS_ULCORNER */
+ 0x251C, /* MUTT_TREE_LTEE WACS_LTEE */
+ 0x2500, /* MUTT_TREE_HLINE WACS_HLINE */
+ 0x2502, /* MUTT_TREE_VLINE WACS_VLINE */
+ 0x0020, /* MUTT_TREE_SPACE */
+ 0x003E, /* MUTT_TREE_RARROW */
+ 0x002A, /* MUTT_TREE_STAR fake thread indicator */
+ 0x0026, /* MUTT_TREE_HIDDEN */
+ 0x003D, /* MUTT_TREE_EQUALS */
+ 0x252C, /* MUTT_TREE_TTEE WACS_TTEE */
+ 0x2534, /* MUTT_TREE_BTEE WACS_BTEE */
+ 0x003F /* MUTT_TREE_MISSING */
+};
+#endif /* USE_CJK_WIDTH */
#else
extern const char * const Weekdays[];
extern const char * const Months[];
+#ifdef USE_CJK_WIDTH
+extern const wchar_t TreeChars[];
+#endif /* USE_CJK_WIDTH */
#endif
#ifdef MAIN_C
diff --git a/handler.c b/handler.c
index 7ce53f9..ab69527 100644
--- a/handler.c
+++ b/handler.c
@@ -100,6 +100,9 @@ static void mutt_convert_to_state(iconv_t cd, char *bufi, size_t *l, STATE *s)
return;
}
+ if (option (OPTSANITIZEJACHARS) && strchr (bufi, 0x1b))
+ mutt_sanitize_ja_chars (bufi, *l, 1);
+
ib = bufi, ibl = *l;
for (;;)
{
@@ -1312,6 +1315,7 @@ static int autoview_handler (BODY *a, STATE *s)
int piped = FALSE;
pid_t thepid;
int rc = 0;
+ char *charset;
snprintf (type, sizeof (type), "%s/%s", TYPE (a), a->subtype);
rfc1524_mailcap_lookup (a, type, entry, MUTT_AUTOVIEW);
@@ -1342,6 +1346,10 @@ static int autoview_handler (BODY *a, STATE *s)
return -1;
}
+ charset = mutt_get_parameter ("charset", a->parameter);
+ if (charset && option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (charset,"iso-2022-jp", 11))
+ mutt_copy_bytes_sanitize_ja (s->fpin, fpin, a->length);
+ else
mutt_copy_bytes (s->fpin, fpin, a->length);
if(!piped)
diff --git a/hdrline.c b/hdrline.c
index ba118bf..2e6a10b 100644
--- a/hdrline.c
+++ b/hdrline.c
@@ -272,6 +272,7 @@ hdr_format_str (char *dest,
#define THREAD_NEW (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 1)
#define THREAD_OLD (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 2)
size_t len;
+ char *subj;
hdr = hfi->hdr;
ctx = hfi->ctx;
@@ -590,6 +591,7 @@ hdr_format_str (char *dest,
subj = apply_subject_mods(hdr->env);
else
subj = hdr->env->subject;
+ if (option (OPTDELETEPREFIX)) subj = hdr->env->real_subj;
if (flags & MUTT_FORMAT_TREE && !hdr->collapsed)
{
if (flags & MUTT_FORMAT_FORCESUBJ)
diff --git a/init.h b/init.h
index 035752f..8ba243b 100644
--- a/init.h
+++ b/init.h
@@ -401,6 +401,31 @@ struct option_t MuttVars[] = {
** this variable is \fIunset\fP, no check for new mail is performed
** while the mailbox is open.
*/
+#ifdef USE_CJK_WIDTH
+ { "cjk_width", DT_BOOL, R_NONE, OPTCJKWIDTH, 0 },
+ /*
+ ** .pp
+ ** When this option is set, characters in the East Asian Ambiguous (A)
+ ** category as defined in Unicode Technical Report #11 have a column
+ ** width of 2. Otherwise, they have a column width of 1.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only has an effect when the charset is UTF-8.
+ */
+ { "cjk_width_tree_chars", DT_BOOL, R_NONE, OPTCJKWIDTHTREECHARS, 0 },
+ /*
+ ** .pp
+ ** If \fIset\fP, Mutt will use the result of $cjk_width as a column
+ ** width of WACS characters when displaying thread and attachment trees.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only has an effect when the charset is UTF-8.
+ */
+#endif
{ "collapse_unread", DT_BOOL, R_NONE, OPTCOLLAPSEUNREAD, 1 },
/*
** .pp
@@ -606,6 +631,17 @@ struct option_t MuttVars[] = {
** If \fI``no''\fP, never attempt to verify cryptographic signatures.
** (Crypto only)
*/
+ { "create_rfc2047_parameters", DT_BOOL, R_NONE, OPTCREATERFC2047PARAMS, 0 },
+ /*
+ ** .pp
+ ** When this variable is set, Mutt also adds the attachment's filename
+ ** to the Content-Type header field as an RFC 2047-encoded ``name'' parameter, e.g.:
+ ** name="=?iso-2022-jp?B?GyRCO244MxsoQi50eHQ=?="
+ ** .pp
+ ** Note: this use of RFC 2047 encoding is explicitly prohibited
+ ** by the standard. Set this variable only if your recipients'
+ ** mail programs cannot parse RFC 2231 parameters.
+ */
{ "date_format", DT_STR, R_MENU, UL &DateFmt, UL "!%a, %b %d, %Y at %I:%M:%S%p %Z" },
/*
** .pp
@@ -657,6 +693,19 @@ struct option_t MuttVars[] = {
** If this option is \fIset\fP, mutt's received-attachments menu will not show the subparts of
** individual messages in a multipart/digest. To see these subparts, press ``v'' on that menu.
*/
+ { "delete_prefix", DT_BOOL, R_NONE, OPTDELETEPREFIX, 0 },
+ /*
+ ** .pp
+ ** If set, the prefix that some mailing lists add to the Subject: field
+ ** (something like "Subject: [foo-ML:0012] real-subject") is removed
+ ** when the subject is displayed in the index and when it is used in a reply.
+ ** The pattern to remove is set by the $$delete_regexp variable.
+ */
+ { "delete_regexp", DT_RX, R_NONE, UL &DeleteRegexp, UL "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)" },
+ /*
+ ** .pp
+ ** The regular expression that $$delete_prefix uses to match the prefix to remove.
+ */
{ "display_filter", DT_PATH, R_PAGER, UL &DisplayFilter, UL "" },
/*
** .pp
@@ -664,7 +713,7 @@ struct option_t MuttVars[] = {
** is viewed it is passed as standard input to $$display_filter, and the
** filtered message is read from the standard output.
*/
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
{ "dotlock_program", DT_PATH, R_NONE, UL &MuttDotlock, UL BINDIR "/mutt_dotlock" },
/*
** .pp
@@ -2695,6 +2744,28 @@ struct option_t MuttVars[] = {
** that mutt \fIgenerates\fP this kind of encoding. Instead, mutt will
** unconditionally use the encoding specified in RFC2231.
*/
+ { "sanitize_ja_chars", DT_BOOL, R_NONE, OPTSANITIZEJACHARS, 0 },
+ /*
+ ** .pp
+ ** When set, Japanese "platform dependent characters" (illegal
+ ** characters in the iso-2022-jp charset, mainly produced by MS-Windows
+ ** mailers) are replaced with a special character, the GETA mark ('ESC $$ B " .
+ ** ESC ( B' in iso-2022-jp), and JIS X 0201 kana characters
+ ** (only in the "ESC ) I" case) are also replaced with "?" to
+ ** prevent garbled output. JIS X 0201 kana characters are
+ ** not substituted if they appear in 8bit form.
+ ** .pp
+ ** This option also fixes another Japanese encoding issue. When $$charset
+ ** is set to "EUC-JP", which does not contain the JIS X 0201 Roman
+ ** character set, the JIS X 0201 Roman part of received messages
+ ** encoded in iso-2022-jp cannot be converted to EUC-JP.
+ ** Conversely, when $$charset is "Shift_JIS", which does not contain
+ ** the ASCII character set, the ASCII part cannot be converted. Thus,
+ ** the converted characters are garbled in these cases. When this
+ ** option is set, the JIS X 0201 roman escape sequence and the
+ ** ASCII escape sequence are replaced appropriately to prevent
+ ** the output from being garbled.
+ */
{ "save_address", DT_BOOL, R_NONE, OPTSAVEADDRESS, 0 },
/*
** .pp
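The escape-sequence part of $sanitize_ja_chars works at the level of ISO-2022-JP designations. ESC ( B designates ASCII and ESC ( J designates JIS X 0201 Roman. The two sets differ only at 0x5C (YEN SIGN instead of backslash) and 0x7E (OVERLINE instead of tilde). The sketch below rewrites every ESC ( J to ESC ( B in place, which is one plausible way to let such text convert to EUC-JP. It is a hypothetical helper (sk_jisroman_to_ascii). It is not the patch's mutt_sanitize_ja_chars(), which handler.c and rfc2047.c call but whose definition does not appear in the hunks above.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite each "ESC ( J" (designate JIS X 0201 Roman) in an ISO-2022-JP
 * buffer to "ESC ( B" (designate ASCII), in place. The byte count does not
 * change. Returns the number of designations rewritten. */
static size_t sk_jisroman_to_ascii (char *buf, size_t len)
{
  size_t count = 0;

  assert (buf != NULL || len == 0);
  for (size_t i = 0; i + 2 < len; i++)
  {
    if (buf[i] == 0x1B && buf[i + 1] == '(' && buf[i + 2] == 'J')
    {
      buf[i + 2] = 'B';
      count++;
      i += 2;   /* continue after this escape sequence */
    }
  }
  return count;
}
```

The trade-off is that a YEN SIGN or OVERLINE inside a JIS X 0201 Roman segment is reinterpreted as a backslash or tilde.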
diff --git a/lib.c b/lib.c
index 583d2ff..1f61b39 100644
--- a/lib.c
+++ b/lib.c
@@ -445,6 +445,10 @@ int safe_symlink(const char *oldpath, const char *newpath)
int safe_rename (const char *src, const char *target)
{
+#ifdef NO_USE_HARDLINK
+ /* Android (since 6.0) does not support hardlinks. */
+ return rename(src, target);
+#else
struct stat ssb, tsb;
if (!src || !target)
@@ -537,6 +541,7 @@ int safe_rename (const char *src, const char *target)
return 0;
+#endif /* NO_USE_HARDLINK */
}
diff --git a/main.c b/main.c
index 26d7dc7..b46398b 100644
--- a/main.c
+++ b/main.c
@@ -260,25 +260,25 @@ static void show_version (void)
"-USE_SETGID "
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
"+USE_DOTLOCK "
#else
"-USE_DOTLOCK "
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
"+DL_STANDALONE "
#else
"-DL_STANDALONE "
#endif
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
"+USE_FCNTL "
#else
"-USE_FCNTL "
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
"+USE_FLOCK "
#else
"-USE_FLOCK "
@@ -439,6 +439,12 @@ static void show_version (void)
"-LOCALES_HACK "
#endif
+#ifdef USE_CJK_WIDTH
+ "+USE_CJK_WIDTH "
+#else
+ "-USE_CJK_WIDTH "
+#endif
+
#ifdef HAVE_WC_FUNCS
"+HAVE_WC_FUNCS "
#else
diff --git a/mbyte.c b/mbyte.c
index 0eedaa7..8032bd3 100644
--- a/mbyte.c
+++ b/mbyte.c
@@ -17,7 +17,7 @@
*/
/*
- * Japanese support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
+ * CJK support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
*/
#if HAVE_CONFIG_H
@@ -37,8 +37,8 @@
#endif
int Charset_is_utf8 = 0;
+static int charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
-static int charset_is_ja = 0;
static iconv_t charset_to_utf8 = (iconv_t)(-1);
static iconv_t charset_from_utf8 = (iconv_t)(-1);
#endif
@@ -50,8 +50,8 @@ void mutt_set_charset (char *charset)
mutt_canonical_charset (buffer, sizeof (buffer), charset);
Charset_is_utf8 = 0;
+ charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
- charset_is_ja = 0;
if (charset_to_utf8 != (iconv_t)(-1))
{
iconv_close (charset_to_utf8);
@@ -66,12 +66,18 @@ void mutt_set_charset (char *charset)
if (mutt_is_utf8 (buffer))
Charset_is_utf8 = 1;
-#ifndef HAVE_WC_FUNCS
- else if (!ascii_strcasecmp(buffer, "euc-jp") || !ascii_strcasecmp(buffer, "shift_jis")
- || !ascii_strcasecmp(buffer, "cp932") || !ascii_strcasecmp(buffer, "eucJP-ms"))
+ else if (!ascii_strcasecmp (buffer, "gb2312") ||
+ !ascii_strcasecmp (buffer, "gb18030") ||
+ !ascii_strcasecmp (buffer, "big5") ||
+ !ascii_strcasecmp (buffer, "euc-tw") ||
+ !ascii_strcasecmp (buffer, "EUC-JP") ||
+ !ascii_strcasecmp (buffer, "eucJP-ms") ||
+ !ascii_strcasecmp (buffer, "Shift_JIS") ||
+ !ascii_strcasecmp (buffer, "cp932") ||
+ !ascii_strcasecmp (buffer, "euc-kr"))
{
- charset_is_ja = 1;
-
+ charset_is_cjk = 1;
+#ifndef HAVE_WC_FUNCS
/* Note flags=0 to skip charset-hooks: User masters the $charset
* name, and we are sure of our "utf-8" constant. So there is no
* possibility of wrong name that we would want to try to correct
@@ -80,24 +86,68 @@ void mutt_set_charset (char *charset)
*/
charset_to_utf8 = mutt_iconv_open ("utf-8", charset, 0);
charset_from_utf8 = mutt_iconv_open (charset, "utf-8", 0);
- }
#endif
+ }
#if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(ENABLE_NLS)
bind_textdomain_codeset(PACKAGE, buffer);
#endif
}
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+/*
+ * For systems that do not have a correctly working wcwidth(), we
+ * provide our own wcwidth().
+ * In addition, this wcwidth() allows the character-cell width of the
+ * East Asian Ambiguous class to be changed with $cjk_width, which the
+ * wcwidth() provided by most systems cannot do.
+ * See the comment in wcwidth.c on the East Asian Ambiguous class for
+ * details.
+ */
+int wcwidth_ucs(wchar_t ucs);
+int wcwidth_cjk(wchar_t ucs);
+
+int wcwidth (wchar_t wc)
+{
+ if (!Charset_is_utf8)
+ {
+ if (!charset_is_cjk)
+ {
+ /* 8-bit case */
+ if (!wc)
+ return 0;
+ else if ((0 <= wc && wc < 256) && IsPrint (wc))
+ return 1;
+ else
+ return -1;
+ }
+ else
+ {
+ /* CJK */
+ return wcwidth_cjk (wc);
+ }
+ }
+ else {
+#ifdef USE_CJK_WIDTH
+ if (option (OPTCJKWIDTH))
+ return wcwidth_cjk (wc);
+#endif /* USE_CJK_WIDTH */
+ return wcwidth_ucs (wc);
+ }
+}
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
+
+
#ifndef HAVE_WC_FUNCS
/*
* For systems that don't have them, we provide here our own
- * implementations of wcrtomb(), mbrtowc(), iswprint() and wcwidth().
+ * implementations of wcrtomb(), mbrtowc() and iswprint().
* Instead of using the locale, as these functions normally would,
* we use Mutt's Charset variable. We support 3 types of charset:
* (1) For 8-bit charsets, wchar_t uses the same encoding as char.
* (2) For UTF-8, wchar_t uses UCS.
- * (3) For stateless Japanese encodings, we use UCS and convert
+ * (3) For stateless CJK encodings, we use UCS and convert
* via UTF-8 using iconv.
* Unfortunately, we can't handle non-stateless encodings.
*/
@@ -256,7 +306,7 @@ size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
int iswprint (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return ((0x20 <= wc && wc < 0x7f) || 0xa0 <= wc);
else
return (0 <= wc && wc < 256) ? IsPrint (wc) : 0;
@@ -264,7 +314,7 @@ int iswprint (wint_t wc)
int iswspace (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return (9 <= wc && wc <= 13) || wc == 32;
else
return (0 <= wc && wc < 256) ? isspace (wc) : 0;
@@ -347,7 +397,7 @@ static int iswalpha_ucs (wint_t wc)
wint_t towupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? toupper (wc) : wc;
@@ -355,7 +405,7 @@ wint_t towupper (wint_t wc)
wint_t towlower (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towlower_ucs (wc);
else
return (0 <= wc && wc < 256) ? tolower (wc) : wc;
@@ -363,7 +413,7 @@ wint_t towlower (wint_t wc)
int iswalnum (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalnum_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalnum (wc) : 0;
@@ -371,7 +421,7 @@ int iswalnum (wint_t wc)
int iswalpha (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalpha_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalpha (wc) : 0;
@@ -379,58 +429,12 @@ int iswalpha (wint_t wc)
int iswupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? isupper (wc) : 0;
}
-/*
- * l10n for Japanese:
- * Symbols, Greek and Cyrillic in JIS X 0208, Japanese Kanji
- * Character Set, have a column width of 2.
- */
-int wcwidth_ja (wchar_t ucs)
-{
- if (ucs >= 0x3021)
- return -1; /* continue with the normal check */
- /* a rough range for quick check */
- if ((ucs >= 0x00a1 && ucs <= 0x00fe) || /* Latin-1 Supplement */
- (ucs >= 0x0391 && ucs <= 0x0451) || /* Greek and Cyrillic */
- (ucs >= 0x2010 && ucs <= 0x266f) || /* Symbols */
- (ucs >= 0x3000 && ucs <= 0x3020)) /* CJK Symbols and Punctuation */
- return 2;
- else
- return -1;
-}
-
-int wcwidth_ucs(wchar_t ucs);
-
-int wcwidth (wchar_t wc)
-{
- if (!Charset_is_utf8)
- {
- if (!charset_is_ja)
- {
- /* 8-bit case */
- if (!wc)
- return 0;
- else if ((0 <= wc && wc < 256) && IsPrint (wc))
- return 1;
- else
- return -1;
- }
- else
- {
- /* Japanese */
- int k = wcwidth_ja (wc);
- if (k != -1)
- return k;
- }
- }
- return wcwidth_ucs (wc);
-}
-
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps)
{
static wchar_t mbstate;
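The patched mbyte.c picks a width rule according to the charset class:

- 8-bit charsets return 1 for printable bytes and -1 otherwise.
- The legacy CJK charsets listed in mutt_set_charset() always use wcwidth_cjk().
- UTF-8 uses wcwidth_cjk() only when $cjk_width is set, and wcwidth_ucs() otherwise.

The sketch below reproduces that dispatch together with a binary search over a sorted interval table, in the style of bisearch() in wcwidth.c. It is illustrative only. The sk_* names are hypothetical, the Ambiguous table holds just four ranges taken from Unicode's EastAsianWidth.txt, and the "wide" test only recognises Hiragana.

```c
#include <assert.h>
#include <ctype.h>
#include <wchar.h>

struct sk_interval { wchar_t first, last; };

/* Binary search over sorted, non-overlapping intervals; max is the index
 * of the last entry. Returns 1 if ucs lies inside one of them. */
static int sk_bisearch (wchar_t ucs, const struct sk_interval *table, int max)
{
  int min = 0;

  assert (table != NULL && max >= 0);
  if (ucs < table[0].first || ucs > table[max].last)
    return 0;
  while (max >= min)
  {
    int mid = (min + max) / 2;
    if (ucs > table[mid].last)
      min = mid + 1;
    else if (ucs < table[mid].first)
      max = mid - 1;
    else
      return 1;
  }
  return 0;
}

/* A few East Asian Ambiguous (A) ranges; the real table is far longer. */
static const struct sk_interval sk_ambiguous[] = {
  { 0x00A1, 0x00A1 }, { 0x2500, 0x254B }, { 0x25CE, 0x25D1 }, { 0x2605, 0x2606 }
};
#define SK_AMBIGUOUS_MAX ((int) (sizeof sk_ambiguous / sizeof sk_ambiguous[0]) - 1)

/* Stand-in for wcwidth_ucs(): Hiragana is wide, everything else narrow. */
static int sk_wcwidth_ucs (wchar_t wc)
{
  if (wc == 0)
    return 0;
  return (wc >= 0x3041 && wc <= 0x3096) ? 2 : 1;
}

/* Stand-in for wcwidth_cjk(): Ambiguous characters take two columns. */
static int sk_wcwidth_cjk (wchar_t wc)
{
  if (sk_bisearch (wc, sk_ambiguous, SK_AMBIGUOUS_MAX))
    return 2;
  return sk_wcwidth_ucs (wc);
}

enum sk_charset { SK_8BIT, SK_CJK_LEGACY, SK_UTF8 };

/* Same decision structure as the patched wcwidth() in mbyte.c. */
static int sk_wcwidth (wchar_t wc, enum sk_charset cs, int opt_cjk_width)
{
  if (cs == SK_8BIT)
  {
    if (wc == 0)
      return 0;
    return ((unsigned long) wc < 256 && isprint ((int) wc)) ? 1 : -1;
  }
  if (cs == SK_CJK_LEGACY)
    return sk_wcwidth_cjk (wc);
  return opt_cjk_width ? sk_wcwidth_cjk (wc) : sk_wcwidth_ucs (wc);
}
```

In UTF-8, for example, "★" (U+2605) is one column by default and two with $cjk_width set, while in a legacy CJK charset it is always two.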
diff --git a/mbyte.h b/mbyte.h
index 9c58c9e..224cafb 100644
--- a/mbyte.h
+++ b/mbyte.h
@@ -8,6 +8,12 @@
# ifdef HAVE_WCTYPE_H
# include <wctype.h>
# endif
+# ifdef USE_CJK_WIDTH
+#ifdef wcwidth
+# undef wcwidth
+#endif
+int wcwidth (wchar_t wc);
+# endif /* USE_CJK_WIDTH */
# endif
# ifndef HAVE_WC_FUNCS
@@ -32,6 +38,9 @@
#ifdef iswupper
# undef iswupper
#endif
+#ifdef wcwidth
+# undef wcwidth
+#endif
size_t wcrtomb (char *s, wchar_t wc, mbstate_t *ps);
size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int iswprint (wint_t wc);
@@ -44,7 +53,6 @@ wint_t towlower (wint_t wc);
int wcwidth (wchar_t wc);
# endif /* !HAVE_WC_FUNCS */
-
void mutt_set_charset (char *charset);
extern int Charset_is_utf8;
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps);
diff --git a/mutt.h b/mutt.h
index 54a807f..a623f6c 100644
--- a/mutt.h
+++ b/mutt.h
@@ -344,10 +344,16 @@ enum
OPTBRAILLEFRIENDLY,
OPTCHECKMBOXSIZE,
OPTCHECKNEW,
+#ifdef USE_CJK_WIDTH
+ OPTCJKWIDTH,
+ OPTCJKWIDTHTREECHARS,
+#endif /* USE_CJK_WIDTH */
OPTCOLLAPSEUNREAD,
OPTCONFIRMAPPEND,
OPTCONFIRMCREATE,
+ OPTCREATERFC2047PARAMS,
OPTDELETEUNTAG,
+ OPTDELETEPREFIX,
OPTDIGESTCOLLAPSE,
OPTDUPTHREADS,
OPTEDITHDRS,
@@ -447,6 +453,7 @@ enum
OPTREVNAME,
OPTREVREAL,
OPTRFC2047PARAMS,
+ OPTSANITIZEJACHARS,
OPTSAVEADDRESS,
OPTSAVEEMPTY,
OPTSAVENAME,
diff --git a/mutt_dotlock.c b/mutt_dotlock.c
index 5bf0348..5d87850 100644
--- a/mutt_dotlock.c
+++ b/mutt_dotlock.c
@@ -52,13 +52,13 @@
#include <getopt.h>
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# include "reldate.h"
#endif
#define MAXLINKS 1024 /* maximum link depth */
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define LONG_STRING 1024
# define MAXLOCKATTEMPT 5
@@ -96,7 +96,7 @@ extern int snprintf (char *, size_t, const char *, ...);
static int DotlockFlags;
static int Retry = MAXLOCKATTEMPT;
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static char *Hostname;
#endif
@@ -110,7 +110,7 @@ static int dotlock_prepare (char *, size_t, const char *, int fd);
static int dotlock_check_stats (struct stat *, struct stat *);
static int dotlock_dispatch (const char *, int fd);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int dotlock_init_privs (void);
static void usage (const char *);
#endif
@@ -130,7 +130,7 @@ static int dotlock_unlink (const char *);
static int dotlock_lock (const char *);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
#define check_flags(a) if (a & DL_FL_ACTIONS) usage (argv[0])
@@ -327,7 +327,7 @@ END_PRIVILEGED (void)
#endif
}
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
/*
* Usage information.
diff --git a/mutt_regex.h b/mutt_regex.h
index c89a2f4..4c10d73 100644
--- a/mutt_regex.h
+++ b/mutt_regex.h
@@ -51,5 +51,6 @@ WHERE REGEXP QuoteRegexp;
WHERE REGEXP ReplyRegexp;
WHERE REGEXP Smileys;
WHERE REGEXP GecosMask;
+WHERE REGEXP DeleteRegexp;
#endif /* MUTT_REGEX_H */
diff --git a/mx.c b/mx.c
index 1f99120..39831b7 100644
--- a/mx.c
+++ b/mx.c
@@ -47,7 +47,7 @@
#include "buffy.h"
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
#include "dotlock.h"
#endif
@@ -95,13 +95,13 @@ struct mx_ops* mx_get_ops (int magic)
#define mutt_is_spool(s) (mutt_strcmp (Spoolfile, s) == 0)
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
/* parameters:
* path - file to lock
* retry - should retry if unable to lock?
*/
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int invoke_dotlock (const char *path, int dummy, int flags, int retry)
{
@@ -181,14 +181,14 @@ static int undotlock_file (const char *path, int fd)
*/
int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
-#if defined (USE_FCNTL) || defined (USE_FLOCK)
+#if defined (USE_FCNTL) || defined (USE_FLOCK) || defined(NO_USE_HARDLINK)
int count;
int attempt;
struct stat sb = { 0 }, prev_sb = { 0 }; /* silence gcc warnings */
#endif
int r = 0;
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock lck;
memset (&lck, 0, sizeof (struct flock));
@@ -227,7 +227,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
count = 0;
attempt = 0;
while (flock (fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
@@ -261,7 +261,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FLOCK */
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (r == 0 && dot)
r = dotlock_file (path, fd, timeout);
#endif /* USE_DOTLOCK */
@@ -270,12 +270,12 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
/* release any other locks obtained in this routine */
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
lck.l_type = F_UNLCK;
fcntl (fd, F_SETLK, &lck);
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif /* USE_FLOCK */
}
@@ -285,7 +285,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
int mx_unlock_file (const char *path, int fd, int dot)
{
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock unlockit = { F_UNLCK, 0, 0, 0, 0 };
memset (&unlockit, 0, sizeof (struct flock));
@@ -294,11 +294,11 @@ int mx_unlock_file (const char *path, int fd, int dot)
fcntl (fd, F_SETLK, &unlockit);
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (dot)
undotlock_file (path, fd);
#endif
@@ -309,7 +309,7 @@ int mx_unlock_file (const char *path, int fd, int dot)
static void mx_unlink_empty (const char *path)
{
int fd;
-#ifndef USE_DOTLOCK
+#if !defined(USE_DOTLOCK) || defined(NO_USE_HARDLINK)
struct stat sb;
#endif
@@ -322,7 +322,7 @@ static void mx_unlink_empty (const char *path)
return;
}
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
invoke_dotlock (path, fd, DL_FL_UNLINK, 1);
#else
if (fstat (fd, &sb) == 0 && sb.st_size == 0)
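When NO_USE_HARDLINK is defined, the mx.c changes above compile out dotlocking, which relies on link(2). They also enable both fcntl() and flock() locking in mx_lock_file() and mx_unlock_file(), regardless of USE_FCNTL and USE_FLOCK. The sketch below shows the non-blocking acquisition and release of both kinds of lock on an open descriptor, in simplified form. It assumes a POSIX system that also provides the BSD flock() call. Unlike mutt, it does not retry, time out or watch for a changing file size. sk_lock_fd() and sk_unlock_fd() are hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Take an fcntl() record lock and a flock() lock on the whole file without
 * blocking. Returns 0 on success, -1 if either lock cannot be obtained. */
static int sk_lock_fd (int fd, int excl)
{
  struct flock lck;

  assert (fd >= 0);
  memset (&lck, 0, sizeof lck);
  lck.l_type = excl ? F_WRLCK : F_RDLCK;
  lck.l_whence = SEEK_SET;       /* l_start = l_len = 0: the whole file */
  if (fcntl (fd, F_SETLK, &lck) == -1)
    return -1;
  if (flock (fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
  {
    lck.l_type = F_UNLCK;        /* release the fcntl lock already held */
    fcntl (fd, F_SETLK, &lck);
    return -1;
  }
  return 0;
}

/* Release both locks taken by sk_lock_fd(). */
static void sk_unlock_fd (int fd)
{
  struct flock lck;

  memset (&lck, 0, sizeof lck);
  lck.l_type = F_UNLCK;
  lck.l_whence = SEEK_SET;
  fcntl (fd, F_SETLK, &lck);
  flock (fd, LOCK_UN);
}
```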
diff --git a/parse.c b/parse.c
index 0ae5594..745d2fc 100644
--- a/parse.c
+++ b/parse.c
@@ -1453,6 +1453,18 @@ ENVELOPE *mutt_read_rfc822_header (FILE *f, HEADER *hdr, short user_hdrs,
e->real_subj = e->subject + pmatch[0].rm_eo;
else
e->real_subj = e->subject;
+ if (option (OPTDELETEPREFIX))
+ {
+ /* If this option is set, strip a leading list tag such as [prefix],
+ * [prefix:number] or [prefix number] from the Subject line, followed
+ * by any reply prefix that came after it. */
+ if (regexec (DeleteRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ {
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ if (regexec (ReplyRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ }
+ }
}
if (hdr->received < 0)
diff --git a/rfc2047.c b/rfc2047.c
index 8506425..e907b25 100644
--- a/rfc2047.c
+++ b/rfc2047.c
@@ -62,6 +62,9 @@ static size_t convert_string (ICONV_CONST char *f, size_t flen,
size_t obl, n;
int e;
+ if (option (OPTSANITIZEJACHARS) &&
+ !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars ((char *) f, flen, 0);
cd = mutt_iconv_open (to, from, 0);
if (cd == (iconv_t)(-1))
return (size_t)(-1);
diff --git a/sendlib.c b/sendlib.c
index d09b8ce..cf0df92 100644
--- a/sendlib.c
+++ b/sendlib.c
@@ -348,6 +348,30 @@ int mutt_write_mime_header (BODY *a, FILE *f)
}
}
+ if (a->use_disp && option (OPTCREATERFC2047PARAMS))
+ {
+ if(!(fn = a->d_filename))
+ fn = a->filename;
+
+ if (fn)
+ {
+ char *tmp;
+
+ /* Strip off the leading path... */
+ if ((t = strrchr (fn, '/')))
+ t++;
+ else
+ t = fn;
+
+ buffer[0] = 0;
+ tmp = safe_strdup (t);
+ rfc2047_encode_string (&tmp);
+ rfc822_cat (buffer, sizeof (buffer), tmp, MimeSpecials);
+ FREE (&tmp);
+ fprintf (f, ";\n\tname=%s", buffer);
+ }
+ }
+
fputc ('\n', f);
if (a->description)
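With $create_rfc2047_parameters set, the sendlib.c hunk above strips the leading path from the attachment's filename and encodes the result with rfc2047_encode_string(). It then appends that result to the Content-Type header as an extra name= parameter. An RFC 2047 encoded-word using the "B" encoding has the form =?charset?B?base64?=. The sketch below builds one from raw bytes. For the ISO-2022-JP bytes behind the init.h example, it reproduces =?iso-2022-jp?B?GyRCO244MxsoQi50eHQ=?=. It is a hypothetical helper, not mutt's rfc2047_encode_string(). It neither picks a charset nor splits long output into several encoded-words, which RFC 2047 limits to 75 characters each.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char sk_b64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Write "=?charset?B?<base64 of in>?=" into out (outlen bytes, NUL included).
 * Returns 0 on success, -1 if out is too small (out is then unspecified). */
static int sk_rfc2047_b_encode (const char *charset, const unsigned char *in,
                                size_t inlen, char *out, size_t outlen)
{
  size_t pos, i;
  int n;

  assert (charset != NULL && in != NULL && out != NULL);
  n = snprintf (out, outlen, "=?%s?B?", charset);
  if (n < 0 || (size_t) n >= outlen)
    return -1;
  pos = (size_t) n;

  /* Each group of 3 input bytes becomes 4 base64 characters; a short final
   * group is padded with '='. */
  for (i = 0; i < inlen; i += 3)
  {
    unsigned long v = (unsigned long) in[i] << 16;
    if (i + 1 < inlen)
      v |= (unsigned long) in[i + 1] << 8;
    if (i + 2 < inlen)
      v |= in[i + 2];
    if (pos + 4 > outlen)
      return -1;
    out[pos++] = sk_b64[(v >> 18) & 0x3F];
    out[pos++] = sk_b64[(v >> 12) & 0x3F];
    out[pos++] = (i + 1 < inlen) ? sk_b64[(v >> 6) & 0x3F] : '=';
    out[pos++] = (i + 2 < inlen) ? sk_b64[v & 0x3F] : '=';
  }

  if (pos + 3 > outlen)   /* room for "?=" and the terminating NUL */
    return -1;
  memcpy (out + pos, "?=", 3);
  return 0;
}
```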
diff --git a/wcwidth.c b/wcwidth.c
index 0b94d73..85a1397 100644
--- a/wcwidth.c
+++ b/wcwidth.c
@@ -5,6 +5,51 @@
* http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
* http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
*
+ * In fixed-width output devices, Latin characters all occupy a single
+ * "cell" position of equal width, whereas ideographic CJK characters
+ * occupy two such cells. Interoperability between terminal-line
+ * applications and (teletype-style) character terminals using the
+ * UTF-8 encoding requires agreement on which character should advance
+ * the cursor by how many cell positions. No established formal
+ * standards exist at present on which Unicode character shall occupy
+ * how many cell positions on character terminals. These routines are
+ * a first attempt of defining such behavior based on simple rules
+ * applied to data provided by the Unicode Consortium.
+ *
+ * For some graphical characters, the Unicode standard explicitly
+ * defines a character-cell width via the definition of the East Asian
+ * FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
+ * In all these cases, there is no ambiguity about which width a
+ * terminal shall use. For characters in the East Asian Ambiguous (A)
+ * class, the width choice depends purely on a preference of backward
+ * compatibility with either historic CJK or Western practice.
+ * Choosing single-width for these characters is easy to justify as
+ * the appropriate long-term solution, as the CJK practice of
+ * displaying these characters as double-width comes from historic
+ * implementation simplicity (8-bit encoded characters were displayed
+ * single-width and 16-bit ones double-width, even for Greek,
+ * Cyrillic, etc.) and not any typographic considerations.
+ *
+ * Much less clear is the choice of width for the Not East Asian
+ * (Neutral) class. Existing practice does not dictate a width for any
+ * of these characters. It would nevertheless make sense
+ * typographically to allocate two character cells to characters such
+ * as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
+ * represented adequately with a single-width glyph. The following
+ * routines at present merely assign a single-cell width to all
+ * neutral characters, in the interest of simplicity. This is not
+ * entirely satisfactory and should be reconsidered before
+ * establishing a formal standard in this area. At the moment, the
+ * decision which Not East Asian (Neutral) characters should be
+ * represented by double-width glyphs cannot yet be answered by
+ * applying a simple rule from the Unicode database content. Setting
+ * up a proper standard for the behavior of UTF-8 character terminals
+ * will require a careful analysis not only of each Unicode character,
+ * but also of each presentation form, something the author of these
+ * routines has avoided to do so far.
+ *
+ * http://www.unicode.org/unicode/reports/tr11/
+ *
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
@@ -24,12 +69,34 @@
# include "config.h"
#endif
-#ifndef HAVE_WC_FUNCS
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+#include <wchar.h>
-#include "mutt.h"
-#include "mbyte.h"
+struct interval {
+ wchar_t first;
+ wchar_t last;
+};
+
+/* auxiliary function for binary search in interval table */
+static int bisearch(wchar_t ucs, const struct interval *table, int max) {
+ int min = 0;
+ int mid;
+
+ if (ucs < table[0].first || ucs > table[max].last)
+ return 0;
+ while (max >= min) {
+ mid = (min + max) / 2;
+ if (ucs > table[mid].last)
+ min = mid + 1;
+ else if (ucs < table[mid].first)
+ max = mid - 1;
+ else
+ return 1;
+ }
+
+ return 0;
+}
-#include <ctype.h>
/* The following two functions define the column width of an ISO 10646
* character as follows:
@@ -67,62 +134,56 @@ int wcwidth_ucs(wchar_t ucs)
{
/* sorted list of non-overlapping intervals of non-spacing characters */
/* generated by "uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B c" */
- static const struct interval {
- wchar_t first;
- wchar_t last;
- } combining[] = {
- { 0x0300, 0x036f }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
- { 0x0591, 0x05bd }, { 0x05bf, 0x05bf }, { 0x05c1, 0x05c2 },
- { 0x05c4, 0x05c5 }, { 0x05c7, 0x05c7 }, { 0x0600, 0x0603 },
- { 0x0610, 0x0615 }, { 0x064b, 0x065e }, { 0x0670, 0x0670 },
- { 0x06d6, 0x06e4 }, { 0x06e7, 0x06e8 }, { 0x06ea, 0x06ed },
- { 0x070f, 0x070f }, { 0x0711, 0x0711 }, { 0x0730, 0x074a },
- { 0x07a6, 0x07b0 }, { 0x07eb, 0x07f3 }, { 0x0901, 0x0902 },
- { 0x093c, 0x093c }, { 0x0941, 0x0948 }, { 0x094d, 0x094d },
+ static const struct interval combining[] = {
+ { 0x0300, 0x036F }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
+ { 0x0591, 0x05BD }, { 0x05BF, 0x05BF }, { 0x05C1, 0x05C2 },
+ { 0x05C4, 0x05C5 }, { 0x05C7, 0x05C7 }, { 0x0600, 0x0603 },
+ { 0x0610, 0x0615 }, { 0x064B, 0x065E }, { 0x0670, 0x0670 },
+ { 0x06D6, 0x06E4 }, { 0x06E7, 0x06E8 }, { 0x06EA, 0x06ED },
+ { 0x070F, 0x070F }, { 0x0711, 0x0711 }, { 0x0730, 0x074A },
+ { 0x07A6, 0x07B0 }, { 0x07EB, 0x07F3 }, { 0x0901, 0x0902 },
+ { 0x093C, 0x093C }, { 0x0941, 0x0948 }, { 0x094D, 0x094D },
{ 0x0951, 0x0954 }, { 0x0962, 0x0963 }, { 0x0981, 0x0981 },
- { 0x09bc, 0x09bc }, { 0x09c1, 0x09c4 }, { 0x09cd, 0x09cd },
- { 0x09e2, 0x09e3 }, { 0x0a01, 0x0a02 }, { 0x0a3c, 0x0a3c },
- { 0x0a41, 0x0a42 }, { 0x0a47, 0x0a48 }, { 0x0a4b, 0x0a4d },
- { 0x0a70, 0x0a71 }, { 0x0a81, 0x0a82 }, { 0x0abc, 0x0abc },
- { 0x0ac1, 0x0ac5 }, { 0x0ac7, 0x0ac8 }, { 0x0acd, 0x0acd },
- { 0x0ae2, 0x0ae3 }, { 0x0b01, 0x0b01 }, { 0x0b3c, 0x0b3c },
- { 0x0b3f, 0x0b3f }, { 0x0b41, 0x0b43 }, { 0x0b4d, 0x0b4d },
- { 0x0b56, 0x0b56 }, { 0x0b82, 0x0b82 }, { 0x0bc0, 0x0bc0 },
- { 0x0bcd, 0x0bcd }, { 0x0c3e, 0x0c40 }, { 0x0c46, 0x0c48 },
- { 0x0c4a, 0x0c4d }, { 0x0c55, 0x0c56 }, { 0x0cbc, 0x0cbc },
- { 0x0cbf, 0x0cbf }, { 0x0cc6, 0x0cc6 }, { 0x0ccc, 0x0ccd },
- { 0x0ce2, 0x0ce3 }, { 0x0d41, 0x0d43 }, { 0x0d4d, 0x0d4d },
- { 0x0dca, 0x0dca }, { 0x0dd2, 0x0dd4 }, { 0x0dd6, 0x0dd6 },
- { 0x0e31, 0x0e31 }, { 0x0e34, 0x0e3a }, { 0x0e47, 0x0e4e },
- { 0x0eb1, 0x0eb1 }, { 0x0eb4, 0x0eb9 }, { 0x0ebb, 0x0ebc },
- { 0x0ec8, 0x0ecd }, { 0x0f18, 0x0f19 }, { 0x0f35, 0x0f35 },
- { 0x0f37, 0x0f37 }, { 0x0f39, 0x0f39 }, { 0x0f71, 0x0f7e },
- { 0x0f80, 0x0f84 }, { 0x0f86, 0x0f87 }, { 0x0f90, 0x0f97 },
- { 0x0f99, 0x0fbc }, { 0x0fc6, 0x0fc6 }, { 0x102d, 0x1030 },
+ { 0x09BC, 0x09BC }, { 0x09C1, 0x09C4 }, { 0x09CD, 0x09CD },
+ { 0x09E2, 0x09E3 }, { 0x0A01, 0x0A02 }, { 0x0A3C, 0x0A3C },
+ { 0x0A41, 0x0A42 }, { 0x0A47, 0x0A48 }, { 0x0A4B, 0x0A4D },
+ { 0x0A70, 0x0A71 }, { 0x0A81, 0x0A82 }, { 0x0ABC, 0x0ABC },
+ { 0x0AC1, 0x0AC5 }, { 0x0AC7, 0x0AC8 }, { 0x0ACD, 0x0ACD },
+ { 0x0AE2, 0x0AE3 }, { 0x0B01, 0x0B01 }, { 0x0B3C, 0x0B3C },
+ { 0x0B3F, 0x0B3F }, { 0x0B41, 0x0B43 }, { 0x0B4D, 0x0B4D },
+ { 0x0B56, 0x0B56 }, { 0x0B82, 0x0B82 }, { 0x0BC0, 0x0BC0 },
+ { 0x0BCD, 0x0BCD }, { 0x0C3E, 0x0C40 }, { 0x0C46, 0x0C48 },
+ { 0x0C4A, 0x0C4D }, { 0x0C55, 0x0C56 }, { 0x0CBC, 0x0CBC },
+ { 0x0CBF, 0x0CBF }, { 0x0CC6, 0x0CC6 }, { 0x0CCC, 0x0CCD },
+ { 0x0CE2, 0x0CE3 }, { 0x0D41, 0x0D43 }, { 0x0D4D, 0x0D4D },
+ { 0x0DCA, 0x0DCA }, { 0x0DD2, 0x0DD4 }, { 0x0DD6, 0x0DD6 },
+ { 0x0E31, 0x0E31 }, { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E },
+ { 0x0EB1, 0x0EB1 }, { 0x0EB4, 0x0EB9 }, { 0x0EBB, 0x0EBC },
+ { 0x0EC8, 0x0ECD }, { 0x0F18, 0x0F19 }, { 0x0F35, 0x0F35 },
+ { 0x0F37, 0x0F37 }, { 0x0F39, 0x0F39 }, { 0x0F71, 0x0F7E },
+ { 0x0F80, 0x0F84 }, { 0x0F86, 0x0F87 }, { 0x0F90, 0x0F97 },
+ { 0x0F99, 0x0FBC }, { 0x0FC6, 0x0FC6 }, { 0x102D, 0x1030 },
{ 0x1032, 0x1032 }, { 0x1036, 0x1037 }, { 0x1039, 0x1039 },
- { 0x1058, 0x1059 }, { 0x1160, 0x11ff }, { 0x135f, 0x135f },
+ { 0x1058, 0x1059 }, { 0x1160, 0x11FF }, { 0x135F, 0x135F },
{ 0x1712, 0x1714 }, { 0x1732, 0x1734 }, { 0x1752, 0x1753 },
- { 0x1772, 0x1773 }, { 0x17b4, 0x17b5 }, { 0x17b7, 0x17bd },
- { 0x17c6, 0x17c6 }, { 0x17c9, 0x17d3 }, { 0x17dd, 0x17dd },
- { 0x180b, 0x180d }, { 0x18a9, 0x18a9 }, { 0x1920, 0x1922 },
- { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193b },
- { 0x1a17, 0x1a18 }, { 0x1b00, 0x1b03 }, { 0x1b34, 0x1b34 },
- { 0x1b36, 0x1b3a }, { 0x1b3c, 0x1b3c }, { 0x1b42, 0x1b42 },
- { 0x1b6b, 0x1b73 }, { 0x1dc0, 0x1dca }, { 0x1dfe, 0x1dff },
- { 0x200b, 0x200f }, { 0x202a, 0x202e }, { 0x2060, 0x2063 },
- { 0x206a, 0x206f }, { 0x20d0, 0x20ef }, { 0x302a, 0x302f },
- { 0x3099, 0x309a }, { 0xa806, 0xa806 }, { 0xa80b, 0xa80b },
- { 0xa825, 0xa826 }, { 0xfb1e, 0xfb1e }, { 0xfe00, 0xfe0f },
- { 0xfe20, 0xfe23 }, { 0xfeff, 0xfeff }, { 0xfff9, 0xfffb },
- { 0x10a01, 0x10a03 }, { 0x10a05, 0x10a06 }, { 0x10a0c, 0x10a0f },
- { 0x10a38, 0x10a3a }, { 0x10a3f, 0x10a3f }, { 0x1d167, 0x1d169 },
- { 0x1d173, 0x1d182 }, { 0x1d185, 0x1d18b }, { 0x1d1aa, 0x1d1ad },
- { 0x1d242, 0x1d244 }, { 0xe0001, 0xe0001 }, { 0xe0020, 0xe007f },
- { 0xe0100, 0xe01ef }
+ { 0x1772, 0x1773 }, { 0x17B4, 0x17B5 }, { 0x17B7, 0x17BD },
+ { 0x17C6, 0x17C6 }, { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD },
+ { 0x180B, 0x180D }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 },
+ { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193B },
+ { 0x1A17, 0x1A18 }, { 0x1B00, 0x1B03 }, { 0x1B34, 0x1B34 },
+ { 0x1B36, 0x1B3A }, { 0x1B3C, 0x1B3C }, { 0x1B42, 0x1B42 },
+ { 0x1B6B, 0x1B73 }, { 0x1DC0, 0x1DCA }, { 0x1DFE, 0x1DFF },
+ { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2063 },
+ { 0x206A, 0x206F }, { 0x20D0, 0x20EF }, { 0x302A, 0x302F },
+ { 0x3099, 0x309A }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B },
+ { 0xA825, 0xA826 }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F },
+ { 0xFE20, 0xFE23 }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB },
+ { 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }, { 0x10A0C, 0x10A0F },
+ { 0x10A38, 0x10A3A }, { 0x10A3F, 0x10A3F }, { 0x1D167, 0x1D169 },
+ { 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
+ { 0x1D242, 0x1D244 }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F },
+ { 0xE0100, 0xE01EF }
};
- int min = 0;
- int max = sizeof(combining) / sizeof(struct interval) - 1;
- int mid;
/* test for 8-bit control characters */
if (ucs == 0)
@@ -130,20 +191,10 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0))
return -1;
- /* first quick check for Latin-1 etc. characters */
- if (ucs < combining[0].first)
- return 1;
-
/* binary search in table of non-spacing characters */
- while (max >= min) {
- mid = (min + max) / 2;
- if (combining[mid].last < ucs)
- min = mid + 1;
- else if (combining[mid].first > ucs)
- max = mid - 1;
- else if (combining[mid].first <= ucs && combining[mid].last >= ucs)
- return 0;
- }
+ if (bisearch(ucs, combining,
+ sizeof(combining) / sizeof(struct interval) - 1))
+ return 0;
/* if we arrive here, ucs is not a combining or C0/C1 control character */
@@ -151,7 +202,7 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 0x1100)
return 1;
- return 1 +
+ return 1 +
(ucs >= 0x1100 &&
(ucs <= 0x115f || /* Hangul Jamo init. consonants */
ucs == 0x2329 || ucs == 0x232a ||
@@ -167,15 +218,120 @@ int wcwidth_ucs(wchar_t ucs)
(ucs >= 0x30000 && ucs <= 0x3fffd)));
}
-#endif /* !HAVE_WC_FUNCS */
+#if 0 /* original */
+int wcswidth_ucs(const wchar_t *pwcs, size_t n)
+{
+ int w, width = 0;
+
+ for (;*pwcs && n-- > 0; pwcs++)
+ if ((w = wcwidth_ucs(*pwcs)) < 0)
+ return -1;
+ else
+ width += w;
+
+ return width;
+}
+#endif
+
+/*
+ * The following functions are the same as wcwidth_ucs() and
+ * wcswidth_ucs(), except that spacing characters in the East Asian
+ * Ambiguous (A) category as defined in Unicode Technical Report #11
+ * have a column width of 2. This variant might be useful for users of
+ * CJK legacy encodings who want to migrate to UCS without changing
+ * the traditional terminal character-width behaviour. It is not
+ * otherwise recommended for general use.
+ */
+/*
+ * In addition to the above,
+ * several characters in the East Asian Narrow (Na) and Not East Asian
+ * (Neutral) categories as defined in Unicode Technical Report #11
+ * actually have a column width of 2 in CJK legacy encodings.
+ */
+int wcwidth_cjk(wchar_t ucs)
+{
+ /* sorted list of non-overlapping intervals of East Asian Ambiguous
+ * characters, generated by "uniset +WIDTH-A -cat=Me -cat=Mn -cat=Cf c" */
+ static const struct interval ambiguous[] = {
+ { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 },
+ { 0x00AA, 0x00AA }, { 0x00AE, 0x00AE }, { 0x00B0, 0x00B4 },
+ { 0x00B6, 0x00BA }, { 0x00BC, 0x00BF }, { 0x00C6, 0x00C6 },
+ { 0x00D0, 0x00D0 }, { 0x00D7, 0x00D8 }, { 0x00DE, 0x00E1 },
+ { 0x00E6, 0x00E6 }, { 0x00E8, 0x00EA }, { 0x00EC, 0x00ED },
+ { 0x00F0, 0x00F0 }, { 0x00F2, 0x00F3 }, { 0x00F7, 0x00FA },
+ { 0x00FC, 0x00FC }, { 0x00FE, 0x00FE }, { 0x0101, 0x0101 },
+ { 0x0111, 0x0111 }, { 0x0113, 0x0113 }, { 0x011B, 0x011B },
+ { 0x0126, 0x0127 }, { 0x012B, 0x012B }, { 0x0131, 0x0133 },
+ { 0x0138, 0x0138 }, { 0x013F, 0x0142 }, { 0x0144, 0x0144 },
+ { 0x0148, 0x014B }, { 0x014D, 0x014D }, { 0x0152, 0x0153 },
+ { 0x0166, 0x0167 }, { 0x016B, 0x016B }, { 0x01CE, 0x01CE },
+ { 0x01D0, 0x01D0 }, { 0x01D2, 0x01D2 }, { 0x01D4, 0x01D4 },
+ { 0x01D6, 0x01D6 }, { 0x01D8, 0x01D8 }, { 0x01DA, 0x01DA },
+ { 0x01DC, 0x01DC }, { 0x0251, 0x0251 }, { 0x0261, 0x0261 },
+ { 0x02C4, 0x02C4 }, { 0x02C7, 0x02C7 }, { 0x02C9, 0x02CB },
+ { 0x02CD, 0x02CD }, { 0x02D0, 0x02D0 }, { 0x02D8, 0x02DB },
+ { 0x02DD, 0x02DD }, { 0x02DF, 0x02DF }, { 0x0391, 0x03A1 },
+ { 0x03A3, 0x03A9 }, { 0x03B1, 0x03C1 }, { 0x03C3, 0x03C9 },
+ { 0x0401, 0x0401 }, { 0x0410, 0x044F }, { 0x0451, 0x0451 },
+ { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
+ { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
+ { 0x2030, 0x2030 }, { 0x2032, 0x2033 }, { 0x2035, 0x2035 },
+ { 0x203B, 0x203B }, { 0x203E, 0x203E }, { 0x2074, 0x2074 },
+ { 0x207F, 0x207F }, { 0x2081, 0x2084 }, { 0x20AC, 0x20AC },
+ { 0x2103, 0x2103 }, { 0x2105, 0x2105 }, { 0x2109, 0x2109 },
+ { 0x2113, 0x2113 }, { 0x2116, 0x2116 }, { 0x2121, 0x2122 },
+ { 0x2126, 0x2126 }, { 0x212B, 0x212B }, { 0x2153, 0x2154 },
+ { 0x215B, 0x215E }, { 0x2160, 0x216B }, { 0x2170, 0x2179 },
+ { 0x2190, 0x2199 }, { 0x21B8, 0x21B9 }, { 0x21D2, 0x21D2 },
+ { 0x21D4, 0x21D4 }, { 0x21E7, 0x21E7 }, { 0x2200, 0x2200 },
+ { 0x2202, 0x2203 }, { 0x2207, 0x2208 }, { 0x220B, 0x220B },
+ { 0x220F, 0x220F }, { 0x2211, 0x2211 }, { 0x2215, 0x2215 },
+ { 0x221A, 0x221A }, { 0x221D, 0x2220 }, { 0x2223, 0x2223 },
+ { 0x2225, 0x2225 }, { 0x2227, 0x222C }, { 0x222E, 0x222E },
+ { 0x2234, 0x2237 }, { 0x223C, 0x223D }, { 0x2248, 0x2248 },
+ { 0x224C, 0x224C }, { 0x2252, 0x2252 }, { 0x2260, 0x2261 },
+ { 0x2264, 0x2267 }, { 0x226A, 0x226B }, { 0x226E, 0x226F },
+ { 0x2282, 0x2283 }, { 0x2286, 0x2287 }, { 0x2295, 0x2295 },
+ { 0x2299, 0x2299 }, { 0x22A5, 0x22A5 }, { 0x22BF, 0x22BF },
+ { 0x2312, 0x2312 }, { 0x2460, 0x24E9 }, { 0x24EB, 0x254B },
+ { 0x2550, 0x2573 }, { 0x2580, 0x258F }, { 0x2592, 0x2595 },
+ { 0x25A0, 0x25A1 }, { 0x25A3, 0x25A9 }, { 0x25B2, 0x25B3 },
+ { 0x25B6, 0x25B7 }, { 0x25BC, 0x25BD }, { 0x25C0, 0x25C1 },
+ { 0x25C6, 0x25C8 }, { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 },
+ { 0x25E2, 0x25E5 }, { 0x25EF, 0x25EF }, { 0x2605, 0x2606 },
+ { 0x2609, 0x2609 }, { 0x260E, 0x260F }, { 0x2614, 0x2615 },
+ { 0x261C, 0x261C }, { 0x261E, 0x261E }, { 0x2640, 0x2640 },
+ { 0x2642, 0x2642 }, { 0x2660, 0x2661 }, { 0x2663, 0x2665 },
+ { 0x2667, 0x266A }, { 0x266C, 0x266D }, { 0x266F, 0x266F },
+ { 0x273D, 0x273D }, { 0x2776, 0x277F }, { 0xE000, 0xF8FF },
+ { 0xFFFD, 0xFFFD }, { 0xF0000, 0xFFFFD }, { 0x100000, 0x10FFFD }
+ };
+
+ /* For Japanese legacy encodings, the following characters are added. */
+ static const struct interval legacy_ja[] = {
+ { 0x00A2, 0x00A3 }, { 0x00A5, 0x00A6 }, { 0x00AC, 0x00AC },
+ { 0x00AF, 0x00AF }, { 0x2212, 0x2212 }
+ };
+
+ /* binary search in the tables of ambiguous and legacy Japanese wide characters */
+ if (bisearch(ucs, ambiguous,
+ sizeof(ambiguous) / sizeof(struct interval) - 1))
+ return 2;
+ if (bisearch(ucs, legacy_ja,
+ sizeof(legacy_ja) / sizeof(struct interval) - 1))
+ return 2;
+
+ return wcwidth_ucs(ucs);
+}
+
#if 0 /* original */
-int wcswidth(const wchar_t *pwcs, size_t n)
+int wcswidth_cjk(const wchar_t *pwcs, size_t n)
{
int w, width = 0;
for (;*pwcs && n-- > 0; pwcs++)
- if ((w = wcwidth(*pwcs)) < 0)
+ if ((w = wcwidth_cjk(*pwcs)) < 0)
return -1;
else
width += w;
@@ -183,3 +339,4 @@ int wcswidth(const wchar_t *pwcs, size_t n)
return width;
}
#endif
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
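The wcwidth_cjk() function above reduces to an interval lookup. If a code point falls inside one of the sorted ambiguous[] (or legacy_ja[]) ranges, it gets width 2; otherwise wcwidth_ucs() decides. What follows is a minimal, self-contained sketch of that lookup, not the patch's own code. in_intervals() is a hypothetical stand-in for the bisearch() helper the patch calls, whose definition does not appear in this hunk. The table is a four-entry excerpt of ambiguous[]. sketch_width() returns 1 wherever the real code would fall back to wcwidth_ucs().

```c
#include <assert.h>
#include <wchar.h>

/* Same layout as the interval tables in wcwidth.c. */
struct interval {
  wchar_t first;
  wchar_t last;
};

/*
 * Hypothetical stand-in for the patch's bisearch(): returns 1 if ucs
 * lies inside one of the sorted, non-overlapping intervals
 * table[0] .. table[max], and 0 otherwise.
 */
static int in_intervals (wchar_t ucs, const struct interval *table, int max)
{
  int min = 0;

  assert (max >= 0);
  if (ucs < table[0].first || ucs > table[max].last)
    return 0;                   /* outside the whole table */
  while (max >= min)
  {
    int mid = (min + max) / 2;
    if (ucs > table[mid].last)
      min = mid + 1;            /* continue in the upper half */
    else if (ucs < table[mid].first)
      max = mid - 1;            /* continue in the lower half */
    else
      return 1;                 /* table[mid].first <= ucs <= table[mid].last */
  }
  return 0;
}

/* Four intervals copied from ambiguous[] (an excerpt, not the full table). */
static const struct interval ambiguous_excerpt[] = {
  { 0x00A7, 0x00A8 }, { 0x24EB, 0x254B }, { 0x25CE, 0x25D1 },
  { 0x2605, 0x2606 }
};

/*
 * Simplified model of wcwidth_cjk(): an ambiguous character is 2 columns
 * wide when cjk_width is on.  In every other case this sketch returns 1,
 * where the real code falls back to wcwidth_ucs().
 */
static int sketch_width (wchar_t ucs, int cjk_width)
{
  int max = (int) (sizeof (ambiguous_excerpt) / sizeof (ambiguous_excerpt[0])) - 1;

  if (cjk_width && in_intervals (ucs, ambiguous_excerpt, max))
    return 2;
  return 1;
}
```

For example, sketch_width (0x25CE, 1) for U+25CE ("◎") yields 2 and sketch_width (0x25CE, 0) yields 1. This is the difference that $cjk_width makes on a UTF-8 terminal, because wcwidth_ucs() reports 1 for that character.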
diff --git a/PATCHES b/PATCHES
index e69de29b..17743fd9 100644
--- a/PATCHES
+++ b/PATCHES
@@ -0,0 +1,5 @@
+patch-1.5.23.tt+yy.delete_prefix.1
+patch-1.5.23.tt.create_rfc2047_params.1
+patch-1.5.23.tt.sanitize_ja.1
+patch-1.5.23.tt.cjk_width_tree_chars.1
+patch-1.5.23.tt.wcwidth.1
diff --git a/charset.c b/charset.c
index 759c2052..cb508973 100644
--- a/charset.c
+++ b/charset.c
@@ -482,6 +482,9 @@ int mutt_convert_string (char **ps, const char *from, const char *to, int flags)
if (!s || !*s)
return 0;
+ if (option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars (s, mutt_strlen(s), 0);
+
if (to && from && (cd = mutt_iconv_open (to, from, flags)) != (iconv_t)-1)
{
int len;
@@ -678,3 +681,188 @@ int mutt_check_charset (const char *s, int strict)
return -1;
}
+
+/*
+ * mutt_sanitize_ja_chars()
+ * Adapted by TAKIZAWA Takashi <taki@cyber.email.ne.jp>
+ *
+ * - It replaces undefined KANJI characters with the GETA mark.
+ * - It replaces JIS X 0201 kana characters with '?'.
+ * - If $charset is EUC-JP, it replaces the third character 'J' of
+ * the escape sequence switching to 'JIS X 0201 latin' with 'B',
+ * which designates 'US-ASCII'.
+ * - If $charset is Shift_JIS, it replaces the third character 'B' of
+ * the escape sequence switching to 'US-ASCII' with 'J', which
+ * designates 'JIS X 0201 latin'.
+ */
+
+#define ASCII 0
+#define JISX0201LATIN 1
+#define JISX0201KANA 2
+#define JISX0208 3
+#define OTHER_CS 4
+
+void mutt_sanitize_ja_chars(char *s, size_t len, int keep_state)
+{
+ static int cs = ASCII;
+ static int kanji_cont = 0;
+ static int illegal_kanji = 0;
+ static int es = 0;
+ static char pes = '\0';
+ static char ascii_3rd_char = 'B';
+ static char jisx0201_3rd_char = 'J';
+
+ char *p = s;
+ char *p1 = NULL;
+ unsigned char c;
+
+ if (!keep_state || *p == 0x1b) /* consideration about mbstate's buffer */
+ {
+ if (!ascii_strcasecmp (Charset, "euc-jp"))
+ jisx0201_3rd_char = 'B';
+ else if (!ascii_strcasecmp (Charset, "shift_jis"))
+ ascii_3rd_char = 'J';
+ cs = ASCII;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ es = 0;
+ pes = '\0';
+ }
+
+ for (;p - s < len;p++)
+ {
+ if (es == 0)
+ {
+ if (*p == 0x1b)
+ es++;
+ else
+ {
+ switch (cs)
+ {
+ case ASCII:
+ case JISX0201LATIN:
+ break;
+ case JISX0201KANA:
+ *p = '?';
+ break;
+ case JISX0208:
+ /* replace characters in rows (ku) 9-15 and 85 or above with the GETA MARK */
+ c = (unsigned char)*p;
+ if (! kanji_cont)
+ {
+ if ((size_t)(p - s + 1) == len)
+ return; /* the buffer ends with the first byte of a KANJI character */
+ if (c <= 0x20 || (c >= 0x29 && c <= 0x2f)
+ || (c >= 0x75 && c <= 0xa0))
+ illegal_kanji = 1;
+ kanji_cont = 1;
+ p1 = p;
+ }
+ else
+ {
+ if (c <= 0x20 || c >= 0x7f)
+ illegal_kanji = 1;
+ if (illegal_kanji && p1)
+ *p1 = 0x22, *p = 0x2e;
+ kanji_cont = 0;
+ illegal_kanji = 0;
+ }
+ break;
+ }
+ }
+ }
+ else if (es == 1)
+ {
+ if (*p == '$' || (*p >= '(' && *p <= '/' && *p != ','))
+ {
+ es++;
+ pes = *p;
+ }
+ else
+ {
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else if (es == 2)
+ {
+ if (pes == '(')
+ {
+ switch (*p)
+ {
+ case 'B':
+ cs = ASCII, *p = ascii_3rd_char;
+ break;
+ case 'J':
+ cs = JISX0201LATIN, *p = jisx0201_3rd_char;
+ break;
+ case 'I':
+ /* switch to ASCII; the following kana bytes become '?' */
+ cs = JISX0201KANA, *p = ascii_3rd_char;
+ break;
+ default:
+ cs = OTHER_CS;
+ }
+ es = 0;
+ }
+ else if (pes == '$')
+ {
+ switch (*p)
+ {
+ case '@': /* JIS X 0208-1978 */
+ case 'B': /* JIS X 0208-1983 */
+ cs = JISX0208;
+ es = 0;
+ break;
+ case 'A':
+ cs = OTHER_CS; /* GB 2312 */
+ es = 0;
+ break;
+ case '(':
+ case ')':
+ case '*':
+ case '+':
+ case '-':
+ case '.':
+ case '/':
+ es++;
+ break;
+ default:
+ es = 0;
+ return; /* broken */
+ }
+ }
+ else
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+ else /* es == 3 */
+ {
+ cs = OTHER_CS;
+ es = 0;
+ }
+ }
+}
+
+int mutt_copy_bytes_sanitize_ja (FILE *in, FILE *out, size_t size)
+{
+ char buf[2048];
+ size_t chunk;
+
+ mutt_sanitize_ja_chars (NULL, 0, 0);
+ while (size > 0)
+ {
+ chunk = (size > sizeof (buf)) ? sizeof (buf) : size;
+ if ((chunk = fread (buf, 1, chunk, in)) < 1)
+ break;
+ mutt_sanitize_ja_chars (buf, chunk, 1);
+ if (fwrite (buf, 1, chunk, out) != chunk)
+ return (-1);
+ size -= chunk;
+ }
+
+ return 0;
+}
+
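mutt_sanitize_ja_chars() decides whether a JIS X 0208 character is illegal mainly from the row encoded in its first byte. Inside an ESC $ B section, each character is two 7-bit bytes, and the first byte minus 0x20 is the row (ku) number. The second byte gets only a range check. The sketch below isolates that test and the replacement with the GETA mark (row 2, cell 14, bytes 0x22 0x2E). It assumes the caller has already located one complete two-byte character. Unlike the real function, it does not follow escape sequences or carry state across buffers, and sanitize_pair() is a hypothetical name.

```c
#include <assert.h>

/* GETA MARK (U+3013) is JIS X 0208 row 2, cell 14: bytes 0x22 0x2E. */
#define GETA_BYTE1 0x22
#define GETA_BYTE2 0x2E

/*
 * Same first-byte test as mutt_sanitize_ja_chars(): row = byte - 0x20.
 * Bytes <= 0x20, rows 9-15 (0x29-0x2F), and row 85 and above
 * (0x75-0xA0) are treated as illegal.
 */
static int jisx0208_lead_is_illegal (unsigned char c)
{
  return c <= 0x20 || (c >= 0x29 && c <= 0x2f) || (c >= 0x75 && c <= 0xa0);
}

/* Hypothetical helper: replace one illegal two-byte character in place. */
static void sanitize_pair (unsigned char pair[2])
{
  int illegal = jisx0208_lead_is_illegal (pair[0])
                || pair[1] <= 0x20 || pair[1] >= 0x7f;

  if (illegal)
  {
    pair[0] = GETA_BYTE1;
    pair[1] = GETA_BYTE2;
  }
}
```

Row 13, for instance, holds the NEC special characters (circled digits and similar) that MS-Windows mailers emit. Row 16 (first byte 0x30) is the first kanji row and passes unchanged.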
diff --git a/charset.h b/charset.h
index 54891f0e..d67b209c 100644
--- a/charset.h
+++ b/charset.h
@@ -36,6 +36,9 @@ int iconv_close (iconv_t);
int mutt_convert_string (char **, const char *, const char *, int);
+void mutt_sanitize_ja_chars (char *, size_t, int);
+int mutt_copy_bytes_sanitize_ja (FILE *, FILE *, size_t);
+
iconv_t mutt_iconv_open (const char *, const char *, int);
size_t mutt_iconv (iconv_t, ICONV_CONST char **, size_t *, char **, size_t *, ICONV_CONST char **, const char *);
diff --git a/configure.ac b/configure.ac
index b07fade0..01adf31f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1454,6 +1454,16 @@ fi
dnl -- locales --
+AC_ARG_ENABLE(cjk-ambiguous-width, AC_HELP_STRING([--enable-cjk-ambiguous-width], [ Enable East Asian Ambiguous characters support (using own wcwidth)]),
+ [ if test "x$enableval" = "xyes" ; then
+ cjk_width=yes
+ fi
+ ])
+if test "x$cjk_width" = "xyes" ; then
+ AC_DEFINE(USE_CJK_WIDTH,1,[ Define if you want to support East Asian Ambiguous class. ])
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+fi
+
AC_CHECK_HEADERS(wchar.h)
AC_CACHE_CHECK([for wchar_t], mutt_cv_wchar_t,
@@ -1524,7 +1534,10 @@ fi
if test $wc_funcs = yes; then
AC_DEFINE(HAVE_WC_FUNCS,1,[ Define if you are using the system's wchar_t functions. ])
else
- MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o"
+ if test "x$cjk_width" != "xyes"; then
+ MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS wcwidth.o"
+ fi
fi
AC_CACHE_CHECK([for nl_langinfo and CODESET], mutt_cv_langinfo_codeset,
diff --git a/curs_lib.c b/curs_lib.c
index cb43f9e7..7831a1a8 100644
--- a/curs_lib.c
+++ b/curs_lib.c
@@ -1223,7 +1223,14 @@ void mutt_format_string (char *dest, size_t destlen,
wc = replacement_char ();
}
if (arboreal && wc < MUTT_TREE_MAX)
- w = 1; /* hack */
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w = wcwidth (TreeChars[wc]);
+ else
+#endif
+ w = 1;
+ }
else
{
#ifdef HAVE_ISWBLANK
@@ -1452,10 +1459,12 @@ int mutt_strwidth (const char *s)
int w;
size_t k, n;
mbstate_t mbstate;
+ int arboreal;
if (!s) return 0;
n = mutt_strlen (s);
+ arboreal = ((unsigned char) s[0] < MUTT_TREE_MAX) ? 1 : 0;
memset (&mbstate, 0, sizeof (mbstate));
for (w=0; n && (k = mbrtowc (&wc, s, n, &mbstate)); s += k, n -= k)
@@ -1467,9 +1476,21 @@ int mutt_strwidth (const char *s)
k = (k == (size_t)(-1)) ? 1 : n;
wc = replacement_char ();
}
- if (!IsWPrint (wc))
- wc = '?';
- w += wcwidth (wc);
+ if (wc < MUTT_TREE_MAX && arboreal && k == 1)
+ {
+#ifdef USE_CJK_WIDTH
+ if (Charset_is_utf8 && option (OPTCJKWIDTHTREECHARS) && !option (OPTASCIICHARS))
+ w += wcwidth (TreeChars[wc]);
+ else
+#endif
+ w++;
+ }
+ else
+ {
+ if (!IsWPrint (wc))
+ wc = '?';
+ w += wcwidth (wc);
+ }
}
return w;
}
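With the curs_lib.c changes, thread-tree codes (bytes below MUTT_TREE_MAX) no longer count as a fixed 1 column. Instead they count as the width of the corresponding TreeChars[] line-drawing character. This applies when the charset is UTF-8, $cjk_width_tree_chars is set, and $ascii_chars is unset. The sketch below models only the width arithmetic. tree_chars[] copies the TreeChars[] table added in the globals.h hunk, and SKETCH_TREE_MAX is assumed to equal MUTT_TREE_MAX (14). sketch_wcwidth() is a toy width function that treats only U+2500..U+254B as wide; all of those code points lie inside the ambiguous[] table.

```c
#include <assert.h>
#include <wchar.h>

/* Copy of the TreeChars[] table from globals.h; index = tree code. */
static const wchar_t tree_chars[] = {
  0xFEFF, 0x2514, 0x250C, 0x251C, 0x2500, 0x2502, 0x0020,
  0x003E, 0x002A, 0x0026, 0x003D, 0x252C, 0x2534, 0x003F
};
#define SKETCH_TREE_MAX 14      /* assumed to equal MUTT_TREE_MAX */

/* Toy width function: only U+2500..U+254B (ambiguous) can be wide. */
static int sketch_wcwidth (wchar_t wc, int cjk_width)
{
  if (cjk_width && wc >= 0x2500 && wc <= 0x254B)
    return 2;
  return 1;
}

/*
 * Column width of a string of tree codes, following the patched
 * mutt_strwidth(): a tree code is measured through tree_chars[] when
 * cjk_tree_chars is on, and counts as 1 column otherwise.
 */
static int tree_width (const char *s, int cjk_width, int cjk_tree_chars)
{
  int w = 0;

  for (; *s; s++)
  {
    unsigned char c = (unsigned char) *s;

    if (c < SKETCH_TREE_MAX)
      w += cjk_tree_chars ? sketch_wcwidth (tree_chars[c], cjk_width) : 1;
    else
      w += 1;                   /* this sketch assumes plain ASCII otherwise */
  }
  return w;
}
```

With both options on, the prefix "\001\004\007" (LLCORNER, HLINE, RARROW) measures 2 + 2 + 1 = 5 columns instead of 3. Mutt's column count must match what a CJK-width terminal actually draws, or the index columns will not line up.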
diff --git a/doc/makedoc-defs.h b/doc/makedoc-defs.h
index 78a4ebc0..dd872baa 100644
--- a/doc/makedoc-defs.h
+++ b/doc/makedoc-defs.h
@@ -31,10 +31,10 @@
# ifndef USE_SOCKET
# define USE_SOCKET
# endif
-# ifndef USE_DOTLOCK
+# if !defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
# define USE_DOTLOCK
# endif
-# ifndef DL_STANDALONE
+# if !defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define DL_STANDALONE
# endif
# ifndef USE_HCACHE
diff --git a/dotlock.c b/dotlock.c
index 5e3086f9..dda9fc36 100644
--- a/dotlock.c
+++ b/dotlock.c
@@ -52,13 +52,13 @@
#include <getopt.h>
#endif
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# include "reldate.h"
#endif
#define MAXLINKS 1024 /* maximum link depth */
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
# define LONG_STRING 1024
# define MAXLOCKATTEMPT 5
@@ -96,7 +96,7 @@ extern int snprintf (char *, size_t, const char *, ...);
static int DotlockFlags;
static int Retry = MAXLOCKATTEMPT;
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static char *Hostname;
#endif
@@ -110,7 +110,7 @@ static int dotlock_prepare (char *, size_t, const char *, int fd);
static int dotlock_check_stats (struct stat *, struct stat *);
static int dotlock_dispatch (const char *, int fd);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int dotlock_init_privs (void);
static void usage (const char *);
#endif
@@ -130,7 +130,7 @@ static int dotlock_unlink (const char *);
static int dotlock_lock (const char *);
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
#define check_flags(a) if (a & DL_FL_ACTIONS) usage (argv[0])
@@ -327,7 +327,7 @@ END_PRIVILEGED (void)
#endif
}
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
/*
* Usage information.
diff --git a/globals.h b/globals.h
index facb2ea8..e5cece8a 100644
--- a/globals.h
+++ b/globals.h
@@ -25,7 +25,7 @@ WHERE char Errorbuf[STRING];
WHERE char AttachmentMarker[STRING];
WHERE char ProtectedHeaderMarker[STRING];
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
WHERE char *MuttDotlock;
#endif
@@ -306,9 +306,31 @@ const char * const Months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
const char * const BodyTypes[] = { "x-unknown", "audio", "application", "image", "message", "model", "multipart", "text", "video" };
const char * const BodyEncodings[] = { "x-unknown", "7bit", "8bit", "quoted-printable", "base64", "binary", "x-uuencoded" };
+#ifdef USE_CJK_WIDTH
+const wchar_t TreeChars[] =
+{
+ 0xFEFF, /* not used */
+ 0x2514, /* MUTT_TREE_LLCORNER WACS_LLCORNER */
+ 0x250C, /* MUTT_TREE_ULCORNER WACS_ULCORNER */
+ 0x251C, /* MUTT_TREE_LTEE WACS_LTEE */
+ 0x2500, /* MUTT_TREE_HLINE WACS_HLINE */
+ 0x2502, /* MUTT_TREE_VLINE WACS_VLINE */
+ 0x0020, /* MUTT_TREE_SPACE */
+ 0x003E, /* MUTT_TREE_RARROW */
+ 0x002A, /* MUTT_TREE_STAR fake thread indicator */
+ 0x0026, /* MUTT_TREE_HIDDEN */
+ 0x003D, /* MUTT_TREE_EQUALS */
+ 0x252C, /* MUTT_TREE_TTEE WACS_TTEE */
+ 0x2534, /* MUTT_TREE_BTEE WACS_BTEE */
+ 0x003F /* MUTT_TREE_MISSING */
+};
+#endif /* USE_CJK_WIDTH */
#else
extern const char * const Weekdays[];
extern const char * const Months[];
+#ifdef USE_CJK_WIDTH
+extern const wchar_t TreeChars[];
+#endif /* USE_CJK_WIDTH */
#endif
#ifdef MAIN_C
diff --git a/handler.c b/handler.c
index 2c7016ce..5602c0e6 100644
--- a/handler.c
+++ b/handler.c
@@ -100,6 +100,9 @@ static void mutt_convert_to_state(iconv_t cd, char *bufi, size_t *l, STATE *s)
return;
}
+ if (option (OPTSANITIZEJACHARS) && memchr (bufi, 0x1b, *l))
+ mutt_sanitize_ja_chars (bufi, *l, 1);
+
ib = bufi, ibl = *l;
for (;;)
{
@@ -1315,6 +1318,7 @@ static int autoview_handler (BODY *a, STATE *s)
int piped = FALSE;
pid_t thepid;
int rc = 0;
+ char *charset;
snprintf (type, sizeof (type), "%s/%s", TYPE (a), a->subtype);
rfc1524_mailcap_lookup (a, type, entry, MUTT_AUTOVIEW);
@@ -1345,7 +1349,11 @@ static int autoview_handler (BODY *a, STATE *s)
return -1;
}
- mutt_copy_bytes (s->fpin, fpin, a->length);
+ charset = mutt_get_parameter ("charset", a->parameter);
+ if (charset && option (OPTSANITIZEJACHARS) && !ascii_strncasecmp (charset,"iso-2022-jp", 11))
+ mutt_copy_bytes_sanitize_ja (s->fpin, fpin, a->length);
+ else
+ mutt_copy_bytes (s->fpin, fpin, a->length);
if (!piped)
{
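Two hunks select the sanitizing path: the autoview_handler() hunk here and the mutt_convert_string() hunk in charset.c. Both use an 11-byte, case-insensitive prefix comparison against "iso-2022-jp", so variants such as "ISO-2022-JP-2" also match. The sketch below reproduces that predicate, substituting POSIX strncasecmp() for mutt's ascii_strncasecmp(). wants_ja_sanitize() is a hypothetical name, and its sanitize_ja_chars argument stands in for option (OPTSANITIZEJACHARS).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical predicate with the same shape as the patch's test:
 * charset present, option set, and the first 11 bytes equal to
 * "iso-2022-jp" ignoring case.
 */
static int wants_ja_sanitize (const char *charset, int sanitize_ja_chars)
{
  return charset != NULL && sanitize_ja_chars
         && strncasecmp (charset, "iso-2022-jp", 11) == 0;
}
```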
diff --git a/hdrline.c b/hdrline.c
index c83714f9..461564b4 100644
--- a/hdrline.c
+++ b/hdrline.c
@@ -272,6 +272,7 @@ hdr_format_str (char *dest,
#define THREAD_NEW (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 1)
#define THREAD_OLD (threads && hdr->collapsed && hdr->num_hidden > 1 && mutt_thread_contains_unread (ctx, hdr) == 2)
size_t len;
+ char *subj;
hdr = hfi->hdr;
ctx = hfi->ctx;
@@ -594,17 +595,28 @@ hdr_format_str (char *dest,
subj = hdr->env->subject;
if (flags & MUTT_FORMAT_TREE && !hdr->collapsed)
{
- if (flags & MUTT_FORMAT_FORCESUBJ)
- {
- mutt_format_s (dest, destlen, "", NONULL (subj));
- snprintf (buf2, sizeof (buf2), "%s%s", hdr->tree, dest);
- mutt_format_s_tree (dest, destlen, prefix, buf2);
- }
- else
- mutt_format_s_tree (dest, destlen, prefix, hdr->tree);
- }
- else
- mutt_format_s (dest, destlen, prefix, NONULL (subj));
+ char *subj;
+ if (hdr->env->disp_subj)
+ subj = hdr->env->disp_subj;
+ else if (SubjectRxList)
+ subj = apply_subject_mods(hdr->env);
+ else
+ subj = hdr->env->subject;
+ if (option (OPTDELETEPREFIX)) subj = hdr->env->real_subj;
+ if (flags & MUTT_FORMAT_TREE && !hdr->collapsed)
+ {
+ if (flags & MUTT_FORMAT_FORCESUBJ)
+ {
+ mutt_format_s (dest, destlen, "", NONULL (subj));
+ snprintf (buf2, sizeof (buf2), "%s%s", hdr->tree, dest);
+ mutt_format_s_tree (dest, destlen, prefix, buf2);
+ }
+ else
+ mutt_format_s_tree (dest, destlen, prefix, hdr->tree);
+ }
+ else
+ mutt_format_s (dest, destlen, prefix, NONULL (subj));
+ }
}
break;
diff --git a/init.h b/init.h
index 280b167f..c7614f1c 100644
--- a/init.h
+++ b/init.h
@@ -451,6 +451,31 @@ struct option_t MuttVars[] = {
** this variable is \fIunset\fP, no check for new mail is performed
** while the mailbox is open.
*/
+#ifdef USE_CJK_WIDTH
+ { "cjk_width", DT_BOOL, R_NONE, OPTCJKWIDTH, 0 },
+ /*
+ ** .pp
+ ** When this option is set, characters in the East Asian Ambiguous (A)
+ ** category as defined in Unicode Technical Report #11 have a column
+ ** width of 2. Otherwise, they have a column width of 1.
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only has an effect in UTF-8 encoding.
+ */
+ { "cjk_width_tree_chars", DT_BOOL, R_NONE, OPTCJKWIDTHTREECHARS, 0 },
+ /*
+ ** .pp
+ ** If \fIset\fP, Mutt will apply $$cjk_width to the line-drawing (WACS)
+ ** characters of thread and attachment trees (unless $$ascii_chars is set).
+ ** This variant might be useful for users of CJK legacy encodings
+ ** who want to migrate to UCS without changing the traditional terminal
+ ** character-width behaviour.
+ ** .pp
+ ** \fBNote:\fP this option only has an effect in UTF-8 encoding.
+ */
+#endif
{ "collapse_unread", DT_BOOL, R_NONE, OPTCOLLAPSEUNREAD, 1 },
/*
** .pp
@@ -723,6 +748,17 @@ struct option_t MuttVars[] = {
** If \fI``no''\fP, never attempt to verify cryptographic signatures.
** (Crypto only)
*/
+ { "create_rfc2047_parameters", DT_BOOL, R_NONE, OPTCREATERFC2047PARAMS, 0 },
+ /*
+ ** .pp
+ ** When this variable is set, Mutt will add an RFC 2047-encoded MIME
+ ** parameter holding the attachment's filename to Content-Type, e.g.:
+ ** name="=?iso-2022-jp?B?GyRCO244MxsoQi50eHQ=?="
+ ** .pp
+ ** Note: this use of RFC 2047's encoding is explicitly prohibited
+ ** by the standard. Set this variable only if your recipients' mail
+ ** clients cannot parse RFC 2231 parameters.
+ */
{ "date_format", DT_STR, R_MENU, UL &DateFmt, UL "!%a, %b %d, %Y at %I:%M:%S%p %Z" },
/*
** .pp
@@ -774,6 +810,19 @@ struct option_t MuttVars[] = {
** If this option is \fIset\fP, mutt's received-attachments menu will not show the subparts of
** individual messages in a multipart/digest. To see these subparts, press ``v'' on that menu.
*/
+ { "delete_prefix", DT_BOOL, R_NONE, OPTDELETEPREFIX, 0 },
+ /*
+ ** .pp
+ ** If set, the prefix that some mailing lists add to the Subject: field
+ ** (for example "Subject: [foo-ML:0012] real-subject") is removed
+ ** when messages are shown in the index and when replies are edited.
+ ** The pattern to remove is set by the $$delete_regexp variable.
+ */
+ { "delete_regexp", DT_RX, R_NONE, UL &DeleteRegexp, UL "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)" },
+ /*
+ ** .pp
+ ** The regular expression used by $$delete_prefix to match the prefix.
+ */
{ "display_filter", DT_PATH, R_PAGER, UL &DisplayFilter, UL "" },
/*
** .pp
@@ -781,7 +830,7 @@ struct option_t MuttVars[] = {
** is viewed it is passed as standard input to $$display_filter, and the
** filtered message is read from the standard output.
*/
-#if defined(DL_STANDALONE) && defined(USE_DOTLOCK)
+#if defined(DL_STANDALONE) && defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
{ "dotlock_program", DT_PATH, R_NONE, UL &MuttDotlock, UL BINDIR "/mutt_dotlock" },
/*
** .pp
@@ -2927,6 +2976,28 @@ struct option_t MuttVars[] = {
** that mutt \fIgenerates\fP this kind of encoding. Instead, mutt will
** unconditionally use the encoding specified in RFC2231.
*/
+ { "sanitize_ja_chars", DT_BOOL, R_NONE, OPTSANITIZEJACHARS, 0 },
+ /*
+ ** .pp
+ ** When set, Japanese "platform dependent characters" (characters
+ ** that are illegal in the iso-2022-jp charset, mainly produced by
+ ** MS-Windows mailers) are replaced with the GETA mark ('ESC $$ B " .
+ ** ESC ( B' in iso-2022-jp), and JIS X 0201 kana characters
+ ** (only in the "ESC ( I" case) are replaced with "?" to
+ ** prevent garbled output. JIS X 0201 kana characters are
+ ** not replaced if they appear in 8bit form.
+ ** .pp
+ ** This also fixes another Japanese encoding issue. If $$charset
+ ** is set to "EUC-JP", which does not contain the JIS X 0201 roman
+ ** character set, the JIS X 0201 roman parts of received messages
+ ** encoded in iso-2022-jp cannot be converted to EUC-JP.
+ ** Likewise, the ASCII parts cannot be converted to Shift_JIS,
+ ** which does not contain the ASCII character set, so the
+ ** converted text is garbled in both cases. When this
+ ** option is set, the JIS X 0201 roman escape sequence and the
+ ** ASCII escape sequence are rewritten accordingly so that
+ ** the output is not garbled.
+ */
{ "save_address", DT_BOOL, R_NONE, OPTSAVEADDRESS, 0 },
/*
** .pp
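$delete_prefix removes a mailing-list tag such as "[foo-ML:0012] " from the subject, using the pattern in $delete_regexp. Below is a sketch of what the default pattern matches, using POSIX regcomp()/regexec() with REG_EXTENDED. Whether mutt compiles this variable the same way is an assumption. strip_ml_prefix() is a hypothetical helper that edits the string in place, whereas the hdrline.c hunk above reads the result from hdr->env->real_subj.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Default value of $delete_regexp from init.h, as a C string literal. */
#define DELETE_REGEXP "^(\\[[A-Za-z0-9_.: \\-]*\\][ ]*)"

/*
 * Hypothetical helper: remove a leading mailing-list tag matched by
 * DELETE_REGEXP from subj, in place.  Leaves subj unchanged if the
 * pattern does not compile or does not match.
 */
static void strip_ml_prefix (char *subj)
{
  regex_t rx;
  regmatch_t m[1];

  if (regcomp (&rx, DELETE_REGEXP, REG_EXTENDED) != 0)
    return;
  if (regexec (&rx, subj, 1, m, 0) == 0 && m[0].rm_eo > 0)
  {
    /* shift the remainder (including its NUL) to the front */
    memmove (subj, subj + m[0].rm_eo, strlen (subj + m[0].rm_eo) + 1);
  }
  regfree (&rx);
}
```

In POSIX syntax a backslash inside a bracket expression is an ordinary character, and a '-' just before the closing ']' is literal. The class therefore accepts letters, digits, '_', '.', ':', space, '\' and '-', and the trailing "[ ]*" also consumes any spaces after the tag.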
diff --git a/lib.c b/lib.c
index 1d294ed2..f4f9208a 100644
--- a/lib.c
+++ b/lib.c
@@ -445,6 +445,10 @@ int safe_symlink(const char *oldpath, const char *newpath)
int safe_rename (const char *src, const char *target)
{
+#ifdef NO_USE_HARDLINK
+ /* Hard links are forbidden in some environments (e.g. Android 6.0+); note that rename() replaces an existing target. */
+ return rename(src, target);
+#else
struct stat ssb, tsb;
int link_errno;
@@ -569,6 +573,7 @@ success:
return 0;
+#endif /* NO_USE_HARDLINK */
}
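With NO_USE_HARDLINK, safe_rename() becomes a plain rename(). This changes one behavior: link() fails with EEXIST when the target already exists, while rename() silently replaces it. If callers depend on the old fail-if-exists behavior, a sketch like the following could approximate it. rename_no_clobber() is hypothetical. The access()/rename() pair is not atomic, so it narrows the race rather than closing it. On Linux, renameat2() with RENAME_NOREPLACE is an atomic alternative where the kernel, C library, and filesystem support it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical no-hardlink rename that keeps link()'s "fail if the
 * target exists" behaviour.  Not atomic: another process may create
 * target between access() and rename().
 */
static int rename_no_clobber (const char *src, const char *target)
{
  if (!src || !target)
    return -1;                  /* reject NULL paths */
  if (access (target, F_OK) == 0)
  {
    errno = EEXIST;             /* what link() would have reported */
    return -1;
  }
  return rename (src, target);
}
```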
diff --git a/main.c b/main.c
index 30defd6e..13d3e7be 100644
--- a/main.c
+++ b/main.c
@@ -271,26 +271,26 @@ static void show_version (void)
"-USE_SETGID "
#endif
-#ifdef USE_DOTLOCK
- "+USE_DOTLOCK "
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
+ "+USE_DOTLOCK "
#else
"-USE_DOTLOCK "
#endif
-#ifdef DL_STANDALONE
- "+DL_STANDALONE "
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
+ "+DL_STANDALONE "
#else
"-DL_STANDALONE "
#endif
-#ifdef USE_FCNTL
- "+USE_FCNTL "
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
+ "+USE_FCNTL "
#else
"-USE_FCNTL "
#endif
-#ifdef USE_FLOCK
- "+USE_FLOCK "
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
+ "+USE_FLOCK "
#else
"-USE_FLOCK "
#endif
@@ -455,6 +455,12 @@ static void show_version (void)
#else
"-LOCALES_HACK "
#endif
+
+#ifdef USE_CJK_WIDTH
+ "+USE_CJK_WIDTH "
+#else
+ "-USE_CJK_WIDTH "
+#endif
#ifdef HAVE_WC_FUNCS
"+HAVE_WC_FUNCS "
diff --git a/mbyte.c b/mbyte.c
index 5aa7fc40..e8f12a53 100644
--- a/mbyte.c
+++ b/mbyte.c
@@ -17,7 +17,7 @@
*/
/*
- * Japanese support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
+ * CJK support by TAKIZAWA Takashi <taki@luna.email.ne.jp>.
*/
#if HAVE_CONFIG_H
@@ -37,8 +37,8 @@
#endif
int Charset_is_utf8 = 0;
+static int charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
-static int charset_is_ja = 0;
static iconv_t charset_to_utf8 = (iconv_t)(-1);
static iconv_t charset_from_utf8 = (iconv_t)(-1);
#endif
@@ -50,8 +50,8 @@ void mutt_set_charset (char *charset)
mutt_canonical_charset (buffer, sizeof (buffer), charset);
Charset_is_utf8 = 0;
+ charset_is_cjk = 0;
#ifndef HAVE_WC_FUNCS
- charset_is_ja = 0;
if (charset_to_utf8 != (iconv_t)(-1))
{
iconv_close (charset_to_utf8);
@@ -67,11 +67,17 @@ void mutt_set_charset (char *charset)
if (mutt_is_utf8 (buffer))
Charset_is_utf8 = 1;
#ifndef HAVE_WC_FUNCS
- else if (!ascii_strcasecmp(buffer, "euc-jp") || !ascii_strcasecmp(buffer, "shift_jis")
- || !ascii_strcasecmp(buffer, "cp932") || !ascii_strcasecmp(buffer, "eucJP-ms"))
+ else if (!ascii_strcasecmp (buffer, "gb2312") ||
+ !ascii_strcasecmp (buffer, "gb18030") ||
+ !ascii_strcasecmp (buffer, "big5") ||
+ !ascii_strcasecmp (buffer, "euc-tw") ||
+ !ascii_strcasecmp (buffer, "EUC-JP") ||
+ !ascii_strcasecmp (buffer, "eucJP-ms") ||
+ !ascii_strcasecmp (buffer, "Shift_JIS") ||
+ !ascii_strcasecmp (buffer, "cp932") ||
+ !ascii_strcasecmp (buffer, "euc-kr"))
{
- charset_is_ja = 1;
-
+ charset_is_cjk = 1;
/* Note flags=0 to skip charset-hooks: User masters the $charset
* name, and we are sure of our "utf-8" constant. So there is no
* possibility of wrong name that we would want to try to correct
@@ -80,24 +86,68 @@ void mutt_set_charset (char *charset)
*/
charset_to_utf8 = mutt_iconv_open ("utf-8", charset, 0);
charset_from_utf8 = mutt_iconv_open (charset, "utf-8", 0);
- }
#endif
+ }
#if defined(HAVE_BIND_TEXTDOMAIN_CODESET) && defined(ENABLE_NLS)
bind_textdomain_codeset(PACKAGE, buffer);
#endif
}
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+/*
+ * For systems that lack a correctly functioning wcwidth(),
+ * we provide our own wcwidth().
+ * In addition, this wcwidth() allows the character-cell width of
+ * the East Asian Ambiguous class to be changed with $cjk_width,
+ * which the wcwidth() provided by most systems cannot do.
+ * See the comment in wcwidth.c about the East Asian Ambiguous
+ * class for details.
+ */
+int wcwidth_ucs(wchar_t ucs);
+int wcwidth_cjk(wchar_t ucs);
+
+int wcwidth (wchar_t wc)
+{
+ if (!Charset_is_utf8)
+ {
+ if (!charset_is_cjk)
+ {
+ /* 8-bit case */
+ if (!wc)
+ return 0;
+ else if ((0 <= wc && wc < 256) && IsPrint (wc))
+ return 1;
+ else
+ return -1;
+ }
+ else
+ {
+ /* CJK */
+ return wcwidth_cjk (wc);
+ }
+ }
+ else {
+#ifdef USE_CJK_WIDTH
+ if (option (OPTCJKWIDTH))
+ return wcwidth_cjk (wc);
+#endif /* USE_CJK_WIDTH */
+ return wcwidth_ucs (wc);
+ }
+}
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */
+
+
#ifndef HAVE_WC_FUNCS
/*
* For systems that don't have them, we provide here our own
- * implementations of wcrtomb(), mbrtowc(), iswprint() and wcwidth().
+ * implementations of wcrtomb(), mbrtowc() and iswprint().
* Instead of using the locale, as these functions normally would,
* we use Mutt's Charset variable. We support 3 types of charset:
* (1) For 8-bit charsets, wchar_t uses the same encoding as char.
* (2) For UTF-8, wchar_t uses UCS.
- * (3) For stateless Japanese encodings, we use UCS and convert
+ * (3) For stateless CJK encodings, we use UCS and convert
* via UTF-8 using iconv.
* Unfortunately, we can't handle non-stateless encodings.
*/
@@ -256,7 +306,7 @@ size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
int iswprint (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return ((0x20 <= wc && wc < 0x7f) || 0xa0 <= wc);
else
return (0 <= wc && wc < 256) ? IsPrint (wc) : 0;
@@ -264,7 +314,7 @@ int iswprint (wint_t wc)
int iswspace (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return (9 <= wc && wc <= 13) || wc == 32;
else
return (0 <= wc && wc < 256) ? isspace (wc) : 0;
@@ -347,7 +397,7 @@ static int iswalpha_ucs (wint_t wc)
wint_t towupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? toupper (wc) : wc;
@@ -355,7 +405,7 @@ wint_t towupper (wint_t wc)
wint_t towlower (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return towlower_ucs (wc);
else
return (0 <= wc && wc < 256) ? tolower (wc) : wc;
@@ -363,7 +413,7 @@ wint_t towlower (wint_t wc)
int iswalnum (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalnum_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalnum (wc) : 0;
@@ -371,7 +421,7 @@ int iswalnum (wint_t wc)
int iswalpha (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswalpha_ucs (wc);
else
return (0 <= wc && wc < 256) ? isalpha (wc) : 0;
@@ -379,58 +429,12 @@ int iswalpha (wint_t wc)
int iswupper (wint_t wc)
{
- if (Charset_is_utf8 || charset_is_ja)
+ if (Charset_is_utf8 || charset_is_cjk)
return iswupper_ucs (wc);
else
return (0 <= wc && wc < 256) ? isupper (wc) : 0;
}
-/*
- * l10n for Japanese:
- * Symbols, Greek and Cyrillic in JIS X 0208, Japanese Kanji
- * Character Set, have a column width of 2.
- */
-int wcwidth_ja (wchar_t ucs)
-{
- if (ucs >= 0x3021)
- return -1; /* continue with the normal check */
- /* a rough range for quick check */
- if ((ucs >= 0x00a1 && ucs <= 0x00fe) || /* Latin-1 Supplement */
- (ucs >= 0x0391 && ucs <= 0x0451) || /* Greek and Cyrillic */
- (ucs >= 0x2010 && ucs <= 0x266f) || /* Symbols */
- (ucs >= 0x3000 && ucs <= 0x3020)) /* CJK Symbols and Punctuation */
- return 2;
- else
- return -1;
-}
-
-int wcwidth_ucs(wchar_t ucs);
-
-int wcwidth (wchar_t wc)
-{
- if (!Charset_is_utf8)
- {
- if (!charset_is_ja)
- {
- /* 8-bit case */
- if (!wc)
- return 0;
- else if ((0 <= wc && wc < 256) && IsPrint (wc))
- return 1;
- else
- return -1;
- }
- else
- {
- /* Japanese */
- int k = wcwidth_ja (wc);
- if (k != -1)
- return k;
- }
- }
- return wcwidth_ucs (wc);
-}
-
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps)
{
static wchar_t mbstate;
diff --git a/mbyte.h b/mbyte.h
index 9c58c9ec..224cafb5 100644
--- a/mbyte.h
+++ b/mbyte.h
@@ -8,6 +8,12 @@
# ifdef HAVE_WCTYPE_H
# include <wctype.h>
# endif
+# ifdef USE_CJK_WIDTH
+#  ifdef wcwidth
+#   undef wcwidth
+#  endif
+int wcwidth (wchar_t wc);
+# endif /* USE_CJK_WIDTH */
# endif
# ifndef HAVE_WC_FUNCS
@@ -32,6 +38,9 @@
#ifdef iswupper
# undef iswupper
#endif
+#ifdef wcwidth
+# undef wcwidth
+#endif
size_t wcrtomb (char *s, wchar_t wc, mbstate_t *ps);
size_t mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int iswprint (wint_t wc);
@@ -44,7 +53,6 @@ wint_t towlower (wint_t wc);
int wcwidth (wchar_t wc);
# endif /* !HAVE_WC_FUNCS */
-
void mutt_set_charset (char *charset);
extern int Charset_is_utf8;
size_t utf8rtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *_ps);
diff --git a/mutt.h b/mutt.h
index e507ea5e..48b44ba1 100644
--- a/mutt.h
+++ b/mutt.h
@@ -382,10 +382,16 @@ enum
OPTBROWSERABBRMAILBOXES,
OPTCHECKMBOXSIZE,
OPTCHECKNEW,
+#ifdef USE_CJK_WIDTH
+ OPTCJKWIDTH,
+ OPTCJKWIDTHTREECHARS,
+#endif /* USE_CJK_WIDTH */
OPTCOLLAPSEUNREAD,
OPTCONFIRMAPPEND,
OPTCONFIRMCREATE,
+ OPTCREATERFC2047PARAMS,
OPTDELETEUNTAG,
+ OPTDELETEPREFIX,
OPTDIGESTCOLLAPSE,
OPTDUPTHREADS,
OPTEDITHDRS,
@@ -488,6 +494,7 @@ enum
OPTREVNAME,
OPTREVREAL,
OPTRFC2047PARAMS,
+ OPTSANITIZEJACHARS,
OPTSAVEADDRESS,
OPTSAVEEMPTY,
OPTSAVENAME,
diff --git a/mutt_regex.h b/mutt_regex.h
index b145af90..33eeff67 100644
--- a/mutt_regex.h
+++ b/mutt_regex.h
@@ -52,5 +52,6 @@ WHERE REGEXP QuoteRegexp;
WHERE REGEXP ReplyRegexp;
WHERE REGEXP Smileys;
WHERE REGEXP GecosMask;
+WHERE REGEXP DeleteRegexp;
#endif /* MUTT_REGEX_H */
diff --git a/mx.c b/mx.c
index 7bba6370..7a8cda77 100644
--- a/mx.c
+++ b/mx.c
@@ -47,7 +47,7 @@
#include "buffy.h"
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
#include "dotlock.h"
#endif
@@ -95,13 +95,13 @@ struct mx_ops* mx_get_ops (int magic)
#define mutt_is_spool(s) (mutt_strcmp (Spoolfile, s) == 0)
-#ifdef USE_DOTLOCK
-/* parameters:
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
+/* parameters:
* path - file to lock
* retry - should retry if unable to lock?
*/
-#ifdef DL_STANDALONE
+#if defined(DL_STANDALONE) && !defined(NO_USE_HARDLINK)
static int invoke_dotlock (const char *path, int dummy, int flags, int retry)
{
@@ -181,14 +181,14 @@ static int undotlock_file (const char *path, int fd)
*/
int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
-#if defined (USE_FCNTL) || defined (USE_FLOCK)
+#if defined (USE_FCNTL) || defined (USE_FLOCK) || defined(NO_USE_HARDLINK)
int count;
int attempt;
struct stat sb = { 0 }, prev_sb = { 0 }; /* silence gcc warnings */
#endif
int r = 0;
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock lck;
memset (&lck, 0, sizeof (struct flock));
@@ -227,7 +227,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
count = 0;
attempt = 0;
while (flock (fd, (excl ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1)
@@ -261,7 +261,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
}
#endif /* USE_FLOCK */
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (r == 0 && dot)
r = dotlock_file (path, fd, timeout);
#endif /* USE_DOTLOCK */
@@ -270,12 +270,12 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
{
/* release any other locks obtained in this routine */
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
lck.l_type = F_UNLCK;
fcntl (fd, F_SETLK, &lck);
#endif /* USE_FCNTL */
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif /* USE_FLOCK */
}
@@ -285,7 +285,7 @@ int mx_lock_file (const char *path, int fd, int excl, int dot, int timeout)
int mx_unlock_file (const char *path, int fd, int dot)
{
-#ifdef USE_FCNTL
+#if defined(USE_FCNTL) || defined(NO_USE_HARDLINK)
struct flock unlockit = { F_UNLCK, 0, 0, 0, 0 };
memset (&unlockit, 0, sizeof (struct flock));
@@ -294,11 +294,11 @@ int mx_unlock_file (const char *path, int fd, int dot)
fcntl (fd, F_SETLK, &unlockit);
#endif
-#ifdef USE_FLOCK
+#if defined(USE_FLOCK) || defined(NO_USE_HARDLINK)
flock (fd, LOCK_UN);
#endif
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
if (dot)
undotlock_file (path, fd);
#endif
@@ -309,7 +309,7 @@ int mx_unlock_file (const char *path, int fd, int dot)
static void mx_unlink_empty (const char *path)
{
int fd;
-#ifndef USE_DOTLOCK
+#if !defined(USE_DOTLOCK) || defined(NO_USE_HARDLINK)
struct stat sb;
#endif
@@ -322,7 +322,7 @@ static void mx_unlink_empty (const char *path)
return;
}
-#ifdef USE_DOTLOCK
+#if defined(USE_DOTLOCK) && !defined(NO_USE_HARDLINK)
invoke_dotlock (path, fd, DL_FL_UNLINK, 1);
#else
if (fstat (fd, &sb) == 0 && sb.st_size == 0)
diff --git a/parse.c b/parse.c
index c243062c..8eb35194 100644
--- a/parse.c
+++ b/parse.c
@@ -1495,6 +1495,18 @@ ENVELOPE *mutt_read_rfc822_header (FILE *f, HEADER *hdr, short user_hdrs,
e->real_subj = e->subject + pmatch[0].rm_eo;
else
e->real_subj = e->subject;
+ if (option (OPTDELETEPREFIX))
+ {
+ /* If this option is set, mutt strips a prefix such as [prefix],
+ * [prefix:number] or [prefix number] from the Subject line.
+ */
+ if (regexec (DeleteRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ {
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ if (regexec (ReplyRegexp.rx, e->real_subj, 1, pmatch, 0) == 0)
+ e->real_subj = e->real_subj + pmatch[0].rm_eo;
+ }
+ }
}
if (hdr->received < 0)
diff --git a/rfc2047.c b/rfc2047.c
index f2f333d3..d64dccbb 100644
--- a/rfc2047.c
+++ b/rfc2047.c
@@ -62,6 +62,9 @@ static size_t convert_string (ICONV_CONST char *f, size_t flen,
size_t obl, n;
int e;
+ if (option (OPTSANITIZEJACHARS) &&
+ !ascii_strncasecmp (from, "iso-2022-jp", 11))
+ mutt_sanitize_ja_chars ((char *) f, flen, 0);
cd = mutt_iconv_open (to, from, 0);
if (cd == (iconv_t)(-1))
return (size_t)(-1);
diff --git a/sendlib.c b/sendlib.c
index 8a772823..601add72 100644
--- a/sendlib.c
+++ b/sendlib.c
@@ -348,6 +348,30 @@ int mutt_write_mime_header (BODY *a, FILE *f)
}
}
+ if (a->use_disp && option (OPTCREATERFC2047PARAMS))
+ {
+ if(!(fn = a->d_filename))
+ fn = a->filename;
+
+ if (fn)
+ {
+ char *tmp;
+
+ /* Strip off the leading path... */
+ if ((t = strrchr (fn, '/')))
+ t++;
+ else
+ t = fn;
+
+ buffer[0] = 0;
+ tmp = safe_strdup (t);
+ rfc2047_encode_string (&tmp);
+ rfc822_cat (buffer, sizeof (buffer), tmp, MimeSpecials);
+ FREE (&tmp);
+ fprintf (f, ";\n\tname=%s", buffer);
+ }
+ }
+
fputc ('\n', f);
if (a->description)
diff --git a/wcwidth.c b/wcwidth.c
index 75e1b9a8..85a13970 100644
--- a/wcwidth.c
+++ b/wcwidth.c
@@ -5,6 +5,51 @@
* http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
* http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
*
+ * In fixed-width output devices, Latin characters all occupy a single
+ * "cell" position of equal width, whereas ideographic CJK characters
+ * occupy two such cells. Interoperability between terminal-line
+ * applications and (teletype-style) character terminals using the
+ * UTF-8 encoding requires agreement on which character should advance
+ * the cursor by how many cell positions. No established formal
+ * standards exist at present on which Unicode character shall occupy
+ * how many cell positions on character terminals. These routines are
+ * a first attempt of defining such behavior based on simple rules
+ * applied to data provided by the Unicode Consortium.
+ *
+ * For some graphical characters, the Unicode standard explicitly
+ * defines a character-cell width via the definition of the East Asian
+ * FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
+ * In all these cases, there is no ambiguity about which width a
+ * terminal shall use. For characters in the East Asian Ambiguous (A)
+ * class, the width choice depends purely on a preference of backward
+ * compatibility with either historic CJK or Western practice.
+ * Choosing single-width for these characters is easy to justify as
+ * the appropriate long-term solution, as the CJK practice of
+ * displaying these characters as double-width comes from historic
+ * implementation simplicity (8-bit encoded characters were displayed
+ * single-width and 16-bit ones double-width, even for Greek,
+ * Cyrillic, etc.) and not any typographic considerations.
+ *
+ * Much less clear is the choice of width for the Not East Asian
+ * (Neutral) class. Existing practice does not dictate a width for any
+ * of these characters. It would nevertheless make sense
+ * typographically to allocate two character cells to characters such
+ * as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
+ * represented adequately with a single-width glyph. The following
+ * routines at present merely assign a single-cell width to all
+ * neutral characters, in the interest of simplicity. This is not
+ * entirely satisfactory and should be reconsidered before
+ * establishing a formal standard in this area. At the moment, the
+ * decision which Not East Asian (Neutral) characters should be
+ * represented by double-width glyphs cannot yet be answered by
+ * applying a simple rule from the Unicode database content. Setting
+ * up a proper standard for the behavior of UTF-8 character terminals
+ * will require a careful analysis not only of each Unicode character,
+ * but also of each presentation form, something the author of these
+ * routines has avoided to do so far.
+ *
+ * http://www.unicode.org/unicode/reports/tr11/
+ *
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
@@ -24,12 +69,34 @@
# include "config.h"
#endif
-#ifndef HAVE_WC_FUNCS
+#if !defined(HAVE_WC_FUNCS) || defined(USE_CJK_WIDTH)
+#include <wchar.h>
+
+struct interval {
+ wchar_t first;
+ wchar_t last;
+};
+
+/* auxiliary function for binary search in interval table */
+static int bisearch(wchar_t ucs, const struct interval *table, int max) {
+ int min = 0;
+ int mid;
+
+ if (ucs < table[0].first || ucs > table[max].last)
+ return 0;
+ while (max >= min) {
+ mid = (min + max) / 2;
+ if (ucs > table[mid].last)
+ min = mid + 1;
+ else if (ucs < table[mid].first)
+ max = mid - 1;
+ else
+ return 1;
+ }
-#include "mutt.h"
-#include "mbyte.h"
+ return 0;
+}
-#include <ctype.h>
/* The following two functions define the column width of an ISO 10646
* character as follows:
@@ -67,62 +134,56 @@ int wcwidth_ucs(wchar_t ucs)
{
/* sorted list of non-overlapping intervals of non-spacing characters */
/* generated by "uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B c" */
- static const struct interval {
- wchar_t first;
- wchar_t last;
- } combining[] = {
- { 0x0300, 0x036f }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
- { 0x0591, 0x05bd }, { 0x05bf, 0x05bf }, { 0x05c1, 0x05c2 },
- { 0x05c4, 0x05c5 }, { 0x05c7, 0x05c7 }, { 0x0600, 0x0603 },
- { 0x0610, 0x0615 }, { 0x064b, 0x065e }, { 0x0670, 0x0670 },
- { 0x06d6, 0x06e4 }, { 0x06e7, 0x06e8 }, { 0x06ea, 0x06ed },
- { 0x070f, 0x070f }, { 0x0711, 0x0711 }, { 0x0730, 0x074a },
- { 0x07a6, 0x07b0 }, { 0x07eb, 0x07f3 }, { 0x0901, 0x0902 },
- { 0x093c, 0x093c }, { 0x0941, 0x0948 }, { 0x094d, 0x094d },
+ static const struct interval combining[] = {
+ { 0x0300, 0x036F }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
+ { 0x0591, 0x05BD }, { 0x05BF, 0x05BF }, { 0x05C1, 0x05C2 },
+ { 0x05C4, 0x05C5 }, { 0x05C7, 0x05C7 }, { 0x0600, 0x0603 },
+ { 0x0610, 0x0615 }, { 0x064B, 0x065E }, { 0x0670, 0x0670 },
+ { 0x06D6, 0x06E4 }, { 0x06E7, 0x06E8 }, { 0x06EA, 0x06ED },
+ { 0x070F, 0x070F }, { 0x0711, 0x0711 }, { 0x0730, 0x074A },
+ { 0x07A6, 0x07B0 }, { 0x07EB, 0x07F3 }, { 0x0901, 0x0902 },
+ { 0x093C, 0x093C }, { 0x0941, 0x0948 }, { 0x094D, 0x094D },
{ 0x0951, 0x0954 }, { 0x0962, 0x0963 }, { 0x0981, 0x0981 },
- { 0x09bc, 0x09bc }, { 0x09c1, 0x09c4 }, { 0x09cd, 0x09cd },
- { 0x09e2, 0x09e3 }, { 0x0a01, 0x0a02 }, { 0x0a3c, 0x0a3c },
- { 0x0a41, 0x0a42 }, { 0x0a47, 0x0a48 }, { 0x0a4b, 0x0a4d },
- { 0x0a70, 0x0a71 }, { 0x0a81, 0x0a82 }, { 0x0abc, 0x0abc },
- { 0x0ac1, 0x0ac5 }, { 0x0ac7, 0x0ac8 }, { 0x0acd, 0x0acd },
- { 0x0ae2, 0x0ae3 }, { 0x0b01, 0x0b01 }, { 0x0b3c, 0x0b3c },
- { 0x0b3f, 0x0b3f }, { 0x0b41, 0x0b43 }, { 0x0b4d, 0x0b4d },
- { 0x0b56, 0x0b56 }, { 0x0b82, 0x0b82 }, { 0x0bc0, 0x0bc0 },
- { 0x0bcd, 0x0bcd }, { 0x0c3e, 0x0c40 }, { 0x0c46, 0x0c48 },
- { 0x0c4a, 0x0c4d }, { 0x0c55, 0x0c56 }, { 0x0cbc, 0x0cbc },
- { 0x0cbf, 0x0cbf }, { 0x0cc6, 0x0cc6 }, { 0x0ccc, 0x0ccd },
- { 0x0ce2, 0x0ce3 }, { 0x0d41, 0x0d43 }, { 0x0d4d, 0x0d4d },
- { 0x0dca, 0x0dca }, { 0x0dd2, 0x0dd4 }, { 0x0dd6, 0x0dd6 },
- { 0x0e31, 0x0e31 }, { 0x0e34, 0x0e3a }, { 0x0e47, 0x0e4e },
- { 0x0eb1, 0x0eb1 }, { 0x0eb4, 0x0eb9 }, { 0x0ebb, 0x0ebc },
- { 0x0ec8, 0x0ecd }, { 0x0f18, 0x0f19 }, { 0x0f35, 0x0f35 },
- { 0x0f37, 0x0f37 }, { 0x0f39, 0x0f39 }, { 0x0f71, 0x0f7e },
- { 0x0f80, 0x0f84 }, { 0x0f86, 0x0f87 }, { 0x0f90, 0x0f97 },
- { 0x0f99, 0x0fbc }, { 0x0fc6, 0x0fc6 }, { 0x102d, 0x1030 },
+ { 0x09BC, 0x09BC }, { 0x09C1, 0x09C4 }, { 0x09CD, 0x09CD },
+ { 0x09E2, 0x09E3 }, { 0x0A01, 0x0A02 }, { 0x0A3C, 0x0A3C },
+ { 0x0A41, 0x0A42 }, { 0x0A47, 0x0A48 }, { 0x0A4B, 0x0A4D },
+ { 0x0A70, 0x0A71 }, { 0x0A81, 0x0A82 }, { 0x0ABC, 0x0ABC },
+ { 0x0AC1, 0x0AC5 }, { 0x0AC7, 0x0AC8 }, { 0x0ACD, 0x0ACD },
+ { 0x0AE2, 0x0AE3 }, { 0x0B01, 0x0B01 }, { 0x0B3C, 0x0B3C },
+ { 0x0B3F, 0x0B3F }, { 0x0B41, 0x0B43 }, { 0x0B4D, 0x0B4D },
+ { 0x0B56, 0x0B56 }, { 0x0B82, 0x0B82 }, { 0x0BC0, 0x0BC0 },
+ { 0x0BCD, 0x0BCD }, { 0x0C3E, 0x0C40 }, { 0x0C46, 0x0C48 },
+ { 0x0C4A, 0x0C4D }, { 0x0C55, 0x0C56 }, { 0x0CBC, 0x0CBC },
+ { 0x0CBF, 0x0CBF }, { 0x0CC6, 0x0CC6 }, { 0x0CCC, 0x0CCD },
+ { 0x0CE2, 0x0CE3 }, { 0x0D41, 0x0D43 }, { 0x0D4D, 0x0D4D },
+ { 0x0DCA, 0x0DCA }, { 0x0DD2, 0x0DD4 }, { 0x0DD6, 0x0DD6 },
+ { 0x0E31, 0x0E31 }, { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E },
+ { 0x0EB1, 0x0EB1 }, { 0x0EB4, 0x0EB9 }, { 0x0EBB, 0x0EBC },
+ { 0x0EC8, 0x0ECD }, { 0x0F18, 0x0F19 }, { 0x0F35, 0x0F35 },
+ { 0x0F37, 0x0F37 }, { 0x0F39, 0x0F39 }, { 0x0F71, 0x0F7E },
+ { 0x0F80, 0x0F84 }, { 0x0F86, 0x0F87 }, { 0x0F90, 0x0F97 },
+ { 0x0F99, 0x0FBC }, { 0x0FC6, 0x0FC6 }, { 0x102D, 0x1030 },
{ 0x1032, 0x1032 }, { 0x1036, 0x1037 }, { 0x1039, 0x1039 },
- { 0x1058, 0x1059 }, { 0x1160, 0x11ff }, { 0x135f, 0x135f },
+ { 0x1058, 0x1059 }, { 0x1160, 0x11FF }, { 0x135F, 0x135F },
{ 0x1712, 0x1714 }, { 0x1732, 0x1734 }, { 0x1752, 0x1753 },
- { 0x1772, 0x1773 }, { 0x17b4, 0x17b5 }, { 0x17b7, 0x17bd },
- { 0x17c6, 0x17c6 }, { 0x17c9, 0x17d3 }, { 0x17dd, 0x17dd },
- { 0x180b, 0x180d }, { 0x18a9, 0x18a9 }, { 0x1920, 0x1922 },
- { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193b },
- { 0x1a17, 0x1a18 }, { 0x1b00, 0x1b03 }, { 0x1b34, 0x1b34 },
- { 0x1b36, 0x1b3a }, { 0x1b3c, 0x1b3c }, { 0x1b42, 0x1b42 },
- { 0x1b6b, 0x1b73 }, { 0x1dc0, 0x1dca }, { 0x1dfe, 0x1dff },
- { 0x200b, 0x200f }, { 0x202a, 0x202e }, { 0x2060, 0x2063 },
- { 0x206a, 0x206f }, { 0x20d0, 0x20ef }, { 0x302a, 0x302f },
- { 0x3099, 0x309a }, { 0xa806, 0xa806 }, { 0xa80b, 0xa80b },
- { 0xa825, 0xa826 }, { 0xfb1e, 0xfb1e }, { 0xfe00, 0xfe0f },
- { 0xfe20, 0xfe23 }, { 0xfeff, 0xfeff }, { 0xfff9, 0xfffb },
- { 0x10a01, 0x10a03 }, { 0x10a05, 0x10a06 }, { 0x10a0c, 0x10a0f },
- { 0x10a38, 0x10a3a }, { 0x10a3f, 0x10a3f }, { 0x1d167, 0x1d169 },
- { 0x1d173, 0x1d182 }, { 0x1d185, 0x1d18b }, { 0x1d1aa, 0x1d1ad },
- { 0x1d242, 0x1d244 }, { 0xe0001, 0xe0001 }, { 0xe0020, 0xe007f },
- { 0xe0100, 0xe01ef }
+ { 0x1772, 0x1773 }, { 0x17B4, 0x17B5 }, { 0x17B7, 0x17BD },
+ { 0x17C6, 0x17C6 }, { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD },
+ { 0x180B, 0x180D }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 },
+ { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193B },
+ { 0x1A17, 0x1A18 }, { 0x1B00, 0x1B03 }, { 0x1B34, 0x1B34 },
+ { 0x1B36, 0x1B3A }, { 0x1B3C, 0x1B3C }, { 0x1B42, 0x1B42 },
+ { 0x1B6B, 0x1B73 }, { 0x1DC0, 0x1DCA }, { 0x1DFE, 0x1DFF },
+ { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2063 },
+ { 0x206A, 0x206F }, { 0x20D0, 0x20EF }, { 0x302A, 0x302F },
+ { 0x3099, 0x309A }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B },
+ { 0xA825, 0xA826 }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F },
+ { 0xFE20, 0xFE23 }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB },
+ { 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }, { 0x10A0C, 0x10A0F },
+ { 0x10A38, 0x10A3A }, { 0x10A3F, 0x10A3F }, { 0x1D167, 0x1D169 },
+ { 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
+ { 0x1D242, 0x1D244 }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F },
+ { 0xE0100, 0xE01EF }
};
- int min = 0;
- int max = sizeof(combining) / sizeof(struct interval) - 1;
- int mid;
/* test for 8-bit control characters */
if (ucs == 0)
@@ -130,21 +191,10 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0))
return -1;
- /* first quick check for Latin-1 etc. characters */
- if (ucs < combining[0].first)
- return 1;
-
/* binary search in table of non-spacing characters */
- while (max >= min)
- {
- mid = (min + max) / 2;
- if (combining[mid].last < ucs)
- min = mid + 1;
- else if (combining[mid].first > ucs)
- max = mid - 1;
- else if (combining[mid].first <= ucs && combining[mid].last >= ucs)
- return 0;
- }
+ if (bisearch(ucs, combining,
+ sizeof(combining) / sizeof(struct interval) - 1))
+ return 0;
/* if we arrive here, ucs is not a combining or C0/C1 control character */
@@ -152,7 +202,7 @@ int wcwidth_ucs(wchar_t ucs)
if (ucs < 0x1100)
return 1;
- return 1 +
+ return 1 +
(ucs >= 0x1100 &&
(ucs <= 0x115f || /* Hangul Jamo init. consonants */
ucs == 0x2329 || ucs == 0x232a ||
@@ -168,15 +218,120 @@ int wcwidth_ucs(wchar_t ucs)
(ucs >= 0x30000 && ucs <= 0x3fffd)));
}
-#endif /* !HAVE_WC_FUNCS */
+#if 0 /* original */
+int wcswidth_ucs(const wchar_t *pwcs, size_t n)
+{
+ int w, width = 0;
+
+ for (;*pwcs && n-- > 0; pwcs++)
+ if ((w = wcwidth_ucs(*pwcs)) < 0)
+ return -1;
+ else
+ width += w;
+
+ return width;
+}
+#endif
+
+/*
+ * The following functions are the same as wcwidth_ucs() and
+ * wcswidth_ucs(), except that spacing characters in the East Asian
+ * Ambiguous (A) category as defined in Unicode Technical Report #11
+ * have a column width of 2. This variant might be useful for users of
+ * CJK legacy encodings who want to migrate to UCS without changing
+ * the traditional terminal character-width behaviour. It is not
+ * otherwise recommended for general use.
+ */
+/*
+ * In addition to the above, several characters in the East Asian
+ * Narrow (Na) and Not East Asian (Neutral) categories as defined in
+ * Unicode Technical Report #11
+ * actually have a column width of 2 in CJK legacy encodings.
+ */
+int wcwidth_cjk(wchar_t ucs)
+{
+ /* sorted list of non-overlapping intervals of East Asian Ambiguous
+ * characters, generated by "uniset +WIDTH-A -cat=Me -cat=Mn -cat=Cf c" */
+ static const struct interval ambiguous[] = {
+ { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 },
+ { 0x00AA, 0x00AA }, { 0x00AE, 0x00AE }, { 0x00B0, 0x00B4 },
+ { 0x00B6, 0x00BA }, { 0x00BC, 0x00BF }, { 0x00C6, 0x00C6 },
+ { 0x00D0, 0x00D0 }, { 0x00D7, 0x00D8 }, { 0x00DE, 0x00E1 },
+ { 0x00E6, 0x00E6 }, { 0x00E8, 0x00EA }, { 0x00EC, 0x00ED },
+ { 0x00F0, 0x00F0 }, { 0x00F2, 0x00F3 }, { 0x00F7, 0x00FA },
+ { 0x00FC, 0x00FC }, { 0x00FE, 0x00FE }, { 0x0101, 0x0101 },
+ { 0x0111, 0x0111 }, { 0x0113, 0x0113 }, { 0x011B, 0x011B },
+ { 0x0126, 0x0127 }, { 0x012B, 0x012B }, { 0x0131, 0x0133 },
+ { 0x0138, 0x0138 }, { 0x013F, 0x0142 }, { 0x0144, 0x0144 },
+ { 0x0148, 0x014B }, { 0x014D, 0x014D }, { 0x0152, 0x0153 },
+ { 0x0166, 0x0167 }, { 0x016B, 0x016B }, { 0x01CE, 0x01CE },
+ { 0x01D0, 0x01D0 }, { 0x01D2, 0x01D2 }, { 0x01D4, 0x01D4 },
+ { 0x01D6, 0x01D6 }, { 0x01D8, 0x01D8 }, { 0x01DA, 0x01DA },
+ { 0x01DC, 0x01DC }, { 0x0251, 0x0251 }, { 0x0261, 0x0261 },
+ { 0x02C4, 0x02C4 }, { 0x02C7, 0x02C7 }, { 0x02C9, 0x02CB },
+ { 0x02CD, 0x02CD }, { 0x02D0, 0x02D0 }, { 0x02D8, 0x02DB },
+ { 0x02DD, 0x02DD }, { 0x02DF, 0x02DF }, { 0x0391, 0x03A1 },
+ { 0x03A3, 0x03A9 }, { 0x03B1, 0x03C1 }, { 0x03C3, 0x03C9 },
+ { 0x0401, 0x0401 }, { 0x0410, 0x044F }, { 0x0451, 0x0451 },
+ { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
+ { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
+ { 0x2030, 0x2030 }, { 0x2032, 0x2033 }, { 0x2035, 0x2035 },
+ { 0x203B, 0x203B }, { 0x203E, 0x203E }, { 0x2074, 0x2074 },
+ { 0x207F, 0x207F }, { 0x2081, 0x2084 }, { 0x20AC, 0x20AC },
+ { 0x2103, 0x2103 }, { 0x2105, 0x2105 }, { 0x2109, 0x2109 },
+ { 0x2113, 0x2113 }, { 0x2116, 0x2116 }, { 0x2121, 0x2122 },
+ { 0x2126, 0x2126 }, { 0x212B, 0x212B }, { 0x2153, 0x2154 },
+ { 0x215B, 0x215E }, { 0x2160, 0x216B }, { 0x2170, 0x2179 },
+ { 0x2190, 0x2199 }, { 0x21B8, 0x21B9 }, { 0x21D2, 0x21D2 },
+ { 0x21D4, 0x21D4 }, { 0x21E7, 0x21E7 }, { 0x2200, 0x2200 },
+ { 0x2202, 0x2203 }, { 0x2207, 0x2208 }, { 0x220B, 0x220B },
+ { 0x220F, 0x220F }, { 0x2211, 0x2211 }, { 0x2215, 0x2215 },
+ { 0x221A, 0x221A }, { 0x221D, 0x2220 }, { 0x2223, 0x2223 },
+ { 0x2225, 0x2225 }, { 0x2227, 0x222C }, { 0x222E, 0x222E },
+ { 0x2234, 0x2237 }, { 0x223C, 0x223D }, { 0x2248, 0x2248 },
+ { 0x224C, 0x224C }, { 0x2252, 0x2252 }, { 0x2260, 0x2261 },
+ { 0x2264, 0x2267 }, { 0x226A, 0x226B }, { 0x226E, 0x226F },
+ { 0x2282, 0x2283 }, { 0x2286, 0x2287 }, { 0x2295, 0x2295 },
+ { 0x2299, 0x2299 }, { 0x22A5, 0x22A5 }, { 0x22BF, 0x22BF },
+ { 0x2312, 0x2312 }, { 0x2460, 0x24E9 }, { 0x24EB, 0x254B },
+ { 0x2550, 0x2573 }, { 0x2580, 0x258F }, { 0x2592, 0x2595 },
+ { 0x25A0, 0x25A1 }, { 0x25A3, 0x25A9 }, { 0x25B2, 0x25B3 },
+ { 0x25B6, 0x25B7 }, { 0x25BC, 0x25BD }, { 0x25C0, 0x25C1 },
+ { 0x25C6, 0x25C8 }, { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 },
+ { 0x25E2, 0x25E5 }, { 0x25EF, 0x25EF }, { 0x2605, 0x2606 },
+ { 0x2609, 0x2609 }, { 0x260E, 0x260F }, { 0x2614, 0x2615 },
+ { 0x261C, 0x261C }, { 0x261E, 0x261E }, { 0x2640, 0x2640 },
+ { 0x2642, 0x2642 }, { 0x2660, 0x2661 }, { 0x2663, 0x2665 },
+ { 0x2667, 0x266A }, { 0x266C, 0x266D }, { 0x266F, 0x266F },
+ { 0x273D, 0x273D }, { 0x2776, 0x277F }, { 0xE000, 0xF8FF },
+ { 0xFFFD, 0xFFFD }, { 0xF0000, 0xFFFFD }, { 0x100000, 0x10FFFD }
+ };
+
+ /* For Japanese legacy encodings, the following characters are added. */
+ static const struct interval legacy_ja[] = {
+ { 0x00A2, 0x00A3 }, { 0x00A5, 0x00A6 }, { 0x00AC, 0x00AC },
+ { 0x00AF, 0x00AF }, { 0x2212, 0x2212 }
+ };
+
+ /* binary search in the tables of double-width (ambiguous and legacy Japanese) characters */
+ if (bisearch(ucs, ambiguous,
+ sizeof(ambiguous) / sizeof(struct interval) - 1))
+ return 2;
+ if (bisearch(ucs, legacy_ja,
+ sizeof(legacy_ja) / sizeof(struct interval) - 1))
+ return 2;
+
+ return wcwidth_ucs(ucs);
+}
+
#if 0 /* original */
-int wcswidth(const wchar_t *pwcs, size_t n)
+int wcswidth_cjk(const wchar_t *pwcs, size_t n)
{
int w, width = 0;
for (;*pwcs && n-- > 0; pwcs++)
- if ((w = wcwidth(*pwcs)) < 0)
+ if ((w = wcwidth_cjk(*pwcs)) < 0)
return -1;
else
width += w;
@@ -184,3 +339,4 @@ int wcswidth(const wchar_t *pwcs, size_t n)
return width;
}
#endif
+#endif /* !HAVE_WC_FUNCS || USE_CJK_WIDTH */