ssfang/setlocale_sample.md

## setlocale_sample.md

      
Built with VS2008; the source file was saved via File -> Advanced Save Options -> Encoding: Simplified Chinese (GB2312) - Codepage 936.
#include <stdio.h>
#include <tchar.h>
#include <locale.h>
#include <stdlib.h> /* system() */

int _tmain(int argc, _TCHAR* argv[])
{
/*
C:\Users\fangss>chcp /?
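	  Displays or sets the active code page number.

	  CHCP [nnn]

	  nnn   Specifies a code page number.
	  Type CHCP without a parameter to display the active code page number.

	  chcp 65001  switches to the UTF-8 code page
	  chcp 936    switches back to the default GBK code page
	  chcp 437    is United States English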

	  https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/chcp.mspx?mfr=true
*/
	//https://msdn.microsoft.com/en-us/library/x99tb11d.aspx
	printf("the thread's current locale: %s\nthe active console code page: ", setlocale(LC_ALL, NULL));
	system("CHCP");

	printf("\n");

	//setlocale();
	printf("printf(\"%%S\"): WideChars = %S", L"CN中国");
	printf(" vs ");
	wprintf(L"wprintf(L\"%%s\"): WideChars = %s", L"CN中国");

	printf("\n\n");

	// POSIX language[_territory][.codeset][@modifier]
	// MS lang[_country_region[.code_page]]
	// language is an ISO 639 language code, territory is an ISO 3166 country/region code, codeset is the character set name
	// e.g. zh_CN.GBK for POSIX vs Chinese_People's Republic of China.936 for Windows CRT

	//setlocale(LC_ALL, "Chinese_People's Republic of China.936");
	setlocale(LC_ALL, "");

	printf("printf(\"%%S\"): WideChars = %S", L"CN中国");
	printf(" vs ");
	wprintf(L"wprintf(L\"%%s\"): WideChars = %s", L"CN中国");

	printf("\n\n");
	//getchar();
	system("pause");
	return 0;
}
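Output (console messages come from a Simplified Chinese Windows: 活动代码页 means "Active code page", 请按任意键继续 means "Press any key to continue"):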
the thread's current locale: C
the active console code page: 活动代码页: 936

printf("%S"): WideChars = CN vs wprintf(L"%s"): WideChars = CN??

printf("%S"): WideChars = CN中国 vs wprintf(L"%s"): WideChars = CN中国

请按任意键继续. . .

To see the source file's encoding more clearly, dump it as follows (only the beginning and a later part of the file are shown):
C:\Users\fangss>hexdump -C D:\VSProjects\Win32\Win32\propdump.cpp
00000000  23 69 6e 63 6c 75 64 65  20 3c 73 74 64 69 6f 2e  |#include <stdio.|
00000010  68 3e 0d 0a 23 69 6e 63  6c 75 64 65 20 3c 74 63  |h>..#include <tc|

00000530  53 22 2c 20 4c 22 43 4e  d6 d0 b9 fa 22 29 3b 0d  |S", L"CN....");.| <---
00000540  0a 20 20 20 20 70 72 69  6e 74 66 28 22 20 76 73  |.    printf(" vs|
00000550  20 22 29 3b 0d 0a 20 20  20 20 77 70 72 69 6e 74  | ");..    wprint|
00000560  66 28 4c 22 77 70 72 69  6e 74 66 28 4c 5c 22 25  |f(L"wprintf(L\"%|
00000570  25 73 5c 22 29 3a 20 57  69 64 65 43 68 61 72 73  |%s\"): WideChars|
00000580  20 3d 20 25 73 22 2c 20  4c 22 43 4e d6 d0 b9 fa  | = %s", L"CN....| <---
00000590  22 29 3b 0d 0a 0d 0a 20  20 20 20 70 72 69 6e 74  |");....    print|
000005a0  66 28 22 5c 6e 5c 6e 22  29 3b 0d 0a 20 20 20 20  |f("\n\n");..    |
000005b0  2f 2f 67 65 74 63 68 61  72 28 29 3b 0d 0a 20 20  |//getchar();..  |

The lines at offsets 00000530 and 00000580 contain the Chinese characters 中国, stored as the bytes d6 d0 b9 fa.
See the Unihan data for U+4E2D 中 and for U+56FD 国.
Excerpts from ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit936.txt:
fangss@fangss-PC ~
$ grep -n "Null" /cygdrive/c/Users/fangss/Desktop/bestfit936.txt
7:0x00  0x0000  ;Null
24465:0x0000    0x0000  ;Null

fangss@fangss-PC ~
$ grep -n "中" /cygdrive/c/Users/fangss/Desktop/bestfit936.txt
16694:0xd0      0x4e2d  ;中
25342:0x3197    0xd6d0  ;中
25486:0x4e2d    0xd6d0  ;中

fangss@fangss-PC ~
$ grep -n "国" /cygdrive/c/Users/fangss/Desktop/bestfit936.txt
11139:0xfa      0x56fd  ;国
27742:0x56fd    0xb9fa  ;国

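For each character, the earlier matches (multibyte to Unicode) belong to MBTABLE or DBCSTABLE, while the later ones (Unicode to multibyte) belong to WCTABLE 24482. The file is structured as follows: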
CODEPAGE 936            ; PRC GBK (XGB) - ANSI, OEM

CPINFO 2 0x3f 0x003f    ; DBCS CP, Default Char = Question Mark

MBTABLE 130

0x00	0x0000	;Null
0x01	0x0001	;Start Of Heading
;... (omitted)
0xff	0xf8f5	;


DBCSRANGE  1            ;Lead Byte Range: 0x81-0xfe

0x81  0xfe              ;Lead Byte Range


DBCSTABLE 190           ;LeadByte = 0x81

0x40	0x4e02	;丂
0x41	0x4e04	;丄
;... (omitted)

DBCSTABLE 190           ;LeadByte = 0x82

0x40	0x4fa4	;侤
0x41	0x4fab	;侫
;... (omitted)
0xfe	0xe4c5	;


WCTABLE	24482

0x0000	0x0000	;Null
0x0001	0x0001	;Start Of Heading
;... (omitted)
0xffe5	0xa3a4	;￥

ENDCODEPAGE


Windows CRT setlocale, _wsetlocale
National Language Support (NLS) API Reference
WindowsBestFit 936