Create documents:
$ acurl -X PUT "$host/encoding/a" -d '{ "cp": "U+F925", "data": "chr: 拉" }'
{"ok":true,"id":"a","rev":"1-5f4eb3548d021ed8d02b3e2622783b52"}
$ acurl -X PUT "$host/encoding/b" -d '{ "cp": "U+1F631", "data": "chr: 😱" }'
{"ok":true,"id":"b","rev":"1-853e9ee71d55c2312d9f2c5fc6956ab6"}
Check which encoding the data is returned in:
$ acurl -X GET "$host/encoding/_all_docs?include_docs=true" \
| jq -r '.rows | .[] | .doc.data' \
| hexdump -C
00000000 6e 75 6c 6c 0a 63 68 72 3a 20 ef a4 a5 0a 63 68 |null.chr: ....ch|
00000010 72 3a 20 f0 9f 98 b1 0a |r: .....|
00000018
1st char (拉) is ef a4 a5, 2nd char (😱) is f0 9f 98 b1, which looks like UTF-8.
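As a quick sanity check, the byte sequences read off the hexdump can be decoded as UTF-8 to confirm they map back to the intended code points. This is only a sketch: the hex strings are copied by hand from the output above, and the variable names are illustrative.

```python
# Bytes copied by hand from the hexdump output (the "chr: " prefix and newlines are left out).
first = bytes.fromhex("ef a4 a5")
second = bytes.fromhex("f0 9f 98 b1")

# Decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
decoded = [first.decode("utf-8"), second.decode("utf-8")]

for ch in decoded:
    print(f"U+{ord(ch):04X}")
```

If the bytes were in some other encoding, the decode step would either fail or yield different code points from the ones stored in the "cp" fields.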
We'd expect 拉 to sort before 😱 based on their codepoints and UTF-8 bytes.
In UTF-16BE, 拉 is f9 25, and 😱 is d8 3d de 31, so 😱 would sort before 拉.
In UTF-16LE, 拉 is 25 f9, and 😱 is 3d d8 31 de, so 😱 would sort after 拉.
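The UTF-16 bytes for 😱 come from a surrogate pair, since U+1F631 lies outside the Basic Multilingual Plane. The sketch below works through that arithmetic; the helper name utf16_surrogates is hypothetical and only serves to illustrate the calculation.

```python
def utf16_surrogates(cp):
    """Split a supplementary-plane code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go into the low surrogate
    return high, low


high, low = utf16_surrogates(0x1F631)
print(f"{high:04X} {low:04X}")

# Cross-check the arithmetic against Python's own UTF-16 encoders.
be_bytes = "\U0001F631".encode("utf-16-be")
le_bytes = "\U0001F631".encode("utf-16-le")
print(be_bytes.hex(" "), "|", le_bytes.hex(" "))
```

Because the high surrogate always starts at D8..DB, any supplementary-plane character sorts before U+F925 (leading byte F9) when UTF-16BE bytes are compared.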
Documents a and b alone cannot separate UTF-16LE from codepoint or UTF-8 order, so to decide which encoding is in use we need another example: U+F946, 牢.
$ acurl -X PUT "$host/encoding/c" -d '{ "cp": "U+F946", "data": "chr: 牢" }'
{"ok":true,"id":"c","rev":"1-dd58f21b9fedf2322abd2a8485f56321"}
Full example set:
docid | char | codepoint | UTF-8       | UTF-16BE    | UTF-16LE
------+------+-----------+-------------+-------------+------------
a     | 拉   | U+F925    | EF A4 A5    | F9 25       | 25 F9
b     | 😱   | U+1F631   | F0 9F 98 B1 | D8 3D DE 31 | 3D D8 31 DE
c     | 牢   | U+F946    | EF A5 86    | F9 46       | 46 F9
We'd expect the following sort orders for different representations:
- codepoint: A, C, B
- UTF-8: A, C, B
- UTF-16BE: B, A, C
- UTF-16LE: A, B, C
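The expected orders listed above can be reproduced locally by sorting the three characters on each representation. This is a minimal sketch that compares raw code points or bytes only; it does not model whatever collation the database applies, and the names docs, order and orders are illustrative.

```python
# docid -> distinguishing character; the shared "chr: " prefix does not affect the order.
docs = {"a": "\uF925", "b": "\U0001F631", "c": "\uF946"}


def order(key):
    """Return the docids sorted by the given key, formatted like 'A, C, B'."""
    return ", ".join(sorted(docs, key=key)).upper()


orders = {
    "codepoint": order(lambda d: [ord(ch) for ch in docs[d]]),
    "UTF-8": order(lambda d: docs[d].encode("utf-8")),
    "UTF-16BE": order(lambda d: docs[d].encode("utf-16-be")),
    "UTF-16LE": order(lambda d: docs[d].encode("utf-16-le")),
}

for name, result in orders.items():
    print(f"{name}: {result}")
```

Only the UTF-16BE comparison places b first, which is what makes the three-document set enough to tell the four candidates apart.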
Create index:
$ acurl -X POST "$host/encoding/_index" \
-d '{ "ddoc": "by-data", "type": "json", "index": { "fields": ["data"] } }'
{"result":"created","id":"_design/by-data","name":"1f39708131deed6f5649d8c9447ae53729ceb7ef"}
Mango query, sort by data:
$ acurl -X POST "$host/encoding/_find" \
-d '{ "selector": {}, "sort": [{ "data": "asc" }] }'
{"docs":[
{"_id":"b","_rev":"1-853e9ee71d55c2312d9f2c5fc6956ab6","cp":"U+1F631","data":"chr: 😱"},
{"_id":"a","_rev":"1-5f4eb3548d021ed8d02b3e2622783b52","cp":"U+F925","data":"chr: 拉"},
{"_id":"c","_rev":"1-dd58f21b9fedf2322abd2a8485f56321","cp":"U+F946","data":"chr: 牢"}
],
"bookmark": "g1AAAAA_eJzLYWBgYMpgSmHgKy5JLCrJTq2MT8lPzkzJBYozJoMkOGASOSAhkDhHckaRlcL7pW1ZWQARIRGp"}
The order is B, A, C, which of the representations above is consistent only with UTF-16BE.