Last active August 29, 2015 14:28
hPutChar error message in git-annex with UTF-8 characters above U+007F in filenames
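A character "above U+007F" takes more than one byte in UTF-8, and every one of those bytes is above 0x7f. The following minimal sketch dumps the bytes of the non-ASCII test filename used by the runme script. It assumes the snippet itself is saved as UTF-8; the variable `name` exists only for illustration.

```shell
#!/bin/bash
# Dump the raw bytes of the non-ASCII test filename used by runme.
# Assumption: this file is saved as UTF-8, so "Ø" (U+00D8) is stored as
# the two bytes 0xc3 0x98; all other bytes in the name are plain ASCII.
name='U00D8_Ø.txt'
printf '%s' "$name" | od -An -tx1
```

With GNU od, this prints ` 55 30 30 44 38 5f c3 98 2e 74 78 74`. Both bytes of Ø are above 0x7f, so the name cannot be decoded in a locale whose character set is plain ASCII, such as glibc's C and POSIX locales.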
#!/bin/bash
#=======================================================================
# runme
# File ID: 38d0ddc8-49bb-11e5-a06b-fefdb24f8e10
#
# Author: Øyvind A. Holm <sunny@sunbase.org>
# License: GNU General Public License version 2 or later.
#=======================================================================
: <<'INFO_END'
When using --incremental together with "git annex fsck", the error
message "hPutChar: invalid argument (invalid character)" appears in the
"Only X of Y trustworthy copies exist" message when the filename
contains a UTF-8 character above U+007F. This test script takes an
optional argument $1 and sets the environment variable $LANG to that
value. Here are two excerpts from the test output:
$ ./runme C
[snip]
================== git annex --incremental fsck ==================
fsck U00D8_Ø.txt (checksum...)
Only 1 of 2 trustworthy copies exist of U00D8_
git-annex: <stderr>: hPutChar: invalid argument (invalid character)
failed
fsck ascii_only.txt (checksum...)
Only 1 of 2 trustworthy copies exist of ascii_only.txt
Back it up with git-annex copy.
failed
(recording state in git...)
git-annex: fsck: 2 failed
$ ./runme C.UTF-8
[snip]
================== git annex --incremental fsck ==================
fsck U00D8_Ø.txt (checksum...)
Only 1 of 2 trustworthy copies exist of U00D8_Ø.txt
Back it up with git-annex copy.
failed
fsck ascii_only.txt (checksum...)
Only 1 of 2 trustworthy copies exist of ascii_only.txt
Back it up with git-annex copy.
failed
(recording state in git...)
git-annex: fsck: 2 failed
I ran the test with all locales installed on this system (Norwegian and
some English variants), using the following loop, which is also
available as ./test-all-locales:
for f in `locale -a`; do
    ./runme $f 2>&1 | grep -q "invalid character" &&
        echo $f: error ||
        echo $f: no error
done
Result:
C: error
C.UTF-8: no error
POSIX: error
en_AG: error
en_AG.utf8: error
en_AU.utf8: error
en_BW.utf8: error
en_CA.utf8: error
en_DK.utf8: error
en_GB.utf8: error
en_HK.utf8: error
en_IE.utf8: error
en_IN: error
en_IN.utf8: error
en_NG: error
en_NG.utf8: error
en_NZ.utf8: error
en_PH.utf8: error
en_SG.utf8: error
en_US.utf8: error
en_ZA.utf8: error
en_ZM: error
en_ZM.utf8: error
en_ZW.utf8: error
nb_NO.utf8: error
In summary, the error appears in all locales except "C.UTF-8". I have
also tried variants like "en_IN.UTF8" and "en_IN.UTF-8", with the same
result. I am using the newest git-annex version at the time of writing,
from <https://downloads.kitenet.net/.git/>, directory
git-annex/linux/current/ .
$ git annex version
git-annex version: 5.20150812-ge953be1
build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV Inotify DBus DesktopNotify XMPP DNS Feeds Quvi TDFA Database
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
local repository version: 5
supported repository version: 5
upgrade supported from repository versions: 0 1 2 4
$ git --version
git version 2.5.0.400.gff86faf
$ locale
LANG=en_GB.UTF-8
LANGUAGE=
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC=C
LC_TIME="en_GB.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
$ arch
x86_64
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 7.8 (wheezy)
Release: 7.8
Codename: wheezy
The same behaviour occurs on Linux Mint 17 Qiana and Debian GNU/Linux
6.0.10 (squeeze).
INFO_END
repo=tmpdir-hputchar
unset debugstr
# Uncomment for debug info
# debugstr="--debug"

# Optional first argument: the locale to test, exported as $LANG
test -n "$1" && {
    export LANG="$1"
    shift
}

# Print a visible section header
msg() {
    echo
    echo ================== $* ==================
}
# Create a small text file named $1, add it to git-annex and commit it
add_file() {
    msg Add $1 to git-annex &&
    echo This is $1 >"$1" &&
    git annex add "$1" &&
    msg git commit, add "$1" &&
    git commit -m "Add $1" &&
    return 0 || return 1
}
# git-annex write-protects annexed content, so restore write permission
# before removing a test repository left over from an earlier run
test -d "$repo" && chmod -R +w "$repo" && rm -rf "$repo"

msg git init "$repo" &&
git init "$repo" &&
cd "$repo" &&
msg git annex init &&
git annex init &&
msg git commit, empty start commit &&
git commit --allow-empty -m "Empty startcommit" &&
# Require two copies of every file. With only one repository, fsck then
# prints "Only 1 of 2 trustworthy copies exist of <filename>"
msg git annex numcopies 2 &&
git annex numcopies 2 &&
git annex numcopies &&
add_file ascii_only.txt &&
add_file U00D8_Ø.txt &&
# fsck fails because numcopies cannot be satisfied; "|| true" keeps the
# && chain going
msg git annex $debugstr fsck &&
(git annex $debugstr fsck || true) &&
# The incremental run; stdout and stderr are saved in output.txt
msg git annex $debugstr --incremental fsck &&
(git annex fsck $debugstr --incremental || true) 2>&1 | tee output.txt &&
echo &&
echo Reached the end ||
{
    echo Reached the end, but something went wrong in the \&\& chain
    exit 2
}
# Exit with 0 if no error, or 1 if the error appeared
grep -q "invalid character" output.txt &&
    { echo exit with ERROR; exit 1; } ||
    { echo exit with OK; exit 0; }
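runme keeps its bug description in a here-document fed to `:`, the shell's no-op built-in (`: <<'INFO_END'`). The following minimal, self-contained sketch shows that idiom with a hypothetical delimiter `END`. Because the delimiter is quoted, the shell performs no parameter expansion or command substitution in the block.

```shell
#!/bin/bash
# ':' does nothing and returns 0. The here-document becomes its standard
# input and is never read, so the text works as a block comment.
# Quoting the delimiter ('END') turns off expansion inside the block, so
# nothing written there is evaluated.
: <<'END'
Free-form notes go here. $HOME, `date` and x=1 are left untouched.
END
echo "after the comment block"
```

This is why the `$1`, `$LANG` and backtick text in runme's notes never affects the script.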
#!/bin/bash
#=======================================================================
# test-all-locales
# File ID: c17de02c-49c4-11e5-97b3-0026b9848456
#
# Execute ./runme with all available locales
#
# Author: Øyvind A. Holm <sunny@sunbase.org>
# License: GNU General Public License version 2 or later.
#=======================================================================
retval=0
for f in $(locale -a); do
    ./runme "$f" 2>&1 | grep -q "invalid character" &&
        { echo $f: error; retval=1; } ||
        echo $f: no error
done
echo
test "$retval" = "0" &&
    echo No errors found, exit with 0 ||
    echo Errors found, exit with 1
exit $retval
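The results show the error even in locales such as en_GB.utf8, whose names say UTF-8. As a possible cross-check on the system side, the following sketch lists the character encoding that the C library reports for each installed locale, using the standard `locale charmap` command. The helper name `show_charmaps` is hypothetical. The sketch does not run git-annex. It sets LC_ALL rather than LANG, which is what runme sets, so that LC_CTYPE or other LC_* variables in the environment cannot override the locale being checked.

```shell
#!/bin/bash
# Hypothetical helper: for every installed locale, print the character
# encoding (charmap) the C library reports for it. LC_ALL overrides any
# LC_* variables already in the environment; git-annex is not involved.
show_charmaps() {
    local f
    for f in $(locale -a); do
        printf '%s: %s\n' "$f" "$(LC_ALL="$f" locale charmap)"
    done
}

show_charmaps
```

Suppose the failing .utf8 locales all report UTF-8 here. That would suggest the difference lies in how git-annex picks up the locale, not in the system's locale definitions.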