@oyvholm
Last active August 29, 2015 14:28
hPutChar error message in git-annex with UTF-8 chars above 7F in filenames
#!/bin/bash
#=======================================================================
# runme
# File ID: 38d0ddc8-49bb-11e5-a06b-fefdb24f8e10
#
# Author: Øyvind A. Holm <sunny@sunbase.org>
# License: GNU General Public License version 2 or later.
#=======================================================================
: <<'INFO_END'
When "git annex fsck" is run with --incremental, the error message
"hPutChar: invalid argument (invalid character)" appears in the
"Only X of Y trustworthy copies exist" message if the filename
contains a UTF-8 character above U+007F. This test script takes an
optional argument $1 and, if it is given, sets the environment variable
LANG to that value. Here are two excerpts from the test output:
$ ./runme C
[snip]
================== git annex --incremental fsck ==================
fsck U00D8_Ø.txt (checksum...)
Only 1 of 2 trustworthy copies exist of U00D8_
git-annex: <stderr>: hPutChar: invalid argument (invalid character)
failed
fsck ascii_only.txt (checksum...)
Only 1 of 2 trustworthy copies exist of ascii_only.txt
Back it up with git-annex copy.
failed
(recording state in git...)
git-annex: fsck: 2 failed
$ ./runme C.UTF-8
[snip]
================== git annex --incremental fsck ==================
fsck U00D8_Ø.txt (checksum...)
Only 1 of 2 trustworthy copies exist of U00D8_Ø.txt
Back it up with git-annex copy.
failed
fsck ascii_only.txt (checksum...)
Only 1 of 2 trustworthy copies exist of ascii_only.txt
Back it up with git-annex copy.
failed
(recording state in git...)
git-annex: fsck: 2 failed
I ran the tests with different locales (Norwegian and some English
variants), using the following command (available as
./test-all-locales) to loop through them:
for f in `locale -a`; do
./runme $f 2>&1 | grep -q "invalid character" &&
echo $f: error ||
echo $f: no error
done
Result:
C: error
C.UTF-8: no error
POSIX: error
en_AG: error
en_AG.utf8: error
en_AU.utf8: error
en_BW.utf8: error
en_CA.utf8: error
en_DK.utf8: error
en_GB.utf8: error
en_HK.utf8: error
en_IE.utf8: error
en_IN: error
en_IN.utf8: error
en_NG: error
en_NG.utf8: error
en_NZ.utf8: error
en_PH.utf8: error
en_SG.utf8: error
en_US.utf8: error
en_ZA.utf8: error
en_ZM: error
en_ZM.utf8: error
en_ZW.utf8: error
nb_NO.utf8: error
In summary, the error appears in all locales except "C.UTF-8". I have
also tried variants like "en_IN.UTF8" and "en_IN.UTF-8", with the same
result.
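
A possible next step, sketched here under assumptions: print the
character map that each locale resolves to, to see whether the failing
locales differ from C.UTF-8 in that respect. This is not part of runme
or test-all-locales; charmap_of is a hypothetical helper name, and the
sketch assumes the POSIX locale(1) utility is installed. It only shows
what the system's own tools report and says nothing about how git-annex
itself picks its output encoding.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts: print the
# character map (e.g. UTF-8) that the locale named in $1 resolves to.
charmap_of() {
    # LC_ALL takes precedence over LANG and the other LC_* variables,
    # so the caller's environment cannot mask the locale under test.
    LC_ALL="$1" locale charmap 2>/dev/null
}

# Same loop shape as test-all-locales, one line per installed locale.
for f in $(locale -a); do
    printf '%s: %s\n' "$f" "$(charmap_of "$f")"
done
```

Note that runme itself sets LANG, not LC_ALL, so an LC_ALL or LC_CTYPE
value already present in the environment would override the locale
passed in $1.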
This was tested with the newest git-annex version from
<https://downloads.kitenet.net/.git/>, in the directory
git-annex/linux/current/.
$ git annex version
git-annex version: 5.20150812-ge953be1
build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV Inotify DBus DesktopNotify XMPP DNS Feeds Quvi TDFA Database
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
local repository version: 5
supported repository version: 5
upgrade supported from repository versions: 0 1 2 4
$ git --version
git version 2.5.0.400.gff86faf
$ locale
LANG=en_GB.UTF-8
LANGUAGE=
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC=C
LC_TIME="en_GB.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
$ arch
x86_64
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 7.8 (wheezy)
Release: 7.8
Codename: wheezy
The same behaviour occurs on Linux Mint 17 Qiana and Debian GNU/Linux
6.0.10 (squeeze).
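
To see which files in a repository have names that could trigger the
message, something like the following might help. It is only a sketch:
has_non_ascii is a hypothetical helper, not part of git-annex or of
this test, and it treats every byte outside printable ASCII (space to
tilde) as a hit. That is broader than "above U+007F", because control
characters also match.

```shell
#!/bin/bash
# Hypothetical helper: succeed if $1 contains a byte outside the
# printable ASCII range (space to tilde).
has_non_ascii() {
    # With LC_ALL=C grep works on single bytes, so each byte of a
    # multi-byte UTF-8 sequence falls outside the bracket range.
    printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# Check the names in the current directory.
for f in *; do
    if has_non_ascii "$f"; then
        echo "$f: non-ASCII"
    fi
done
```

Run inside the tmpdir-hputchar directory created by runme, this should
report U00D8_Ø.txt but not ascii_only.txt.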
INFO_END
repo=tmpdir-hputchar
unset debugstr
# Uncomment for debug info
# debugstr="--debug"
# An optional first argument selects the locale to test
test -n "$1" && {
    export LANG="$1"
    shift
}
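# Print a banner line so each step is easy to find in the output
msg() {
    echo
    echo "================== $* =================="
}
# Create a small file, add it to the annex and commit it.
# The exit status is that of the first step that fails.
add_file() {
    msg "Add $1 to git-annex" &&
    echo "This is $1" >"$1" &&
    git annex add "$1" &&
    msg "git commit, add $1" &&
    git commit -m "Add $1"
}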
test -d "$repo" && chmod -R +w "$repo" && rm -rf "$repo"
msg git init "$repo" &&
git init "$repo" &&
cd "$repo" &&
msg git annex init &&
git annex init &&
msg git commit, empty start commit &&
git commit --allow-empty -m "Empty startcommit" &&
msg git annex numcopies 2 &&
git annex numcopies 2 &&
git annex numcopies &&
add_file ascii_only.txt &&
add_file U00D8_Ø.txt &&
msg git annex $debugstr fsck &&
(git annex $debugstr fsck || true) &&
msg git annex $debugstr --incremental fsck &&
(git annex fsck $debugstr --incremental || true) 2>&1 | tee output.txt &&
echo &&
echo Reached the end ||
{
    echo Reached the end, but something went wrong in the \&\& chain
    exit 2
}
# Exit with 0 if no error, or 1 if the error appeared
grep -q "invalid character" output.txt &&
    { echo exit with ERROR; exit 1; } ||
    { echo exit with OK; exit 0; }
#!/bin/bash
#=======================================================================
# test-all-locales
# File ID: c17de02c-49c4-11e5-97b3-0026b9848456
#
# Execute ./runme with all available locales
#
# Author: Øyvind A. Holm <sunny@sunbase.org>
# License: GNU General Public License version 2 or later.
#=======================================================================
retval=0
for f in $(locale -a); do
    ./runme "$f" 2>&1 | grep -q "invalid character" &&
        { echo "$f: error"; retval=1; } ||
        echo "$f: no error"
done
echo
test "$retval" = "0" &&
    echo No errors found, exit with 0 ||
    echo Errors found, exit with 1
exit $retval