@JamesChevalier
Last active August 14, 2024 08:56
Unicode on Mac is insane. Mac OS X uses NFD while everything else uses NFC. This fixes that.

convmv manpage

Install convmv if you don't have it

sudo apt-get install convmv

Convert all files in a directory from NFD to NFC:

convmv -r -f utf8 -t utf8 --nfc --notest .

Convert all files in a directory from NFC to NFD:

convmv -r -f utf8 -t utf8 --nfd --notest .
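
Before running either command for real, it can help to preview the renames. A minimal sketch, assuming convmv's default behaviour of only printing the planned changes when --notest is omitted; the function name preview_nfc is hypothetical and not part of convmv:

```shell
# preview_nfc [DIR]: show what an NFD -> NFC conversion would rename.
# (Hypothetical helper name. Assumes convmv only renames when --notest is given.)
preview_nfc() {
    dir="${1:-.}"
    if ! command -v convmv >/dev/null 2>&1; then
        echo "convmv not found; install it first" >&2
        return 1
    fi
    # Same options as the NFC command above, minus --notest, so nothing is renamed.
    convmv -r -f utf8 -t utf8 --nfc "$dir"
}
```

Run it as "preview_nfc ." and, if the listed renames look right, repeat the conversion with --notest added.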

hwdbk commented Dec 27, 2020 via email

fguern commented Dec 27, 2020

Hello Henk.

At this stage I see three solutions:

/////1 - Your script

-> This time, it's "not overwritten":
francois@Francoiss-MacBook-Air mac-nfd-conversion % for f in /Users/francois/Documents/01.\ Documents/25.\ Test/*.rtf ; do mv -v -n "$f" "$(dirname "$f")/$(./mac2syn <<< "$(basename "$f")")" ; done
/Users/francois/Documents/01. Documents/25. Test/2008_03_26 - A voir à paris.rtf not overwritten
Is it possible to rename an entire folder and its subfolders with your script?

//// 2 - James CONVMV command
-> I also tried the command above, "convmv -r -f utf8 -t utf8 --nfc --notest", and got a wrong/unknown "from" encoding error:
francois@Francoiss-MacBook-Air 25. Test % convmv -r -f enc -t enc utf8 --nfc --notest
wrong/unknown "from" encoding!

///// 3 - Rsync local copy with NFC
-> No apparent problem, except that it means copying 10 GB
rsync -a --iconv=utf-8-mac,utf-8 /Users/francois/Documents/01.\ Documents/02.\ Administratif /Users/francois/Documents/01.\ Documents/02.\ Administratif\ nfc

What's your expert advice? Is it worth trying to make the script or the convmv command work?
Thanks
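
On the folder-plus-subfolders question, one possible approach is to walk the tree deepest-first so that files are renamed before their parent directories. A rough sketch, not the script author's method: rename_tree is a hypothetical helper, and it assumes the converter reads a name on stdin and writes the converted name to stdout, the way mac2syn is used in the loop above.

```shell
# rename_tree DIR FILTER [ARGS...]   (hypothetical helper)
# Renames every entry below DIR, deepest first, passing each base name
# through FILTER (assumed to read stdin and write the new name to stdout).
rename_tree() {
    root="$1"; shift
    find "$root" -depth -mindepth 1 -exec sh -c '
        path="$1"; shift
        dir=$(dirname "$path")
        old=$(basename "$path")
        new=$(printf "%s\n" "$old" | "$@")
        # Skip empty results and unchanged names; -n never overwrites.
        if [ -n "$new" ] && [ "$new" != "$old" ]; then
            mv -n "$path" "$dir/$new"
        fi
    ' sh {} "$@" \;
}
```

For example: rename_tree "/Users/francois/Documents/01. Documents/25. Test" ./mac2syn. Because it still relies on mv -n, it does not by itself address the "not overwritten" result shown above.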

hwdbk commented Dec 28, 2020 via email

hwdbk commented Dec 28, 2020 via email

jsvini commented Mar 18, 2021

God bless you! 🙌

jcarnat commented Apr 14, 2021

Great. Thanks a lot!

boulderob commented Apr 30, 2021

So I just upgraded to a new (used) Intel-based MacBook Pro (aka mbp). I reformatted the internal SSD to use APFS and I'm seeing the exact symptoms described here when I use VLC to view downloaded French video mp4 files with matching subtitle vtt files that have Unicode characters in the filenames themselves! VLC plays the mp4 fine but it can't locate and autoload the vtt file with the exact same filename except for the extension! If there are no Unicode "French" characters in the filenames it all works fine. But all files, whether they had Unicode "French" characters or not, worked great on my old mbp with an older version of VLC and a non-APFS file system.

In fact, if I mount my external NTFS drive with a huge library of previously downloaded French video mp4 and vtt files on the new mbp (using an NTFS driver to mount the drive, of course), the new VLC recognizes and plays these OLD files normally.

BUT on the new mbp, VLC will not auto-recognize the matching vtt file with the exact same filename as its mp4 when the filename contains Unicode French chars and I download new files via youtube-dl onto the internal SSD formatted as APFS.

If I copy these newly downloaded files to the USB-attached NTFS drive, they then magically work the way I expect from VLC on the new mbp! :) If I then copy them back to the new mbp's internal SSD with APFS, they also work the way I expect :) This seems to indicate that the NTFS filesystem changes the fileNAME encoding when the file is copied to it and that copying back to APFS somehow does NOT change that new encoding!???

I have a huge collection and am constantly adding to and maintaining it.

Is there any way to just set what your scripts are doing at the filesystem or even system level? Or going forward will I always have to run a post-download script on every new filename to convert the Unicode flavor used so that my Mac / VLC can recognize them?!

Thanks

hwdbk commented May 1, 2021

Yup, that is exactly the madness of having two valid but different Unicode representations (precomposed NFC and decomposed NFD) for, say, the è (e-accent-grave).
The simple way to make sure both the media file and the vtt/srt file use the same file name encoding is:

for i in *.mp4 ; do
    mv -vn "$i" "$(syn2mac <<< "$i")"
done

and do the same for the subtitle files. You'll probably get a lot of "same file" warnings from mv on those files that were already in the target encoding.
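
To make the two encodings of è concrete, here is a small illustrative sketch that builds both forms from their UTF-8 bytes and compares them. The variable names are made up for the example; the byte values are the standard UTF-8 encodings of U+00E8 and of "e" followed by U+0300.

```shell
# è as one precomposed code point, U+00E8 (NFC): bytes C3 A8
nfc=$(printf '\303\250')
# e followed by the combining grave accent U+0300 (NFD): bytes 65 CC 80
nfd=$(printf 'e\314\200')

# Same glyph on screen, different byte lengths
nfc_len=$(printf '%s' "$nfc" | wc -c | tr -d ' ')
nfd_len=$(printf '%s' "$nfd" | wc -c | tr -d ' ')
echo "NFC: $nfc_len bytes, NFD: $nfd_len bytes"

# A byte-wise comparison treats them as two different names
if [ "$nfc" = "$nfd" ]; then echo "same"; else echo "different"; fi
```

This prints "NFC: 2 bytes, NFD: 3 bytes" followed by "different", which is why a player comparing file names byte by byte can miss a subtitle file that looks identically named.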

s2k commented May 4, 2021

Very nice tip!
On my Mac, I used brew install convmv, BTW.

igorsgm commented May 13, 2021

You saved my day. Thank you!

simnalamburt commented Jun 16, 2023

Take a look at https://github.com/cr0sh/jaso for a faster alternative written in Rust.

$ brew install simnalamburt/x/jaso
$ jaso .
DONE; 100 files in 1.111529301 seconds
