Unicode on Mac is insane. Mac OS X uses NFD while everything else uses NFC. This fixes that.
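
As a small illustration of what the two forms look like on disk, the sketch below builds the same visible character (é) both ways and dumps the bytes. The variable names are only for this example; nothing here touches any files.

```shell
# é in NFC: one code point, U+00E9, two UTF-8 bytes (c3 a9)
nfc=$(printf '\303\251')
# é in NFD: "e" followed by U+0301 COMBINING ACUTE ACCENT, three bytes (65 cc 81)
nfd=$(printf 'e\314\201')

printf 'NFC bytes:'; printf '%s' "$nfc" | od -An -tx1
printf 'NFD bytes:'; printf '%s' "$nfd" | od -An -tx1

# A byte-wise comparison sees two different names, which is why tools on
# other systems can fail to match a file created on a Mac.
if [ "$nfc" = "$nfd" ]; then
    echo "the two strings are equal"
else
    echo "the two strings differ"
fi
```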

convmv manpage

Install convmv if you don't have it

sudo apt-get install convmv

Convert all files in a directory from NFD to NFC:

convmv -r -f utf8 -t utf8 --nfc --notest .

Convert all files in a directory from NFC to NFD:

convmv -r -f utf8 -t utf8 --nfd --notest .
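
Before running either command for real, it may be worth previewing it. A minimal sketch, assuming convmv is on the PATH: without --notest, convmv only prints the renames it would perform. The function name preview_nfc_renames is made up for this example. Note that -f and -t each take an encoding name (utf8 here) and the last argument is the directory to process.

```shell
# Dry run: list the NFD -> NFC renames convmv would make under a directory.
# preview_nfc_renames is a hypothetical helper name, not part of convmv.
preview_nfc_renames() {
    target_dir=${1:-.}   # directory to scan; defaults to the current one
    if ! command -v convmv >/dev/null 2>&1; then
        echo "convmv is not installed" >&2
        return 127
    fi
    # No --notest here, so nothing is renamed; convmv just reports its plan.
    convmv -r -f utf8 -t utf8 --nfc "$target_dir"
}

# Usage:
#   preview_nfc_renames .
# If the listed renames look right, repeat the command with --notest added.
```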

@watersb

watersb commented Jan 27, 2019

NOTE: Apple's new file system, APFS, apparently preserves Unicode normalization: if it gets a filename specified with decomposed Unicode (NFD), it won't change it, but if APFS writes new files, it will use the NFC (composed char) form.

You might not need convmv with APFS.

https://medium.com/@yorkxin/apfs-docker-unicode-6e9893c9385d

(I used to run ZFS storage arrays on my Mac Pro, and had a script that would set NFD on ZFS volume setup. Note that this was when I was using ZFS as direct-attached storage; for a while it seemed that ZFS was to be the next-gen macOS file system of choice. That blew up when Sun was acquired by Oracle, and Sun was not able to separate intellectual-property claims in order to ensure its ability to license the ZFS codebase. So now we have APFS, and macOS seems to have used the decade-long delay to implement NFC in its VFS layer. YMMV. WWJD. WTF.)

@SHawnHardy

SHawnHardy commented Feb 24, 2019

Saved my day. Thanks a lot.

@DanielSmedegaardBuus

DanielSmedegaardBuus commented Feb 25, 2019

Remember, if you send files to a non-Mac with rsync from a Mac, you can use the argument --iconv=utf-8-mac,utf-8 to ensure the files are sent with the proper NFC names to the target. The option takes the form --iconv=LOCAL,REMOTE, so the order depends on which machine runs rsync rather than on the direction of the copy: when rsync runs on the non-Mac side with the Mac as the remote, use --iconv=utf-8,utf-8-mac instead.

Unfortunately, at least for the Ubuntu version of rsync, this argument may not be supported. Really weird. But it is for the native Mac version of rsync, as well as the Homebrew version.
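
A minimal sketch of the Mac-to-non-Mac case, assuming it is run on the Mac with an rsync build that supports --iconv. The function name sync_mac_to_remote and the destination in the usage line are hypothetical.

```shell
# Run on the Mac. Extra rsync options can be passed after the two paths.
# sync_mac_to_remote is a made-up wrapper name for this example.
sync_mac_to_remote() {
    src=$1
    remote=$2
    shift 2
    # --iconv takes LOCAL,REMOTE: utf-8-mac (decomposed) on this side,
    # plain utf-8 on the other side.
    rsync -a --iconv=utf-8-mac,utf-8 "$@" "$src" "$remote"
}

# Usage (hypothetical host and path); -n makes it a dry run, -v lists files:
#   sync_mac_to_remote ~/Documents/ user@nas:/volume1/docs/ -n -v
```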

@hwdbk

hwdbk commented May 31, 2019

Also note that on macOS, the iconv command can be used to convert between NFD and NFC:
iconv -f UTF-8 -t UTF-8-MAC (or vice versa, of course)
but many UNIX/Linux systems that I've come across have the iconv command without support for the UTF-8-MAC encoding...
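
One way to find out which situation you are in is to ask iconv for its encoding list. This is a sketch only; the helper name iconv_has_utf8_mac is invented here, and the sample string is the NFD spelling of "café".

```shell
# Succeeds if the local iconv lists the UTF-8-MAC encoding.
iconv_has_utf8_mac() {
    iconv -l 2>/dev/null | grep -i 'utf-8-mac' >/dev/null
}

if iconv_has_utf8_mac; then
    # Convert the decomposed form to NFC and show the resulting bytes.
    printf 'cafe\314\201\n' | iconv -f UTF-8-MAC -t UTF-8 | od -An -tx1
else
    echo "this iconv has no UTF-8-MAC; use convmv or another tool instead"
fi
```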

@mackyle

mackyle commented Oct 24, 2019

The UTF-8-MAC support was added to Cupertino’s version of iconv -- that’s why it’s not available on other systems.

They have also, apparently, removed their documentation of the HFS+ file name encodings. But, thanks to the wayback machine, you can see it here:

File Systems and Unicode Support

It states:

Mac OS Extended (HFS+) uses canonically decomposed Unicode 3.2 [...]
characters in the ranges U2000-U2FFF, UF900-UFA6A, and U2F800-U2FA1D are not decomposed

And that last little bit is how UTF-8-MAC differs from Unicode 3.2’s NFD.

@hwdbk

hwdbk commented May 19, 2020

I've created a repository with a pair of bijective scripts that convert to and from NFD and do not rely on iconv:
https://github.com/hwdbk/synology-scripts/tree/master/mac-nfd-conversion
The scripts run on Mac OS X and other unixes (they use bash and sed only). I use them on a Synology NAS, hence the names syn2mac and mac2syn, but what's in a name?
The repository also contains the script that generates these scripts, if you want to play with it.

@jeiksegovia

jeiksegovia commented Jun 4, 2020

Very simple and nice

@rico

rico commented Oct 27, 2020

... another day saved - thanks so much!

@fguern

fguern commented Dec 20, 2020

Hello everyone.

I am on a Mac and I can't make the script work.

After cd-ing to the directory whose filenames I want to convert, I paste the script path and press Enter, but nothing happens.
Can you help me make it work?
Thank you very much in advance.

Best

@hwdbk

hwdbk commented Dec 21, 2020

Hi fguern, if you're referring to mac2syn or syn2mac, these scripts read from a file or stdin and write to stdout.
So, suppose you have a file with NFD UTF-8 text called my_nfd_utf.txt (for instance). You type
mac2syn my_nfd_utf.txt
or
mac2syn < my_nfd_utf.txt
or
some-other-program-producing-the-text | mac2syn
Make sure the script is executable (chmod 750 mac2syn).

If you have a string that needs translating, the syntax is
echo $(mac2syn <<< "string_or_variable_with_nfd_utf_text")

Cheers, Henk

@fguern

fguern commented Dec 23, 2020

Hello Henk,

Thanks for your help. It's still not clear to me.

On a Mac, does this mean typing that in the Terminal?
However, my goal is to convert an entire folder, with its subfolders and files, to the syno-compatible UTF-8 form.

I tried that in the terminal :
Francoiss-MacBook-Air:~ francois$ mac2syn /Users/francois/Documents/01.\ Documents/2008_03_26\ -\ A\ voir\ à\ paris.rtf
-bash: mac2syn: command not found

I guess it didn't work :D.

And the mac2syn script is readable and writable by everyone.

Cheers, François

@hwdbk

hwdbk commented Dec 23, 2020

@fguern

fguern commented Dec 23, 2020

Hey Henk,

I tried your command after installing Homebrew and convmv (see: http://macappstore.org/convmv/), because I understood that the script is based on these two packages, right?

When launching the script with your command, even after adding 'sightseeing paris utf.rtf', it doesn't do anything: the file is still not synchronized with my syno.

francois@Francoiss-MacBook-Air ~ % cd /Users/francois/Downloads/synology-scripts-master/mac-nfd-conversion
francois@Francoiss-MacBook-Air mac-nfd-conversion % ./mac2syn /Users/francois/Documents/01.\ Documents/2008_03_26\ -\ A\ voir\ à\ paris.rtf 'sightseeing paris utf.rtf'
{\rtf1\ansi\ansicpg1252\cocoartf1265
\cocoascreenfonts1{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
\paperw11900\paperh16840\margl1440\margr1440\vieww9000\viewh8400\viewkind0
\pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural

\f0\fs24 \cf0 * Caf'e9 Branly + mus'e9e
m'e9tro alma marceau
rer, pont de l'alma
\

  • Caf'e9 des deux moulin _ Am'e9lie Poulain
    15 rue Lepic _ Montmartre
    -> fait
    \
  • Mus'e9e Gr'e9vin, 10 boulevard Montmartre
    m'e9tro gd boulevard}cat: sightseeing paris utf.rtf: No such file or directory

Once it works with a file, I'll try your command on a folder.

Thank you Henk!!
+

@hwdbk

hwdbk commented Dec 24, 2020

@fguern

fguern commented Dec 24, 2020

I did a test, and only the name of the file is the problem: if I create a copy and rename the file to "2008_03_26 - A voir a paris copy.rtf" instead of "2008_03_26 - A voir à paris.rtf", the file is synced with the Synology.

My goal is to rename all the files in a folder from NFD to NFC and, in the future, be sure that all files with accented names are synced with the Synology.
And I thought the mac2syn script was meant to convert the filename from NFD to NFC.

Was I wrong and did I miss something?

Thanks a lot for your help and time Henk.
Cheers, François

@hwdbk

hwdbk commented Dec 24, 2020

@hwdbk

hwdbk commented Dec 24, 2020

@fguern

fguern commented Dec 26, 2020

Hello Henk,

I wish you a merry Christmas !

Thanks for the command. I tried it. At first the command ran, but no filenames were changed and no sync with the syno happened. Then it displayed a dquote> prompt.
Terminal stuff is definitely not for me.

To summarize :

  • mac2syn script is in the mac download folder : /Users/francois/Downloads/synology-scripts-master/mac-nfd-conversion/mac2syn
  • The folder whose contents (including all subfolders) should be converted from NFD to NFC is /Users/francois/Documents/01.\ Documents/25.\ Test/
  • I created a "01. Test" subfolder in the "25. Test" folder to simulate subfolders. (I need this because I will later apply the command to all of 01.\ Documents.)
  • The file "2008_03_26 - A voir à paris.rtf" is in the "01. Test" subfolder
  • I cd into the mac2syn folder (cd /Users/francois/Downloads/synology-scripts-master/mac-nfd-conversion/mac2syn)
  • I enter your command without the */rtf to deal with all files : for f in /Users/francois/Documents/01.\ Documents/25.\ Test/ ; do mv -v -n ”$f" "$(dirname "$f")/$(./mac2syn <<< "$(basename "$f")")" ; done

Here is the result :
francois@Francoiss-MacBook-Air mac-nfd-conversion % for f in /Users/francois/Documents/01.\ Documents/25.\ Test/ ; do mv -v -n ”$f" "$(dirname "$f")/$(./mac2syn <<< "$(basename "$f")")" ; done
for dquote> for f in /Users/francois/Documents/01.\ Documents/25.\ Test/ ; do mv -v -n ”$f" "$(dirname "$f")/$(./mac2syn <<< "$(basename "$f")")" ; done
mv: rename ”/Users/francois/Documents/01. Documents/25. Test/ /Users/francois/Documents/01. to /Users/francois/Documents/01. Documents/25. Test/01.: No such file or directory
mv: rename Documents/25. to /Users/francois/Documents/01. Documents/25. Test/25.: No such file or directory
mv: rename Test ; done
for f in /Users/francois/Documents/01.\ Documents/25.\ Test/ ; do mv -v -n ”/Users/francois/Documents/01. Documents/25. Test/ to /Users/francois/Documents/01. Documents/25. Test/25. Test/: No such file or directory

Are mac2syn and a rename command the right tools for what I need?

Thank you Henk,
François

@hwdbk

hwdbk commented Dec 26, 2020

@fguern

fguern commented Dec 26, 2020

Got it.
I changed the double quotes and have an invalid argument this time:
francois@Francoiss-MacBook-Air mac-nfd-conversion % for f in /Users/francois/Documents/01.\ Documents/25.\ Test/ ; do mv -v -n "$f" "$(dirname "$f")/$(./mac2syn <<< "$(basename "$f")")" ; done
mv: rename /Users/francois/Documents/01. Documents/25. Test/ to /Users/francois/Documents/01. Documents/25. Test/25. Test/: Invalid argument

The command seems to repeat the last folder. Is it the Dirname+basename command?

However, I found another command that copies an entire folder and converts the names from NFD, without a script: rsync -a --iconv=utf-8-mac,utf-8 /Users/francois/Documents/01.\ Documents/25.\ Test/ /Users/francois/Documents/01.\ Documents/26.\ Test\ 2/
This one works. Even if it duplicates the files, I think it's an acceptable workaround. What do you think?

Thank you,
François

@hwdbk

hwdbk commented Dec 27, 2020

@fguern

fguern commented Dec 27, 2020

Hello Henk.

At this stage I see three solutions:

/////1 - Your script

-> This time, it's "not overwritten":
francois@Francoiss-MacBook-Air mac-nfd-conversion % for f in /Users/francois/Documents/01.\ Documents/25.\ Test/*.rtf ; do mv -v -n "$f" "$(dirname "$f")/$(./mac2syn <<< "$(basename "$f")")" ; done
/Users/francois/Documents/01. Documents/25. Test/2008_03_26 - A voir à paris.rtf not overwritten
Is it possible to rename an entire folder, including subfolders, with your script?

//// 2 - James CONVMV command
-> I also tried the command above "convmv -r -f utf8 -t utf8 --nfc --notest" and I got a "wrong/unknown encoding" :
francois@Francoiss-MacBook-Air 25. Test % convmv -r -f enc -t enc utf8 --nfc --notest
wrong/unknown "from" encoding!

///// 3 - Rsync local copy with NFC
-> No apparent problem, except that it copies 10 GB
rsync -a --iconv=utf-8-mac,utf-8 /Users/francois/Documents/01.\ Documents/02.\ Administratif /Users/francois/Documents/01.\ Documents/02.\ Administratif\ nfc

What's your expert advice? Is it worth trying to make the script or the convmv command work?
Thanks
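
On the folder-plus-subfolders question, here is a rough sketch of one way to do it, assuming no filename contains a newline. A loop written as "for f in dir/" visits only the directory itself, which is probably why mv earlier tried to move "25. Test" into itself. The helper name rename_tree is made up for this example. It walks the tree deepest-first and passes each name (not the whole path) through whatever filter command you give it, for instance ./mac2syn, or iconv -f UTF-8-MAC -t UTF-8 on macOS.

```shell
# rename_tree ROOT FILTER [ARGS...]
# Hypothetical helper: renames every entry below ROOT to the output of
# FILTER, which must read one name on stdin and write the new name to stdout.
rename_tree() {
    rt_root=$1
    shift
    # -depth lists children before their parent directory, so renaming a
    # directory never invalidates a path that is still waiting in the list.
    find "$rt_root" -depth -print |
    while IFS= read -r rt_path; do
        [ "$rt_path" = "$rt_root" ] && continue   # leave ROOT itself alone
        rt_dir=$(dirname "$rt_path")
        rt_old=$(basename "$rt_path")
        rt_new=$(printf '%s\n' "$rt_old" | "$@")
        if [ "$rt_new" != "$rt_old" ]; then
            mv -n "$rt_path" "$rt_dir/$rt_new"    # -n: never overwrite
        fi
    done
}

# Usage (paths are examples only):
#   rename_tree "/path/to/25. Test" ./mac2syn
#   rename_tree "/path/to/25. Test" iconv -f UTF-8-MAC -t UTF-8
```

If the volume treats the NFD and NFC spellings as the same name, mv -n may still report "not overwritten", because the target already appears to exist. In that case renaming on the NAS side, or the rsync copy, is likely the simpler route.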

@hwdbk

hwdbk commented Dec 28, 2020

@hwdbk

hwdbk commented Dec 28, 2020

@jsvini

jsvini commented Mar 18, 2021

God bless you! 🙌

@jcarnat

jcarnat commented Apr 14, 2021

Great. Thanks a lot!

@boulderob

boulderob commented Apr 30, 2021

So i just upgraded to a new used intel-based macbook pro (aka mbp). I reformatted the internal SSD to use APFS and i'm seeing the exact symptoms defined here when i use VLC to view downloaded french video mp4 files with matching subtitle vtt filenames that have unicode characters in the filenames themselves! VLC plays the mp4 fine but it can't locate and autoload the vtt file with the same exact filename except for the extension! If there are no unicode "french" characters in the filenames it all works fine. But all files whether they had unicode "French" characters or not worked great on my old mbp with an older version of VLC and a NON-APFS file system.

In fact if i mount my external NTFS drive with a huge library of previously downloaded french video mp4 and vtt files on the new mbp (using an NTFS driver to mount the drive of course), the new VLC recognizes and plays these OLD files normally.

BUT on the new mbp, VLC will not autorecognize the coinciding and same exact vtt filename as its equivalent mp4 when the filename contains unicode french chars when i download new files via youtube-dl onto the internal SSD formatted as APFS

If i copy these new download files to the usb attached NTFS drive they then magically work the way i expect from VLC on the new mbp! :) If i then copy them back to the new mbp internal SSD with APFS, they also work the way i expect :) This seems to indicate that the NTFS filesystem changes the fileNAME encoding when the file is copied to it and that copying back to APFS somehow does NOT change that new encoding!???

I have a huge collection and am constantly adding to and maintaining it.

Is there any way to just set what your scripts are doing at the filesystem or even system level? Or going forward will i have to always run a post download script on every new filename to convert the unicode flavor used so that my mac / vlc can recognize them?!

Thanks

@hwdbk

hwdbk commented May 1, 2021

yup, that is exactly the madness with having two allowed but different character encodings for, say, the è (e-accent-grave)
the simple way to make sure both the media file and the vtt/srt file uses the same file name encoding is:

for i in *.mp4 ; do
mv -vn "$i" "$(syn2mac <<< "$i")"
done

and do the same for the subtitle files. You'll probably get a lot of "same file" warnings from mv on those files that were already in the target encoding.
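
To see which names are still decomposed before or after such a rename, a byte-level search can help. This is a heuristic sketch, not a full normalization check: in valid UTF-8 the byte 0xCC only appears as the lead byte of U+0300 to U+033F, the range holding the common combining accents (grave, acute, circumflex, tilde, diaeresis, cedilla). The function name list_decomposed_names is invented for this example.

```shell
# Print every path under a directory whose name contains a combining mark
# from U+0300..U+033F, which usually means the name is in decomposed form.
# list_decomposed_names is a hypothetical helper name.
list_decomposed_names() {
    # LC_ALL=C makes grep compare raw bytes; -a treats the input as text.
    find "${1:-.}" -print | LC_ALL=C grep -a "$(printf '\314')"
}

# Usage:
#   list_decomposed_names ~/Movies
# No output means no name in that range was found.
```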

@s2k

s2k commented May 4, 2021

Very nice tip!
On my Mac, I used brew install convmv, BTW.

@igorsgm

igorsgm commented May 13, 2021

You saved my day. Thank you!
