ANSI / UTF-8 (with or without BOM) conversion #Windows

Using Uni2Me

  • It's free but discontinued.

Using UTFCast

  • Proprietary software
  • Allows conversion from ANSI to UTF-8 with or without BOM
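To make the "with or without BOM" distinction concrete, here is a small Python sketch (not part of either tool above) that encodes the same text three ways. The sample string is an arbitrary choice, and "ANSI" is assumed to mean Windows-1252.

```python
import codecs

text = "caf\u00e9"  # "café", used only as sample input

ansi = text.encode("cp1252")          # "ANSI", assuming Windows-1252
without_bom = text.encode("utf-8")    # UTF-8 without BOM
with_bom = text.encode("utf-8-sig")   # UTF-8 with BOM

print(ansi)          # b'caf\xe9'
print(without_bom)   # b'caf\xc3\xa9'
print(with_bom)      # b'\xef\xbb\xbfcaf\xc3\xa9'

# The BOM variant differs only by the three-byte prefix EF BB BF.
assert with_bom == codecs.BOM_UTF8 + without_bom
```

Python's "utf-8-sig" codec writes the BOM on encode and skips it on decode, which is why it appears in place of plain "utf-8" whenever a BOM is wanted.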

Using Notepad++

Using Python Script Plugin

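from glob import glob
from Npp import notepad  # provided by the Python Script plugin

# Raw string so the backslashes are not treated as escape sequences
globPath = r"C:\MyFiles\*.txt"

for file in glob(globPath):
  notepad.open(file)
  # Menu and command names must match the Notepad++ interface language
  notepad.runMenuCommand("Encoding", "Convert to UTF-8-BOM")
  notepad.save()
  notepad.close()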
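If Notepad++ is not available, roughly the same result can be had with plain Python 3. The sketch below is an alternative, not the plugin's API: the function names are made up for illustration, and it assumes the input files really are Windows-1252.

```python
from glob import glob


def convert_to_utf8_bom(path, source_encoding="cp1252"):
    """Re-encode one file in place as UTF-8 with BOM (hypothetical helper)."""
    # Read the whole file first so it is closed before being rewritten.
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    # "utf-8-sig" writes the EF BB BF prefix; use "utf-8" for no BOM.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


def convert_all(pattern, source_encoding="cp1252"):
    """Convert every file matching a glob pattern."""
    for path in glob(pattern):
        convert_to_utf8_bom(path, source_encoding)


# Example call (the path is a placeholder):
# convert_all(r"C:\MyFiles\*.txt")
```

Running this twice on the same file would decode UTF-8 bytes as Windows-1252 and garble any non-ASCII characters, so it should only be applied to files that have not been converted yet.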

Using Macros

  1. Start Macro recording
  2. Select Encoding > Convert to UTF-8-BOM
  3. Select all text and copy it (this works around a bug: otherwise the macro replaces the file contents with the clipboard contents)
  4. Save file and close it

Using Bash

Add a BOM to a file that is already UTF-8 encoded

# Write the BOM followed by the original contents to a new file
printf '\xEF\xBB\xBF' | cat - utf8-no-bom.txt > utf8-with-bom.txt
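The same operation can be sketched in Python. The helper name add_bom is hypothetical, and the code assumes the file is already valid UTF-8. It checks for an existing BOM first, so running it twice does not add a second one.

```python
import codecs


def add_bom(path):
    """Prepend the UTF-8 BOM unless the file already starts with one.

    Returns True if the file was changed.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already has a BOM, nothing to do
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True
```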

Batch conversion using find and iconv

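# Find all .txt files (recursively) and convert them from CP1252 (Western European "ANSI") to UTF-8.
# Output goes to a temporary file first: redirecting onto the input file would truncate it.
find . -type f -name '*.txt' -exec sh -c 'iconv -f CP1252 -t UTF-8 "$1" > "$1.utf8" && mv "$1.utf8" "$1"' _ {} \;

# List the Windows character sets that iconv supports
iconv -l | grep -i windows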
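Picking the right source character set matters because "ANSI" means a different code page depending on the Windows locale. As a small illustration (the helper name is hypothetical), the byte 0xE9 decodes to é under CP1252 but to the Cyrillic й under CP1251:

```python
def decode_with(raw, codepages):
    """Decode the same bytes under several code pages."""
    return {cp: raw.decode(cp) for cp in codepages}


# 0xE9 is a single byte as it might appear in an "ANSI" text file.
results = decode_with(b"\xe9", ["cp1252", "cp1251", "cp1253"])

for codepage, text in results.items():
    # ascii() keeps the output printable on any console.
    print(codepage, ascii(text))
```

If converted files show unexpected characters, a wrong -f value is the first thing to check.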

Batch conversion using a for loop and iconv

# Glob directly instead of parsing ls, and quote names so spaces are handled
for i in *.txt; do
  iconv -f WINDOWS-1252 -t UTF-8 "$i" -o "$i.utf8" &&
    mv "$i.utf8" "$i"
done

Replace in …; do with in "$@"; do to turn this into a reusable Bash script (e.g. convert.sh myfile.txt myfile2.txt).

Using Batch

Add BOM to all text files using nkf

for %a in (*.txt) do nkf32 -W8 -w8 --overwrite "%a"

Download: binary for Windows, source code

Note: Replace -w8 with -w80 to remove the BOM.
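For comparison, BOM removal can also be sketched in Python without nkf. The function name is hypothetical and the file is assumed to be UTF-8. The "utf-8-sig" codec skips a leading BOM when reading and otherwise behaves like "utf-8", so the sketch is harmless on files that have no BOM.

```python
def remove_bom(path):
    """Rewrite a UTF-8 file without its BOM (hypothetical helper)."""
    # Decoding with "utf-8-sig" drops a leading BOM if one is present.
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    # Plain "utf-8" never writes a BOM.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```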

Batch conversion using for and iconv

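rem Write to a temporary file first (redirecting to "%a" itself would empty it). In a .bat file use %%a.
for %a in (*.txt) do (iconv -f CP1252 -t UTF-8 "%a" > "%a.tmp" && move /y "%a.tmp" "%a")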
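Redirecting a command's output onto the file it is reading usually leaves an empty file, because the shell truncates the target before the command starts. A common way around this is to write to a temporary file in the same folder and then swap it into place. The Python sketch below shows that pattern; the function name and default encodings are assumptions for illustration.

```python
import os
import shutil
import tempfile


def convert_in_place(path, source_encoding="cp1252", target_encoding="utf-8"):
    """Re-encode a file, replacing it only after the new copy is complete."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # rename stays on the same drive.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with open(path, "r", encoding=source_encoding, newline="") as src, \
             open(fd, "w", encoding=target_encoding, newline="") as dst:
            shutil.copyfileobj(src, dst)  # copies in chunks
        # Both files are closed here; swap the new copy into place.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # leave the original untouched on failure
        raise
```

If decoding fails partway through, the original file is left as it was and the temporary file is deleted.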

Trivial methods

@vlakoff

vlakoff commented May 23, 2020

For the record, since Windows 10 version 1903, Notepad supports UTF-8 without BOM. (finally!)

Refs: Windows 10 Notepad is Getting Better UTF-8 Encoding Support

@bulli-03

bulli-03 commented Apr 1, 2021

Good morning,

I have a question: I want to convert all files of one type from ANSI to UTF-8 with a batch command. For this I would like to use the following code.

for %a in (*.txt) do iconv -f CP1252 -t UTF-8 "%a" > "%a"

My question: should the second %a be the path of the folder where the files are, and the third %a the folder where they should go? What should the first %a be, and is this correct at all? If not, what would be correct?

Example:
for %a in (*.csv) do iconv -f CP1252 -t UTF-8 ...\Desktop\Test1 > ...\Desktop\Test2

@dogancelik
Author

dogancelik commented Apr 1, 2021

@bulli-03 Your question is more about the for command than the conversion itself. Run for /? for more examples of how the loop works.

rem Example: you are in 'C:\Files\' and you run this command:
mkdir new
for %a in (*.csv) do iconv -f CP1252 -t UTF-8 "%~a" > "new\%~nxa"
rem C:\Files\Test.csv ➡ C:\Files\new\Test.csv

@Pooja5757

Pooja5757 commented Apr 7, 2021

iconv -f CP1252 -t UTF-8 "%~a" > "new\%~nxa"

Thank you, this worked so well. But when I give the same filename for the output (I don't want another folder or another file), the file ends up empty/corrupted after conversion. Any idea how I can keep the same file after conversion, i.e. overwrite it in place?

@Basti-Fantasti

Basti-Fantasti commented Dec 14, 2021

Thanks for sharing the method to change the file encoding using Python in NP++ 👍

I had to make some adjustments to the script to get it to work.
First of all I had to change the globPath to be set like this:

globPath = "C:\\mydir\\*.txt"

And the line notepad.runMenuCommand needs to be adjusted to the NP++ language in use.
So I had to change it on my German NP++ setup from:

notepad.runMenuCommand("Encoding", "Convert to UTF-8")

to

notepad.runMenuCommand("Kodierung", "Konvertiere zu UTF-8")
