ANSI / UTF-8 (with or without BOM) conversion #Windows

Using Uni2Me

  • It's free but discontinued.

Using UTFCast

  • Proprietary software
  • Allows conversion from ANSI to UTF-8 with or without BOM

Using Notepad++

Using Python Script Plugin

from glob import glob
from Npp import notepad

# Raw string keeps the backslashes in the Windows path literal
globPath = r"C:\MyFiles\*.txt"

for path in glob(globPath):
  notepad.open(path)
  notepad.runMenuCommand("Encoding", "Convert to UTF-8-BOM")
  notepad.save()
  notepad.close()

Using Macros

  1. Start Macro recording
  2. Select Encoding > Convert to UTF-8-BOM
  3. Select all text and copy it (this works around a bug: otherwise the macro replaces the file contents with the clipboard content)
  4. Save file and close it

Using Bash

Add a BOM to a file that is already UTF-8 encoded

# Write the BOM followed by the original content to a new file
printf '\xEF\xBB\xBF' | cat - utf8-no-bom.txt > utf8-bom.txt
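
For systems without Bash, the same step can be sketched in Python 3. This is a minimal sketch and not part of the original gist: the helper name add_bom is hypothetical, and it assumes the file is already valid UTF-8. It reads the raw bytes, prepends the BOM only if it is missing, and swaps in a temporary file so the original is never truncated before it has been read.

```python
import codecs
import os
import tempfile


def add_bom(path):
    """Prepend the UTF-8 BOM to `path` unless it is already present.

    Hypothetical helper; assumes the file is already UTF-8 encoded.
    Returns True if the file was changed, False otherwise.
    """
    with open(path, "rb") as src:
        data = src.read()
    if data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, the BOM is already there

    # Write to a temporary file in the same directory, then replace the
    # original with it in one step.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as dst:
        dst.write(codecs.BOM_UTF8 + data)
    os.replace(tmp_path, path)
    return True
```

Calling add_bom a second time on the same file leaves it unchanged, so the helper is safe to run over a folder repeatedly.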

Batch conversion using find and iconv

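# Find all .txt files under the current directory and convert them to UTF-8 (assuming they are Windows-1252, the "ANSI" code page on US / Western European systems)
find . -name '*.txt' -exec sh -c 'iconv -f CP1252 -t UTF-8 "$1" > "$1.utf8" && mv "$1.utf8" "$1"' _ {} \;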

# List all Windows character sets that iconv supports
iconv -l | grep -i windows

Batch conversion using a for loop and iconv

for i in *.txt; do
  iconv -f WINDOWS-1252 -t UTF-8 -o "$i.utf8" "$i" &&
    mv "$i.utf8" "$i"
done

Replace in …; do with in "$@"; do to turn this into a reusable Bash script (e.g. convert.sh myfile.txt myfile2.txt).
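
Where iconv is not available, the loop above can be approximated in Python 3. This is a hedged sketch, not part of the original gist: the helper name convert_to_utf8 and its parameters are hypothetical, and the source encoding (cp1252 by default) is an assumption the caller has to confirm, since it cannot be detected reliably from the bytes alone.

```python
import os
import tempfile


def convert_to_utf8(path, source_encoding="cp1252", bom=False):
    """Re-encode `path` from `source_encoding` to UTF-8 in place.

    Hypothetical helper. Raises UnicodeDecodeError if the file contains
    bytes that are undefined in the source encoding.
    """
    # newline="" disables newline translation, so CRLF line endings survive.
    with open(path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()

    # "utf-8-sig" writes a BOM first; plain "utf-8" does not.
    target = "utf-8-sig" if bom else "utf-8"

    # Same idea as the .utf8 + mv step: write elsewhere, then replace.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding=target, newline="") as dst:
        dst.write(text)
    os.replace(tmp_path, path)
```

Calling it once for each path in sys.argv[1:] gives a script with the same command-line shape as convert.sh.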

Using Batch

Add BOM to all text files using nkf

for %a in (*.txt) do nkf32 -W8 -w8 --overwrite "%a"

Download binary for Windows / Source code

Note: Replace -w8 with -w80 to remove the BOM
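
If nkf is not installed, removing the BOM can also be sketched in Python 3. This is an illustrative sketch rather than an equivalent of nkf: the helper name remove_bom is hypothetical, and it only strips the three BOM bytes without checking that the rest of the file is valid UTF-8.

```python
import codecs


def remove_bom(path):
    """Strip a leading UTF-8 BOM from `path`, if there is one.

    Hypothetical helper. Returns True if a BOM was removed.
    """
    with open(path, "rb") as src:
        data = src.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no BOM, leave the file untouched

    # The whole file is already in memory, so reopening it for writing
    # does not lose any content.
    with open(path, "wb") as dst:
        dst.write(data[len(codecs.BOM_UTF8):])
    return True
```

Files without a BOM are left untouched, so the helper can be applied to a mixed set of files.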

Batch conversion using for and iconv

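rem Write to a temporary file first: redirecting straight to "%a" empties the file before iconv reads it
for %a in (*.txt) do (iconv -f CP1252 -t UTF-8 "%a" > "%a.tmp" && move /y "%a.tmp" "%a")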

Trivial methods

@vlakoff vlakoff commented May 23, 2020

For the record, since Windows 10 version 1903, Notepad supports UTF-8 without BOM. (finally!)

Refs: Windows 10 Notepad is Getting Better UTF-8 Encoding Support

@bulli-03 bulli-03 commented Apr 1, 2021

Good morning,

I have a question: I want to convert all files of one type from ANSI to UTF-8 with a batch command. For this I would like to use the following code.

for %a in (*.txt) do iconv -f CP1252 -t UTF-8 "%a" > "%a"

My question: should the second %a be the path of the folder where the files are, and the third %a the folder where they should go? What should the first %a be, and is this correct at all? If not, what would be correct?

Example:
for %a in (*.csv) do iconv -f CP1252 -t UTF-8 ...\Desktop\Test1 > ...\Desktop\Test2

@dogancelik dogancelik (Owner, Author) commented Apr 1, 2021

@bulli-03 Your question is more about the for command than the conversion itself. See for /? for more examples of how the loop works.

rem Example: you are in 'C:\Files\' and you run this command:
mkdir new
for %a in (*.csv) do iconv -f CP1252 -t UTF-8 "%~a" > "new\%~nxa"
rem C:\Files\Test.csv ➡ C:\Files\new\Test.csv

@Pooja5757 Pooja5757 commented Apr 7, 2021

iconv -f CP1252 -t UTF-8 "%~a" > "new\%~nxa"

Thank you, this worked so well. But when I give the same filename for the output (I don't want another folder or another file), the file ends up empty/corrupted after conversion. Any idea how I can keep the same file after conversion (i.e. overwrite it in place)?
