@windsting
Created September 14, 2018 06:18
Add a BOM to UTF-8 encoded text files

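Use case

  • Syncing a Cocos2D project between macOS and Windows

    When a Cocos2D project is edited in Xcode on macOS, UTF-8 files are saved without a BOM, which causes compile-time errors in Visual Studio on Windows. Adding a BOM to these files resolves those errors and does not cause build problems in Xcode.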
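To see which files are affected, it helps to check for the BOM directly: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file. The sketch below is only an illustration; `has_bom` and the throwaway file names are hypothetical and not part of the scripts on this page.

```shell
# Succeed (exit status 0) when the file starts with the UTF-8 BOM bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Demo on two throwaway files (octal escapes \357\273\277 are EF BB BF).
tmpdir=$(mktemp -d)
printf '\357\273\277int x;\n' > "$tmpdir/with_bom.cpp"
printf 'int x;\n' > "$tmpdir/without_bom.cpp"
for f in "$tmpdir"/*.cpp; do
    if has_bom "$f"; then
        echo "${f##*/}: BOM"
    else
        echo "${f##*/}: no BOM"
    fi
done
rm -rf "$tmpdir"
```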

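System requirements

This tool is a set of bash scripts and must be run from a bash command line. Make sure the external programs the scripts call, notably file and uconv, are installed on the system.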
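A quick way to verify these requirements before running anything is sketched below. `missing_tools` is a hypothetical helper; the tool names come from the scripts on this page (`uconv` ships with ICU), and the bash version check reflects the fact that `declare -A`, used by find-file-with-encoding.sh, needs bash 4.0 or newer.

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# file and uconv are the external programs the scripts on this page invoke.
missing=$(missing_tools file uconv)
[ -z "$missing" ] || echo "missing: $missing" >&2

# Associative arrays (declare -A) need bash 4.0 or newer.
# BASH_VERSINFO is unset in other shells, hence the :-0 fallback.
if [ "${BASH_VERSINFO:-0}" -lt 4 ]; then
    echo "bash 4+ is required for find-file-with-encoding.sh" >&2
fi
```

The stock bash on macOS is typically older than 4.0, so this check may well fire there; a newer bash can be installed separately.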

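Usage

From a bash command line, run

add-bom-for-files-in-folder.sh path-of-files-to-convert

where path-of-files-to-convert is a directory. Every file under that directory that is encoded as UTF-8 without signature (no BOM) is converted to UTF-8 with signature (with BOM). All three scripts must be executable and on PATH, because add-bom-for-files-in-folder.sh calls the other two by name.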

Script files

find-file-with-encoding.sh

This script lists the files that have a given encoding. Run find-file-with-encoding.sh -h for usage.

#!/bin/bash

# ENCODING=UTF-8Unicodetext
TARGET=.
usage(){
    echo "find all files with specified encoding in a directory and all subdirectories"
    echo ""
    echo "$0"
    echo "  -h --help           show this message and exit"
    echo "  -e --encoding       specified encoding -e=$ENCODING"
    echo "  -l --list-encodings list every encoding found, with one example file each"
    echo "  -t --target-dir     target directory to check -t=$TARGET"
    echo ""
}

parse_arg() {
    while [ "$1" != "" ]; do
        # Options use the form -x=value; split on the first '='.
        PARAM=${1%%=*}
        VALUE=
        case $1 in
            *=*) VALUE=${1#*=} ;;
        esac
        case $PARAM in
            -h | --help)
                usage
                exit
                ;;
            -e | --encoding)
                ENCODING=$VALUE
                # echo "got ENCODING=$ENCODING"
                ;;
            -l | --list-encodings)
                LIST="1"
                # echo "got LIST=$LIST"
                ;;
            -t | --target-dir)
                TARGET=$VALUE
                # echo "got TARGET=$TARGET"
                ;;
            *)
                echo "ERROR: unknown parameter \"$PARAM\""
                usage
                exit 1
                ;;
        esac
        shift
    done
}

# Print the type that `file` reports for "$1" with all whitespace removed,
# e.g. "UTF-8Unicodetext".
get_type () {
    INFO=$(file - < "$1" | cut -d: -f2)
    echo "${INFO//[[:space:]]/}"
}

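# Maps each encoding seen to one example file.
# Associative arrays require bash 4.0 or newer.
declare -A EMap

# Typical keys (type strings with whitespace removed):
# UTF-8Unicode(withBOM)text
# UTF-8Unicodetext
# ASCIItext
# ISO-8859text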

# Recursively walk directory $1 and print every file whose type contains $2.
# Named find_files so that it does not shadow the find command.
find_files() {
    local file type
    for file in "$1"/*
    do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$file" ] || continue
        if [ -d "$file" ]
        then
            find_files "$file" "$2"
        else
            type=$(get_type "$file")
            # Remember one example file per encoding for --list-encodings.
            [ -n "$type" ] && EMap[$type]="$file"
            if [ -n "$2" ] && [[ "$type" == *"$2"* ]]
            then
                echo "$file"
            fi
        fi
    done
}

main(){
    parse_arg "$@"

    find_files "$TARGET" "$ENCODING"

    if [ -n "$LIST" ]
    then
        echo ""
        echo "All encodings:"
        for i in "${!EMap[@]}"
        do
            echo "$i:    ${EMap[$i]}"
        done
    fi
}

main "$@"
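The value given to -e is matched against the description printed by `file` after all whitespace has been removed, which is why the key looks like UTF-8Unicodetext. The sketch below reproduces just that normalization and substring match on literal strings, so it runs without `file`; `normalize_type` and `type_matches` are hypothetical names, and the sample descriptions are assumptions about what `file` prints.

```shell
# Strip all whitespace from a type description, as get_type does.
normalize_type() {
    printf '%s' "$1" | tr -d '[:space:]'
}

# Succeed when the normalized description contains the wanted encoding key.
type_matches() {
    case "$(normalize_type "$1")" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# A C source file without BOM matches the key ...
type_matches " C source, UTF-8 Unicode text" "UTF-8Unicodetext" && echo "match"
# ... while a file that already has a BOM does not.
type_matches " UTF-8 Unicode (with BOM) text" "UTF-8Unicodetext" || echo "no match"
```

The exact wording differs between versions of `file`; some newer releases appear to report "Unicode text, UTF-8 text" instead, in which case a different key is needed. Running the script with -l shows the keys that actually occur on a given system.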

convert-to-utf8-with-signature.sh

This script converts the given file (File) from its source encoding (SourceEncoding, by default UTF-8 without BOM) to UTF-8 with signature (with BOM).

#!/bin/bash

# echo $*

function get_type () {
    INFO=`file - < "$1" | cut -d: -f2`
    TYPE=`echo $INFO | cut -d, -f2`
    TYPE=`echo $TYPE | sed 's,^ *,,; s, *$,,'`
    echo "$TYPE"
}

function trim_string () {
    result=${1##}
    # result=${result%%}
    echo $result
}

function print_with_spaces () {
    echo "-$1-"
}

function test_trim() {
    STR="   Hello  World!       "
    print_with_spaces "$STR"
    STR=`echo $STR | sed 's,^ *,,; s, *$,,'`   # this line does the trimming
    print_with_spaces "$STR"
    exit 0
}

function do_print_type () {
    echo "    $1"
}

function print_type () {
    FILE=$1
    # Trim the type passed as the second argument.
    TYPE=`echo $2 | sed 's,^ *,,; s, *$,,'`
    echo "$FILE type: -$TYPE-" 1>&2
    if [ "$TYPE" = "ASCII text" ]
    then
        do_print_type "ascii file"
    elif [ "$TYPE" = "UTF-8 Unicode (with BOM) text" ]
    then
        do_print_type "utf-8 with BOM"
    elif [ "$TYPE" = "UTF-8 Unicode text" ]
    then
        do_print_type "utf-8 without BOM"
    elif [ "$TYPE" = "ISO-8859 text" ]
    then
        do_print_type "GB2312"
    else
        do_print_type "========== unknown type: $TYPE"
    fi
}

# test_trim

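if [ $# -lt 1 ]
then
    echo "Usage: $0 File..."
    exit 1
fi

# Source encoding is fixed; every argument is treated as a file to convert.
Src="UTF-8"
# if [ $# -ge 2 ]
# then
#     Src=$2
# fi


for File in "$@"
do
    echo "converting $File from $Src"
    # Replace the original only if uconv succeeded.
    uconv -f "$Src" -t UTF-8 --add-signature "$File" -o "$File.new" &&
        mv "$File.new" "$File"
done

exit

# Everything below is unreachable because of the exit above.
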
if [ $# -eq 0 ]
then
    echo usage $0 files ...
    exit 1
fi

for file in "$@"
do
    # echo "# Processing: $file" 1>&2
    if [ ! -f "$file" ]
    then
        echo Not a file: "$file" 1>&2
        exit 1
    fi
    TYPE=`get_type "$file"`
    # echo "$file type: -$TYPE-" 1>&2
    print_type "$file" "$TYPE"
    if echo "$TYPE" | grep -q '(with BOM)'
    then
        :
        # echo "# $file already has BOM, skipping." 1>&2
    else
        :
        # echo 1>&2
        # ( mv "${file}" "${file}"~ && uconv -f utf-8 -t utf-8 --add-signature < "${file}~" > "${file}" ) || ( echo Error processing "$file" 1>&2 ; exit 1)
    fi
done
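The script above relies on `uconv`. For a file that is already valid UTF-8, a similar result can be obtained by writing the three BOM bytes in front of the content. The following is a minimal sketch of that swapped-in technique, not the author's method; `add_bom` is a hypothetical helper, and like the script above it replaces the file with a new one, so permissions may not be preserved.

```shell
# Prepend the UTF-8 BOM (EF BB BF) unless the file already starts with it.
add_bom() {
    first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$first" = "efbbbf" ]; then
        return 0    # already has a BOM, leave the file untouched
    fi
    # Octal escapes \357\273\277 are the bytes EF BB BF.
    { printf '\357\273\277'; cat "$1"; } > "$1.new" && mv "$1.new" "$1"
}
```

Calling it twice on the same file is harmless, because the second call finds the BOM and returns early.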

add-bom-for-files-in-folder.sh

This script combines the two independent scripts above behind a simpler interface.

#!/bin/bash

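if [ $# -lt 1 ]
then
    echo "Usage: $0 path-of-files-to-convert"
    exit 1
fi

# List the files detected as UTF-8 without BOM and pass them to the converter.
# xargs splits its input on whitespace, so paths containing spaces are not supported.
find-file-with-encoding.sh -e=UTF-8Unicodetext -t="$1" | xargs convert-to-utf8-with-signature.sh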
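Because the pipeline hands file names to `xargs`, a path containing spaces is split into several arguments. One possible workaround, sketched here under the assumption that file names contain no newlines, is to read the list line by line instead; `each_line` is a hypothetical helper and not part of the original scripts.

```shell
# Run the given command once per input line, passing the line as one argument.
each_line() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue    # ignore blank lines
        "$@" "$line"
    done
}

# Demo: the name with a space stays a single argument.
printf 'src/My File.cpp\nsrc/main.cpp\n' | each_line printf '[%s]\n'
```

With this helper the last line of the script could instead pipe into `each_line convert-to-utf8-with-signature.sh`, at the cost of starting the converter once per file.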

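Additional notes

  1. Notes on syncing a Cocos2D project

    When a new file has been added to the project and still needs to be added to the project file (the .vcxproj file in Visual Studio), the build typically fails with link-time errors such as: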

    Error	LNK1120	2 unresolved externals	land	E:\develop\proj\land\proj.win32\Release.win32\client.exe	1	
    Error	LNK2001	unresolved external symbol "public: static void __cdecl DialogNewbieGuide::Dialog(class cocos2d::Node *)" (?Dialog@DialogNewbieGuide@@SAXPAVNode@cocos2d@@@Z)	client	E:\develop\proj\land\proj.win32\DialogClubMain.obj	1	
    

    In that case, find the files that contain these symbols (DialogNewbieGuide in this example) and add them to the project.

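  2. Script files

    When copying text from this page into files, save them with Unix line endings (LF) if possible. With Windows line endings (CRLF), running the scripts may fail with errors like: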

    /mnt/d/portable/_bin/add-bom-for-files-in-folder.sh: line 2: $'\r': command not found
    /mnt/d/portable/_bin/add-bom-for-files-in-folder.sh: line 10: syntax error: unexpected end of file
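If a script has already been saved with CRLF line endings, removing the carriage returns is usually enough to make it run. A minimal sketch follows; `strip_cr` is a hypothetical helper, and it writes the result back into the existing file so that the executable bit is kept.

```shell
# Remove every carriage return from a file in place (CRLF -> LF).
strip_cr() {
    tmp=$(mktemp) &&
        tr -d '\r' < "$1" > "$tmp" &&
        cat "$tmp" > "$1" &&
        rm -f "$tmp"
}
```

For example, `strip_cr add-bom-for-files-in-folder.sh` would be run once for each of the three scripts.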
    
@iOSPrincekin

I get an error:
./find-file-with-encoding.sh: line 84: cprogramtext,UTF-8Unicodetext: value too great for base (error token is "8Unicodetext")

@windsting
Author

I get an error:
./find-file-with-encoding.sh: line 84: cprogramtext,UTF-8Unicodetext: value too great for base (error token is "8Unicodetext")

This is probably related to your environment and the files being processed. Which version of bash are you using?
