@BruceWind
Last active December 12, 2023 15:03
One-key shell to convert all Chinese text in a repository to English.

I use it to translate the Chinese text in my project. I am sharing it to save other people time when translating their own projects.

#!/bin/bash
# A shell script that finds all Chinese text in a project and prints an English translation for each piece.

########################################################################
# Usage:
# Environment requirements:
# 1. Install ag (The Silver Searcher).
# 2. Install translate-shell.
# ------run---------------
# 1. chmod +x match_chinese.sh
# 2. Run './match_chinese.sh' in your project folder.
########################################################################

set -e
# Collect the text files that contain non-ASCII characters (Chinese or other Unicode).
# 'ag -c' prints each match as file:count.
FileContainingChineseArr=($(ag -c '[^\x00-\x7F]'))
echo "Files that contain Chinese or other non-ASCII characters:
"
printf "'%s'\n" "${FileContainingChineseArr[@]}"
#
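# Loop over every matching file.
for fileWithLine in "${FileContainingChineseArr[@]}"; do
  # Keep only the file name: drop the colon and everything after it.
  file=${fileWithLine%%:*}

  # Extract every run of Chinese characters (U+4E00 to U+9FA5).
  # Assumes GNU grep built with PCRE support (-P) and a UTF-8 locale.
  # '|| true' keeps 'set -e' from aborting when a file has no Chinese text.
  ChineseTexts=($(grep -o -P '[\x{4E00}-\x{9FA5}]+' "$file" || true))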
  echo ""
  echo "$file has Chinese or Unicode characters below:
  "
  echo "
  ----------------------------------------------"
  # Translate each piece of Chinese text.
  for text in "${ChineseTexts[@]}"; do
    # Strip tabs, line breaks, and printable ASCII that may still be attached.
    text=$(printf '%s' "$text" | tr -d '\11\12\15\40-\176')
    if [[ -n "$text" ]]; then
      translatedTxt=$(trans -b :en "$text")
      echo "\"$text\": \"$translatedTxt\","
    fi
  done
done

# Quick manual check (GNU grep, UTF-8 locale): grep -P '[\x{4E00}-\x{9FA5}]' file.txt

BruceWind commented Feb 20, 2023

This script has a known bug:

结果,如下: ("the results are as follows:") is one clause, but it gets separated into two words, 结果 and 如下.
Despite the bug, this is a convenient shell script for translating my project.
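
One possible workaround, offered only as a sketch: instead of matching a Chinese character range, split the input on ASCII bytes and keep everything else together. The function name `extract_clauses` is hypothetical and not part of the script above. It assumes that clauses are joined either by full-width punctuation (which is non-ASCII and therefore kept automatically) or by an ASCII comma or colon, and that input is UTF-8.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script).
# Reads text on stdin and prints one clause per line, where a clause is a run
# of non-ASCII bytes that may contain ASCII commas or colons in the middle.
extract_clauses() {
  # 1. Turn every ASCII byte except ',' (octal 054) and ':' (octal 072)
  #    into a newline. Non-ASCII bytes pass through untouched.
  # 2. Trim commas and colons left at the start or end of a line.
  # 3. Delete the lines that ended up empty.
  LC_ALL=C tr '\000-\053\055-\071\073-\177' '\n' |
    LC_ALL=C sed -e 's/^[,:]*//' -e 's/[,:]*$//' -e '/^$/d'
}
```

Each printed line could then be passed to `trans -b :en` as a whole clause. Because the filter works on bytes, it keeps any non-ASCII text, not only Chinese, so the output may need a further check.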


BruceWind commented Sep 29, 2023

You may also need a script that wraps Chinese text in JSX, turning <>中文</> into <>{t('中文')}</>, so I wrote this one:

#!/bin/bash

# set file extensions.
FILE_TYPE=".jsx"

# iterate all files in current directory.
for file in $(find . -type f -name "*$FILE_TYPE"); do
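  # Extract each distinct run of Chinese characters (U+4E00 to U+9FA5).
  # Assumes GNU grep built with PCRE support (-P) and a UTF-8 locale.
  # 'sort -u' keeps each text once so that it is not wrapped twice.
  ChineseTexts=($(grep -o -P '[\x{4E00}-\x{9FA5}]+' "$file" | sort -u))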
  echo ""
  echo "$file has Chinese or Unicode characters below:
  "
  echo "
  ----------------------------------------------"
  # Wrap each piece of Chinese text.
  for text in "${ChineseTexts[@]}"; do
    # Strip tabs, line breaks, and printable ASCII that may still be attached.
    text=$(printf '%s' "$text" | tr -d '\11\12\15\40-\176')

    if [[ -n "$text" ]]; then
        if [[ $text == *"'"* ]]; then
            echo "'$text' contains a single quotation mark; skipped."
        else
            # 'sed -i' without a suffix is GNU sed syntax; BSD/macOS sed needs -i ''.
            sed -i -E "s/($text)/{t('\1')}/g" "$file"
            echo "'$text' wrapped up in $file"
        fi
    fi
  done
done

echo "Done!"

This script is not perfect, but it is usable. Please review the resulting code changes after running it.
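
To review a change before it is written, a dry run can help. The sketch below uses a hypothetical helper, `preview_wrap`, that applies the same sed substitution without `-i` and prints a unified diff instead of editing the file. It assumes the text contains no characters that are special in a sed regular expression.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script).
# Usage: preview_wrap TEXT FILE
# Prints a unified diff of what wrapping TEXT in {t('...')} would change
# in FILE. FILE itself is left untouched.
preview_wrap() {
  local text=$1 file=$2
  # diff exits with status 1 when the inputs differ, which is the expected
  # case here, so '|| true' keeps that from being treated as a failure.
  sed -E "s/($text)/{t('\1')}/g" "$file" | diff -u "$file" - || true
}
```

For example, `preview_wrap '中文' src/App.jsx` (the path is only an illustration) would print the lines that are about to change. In a Git repository, running `git diff` after the real script serves the same purpose.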