aseemk/i18n._coffee

## readme.md

      
This is a little script I wrote to automatically scour a git repo's source code and extract i18n strings.
It searches specifically for gettext-style __() calls, which happen to be used by @mashpie's i18n-node and @jeresig's i18n-node-2 (which I'm using), but which are also used by other tools.
It searches for these strings in a pretty liberal way that works with JavaScript, CoffeeScript, and probably most other similar languages: it just looks for (non-word character), __, either ( or whitespace, followed by a single- or double-quoted string.
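To make the matching rule above concrete, here is a minimal sketch in plain JavaScript. It approximates the pattern described in this paragraph rather than copying the script's regex; the names `I18N_CALL`, `extractStrings` and `sample` are made up for illustration.

```javascript
// Approximation of the described pattern: a non-word character, `__`,
// then `(` or whitespace, then a single- or double-quoted string.
const I18N_CALL = /\W__[(\s](?:'(.+?[^\\])'|"(.+?[^\\])")/g;

// Hypothetical helper: return every string passed to a `__` call in `code`.
function extractStrings(code) {
  const found = [];
  let match;
  while ((match = I18N_CALL.exec(code)) !== null) {
    // Group 1 is set for single-quoted strings, group 2 for double-quoted ones.
    found.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return found;
}

const sample = [
  "res.render('login', { title: __('Log In') });", // JavaScript-style call
  'label = __ "Log Out"',                          // CoffeeScript-style call
  "my__('ignored')",                               // `__` follows a letter: skipped
].join('\n');

console.log(extractStrings(sample)); // [ 'Log In', 'Log Out' ]
```

Because the pattern needs a non-word character before `__`, a call at the very first character of a file would not be picked up by this sketch.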
It takes these strings and updates a JSON file with them, showing you strings that have been added and removed since the last run. This JSON format is the same one used by the above-mentioned modules and others, so there should be no conflicts.
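The update step can be sketched roughly as follows, again in plain JavaScript. This is an illustration under assumptions, not the script itself: `diffStrings`, `current` and `found` are hypothetical names, and the flat string-to-string object mirrors the JSON format mentioned above.

```javascript
// Hypothetical helper: compare the previously known strings with the newly
// found ones, and build the flat object that would be written back as JSON.
function diffStrings(oldStrings, newStrings) {
  const oldSet = new Set(oldStrings);
  const newSet = new Set(newStrings);
  const catalog = {};
  for (const str of newSet) catalog[str] = str; // each string maps to itself
  return {
    added: [...newSet].filter((s) => !oldSet.has(s)),
    removed: [...oldSet].filter((s) => !newSet.has(s)),
    catalog,
  };
}

// Made-up data standing in for the strings file and the extracted strings.
const current = { Login: 'Login', Logout: 'Logout', Welcome: 'Welcome' };
const found = ['Log In', 'Log Out', 'Welcome'];

const result = diffStrings(Object.keys(current), found);
console.log('added:', result.added);     // added: [ 'Log In', 'Log Out' ]
console.log('removed:', result.removed); // removed: [ 'Login', 'Logout' ]
console.log(JSON.stringify(result.catalog, null, 4));
```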
Example run:
$ ./scripts/i18n._coffee 

Reading current strings...
31 current strings found.

Searching files for strings...
14 matching files found.

Extracting strings from files...
32 total strings found.

3 strings added:
  - Log In
  - Log in
  - Log Out

2 strings removed:
  - Login
  - Logout


This tool probably isn't perfect, but it gives me peace of mind knowing that forgetting a code path will no longer mean missed strings. It doesn't have to replace the runtime updating of these tools; it supplements them nicely.
This tool is written in Streamline syntax, but the only async part (besides the file I/O, which doesn't have to be) is the call to git. You could easily rewrite this without Streamline if you just wrap the rest in a function and pass that as a callback to exec().
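As a rough idea of what the non-Streamline version could look like, here is a sketch in plain JavaScript that wraps the follow-up work in a callback passed to `exec()`. The wrapper name `findCandidateFiles` is hypothetical, and running `git grep` with `cwd` plus a `.` pathspec is an assumption made for this example, not how the script builds its command.

```javascript
const { exec } = require('child_process');

// Hypothetical wrapper: list tracked files under `dir` that mention `__`,
// then pass them to a Node-style callback.
function findCandidateFiles(dir, callback) {
  const command = "git grep -I --word-regexp --name-only -e '__' -- .";
  exec(command, { cwd: dir }, (err, stdout) => {
    // git grep exits with status 1 when nothing matches; treat that as empty.
    if (err && err.code !== 1) return callback(err);
    const files = stdout.trim().split('\n').filter(Boolean);
    callback(null, files);
  });
}

findCandidateFiles(process.cwd(), (err, files) => {
  if (err) {
    console.error('git grep failed:', err.message);
    return;
  }
  console.log(`${files.length} matching files found.`);
  // The rest of the script (read files, extract, diff, write JSON) goes here.
});
```

The file reads and the final write could use the synchronous `fs` calls inside that callback, since, as noted above, the file I/O does not have to be async.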
Feedback welcome! If you try this out, let me know how you like it. Cheers.

  
## i18n._coffee
#!/usr/bin/env _coffee
#
# Helper script to search all of our files for i18n strings and update our
# strings file. Helpful in case we missed a code path during testing.
#
# Specifically, searches for gettext `__()` calls in our checked-in files.
#

$ = require 'underscore'
echo = console.log
{exec} = require 'child_process'
FS = require 'fs'
Path = require 'path'


## CONSTANTS:

STRINGS_FILEPATH = "#{__dirname}/locales/en.json"

# High-level search for files that look like they may contain __ calls.
# -I (no long option) means exclude binary files.
# https://www.kernel.org/pub/software/scm/git/docs/git-grep.html
GIT_GREP_COMMAND = """
    git grep -I --word-regexp --name-only -e '__' -- #{__dirname}
"""

# Tailored regex to match our `__()` calls and extract the strings.
# http://www.regular-expressions.info/reference.html =)
# XXX Is this a bad idea? Brittle? Or good enough and safe?
# TODO We don't do this currently, but do we want to detect and support calls
# with heredoc (triple-quoted) strings too? Are they even good for i18n tho?
I18N_CALL_REGEX = ///
    \W          # `__` cannot follow a letter, number, or another underscore
    __
    [(\s]       # "calling" means either an `(` or whitespace (CoffeeScript)
    (           # and the string is either...
        '           # single-quoted...
            (.+?        # (match anything, but lazily, not greedily)
            [^\\])      # and the closing quote is one that's *not* preceded by a `\`
        '
        |           # or...
        "           # double-quoted...
            (.+?        # (match anything, but lazily, not greedily)
            [^\\])      # and the closing quote is one that's *not* preceded by a `\`
        "
    )
///gi


## MAIN:

# Read in the current set of strings:
echo '\nReading current strings...'
oldStrs = Object.keys require STRINGS_FILEPATH
echo "#{oldStrs.length} current strings found."

# Grep our checked-in files for a rough match of files:
echo '\nSearching files for strings...'
files = exec GIT_GREP_COMMAND, _
files = files.trim().split '\n'

# Filter out Markdown files since they're only documentation right now:
# (And if we ever used Markdown for user-facing content, I bet we could just
# translate the whole Markdown file itself.)
files = files.filter (file) -> (Path.extname file) isnt '.md'
echo "#{files.length} matching files found."

# Search matching files for all instances of, and extract, i18n strings:
echo "\nExtracting strings from files..."
newStrsMap = {}
for file in files
    code = FS.readFile "#{__dirname}/#{file}", 'utf8', _
    while match = I18N_CALL_REGEX.exec code
        # Group 2 holds a single-quoted string, group 3 a double-quoted one:
        if str = match[2] or match[3]
            newStrsMap[str] = str
        else
            console.error 'Anomaly!', match
newStrs = Object.keys newStrsMap
echo "#{newStrs.length} total strings found."

# Compare the old vs. new strings:
added = $(newStrs).difference oldStrs
removed = $(oldStrs).difference newStrs
echo "\n#{added.length} strings added:\n  -", added.join '\n  - '
echo "\n#{removed.length} strings removed:\n  -", removed.join '\n  - '

# Finally, update the JSON!
FS.writeFile STRINGS_FILEPATH, (JSON.stringify newStrsMap, null, 4), _