Created October 21, 2011 04:55
CodeRay: Iterate on each token with its kind, and its starting line number
require 'coderay'

class CodeRay::Tokens
  # Yields each token with its kind and the line number on which it starts.
  def each_token
    lineno = 1
    # Tokens is a flat array: [token1, kind1, token2, kind2, ...]
    self.each_slice(2) do |token, kind|
      yield token, kind, lineno
      # Only String tokens can contain newlines, so only they advance the count.
      lineno += token.count("\n") if token.is_a?(String)
    end
  end
end

__END__
# Example
CodeRay.scan_file("path/to/a/file").each_token do |token, kind, line|
  puts "#{line}: #{token}" if kind == :comment
end
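The same line-counting idea can be tried without CodeRay installed. The sketch below is only an illustration: `each_token_with_line` and the `SAMPLE` array are hypothetical stand-ins for a flat token stream, not part of CodeRay.

```ruby
# Hypothetical helper (not a CodeRay method): walks a flat
# [text, kind, text, kind, ...] array and yields each pair together
# with the line number on which the text starts.
def each_token_with_line(flat)
  return enum_for(:each_token_with_line, flat) unless block_given?

  lineno = 1
  flat.each_slice(2) do |text, kind|
    yield text, kind, lineno
    # Advance the counter by the number of newlines inside this token.
    lineno += text.count("\n") if text.is_a?(String)
  end
end

# Made-up sample data standing in for the result of a scan.
SAMPLE = [
  "# one", :comment,
  "\n",    :space,
  "x",     :ident,
  "\n\n",  :space,
  "# two", :comment
]

# Collect the starting line of every comment token.
COMMENT_LINES = each_token_with_line(SAMPLE)
                .select { |_text, kind, _line| kind == :comment }
                .map { |_text, _kind, line| line }
```

Because the counter is incremented after the yield, a token that spans several lines is reported at the line where it begins.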
> For the moment the Tokens#each method doesn't return two-element arrays, but flattened token/kind pairs (i.e. [token1, kind1, token2, kind2, ...]).
True, because that turned out to be even faster ;)
One 1.8.6-compatible way to iterate over pairs would be this:
content = nil
for item in tokens
  if content
    yield content, item
    content = nil
  else
    content = item
  end
end
raise 'odd number list for Tokens' if content
But I guess an each_token method would be nice, too.
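Wrapped in a method, the loop above might look like the following sketch. The name `each_flat_pair` and the sample array are assumptions made for illustration; the loop relies on every text item being truthy, which holds for strings and symbols.

```ruby
# Hypothetical wrapper around the pairing loop; avoids each_slice so it
# also suits older Rubies. Not part of CodeRay.
def each_flat_pair(tokens)
  content = nil
  for item in tokens
    if content
      # Second half of a pair: hand both halves to the caller.
      yield content, item
      content = nil
    else
      # First half of a pair: remember it until the kind arrives.
      content = item
    end
  end
  # A leftover first half means the list had an odd number of items.
  raise ArgumentError, 'odd number list for Tokens' if content
end

PAIRS = []
each_flat_pair(["a", :ident, " ", :space]) { |text, kind| PAIRS << [text, kind] }
```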
However, you don't need the Array representation any more: The Scanners call text_token, begin_group etc. on the encoder object, so you can react to them directly. The YAML Encoder demonstrates this.
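To illustrate what reacting to those calls directly could look like, here is a plain Ruby object that responds to `text_token`, `begin_group` and `end_group` and tracks line numbers itself. It is a sketch under assumptions: `CommentLineCollector` is a hypothetical class, it is not a registered CodeRay encoder, and the calls are driven by hand rather than by a real scanner.

```ruby
# Hypothetical receiver for scanner-style callbacks. A real CodeRay
# encoder has more to it; this only shows the event-driven shape.
class CommentLineCollector
  attr_reader :comments

  def initialize
    @lineno = 1
    @comments = []
  end

  # Called once per piece of text; records comments with their start line.
  def text_token(text, kind)
    @comments << [@lineno, text] if kind == :comment
    @lineno += text.count("\n")
  end

  # Group events carry no text, so they do not affect the line count.
  def begin_group(kind); end

  def end_group(kind); end
end

# Hand-written event sequence standing in for what a scanner would emit.
COLLECTOR = CommentLineCollector.new
COLLECTOR.text_token("# todo", :comment)
COLLECTOR.text_token("\n", :space)
COLLECTOR.begin_group(:string)
COLLECTOR.text_token("a\nb", :content)
COLLECTOR.end_group(:string)
COLLECTOR.text_token("\n", :space)
COLLECTOR.text_token("# fixme", :comment)
```

With this shape no intermediate array is built, and the is_a?(String) check disappears because group markers arrive through their own methods.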
Hi Korny,
For the moment the Tokens#each method doesn't return two-element arrays, but flattened token/kind pairs (i.e. [token1, kind1, token2, kind2, ...]); that's why I'm using Array#each_slice(2). Or maybe I'm not using the right Tokens method?
If you're no longer using the tokens array, how do you iterate over each token?
I'm using CodeRay because I needed a tokenizer to improve a personal project, notes, which greps for annotations in source comments. I'll push this modification soon.