require 'coderay'

class CodeRay::Tokens
  def each_token
    lineno = 1
    self.each_slice(2) do |token, kind|
      yield token, kind, lineno
      lineno += token.count("\n") if token.is_a?(String)
    end
  end
end

__END__
# Example
CodeRay.scan_file("path/to/a/file").each_token do |token, kind, line|
  puts "#{line}: #{token}" if kind == :comment
end
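To try the line-counting logic without CodeRay installed, here is a minimal sketch that runs the same loop over a plain Ruby array. The `flat_tokens` data and the `each_token_with_line` name are made up for illustration; only the flat token/kind layout is taken from the gist.

```ruby
# Hypothetical stand-in for a CodeRay::Tokens array: a flat list of
# token, kind, token, kind, ... (the kinds here are illustrative).
flat_tokens = [
  "# first comment\n", :comment,
  "x",                 :ident,
  " = ",               :operator,
  "1\n",               :integer,
  "# second comment",  :comment
]

# Same logic as each_token above, written as a standalone method.
def each_token_with_line(flat)
  lineno = 1
  flat.each_slice(2) do |token, kind|
    yield token, kind, lineno
    # Only String tokens can contain newlines; anything else is skipped.
    lineno += token.count("\n") if token.is_a?(String)
  end
end

comment_lines = []
each_token_with_line(flat_tokens) do |token, kind, line|
  comment_lines << line if kind == :comment
end
```

Note that the line number is yielded before it is incremented, so a multi-line token is reported at the line where it starts.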
Hi Korny,
For the moment, Tokens#each doesn't yield two-element arrays; it walks a flattened list of token/kind pairs (i.e. [token1, kind1, token2, kind2, ...]), which is why I'm using Array#each_slice(2). Or maybe I'm not using the right Tokens method?
If you're not using the Tokens array anymore, how do you iterate over each token?
I'm using CodeRay because I needed a tokenizer to improve a personal project, notes, which greps for annotations in source comments. I'll push this modification soon.
> For the moment the Tokens#each method doesn't return two-element arrays, but flatten token/kind couples (i.e. [token1, kind1, token2, kind2, ...]),
True, because that turned out to be even faster ;)
One 1.8.6-compatible way to iterate over pairs would be this:
    content = nil
    for item in tokens
      if content
        yield content, item
        content = nil
      else
        content = item
      end
    end
    raise 'odd number list for Tokens' if content
But I guess an each_token method would be nice, too.
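Since the loop above uses `yield`, it has to live inside a method that takes a block. A minimal runnable sketch, assuming a helper called `each_flat_pair` (a made-up name, not part of CodeRay):

```ruby
# Hypothetical helper: yields (content, kind) pairs from a flat list
# without relying on each_slice.
def each_flat_pair(tokens)
  content = nil
  for item in tokens
    if content
      # Second element of a pair: hand both to the block.
      yield content, item
      content = nil
    else
      # First element of a pair: remember it until the kind arrives.
      content = item
    end
  end
  # A leftover first element means the list had an odd length.
  raise 'odd number list for Tokens' if content
end

pairs = []
each_flat_pair(["puts", :ident, " ", :space, "1", :integer]) do |text, kind|
  pairs << [text, kind]
end
```

This relies on every first element being truthy, which holds for strings and symbols but would break for a `nil` or `false` entry.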
However, you don't need the Array representation any more: The Scanners call text_token, begin_group etc. on the encoder object, so you can react to them directly. The YAML Encoder demonstrates this.
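To illustrate the idea of reacting to the calls directly, here is a rough sketch of an object that tracks line numbers as `text_token` calls arrive. It is a plain Ruby stand-in and not CodeRay's actual Encoder class: the class name, the `end_group` counterpart, and the way the calls are driven by hand are all assumptions made for this example.

```ruby
# Hypothetical receiver, NOT a real CodeRay encoder: it only mimics the
# idea of handling text_token / begin_group calls as they come in.
class CommentLineCollector
  attr_reader :comments

  def initialize
    @lineno = 1
    @comments = []
  end

  # Called once per text token, in source order.
  def text_token(text, kind)
    @comments << [@lineno, text] if kind == :comment
    @lineno += text.count("\n")
  end

  # Group events carry no text, so they leave the line count alone.
  def begin_group(kind); end
  def end_group(kind); end
end

# Drive the collector by hand, the way a scanner would call an encoder.
collector = CommentLineCollector.new
collector.begin_group(:string)
collector.text_token("\"a\nb\"", :content)
collector.end_group(:string)
collector.text_token(" ", :space)
collector.text_token("# note", :comment)
```

With this shape no intermediate array is needed: the line counter lives in the receiver and is updated as each token streams past.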
Thank you for using it :)
Interesting...
The main reason for returning just the token and its kind was speed (and memory); two-element arrays / method calls are very fast in Ruby.
Some random thoughts: We could add something like an each_token_with_line_number Filter (since CodeRay isn't using the Tokens array any more; the Scanner calls the Encoder directly). This might be useful for streaming output, but a filter that returns additional :line_number tokens at the start of each line would be more appropriate for this.
What are you using it for?
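The :line_number filter idea mentioned above could look roughly like this on a flat token list. This is only a sketch under my own assumptions: `with_line_number_tokens` is a made-up name, it works on a plain array rather than a real CodeRay Filter, and it splits multi-line tokens into one piece per line, which a real filter might not want to do.

```ruby
# Hypothetical filter: returns a new flat list with a (number, :line_number)
# pair inserted before the first text piece of every line.
def with_line_number_tokens(flat)
  out = []
  lineno = 1
  at_line_start = true
  flat.each_slice(2) do |token, kind|
    # Non-text entries (e.g. group markers) are passed through untouched.
    unless token.is_a?(String)
      out << token << kind
      next
    end
    # each_line keeps the trailing "\n" on every piece.
    token.each_line do |piece|
      if at_line_start
        out << lineno << :line_number
        at_line_start = false
      end
      out << piece << kind
      if piece.end_with?("\n")
        lineno += 1
        at_line_start = true
      end
    end
  end
  out
end

numbered = with_line_number_tokens(["a\n", :ident, "b", :ident])
```

A consumer can then pick up the line number from the stream itself instead of counting newlines.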