@vivien
Created October 21, 2011 04:55
CodeRay: iterate over each token with its kind and its starting line number
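require 'coderay'

# Adds Tokens#each_token, which yields every token together with its
# kind and the line number on which the token starts.
class CodeRay::Tokens
  def each_token
    lineno = 1
    # Tokens is a flat [token, kind, token, kind, ...] array.
    each_slice(2) do |token, kind|
      yield token, kind, lineno
      # Only text tokens (Strings) can contain newlines; group markers
      # such as :begin_group are Symbols and do not advance the counter.
      lineno += token.count("\n") if token.is_a?(String)
    end
  end
end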
__END__
# Example
CodeRay.scan_file("path/to/a/file").each_token do |token, kind, line|
  puts "#{line}: #{token}" if kind == :comment
end
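You can see what each_token yields without installing CodeRay. The minimal sketch below runs the same loop over a plain Ruby array. It assumes the CodeRay 1.0 layout discussed in this thread: text tokens are Strings, kinds are Symbols, and group markers appear as Symbol pairs such as :begin_group, :string. The method name each_token_with_line and the sample token stream are hypothetical.

```ruby
# Hypothetical standalone version of Tokens#each_token: walks a flat
# [token, kind, token, kind, ...] array and yields each pair together
# with the line number on which the token starts.
def each_token_with_line(flat_tokens)
  lineno = 1
  flat_tokens.each_slice(2) do |token, kind|
    yield token, kind, lineno
    # Only text tokens (Strings) can contain newlines; group markers
    # like :begin_group are Symbols and leave the counter unchanged.
    lineno += token.count("\n") if token.is_a?(String)
  end
end

# Made-up token stream for a three-line snippet.
sample = [
  "# first comment", :comment, "\n", :space,
  :begin_group, :string, "'", :delimiter, "hi", :content, "'", :delimiter,
  :end_group, :string, "\n", :space,
  "# second comment", :comment
]

each_token_with_line(sample) do |token, kind, line|
  puts "#{line}: #{token}" if kind == :comment
end
# prints:
# 1: # first comment
# 3: # second comment
```

The counter is advanced after the yield, so each token is reported with the line it starts on, even if it spans several lines.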
@korny

korny commented Oct 21, 2011

> Thanks for CodeRay, that's a great job.

Thank you for using it :)

> Just for information, I wasn't able to find how to get the line number when iterating over tokens, so I wrote this patch.

Interesting...

The main reason for returning just the token and its kind was speed (and memory); two-element arrays and method calls are very fast in Ruby.

> Maybe you'd be interested.

Some random thoughts: we could add something like an each_token_with_line_number Filter (since CodeRay isn't using the Tokens array any more; the Scanner calls the Encoder directly). This might be useful for streaming output, but a filter that emits additional :line_number tokens at the start of each line would be more appropriate for that.
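The :line_number idea could look roughly like the sketch below. It is not a CodeRay Filter. It is a plain method with a made-up name, insert_line_numbers, that takes the flat token array and returns a new one. In the result, a [line, :line_number] pair precedes the first token of every line; that marker layout is an assumption. Multi-line text tokens are split at newlines so that every line gets its own marker.

```ruby
# Hypothetical filter: returns a new flat token array in which a
# [line_number, :line_number] pair precedes the first token of every line.
def insert_line_numbers(flat_tokens)
  out = []
  lineno = 1
  at_line_start = true
  flat_tokens.each_slice(2) do |token, kind|
    # Split text tokens after each newline ("a\nb" -> ["a\n", "b"]);
    # group markers (Symbols) pass through as a single piece.
    pieces = token.is_a?(String) ? token.split(/(?<=\n)/) : [token]
    pieces.each do |piece|
      if at_line_start
        out << lineno << :line_number
        at_line_start = false
      end
      out << piece << kind
      # A piece ending in a newline closes the current line.
      if piece.is_a?(String) && piece.end_with?("\n")
        lineno += 1
        at_line_start = true
      end
    end
  end
  out
end

tokens = ["x", :ident, " = ", :space, "1", :integer, "\n", :space,
          "# a\n# b", :comment]
p insert_line_numbers(tokens)
```

The markers are Integers, not Strings. Code that counts newlines only in String tokens, like each_token above, is therefore not thrown off by them.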

What are you using it for?

@vivien
Author

vivien commented Oct 24, 2011

Hi Korny,

For the moment the Tokens#each method doesn't return two-element arrays but flattened token/kind couples (i.e. [token1, kind1, token2, kind2, ...]); that's why I'm using Array#each_slice(2). Or maybe I'm not using the right Tokens method?

If you're not using the Tokens array any more, how do you iterate over the tokens?

I'm using CodeRay because I needed a tokenizer to improve a personal project, notes, which greps annotations in source comments. I'll push this change soon.

@korny

korny commented Oct 24, 2011

> For the moment the Tokens#each method doesn't return two-element arrays but flattened token/kind couples (i.e. [token1, kind1, token2, kind2, ...])

True, because that turned out to be even faster ;)

One 1.8.6-compatible way to iterate over pairs would be this:

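        # tokens is a flat [token, kind, token, kind, ...] array;
        # content holds a token until its kind arrives.
        content = nil
        for item in tokens
          if content
            yield content, item
            content = nil
          else
            content = item
          end
        end
        # a leftover token means the array had an odd length
        raise 'odd number list for Tokens' if content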
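
Wrapped in a method, the loop above can run on its own. The helper name each_pair_of is hypothetical. The body is the 1.8.6-compatible loop, unchanged apart from comments. It yields pairs without building two-element arrays for them.

```ruby
# Hypothetical helper wrapping the pair loop: walks a flat
# [token, kind, token, kind, ...] array and yields (token, kind) pairs.
def each_pair_of(tokens)
  content = nil
  for item in tokens
    if content
      # The previous item was a token; this item is its kind.
      yield content, item
      content = nil
    else
      # Remember the token until its kind arrives.
      content = item
    end
  end
  # A leftover token means the list had an odd number of elements.
  raise 'odd number list for Tokens' if content
end

each_pair_of(["def", :keyword, " ", :space, "foo", :method]) do |text, kind|
  puts "#{kind}: #{text.inspect}"
end
```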

But I guess an each_token method would be nice, too.

However, you don't need the Array representation any more: the scanners call text_token, begin_group, etc. on the encoder object, so you can react to them directly. The YAML encoder demonstrates this.
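Reacting to the scanner's calls directly might look like the sketch below. It is a plain class with a hypothetical name, CommentLineCollector, not a registered CodeRay encoder; a real one would presumably subclass CodeRay::Encoders::Encoder. It assumes that text_token receives (text, kind) and that begin_group and end_group receive (kind). Here it is driven by a hand-written sequence of calls instead of a scanner.

```ruby
# Hypothetical encoder-style collector: instead of walking a token array,
# it reacts to the callbacks a scanner would make, tracking the current
# line and recording every comment it sees.
class CommentLineCollector
  attr_reader :comments

  def initialize
    @line = 1
    @comments = [] # [line, text] pairs
  end

  # Called once per text token; only text can contain newlines.
  def text_token(text, kind)
    @comments << [@line, text] if kind == :comment
    @line += text.count("\n")
  end

  # Group boundaries carry no text, so they do not affect the line count.
  def begin_group(kind); end
  def end_group(kind); end
end

# Simulate the calls a scanner might make for a small snippet.
collector = CommentLineCollector.new
collector.text_token("# setup", :comment)
collector.text_token("\n", :space)
collector.begin_group(:string)
collector.text_token("a", :content)
collector.end_group(:string)
collector.text_token("\n", :space)
collector.text_token("# done", :comment)
p collector.comments
```

Line numbers are tracked as the tokens stream past, so no token array has to be kept in memory.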
