@asmeurer
Created February 8, 2018 22:54
parso inside_string
"""
An attempt at an implementation of inside_string() using parso
inside_string(s, row, col) should return True if (row, col) is inside a string in the (partial) Python code s. (row is 1-based and col is 0-based)
Here is a version using tokenize that I am pretty sure is correct (it has tests) https://github.com/asmeurer/mypython/blob/796a33c8b029f5ee7096bf153db9d59a8220bb01/mypython/tokenize.py#L140
Two bugs I found so far:
- parso counts the newline as col 0 after the first line
- parso.parse("1 + 'a' + '''abc\ndef'''").get_leaf_for_position((1, 7)) gives String (position 7 is the space after 'a')
Other issues:
- Handling error leafs, especially when the value includes spaces, is a bit annoying
- It's not obvious that this handles every possible case
"""
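import parso


def inside_string(s, row, col):
    """
    Return True if row, col is inside a string in s
    """
    p = parso.parse(s)
    if row > 1:
        # For whatever reason parso puts \n at col 0 after the first line
        col += 1
    node = p.get_leaf_for_position((row, col))
    if isinstance(node, parso.python.tree.String):
        return True
    # An unterminated string may show up as an error leaf whose value starts
    # with whitespace, so strip before checking for an opening quote.
    # startswith() also avoids an IndexError when the stripped value is empty.
    return (isinstance(node, parso.python.tree.PythonErrorLeaf)
            and node.value.lstrip().startswith(('"', "'")))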
@davidhalter

As I said on Twitter before, there is a tokenizer that you could probably use more or less as a drop-in replacement for tokenize.generate_tokens. It offers a nicer API and doesn't raise errors.
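For comparison, here is a minimal sketch of what a token-based inside_string could look like. It uses the standard library's tokenize.generate_tokens, not the tokenizer mentioned above, and it is not the mypython implementation linked in the gist. The name inside_string_tokens is hypothetical, and the sketch assumes "inside" means that the character at (row, col) belongs to a STRING token, with the opening quote counted as inside.

```python
import io
import tokenize


def inside_string_tokens(s, row, col):
    """Return True if the character at (row, col) belongs to a STRING token.

    row is 1-based and col is 0-based, which matches tokenize's positions.
    """
    try:
        for tok in tokenize.generate_tokens(io.StringIO(s).readline):
            # tok.end is exclusive, so a half-open comparison leaves no
            # position that belongs to two tokens at once.
            if tok.type == tokenize.STRING and tok.start <= (row, col) < tok.end:
                return True
    except (tokenize.TokenError, SyntaxError):
        # The stdlib tokenizer gives up on some partial input (for example
        # an unterminated triple-quoted string). This sketch just reports
        # False there; a tokenizer that never raises would avoid this branch.
        return False
    return False
```

With the example from the gist, inside_string_tokens("1 + 'a' + '''abc\ndef'''", 1, 7) is False for the space after 'a', while (1, 5) and (2, 0) are both reported as inside a string.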

  • parso counts the newline as col 0 after the first line

>>> parso.parse('a=1\nb=2').get_leaf_for_position((2, 0))
<Newline: u'\n'>

I understand now what you mean. This is actually an issue about the API of get_leaf_for_position. I never liked that method and now it shows that it's even more problematic.

I don't understand what you mean. The API for get_leaf_for_position just sucks. If the position sits on the boundary between two tokens, e.g. position 1 in -1, the - will always win, because a leaf's end position counts as part of the leaf as well. We probably need to improve that. If nothing else, we should at least document it. The positions of the tokens are correct; it just feels weird.

  • parso.parse("1 + 'a' + '''abc\ndef'''").get_leaf_for_position((1, 7)) gives String (position 7 is the space after 'a')

This is the same issue. The problem is really that in some cases there are two leafs for a position. Which one wins? :)
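The ambiguity can be illustrated with a toy model that does not use parso at all. The Leaf type and both lookup functions below are hypothetical and only mimic the behavior described here: when the end position is treated as inclusive, the earlier leaf wins at a shared boundary; with a half-open interval, every position belongs to at most one leaf.

```python
from typing import List, NamedTuple, Optional, Tuple

Pos = Tuple[int, int]  # (row, col), row is 1-based and col is 0-based


class Leaf(NamedTuple):
    kind: str
    start: Pos
    end: Pos  # position just past the last character of the leaf


def leaf_end_inclusive(leaves: List[Leaf], pos: Pos) -> Optional[Leaf]:
    """Return the first leaf with start <= pos <= end (earlier leaf wins)."""
    for leaf in leaves:
        if leaf.start <= pos <= leaf.end:
            return leaf
    return None


def leaf_half_open(leaves: List[Leaf], pos: Pos) -> Optional[Leaf]:
    """Return the leaf with start <= pos < end, or None between leaves."""
    for leaf in leaves:
        if leaf.start <= pos < leaf.end:
            return leaf
    return None


# Hand-written leaves for the source text "-1" (not produced by a parser).
MINUS_ONE = [Leaf("operator", (1, 0), (1, 1)), Leaf("number", (1, 1), (1, 2))]

# Hand-written leaves for the start of "1 + 'a' + ..." up to the second "+".
STRING_CASE = [
    Leaf("number", (1, 0), (1, 1)),
    Leaf("operator", (1, 2), (1, 3)),
    Leaf("string", (1, 4), (1, 7)),
    Leaf("operator", (1, 8), (1, 9)),
]
```

In this model, leaf_end_inclusive(MINUS_ONE, (1, 1)) returns the operator leaf and leaf_half_open(MINUS_ONE, (1, 1)) returns the number leaf. For STRING_CASE, position (1, 7) resolves to the string leaf with the inclusive lookup and to no leaf at all with the half-open one, which matches the report quoted above.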

  • It's not obvious that this handles every possible case

Which cases are not handled? Do you just mean that you're not sure if it handles every single case?
