Created
February 8, 2018 22:54
parso inside_string
"""
An attempt at an implementation of inside_string() using parso

inside_string(s, row, col) should return True if (row, col) is inside a
string in the (partial) Python code s. (row is 1-based and col is 0-based.)

Here is a version using tokenize that I am pretty sure is correct (it has tests):
https://github.com/asmeurer/mypython/blob/796a33c8b029f5ee7096bf153db9d59a8220bb01/mypython/tokenize.py#L140

Two bugs I found so far:

- parso counts the newline as col 0 after the first line
- parso.parse("1 + 'a' + '''abc\ndef'''").get_leaf_for_position((1, 7)) gives String (position 7 is the space after 'a')

Other issues:

- Handling error leafs, especially when the value includes spaces, is a bit annoying
- It's not obvious that this handles every possible case
"""
import parso
from parso.python.tree import PythonErrorLeaf, String


def inside_string(s, row, col):
    """
    Return True if (row, col) is inside a string in s
    """
    p = parso.parse(s)
    if row > 1:
        # For whatever reason parso puts \n at col 0 after the first line
        col += 1
    node = p.get_leaf_for_position((row, col))
    if isinstance(node, String):
        return True
    # An unterminated string shows up as an error leaf whose value starts
    # with a quote character, possibly after leading whitespace.
    # startswith() avoids an IndexError when the stripped value is empty.
    return (isinstance(node, PythonErrorLeaf)
            and node.value.lstrip().startswith(('"', "'")))
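For comparison, the tokenize-based approach mentioned in the docstring can be sketched with only the standard library. This is not the linked mypython implementation, just a minimal sketch under a few assumptions: the name `inside_string_tokenize` is made up here, a position counts as inside a string when it falls in the half-open span from the token's start to its end, and incomplete input (such as an unterminated string) is simply not handled. On Python 3.12 and later, f-strings are no longer reported as STRING tokens, so this sketch would miss them.

```python
import io
import tokenize


def inside_string_tokenize(s, row, col):
    """
    Return True if (row, col) falls inside a complete string token of s.

    row is 1-based and col is 0-based, matching tokenize's own positions.
    Hypothetical helper for illustration only.
    """
    pos = (row, col)
    try:
        for tok in tokenize.generate_tokens(io.StringIO(s).readline):
            # Half-open interval: the end position belongs to whatever
            # comes after the string, not to the string itself.
            if tok.type == tokenize.STRING and tok.start <= pos < tok.end:
                return True
    except (tokenize.TokenError, SyntaxError):
        # Incomplete or invalid input; this sketch gives up here
        # instead of trying to recover the partial string.
        pass
    return False


src = "1 + 'a' + '''abc\ndef'''"
print(inside_string_tokenize(src, 1, 5))  # inside 'a'
print(inside_string_tokenize(src, 1, 7))  # the space after 'a'
print(inside_string_tokenize(src, 2, 0))  # second line of the triple-quoted string
```

With the half-open convention, position (1, 7) is not reported as a string, which is the behavior the second bug in the docstring expects.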
As I said on twitter before, there is a tokenizer that you could probably use more or less like a drop in replacement for tokenize.generate_tokens. It offers a nicer API and doesn't raise errors.
I understand now what you mean. This is actually an issue about the API of `get_leaf_for_position`. I never liked that method and now it shows that it's even more problematic.

I don't understand what you mean. The API for `get_leaf_for_position` just sucks. If you are between tokens, e.g. in `-1`, the `-` will always win, because the end position is used as well. We probably need to improve that. If nothing else we should at least document it. The positions of the tokens are correct, it just feels weird.

This is the same issue. The problem is really that in some cases there are two leafs for a position. Which one wins? :)

Which cases are not handled? Do you just mean that you're not sure if it handles every single case?
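
The "two leafs for one position" ambiguity can be reproduced without parso. The sketch below uses the standard library's tokenize on `-1` and compares two lookup conventions; it only illustrates the boundary problem and is not how parso implements `get_leaf_for_position`. The helper name `tokens_at` and its `inclusive_end` flag are made up for this example.

```python
import io
import tokenize


def tokens_at(source, pos, inclusive_end):
    """Return the text of every token whose span contains pos."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if not tok.string.strip():
            continue  # skip NEWLINE and ENDMARKER, which carry no text here
        if inclusive_end:
            # The end position counts as part of the token, so a position
            # on the boundary between two tokens matches both of them.
            contains = tok.start <= pos <= tok.end
        else:
            # Half-open span: the boundary belongs only to the next token.
            contains = tok.start <= pos < tok.end
        if contains:
            hits.append(tok.string)
    return hits


print(tokens_at("-1", (1, 1), inclusive_end=True))   # ['-', '1']
print(tokens_at("-1", (1, 1), inclusive_end=False))  # ['1']
```

When the end position is included, (1, 1) belongs to both `-` and `1`, and a lookup that returns the first match picks `-`. A half-open convention removes the tie, at the cost of deciding that the boundary always belongs to the following token.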