@jdtsmith
Last active August 20, 2023 12:30
tree-sitter navigation speed test
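;; Benchmark walking from the node at the start of each line up to the root.
(let (cnt tm node res)
  (require 'cl-lib)                     ; for `cl-incf' and `cl-loop'
  (goto-char (point-min))
  (while (< (point) (point-max))
    ;; Time 100 repetitions of the walk from the node at point to the root.
    (setq tm (benchmark-run 100
               (setq node (treesit-node-at (point)))
               (while node
                 (setq node (treesit-node-parent node))))
          node (treesit-node-at (point)))
    ;; Count the nodes on the path from point to the root (the node depth).
    (setq cnt 0)
    (while node
      (setq node (treesit-node-parent node))
      (cl-incf cnt))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
    (push (cons cnt tm) res)
    (beginning-of-line 2))              ; move to the start of the next line
  ;; Columns: line, depth, time per walk (ms), GC count, GC time per walk (ms).
  (with-output-to-temp-buffer "*ts-test-output*"
    (cl-loop for (cnt tau ngc tgc) in (nreverse res)
             for i from 1
             do (princ (format "%d\t%d\t%f\t%d\t%f\n"
                               i cnt (* tau 10) ngc (* tgc 10))))))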
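# ts_plot.py: plot root navigation time against line number from test.dat.
import matplotlib.pyplot as plt
import numpy as np

plt.clf()
# Columns: line number, node depth, time (ms), GC count, GC time (ms).
d = np.loadtxt('test.dat')
sc = plt.scatter(d[:, 0], d[:, 2], c=d[:, 1], alpha=0.5, s=60,
                 cmap='rainbow', linewidths=0)
plt.xlabel('Line Number')
plt.ylabel('Root Navigation Time (ms)')
plt.title('Tree Sitter Navigation Time\nfrom start of each line in _axes.py')
plt.colorbar(sc, label='Node Depth')
plt.xscale('log')
plt.yscale('log')
plt.tight_layout()

# Normalize a sqrt(N) reference curve to the first ten lines.
first = d[d[:, 0] <= 10, :]
fac = np.mean(np.sqrt(first[:, 0]) / first[:, 2])
# Raw string so the LaTeX backslashes are not read as escape sequences.
plt.plot([1, 1e4], np.sqrt([1, 1e4]) / fac, label=r'$t\propto\sqrt{N}$')
plt.legend()
plt.savefig('ts.png', dpi=300)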

jdtsmith commented Aug 17, 2023

This gist provides code to test the performance scaling of tree-sitter navigation in Emacs 29. The test walks upward to the root from the node found at the beginning of each line in a large file, here the _axes.py file from matplotlib, which has about 8400 lines.

To perform the test:

  1. Load the _axes.py file into Emacs 29, configured with python-ts-mode active and the default python grammar library.
  2. Disable any other costly modes like eglot, treesit-explore-mode, etc.
  3. With the Python buffer current, evaluate the Elisp code from the first file above (e.g. with M-:) and wait a minute or two for the results to appear in a separate buffer.
  4. Save the displayed buffer as the file test.dat.
  5. Run the ts_plot.py script interactively or from the Python command line to produce a plot.

The resulting plot:

image

We can conclude from this that:

  1. In Emacs 29, the navigation time to the root via treesit-node-parent grows as sqrt(N) with line number N (solid line), which amounts to roughly a 100x increase from the beginning to the end of a file of this size.
  2. The deeper a given node sits in the syntax tree, the more time it takes (redder colors towards the top), but this is a sub-dominant effect: even at a relatively low depth (purple points) the time increases substantially from early to late positions in the file.
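
One way to check the sqrt(N) claim numerically is to fit a power law in log-log space and read off the exponent. The following is a minimal sketch, not part of the original gist: the helper name `fit_power_law` is hypothetical, and the data here is synthetic rather than measured.

```python
import numpy as np

def fit_power_law(n, t):
    """Least-squares fit of t = a * n**k in log-log space; returns (a, k)."""
    k, log_a = np.polyfit(np.log(n), np.log(t), 1)
    return float(np.exp(log_a)), float(k)

# Synthetic stand-in for columns 0 and 2 of test.dat (NOT measured data).
n = np.arange(1, 8401, dtype=float)
t = 0.03 * np.sqrt(n)

a, k = fit_power_law(n, t)
print(f"prefactor a = {a:.3f}, exponent k = {k:.3f}")
```

With real data one would pass `d[:, 0]` and `d[:, 2]` from `np.loadtxt('test.dat')`. An exponent near 0.5 would be consistent with sqrt(N) scaling, though the depth-dependent scatter and any GC outliers will pull the fit around.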


jdtsmith commented Aug 18, 2023

Update: a similar test for treesit-node-at shows very slow growth with line number (after excluding points where GC occurred):

ts2
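
Excluding the GC-affected points can be done by dropping rows with a non-zero GC count. This is a minimal sketch that assumes the node-at test writes the same five-column layout as the Elisp file above (line, depth, time, GC count, GC time). The helper name `drop_gc_rows` and the sample rows are hypothetical, not measured data.

```python
import io
import numpy as np

def drop_gc_rows(d):
    """Keep only rows whose GC count (column 3) is zero."""
    return d[d[:, 3] == 0]

# Hypothetical sample in the same tab-separated layout (NOT measured data).
sample = io.StringIO(
    "1\t3\t0.010\t0\t0.0\n"
    "2\t5\t0.450\t1\t0.4\n"
    "3\t4\t0.012\t0\t0.0\n"
)
d = np.loadtxt(sample)
clean = drop_gc_rows(d)
print(clean[:, 0])  # line numbers of the rows that survive the filter
```

Filtering on the count rather than subtracting the GC time keeps the remaining points free of any residual GC cost.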

@jdtsmith

Another update: this plot compares the time taken to find the node at point with the time taken to locate its parent, both of which should be using positional winnowing to skip branches of the syntax tree that do not contain point.

image


jdtsmith commented Aug 19, 2023

I have applied a patch developed by Dmitry Gutov, which uses ts_node_parent instead of the original ts_tree_cursor_goto_parent, to a freshly compiled NS build of main. The results are striking:

With ts_tree_cursor_goto_parent (current design of node-parent):

ts_new_cursor

This looks quite similar to the emacs-29 results.

With ts_node_parent (Dmitry's patch):

ts_new_parent

Note the reduced y range and much shallower scaling. This indicates that the scaling and the long navigation times at the end of the file come from the cursor walk, not from the parent search from the root per se. The patched results also make more sense: they show logarithmic growth similar to node-at-point, as expected given that the search for a node at point and the search for its parent are quite similar.

@jdtsmith

Another new patch, from Yuan Fu, retains the cursor walk but prunes it by byte position. The results look like:

ts_yuan_parent

So this is intermediate between the two, rising to about 300µs, i.e. 10x faster than the current algorithm. The ts_node_parent version is still ~10x faster again, and rises more slowly across the tree.

@jdtsmith

An updated patch along these lines trims the tree search using both the start and end positions covered by each node:

image

This has similar performance and scaling to the simple ts_node_parent version above. It looks like the right solution.
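
To illustrate why pruning by covered range helps, here is a toy model of a parent search from the root. It is a sketch under loud assumptions: `Node`, `build_tree` and `find_parent` are hypothetical names, and this is not the Emacs or tree-sitter implementation, only the general idea of skipping subtrees whose range cannot contain the target.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    start: int
    end: int
    children: list = field(default_factory=list)

def build_tree(n_blocks, leaves_per_block=4, leaf_len=10):
    """Toy 'file': a root with n_blocks children, each holding a few leaves."""
    root = Node(0, n_blocks * leaves_per_block * leaf_len)
    pos = 0
    for _ in range(n_blocks):
        block = Node(pos, pos + leaves_per_block * leaf_len)
        for _ in range(leaves_per_block):
            block.children.append(Node(pos, pos + leaf_len))
            pos += leaf_len
        root.children.append(block)
    return root

def find_parent(root, target, prune):
    """Search down from the root; return (parent, number of nodes entered)."""
    entered = 0
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        entered += 1
        if node is target:
            return parent, entered
        # Push children in reverse so they are entered left to right.
        for child in reversed(node.children):
            covers = child.start <= target.start and child.end >= target.end
            if prune and not covers:
                continue  # this subtree cannot contain the target
            stack.append((child, node))
    return None, entered

root = build_tree(100)
early = root.children[0].children[0]
late = root.children[-1].children[-1]
for name, leaf in (("early", early), ("late", late)):
    _, plain = find_parent(root, leaf, prune=False)
    _, pruned = find_parent(root, leaf, prune=True)
    print(f"{name}: unpruned entered {plain} nodes, pruned entered {pruned}")
```

In this toy tree the unpruned search enters every node that precedes the target, so its cost grows with position in the file, while the pruned search enters one node per level. The range check itself still loops over all siblings here. A real implementation could avoid that too, for example with a binary search over sorted children.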
