@jdtsmith
Last active December 28, 2022
Where precisely can Emacs take control of execution to deliver async process output to your filter-function?
(defvar-local still-waiting nil
  "Whether we are still waiting for a chunk of process output to complete.")

(defun my-worker-1 (process)
  "Do some work and yield to PROCESS output."
  (do-some-sensitive-calculation)
  (sit-for 1)                             ; A) output can interrupt me here
  (another-sensitive-buffer-manipulation) ; B) what about here?
  (my-worker-2 process))

(defun my-worker-2 (process)
  "Do some work and communicate with PROCESS."
  (heavy-calc1) ; C) can these simple calc-only calls themselves be interrupted?
  (heavy-calc2) ; D) how about *between* the two calls?
  (process-send-string process "I'm done!") ; E) presumably right here can be
  (my-worker-3)
  (while still-waiting
    (accept-process-output process 0.1 nil 1) ; F) obviously here
    (sit-for 0.02)))                          ; G) and here

(defun my-worker-3 ()
  "Do more work, entirely unrelated to the process."
  (do-something-entirely-unrelated-to-the-process)) ; H) but can this function or its internal calls be interrupted?
@jdtsmith (author)
Update: from discussions with Eli, A, E, F, and G are the only places process output can arrive.

@minad commented Dec 26, 2022

There are many more functions which process input/output, e.g., redisplay. These functions all do a little bit of I/O processing; the rest of the processing then happens in the main outer event loop, as soon as control flow returns there. For this reason it is not really clear for which of your heavy functions input could arrive. But note that we are not really talking about interrupts here: the processing is interleaved with the actual operation. sit-for will process some I/O, but will continue to sit as long as the time is not up and no user input has arrived.

I believe all of this is rather simple as long as you are careful. You could, for example, implement a process filter which does too much heavy processing, calls redisplay, etc.; this could lead to reentrant process filter calls, which you may want to avoid. In the usual case you just append to a queue. If you want to trigger some UI action afterwards, you can enqueue a timer, which defers the action until the next time control returns to the outer event loop.
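A minimal sketch of that pattern (all function and variable names here are hypothetical, not from any particular package): the filter only appends to a queue, and a zero-delay timer drains it once control is back in the outer event loop.

```elisp
(defvar-local my-output-queue nil
  "Chunks of process output pending processing (most recent first).")

(defun my-filter (proc string)
  "Process filter that only enqueues STRING and defers the real work."
  (with-current-buffer (process-buffer proc)
    (push string my-output-queue)
    ;; A 0-second timer fires from the outer event loop, never
    ;; reentrantly from inside the filter itself.
    (run-at-time 0 nil #'my-drain-queue (current-buffer))))

(defun my-drain-queue (buf)
  "Handle all queued output chunks for BUF, outside the filter."
  (when (buffer-live-p buf)
    (with-current-buffer buf
      (let ((chunks (nreverse my-output-queue)))
        (setq my-output-queue nil)
        (dolist (chunk chunks)
          (my-handle-chunk chunk))))))   ; hypothetical per-chunk handler
```

Scheduling one timer per chunk is harmless here, since a drain that finds an empty queue is a no-op.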

I wonder, is there an actual issue which prompted your question? The model should be mostly equivalent to event loops in other systems, except that Emacs has these additional processing points and nested event loops via recursive minibuffers. Emacs supports these additional processing points to allow synchronous turtle-style programming, sequentially executing steps and redisplaying: goto-char, read-key, goto-char, insert, redisplay, ... However, this programming style is a recipe for Emacs hangups. One should strive to avoid writing code in such a style these days and instead go fully async.

@jdtsmith (author)
Thanks for your thoughts, minad. It's very interesting that there are indeed other points where async process input can in principle arrive; that actually fits my experience a bit better. It also supports my feeling that Emacs async is "less cooperative" than other async systems, where you have more explicit control over when async tasks are allowed to execute in interleaved fashion. In those systems it may be that you are using an async library which surprises you in terms of where it awaits output from (say) a network-facing task, but in my limited experience most async libraries in Python/JS/etc. are carefully designed around giving their users explicit control of when/where to await process/network/... I/O. Of course interleaving a GUI event loop at the same time, as Emacs must do, is more complex, so that's a reason to go easier on it. Still, it feels like some improvements in "async coding style" must be possible there.

In terms of the actual issue: I am currently expanding a relatively simple comint process-filter function into one which can also process heavier data (hundreds of kB of base64-encoded image data) "on the side". I do try to get the data into a (hidden) buffer and get out quickly, which is about as fast an operation on ~kB of data as you can do in Emacs. But I also re-enter the filter function at one specific location via a (while ... (accept-process-output ...)) loop, since for good data-transfer speed over the pty connection you cannot rely on the outer event loop's cadence of calling the filter function. In my measurements the outer event loop delivers process output over a pty at only about 60 kB/s max (arriving in 1024-byte chunks on macOS); a tight accept-process-output loop is hundreds of times faster.

The actual decoding of my image data is very fast, but your idea to perform this, and (perhaps especially) the actual image display, outside of the filter function is worth a look. I do have a "pending commands to send" queue, but not (yet) a "tasks in response to process output" queue. Can you mention a few more details? Do you enqueue closures in your filter functions? Do you use (run-at-time 0 ...) for processing deferred tasks from the outer event loop? Do you save the current buffer in the closure and then explicitly set it for each such deferred task (I use many buffer-local variables, since you can have multiple sub-processes across different buffers)?
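One way the buffer-capture part of this question could look in practice (a sketch with made-up names; the file must have lexical-binding enabled for the lambda to capture `buf`):

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-filter (proc string)
  "Enqueue a deferred task that re-establishes the process buffer."
  (let ((buf (process-buffer proc)))     ; captured by the closure below
    (run-at-time 0 nil
                 (lambda ()
                   (when (buffer-live-p buf)
                     (with-current-buffer buf
                       ;; buffer-local variables are visible again here
                       (my-handle-output string)))))))  ; hypothetical handler
```

The `buffer-live-p` check matters because the timer may fire after the buffer (and its sub-process) has been killed.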

Some of this structure I'm borrowing from Carsten's and my old IDLWAVE package, which had its own comint + "fast hidden data, on the side" filter-function capability. In those simpler days, completions, docs, etc. were on-demand and process latency was low and predictable, so it was easier all around than it is now. Now we have timers spawning commands (e.g. corfu + eldoc), static-analysis backends with quasi-random run times (IPython's Jedi internals), etc. Still, I recall both Carsten and I regarding the async filter (which also needs to service normal shell-based user interaction over the same channel) as a delicate glass structure that, once set up, should be tiptoed around.

@minad commented Dec 28, 2022 via email

@jdtsmith (author)
> I don't understand what you mean by "less cooperative". I would say that actually Emacs is "more cooperative", since you can add explicit yield points where processing can happen.

Maybe I should say "less intuitive".

> In other async systems this is not possible, since all async processing only happens at the outer main event loop.

But you can yield to that loop at will, to let other tasks run (and async output arrive) at any time, in a predictable manner, whereas Emacs may process async output at more unexpected times. Maybe it's a matter of learning to expect those points and avoid the unwanted ones, but it's far less clear to me than e.g.:

do_something_heavy()
new_output = await some_output()
do_something_with(new_output) # <--- no output will arrive here
other_output = await other_output()
...

@jdtsmith commented Dec 28, 2022

> This sounds a bit weird. It should probably be sufficient if you increase read-process-output-max? Did you try that? The default value is far too small.

Mine is at 1048576 (on emacs-mac), but due to annoying macOS limitations the maximum tty line length is 1024 bytes, so we get 1024-byte blocks. A pipe would probably be much faster, but then you lose Ctrl-D and other types of tty processing. UPDATE: OMG, pipes are so much faster. Blocks of output come in 65536-byte chunks at >1500 kB/s without any special effort. Might be worth working around Ctrl-D and related. Do you know of other pty advantages?
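For reference, the pipe-vs-pty choice is made at process creation time; a sketch of what the pipe-based setup being described might look like (the process name and command are placeholders):

```elisp
;; Allow larger reads per delivery; the default (4096 bytes) is small.
(setq read-process-output-max (* 1024 1024))

(make-process
 :name "ipython"                  ; placeholder name/command
 :buffer "*ipython*"
 :command '("ipython" "--simple-prompt")
 :connection-type 'pipe           ; 'pty would re-impose tty line limits
 :filter #'my-filter)             ; hypothetical filter function
```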

> It may be that accept-process-output is indeed faster. But it should not be.

An example would be a %ls -lR in a directory producing about 20 kB of data, which takes 0.4 s in a (non-Emacs) terminal. For me, a tight accept-process-output loop gets all this data in (in 1024-byte chunks) in about 1 s. Using the outer event loop, it takes 45 s to 1 min. I tried the same command in eshell, and (i) it blocks the entire time waiting for the new prompt, and (ii) it takes about 6 s (though it does color-code files, etc., so has more processing to do).

Update: with a pipe this takes a mere 0.7 s, letting the main event loop do the process polling! Why would I not just switch to a pipe?

> The same holds for the GC thresholds. The Emacs developers didn't scale these values with hardware advancements, such that they are still stuck in the 90's.

I use gcmh-mode. I tend to think of GC settings as being at the user's discretion (maybe with recommendations). Do you actively alter them in your packages? It seems hard to do generically.

@jdtsmith commented Dec 28, 2022

> In Consult I use for example consult--async-refresh, which throttles the refreshing.

Thanks, I'll take a look at this. Since most refreshing is "showing accumulating output in the buffer", I'm not sure how much throttling I should do. Do you throttle in terms of "display-altering updates per second" or the total data rate of the updates? I think the former makes good sense to me. E.g.: give yourself a full second of slurping in data as fast as possible, then dump output to the shell, then repeat as needed.
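That "updates per second" style of throttling can be sketched with a coalescing timer (all names here are made up; this is not Consult's actual implementation):

```elisp
(defvar-local my-refresh-timer nil
  "Pending refresh timer, or nil if none is scheduled.")

(defconst my-refresh-interval 0.2
  "Minimum seconds between display-altering updates.")

(defun my-request-refresh (buf)
  "Schedule a display refresh of BUF, coalescing rapid requests.
Data keeps accumulating in the buffer; only the display update is
rate-limited to one per `my-refresh-interval'."
  (unless my-refresh-timer
    (setq my-refresh-timer
          (run-at-time my-refresh-interval nil
                       (lambda ()
                         (when (buffer-live-p buf)
                           (with-current-buffer buf
                             (setq my-refresh-timer nil)
                             (my-redisplay-output))))))))  ; hypothetical
```

Because the timer slot is cleared only when it fires, any number of refresh requests arriving in the interval collapse into a single update.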

This might dissatisfy some people, since in Python you can of course produce arbitrary text output/input sequences at whatever rate you want, but it's probably better than letting the main event loop trickle data in at 35-60 kB/s (that's 1990s dial-up speed!). Nobody is happy waiting a minute for a deep directory listing that takes <0.5 s in a real shell.

@minad commented Dec 28, 2022

> But you can yield to that loop at will, to let other tasks run (async output arrive) at any time, in a predictable manner. Whereas Emacs may process async output at more unexpected times.

No, that's not true. It is exactly the same situation in Emacs. Note that async/await is just syntactic sugar.

do_something_heavy()
some_output(function(new_output) {
   do_something_with(new_output)
   other_output(function(other_output) {
      ...
   })
})

Vice versa, you can write a macro in Emacs which performs the same async/await transformation for a block of code; I am not sure what the package is called (aio?). If you write the exact same code in Emacs, "await" will also yield to the outer event loop "at will".
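If the package in question is aio, the style looks roughly like this (sketched from memory of aio's documented macros, with placeholder promise-returning functions; verify against the package itself):

```elisp
(require 'aio)  ; https://github.com/skeeto/emacs-aio

;; `aio-defun' defines a function that returns a promise; `aio-await'
;; suspends the function and yields control to the outer event loop
;; until the awaited promise resolves.
(aio-defun my-async-task ()
  (do-something-heavy)                        ; placeholder functions
  (let ((new-output (aio-await (my-some-output))))
    (do-something-with new-output)            ; runs between yield points
    (aio-await (my-other-output))))
```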

I am not sure where your misconception lies here. The complication in Emacs is only that you can locally start additional event loops, e.g., via sit-for. This is what makes Emacs somewhat "more cooperative" and maybe "less intuitive".

@minad commented Dec 28, 2022

> Mine is at 1048576, but due to annoying MacOS limitations, the maximum tty line length is 1024 bytes, so we get 1024 byte blocks.

Hmm, I am not sure but maybe you could disable line buffering.

> I use gcmh-mode. I tend to think of GC settings as a user's discretion (maybe making recommendations). Do you actively alter it in your packages? Seems hard to do generically.

I am not a fan of gcmh-mode since it can lead to very long pauses. I use fairly conservative settings, but my threshold is still maybe 10x larger than the Emacs default. I don't recommend adjusting the GC threshold in packages. You can do it locally if you want to optimize for throughput; I do that in a few places in Consult, where a lot of allocations take place.

> Thanks, I'll take a look at this. Since most refreshing is "showing accumulating output in the buffer", I'm not sure how much throttling I should do.

In Consult I use rather large delays, 0.1 s maybe? We wouldn't win much by refreshing more often; the display would start to flicker and the load would go up. I recommend experimenting a bit.

@jdtsmith (author)
One more crazy idea: if I do switch to pipe interaction with the IPython process, I could also send all the "hidden output" that happens behind the scenes (completion data, documentation, etc.) to stderr, attaching a separate filter function to it. This would simplify parsing that data. I need to experiment to see whether this is actually faster (e.g. whether IPython can fill stdout and stderr separately faster than stuffing all output down one pipe). Do you have any experience with using both stdout and stderr to communicate with high-volume processes?
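The separate-stderr idea can be wired up via make-process's :stderr keyword, which accepts a pipe process with its own filter (names and command below are placeholders):

```elisp
(make-process
 :name "ipython"
 :buffer "*ipython*"
 :command '("ipython" "--simple-prompt")    ; placeholder command
 :connection-type 'pipe
 :filter #'my-stdout-filter                 ; normal shell interaction
 :stderr (make-pipe-process
          :name "ipython-stderr"
          :filter #'my-hidden-data-filter)) ; completion/doc data
```

With this arrangement the hidden-data filter never has to disentangle its payload from ordinary shell output.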

@jdtsmith (author)
> > But you can yield to that loop at will, to let other tasks run (async output arrive) at any time, in a predictable manner. Whereas Emacs may process async output at more unexpected times.
>
> No, that's not true. It is exactly the same situation in Emacs. Note that async/await is just syntactic sugar.
>
> do_something_heavy()
> some_output(function(new_output) {
>    do_something_with(new_output)
>    other_output(function(other_output) {
>       ...
>    })
> })

Right. But in Emacs, do_something_with(new_output) may itself lead to code paths which process arriving output. While this is true in Python/JS too, it seems (based on admittedly limited experience) much less common. Maybe "async hygiene" is the right idea.

> Vice versa you can write a macro in Emacs which performs the same async/await transformation for a block of code. Not sure how this package is called (aio?). If you write the exact same code in Emacs, "await" will also yield to the outer event loop "at will".
>
> I am not sure where your misconception lies here. The complication in Emacs is only that you can locally start additional event loops, e.g., via sit-for. This is what makes Emacs somewhat "more cooperative" and maybe "less intuitive".

This exactly. And various Emacs internals effectively call "await process_output()" without regard for whether you are prepared to accept output at that moment. But if the list of such internals is predictable, you can work around it (and, as you say, use it to your advantage).

@minad commented Dec 28, 2022

> And that various Emacs internals effectively call "await process_output()" without regard for whether you are prepared to accept it at that moment.

That's not what they do: they start a new local event loop; they are blocking. But yes, if one ensures that no such blocking functions are called, everything should be predictable.
