  • Save guest271314/59406ad47a622d19b26f8a8c1e1bdfd5 to your computer and use it in GitHub Desktop.
SpeechSynthesis *to* a MediaStreamTrack or: How to execute arbitrary shell commands using inotify-tools and DevTools Snippets
The requirement described at Support SpeechSynthesis *to* a MediaStreamTrack is possible at Firefox and Nightly because those browsers expose `"Monitor of <device>"` as a device that can be selected when `navigator.mediaDevices.getUserMedia()` is executed. That provides a means to capture audio being output to speakers or headphones without also capturing microphone input.

That output is also possible at Chrome/Chromium by following the procedure described at This is again recording from microphone, not from audiooutput device.

After filing the issues Support capturing audio output from sound card and Clarify `getUserMedia({audio:{deviceId:{exact:<audiooutput_device>}}})` in this specification mandates capability to capture of audio output device - not exclusively microphone input device at the Media Capture and Streams specification (aka getUserMedia) repository, in order to make what is already possible clear in the specification so that Chrome/Chromium authors explicitly expose the `"Monitor of <device>"` device at *nix, the subject matter is revisited here anew with a more expansive mandate: not only capturing speech synthesis as a `MediaStream` or, specifically, as a `MediaStreamTrack`, but executing arbitrary shell commands, with the capability to get the output of those commands, if any, within the browser, at *nix.
Prior issues describing the concept built upon hereafter:

- `<script type="shell">` to execute arbitrary shell commands, and import stdout or result written to local file as a JavaScript module
- Add execute() to FileSystemDirectoryHandle
1. Install `inotify-tools`.

2. Launch Chrome/Chromium with the necessary flags set, with the `--use-file-for-fake-audio-capture` value set to the `wav` file that we will have the ability to get as a `MediaStream` in 6. and 7. below.

```
chromium-browser --allow-file-access-from-files --autoplay-policy=no-user-gesture-required --use-fake-device-for-media-stream --use-fake-ui-for-media-stream --use-file-for-fake-audio-capture=$HOME/localscripts/output.wav%noloop --user-data-dir=$HOME/test
```
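The same launch command, reformatted one flag per line for readability. To the best of my understanding: `--use-fake-device-for-media-stream` substitutes fake capture devices for real ones, `--use-fake-ui-for-media-stream` auto-accepts the `getUserMedia()` permission prompt, the `%noloop` suffix on the capture file plays it once rather than looping, and `--user-data-dir` keeps the experiment in a throwaway profile.

```shell
chromium-browser \
  --allow-file-access-from-files \
  --autoplay-policy=no-user-gesture-required \
  --use-fake-device-for-media-stream \
  --use-fake-ui-for-media-stream \
  --use-file-for-fake-audio-capture=$HOME/localscripts/output.wav%noloop \
  --user-data-dir=$HOME/test
```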
3. Create a local directory (e.g. `localscripts`) where the file to be monitored for the `close` event is saved. Open DevTools at Chrome/Chromium, select `Sources`, select `Snippets`, select `New snippet`, name the snippet `run`, right-click the `run` snippet, select `Save as...`, then save the file in the `localscripts` directory.
4. Create a shell script to be executed when the `close` event of the file `run` occurs, again in Snippets at DevTools. Follow the procedure in 2., save the script in a directory in `PATH`, e.g. `$HOME/bin`; here the file is named ``.

```
while inotifywait -e close $HOME/localscripts/run; do
  # run the synthesis script created in 5.
  :
done
```
5. Create the shell script to be executed (again in Snippets at DevTools, following 2.) and save as ``. Set the script as executable with `chmod +x` and place it in `PATH` or the `localscripts` directory.

```
espeak-ng -m -f $HOME/localscripts/input.txt -w $HOME/localscripts/output.wav
```

In this case we read text input from `input.txt` and write the resulting `wav` file to `output.wav`.
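The watch-and-run mechanism from 4. and 5. can be exercised on its own outside the browser. Below is a self-contained sketch: the paths are throwaway ones of my choosing, `-t` bounds the wait so it never hangs, and it skips cleanly when `inotify-tools` is not installed.

```shell
# Sketch of the close-event trigger from 4.; throwaway paths, bounded wait.
DEMO_DIR="${TMPDIR:-/tmp}/inotify-demo"
mkdir -p "$DEMO_DIR"
: > "$DEMO_DIR/run"
STATUS=skipped
if command -v inotifywait >/dev/null 2>&1; then
  # Write to, and thereby close, the watched file shortly after the watch starts.
  ( sleep 1; echo demo >> "$DEMO_DIR/run" ) &
  # -e close catches the close event; -t 10 gives up rather than hang forever.
  if inotifywait -q -t 10 -e close "$DEMO_DIR/run"; then
    # Here the watcher would execute the synthesis script from 5.
    STATUS=triggered
  fi
  wait
fi
echo "$STATUS" > "$DEMO_DIR/status"
echo "watch demo: $STATUS"
```

Saving the `run` Snippet from DevTools produces the same close event as the backgrounded `echo` above, which is what makes an arbitrary command executable from the browser.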
6. To meet the requirement "Support SpeechSynthesis *to* a MediaStreamTrack" at Chrome/Chromium, with the browser launched with the necessary flags set (2.), we get input to the `MediaStream` from `output.wav` using JavaScript. Again, we follow 2. to create the snippet and name the file `stream`.
```
async function speak() {
  // One issue with speech synthesis directly to a MediaStream or MediaStreamTrack
  // is that there is no way to determine when the output has really ended, as no
  // ended, mute, or unmute events are fired, and the input, due to the -m flag
  // passed to espeak-ng, can contain SSML <break> elements, e.g. <break time="2500ms">,
  // which can yield a false positive if silence detection is used to check whether
  // the expected output has completed; therefore store the MediaStreamTrack as a
  // global variable and execute stop() when speak() is called again.
  if (globalThis.track) {
    globalThis.track.stop();
  }
  const stream = await navigator.mediaDevices.getUserMedia({audio: true});
  globalThis.track = stream.getTracks()[0];
  // Sound from the fake capture device is not output to speakers or headphones,
  // so route the stream through an AudioContext to hear it.
  const ac = new AudioContext();
  const source = ac.createMediaStreamSource(stream);
  source.connect(ac.destination);
}
```
7. Following 2., we create a JavaScript file to execute the code defined in `stream`, and name the file `speak`.

8. Following 2., we write input text in `input` and save the file as `input.txt` in `localscripts`, e.g.
```
Do stuff.
Do other stuff!
Now, let's try this.
<p>A paragraph.</p>
<s>a sentence</s>
123<break time="2500ms">
456<break time="2500ms">
```
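Steps 5. and 8. can be sanity-checked outside the browser by running the synthesis directly on similar input. A sketch, with throwaway paths of my choosing, that skips cleanly when `espeak-ng` is not installed:

```shell
# Sketch: run the synthesis from 5. on sample input; throwaway paths.
DEMO_DIR="${TMPDIR:-/tmp}/espeak-demo"
mkdir -p "$DEMO_DIR"
printf 'Do stuff.\nDo other stuff!\n' > "$DEMO_DIR/input.txt"
if command -v espeak-ng >/dev/null 2>&1; then
  # -m interprets SSML markup in the input; -w writes a wav instead of playing.
  espeak-ng -m -f "$DEMO_DIR/input.txt" -w "$DEMO_DIR/output.wav"
  # A playable WAV begins with the four ASCII bytes "RIFF".
  head -c 4 "$DEMO_DIR/output.wav"
  echo
else
  echo "espeak-ng not installed; skipping"
fi
```

If the `RIFF` check passes, the same file at `$HOME/localscripts/output.wav` is what `--use-file-for-fake-audio-capture` feeds to the fake capture device.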
9. Execute `` (4.).

```
$ ~/bin/
```
10. Right-click `stream` and select `Run` to define `speak` globally.
11. Right-click `run` and select `Save as...` to save the file in `localscripts`, which will cause the `inotifywait` event to be fired:

```
Setting up watches.
Watches established.
/home/user/localscripts/run CLOSE_WRITE,CLOSE
```
12. Right-click `speak` and select `Run`.