  • Save guest271314/59406ad47a622d19b26f8a8c1e1bdfd5 to your computer and use it in GitHub Desktop.
SpeechSynthesis *to* a MediaStreamTrack or: How to execute arbitrary shell commands using inotify-tools and DevTools Snippets
The requirement described at Support SpeechSynthesis *to* a MediaStreamTrack is possible at Firefox and Nightly because those browsers expose `"Monitor of <device>"` as a device that can be selected when `navigator.mediaDevices.getUserMedia()` is executed. That provides a means to capture audio being output to speakers or headphones without also capturing microphone input.

That output is also possible at Chrome/Chromium by following the procedure described at This is again recording from microphone, not from audiooutput device.

After filing the issues Support capturing audio output from sound card and Clarify `getUserMedia({audio:{deviceId:{exact:<audiooutput_device>}}})` in this specification mandates capability to capture of audio output device - not exclusively microphone input device at the Media Capture and Streams specification (aka getUserMedia) repository, in order to make what is already possible clear in the specification so that Chrome/Chromium authors explicitly expose the `"Monitor of <device>"` device at *nix, the subject matter is revisited here anew with a more expansive mandate: not only capturing speech synthesis as a `MediaStream` or, specifically, as a `MediaStreamTrack`, but executing arbitrary shell commands, with the capability to get the output of those commands, if any, within the browser, at *nix.
Prior issues describing the concept built upon hereafter:

- `<script type="shell">` to execute arbitrary shell commands, and import stdout or result written to local file as a JavaScript module
- Add execute() to FileSystemDirectoryHandle
1. Install `inotify-tools`.

2. Launch Chrome/Chromium with the necessary flags set, with the `--use-file-for-fake-audio-capture` value set to the `wav` file that we will have the ability to get as a `MediaStream` in 6. and 7. below.

```
chromium-browser --allow-file-access-from-files --autoplay-policy=no-user-gesture-required --use-fake-device-for-media-stream --use-fake-ui-for-media-stream --use-file-for-fake-audio-capture=$HOME/localscripts/output.wav%noloop --user-data-dir=$HOME/test
```
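The same launch command, reformatted one flag per line for readability. To the best of my understanding: `--use-fake-device-for-media-stream` substitutes fake capture devices for real ones, `--use-fake-ui-for-media-stream` auto-accepts the `getUserMedia()` permission prompt, the `%noloop` suffix on the capture file plays it once rather than looping, and `--user-data-dir` keeps the experiment in a throwaway profile.

```shell
chromium-browser \
  --allow-file-access-from-files \
  --autoplay-policy=no-user-gesture-required \
  --use-fake-device-for-media-stream \
  --use-fake-ui-for-media-stream \
  --use-file-for-fake-audio-capture=$HOME/localscripts/output.wav%noloop \
  --user-data-dir=$HOME/test
```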
3. Create a local directory (e.g. `localscripts`) where the file to be monitored for the `close` event is saved. Open DevTools at Chrome/Chromium, select `Sources`, select `Snippets`, select `New snippet`, name the snippet `run`, right-click the `run` snippet, select `Save as...`, then save the file in the `localscripts` directory.
4. Create a shell script to be executed when the `close` event of the file `run` occurs, again in Snippets at DevTools. Follow the procedure in 2., save the script in a directory in `PATH`, e.g. `$HOME/bin`; here the file is named ``.

```
while inotifywait -e close $HOME/localscripts/run; do
  # run the synthesis script created in 5.
  :
done
```
5. Create the shell script to be executed (again in Snippets at DevTools, following 2.) and save as ``. Set the script as executable with `chmod +x` and place it in `PATH` or the `localscripts` directory.

```
espeak-ng -m -f $HOME/localscripts/input.txt -w $HOME/localscripts/output.wav
```

In this case we read text input from `input.txt` and write the resulting `wav` file to `output.wav`.
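The watch-and-run mechanism from 4. and 5. can be exercised on its own outside the browser. Below is a self-contained sketch: the paths are throwaway ones of my choosing, `-t` bounds the wait so it never hangs, and it skips cleanly when `inotify-tools` is not installed.

```shell
# Sketch of the close-event trigger from 4.; throwaway paths, bounded wait.
DEMO_DIR="${TMPDIR:-/tmp}/inotify-demo"
mkdir -p "$DEMO_DIR"
: > "$DEMO_DIR/run"
STATUS=skipped
if command -v inotifywait >/dev/null 2>&1; then
  # Write to, and thereby close, the watched file shortly after the watch starts.
  ( sleep 1; echo demo >> "$DEMO_DIR/run" ) &
  # -e close catches the close event; -t 10 gives up rather than hang forever.
  if inotifywait -q -t 10 -e close "$DEMO_DIR/run"; then
    # Here the watcher would execute the synthesis script from 5.
    STATUS=triggered
  fi
  wait
fi
echo "$STATUS" > "$DEMO_DIR/status"
echo "watch demo: $STATUS"
```

Saving the `run` Snippet from DevTools produces the same close event as the backgrounded `echo` above, which is what makes an arbitrary command executable from the browser.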
6. To meet the requirement "Support SpeechSynthesis *to* a MediaStreamTrack" at Chrome/Chromium, with the browser launched with the necessary flags set (2.), we get input to the `MediaStream` from `output.wav` using JavaScript. Again, we follow 2. to create the snippet and name the file `stream`.
```
async function speak() {
  // One issue with speech synthesis directly to a MediaStream or MediaStreamTrack
  // is that there is no way to determine when the output has really ended, as no
  // ended, mute, or unmute events are fired, and the input, due to the -m flag
  // passed to espeak-ng, can contain SSML <break> elements, e.g. <break time="2500ms">,
  // which can yield a false positive if silence detection is used to check whether
  // the expected output has completed; therefore store the MediaStreamTrack as a
  // global variable and execute stop() when speak() is called again.
  if (globalThis.track) {
    globalThis.track.stop();
  }
  const stream = await navigator.mediaDevices.getUserMedia({audio: true});
  globalThis.track = stream.getTracks()[0];
  // Sound from the fake capture device is not output to speakers or headphones,
  // so route the stream through an AudioContext to hear it.
  const ac = new AudioContext();
  const source = ac.createMediaStreamSource(stream);
  source.connect(ac.destination);
}
```
7. Following 2., we create a JavaScript file to execute the code defined in `stream`, and name the file `speak`.

8. Following 2., we write input text in `input` and save the file as `input.txt` in `localscripts`, e.g.
```
Do stuff.
Do other stuff!
Now, let's try this.
<p>A paragraph.</p>
<s>a sentence</s>
123<break time="2500ms">
456<break time="2500ms">
```
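Steps 5. and 8. can be sanity-checked outside the browser by running the synthesis directly on similar input. A sketch, with throwaway paths of my choosing, that skips cleanly when `espeak-ng` is not installed:

```shell
# Sketch: run the synthesis from 5. on sample input; throwaway paths.
DEMO_DIR="${TMPDIR:-/tmp}/espeak-demo"
mkdir -p "$DEMO_DIR"
printf 'Do stuff.\nDo other stuff!\n' > "$DEMO_DIR/input.txt"
if command -v espeak-ng >/dev/null 2>&1; then
  # -m interprets SSML markup in the input; -w writes a wav instead of playing.
  espeak-ng -m -f "$DEMO_DIR/input.txt" -w "$DEMO_DIR/output.wav"
  # A playable WAV begins with the four ASCII bytes "RIFF".
  head -c 4 "$DEMO_DIR/output.wav"
  echo
else
  echo "espeak-ng not installed; skipping"
fi
```

If the `RIFF` check passes, the same file at `$HOME/localscripts/output.wav` is what `--use-file-for-fake-audio-capture` feeds to the fake capture device.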
9. Execute `` (4.).

```
$ ~/bin/
```
10. Right-click `stream` and select `Run` to define `speak` globally.
11. Right-click `run` and select `Save as...` to save the file in `localscripts`, which will cause the `inotifywait` event to be fired:

```
Setting up watches.
Watches established.
/home/user/localscripts/run CLOSE_WRITE,CLOSE
```
12. Right-click `speak` and select `Run`.