@tetele
Last active April 30, 2024 12:36
ESPHome config - Onju Voice/Home as a voice assistant satellite in Home Assistant

⚠️ CONFIG WAS MOVED HERE ⚠️

https://github.com/tetele/onju-voice-satellite

This Gist will no longer be maintained, please check out the new location for the config.

Introduction

Ever since voice satellites were introduced to Home Assistant, people have wanted to use good microphones and speakers for this purpose, but few have been readily available.

In a valiant attempt to free a Google Nest Mini (2nd generation) from its privacy-ignoring overlords, Justin Alvey created Onju Voice, a drop-in replacement PCB for the Mini, with an ESP32-S3 at its heart, capable of some pretty funky stuff.

The purpose of this ESPHome config is to be able to use such a modded Nest Mini as a voice satellite in Home Assistant. Here's a small demo:

https://youtu.be/fuX6IYa79gA

Features

  • wake word, push to talk, on-demand and continuous conversation support
  • response playback
  • audio media player
  • service exposed in HA to start and stop the voice assistant from another device/trigger
  • visual feedback of the wake word listening/audio recording/success/error status via the Mini's onboard top LEDs
  • uses all 3 of the original Mini's touch controls: the left and right ones set the volume, and the center one starts the assistant manually
  • uses the original Mini's microphone mute button to prevent the wake word engine from starting unintentionally
  • automatic continuous touch control calibration
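
As a rough illustration of the "service exposed in HA" feature, here is a minimal sketch of a Home Assistant automation that starts the assistant from another trigger. It assumes the device keeps the name onju-voice from the config below, in which case the start_va service should show up in HA as esphome.onju_voice_start_va. The alias and the binary_sensor.doorbell entity are hypothetical placeholders, not part of this project.

```yaml
# Home Assistant automation (not part of the ESPHome config).
automation:
  - alias: "Start Onju Voice when the doorbell is pressed"  # hypothetical name
    trigger:
      - platform: state
        entity_id: binary_sensor.doorbell  # hypothetical entity, use your own
        to: "on"
    action:
      # Assumes the ESPHome device name is "onju-voice"; dashes become
      # underscores in the service name.
      - service: esphome.onju_voice_start_va
```

Calling esphome.onju_voice_stop_va the same way should stop the assistant, under the same naming assumption.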

Pre-requisites

  • Home Assistant 2023.11.3 or newer
  • A voice assistant configured in HA with STT and TTS in a language of your choice
  • ESPHome 2023.11.6 or newer

Known issues and limitations

  • you have to be able to retrofit an Onju Voice PCB inside a 2nd generation Google Nest Mini.
  • the media_player component in ESPHome does not play raw audio coming from Piper TTS, although it works with any TTS that outputs mp3 by default. This was fixed in HA 2023.12
  • the version for microWakeWord is in BETA and probably full of bugs

Installation instructions

Here is a video explaining how to perform the PCB "transplant". You can find some instructions for disassembly here.

To flash the Onju Voice for the first time, you have to do so BEFORE YOU PUT EVERYTHING BACK TOGETHER in the Google Nest Mini housing. Otherwise, you lose access to the USB port.

So, before connecting the board for the first time, hold down the BOOT switch on it and connect a USB cable to your computer. Use the ESPHome web installer to flash according to the config below.

Double-check the Wi-Fi connection details, the API encryption key and the device name/friendly name to make sure you use your own.
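
The configs below leave out the Wi-Fi credentials and the API encryption key, so a sketch of where those values usually go may help. The secret names (wifi_ssid, wifi_password, api_encryption_key) are assumptions for illustration; they would need matching entries in your own secrets.yaml.

```yaml
# Merge these keys into the existing wifi: and api: blocks of the config.
wifi:
  ssid: !secret wifi_ssid          # hypothetical secret name
  password: !secret wifi_password  # hypothetical secret name
  ap:
    password: "${wifi_ap_password}"

api:
  encryption:
    key: !secret api_encryption_key  # hypothetical secret name
```

Keeping these values in secrets.yaml rather than inline makes it harder to publish them by accident when sharing the config.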

After the device has been added to ESPHome, if auto discovery is turned on, the device should appear in Home Assistant automatically. Otherwise, check out this guide.

Credits

  • obviously, a huge thanks to Justin Alvey (@justLV) for the excellent Onju Voice project
  • many thanks to Mike Hansen (@synesthesiam) for the relentless work he's put into Year of the Voice at Home Assistant
  • thanks to the ESPHome Discord server members for both creating the most time-saving piece of software ever and for helping out with some kinks with the config, in particular @jesserockz, @ssieb and @Hawwa

GithubSponsor or BuyMeCoffee

substitutions:
  name: "onju-voice"
  friendly_name: "Onju Voice"
  wifi_ap_password: ""

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  min_version: 2024.2.0
  platformio_options:
    build_flags: "-DBOARD_HAS_PSRAM"
    board_build.arduino.memory_type: qio_opi
  on_boot:
    then:
      - light.turn_on:
          id: top_led
          effect: slow_pulse
          red: 100%
          green: 60%
          blue: 0%
      - wait_until:
          condition:
            wifi.connected:
      - light.turn_on:
          id: top_led
          effect: pulse
          red: 0%
          green: 100%
          blue: 0%
      - wait_until:
          condition:
            api.connected:
      - light.turn_on:
          id: top_led
          effect: none
          red: 0%
          green: 100%
          blue: 0%
      - delay: 1s
      - script.execute: reset_led

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

psram:
  mode: octal
  speed: 80MHz

logger:

api:
  services:
    - service: start_va
      then:
        - voice_assistant.start
    - service: stop_va
      then:
        - voice_assistant.stop

ota:

wifi:
  ap:
    password: "${wifi_ap_password}"

captive_portal:

globals:
  - id: thresh_percent
    type: float
    initial_value: "0.03"
    restore_value: false
  - id: touch_calibration_values_left
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_center
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_right
    type: uint32_t[5]
    restore_value: false

interval:
  - interval: 1s
    then:
      - script.execute:
          id: calibrate_touch
          button: 0
      - script.execute:
          id: calibrate_touch
          button: 1
      - script.execute:
          id: calibrate_touch
          button: 2

i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18

micro_wake_word:
  model: okay_nabu
  # model: hey_jarvis
  # model: alexa
  on_wake_word_detected:
    then:
      - voice_assistant.start

speaker:
  - platform: i2s_audio
    id: onju_out
    dac_type: external
    i2s_dout_pin: GPIO12

microphone:
  - platform: i2s_audio
    id: onju_microphone
    i2s_din_pin: GPIO17
    adc_type: external
    pdm: false

voice_assistant:
  id: va
  microphone: onju_microphone
  speaker: onju_out
  use_wake_word: false
  on_listening:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 100%
        green: 100%
        brightness: 100%
        effect: listening
  on_stt_vad_end:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 0%
        green: 20%
        brightness: 70%
        effect: processing
  on_tts_end:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 20%
        green: 100%
        effect: speaking
  on_end:
    - delay: 500ms
    - wait_until:
        not:
          speaker.is_playing: onju_out
    - script.execute: reset_led
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - delay: 200ms
          - micro_wake_word.start
  on_client_connected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - micro_wake_word.start:
  on_client_disconnected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.stop:
          - micro_wake_word.stop:
  on_error:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 100%
        green: 0%
        effect: none
    - delay: 1s
    - script.execute: reset_led

number:
  - platform: template
    name: "Touch threshold percentage"
    id: touch_threshold_percentage
    update_interval: never
    entity_category: config
    initial_value: 1.25
    min_value: -1
    max_value: 5
    step: 0.25
    optimistic: true
    on_value:
      then:
        - lambda: !lambda |-
            id(thresh_percent) = 0.01 * x;

esp32_touch:
  setup_mode: false
  sleep_duration: 2ms
  measurement_duration: 800us
  low_voltage_reference: 0.8V
  high_voltage_reference: 2.4V
  filter_mode: IIR_16
  debounce_count: 2
  noise_threshold: 0
  jitter_step: 0
  smooth_mode: IIR_2
  denoise_grade: BIT8
  denoise_cap_level: L0

binary_sensor:
  - platform: esp32_touch
    id: volume_down
    pin: GPIO4
    threshold: 539000 # 533156-551132
  - platform: esp32_touch
    id: volume_up
    pin: GPIO2
    threshold: 580000 # 575735-593064
  - platform: esp32_touch
    id: action
    pin: GPIO3
    threshold: 751000 # 745618-767100
    on_click:
      - if:
          condition:
            or:
              - switch.is_off: use_wake_word
              - binary_sensor.is_on: mute_switch
          then:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant is running: %s"
                args: ['id(va).is_running() ? "yes" : "no"']
            - if:
                condition: speaker.is_playing
                then:
                  - speaker.stop
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                else:
                  - voice_assistant.start:
          else:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant was running with wake word detection enabled. Starting continuously"
            - if:
                condition: speaker.is_playing
                then:
                  - speaker.stop
            - voice_assistant.stop
            - delay: 1s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:
  - platform: gpio
    id: mute_switch
    pin:
      number: GPIO38
      mode: INPUT_PULLUP
    name: Disable wake word
    on_press:
      - script.execute: turn_off_wake_word
    on_release:
      - script.execute: turn_on_wake_word

light:
  - platform: esp32_rmt_led_strip
    id: leds
    pin: GPIO11
    chipset: SK6812
    num_leds: 6
    rgb_order: grb
    rmt_channel: 0
    default_transition_length: 0s
    gamma_correct: 2.8
  - platform: partition
    id: left_led
    segments:
      - id: leds
        from: 0
        to: 0
    default_transition_length: 100ms
  - platform: partition
    id: top_led
    segments:
      - id: leds
        from: 1
        to: 4
    default_transition_length: 100ms
    effects:
      - pulse:
          name: pulse
          transition_length: 250ms
          update_interval: 250ms
      - pulse:
          name: slow_pulse
          transition_length: 1s
          update_interval: 2s
      - addressable_twinkle:
          name: listening_ww
          twinkle_probability: 1%
      - addressable_twinkle:
          name: listening
          twinkle_probability: 45%
      - addressable_scan:
          name: processing
          move_interval: 80ms
      - addressable_flicker:
          name: speaking
          intensity: 35%
  - platform: partition
    id: right_led
    segments:
      - id: leds
        from: 5
        to: 5
    default_transition_length: 100ms

script:
  - id: reset_led
    then:
      - if:
          condition:
            and:
              - switch.is_on: use_wake_word
              - binary_sensor.is_off: mute_switch
          then:
            - light.turn_on:
                id: top_led
                blue: 100%
                red: 100%
                green: 0%
                brightness: 60%
                effect: listening_ww
          else:
            - light.turn_off: top_led
  - id: turn_on_wake_word
    then:
      - if:
          condition:
            and:
              - binary_sensor.is_off: mute_switch
              - switch.is_on: use_wake_word
          then:
            - micro_wake_word.start
            - if:
                condition:
                  speaker.is_playing:
                then:
                  - speaker.stop:
            - script.execute: reset_led
          else:
            - logger.log:
                tag: "turn_on_wake_word"
                format: "Trying to start listening for wake word, but %s"
                args:
                  [
                    'id(mute_switch).state ? "mute switch is on" : "use wake word toggle is off"',
                  ]
                level: "INFO"
  - id: turn_off_wake_word
    then:
      - micro_wake_word.stop
      - script.execute: reset_led
  - id: calibrate_touch
    parameters:
      button: int
    then:
      - lambda: |-
          static uint8_t thresh_indices[3] = {0, 0, 0};
          static uint32_t sums[3] = {0, 0, 0};
          static uint8_t qsizes[3] = {0, 0, 0};
          static uint16_t consecutive_anomalies_per_button[3] = {0, 0, 0};
          uint32_t newval;
          uint32_t* calibration_values;
          switch(button) {
            case 0:
              newval = id(volume_down).get_value();
              calibration_values = id(touch_calibration_values_left);
              break;
            case 1:
              newval = id(action).get_value();
              calibration_values = id(touch_calibration_values_center);
              break;
            case 2:
              newval = id(volume_up).get_value();
              calibration_values = id(touch_calibration_values_right);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }
          if(newval == 0) return;
          //ESP_LOGD("touch_calibration", "[%d] qsize %d, sum %d, thresh_index %d, consecutive_anomalies %d", button, qsizes[button], sums[button], thresh_indices[button], consecutive_anomalies_per_button[button]);
          //ESP_LOGD("touch_calibration", "[%d] New value is %d", button, newval);
          if(qsizes[button] == 5) {
            float avg = float(sums[button])/float(qsizes[button]);
            if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
              consecutive_anomalies_per_button[button]++;
              //ESP_LOGD("touch_calibration", "[%d] %d anomalies detected.", button, consecutive_anomalies_per_button[button]);
              if(consecutive_anomalies_per_button[button] < 10)
                return;
            }
          }
          //ESP_LOGD("touch_calibration", "[%d] Resetting consecutive anomalies counter.", button);
          consecutive_anomalies_per_button[button] = 0;
          if(qsizes[button] == 5) {
            //ESP_LOGD("touch_calibration", "[%d] Queue full, removing %d.", button, id(touch_calibration_values)[thresh_indices[button]]);
            sums[button] -= (uint32_t) *(calibration_values+thresh_indices[button]);// id(touch_calibration_values)[thresh_indices[button]];
            qsizes[button]--;
          }
          *(calibration_values+thresh_indices[button]) = newval;
          sums[button] += newval;
          qsizes[button]++;
          thresh_indices[button] = (thresh_indices[button] + 1) % 5;
          //ESP_LOGD("touch_calibration", "[%d] Average value is %d", button, sums[button]/qsizes[button]);
          uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
          //ESP_LOGD("touch_calibration", "[%d] Setting threshold %d", button, newthresh);
          switch(button) {
            case 0:
              id(volume_down).set_threshold(newthresh);
              break;
            case 1:
              id(action).set_threshold(newthresh);
              break;
            case 2:
              id(volume_up).set_threshold(newthresh);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - script.execute: turn_on_wake_word
    on_turn_off:
      - script.execute: turn_off_wake_word
  - platform: gpio
    id: dac_mute
    restore_mode: ALWAYS_OFF
    pin:
      number: GPIO21
      inverted: True

substitutions:
  name: "onju-voice"
  friendly_name: "Onju Voice"
  wifi_ap_password: ""

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  min_version: 2023.11.6
  on_boot:
    then:
      - light.turn_on:
          id: top_led
          effect: slow_pulse
          red: 100%
          green: 60%
          blue: 0%
      - wait_until:
          condition:
            wifi.connected:
      - light.turn_on:
          id: top_led
          effect: pulse
          red: 0%
          green: 100%
          blue: 0%
      - wait_until:
          condition:
            api.connected:
      - light.turn_on:
          id: top_led
          effect: none
          red: 0%
          green: 100%
          blue: 0%
      - delay: 1s
      - script.execute: reset_led

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

logger:

api:
  services:
    - service: start_va
      then:
        - voice_assistant.start
    - service: stop_va
      then:
        - voice_assistant.stop

ota:

wifi:
  ap:
    password: "${wifi_ap_password}"

captive_portal:

globals:
  - id: thresh_percent
    type: float
    initial_value: "0.03"
    restore_value: false
  - id: touch_calibration_values_left
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_center
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_right
    type: uint32_t[5]
    restore_value: false

interval:
  - interval: 1s
    then:
      - script.execute:
          id: calibrate_touch
          button: 0
      - script.execute:
          id: calibrate_touch
          button: 1
      - script.execute:
          id: calibrate_touch
          button: 2

i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18

media_player:
  - platform: i2s_audio
    name: None
    id: onju_out
    dac_type: external
    i2s_dout_pin: GPIO12
    mode: mono
    mute_pin:
      number: GPIO21
      inverted: True

######
# speaker:
#   - platform: i2s_audio
#     id: onju_out
#     dac_type: external
#     i2s_dout_pin: GPIO12
#     mode: stereo
######

microphone:
  - platform: i2s_audio
    id: onju_microphone
    i2s_din_pin: GPIO17
    adc_type: external
    pdm: false

voice_assistant:
  id: va
  microphone: onju_microphone
  media_player: onju_out
  ######
  # speaker: onju_out
  ######
  use_wake_word: true
  on_listening:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 100%
        green: 100%
        brightness: 100%
        effect: listening
  on_stt_vad_end:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 0%
        green: 20%
        brightness: 70%
        effect: processing
  on_tts_end:
    - media_player.play_media: !lambda return x;
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 20%
        green: 100%
        effect: speaking
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          media_player.is_playing: onju_out
    - script.execute: reset_led
  on_client_connected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.start_continuous:
  on_client_disconnected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.stop:
  on_error:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 100%
        green: 0%
        effect: none
    - delay: 1s
    - script.execute: reset_led

number:
  - platform: template
    name: "Touch threshold percentage"
    id: touch_threshold_percentage
    update_interval: never
    entity_category: config
    initial_value: 1.25
    min_value: -1
    max_value: 5
    step: 0.25
    optimistic: true
    on_value:
      then:
        - lambda: !lambda |-
            id(thresh_percent) = 0.01 * x;

esp32_touch:
  setup_mode: false
  sleep_duration: 2ms
  measurement_duration: 800us
  low_voltage_reference: 0.8V
  high_voltage_reference: 2.4V
  filter_mode: IIR_16
  debounce_count: 2
  noise_threshold: 0
  jitter_step: 0
  smooth_mode: IIR_2
  denoise_grade: BIT8
  denoise_cap_level: L0

binary_sensor:
  - platform: esp32_touch
    id: volume_down
    pin: GPIO4
    threshold: 539000 # 533156-551132
    on_press:
      then:
        - light.turn_on: left_led
        - script.execute:
            id: set_volume
            volume: -0.05
        - delay: 0.75s
        - while:
            condition:
              binary_sensor.is_on: volume_down
            then:
              - script.execute:
                  id: set_volume
                  volume: -0.05
              - delay: 150ms
    on_release:
      then:
        - light.turn_off: left_led
  - platform: esp32_touch
    id: volume_up
    pin: GPIO2
    threshold: 580000 # 575735-593064
    on_press:
      then:
        - light.turn_on: right_led
        - script.execute:
            id: set_volume
            volume: 0.05
        - delay: 0.75s
        - while:
            condition:
              binary_sensor.is_on: volume_up
            then:
              - script.execute:
                  id: set_volume
                  volume: 0.05
              - delay: 150ms
    on_release:
      then:
        - light.turn_off: right_led
  - platform: esp32_touch
    id: action
    pin: GPIO3
    threshold: 751000 # 745618-767100
    on_click:
      - if:
          condition:
            or:
              - switch.is_off: use_wake_word
              - binary_sensor.is_on: mute_switch
          then:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant is running: %s"
                args: ['id(va).is_running() ? "yes" : "no"']
            - if:
                condition: media_player.is_playing
                then:
                  - media_player.stop
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                else:
                  - voice_assistant.start:
          else:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant was running with wake word detection enabled. Starting continuously"
            - if:
                condition: media_player.is_playing
                then:
                  - media_player.stop
            - voice_assistant.stop
            - delay: 1s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:
  - platform: gpio
    id: mute_switch
    pin:
      number: GPIO38
      mode: INPUT_PULLUP
    name: Disable wake word
    on_press:
      - script.execute: turn_off_wake_word
    on_release:
      - script.execute: turn_on_wake_word

light:
  - platform: esp32_rmt_led_strip
    id: leds
    pin: GPIO11
    chipset: SK6812
    num_leds: 6
    rgb_order: grb
    rmt_channel: 0
    default_transition_length: 0s
    gamma_correct: 2.8
  - platform: partition
    id: left_led
    segments:
      - id: leds
        from: 0
        to: 0
    default_transition_length: 100ms
  - platform: partition
    id: top_led
    segments:
      - id: leds
        from: 1
        to: 4
    default_transition_length: 100ms
    effects:
      - pulse:
          name: pulse
          transition_length: 250ms
          update_interval: 250ms
      - pulse:
          name: slow_pulse
          transition_length: 1s
          update_interval: 2s
      - addressable_lambda:
          name: show_volume
          update_interval: 50ms
          lambda: |-
            int int_volume = int(id(onju_out).volume * 100.0f * it.size());
            int full_leds = int_volume / 100;
            int last_brightness = int_volume % 100;
            int i = 0;
            for(; i < full_leds; i++) {
              it[i] = Color::WHITE;
            }
            if(i < 4) {
              it[i++] = Color(0,0,0).fade_to_white(last_brightness*256/100);
            }
            for(; i < it.size(); i++) {
              it[i] = Color::BLACK;
            }
      - addressable_twinkle:
          name: listening_ww
          twinkle_probability: 1%
      - addressable_twinkle:
          name: listening
          twinkle_probability: 45%
      - addressable_scan:
          name: processing
          move_interval: 80ms
      - addressable_flicker:
          name: speaking
          intensity: 35%
  - platform: partition
    id: right_led
    segments:
      - id: leds
        from: 5
        to: 5
    default_transition_length: 100ms

script:
  - id: reset_led
    then:
      - if:
          condition:
            and:
              - switch.is_on: use_wake_word
              - binary_sensor.is_off: mute_switch
          then:
            - light.turn_on:
                id: top_led
                blue: 100%
                red: 100%
                green: 0%
                brightness: 60%
                effect: listening_ww
          else:
            - light.turn_off: top_led
  - id: set_volume
    mode: restart
    parameters:
      volume: float
    then:
      - media_player.volume_set:
          id: onju_out
          volume: !lambda return clamp(id(onju_out).volume+volume, 0.0f, 1.0f);
      - light.turn_on:
          id: top_led
          effect: show_volume
      - delay: 1s
      - script.execute: reset_led
  - id: turn_on_wake_word
    then:
      - if:
          condition:
            and:
              - binary_sensor.is_off: mute_switch
              - switch.is_on: use_wake_word
          then:
            - lambda: id(va).set_use_wake_word(true);
            - if:
                condition:
                  media_player.is_playing:
                then:
                  - media_player.stop:
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - voice_assistant.start_continuous
            - script.execute: reset_led
          else:
            - logger.log:
                tag: "turn_on_wake_word"
                format: "Trying to start listening for wake word, but %s"
                args:
                  [
                    'id(mute_switch).state ? "mute switch is on" : "use wake word toggle is off"',
                  ]
                level: "INFO"
  - id: turn_off_wake_word
    then:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: reset_led
  - id: calibrate_touch
    parameters:
      button: int
    then:
      - lambda: |-
          static byte thresh_indices[3] = {0, 0, 0};
          static uint32_t sums[3] = {0, 0, 0};
          static byte qsizes[3] = {0, 0, 0};
          static int consecutive_anomalies_per_button[3] = {0, 0, 0};
          uint32_t newval;
          uint32_t* calibration_values;
          switch(button) {
            case 0:
              newval = id(volume_down).get_value();
              calibration_values = id(touch_calibration_values_left);
              break;
            case 1:
              newval = id(action).get_value();
              calibration_values = id(touch_calibration_values_center);
              break;
            case 2:
              newval = id(volume_up).get_value();
              calibration_values = id(touch_calibration_values_right);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }
          if(newval == 0) return;
          //ESP_LOGD("touch_calibration", "[%d] qsize %d, sum %d, thresh_index %d, consecutive_anomalies %d", button, qsizes[button], sums[button], thresh_indices[button], consecutive_anomalies_per_button[button]);
          //ESP_LOGD("touch_calibration", "[%d] New value is %d", button, newval);
          if(qsizes[button] == 5) {
            float avg = float(sums[button])/float(qsizes[button]);
            if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
              consecutive_anomalies_per_button[button]++;
              //ESP_LOGD("touch_calibration", "[%d] %d anomalies detected.", button, consecutive_anomalies_per_button[button]);
              if(consecutive_anomalies_per_button[button] < 10)
                return;
            }
          }
          //ESP_LOGD("touch_calibration", "[%d] Resetting consecutive anomalies counter.", button);
          consecutive_anomalies_per_button[button] = 0;
          if(qsizes[button] == 5) {
            //ESP_LOGD("touch_calibration", "[%d] Queue full, removing %d.", button, id(touch_calibration_values)[thresh_indices[button]]);
            sums[button] -= (uint32_t) *(calibration_values+thresh_indices[button]);// id(touch_calibration_values)[thresh_indices[button]];
            qsizes[button]--;
          }
          *(calibration_values+thresh_indices[button]) = newval;
          sums[button] += newval;
          qsizes[button]++;
          thresh_indices[button] = (thresh_indices[button] + 1) % 5;
          //ESP_LOGD("touch_calibration", "[%d] Average value is %d", button, sums[button]/qsizes[button]);
          uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
          //ESP_LOGD("touch_calibration", "[%d] Setting threshold %d", button, newthresh);
          switch(button) {
            case 0:
              id(volume_down).set_threshold(newthresh);
              break;
            case 1:
              id(action).set_threshold(newthresh);
              break;
            case 2:
              id(volume_up).set_threshold(newthresh);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - script.execute: turn_on_wake_word
    on_turn_off:
      - script.execute: turn_off_wake_word

@vhsdream

@vhsdream very interesting, thanks for this! I've also tried to get PSRAM working, but at the time I hadn't found enough resources and kinda postponed the idea. I'll try to pick it up again.

What I can tell you about your config above is that the Onju Voice PCB uses an ESP32-S3R8 (see page 10 here) which has no internal flash (it does have a 16MB external QSPI NOR flash module which I haven't used in the config - a W25Q128JVSIQ - which I plan to use to store audio notifications/"dings" for wake word detection once that becomes available in the ESPHome implementation). As such, I don't really understand how your config works with the defined flash_size and partitions.

That said, the part regarding PSRAM is exactly what I've tried myself and it never detected even one byte of PSRAM. You've given me some ideas I will try to investigate further when I get some time to work on this. Thank you!

Oh you have already demonstrated your better understanding of what is going on compared to what I think I know! I am mostly clueless about the differences between internal and external flash (and I'm thinking now that the part where I added flash_size and partitions is likely doing nothing and can be removed!) and I don't know if esphome is able to deal with that or not.

And my tests with the PSRAM might be incomplete as I had one issue where I was falling asleep to clear white noise audio when it abruptly cut out and did not return. I checked the HA logs the next morning and it looks like the onju might have crashed and restarted, so it's not as stable as I first thought. I'm glad I've given you a bit of inspiration though - good luck!

@vhsdream

With ESPHome 2024.2.0 out, which now has microWakeWord, I'm going to see if I can create a config that leverages the PSRAM on the onju to have onboard wake word detection.

@tetele
Author

tetele commented Feb 21, 2024

Go ahead, but microWakeWord will bring some limitations along with it.

Since it only works with the esp-idf framework, which is incompatible with the media_player component, switching to speaker will take away the ability to control volume and stream any audio to the satellite.

@vhsdream

Oh darn, I must have missed that! Not going to bother then. Where can I read up more about that - I didn't see it on the component page about it only working with esp-idf. Do you think it can be made to work with the Arduino framework eventually?

@fuzzie360

fuzzie360 commented Feb 22, 2024

I've taken a look at the microWakeWord component code and it seems on Arduino platform it is missing this header but it is provided by esp-idf platform.

I've seen this stackoverflow saying that Arduino platform could be possible if we add the following line after adding the TensorFlowLite ESP32 library as a dependency:

#include <TensorFlowLite_ESP32.h>

I haven't checked for any other blockers, but I think it is likely we can make microWakeWord work for Arduino platform.

@fuzzie360

I have tried the TensorFlowLite ESP32 library as mentioned above, but the library version is too old compared to the idf version.

Compilation error messages
src/esphome/components/micro_wake_word/micro_wake_word.cpp: In member function 'bool esphome::micro_wake_word::MicroWakeWord::initialize_models()':
src/esphome/components/micro_wake_word/micro_wake_word.cpp:234:101: error: no matching function for call to 'tflite::MicroAllocator::Create(uint8_t*&, const uint32_t&)'
       tflite::MicroAllocator::Create(this->streaming_var_arena_, STREAMING_MODEL_VARIABLE_ARENA_SIZE);
                                                                                                     ^
In file included from .piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:26,
                 from src/esphome/components/micro_wake_word/micro_wake_word.h:19,
                 from src/esphome/components/micro_wake_word/micro_wake_word.cpp:1:
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:121:26: note: candidate: 'static tflite::MicroAllocator* tflite::MicroAllocator::Create(uint8_t*, size_t, tflite::ErrorReporter*)'
   static MicroAllocator* Create(uint8_t* tensor_arena, size_t arena_size,
                          ^~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:121:26: note:   candidate expects 3 arguments, 2 provided
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:128:26: note: candidate: 'static tflite::MicroAllocator* tflite::MicroAllocator::Create(uint8_t*, size_t, tflite::MicroMemoryPlanner*, tflite::ErrorReporter*)'
   static MicroAllocator* Create(uint8_t* tensor_arena, size_t arena_size,
                          ^~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:128:26: note:   candidate expects 4 arguments, 2 provided
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:135:26: note: candidate: 'static tflite::MicroAllocator* tflite::MicroAllocator::Create(tflite::SimpleMemoryAllocator*, tflite::MicroMemoryPlanner*, tflite::ErrorReporter*)'
   static MicroAllocator* Create(SimpleMemoryAllocator* memory_allocator,
                          ^~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:135:26: note:   candidate expects 3 arguments, 2 provided
src/esphome/components/micro_wake_word/micro_wake_word.cpp:238:117: error: no matching function for call to 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*&, tflite::MicroMutableOpResolver<18>&, uint8_t*&, const uint32_t&)'
       this->preprocessor_model_, preprocessor_op_resolver, this->preprocessor_tensor_arena_, PREPROCESSOR_ARENA_SIZE);
                                                                                                                     ^
In file included from src/esphome/components/micro_wake_word/micro_wake_word.h:19,
                 from src/esphome/components/micro_wake_word/micro_wake_word.cpp:1:
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:61:3: note: candidate: 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*, const tflite::MicroOpResolver&, tflite::MicroAllocator*, tflite::ErrorReporter*, tflite::MicroResourceVariables*, tflite::MicroProfiler*)'
   MicroInterpreter(const Model* model, const MicroOpResolver& op_resolver,
   ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:61:3: note:   no known conversion for argument 3 from 'uint8_t*' {aka 'unsigned char*'} to 'tflite::MicroAllocator*'
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:50:3: note: candidate: 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*, const tflite::MicroOpResolver&, uint8_t*, size_t, tflite::ErrorReporter*, tflite::MicroResourceVariables*, tflite::MicroProfiler*)'
   MicroInterpreter(const Model* model, const MicroOpResolver& op_resolver,
   ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:50:3: note:   candidate expects 7 arguments, 4 provided
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:40:7: note: candidate: 'constexpr tflite::MicroInterpreter::MicroInterpreter(const tflite::MicroInterpreter&)'
 class MicroInterpreter {
       ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:40:7: note:   candidate expects 1 argument, 4 provided
src/esphome/components/micro_wake_word/micro_wake_word.cpp:242:102: error: no matching function for call to 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*&, tflite::MicroMutableOpResolver<14>&, uint8_t*&, const uint32_t&, tflite::MicroResourceVariables*&)'
                                                                STREAMING_MODEL_ARENA_SIZE, this->mrv_);
                                                                                                      ^
In file included from src/esphome/components/micro_wake_word/micro_wake_word.h:19,
                 from src/esphome/components/micro_wake_word/micro_wake_word.cpp:1:
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:61:3: note: candidate: 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*, const tflite::MicroOpResolver&, tflite::MicroAllocator*, tflite::ErrorReporter*, tflite::MicroResourceVariables*, tflite::MicroProfiler*)'
   MicroInterpreter(const Model* model, const MicroOpResolver& op_resolver,
   ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:61:3: note:   no known conversion for argument 3 from 'uint8_t*' {aka 'unsigned char*'} to 'tflite::MicroAllocator*'
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:50:3: note: candidate: 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*, const tflite::MicroOpResolver&, uint8_t*, size_t, tflite::ErrorReporter*, tflite::MicroResourceVariables*, tflite::MicroProfiler*)'
   MicroInterpreter(const Model* model, const MicroOpResolver& op_resolver,
   ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:50:3: note:   no known conversion for argument 5 from 'tflite::MicroResourceVariables*' to 'tflite::ErrorReporter*'
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:40:7: note: candidate: 'constexpr tflite::MicroInterpreter::MicroInterpreter(const tflite::MicroInterpreter&)'
 class MicroInterpreter {
       ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:40:7: note:   candidate expects 1 argument, 5 provided
src/esphome/components/micro_wake_word/micro_wake_word.cpp: In member function 'bool esphome::micro_wake_word::MicroWakeWord::register_preprocessor_ops_(tflite::MicroMutableOpResolver<18>&)':
src/esphome/components/micro_wake_word/micro_wake_word.cpp:453:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddDiv'; did you mean 'AddSin'?
   if (op_resolver.AddDiv() != kTfLiteOk)
                   ^~~~~~
                   AddSin
src/esphome/components/micro_wake_word/micro_wake_word.cpp:459:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddWindow'; did you mean 'AddSin'?
   if (op_resolver.AddWindow() != kTfLiteOk)
                   ^~~~~~~~~
                   AddSin
src/esphome/components/micro_wake_word/micro_wake_word.cpp:461:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFftAutoScale'
   if (op_resolver.AddFftAutoScale() != kTfLiteOk)
                   ^~~~~~~~~~~~~~~
src/esphome/components/micro_wake_word/micro_wake_word.cpp:463:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddRfft'; did you mean 'AddCast'?
   if (op_resolver.AddRfft() != kTfLiteOk)
                   ^~~~~~~
                   AddCast
src/esphome/components/micro_wake_word/micro_wake_word.cpp:465:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddEnergy'; did you mean 'AddNeg'?
   if (op_resolver.AddEnergy() != kTfLiteOk)
                   ^~~~~~~~~
                   AddNeg
src/esphome/components/micro_wake_word/micro_wake_word.cpp:467:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFilterBank'
   if (op_resolver.AddFilterBank() != kTfLiteOk)
                   ^~~~~~~~~~~~~
src/esphome/components/micro_wake_word/micro_wake_word.cpp:469:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFilterBankSquareRoot'
   if (op_resolver.AddFilterBankSquareRoot() != kTfLiteOk)
                   ^~~~~~~~~~~~~~~~~~~~~~~
src/esphome/components/micro_wake_word/micro_wake_word.cpp:471:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFilterBankSpectralSubtraction'
   if (op_resolver.AddFilterBankSpectralSubtraction() != kTfLiteOk)
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/esphome/components/micro_wake_word/micro_wake_word.cpp:473:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddPCAN'; did you mean 'AddAddN'?
   if (op_resolver.AddPCAN() != kTfLiteOk)
                   ^~~~~~~
                   AddAddN
src/esphome/components/micro_wake_word/micro_wake_word.cpp:475:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFilterBankLog'
   if (op_resolver.AddFilterBankLog() != kTfLiteOk)
                   ^~~~~~~~~~~~~~~~
*** [.pioenvs/esphome-web-a294cc/src/esphome/components/micro_wake_word/micro_wake_word.cpp.o] Error 1

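It looks like getting micro_wake_word to work with the Arduino framework is unlikely unless somebody updates the TensorFlowLite_ESP32 library: the errors above show that the bundled version lacks both the `MicroInterpreter` constructor signature and the audio preprocessor ops (`AddWindow`, `AddRfft`, `AddPCAN`, etc.) that the component calls.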
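
For anyone hitting the same errors, here is a minimal sketch of the framework section that builds against ESP-IDF instead of Arduino, so the outdated Arduino TensorFlowLite_ESP32 library is not pulled in. The board ID is an assumption (it is not stated in this thread), so keep whatever your existing config uses.

```yaml
esp32:
  # Assumed board ID for the ESP32-S3 on the Onju Voice PCB; keep your own value.
  board: esp32-s3-devkitc-1
  framework:
    # micro_wake_word is built against ESP-IDF; the Arduino build fails as shown above.
    type: esp-idf
```

Switching frameworks usually calls for a clean build, and any Arduino-only components in the config would need replacing.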

@yahav-bot

Hi guys, I'm trying to install the Onju Voice microWakeWord config from the HA ESPHome add-on, but it gets stuck on "Preparing installation" and CPU usage sits at 100 percent. When I install the Onju Voice config without microWakeWord, everything works fine. Can someone please point me in the right direction?

@tetele
Author

tetele commented Mar 8, 2024

@yahav-bot the build takes much longer with microWakeWord (MWW), especially on an underpowered machine. I've heard reports of 1h30min on a HA Green. I have a pretty big i5 with 32GB RAM and it takes ~5-7 minutes, compared to ~1 minute for the config without MWW.

@yahav-bot

yahav-bot commented Mar 11, 2024

Won't it damage the machine if the processor stays at 100 percent for such a long time?

@bobzer

bobzer commented Apr 30, 2024

@yahav-bot It should not damage the CPU. If in doubt, you can monitor the CPU temperature or the overall temperature of the box. By the way, these kinds of questions are not specific to Onju, so you can easily find good answers on the forum or any other Home Assistant support channel.
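
If the sustained load is still a concern, one option is to cap the number of parallel compile jobs. This is a sketch only: it assumes your ESPHome version supports the `compile_process_limit` option under `esphome:`, and the device name is a placeholder.

```yaml
esphome:
  # Placeholder name; keep the name from your existing config.
  name: onju-voice
  # Assumed option: limits simultaneous compile processes (default is the number of CPU cores).
  compile_process_limit: 1
```

A lower limit reduces peak CPU usage, but the build will take even longer.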
