@tetele
Last active April 30, 2024 12:36
ESPHome config - Onju Voice/Home as a voice assistant satellite in Home Assistant

⚠️ CONFIG WAS MOVED HERE ⚠️

https://github.com/tetele/onju-voice-satellite

This Gist will no longer be maintained, please check out the new location for the config.

Introduction

Ever since voice satellites were introduced to Home Assistant, people have wanted to use good microphones and speakers for the purpose, but few suitable devices were actually available.

In a valiant attempt to free a Google Nest Mini (2nd generation) from its privacy-ignoring overlords, Justin Alvey created Onju Voice, a drop-in replacement PCB for the Mini, with an ESP32-S3 at its heart, capable of some pretty funky stuff.

The purpose of this ESPHome config is to be able to use such a modded Nest Mini as a voice satellite in Home Assistant. Here's a small demo:

https://youtu.be/fuX6IYa79gA

Features

  • wake word, push-to-talk, on-demand and continuous conversation support
  • response playback
  • audio media player
  • a service exposed in HA to start and stop the voice assistant from another device/trigger
  • visual feedback of the wake word listening/audio recording/success/error status via the Mini's onboard top LEDs
  • uses all three of the original Mini's touch controls for adjusting the volume and manually starting the assistant
  • uses the original Mini's microphone mute switch to prevent the wake word engine from starting unintentionally
  • automatic continuous touch control calibration
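The exposed start/stop services can be called like any other HA service. Here is a minimal automation sketch; the trigger entity is made up for illustration, and the `onju_voice` part of the service name depends on your device name:

```yaml
# Hypothetical HA automation: start the assistant when a (made-up) button
# entity is pressed. Replace binary_sensor.office_button with a real entity
# and adjust the service name to match your device's name.
automation:
  - alias: "Start Onju Voice listening on button press"
    trigger:
      - platform: state
        entity_id: binary_sensor.office_button
        to: "on"
    action:
      - service: esphome.onju_voice_start_va
```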

Pre-requisites

  • Home Assistant 2023.11.3 or newer
  • A voice assistant configured in HA with STT and TTS in a language of your choice
  • ESPHome 2023.11.6 or newer

Known issues and limitations

  • you have to be able to retrofit an Onju Voice PCB inside a 2nd generation Google Nest Mini.
  • the media_player component in ESPHome cannot play the raw audio that Piper TTS outputs; it works with any TTS that outputs mp3 by default. This was fixed in HA 2023.12
  • the version for microWakeWord is in BETA and probably full of bugs

Installation instructions

Here is a video explaining how to perform the PCB "transplant". You can find some instructions for disassembly here.

The first flash of the Onju Voice must happen BEFORE YOU PUT EVERYTHING BACK TOGETHER in the Google Nest Mini housing; otherwise, you lose access to the USB port.

So, before connecting the board for the first time, hold down the BOOT switch on it and connect a USB cable to your computer. Use the ESPHome web installer to flash according to the config below.

Double-check the Wi-Fi connection details, API encryption key and device name/friendly name to make sure you use your own values.
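As a sketch, assuming you keep credentials in ESPHome's `secrets.yaml` (the `!secret` key names below are placeholders, not part of the original config), the values to personalize look like this:

```yaml
substitutions:
  name: "onju-voice"            # device name - also determines HA service names
  friendly_name: "Onju Voice"

wifi:
  ssid: !secret wifi_ssid           # placeholder secret key
  password: !secret wifi_password   # placeholder secret key

api:
  encryption:
    key: !secret onju_api_key       # placeholder secret key
```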

After the device has been added to ESPHome, if auto discovery is turned on, the device should appear in Home Assistant automatically. Otherwise, check out this guide.

Credits

  • obviously, a huge thanks to Justin Alvey (@justLV) for the excellent Onju Voice project
  • many thanks to Mike Hansen (@synesthesiam) for the relentless work he's put into Year of the Voice at Home Assistant
  • thanks to the ESPHome Discord server members, both for creating the most time-saving piece of software ever and for helping iron out some kinks in the config - in particular @jesserockz, @ssieb and @Hawwa


# microWakeWord variant of the config: ESP-IDF framework, on-device
# wake word detection (BETA).
substitutions:
  name: "onju-voice"
  friendly_name: "Onju Voice"
  wifi_ap_password: ""

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  min_version: 2024.2.0
  platformio_options:
    build_flags: "-DBOARD_HAS_PSRAM"
    board_build.arduino.memory_type: qio_opi
  on_boot:
    then:
      - light.turn_on:
          id: top_led
          effect: slow_pulse
          red: 100%
          green: 60%
          blue: 0%
      - wait_until:
          condition:
            wifi.connected:
      - light.turn_on:
          id: top_led
          effect: pulse
          red: 0%
          green: 100%
          blue: 0%
      - wait_until:
          condition:
            api.connected:
      - light.turn_on:
          id: top_led
          effect: none
          red: 0%
          green: 100%
          blue: 0%
      - delay: 1s
      - script.execute: reset_led
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

psram:
  mode: octal
  speed: 80MHz

logger:

api:
  services:
    - service: start_va
      then:
        - voice_assistant.start
    - service: stop_va
      then:
        - voice_assistant.stop

ota:

wifi:
  ap:
    password: "${wifi_ap_password}"

captive_portal:
globals:
  - id: thresh_percent
    type: float
    initial_value: "0.03"
    restore_value: false
  - id: touch_calibration_values_left
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_center
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_right
    type: uint32_t[5]
    restore_value: false

interval:
  - interval: 1s
    then:
      - script.execute:
          id: calibrate_touch
          button: 0
      - script.execute:
          id: calibrate_touch
          button: 1
      - script.execute:
          id: calibrate_touch
          button: 2
i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18

micro_wake_word:
  model: okay_nabu
  # model: hey_jarvis
  # model: alexa
  on_wake_word_detected:
    then:
      - voice_assistant.start

speaker:
  - platform: i2s_audio
    id: onju_out
    dac_type: external
    i2s_dout_pin: GPIO12

microphone:
  - platform: i2s_audio
    id: onju_microphone
    i2s_din_pin: GPIO17
    adc_type: external
    pdm: false
voice_assistant:
  id: va
  microphone: onju_microphone
  speaker: onju_out
  use_wake_word: false
  on_listening:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 100%
        green: 100%
        brightness: 100%
        effect: listening
  on_stt_vad_end:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 0%
        green: 20%
        brightness: 70%
        effect: processing
  on_tts_end:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 20%
        green: 100%
        effect: speaking
  on_end:
    - delay: 500ms
    - wait_until:
        not:
          speaker.is_playing: onju_out
    - script.execute: reset_led
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - delay: 200ms
          - micro_wake_word.start
  on_client_connected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - micro_wake_word.start:
  on_client_disconnected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.stop:
          - micro_wake_word.stop:
  on_error:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 100%
        green: 0%
        effect: none
    - delay: 1s
    - script.execute: reset_led
number:
  - platform: template
    name: "Touch threshold percentage"
    id: touch_threshold_percentage
    update_interval: never
    entity_category: config
    initial_value: 1.25
    min_value: -1
    max_value: 5
    step: 0.25
    optimistic: true
    on_value:
      then:
        - lambda: !lambda |-
            id(thresh_percent) = 0.01 * x;

esp32_touch:
  setup_mode: false
  sleep_duration: 2ms
  measurement_duration: 800us
  low_voltage_reference: 0.8V
  high_voltage_reference: 2.4V
  filter_mode: IIR_16
  debounce_count: 2
  noise_threshold: 0
  jitter_step: 0
  smooth_mode: IIR_2
  denoise_grade: BIT8
  denoise_cap_level: L0
binary_sensor:
  - platform: esp32_touch
    id: volume_down
    pin: GPIO4
    threshold: 539000 # 533156-551132
  - platform: esp32_touch
    id: volume_up
    pin: GPIO2
    threshold: 580000 # 575735-593064
  - platform: esp32_touch
    id: action
    pin: GPIO3
    threshold: 751000 # 745618-767100
    on_click:
      - if:
          condition:
            or:
              - switch.is_off: use_wake_word
              - binary_sensor.is_on: mute_switch
          then:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant is running: %s"
                args: ['id(va).is_running() ? "yes" : "no"']
            - if:
                condition: speaker.is_playing
                then:
                  - speaker.stop
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                else:
                  - voice_assistant.start:
          else:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant was running with wake word detection enabled. Starting continuously"
            - if:
                condition: speaker.is_playing
                then:
                  - speaker.stop
            - voice_assistant.stop
            - delay: 1s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:
  - platform: gpio
    id: mute_switch
    pin:
      number: GPIO38
      mode: INPUT_PULLUP
    name: Disable wake word
    on_press:
      - script.execute: turn_off_wake_word
    on_release:
      - script.execute: turn_on_wake_word
light:
  - platform: esp32_rmt_led_strip
    id: leds
    pin: GPIO11
    chipset: SK6812
    num_leds: 6
    rgb_order: grb
    rmt_channel: 0
    default_transition_length: 0s
    gamma_correct: 2.8
  - platform: partition
    id: left_led
    segments:
      - id: leds
        from: 0
        to: 0
    default_transition_length: 100ms
  - platform: partition
    id: top_led
    segments:
      - id: leds
        from: 1
        to: 4
    default_transition_length: 100ms
    effects:
      - pulse:
          name: pulse
          transition_length: 250ms
          update_interval: 250ms
      - pulse:
          name: slow_pulse
          transition_length: 1s
          update_interval: 2s
      - addressable_twinkle:
          name: listening_ww
          twinkle_probability: 1%
      - addressable_twinkle:
          name: listening
          twinkle_probability: 45%
      - addressable_scan:
          name: processing
          move_interval: 80ms
      - addressable_flicker:
          name: speaking
          intensity: 35%
  - platform: partition
    id: right_led
    segments:
      - id: leds
        from: 5
        to: 5
    default_transition_length: 100ms
script:
  - id: reset_led
    then:
      - if:
          condition:
            and:
              - switch.is_on: use_wake_word
              - binary_sensor.is_off: mute_switch
          then:
            - light.turn_on:
                id: top_led
                blue: 100%
                red: 100%
                green: 0%
                brightness: 60%
                effect: listening_ww
          else:
            - light.turn_off: top_led
  - id: turn_on_wake_word
    then:
      - if:
          condition:
            and:
              - binary_sensor.is_off: mute_switch
              - switch.is_on: use_wake_word
          then:
            - micro_wake_word.start
            - if:
                condition:
                  speaker.is_playing:
                then:
                  - speaker.stop:
            - script.execute: reset_led
          else:
            - logger.log:
                tag: "turn_on_wake_word"
                format: "Trying to start listening for wake word, but %s"
                args:
                  [
                    'id(mute_switch).state ? "mute switch is on" : "use wake word toggle is off"',
                  ]
                level: "INFO"
  - id: turn_off_wake_word
    then:
      - micro_wake_word.stop
      - script.execute: reset_led
  - id: calibrate_touch
    parameters:
      button: int
    then:
      - lambda: |-
          static uint8_t thresh_indices[3] = {0, 0, 0};
          static uint32_t sums[3] = {0, 0, 0};
          static uint8_t qsizes[3] = {0, 0, 0};
          static uint16_t consecutive_anomalies_per_button[3] = {0, 0, 0};

          uint32_t newval;
          uint32_t* calibration_values;
          switch(button) {
            case 0:
              newval = id(volume_down).get_value();
              calibration_values = id(touch_calibration_values_left);
              break;
            case 1:
              newval = id(action).get_value();
              calibration_values = id(touch_calibration_values_center);
              break;
            case 2:
              newval = id(volume_up).get_value();
              calibration_values = id(touch_calibration_values_right);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

          if(newval == 0) return;

          // ignore isolated outliers; recalibrate only after 10 consecutive anomalies
          if(qsizes[button] == 5) {
            float avg = float(sums[button])/float(qsizes[button]);
            if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
              consecutive_anomalies_per_button[button]++;
              if(consecutive_anomalies_per_button[button] < 10)
                return;
            }
          }
          consecutive_anomalies_per_button[button] = 0;

          // rolling window of 5 samples: drop the oldest, add the new one
          if(qsizes[button] == 5) {
            sums[button] -= (uint32_t) *(calibration_values + thresh_indices[button]);
            qsizes[button]--;
          }
          *(calibration_values + thresh_indices[button]) = newval;
          sums[button] += newval;
          qsizes[button]++;
          thresh_indices[button] = (thresh_indices[button] + 1) % 5;

          // set the new touch threshold just above the rolling average
          uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
          switch(button) {
            case 0:
              id(volume_down).set_threshold(newthresh);
              break;
            case 1:
              id(action).set_threshold(newthresh);
              break;
            case 2:
              id(volume_up).set_threshold(newthresh);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }
switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - script.execute: turn_on_wake_word
    on_turn_off:
      - script.execute: turn_off_wake_word
  - platform: gpio
    id: dac_mute
    restore_mode: ALWAYS_OFF
    pin:
      number: GPIO21
      inverted: True
# Older variant of the config: Arduino framework, media_player-based
# audio output, wake word detection running in Home Assistant.
substitutions:
  name: "onju-voice"
  friendly_name: "Onju Voice"
  wifi_ap_password: ""

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  min_version: 2023.11.6
  on_boot:
    then:
      - light.turn_on:
          id: top_led
          effect: slow_pulse
          red: 100%
          green: 60%
          blue: 0%
      - wait_until:
          condition:
            wifi.connected:
      - light.turn_on:
          id: top_led
          effect: pulse
          red: 0%
          green: 100%
          blue: 0%
      - wait_until:
          condition:
            api.connected:
      - light.turn_on:
          id: top_led
          effect: none
          red: 0%
          green: 100%
          blue: 0%
      - delay: 1s
      - script.execute: reset_led
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

logger:

api:
  services:
    - service: start_va
      then:
        - voice_assistant.start
    - service: stop_va
      then:
        - voice_assistant.stop

ota:

wifi:
  ap:
    password: "${wifi_ap_password}"

captive_portal:
globals:
  - id: thresh_percent
    type: float
    initial_value: "0.03"
    restore_value: false
  - id: touch_calibration_values_left
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_center
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_right
    type: uint32_t[5]
    restore_value: false

interval:
  - interval: 1s
    then:
      - script.execute:
          id: calibrate_touch
          button: 0
      - script.execute:
          id: calibrate_touch
          button: 1
      - script.execute:
          id: calibrate_touch
          button: 2
i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18

media_player:
  - platform: i2s_audio
    name: None
    id: onju_out
    dac_type: external
    i2s_dout_pin: GPIO12
    mode: mono
    mute_pin:
      number: GPIO21
      inverted: True

######
# speaker:
#   - platform: i2s_audio
#     id: onju_out
#     dac_type: external
#     i2s_dout_pin: GPIO12
#     mode: stereo
######

microphone:
  - platform: i2s_audio
    id: onju_microphone
    i2s_din_pin: GPIO17
    adc_type: external
    pdm: false
voice_assistant:
  id: va
  microphone: onju_microphone
  media_player: onju_out
  ######
  # speaker: onju_out
  ######
  use_wake_word: true
  on_listening:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 100%
        green: 100%
        brightness: 100%
        effect: listening
  on_stt_vad_end:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 0%
        green: 20%
        brightness: 70%
        effect: processing
  on_tts_end:
    - media_player.play_media: !lambda return x;
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 20%
        green: 100%
        effect: speaking
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          media_player.is_playing: onju_out
    - script.execute: reset_led
  on_client_connected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.start_continuous:
  on_client_disconnected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.stop:
  on_error:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 100%
        green: 0%
        effect: none
    - delay: 1s
    - script.execute: reset_led
number:
  - platform: template
    name: "Touch threshold percentage"
    id: touch_threshold_percentage
    update_interval: never
    entity_category: config
    initial_value: 1.25
    min_value: -1
    max_value: 5
    step: 0.25
    optimistic: true
    on_value:
      then:
        - lambda: !lambda |-
            id(thresh_percent) = 0.01 * x;

esp32_touch:
  setup_mode: false
  sleep_duration: 2ms
  measurement_duration: 800us
  low_voltage_reference: 0.8V
  high_voltage_reference: 2.4V
  filter_mode: IIR_16
  debounce_count: 2
  noise_threshold: 0
  jitter_step: 0
  smooth_mode: IIR_2
  denoise_grade: BIT8
  denoise_cap_level: L0
binary_sensor:
  - platform: esp32_touch
    id: volume_down
    pin: GPIO4
    threshold: 539000 # 533156-551132
    on_press:
      then:
        - light.turn_on: left_led
        - script.execute:
            id: set_volume
            volume: -0.05
        - delay: 0.75s
        - while:
            condition:
              binary_sensor.is_on: volume_down
            then:
              - script.execute:
                  id: set_volume
                  volume: -0.05
              - delay: 150ms
    on_release:
      then:
        - light.turn_off: left_led
  - platform: esp32_touch
    id: volume_up
    pin: GPIO2
    threshold: 580000 # 575735-593064
    on_press:
      then:
        - light.turn_on: right_led
        - script.execute:
            id: set_volume
            volume: 0.05
        - delay: 0.75s
        - while:
            condition:
              binary_sensor.is_on: volume_up
            then:
              - script.execute:
                  id: set_volume
                  volume: 0.05
              - delay: 150ms
    on_release:
      then:
        - light.turn_off: right_led
  - platform: esp32_touch
    id: action
    pin: GPIO3
    threshold: 751000 # 745618-767100
    on_click:
      - if:
          condition:
            or:
              - switch.is_off: use_wake_word
              - binary_sensor.is_on: mute_switch
          then:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant is running: %s"
                args: ['id(va).is_running() ? "yes" : "no"']
            - if:
                condition: media_player.is_playing
                then:
                  - media_player.stop
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                else:
                  - voice_assistant.start:
          else:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant was running with wake word detection enabled. Starting continuously"
            - if:
                condition: media_player.is_playing
                then:
                  - media_player.stop
            - voice_assistant.stop
            - delay: 1s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:
  - platform: gpio
    id: mute_switch
    pin:
      number: GPIO38
      mode: INPUT_PULLUP
    name: Disable wake word
    on_press:
      - script.execute: turn_off_wake_word
    on_release:
      - script.execute: turn_on_wake_word
light:
  - platform: esp32_rmt_led_strip
    id: leds
    pin: GPIO11
    chipset: SK6812
    num_leds: 6
    rgb_order: grb
    rmt_channel: 0
    default_transition_length: 0s
    gamma_correct: 2.8
  - platform: partition
    id: left_led
    segments:
      - id: leds
        from: 0
        to: 0
    default_transition_length: 100ms
  - platform: partition
    id: top_led
    segments:
      - id: leds
        from: 1
        to: 4
    default_transition_length: 100ms
    effects:
      - pulse:
          name: pulse
          transition_length: 250ms
          update_interval: 250ms
      - pulse:
          name: slow_pulse
          transition_length: 1s
          update_interval: 2s
      - addressable_lambda:
          name: show_volume
          update_interval: 50ms
          lambda: |-
            int int_volume = int(id(onju_out).volume * 100.0f * it.size());
            int full_leds = int_volume / 100;
            int last_brightness = int_volume % 100;
            int i = 0;
            for(; i < full_leds; i++) {
              it[i] = Color::WHITE;
            }
            if(i < 4) {
              it[i++] = Color(0,0,0).fade_to_white(last_brightness*256/100);
            }
            for(; i < it.size(); i++) {
              it[i] = Color::BLACK;
            }
      - addressable_twinkle:
          name: listening_ww
          twinkle_probability: 1%
      - addressable_twinkle:
          name: listening
          twinkle_probability: 45%
      - addressable_scan:
          name: processing
          move_interval: 80ms
      - addressable_flicker:
          name: speaking
          intensity: 35%
  - platform: partition
    id: right_led
    segments:
      - id: leds
        from: 5
        to: 5
    default_transition_length: 100ms
script:
  - id: reset_led
    then:
      - if:
          condition:
            and:
              - switch.is_on: use_wake_word
              - binary_sensor.is_off: mute_switch
          then:
            - light.turn_on:
                id: top_led
                blue: 100%
                red: 100%
                green: 0%
                brightness: 60%
                effect: listening_ww
          else:
            - light.turn_off: top_led
  - id: set_volume
    mode: restart
    parameters:
      volume: float
    then:
      - media_player.volume_set:
          id: onju_out
          volume: !lambda return clamp(id(onju_out).volume+volume, 0.0f, 1.0f);
      - light.turn_on:
          id: top_led
          effect: show_volume
      - delay: 1s
      - script.execute: reset_led
  - id: turn_on_wake_word
    then:
      - if:
          condition:
            and:
              - binary_sensor.is_off: mute_switch
              - switch.is_on: use_wake_word
          then:
            - lambda: id(va).set_use_wake_word(true);
            - if:
                condition:
                  media_player.is_playing:
                then:
                  - media_player.stop:
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - voice_assistant.start_continuous
            - script.execute: reset_led
          else:
            - logger.log:
                tag: "turn_on_wake_word"
                format: "Trying to start listening for wake word, but %s"
                args:
                  [
                    'id(mute_switch).state ? "mute switch is on" : "use wake word toggle is off"',
                  ]
                level: "INFO"
  - id: turn_off_wake_word
    then:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: reset_led
  - id: calibrate_touch
    parameters:
      button: int
    then:
      - lambda: |-
          static byte thresh_indices[3] = {0, 0, 0};
          static uint32_t sums[3] = {0, 0, 0};
          static byte qsizes[3] = {0, 0, 0};
          static int consecutive_anomalies_per_button[3] = {0, 0, 0};

          uint32_t newval;
          uint32_t* calibration_values;
          switch(button) {
            case 0:
              newval = id(volume_down).get_value();
              calibration_values = id(touch_calibration_values_left);
              break;
            case 1:
              newval = id(action).get_value();
              calibration_values = id(touch_calibration_values_center);
              break;
            case 2:
              newval = id(volume_up).get_value();
              calibration_values = id(touch_calibration_values_right);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

          if(newval == 0) return;

          // ignore isolated outliers; recalibrate only after 10 consecutive anomalies
          if(qsizes[button] == 5) {
            float avg = float(sums[button])/float(qsizes[button]);
            if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
              consecutive_anomalies_per_button[button]++;
              if(consecutive_anomalies_per_button[button] < 10)
                return;
            }
          }
          consecutive_anomalies_per_button[button] = 0;

          // rolling window of 5 samples: drop the oldest, add the new one
          if(qsizes[button] == 5) {
            sums[button] -= (uint32_t) *(calibration_values + thresh_indices[button]);
            qsizes[button]--;
          }
          *(calibration_values + thresh_indices[button]) = newval;
          sums[button] += newval;
          qsizes[button]++;
          thresh_indices[button] = (thresh_indices[button] + 1) % 5;

          // set the new touch threshold just above the rolling average
          uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
          switch(button) {
            case 0:
              id(volume_down).set_threshold(newthresh);
              break;
            case 1:
              id(action).set_threshold(newthresh);
              break;
            case 2:
              id(volume_up).set_threshold(newthresh);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }
switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - script.execute: turn_on_wake_word
    on_turn_off:
      - script.execute: turn_off_wake_word
@tetele

tetele commented Jan 16, 2024

Didn't know if you wanted me to load the log up with those so I kept it just the boot up part. Thank you for taking the time to look!

Apologies for the delay @dping28. To be able to help, I have to see the log including the moment when you tap the center pad and nothing happens.

@yahav-bot that's already built into the config. In HA you should be able to call 2 services: esphome.onju_voice_start_va and esphome.onju_voice_stop_va. The names of the services depend on the name of your device, so the onju_voice part could be different for you.

@dping28

dping28 commented Jan 16, 2024

@tetele Sorry about that, I should have figured I needed to try clicking something. Here you go. I just updated it to the recent firmware and the first time I pressed it, it actually heard the wake word, but I couldn't get it to do anything. After that the wake word wouldn't trigger it at all. And still no audio at all; the LEDs and the logs are my only way of knowing it's doing anything. Thanks again! I probably bit off more than I can chew with this project. :) I am no programmer, just wanted an Alexa replacement ;)

https://dpaste.org/a8rzs

@yahav-bot

thank you very much you are great

@tetele

tetele commented Jan 19, 2024

@dping28 I've tried numerous times to replicate your issue and I can't. Are you sure your pipeline is running smoothly and that the satellite has proper network access to the HA instance? You need to have all UDP ports open on the HA host. How are you running HA? Consider joining the #voice-assistants channel on the HA Discord server for help.

@dping28

dping28 commented Jan 19, 2024

@dping28 I've tried numerous times to replicate your issue and I can't. Are you sure your pipeline is running smoothly and that the satellite has proper network access to the HA instance? You need to have all UDP ports open on the HA host. How are you running HA? Consider joining the #voice-assistants channel on the HA Discord server for help.

I have HA running in a VM on my Unraid server. I have a couple of Atoms that work with my pipeline, though their mics/speakers are really bad. I'll try there, thanks for checking!

@saya6k

saya6k commented Jan 23, 2024

How can I make it work as a Bluetooth proxy?
Just add this?

bluetooth_proxy:
  enable: true

@tetele

tetele commented Jan 23, 2024

@saya6k you can't have both VA and proxy running on an ESP32. Check out the big, red, warning label here https://www.esphome.io/components/voice_assistant

@BenDavidson90

Hi! I got my Onju Voice PCBs yesterday and transplanted one into a Nest Mini.
It works, but not reliably...

In fact, I got a lot of Error: no_wake_word - No wake word detected.
When I plug it in, sometimes it works, sometimes it loops into this error: https://dpaste.org/L081b For this log, I just plugged in the Nest Mini without touching it.
But most of the time it does not fail at start: https://dpaste.org/voz8v

When it works, it is not so reliable: sometimes I can ask it several things, sometimes just one; in the end it always ends up displaying this same error...
Here you can see at 20:16:33 that I ask something; it successfully responded at 20:16:44, but at 20:17:33 the error appeared without me doing anything: https://dpaste.org/Z4UCo
Most of the time when this error comes, I have to touch the top of the Nest Mini to get it to work for one or a few questions, but sometimes I have to completely reboot Home Assistant...
I also have the impression that the error occurs even more often when I use a custom wake word.

In addition to this error, I often see this one:

[20:18:42][W][component:214]: Component i2s_audio.media_player took a long time for an operation (0.52 s).
[20:18:42][W][component:215]: Components should block for at most 20-30ms.
[20:18:47][W][component:214]: Component i2s_audio.media_player took a long time for an operation (0.37 s).
[20:18:47][W][component:215]: Components should block for at most 20-30ms.
[20:18:47][W][component:214]: Component i2s_audio.media_player took a long time for an operation (0.37 s).
[20:18:47][W][component:215]: Components should block for at most 20-30ms.

Furthermore, I find it very slow... Once the question is asked, the LEDs flash blue for almost 10 seconds, plus the processing time and the response time. To speed things up I configured faster-whisper and Piper TTS locally, but it still takes a long time.
I don't think the computer is the problem: it's an i7-1165G7 with 32 GiB of DDR4, and the HA VM has direct access to the CPU and the maximum RAM allocated.
Network is managed by Unifi Hardware with a WiFi AP at two meters from the Nest mini. HA machine is plugged with CAT7 Ethernet cable to the Unifi router.

Also, is it normal that once it has answered, its LEDs turn purple for a second, then blue for a few seconds, then purple again before we can ask it something? It prevents us from asking a new question quickly after the first one.

Sorry for this very long message and thanks for your help!

@GovernmentHack

Heyya! First off, wanted to say thanks for the guides and all the work on this!

Made three of them, and overall they work. However, I am struggling to get the wake word to work consistently on all of them, and I wasn't sure if there was some sort of tweaked configuration that works well.

In your experience, are there any particular wake word, threshold and trigger settings that work best within openWakeWord to allow for consistent wake word activations? Especially when trying to use a wake word from 5+ feet away from the device? Are there any other settings I can adjust on the Onju devices that may help?

Thanks in advance.

@yahav-bot

Hi, thanks for everything, it's just amazing. I wanted to ask if there is a way to play some sound or word after the wake word, and not just lights. It is for use by a blind person.
I tried to create an automation, but it has a lot of errors.

@tetele

tetele commented Jan 25, 2024

@BenDavidson90 those are a lot of issues you listed and most have nothing to do with this config

When I plug it, sometimes it works, sometimes it loops into this error: https://dpaste.org/L081b

That's not an error, that's how wake word detection is implemented in ESPHome: every 5 seconds the recorded audio is sent to the wake word engine for analysis. If the wake word is not detected, another 5 s batch starts. Does it work any differently with other satellites you may have?

but sometimes I have to completly reboot Home Assistant...

Have you tried just saying the wake word again? I really don't see any error there.

In addition to this error, I often see this one:

I'm not 100% sure about it, but that might have to do with a slow network connection. The antenna in the Onju Voice board is not the best. Can you try with the satellite closer to your AP?

Once the question is asked, the LEDs flash blue for almost 10 seconds + the processing time + the response time. To speed things up I configured faster-whisper + piper TTS locally but it still takes a long time.

What were you using before Whisper+Piper? The slowness is not due to board or config, but either to your STT/TTS engines and/or network speeds.

I think the computer is not a problem, it's an i7-1165G7 with 32 GiB DDR4.

Without a GPU, it can certainly be a problem, depending on your selected Whisper model and beam size.

Also, is it normal that once answered, its LEDs turn purple for a second, then blue for a few seconds and then turn purple again and then we can ask it something? It prevents us to ask a new question rapidly after the first one.

Debug logs would help, but again I think it has to do with network slowness. Also, I will assume you've flashed the latest version of this config (i.e. the one above) and not an older version or Mark Watt's version (which is basically my older, buggy version, but without any copyright notice).

@GovernmentHack

In your experience, is there any particular wake word, threshold, and trigger settings that work best within openwakeword to allow for consistent wake word activations? Escpecially when trying to use a wake word from 5+ feet away from the device? Is there any other configurations I can adjust within the Onju devices that may help?

I've rarely used the wake word, but I've never had issues, even from across the room. The mics are pretty good and very well positioned acoustically.

If you want, you can tweak the 3 config parameters: noise_suppression_level, auto_gain and volume_multiplier. But I can't tell you how, since I haven't had issues. You can turn on audio debugging if you want to see what effect they have.
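For reference, those three parameters are set on the `voice_assistant:` component in the ESPHome config. A sketch with illustrative starting values (assumptions, not tested recommendations):

```yaml
voice_assistant:
  # ... existing options from the config above ...
  noise_suppression_level: 2  # 0 (off) to 4 (most aggressive)
  auto_gain: 31dBFS           # maximum gain applied to the mic signal
  volume_multiplier: 2.0      # software amplification of the mic input
```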

@yahav-bot

I wanted to ask if there is a way to make some sound or word after the wake word and not just lights.

Unfortunately ESPHome does not support that, but I'll include it as soon as they add it.

@bluenazgul

Hello, first i like to say thanks for your work on this code

Is there a way to set the default volume to 50%? On restarts, like a power cycle or an update, the volume always goes back to 100%.

@tetele
Author

tetele commented Jan 26, 2024

@timothybuchanan

There are three touch sensors in the yaml, but only the volume controls respond in my devices. Where is the action touch sensor located?

@bobzer

bobzer commented Jan 28, 2024

@timothybuchanan when you look at the side that has the 4 LEDs, you can see 3 silver circles - those are the touch sensors. The middle one is the action button; when you touch it you can see it in the log.

@timothybuchanan

I don't see any silver circles. I see the four LEDs. With the cord at the bottom, when I touch the left side it will lower the volume, and when I touch the right side it will raise it. If I touch the top of the device (opposite of cord) I get these entries in the log:

[15:30:54][D][binary_sensor:036]: 'action': Sending state ON
[15:31:02][D][voice_assistant:519]: Event Type: 0
[15:31:02][D][voice_assistant:519]: Event Type: 2
[15:31:02][D][voice_assistant:609]: Assist Pipeline ended
[15:31:02][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to IDLE
[15:31:02][D][voice_assistant:418]: Desired state set to IDLE
[15:31:02][D][voice_assistant:412]: State changed from IDLE to START_PIPELINE
[15:31:02][D][voice_assistant:418]: Desired state set to START_MICROPHONE
[15:31:02][D][voice_assistant:200]: Requesting start...
[15:31:02][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[15:31:02][D][voice_assistant:433]: Client started, streaming microphone
[15:31:02][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[15:31:02][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[15:31:02][D][voice_assistant:519]: Event Type: 1
[15:31:02][D][voice_assistant:522]: Assist Pipeline running
[15:31:02][D][light:036]: 'top_led' Setting:
[15:31:02][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[15:31:02][D][light:085]: Transition length: 1.0s
[15:31:02][D][voice_assistant:519]: Event Type: 9
[15:31:02][D][light:036]: 'top_led' Setting:
[15:31:02][D][light:051]: Brightness: 100%
[15:31:02][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[15:31:02][D][light:085]: Transition length: 1.0s
[15:31:07][D][voice_assistant:519]: Event Type: 0
[15:31:07][D][voice_assistant:519]: Event Type: 2
[15:31:07][D][voice_assistant:609]: Assist Pipeline ended
[15:31:07][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to IDLE
[15:31:07][D][voice_assistant:418]: Desired state set to IDLE
[15:31:07][D][voice_assistant:412]: State changed from IDLE to START_PIPELINE
[15:31:07][D][voice_assistant:418]: Desired state set to START_MICROPHONE
[15:31:07][D][voice_assistant:200]: Requesting start...
[15:31:07][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[15:31:07][D][voice_assistant:433]: Client started, streaming microphone
[15:31:07][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[15:31:07][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[15:31:07][D][voice_assistant:519]: Event Type: 1
[15:31:07][D][voice_assistant:522]: Assist Pipeline running
[15:31:07][D][light:036]: 'top_led' Setting:
[15:31:07][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[15:31:07][D][light:085]: Transition length: 1.0s
[15:31:07][D][voice_assistant:519]: Event Type: 9
[15:31:07][D][light:036]: 'top_led' Setting:
[15:31:07][D][light:051]: Brightness: 100%
[15:31:07][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[15:31:07][D][light:085]: Transition length: 1.0s
[15:31:12][D][voice_assistant:519]: Event Type: 0
[15:31:12][D][voice_assistant:519]: Event Type: 2
[15:31:12][D][voice_assistant:609]: Assist Pipeline ended
[15:31:12][D][voice_assistant:412]: State changed from STREAMING_MICROPHONE to IDLE
[15:31:12][D][voice_assistant:418]: Desired state set to IDLE
[15:31:12][D][voice_assistant:412]: State changed from IDLE to START_PIPELINE
[15:31:12][D][voice_assistant:418]: Desired state set to START_MICROPHONE
[15:31:12][D][voice_assistant:200]: Requesting start...
[15:31:12][D][voice_assistant:412]: State changed from START_PIPELINE to STARTING_PIPELINE
[15:31:12][D][voice_assistant:433]: Client started, streaming microphone
[15:31:12][D][voice_assistant:412]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[15:31:12][D][voice_assistant:418]: Desired state set to STREAMING_MICROPHONE
[15:31:12][D][voice_assistant:519]: Event Type: 1
[15:31:12][D][voice_assistant:522]: Assist Pipeline running
[15:31:12][D][light:036]: 'top_led' Setting:
[15:31:12][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[15:31:12][D][light:085]: Transition length: 1.0s
[15:31:12][D][voice_assistant:519]: Event Type: 9
[15:31:12][D][light:036]: 'top_led' Setting:
[15:31:12][D][light:051]: Brightness: 100%
[15:31:12][D][light:059]: Red: 0%, Green: 0%, Blue: 100%
[15:31:12][D][light:085]: Transition length: 1.0s
[15:31:13][D][binary_sensor:036]: 'action': Sending state OFF

but nothing else happens, and I get no response if I speak while touching the device. What is supposed to happen when a touch triggers the 'action' sensor?

@tetele

tetele commented Jan 29, 2024

@timothybuchanan unfortunately, ESPHome updates have made it impossible to manually start the pipeline and respond to an action tap when the wake word listening is on. If I'm reading your logs right, you could turn off the wake word (either using the software switch or by using the physical side toggle) and tap the center again. That should start your pipeline manually.

@timothybuchanan

The log shows these threshold settings:

[16:29:46][C][esp32_touch:260]: Touch Pad 'volume_down'
[16:29:46][C][esp32_touch:261]: Pad: T4
[16:29:46][C][esp32_touch:262]: Threshold: 392284
[16:29:46][C][esp32_touch:260]: Touch Pad 'volume_up'
[16:29:46][C][esp32_touch:261]: Pad: T2
[16:29:46][C][esp32_touch:262]: Threshold: 461169
[16:29:46][C][esp32_touch:260]: Touch Pad 'action'
[16:29:46][C][esp32_touch:261]: Pad: T3
[16:29:46][C][esp32_touch:262]: Threshold: 592252

but my yaml file has these:

  - platform: esp32_touch
    id: volume_down
    pin: GPIO4
    threshold: 539000 # 533156-551132

  - platform: esp32_touch
    id: volume_up
    pin: GPIO2
    threshold: 580000 # 575735-593064

  - platform: esp32_touch
    id: action
    pin: GPIO3
    threshold: 751000 # 745618-767100

Why the differences?

@vhsdream

vhsdream commented Feb 10, 2024

and if I use the media player component I have a lot of gaps and dropouts

I'm trying to leverage the entire onboard PSRAM, which should improve these issues significantly.

I've still been having the same occasional cracks, pops and audio dropouts when playing ambient sleep sounds (I don't think the audio quality is there to make it a music player yet) and even sometimes with TTS responses to commands.

So I've begun to make a pathetic attempt to get PSRAM working on the Onju voice, but I am not sure if I have completely succeeded. The config flashes successfully, and the debug logs seem to indicate success:
[screenshot: debug log showing "PSRAM Available: YES"]

I then did some testing with one of my noise files and it seems to have cleared up the issue with dropouts and cracks and pops. At least at first. If I reopen the log while one of these files is playing, or stop and play another audio file, that PSRAM Available: YES no longer appears and the gaps and dropouts return. So I guess I have my answer as to whether PSRAM is working or not.

I used the following two 'resources' for trying to set this up:

Github issue

ESPHome How-to in the HA community forum

I think I will enable the Debug component to see if I can figure out what might be going wrong, but I am hoping some others with more advanced knowledge can pick up where I left off.

Editing to add that it now seems stable after a few more changes. I'll include the relevant snippets of the config below.

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  min_version: 2023.11.6
  platformio_options:
    board_build.arduino.memory_type: qio_opi
esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  flash_size: 16MB
  partitions: /config/esphome/custom_16MB.csv
  framework:
    type: arduino

psram:
  mode: octal
  speed: 80MHz

Regarding the 'partitions' key above, I put the file that is here into that path in my HA. I'm not sure if it is necessary (perhaps the 'flash_size' key makes esphome use the correct partition) but it also is not throwing any errors.

When I run this with debug enabled I do see that a very tiny bit of PSRAM is being used when I stream an audio file, as well as when I use the voice assistant. It would be nice if it was fully utilized but right now seems to work ok.

@tetele

tetele commented Feb 12, 2024

Why the differences?

@timothybuchanan those values in the config were experimental, based on my own measurements on my own PCB in my own environment. While for most people they seem close enough, they are by no means guaranteed to make the touch controls usable on your device.

As a solution, I've implemented self-calibration, where every now and then, the values of the touch sensors are read and averaged to provide a continuous calibration of what the "untouched" values should be. This not only accounts for your own environment, but also if you move your Onju closer to or further from a static charge which would alter the values in time. You can safely ignore the values from the config.
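The self-calibration idea can be sketched in a few lines (illustrative Python, not the actual ESPHome lambda; the class name, the smoothing factor and the 1.05 touch margin are invented for the example):

```python
class TouchCalibrator:
    """Continuously estimate the 'untouched' baseline of a touch pad.

    Raw readings taken while the pad is not pressed are folded into an
    exponential moving average; a touch is detected when a reading rises
    some margin above that baseline. All names and constants here are
    illustrative, not taken from the actual config.
    """

    def __init__(self, initial_baseline, alpha=0.1, margin=1.05):
        self.baseline = float(initial_baseline)
        self.alpha = alpha      # smoothing factor for the moving average
        self.margin = margin    # a reading must exceed baseline * margin

    def update(self, raw):
        touched = raw > self.baseline * self.margin
        if not touched:
            # only untouched readings feed the baseline, so slow drift
            # (humidity, nearby static charge) is tracked automatically
            self.baseline = (1 - self.alpha) * self.baseline + self.alpha * raw
        return touched
```

Because only untouched readings feed the average, the baseline slowly tracks environmental drift, while a genuine touch, which jumps well above it, leaves the baseline alone.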

@vhsdream very interesting, thanks for this! I've also tried to get PSRAM working, but at the time I hadn't found enough resources and kinda postponed the idea. I'll try to pick it up again.

What I can tell you about your config above is that the Onju Voice PCB uses an ESP32-S3R8 (see page 10 here) which has no internal flash (it does have a 16MB external QSPI NOR flash module which I haven't used in the config - a W25Q128JVSIQ - which I plan to use to store audio notifications/"dings" for wake word detection once that becomes available in the ESPHome implementation). As such, I don't really understand how your config works with the defined flash_size and partitions.

That said, the part regarding PSRAM is exactly what I've tried myself and it never detected even one byte of PSRAM. You've given me some ideas I will try to investigate further when I get some time to work on this. Thank you!

@vhsdream

@vhsdream very interesting, thanks for this! I've also tried to get PSRAM working, but at the time I hadn't found enough resources and kinda postponed the idea. I'll try to pick it up again.

What I can tell you about your config above is that the Onju Voice PCB uses an ESP32-S3R8 (see page 10 here) which has no internal flash (it does have a 16MB external QSPI NOR flash module which I haven't used in the config - a W25Q128JVSIQ - which I plan to use to store audio notifications/"dings" for wake word detection once that becomes available in the ESPHome implementation). As such, I don't really understand how your config works with the defined flash_size and partitions.

That said, the part regarding PSRAM is exactly what I've tried myself and it never detected even one byte of PSRAM. You've given me some ideas I will try to investigate further when I get some time to work on this. Thank you!

Oh you have already demonstrated your better understanding of what is going on compared to what I think I know! I am mostly clueless about the differences between internal and external flash (and I'm thinking now that the part where I added flash_size and partitions is likely doing nothing and can be removed!) and I don't know if esphome is able to deal with that or not.

And my tests with the PSRAM might be incomplete as I had one issue where I was falling asleep to clear white noise audio when it abruptly cut out and did not return. I checked the HA logs the next morning and it looks like the onju might have crashed and restarted, so it's not as stable as I first thought. I'm glad I've given you a bit of inspiration though - good luck!

@vhsdream

With ESPHome 2024.2.0 out, which now has microWakeWord, I'm going to see if I can create a config that leverages the PSRAM on the onju to have onboard wake word detection.

@tetele

tetele commented Feb 21, 2024

Go ahead, but microWakeWord will bring some limitations along with it.

Since it only works with the esp-idf framework, which is incompatible with the media_player component, switching to speaker will take away the ability to control volume and stream any audio to the satellite.

@vhsdream

Oh darn, I must have missed that! Not going to bother then. Where can I read up more about that - I didn't see it on the component page about it only working with esp-idf. Do you think it can be made to work with the Arduino framework eventually?

@fuzzie360

fuzzie360 commented Feb 22, 2024

I've taken a look at the microWakeWord component code and it seems that on the Arduino platform this header is missing, while it is provided by the esp-idf platform.

I've seen this stackoverflow saying that Arduino platform could be possible if we add the following line after adding the TensorFlowLite ESP32 library as a dependency:

#include <TensorFlowLite_ESP32.h>

I haven't checked for any other blockers, but I think it is likely we can make microWakeWord work for Arduino platform.

@fuzzie360

I have tried the TensorFlowLite ESP32 library as mentioned above, but the library version is too old compared to the idf version.

Compilation error messages
src/esphome/components/micro_wake_word/micro_wake_word.cpp: In member function 'bool esphome::micro_wake_word::MicroWakeWord::initialize_models()':
src/esphome/components/micro_wake_word/micro_wake_word.cpp:234:101: error: no matching function for call to 'tflite::MicroAllocator::Create(uint8_t*&, const uint32_t&)'
       tflite::MicroAllocator::Create(this->streaming_var_arena_, STREAMING_MODEL_VARIABLE_ARENA_SIZE);
                                                                                                     ^
In file included from .piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:26,
                 from src/esphome/components/micro_wake_word/micro_wake_word.h:19,
                 from src/esphome/components/micro_wake_word/micro_wake_word.cpp:1:
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:121:26: note: candidate: 'static tflite::MicroAllocator* tflite::MicroAllocator::Create(uint8_t*, size_t, tflite::ErrorReporter*)'
   static MicroAllocator* Create(uint8_t* tensor_arena, size_t arena_size,
                          ^~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:121:26: note:   candidate expects 3 arguments, 2 provided
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:128:26: note: candidate: 'static tflite::MicroAllocator* tflite::MicroAllocator::Create(uint8_t*, size_t, tflite::MicroMemoryPlanner*, tflite::ErrorReporter*)'
   static MicroAllocator* Create(uint8_t* tensor_arena, size_t arena_size,
                          ^~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:128:26: note:   candidate expects 4 arguments, 2 provided
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:135:26: note: candidate: 'static tflite::MicroAllocator* tflite::MicroAllocator::Create(tflite::SimpleMemoryAllocator*, tflite::MicroMemoryPlanner*, tflite::ErrorReporter*)'
   static MicroAllocator* Create(SimpleMemoryAllocator* memory_allocator,
                          ^~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_allocator.h:135:26: note:   candidate expects 3 arguments, 2 provided
src/esphome/components/micro_wake_word/micro_wake_word.cpp:238:117: error: no matching function for call to 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*&, tflite::MicroMutableOpResolver<18>&, uint8_t*&, const uint32_t&)'
       this->preprocessor_model_, preprocessor_op_resolver, this->preprocessor_tensor_arena_, PREPROCESSOR_ARENA_SIZE);
                                                                                                                     ^
In file included from src/esphome/components/micro_wake_word/micro_wake_word.h:19,
                 from src/esphome/components/micro_wake_word/micro_wake_word.cpp:1:
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:61:3: note: candidate: 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*, const tflite::MicroOpResolver&, tflite::MicroAllocator*, tflite::ErrorReporter*, tflite::MicroResourceVariables*, tflite::MicroProfiler*)'
   MicroInterpreter(const Model* model, const MicroOpResolver& op_resolver,
   ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:61:3: note:   no known conversion for argument 3 from 'uint8_t*' {aka 'unsigned char*'} to 'tflite::MicroAllocator*'
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:50:3: note: candidate: 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*, const tflite::MicroOpResolver&, uint8_t*, size_t, tflite::ErrorReporter*, tflite::MicroResourceVariables*, tflite::MicroProfiler*)'
   MicroInterpreter(const Model* model, const MicroOpResolver& op_resolver,
   ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:50:3: note:   candidate expects 7 arguments, 4 provided
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:40:7: note: candidate: 'constexpr tflite::MicroInterpreter::MicroInterpreter(const tflite::MicroInterpreter&)'
 class MicroInterpreter {
       ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:40:7: note:   candidate expects 1 argument, 4 provided
src/esphome/components/micro_wake_word/micro_wake_word.cpp:242:102: error: no matching function for call to 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*&, tflite::MicroMutableOpResolver<14>&, uint8_t*&, const uint32_t&, tflite::MicroResourceVariables*&)'
                                                                STREAMING_MODEL_ARENA_SIZE, this->mrv_);
                                                                                                      ^
In file included from src/esphome/components/micro_wake_word/micro_wake_word.h:19,
                 from src/esphome/components/micro_wake_word/micro_wake_word.cpp:1:
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:61:3: note: candidate: 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*, const tflite::MicroOpResolver&, tflite::MicroAllocator*, tflite::ErrorReporter*, tflite::MicroResourceVariables*, tflite::MicroProfiler*)'
   MicroInterpreter(const Model* model, const MicroOpResolver& op_resolver,
   ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:61:3: note:   no known conversion for argument 3 from 'uint8_t*' {aka 'unsigned char*'} to 'tflite::MicroAllocator*'
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:50:3: note: candidate: 'tflite::MicroInterpreter::MicroInterpreter(const tflite::Model*, const tflite::MicroOpResolver&, uint8_t*, size_t, tflite::ErrorReporter*, tflite::MicroResourceVariables*, tflite::MicroProfiler*)'
   MicroInterpreter(const Model* model, const MicroOpResolver& op_resolver,
   ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:50:3: note:   no known conversion for argument 5 from 'tflite::MicroResourceVariables*' to 'tflite::ErrorReporter*'
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:40:7: note: candidate: 'constexpr tflite::MicroInterpreter::MicroInterpreter(const tflite::MicroInterpreter&)'
 class MicroInterpreter {
       ^~~~~~~~~~~~~~~~
.piolibdeps/esphome-web-a294cc/TensorFlowLite_ESP32/src/tensorflow/lite/micro/micro_interpreter.h:40:7: note:   candidate expects 1 argument, 5 provided
src/esphome/components/micro_wake_word/micro_wake_word.cpp: In member function 'bool esphome::micro_wake_word::MicroWakeWord::register_preprocessor_ops_(tflite::MicroMutableOpResolver<18>&)':
src/esphome/components/micro_wake_word/micro_wake_word.cpp:453:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddDiv'; did you mean 'AddSin'?
   if (op_resolver.AddDiv() != kTfLiteOk)
                   ^~~~~~
                   AddSin
src/esphome/components/micro_wake_word/micro_wake_word.cpp:459:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddWindow'; did you mean 'AddSin'?
   if (op_resolver.AddWindow() != kTfLiteOk)
                   ^~~~~~~~~
                   AddSin
src/esphome/components/micro_wake_word/micro_wake_word.cpp:461:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFftAutoScale'
   if (op_resolver.AddFftAutoScale() != kTfLiteOk)
                   ^~~~~~~~~~~~~~~
src/esphome/components/micro_wake_word/micro_wake_word.cpp:463:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddRfft'; did you mean 'AddCast'?
   if (op_resolver.AddRfft() != kTfLiteOk)
                   ^~~~~~~
                   AddCast
src/esphome/components/micro_wake_word/micro_wake_word.cpp:465:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddEnergy'; did you mean 'AddNeg'?
   if (op_resolver.AddEnergy() != kTfLiteOk)
                   ^~~~~~~~~
                   AddNeg
src/esphome/components/micro_wake_word/micro_wake_word.cpp:467:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFilterBank'
   if (op_resolver.AddFilterBank() != kTfLiteOk)
                   ^~~~~~~~~~~~~
src/esphome/components/micro_wake_word/micro_wake_word.cpp:469:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFilterBankSquareRoot'
   if (op_resolver.AddFilterBankSquareRoot() != kTfLiteOk)
                   ^~~~~~~~~~~~~~~~~~~~~~~
src/esphome/components/micro_wake_word/micro_wake_word.cpp:471:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFilterBankSpectralSubtraction'
   if (op_resolver.AddFilterBankSpectralSubtraction() != kTfLiteOk)
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/esphome/components/micro_wake_word/micro_wake_word.cpp:473:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddPCAN'; did you mean 'AddAddN'?
   if (op_resolver.AddPCAN() != kTfLiteOk)
                   ^~~~~~~
                   AddAddN
src/esphome/components/micro_wake_word/micro_wake_word.cpp:475:19: error: 'class tflite::MicroMutableOpResolver<18>' has no member named 'AddFilterBankLog'
   if (op_resolver.AddFilterBankLog() != kTfLiteOk)
                   ^~~~~~~~~~~~~~~~
*** [.pioenvs/esphome-web-a294cc/src/esphome/components/micro_wake_word/micro_wake_word.cpp.o] Error 1

It looks like making it work with Arduino framework is not likely unless somebody updates the TensorFlowLite ESP32 library.

@yahav-bot

Hi guys, I'm trying to install the Onju Voice microWakeWord config from the HA ESPHome add-on, but it gets stuck on "Preparing installation" and CPU usage sits at 100 percent. When I install Onju Voice without microWakeWord, everything works fine. Can someone please point me in the right direction?

@tetele

tetele commented Mar 8, 2024

@yahav-bot the build takes much longer, especially on an underpowered machine. I've heard reports of 1h30' on a HA Green. I have a pretty big i5 with 32GB RAM and it takes ~5-7 minutes, compared to ~1 minute for the config without MWW.

@yahav-bot

yahav-bot commented Mar 11, 2024

Won't it damage the machine if the processor stays at 100 percent for such a long time?

@bobzer

bobzer commented Apr 30, 2024

@yahav-bot It should not damage the CPU; just in case, you can watch the CPU temperature or the overall temperature of the box. By the way, those kinds of questions are not specific to the Onju, so you can easily find good answers on the forum or any other Home Assistant support space.
