esphome:
  name: esp32-mic-speaker
  friendly_name: esp32-mic-speaker
  on_boot:
    - priority: -100
      then:
        - wait_until: api.connected
        - delay: 1s
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - voice_assistant.start_continuous:

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

# Enable logging
logger:

# Enable Home Assistant API
api:

ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32-Mic-Speaker"
    password: "9vYvAFzzPjuc"

i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO13
    pdm: false

speaker:
  - platform: i2s_audio
    id: big_speaker
    dac_type: external
    i2s_dout_pin: GPIO25
    mode: mono

voice_assistant:
  microphone: mic
  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  speaker: big_speaker
  id: assist

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);
Yes, now everything works via the github.com/esphome/voice-kit repository. Thanks for the explanation.
I have been trying to get a voice assistant setup on an esp32 for months now with the ability to also have the media player. Is this possible? I also would like to use the same pin for both speaker and media player
It is possible with the nabu media player...
It requires an ESP32S3
For my build I'm also using a MAX98357A with 2 Adafruit ICS43434. There should be enough GPIO to run everything on separate pins which is also necessary for proper operation.
Hello! Can I use this code with an INMP441 microphone?
Hi, the original code and the latest code published by Wetzel402 both work with the INMP441. For the code from @Wetzel402 I needed two microphones and had to modify it a bit (removing parts I don't need).
It works with 2 INMP441s.
I'm using 2 INMP441, a MAX98357, a rotary encoder and a 12-LED WS2812B ring to replicate the Voice PE.
I edited some code and used external components from https://github.com/esphome/home-assistant-voice-pe
for these components:
- media_player
- micro_wake_word
- microphone
- nabu
- voice_assistant
- nabu_microphone
I added some code from https://github.com/esphome/home-assistant-voice-pe/blob/dev/home-assistant-voice.yaml
The center button, multiple clicks, rotary encoder and LEDs are working with a few adjustments.
Hello!
Thanks for the answer. I have a similar setup, with one microphone for now (esphome/esphome#7672, with these external sources), but with a bigger speaker. I have two problems with it: 1. If the music is loud, it can't hear "OK, Nabu". 2. The music is much louder than the TTS. Is there any way to adjust the balance between them?
If you have a big speaker, you must put your mic a little bit farther from the speaker. When it detects the wake word, the media player enters ducking mode to lower the volume so you can talk to the assistant.
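The ducking values used in this thread are given in decibels (for example `decibel_reduction: ${duck}` with `duck: "30"`). As a rough illustration of what such a value means for loudness, here is a minimal C++ sketch that converts a dB reduction into a linear amplitude factor. It assumes the usual amplitude convention of 20 dB per factor of ten; `ducking_gain` is a hypothetical helper, not part of the nabu component.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not an ESPHome API): convert a ducking reduction in
// decibels into the linear factor applied to sample amplitudes.
// Assumes the amplitude convention: gain = 10^(-dB / 20).
double ducking_gain(double decibel_reduction) {
  return std::pow(10.0, -decibel_reduction / 20.0);
}
```

Under that assumption, a 30 dB reduction scales the music to roughly 3% of its original amplitude, and 0 dB leaves it unchanged.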
I don't think it would work adequately with a speaker of that size. There is no chip for voice/speech extraction in this project, so you would need to shout louder than a speaker with an amplifier. If I were you, I would use Bluetooth + amplifier + speaker in one part of the room and the voice assistant in another, without combining the two. Or move the speaker away from the assistant on longer wires.
Ok, thanks! I'm just testing it. I'll make a box for it. And the speaker will be a little further away.
My first version of a budget Home Assistant voice device is finished.
Thanks for the idea, discussion and sharing. I hope that all the external components will be available in the main ESPHome repo soon.
Components:
- esp32-s3-wroom
- 2x inmp441 mic
- max98357 amp
- 40 mm hifi speaker
- YD-040 rotary encoder (center button + dial)
- ws2812 12 bit ring LED
Maybe I will create a GitHub repo later, after finishing my 3D-printed case design.
Hello, I tried the code from Wetzel402. The only thing I modified is setting psram to octal, because I have an N16R8 module. I also use a MAX98357A and one INMP441. But the ESP cannot connect to Home Assistant. It gives the error "can't connect to esp. please make sure your yaml file contains an 'api:' line." although I have an api line with an encryption key. Any ideas what I'm doing wrong? Or does anyone have a good alternative YAML config that includes the media player component from nabu? Here is my code:
substitutions:
# Phases of the Voice Assistant
# The voice assistant is ready to be triggered by a wake word
voice_assist_idle_phase_id: '1'
# The voice assistant is waiting for a voice command (after being triggered by the wake word)
voice_assist_waiting_for_command_phase_id: '2'
# The voice assistant is listening for a voice command
voice_assist_listening_for_command_phase_id: '3'
# The voice assistant is currently processing the command
voice_assist_thinking_phase_id: '4'
# The voice assistant is replying to the command
voice_assist_replying_phase_id: '5'
# The voice assistant is not ready
voice_assist_not_ready_phase_id: '10'
# The voice assistant encountered an error
voice_assist_error_phase_id: '11'
device_name: "aaaaaaa"
friendly_name: "aaaaaaa"
device_description: "aaaaaaa"
din: "GPIO11" # MAX98357A - DIN
lrclk_mic: "GPIO15" # MAX98357A & ICS43434 - LRC/LRCL
lrclk_amp: "GPIO9" # MAX98357A & ICS43434 - LRC/LRCL
bclk_mic: "GPIO6" # MAX98357A & ICS43434 - BCLK
bclk_amp: "GPIO10" # MAX98357A & ICS43434 - BCLK
dout: "GPIO7" # ICS43434 - DOUT
# ICS43434 - SEL low = left, high = right
duck: "30" # db reduction for audio ducking
vol: "90%" # volume level at boot
pwr: "GPIO37" # charger output
tablet_battery: "sensor.kitchen_tablet_battery_level"
external_components:
- source:
type: git
url: https://github.com/esphome/voice-kit
ref: dev
components:
- media_player
- micro_wake_word
- microphone
- nabu
- voice_assistant
refresh: 0s
- source:
type: git
url: https://github.com/formatBCE/home-assistant-voice-pe
ref: 48kHz_mic_support
components:
- nabu_microphone
refresh: 0s
esphome:
name: ${device_name}
friendly_name: ${friendly_name}
name_add_mac_suffix: true
min_version: 2024.12.2
platformio_options:
board_build.flash_mode: dio
on_boot:
priority: 375
then:
- media_player.volume_set: ${vol}
- delay: 10min
- if:
condition:
lambda: return id(init_in_progress);
then:
- lambda: id(init_in_progress) = false;
esp32:
board: esp32-s3-devkitc-1
variant: esp32s3
flash_size: 16MB
framework:
type: esp-idf
version: recommended
sdkconfig_options:
CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB: "y"
CONFIG_ESP32_S3_BOX_BOARD: "y"
CONFIG_SPIRAM_ALLOW_STACK_EXTERNAL_MEMORY: "y"
CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP: "y"
# Settings based on https://github.com/espressif/esp-adf/issues/297#issuecomment-783811702
CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
CONFIG_ESP32_WIFI_STATIC_TX_BUFFER: "y"
CONFIG_ESP32_WIFI_TX_BUFFER_TYPE: "0"
CONFIG_ESP32_WIFI_STATIC_TX_BUFFER_NUM: "8"
CONFIG_ESP32_WIFI_CACHE_TX_BUFFER_NUM: "32"
CONFIG_ESP32_WIFI_AMPDU_TX_ENABLED: "y"
CONFIG_ESP32_WIFI_TX_BA_WIN: "16"
CONFIG_ESP32_WIFI_AMPDU_RX_ENABLED: "y"
CONFIG_ESP32_WIFI_RX_BA_WIN: "32"
CONFIG_LWIP_MAX_ACTIVE_TCP: "16"
CONFIG_LWIP_MAX_LISTENING_TCP: "16"
CONFIG_TCP_MAXRTX: "12"
CONFIG_TCP_SYNMAXRTX: "6"
CONFIG_TCP_MSS: "1436"
CONFIG_TCP_MSL: "60000"
CONFIG_TCP_SND_BUF_DEFAULT: "65535"
CONFIG_TCP_WND_DEFAULT: "65535" # Adjusted from linked settings to avoid compilation error
CONFIG_TCP_RECVMBOX_SIZE: "512"
CONFIG_TCP_QUEUE_OOSEQ: "y"
CONFIG_TCP_OVERSIZE_MSS: "y"
CONFIG_LWIP_WND_SCALE: "y"
CONFIG_TCP_RCV_SCALE: "3"
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE: "512"
CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST: "y"
CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY: "y"
psram:
mode: octal # quad for N8R2 and octal for N16R8
speed: 80MHz
globals:
# Global initialization variable. Initialized to true and set to false once everything is connected. Only used to have a smooth "plugging" experience
- id: init_in_progress
type: bool
restore_value: no
initial_value: 'true'
# Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
- id: voice_assistant_phase
type: int
restore_value: no
initial_value: ${voice_assist_not_ready_phase_id}
# Global variable storing the first active timer
- id: first_active_timer
type: voice_assistant::Timer
restore_value: false
# Global variable storing the timer finished TTS
- id: timer_tts_str
type: std::string
restore_value: false
logger:
captive_portal:
web_server:
api:
id: api_id
ota:
- platform: esphome
password: "36e011a812d79f51cb886ff8d171eda2"
wifi:
ssid: !secret wifi_ssid
password: !secret wifi_password
ap:
ssid: "Esp32-Mic-Speaker"
password: "9vYvAFzzPjuc"
i2s_audio:
- id: i2s_in
i2s_lrclk_pin: ${lrclk_mic}
i2s_bclk_pin: ${bclk_mic}
- id: i2s_out
i2s_lrclk_pin: ${lrclk_amp}
i2s_bclk_pin: ${bclk_amp}
microphone:
- platform: nabu_microphone
i2s_din_pin: ${dout}
adc_type: external
pdm: false
sample_rate: 48000
bits_per_sample: 32bit
i2s_audio_id: i2s_in
channel_0:
id: mic0
channel_1:
id: mic1
speaker:
- platform: i2s_audio
id: spk
sample_rate: 48000
i2s_dout_pin: ${din}
bits_per_sample: 32bit
i2s_audio_id: i2s_out
dac_type: external
channel: mono
timeout: never
buffer_duration: 100ms
media_player:
- platform: nabu
id: nabu_media_player
name: Media Player
internal: false
speaker:
sample_rate: 48000
volume_increment: 0.05
volume_min: 0.4
volume_max: 1
on_announcement:
- nabu.set_ducking:
decibel_reduction: ${duck}
duration: 0.0s
on_state:
if:
condition:
and:
- switch.is_off: timer_ringing
- not:
voice_assistant.is_running:
- not:
lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
then:
- nabu.set_ducking:
decibel_reduction: 0
duration: 1.0s
files:
- id: timer_finished_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/timer_finished.flac
- id: wake_word_triggered_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/wake_word_triggered.flac
- id: jack_connected_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/jack_connected.flac
- id: jack_disconnected_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/jack_disconnected.flac
micro_wake_word:
id: mww
microphone: mic0
models:
- model: hey_jarvis
id: hey_jarvis
- model: https://github.com/kahrendt/microWakeWord/releases/download/stop/stop.json
id: stop
internal: true
vad:
on_wake_word_detected:
# If a timer is ringing: Stop it, do not start the voice assistant (We can stop timer from voice!)
- if:
condition:
switch.is_on: timer_ringing
then:
- switch.turn_off: timer_ringing
# Start voice assistant, stop current announcement.
else:
- if:
condition:
lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
then:
lambda: |-
id(nabu_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
.set_announcement(true)
.perform();
else:
- script.execute:
id: play_sound
priority: true
sound_file: !lambda return id(wake_word_triggered_sound);
- delay: 500ms
- voice_assistant.start:
voice_assistant:
id: va
microphone: mic1
media_player: nabu_media_player
micro_wake_word: mww
noise_suppression_level: 1
auto_gain: 31dBFS
volume_multiplier: 4.0
on_client_connected:
- lambda: id(init_in_progress) = false;
- script.execute:
id: play_sound
priority: true
sound_file: !lambda return id(jack_connected_sound);
- micro_wake_word.start:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
on_client_disconnected:
- voice_assistant.stop:
- script.execute:
id: play_sound
priority: true
sound_file: !lambda return id(jack_disconnected_sound);
- lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
on_error:
- if:
condition:
and:
- lambda: return !id(init_in_progress);
- lambda: return code != "duplicate_wake_up_detected";
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
# When the voice assistant starts: Play a wake up sound, duck audio.
on_start:
- nabu.set_ducking:
decibel_reduction: ${duck} # Number of dB quieter; higher implies more quiet, 0 implies full volume
duration: 0.0s # The duration of the transition (default is 0)
on_listening:
- lambda: id(voice_assistant_phase) = ${voice_assist_waiting_for_command_phase_id};
on_stt_vad_start:
- lambda: id(voice_assistant_phase) = ${voice_assist_listening_for_command_phase_id};
on_stt_vad_end:
- lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
on_tts_start:
- lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
# Start a script that would potentially enable the stop word if the response is longer than a second
- script.execute: activate_stop_word_if_tts_step_is_long
# When the voice assistant ends ...
on_end:
- wait_until:
not:
voice_assistant.is_running:
# Stop ducking audio.
- nabu.set_ducking:
decibel_reduction: 0 # 0 dB means no reduction
duration: 1.0s
# Stop the script that would potentially enable the stop word if the response is longer than a second
- script.stop: activate_stop_word_if_tts_step_is_long
# Disable the stop word (If the timer is not ringing)
- if:
condition:
switch.is_off: timer_ringing
then:
- lambda: id(stop).disable();
# If the end happened because of an error, let the error phase on for a second
- if:
condition:
lambda: return id(voice_assistant_phase) == ${voice_assist_error_phase_id};
then:
- delay: 1s
# Reset the voice assistant phase id and reset the LED animations.
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
on_timer_finished:
- switch.turn_on: timer_ringing
button:
- platform: restart
id: restart_btn
name: "${friendly_name} REBOOT"
switch:
# Internal switch to track when a timer is ringing on the device.
- platform: template
id: timer_ringing
optimistic: true
internal: true
restore_mode: ALWAYS_OFF
on_turn_off:
# Disable stop wake word
- lambda: id(stop).disable();
# Stop any current announcement (i.e. stop the timer ring mid-playback)
- if:
condition:
lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
then:
lambda: |-
id(nabu_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
.set_announcement(true)
.perform();
# Set back ducking ratio to zero
- nabu.set_ducking:
decibel_reduction: 0
duration: 1.0s
on_turn_on:
# Duck audio
- nabu.set_ducking:
decibel_reduction: ${duck}
duration: 0.0s
# Enable stop wake word
- lambda: id(stop).enable();
# Ring timer
- script.execute: ring_timer
# If 15 minutes have passed and the timer is still ringing, stop it.
- delay: 15min
- switch.turn_off: timer_ringing
# switch to charge tablet
- platform: gpio
name: ${device_name} charger
pin: ${pwr}
id: pwr
icon: mdi:charging_station
sensor:
- platform: homeassistant
entity_id: ${tablet_battery}
id: battery
internal: true
filters:
- heartbeat: 10s
on_value:
- if:
condition:
- lambda: "return id(battery).state < 30;" # '30' is the lower level, change if needed
then:
- switch.turn_on: pwr
else:
- if:
condition:
- lambda: "return id(battery).state > 80 ;" # '80' is the upper level, change if needed
then:
- switch.turn_off: pwr
script:
# Script executed when the timer is ringing, to playback sounds.
- id: ring_timer
then:
- script.execute:
id: timer_tts
- while:
condition:
switch.is_on: timer_ringing
then: # this is required to play the output on a media player
- script.execute:
id: play_sound
priority: true
sound_file: !lambda return id(timer_finished_sound);
- delay: 4s
- homeassistant.service:
service: assist_satellite.announce
data:
entity_id: assist_satellite.assist_media_player_kitchen_assist_satellite
message: !lambda return id(timer_tts_str);
- wait_until:
lambda: |-
return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
- wait_until:
not:
lambda: |-
return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
# Script executed when we want to play sounds on the device.
- id: play_sound
parameters:
priority: bool
sound_file: "media_player::MediaFile*"
then:
- lambda: |-
if (priority) {
id(nabu_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
.set_announcement(true)
.perform();
}
if ( (id(nabu_media_player).state != media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING ) || priority) {
id(nabu_media_player)
->make_call()
.set_announcement(true)
.set_local_media_file(sound_file)
.perform();
}
# Script used to fetch the first active timer (Stored in global first_active_timer)
- id: fetch_first_active_timer
then:
- lambda: |
const auto timers = id(va).get_timers();
auto output_timer = timers.begin()->second;
for (auto &iterable_timer : timers) {
if (iterable_timer.second.is_active && iterable_timer.second.seconds_left <= output_timer.seconds_left) {
output_timer = iterable_timer.second;
}
}
id(first_active_timer) = output_timer;
# Script used to create TTS string for timer
- id: timer_tts
then:
- script.execute:
id: fetch_first_active_timer
- lambda: |-
std::string finished_timer_name = id(first_active_timer).name;
int total_seconds = id(first_active_timer).total_seconds;
// Variables for hours, minutes, and seconds
int hours = total_seconds / 3600;
int minutes = (total_seconds % 3600) / 60;
int seconds = total_seconds % 60;
std::string finished_timer_duration;
std::string result;
// If more than 1 hour
if (hours > 0) {
finished_timer_duration = std::to_string(hours) + " hour" + (hours > 1 ? "s" : "");
if (minutes > 0) {
finished_timer_duration += " " + std::to_string(minutes) + " minute" + (minutes > 1 ? "s" : "");
}
if (seconds > 0) {
finished_timer_duration += " " + std::to_string(seconds) + " second" + (seconds > 1 ? "s" : "");
}
}
// If less than 1 hour but more than 1 minute
else if (minutes > 0) {
finished_timer_duration = std::to_string(minutes) + " minute" + (minutes > 1 ? "s" : "");
if (seconds > 0) {
finished_timer_duration += " " + std::to_string(seconds) + " second" + (seconds > 1 ? "s" : "");
}
}
// If less than 1 minute
else {
finished_timer_duration = std::to_string(seconds) + " second" + (seconds > 1 ? "s" : "");
}
// Construct the final message
if (finished_timer_name.empty()) {
result = finished_timer_duration + " timer finished";
}
else {
result = finished_timer_name + " timer finished";
}
id(timer_tts_str) = result;
# Script used to activate the stop word if the TTS step is long.
# Why is this wrapped in a script?
# Because we want to stop the sequence if the TTS step is faster than that.
# This prevents the stop word from being deactivated before it has been activated.
- id: activate_stop_word_if_tts_step_is_long
then:
- delay: 1s
# Enable stop wake word
- lambda: id(stop).enable();
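The duration formatting done in the `timer_tts` script can be tried off-device as a plain C++ function. This is a minimal sketch under stated assumptions, not the component's code: `format_timer_message` and `plural` are hypothetical names, and the parts of the message are joined with single spaces.

```cpp
#include <cassert>
#include <string>

// Illustrative helper: "1 minute", "2 minutes", "0 second".
static std::string plural(int value, const std::string &unit) {
  return std::to_string(value) + " " + unit + (value > 1 ? "s" : "");
}

// Hypothetical standalone version of the timer_tts logic: a named timer is
// announced by name, an unnamed one by its total duration.
std::string format_timer_message(const std::string &name, int total_seconds) {
  if (!name.empty()) {
    return name + " timer finished";
  }
  int hours = total_seconds / 3600;
  int minutes = (total_seconds % 3600) / 60;
  int seconds = total_seconds % 60;

  std::string duration;
  if (hours > 0) {
    duration = plural(hours, "hour");
  }
  if (minutes > 0) {
    duration += (duration.empty() ? "" : " ") + plural(minutes, "minute");
  }
  // Seconds are shown when non-zero, or when nothing else was printed.
  if (seconds > 0 || duration.empty()) {
    duration += (duration.empty() ? "" : " ") + plural(seconds, "second");
  }
  return duration + " timer finished";
}
```

For example, an unnamed 90-second timer yields "1 minute 30 seconds timer finished", while a timer named "pizza" yields "pizza timer finished" regardless of its length.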
Try adding your encryption key like this:
api:
  encryption:
    key: !secret assist_kitchen_api
I did it directly in the code, not with !secret, and it didn't work. Should I try with a secrets file?
That shouldn't matter. Can you give us your logs?
I figured it out: I needed to delete the sensor for the tablet battery. Now I'm adapting the code for only one microphone and also adding an LED.
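For anyone who keeps the tablet battery sensor: its `on_value` logic in the config above is a two-threshold (hysteresis) switch, turning the charger on below 30 and off above 80. A minimal sketch of that decision as a plain C++ function follows; `next_charger_state` is a hypothetical name and not an ESPHome API.

```cpp
#include <cassert>

// Hypothetical helper: decide the charger state from the battery level.
// Default thresholds mirror the posted config (on below 30, off above 80).
bool next_charger_state(bool charging, float battery_percent,
                        float low = 30.0f, float high = 80.0f) {
  if (battery_percent < low) {
    return true;   // battery low: start charging
  }
  if (battery_percent > high) {
    return false;  // battery charged enough: stop charging
  }
  return charging;  // between the thresholds: keep the current state
}
```

Between the two thresholds the state is left alone, which avoids toggling the relay on every sensor update.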
I see that you have 2 channels for the microphone. One channel I think is for the wakeword and the second one for the voice command. How can I use one microphone?
If you only have one microphone I think you can just use the standard old mic example instead. I think the nabu mic requires two.
Yes, to use the nabu mic you must have 2 microphones. But what does the wiring diagram for 2 mics look like? I have 2 INMP441 mics.
Here I've made a schematic and assembled the device:
I want to warn about inflated expectations (maybe only mine).
As a device that stands on a table in a room with soft furniture and a carpeted or padded floor, where you talk to it from about 2 meters away, it is acceptable. The wake word "ok nabu" works best, jarvis is in second place, and alexa is practically deaf. That's too bad, because alexa is the only word that in theory isn't too cringe-worthy to say every day (who or what nabu is, I don't know). To me, as a resident of Eastern Europe, everything except alexa is depressing (sorry).
The problems start when you move 3-5 meters away, or when you put the device in a corridor (or another room with no furniture) where echo and sound reflections are a big problem.
If someone is talking, making noise, knocking or playing music, the device is deaf.
To summarize: most likely I will bet on openWakeWord. I think more complex neural networks will be able to filter noise and echo, suppress interference and so on. A large model on computer hardware will definitely win, without needing complex hardware in the device. Also, nobody has ruled out software DSP.
What is missing here:
- Microphone array (4-6) for direction detection and amplification. This project has two mics: one listens for the wake word, the second listens to the command.
- Acoustic Echo Cancellation (AEC)
- Blind Source Separation (BSS)
- Noise Suppression (NS)
That is why the Voice PE project has an XMOS chip and an AIC3204, which try to clean up and boost the audio in hardware on the fly, so that you don't have to shout 4 times from 5 meters away in an empty corridor.
Thanks. Are those two capacitors on the 3.3 V pin mandatory? I connected two INMP441 mics, one with the L/R pin to ground and the other with L/R to the 3.3 V pin. It worked for some hours in testing, then one of them (I think the one with the L/R pin to 3.3 V) stopped working and is now dead. So now the wake word works, but no voice command is received. Also, is L1 a resistor? I'm sorry, I'm not very good with electronic diagrams.
No, they are not necessary. I added the capacitors and inductors (L1, L2) as insurance against a dirty power supply. In your case, most likely the mics are of poor quality, or you mixed up the wiring and 5 V reached the microphones.
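With two I2S mics on one data line, one with L/R tied to ground (left slot) and the other tied to 3.3 V (right slot), the samples arrive interleaved in a single stream. As a rough illustration of how such a stream separates into the two channels used for wake word and voice command, here is a minimal C++ sketch. It assumes the left slot comes first in each frame; `split_channels` is a hypothetical name and this is not the nabu_microphone implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Split an interleaved stereo buffer (L, R, L, R, ...) into two mono buffers.
// Assumption: the left slot comes first in each frame. A trailing unpaired
// sample is ignored.
std::pair<std::vector<std::int32_t>, std::vector<std::int32_t>>
split_channels(const std::vector<std::int32_t> &interleaved) {
  std::vector<std::int32_t> left;
  std::vector<std::int32_t> right;
  left.reserve(interleaved.size() / 2);
  right.reserve(interleaved.size() / 2);
  for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
    left.push_back(interleaved[i]);       // mic with L/R tied low
    right.push_back(interleaved[i + 1]);  // mic with L/R tied high
  }
  return {left, right};
}
```

If one mic dies, its slot carries silence or noise, which matches the symptom described above where one function (wake word or command) keeps working and the other does not.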
I thought I'd reply to this with some of my findings. It has taken me many, many months to get all of this working flawlessly. I first tried running Whisper on a Dell OptiPlex 3020, but it turned out not to be fast enough. Luckily I recently found the perfect solution, https://github.com/einToast/openai_stt_ha, available from HACS. Create a separate account on platform.openai.com and put 10 USD on it. It costs 0.006 USD per minute, so even with heavy use it should not cost more than 1 USD per month. And since the account is not linked, you get some privacy as well.
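The cost estimate above is simple arithmetic, sketched here in C++ for convenience. The rate of 0.006 USD per minute is the figure quoted in this comment, not a verified price, and the function names are hypothetical.

```cpp
#include <cassert>

// Rate quoted in the comment above; treat it as an assumption.
constexpr double kUsdPerMinute = 0.006;

// Monthly cost for a given number of transcribed audio minutes.
constexpr double monthly_cost_usd(double minutes_per_month) {
  return minutes_per_month * kUsdPerMinute;
}

// Minutes of audio that a given budget covers at the quoted rate.
constexpr double minutes_for_budget(double budget_usd) {
  return budget_usd / kUsdPerMinute;
}
```

At that rate, 1 USD covers about 166 minutes of audio, which is roughly five and a half minutes of spoken commands per day over a 30-day month.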
I find these settings useful:
voice_assistant:
  noise_suppression_level: 3.0
  auto_gain: 31dBFS
  volume_multiplier: 12.0
For a long time I had trouble with the assistant not responding, and I realized I had to solder the Dupont connectors as shown. Even after soldering I got some stutter on the speaker, which only went away after replacing the I2S Dupont wires and resoldering. A lot of these Dupont wires have issues, so be sure to check them.
I had some issues with WiFi signal strength on the ESP32-S3 N16R8, so I ended up cutting the PCB antenna trace with scissors and soldering a slightly better antenna to the scratched PCB traces on the sides of the antenna. A bit of a nasty way of doing things, but it improves the WiFi connection to an acceptable level.
I have 4 scenes: coming_home, leaving_home, sleep, waking. For each I have automations with sentences like: Good morning, I am up, I'm up, Morning. Going-out automation: Leaving home, I'm leaving home, I'm leaving, going out, I'm going out, I am leaving.
I used the tie-to-VCC gain setting of the MAX98357, which is pretty good and not too loud if you need to watch out for neighbors at night.
Speaker is https://www.aliexpress.com/item/1005006524208930.html
I would say this is not quite as good as Google Nest (I'm not a native English speaker), but it is manageable if I speak clearly enough and am not too far away.
Does the wake word work well for you? I noticed that when the media player is loud, it doesn't hear the wake word. I got a smaller speaker. :) My components are mostly the same as yours. Apart from that, it works fine.
Components:
esp32-s3-wroom
2x inmp441 mic
max98357 amp
JBL satellite speaker (not big)
ws2812 16 bit ring LED
Mic settings:
microphone:
  - platform: nabu_microphone
    i2s_din_pin: GPIO4
    adc_type: external
    pdm: false
    sample_rate: 48000
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    channel_0:
      id: mic0
      amplify_shift: 2
    channel_1:
      id: mic1
      amplify_shift: 0
Voice assistant settings:
voice_assistant:
  id: va
  microphone: mic1
  media_player: nabu_media_player
  micro_wake_word: mww
  noise_suppression_level: 1
  auto_gain: 31dBFS
  volume_multiplier: 4.0
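The `amplify_shift` values in the mic settings above presumably boost samples by a left bit shift. That is an assumption about the external nabu_microphone component and is not confirmed in this thread. Under that assumption, a shift of 2 multiplies the amplitude by 4 (about 12 dB), and loud input can clip. A minimal C++ sketch on 16-bit samples, with the hypothetical name `amplify_by_shift`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical illustration (not the component's code): amplify a 16-bit
// sample by a left shift, clamping instead of wrapping on overflow.
// The shift is expected to be small (0 to 15).
std::int16_t amplify_by_shift(std::int16_t sample, unsigned shift) {
  // Multiply in 32 bits so the intermediate result cannot overflow.
  std::int32_t scaled = static_cast<std::int32_t>(sample) * (1 << shift);
  const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
  const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
  if (scaled > hi) scaled = hi;  // clip positive peaks
  if (scaled < lo) scaled = lo;  // clip negative peaks
  return static_cast<std::int16_t>(scaled);
}
```

A shift of 0 leaves the channel untouched, which would explain why only the wake word channel (`mic0`) is boosted in the settings above.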
It's a hardware limitation: without an XMOS chip for noise suppression (the Voice PE and ReSpeaker Lite have one), we have to speak louder than the media playback. Mic and speaker placement will help a bit.
Hello, here is my working code for a configuration with an ESP32-S3, two INMP441 microphones and one MAX98357 amp. I also used an LED strip and a speaker. This code is an adaptation of the original code, without the mute button and the button on top. Unfortunately, when I wanted to modify the code to turn on lights when a media file is played, I found that something has changed in the external components: "Could not find __init__.py file for component media_player. Please check the component is defined by this source (search path: /data/external_components/989fb39b/esphome/components/media_player/__init__.py)." I don't know how to resolve this. It also seems that they changed the original code, and the media player component now uses the speaker platform instead of nabu.
substitutions:
# Phases of the Voice Assistant
# The voice assistant is ready to be triggered by a wake word
voice_assist_idle_phase_id: '1'
# The voice assistant is waiting for a voice command (after being triggered by the wake word)
voice_assist_waiting_for_command_phase_id: '2'
# The voice assistant is listening for a voice command
voice_assist_listening_for_command_phase_id: '3'
# The voice assistant is currently processing the command
voice_assist_thinking_phase_id: '4'
# The voice assistant is replying to the command
voice_assist_replying_phase_id: '5'
# The voice assistant is not ready
voice_assist_not_ready_phase_id: '10'
# The voice assistant encountered an error
voice_assist_error_phase_id: '11'
device_name: "home-assistant-voice"
friendly_name: "home-assistant-voice"
device_description: "home-assistant-voice"
din: "GPIO8" # MAX98357A - DIN
lrclk_mic: "GPIO3" # MAX98357A & ICS43434 - LRC/LRCL
lrclk_amp: "GPIO6" # MAX98357A & ICS43434 - LRC/LRCL
bclk_mic: "GPIO2" # MAX98357A & ICS43434 - BCLK
bclk_amp: "GPIO7" # MAX98357A & ICS43434 - BCLK
dout: "GPIO4" # ICS43434 - DOUT
# ICS43434 - SEL low = left, high = right
duck: "30" # db reduction for audio ducking
vol: "90%" # volume level at boot
external_components:
- source:
type: git
url: https://github.com/esphome/voice-kit
ref: dev
components:
- media_player
- micro_wake_word
- microphone
- nabu
- voice_assistant
refresh: 0s
- source:
type: git
url: https://github.com/formatBCE/home-assistant-voice-pe
ref: 48kHz_mic_support
components:
- nabu_microphone
refresh: 0s
esphome:
name: ${device_name}
friendly_name: ${friendly_name}
name_add_mac_suffix: true
min_version: 2024.12.2
platformio_options:
board_build.flash_mode: dio
on_boot:
priority: 375
then:
- script.execute: control_leds
- media_player.volume_set: ${vol}
- delay: 10min
- if:
condition:
lambda: return id(init_in_progress);
then:
- lambda: id(init_in_progress) = false;
- script.execute: control_leds
esp32:
board: esp32-s3-devkitc-1
variant: esp32s3
flash_size: 16MB
framework:
type: esp-idf
version: recommended
sdkconfig_options:
CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB: "y"
CONFIG_ESP32_S3_BOX_BOARD: "y"
CONFIG_SPIRAM_ALLOW_STACK_EXTERNAL_MEMORY: "y"
CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP: "y"
# Settings based on https://github.com/espressif/esp-adf/issues/297#issuecomment-783811702
CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
CONFIG_ESP32_WIFI_STATIC_TX_BUFFER: "y"
CONFIG_ESP32_WIFI_TX_BUFFER_TYPE: "0"
CONFIG_ESP32_WIFI_STATIC_TX_BUFFER_NUM: "8"
CONFIG_ESP32_WIFI_CACHE_TX_BUFFER_NUM: "32"
CONFIG_ESP32_WIFI_AMPDU_TX_ENABLED: "y"
CONFIG_ESP32_WIFI_TX_BA_WIN: "16"
CONFIG_ESP32_WIFI_AMPDU_RX_ENABLED: "y"
CONFIG_ESP32_WIFI_RX_BA_WIN: "32"
CONFIG_LWIP_MAX_ACTIVE_TCP: "16"
CONFIG_LWIP_MAX_LISTENING_TCP: "16"
CONFIG_TCP_MAXRTX: "12"
CONFIG_TCP_SYNMAXRTX: "6"
CONFIG_TCP_MSS: "1436"
CONFIG_TCP_MSL: "60000"
CONFIG_TCP_SND_BUF_DEFAULT: "65535"
CONFIG_TCP_WND_DEFAULT: "65535" # Adjusted from linked settings to avoid compilation error
CONFIG_TCP_RECVMBOX_SIZE: "512"
CONFIG_TCP_QUEUE_OOSEQ: "y"
CONFIG_TCP_OVERSIZE_MSS: "y"
CONFIG_LWIP_WND_SCALE: "y"
CONFIG_TCP_RCV_SCALE: "3"
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE: "512"
CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST: "y"
CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY: "y"
psram:
mode: octal # quad for N8R2 and octal for N16R8
speed: 80MHz
globals:
# Global index for our LEDs, so that switching between different animations does not lead to unwanted effects.
- id: global_led_animation_index
type: int
restore_value: no
initial_value: '0'
# Global variable tracking if the LED color was recently changed.
- id: color_changed
type: bool
restore_value: no
initial_value: 'false'
# Global initialization variable. Initialized to true and set to false once everything is connected. Only used to have a smooth "plugging" experience
- id: init_in_progress
type: bool
restore_value: no
initial_value: 'true'
# Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
- id: voice_assistant_phase
type: int
restore_value: no
initial_value: ${voice_assist_not_ready_phase_id}
# Global variable storing the first active timer
- id: first_active_timer
type: voice_assistant::Timer
restore_value: false
# Global variable storing the timer finished TTS
- id: timer_tts_str
type: std::string
restore_value: false
# Global variable storing if a timer is active
- id: is_timer_active
type: bool
restore_value: false
logger:
level: DEBUG
logs:
sensor: WARN # avoids logging debug sensor updates
ota:
- platform: esphome
id: ota_esphome
debug:
update_interval: 5s
captive_portal:
web_server:
port: 80
api:
id: api_id
encryption:
key: !secret test-va_api
on_client_connected:
- script.execute: control_leds
on_client_disconnected:
- script.execute: control_leds
wifi:
id: wifi_id
ssid: !secret wifi_ssid
password: !secret wifi_password
ap:
ssid: "Esp32-Mic-Speaker"
password: "9vYvAFzzPjuc"
i2s_audio:
- id: i2s_in
i2s_lrclk_pin: ${lrclk_mic}
i2s_bclk_pin: ${bclk_mic}
- id: i2s_out
i2s_lrclk_pin: ${lrclk_amp}
i2s_bclk_pin: ${bclk_amp}
microphone:
- platform: nabu_microphone
i2s_din_pin: ${dout}
adc_type: external
pdm: false
sample_rate: 48000
bits_per_sample: 32bit
i2s_audio_id: i2s_in
channel_0:
id: mic0
channel_1:
id: mic1
speaker:
- platform: i2s_audio
id: spk
sample_rate: 48000
i2s_dout_pin: ${din}
bits_per_sample: 32bit
i2s_audio_id: i2s_out
dac_type: external
channel: mono
timeout: never
buffer_duration: 100ms
media_player:
- platform: nabu
id: nabu_media_player
name: Media Player
internal: false
speaker:
sample_rate: 48000
volume_increment: 0.05
volume_min: 0.4
volume_max: 1
on_announcement:
- nabu.set_ducking:
decibel_reduction: ${duck}
duration: 0.0s
on_state:
if:
condition:
and:
- switch.is_off: timer_ringing
- not:
voice_assistant.is_running:
- not:
lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
then:
- nabu.set_ducking:
decibel_reduction: 0
duration: 1.0s
files:
- id: timer_finished_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/timer_finished.flac
- id: wake_word_triggered_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/wake_word_triggered.flac
- id: jack_connected_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/jack_connected.flac
- id: jack_disconnected_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/jack_disconnected.flac
micro_wake_word:
id: mww
microphone: mic0
models:
- model: https://github.com/kahrendt/microWakeWord/releases/download/okay_nabu_20241226.3/okay_nabu.json
id: okay_nabu
- model: hey_jarvis
id: hey_jarvis
- model: hey_mycroft
id: hey_mycroft
- model: https://github.com/kahrendt/microWakeWord/releases/download/stop/stop.json
id: stop
internal: true
vad:
on_wake_word_detected:
# If a timer is ringing: Stop it, do not start the voice assistant (We can stop timer from voice!)
- if:
condition:
switch.is_on: timer_ringing
then:
- switch.turn_off: timer_ringing
# Start voice assistant, stop current announcement.
else:
- if:
condition:
lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
then:
lambda: |-
id(nabu_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
.set_announcement(true)
.perform();
else:
- script.execute:
id: play_sound
priority: true
sound_file: !lambda return id(wake_word_triggered_sound);
- delay: 500ms
- voice_assistant.start:
voice_assistant:
id: va
microphone: mic1
media_player: nabu_media_player
micro_wake_word: mww
noise_suppression_level: 1
auto_gain: 31dBFS
volume_multiplier: 4.0
on_client_connected:
- lambda: id(init_in_progress) = false;
- script.execute:
id: play_sound
priority: true
sound_file: !lambda return id(jack_connected_sound);
- micro_wake_word.start:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
- script.execute: control_leds
on_client_disconnected:
- voice_assistant.stop:
- script.execute:
id: play_sound
priority: true
sound_file: !lambda return id(jack_disconnected_sound);
- lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
- script.execute: control_leds
on_error:
- if:
condition:
and:
- lambda: return !id(init_in_progress);
- lambda: return code != "duplicate_wake_up_detected";
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
- script.execute: control_leds
# When the voice assistant starts: Play a wake up sound, duck audio.
on_start:
- nabu.set_ducking:
decibel_reduction: ${duck} # Number of dB quieter; higher implies more quiet, 0 implies full volume
duration: 0.0s # The duration of the transition (default is 0)
on_listening:
- lambda: id(voice_assistant_phase) = ${voice_assist_waiting_for_command_phase_id};
- script.execute: control_leds
on_stt_vad_start:
- lambda: id(voice_assistant_phase) = ${voice_assist_listening_for_command_phase_id};
- script.execute: control_leds
on_stt_vad_end:
- lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
- script.execute: control_leds
on_tts_start:
- lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
- script.execute: control_leds
# Start a script that would potentially enable the stop word if the response is longer than a second
- script.execute: activate_stop_word_if_tts_step_is_long
# When the voice assistant ends ...
on_end:
- wait_until:
not:
voice_assistant.is_running:
# Stop ducking audio.
- nabu.set_ducking:
decibel_reduction: 0 # 0 dB means no reduction
duration: 1.0s
# Stop the script that would potentially enable the stop word if the response is longer than a second
- script.stop: activate_stop_word_if_tts_step_is_long
# Disable the stop word (If the timer is not ringing)
- if:
condition:
switch.is_off: timer_ringing
then:
- lambda: id(stop).disable();
# If the end happened because of an error, let the error phase on for a second
- if:
condition:
lambda: return id(voice_assistant_phase) == ${voice_assist_error_phase_id};
then:
- delay: 1s
# Reset the voice assistant phase id and reset the LED animations.
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
- script.execute: control_leds
on_timer_finished:
- switch.turn_on: timer_ringing
on_timer_started:
- script.execute: control_leds
on_timer_cancelled:
- script.execute: control_leds
on_timer_updated:
- script.execute: control_leds
on_timer_tick:
- script.execute: control_leds
button:
- platform: restart
id: restart_btn
name: "${friendly_name} REBOOT"
switch:
# Internal switch to track when a timer is ringing on the device.
- platform: template
id: timer_ringing
optimistic: true
internal: true
restore_mode: ALWAYS_OFF
on_turn_off:
# Disable stop wake word
- lambda: id(stop).disable();
# Stop any current announcement (i.e. stop the timer ring mid playback)
- if:
condition:
lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
then:
lambda: |-
id(nabu_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
.set_announcement(true)
.perform();
# Set back ducking ratio to zero
- nabu.set_ducking:
decibel_reduction: 0
duration: 1.0s
on_turn_on:
# Duck audio
- nabu.set_ducking:
decibel_reduction: ${duck}
duration: 0.0s
# Enable stop wake word
- lambda: id(stop).enable();
# Ring timer
- script.execute: ring_timer
# If 15 minutes have passed and the timer is still ringing, stop it.
- delay: 15min
- switch.turn_off: timer_ringing
script:
# Master script controlling the LEDs, based on different conditions : initialization in progress, wifi and api connected and voice assistant phase.
# For the sake of simplicity and re-usability, the script calls child scripts defined below.
# This script will be called every time one of these conditions is changing.
- id: control_leds
then:
- lambda: |
id(check_if_timers_active).execute();
if (id(is_timer_active)){
id(fetch_first_active_timer).execute();
}
if (id(init_in_progress)) {
id(control_leds_init_state).execute();
} else if (!id(wifi_id).is_connected() || !id(api_id).is_connected()){
id(control_leds_no_ha_connection_state).execute();
} else if (id(timer_ringing).state) {
id(control_leds_timer_ringing).execute();
} else if (id(voice_assistant_phase) == ${voice_assist_waiting_for_command_phase_id}) {
id(control_leds_voice_assistant_waiting_for_command_phase).execute();
} else if (id(voice_assistant_phase) == ${voice_assist_listening_for_command_phase_id}) {
id(control_leds_voice_assistant_listening_for_command_phase).execute();
} else if (id(voice_assistant_phase) == ${voice_assist_thinking_phase_id}) {
id(control_leds_voice_assistant_thinking_phase).execute();
} else if (id(voice_assistant_phase) == ${voice_assist_replying_phase_id}) {
id(control_leds_voice_assistant_replying_phase).execute();
} else if (id(voice_assistant_phase) == ${voice_assist_error_phase_id}) {
id(control_leds_voice_assistant_error_phase).execute();
} else if (id(voice_assistant_phase) == ${voice_assist_not_ready_phase_id}) {
id(control_leds_voice_assistant_not_ready_phase).execute();
} else if (id(is_timer_active)) {
id(control_leds_timer_ticking).execute();
} else if (id(voice_assistant_phase) == ${voice_assist_idle_phase_id}) {
id(control_leds_voice_assistant_idle_phase).execute();
}
# Script executed when the device has no connection to Home Assistant
# Red Twinkle (This will be visible during HA updates for example)
- id: control_leds_no_ha_connection_state
then:
- light.turn_on:
brightness: 66%
red: 1
green: 0
blue: 0
id: voice_assistant_leds
effect: "Twinkle"
# Script executed during initialization
# Blue Twinkle if Wifi is connected, Else solid warm white
- id: control_leds_init_state
then:
- if:
condition:
wifi.connected:
then:
- light.turn_on:
brightness: 66%
red: 9.4%
green: 73.3%
blue: 94.9%
id: voice_assistant_leds
effect: "Twinkle"
else:
- light.turn_on:
brightness: 66%
red: 100%
green: 89%
blue: 71%
id: voice_assistant_leds
effect: "none"
# Script executed when the voice assistant is idle (waiting for a wake word)
# Nothing (Either LED ring off or LED ring on if the user decided to turn the user facing LED ring on)
- id: control_leds_voice_assistant_idle_phase
then:
- light.turn_off: voice_assistant_leds
- if:
condition:
light.is_on: led_ring
then:
light.turn_on: led_ring
# Script executed when the voice assistant is waiting for a command (After the wake word)
# Slow clockwise spin of the LED ring.
- id: control_leds_voice_assistant_waiting_for_command_phase
then:
- light.turn_on:
brightness: !lambda return max( id(led_ring).current_values.get_brightness() , 0.2f );
id: voice_assistant_leds
effect: "Waiting for Command"
# Script executed when the voice assistant is listening to a command
# Fast clockwise spin of the LED ring.
- id: control_leds_voice_assistant_listening_for_command_phase
then:
- light.turn_on:
brightness: !lambda return max( id(led_ring).current_values.get_brightness() , 0.2f );
id: voice_assistant_leds
effect: "Listening For Command"
# Script executed when the voice assistant is processing a command
# The spin stops and the 2 LEDs that are currently on blink, indicating that the command is being processed.
- id: control_leds_voice_assistant_thinking_phase
then:
- light.turn_on:
brightness: !lambda return max( id(led_ring).current_values.get_brightness() , 0.2f );
id: voice_assistant_leds
effect: "Thinking"
# Script executed when the voice assistant is replying to a command
# Fast anticlockwise spin of the LED ring.
- id: control_leds_voice_assistant_replying_phase
then:
- light.turn_on:
brightness: !lambda return max( id(led_ring).current_values.get_brightness() , 0.2f );
id: voice_assistant_leds
effect: "Replying"
# Script executed when the voice assistant is in error
# Fast Red Pulse
- id: control_leds_voice_assistant_error_phase
then:
- light.turn_on:
brightness: !lambda return min ( max( id(led_ring).current_values.get_brightness() , 0.2f ) + 0.1f , 1.0f );
red: 1
green: 0
blue: 0
id: voice_assistant_leds
effect: "Error"
# Script executed when the voice assistant is not ready
- id: control_leds_voice_assistant_not_ready_phase
then:
- light.turn_on:
brightness: 66%
red: 1
green: 0
blue: 0
id: voice_assistant_leds
effect: "Twinkle"
# Script executed when the timer is ringing, to control the LEDs
# The LED ring blinks.
- id: control_leds_timer_ringing
then:
- light.turn_on:
brightness: !lambda return min ( max( id(led_ring).current_values.get_brightness() , 0.2f ) + 0.1f , 1.0f );
id: voice_assistant_leds
effect: "Timer Ring"
# Script executed when the timer is ticking, to control the LEDs
# The LEDs show the remaining time as a fraction of the full ring.
- id: control_leds_timer_ticking
then:
- light.turn_on:
brightness: !lambda return max( id(led_ring).current_values.get_brightness() , 0.2f );
id: voice_assistant_leds
effect: "Timer tick"
# Script executed when the timer is ringing, to playback sounds.
- id: ring_timer
then:
- script.execute:
id: timer_tts
- while:
condition:
switch.is_on: timer_ringing
then: # this is required to play the output on a media player
- script.execute:
id: play_sound
priority: true
sound_file: !lambda return id(timer_finished_sound);
- delay: 4s
- homeassistant.service:
service: assist_satellite.announce
data:
entity_id: assist_satellite.assist_media_player_kitchen_assist_satellite
message: !lambda return id(timer_tts_str);
- wait_until:
lambda: |-
return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
- wait_until:
not:
lambda: |-
return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
# Script executed when we want to play sounds on the device.
- id: play_sound
parameters:
priority: bool
sound_file: "media_player::MediaFile*"
then:
- lambda: |-
if (priority) {
id(nabu_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
.set_announcement(true)
.perform();
}
if ( (id(nabu_media_player).state != media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING ) || priority) {
id(nabu_media_player)
->make_call()
.set_announcement(true)
.set_local_media_file(sound_file)
.perform();
}
# Script used to check if a timer is active (Stored in global is_timer_active)
- id: check_if_timers_active
then:
- lambda: |
const auto timers = id(va).get_timers();
bool output = false;
if (timers.size() > 0) {
for (auto &iterable_timer : timers) {
if(iterable_timer.second.is_active) {
output = true;
}
}
}
id(is_timer_active) = output;
# Script used to fetch the first active timer (Stored in global first_active_timer)
- id: fetch_first_active_timer
then:
- lambda: |
const auto timers = id(va).get_timers();
auto output_timer = timers.begin()->second;
for (auto &iterable_timer : timers) {
if (iterable_timer.second.is_active && iterable_timer.second.seconds_left <= output_timer.seconds_left) {
output_timer = iterable_timer.second;
}
}
id(first_active_timer) = output_timer;
# Script used to create TTS string for timer
- id: timer_tts
then:
- script.execute:
id: fetch_first_active_timer
- lambda: |-
std::string finished_timer_name = id(first_active_timer).name;
int total_seconds = id(first_active_timer).total_seconds;
// Variables for hours, minutes, and seconds
int hours = total_seconds / 3600;
int minutes = (total_seconds % 3600) / 60;
int seconds = total_seconds % 60;
std::string finished_timer_duration;
std::string result;
// If more than 1 hour
if (hours > 0) {
finished_timer_duration = std::to_string(hours) + " hour" + (hours > 1 ? "s" : "");
if (minutes > 0) {
finished_timer_duration += " " + std::to_string(minutes) + " minute" + (minutes > 1 ? "s" : "");
}
if (seconds > 0) {
finished_timer_duration += " " + std::to_string(seconds) + " second" + (seconds > 1 ? "s" : "");
}
}
// If less than 1 hour but more than 1 minute
else if (minutes > 0) {
finished_timer_duration = std::to_string(minutes) + " minute" + (minutes > 1 ? "s" : "");
if (seconds > 0) {
finished_timer_duration += " " + std::to_string(seconds) + " second" + (seconds > 1 ? "s" : "");
}
}
// If less than 1 minute
else {
finished_timer_duration = std::to_string(seconds) + " second" + (seconds > 1 ? "s" : "");
}
// Construct the final message
if (finished_timer_name.empty()) {
// Leading space keeps the duration and "timer finished" from running together
result = finished_timer_duration + " timer finished";
}
else {
result = finished_timer_name + " timer finished";
}
id(timer_tts_str) = result;
# Script used to activate the stop word if the TTS step is long.
# Why is this wrapped in a script?
# Because we want to stop the sequence if the TTS step is faster than that.
# This prevents the stop word from being deactivated before it has been activated.
- id: activate_stop_word_if_tts_step_is_long
then:
- delay: 1s
# Enable stop wake word
- lambda: id(stop).enable();
light:
# Hardware LED ring. Not used because remapping needed
- platform: esp32_rmt_led_strip
id: leds_internal
pin: GPIO9
rmt_channel: 1
num_leds: 19
rgb_order: GRB
chipset: WS2812
default_transition_length: 0ms
power_supply: led_power
# Voice Assistant LED ring. Remapping of the internal LED.
# This light is not exposed. The device controls it
- platform: partition
id: voice_assistant_leds
internal: true
default_transition_length: 0ms
segments:
- id: leds_internal
from: 10
to: 18
- id: leds_internal
from: 0
to: 9
effects:
- addressable_lambda:
name: "Waiting for Command"
update_interval: 100ms
lambda: |-
auto light_color = id(led_ring).current_values;
Color color(light_color.get_red() * 255, light_color.get_green() * 255,
light_color.get_blue() * 255);
for (int i = 0; i < 12; i++) {
if (i == id(global_led_animation_index) % 12) {
it[i] = color;
} else if (i == (id(global_led_animation_index) + 11) % 12) {
it[i] = color * 192;
} else if (i == (id(global_led_animation_index) + 10) % 12) {
it[i] = color * 128;
} else if (i == (id(global_led_animation_index) + 6) % 12) {
it[i] = color;
} else if (i == (id(global_led_animation_index) + 5) % 12) {
it[i] = color * 192;
} else if (i == (id(global_led_animation_index) + 4) % 12) {
it[i] = color * 128;
} else {
it[i] = Color::BLACK;
}
}
id(global_led_animation_index) = (id(global_led_animation_index) + 1) % 12;
- addressable_lambda:
name: "Listening For Command"
update_interval: 50ms
lambda: |-
auto light_color = id(led_ring).current_values;
Color color(light_color.get_red() * 255, light_color.get_green() * 255,
light_color.get_blue() * 255);
for (int i = 0; i < 12; i++) {
if (i == id(global_led_animation_index) % 12) {
it[i] = color;
} else if (i == (id(global_led_animation_index) + 11) % 12) {
it[i] = color * 192;
} else if (i == (id(global_led_animation_index) + 10) % 12) {
it[i] = color * 128;
} else if (i == (id(global_led_animation_index) + 6) % 12) {
it[i] = color;
} else if (i == (id(global_led_animation_index) + 5) % 12) {
it[i] = color * 192;
} else if (i == (id(global_led_animation_index) + 4) % 12) {
it[i] = color * 128;
} else {
it[i] = Color::BLACK;
}
}
id(global_led_animation_index) = (id(global_led_animation_index) + 1) % 12;
- addressable_lambda:
name: "Thinking"
update_interval: 10ms
lambda: |-
static uint8_t brightness_step = 0;
static bool brightness_decreasing = true;
static uint8_t brightness_step_number = 10;
if (initial_run) {
brightness_step = 0;
brightness_decreasing = true;
}
auto light_color = id(led_ring).current_values;
Color color(light_color.get_red() * 255, light_color.get_green() * 255,
light_color.get_blue() * 255);
for (int i = 0; i < 12; i++) {
if (i == id(global_led_animation_index) % 12) {
it[i] = color * uint8_t(255/brightness_step_number*(brightness_step_number-brightness_step));
} else if (i == (id(global_led_animation_index) + 6) % 12) {
it[i] = color * uint8_t(255/brightness_step_number*(brightness_step_number-brightness_step));
} else {
it[i] = Color::BLACK;
}
}
if (brightness_decreasing) {
brightness_step++;
} else {
brightness_step--;
}
if (brightness_step == 0 || brightness_step == brightness_step_number) {
brightness_decreasing = !brightness_decreasing;
}
- addressable_lambda:
name: "Replying"
update_interval: 50ms
lambda: |-
id(global_led_animation_index) = (12 + id(global_led_animation_index) - 1) % 12;
auto light_color = id(led_ring).current_values;
Color color(light_color.get_red() * 255, light_color.get_green() * 255,
light_color.get_blue() * 255);
for (int i = 0; i < 12; i++) {
if (i == (id(global_led_animation_index)) % 12) {
it[i] = color;
} else if (i == ( id(global_led_animation_index) + 1) % 12) {
it[i] = color * 192;
} else if (i == ( id(global_led_animation_index) + 2) % 12) {
it[i] = color * 128;
} else if (i == ( id(global_led_animation_index) + 6) % 12) {
it[i] = color;
} else if (i == ( id(global_led_animation_index) + 7) % 12) {
it[i] = color * 192;
} else if (i == ( id(global_led_animation_index) + 8) % 12) {
it[i] = color * 128;
} else {
it[i] = Color::BLACK;
}
}
- addressable_lambda:
name: "Volume Display"
update_interval: 50ms
lambda: |-
auto light_color = id(led_ring).current_values;
Color color(light_color.get_red() * 255, light_color.get_green() * 255,
light_color.get_blue() * 255);
Color silenced_color(255, 0, 0);
auto volume_ratio = 12.0f * id(nabu_media_player).volume;
for (int i = 0; i < 12; i++) {
if (i <= volume_ratio) {
it[(6+i)%12] = color * min( 255.0f * (volume_ratio - i) , 255.0f ) ;
} else {
it[(6+i)%12] = Color::BLACK;
}
}
if (id(nabu_media_player).volume == 0.0f) {
it[6] = silenced_color;
}
- addressable_twinkle:
name: "Twinkle"
twinkle_probability: 50%
- addressable_lambda:
name: "Error"
update_interval: 10ms
lambda: |-
static uint8_t brightness_step = 0;
static bool brightness_decreasing = true;
static uint8_t brightness_step_number = 10;
if (initial_run) {
brightness_step = 0;
brightness_decreasing = true;
}
Color error_color(255, 0, 0);
for (int i = 0; i < 12; i++) {
it[i] = error_color * uint8_t(255/brightness_step_number*(brightness_step_number-brightness_step));
}
if (brightness_decreasing) {
brightness_step++;
} else {
brightness_step--;
}
if (brightness_step == 0 || brightness_step == brightness_step_number) {
brightness_decreasing = !brightness_decreasing;
}
- addressable_lambda:
name: "Timer Ring"
update_interval: 10ms
lambda: |-
static uint8_t brightness_step = 0;
static bool brightness_decreasing = true;
static uint8_t brightness_step_number = 10;
if (initial_run) {
brightness_step = 0;
brightness_decreasing = true;
}
auto light_color = id(led_ring).current_values;
Color color(light_color.get_red() * 255, light_color.get_green() * 255,
light_color.get_blue() * 255);
Color muted_color(255, 0, 0);
for (int i = 0; i < 12; i++) {
it[i] = color * uint8_t(255/brightness_step_number*(brightness_step_number-brightness_step));
}
if (brightness_decreasing) {
brightness_step++;
} else {
brightness_step--;
}
if (brightness_step == 0 || brightness_step == brightness_step_number) {
brightness_decreasing = !brightness_decreasing;
}
- addressable_lambda:
name: "Timer Tick"
update_interval: 100ms
lambda: |-
auto light_color = id(led_ring).current_values;
Color color(light_color.get_red() * 255, light_color.get_green() * 255,
light_color.get_blue() * 255);
Color muted_color(255, 0, 0);
auto timer_ratio = 12.0f * id(first_active_timer).seconds_left / max(id(first_active_timer).total_seconds , static_cast<uint32_t>(1));
uint8_t last_led_on = static_cast<uint8_t>(ceil(timer_ratio)) - 1;
for (int i = 0; i < 12; i++) {
float brightness_dip = ( i == id(global_led_animation_index) % 12 && i != last_led_on ) ? 0.9f : 1.0f ;
if (i <= timer_ratio) {
it[i] = color * min(255.0f * brightness_dip * (timer_ratio - i) , 255.0f * brightness_dip) ;
} else {
it[i] = Color::BLACK;
}
}
id(global_led_animation_index) = (12 + id(global_led_animation_index) - 1) % 12;
- addressable_rainbow:
name: "Rainbow"
width: 12
- addressable_lambda:
name: "Tick"
update_interval: 333ms
lambda: |-
static uint8_t index = 0;
Color color(255, 0, 0);
if (initial_run) {
index = 0;
}
for (int i = 0; i < 12; i++) {
if (i <= index ) {
it[i] = Color::BLACK;
} else {
it[i] = color;
}
}
index = (index + 1) % 12;
# User facing LED ring. Remapping of the internal LEDs.
# Exposed to be used by the user.
- platform: partition
id: led_ring
name: LED Ring
entity_category: config
icon: "mdi:circle-outline"
default_transition_length: 0ms
restore_mode: RESTORE_DEFAULT_OFF
initial_state:
color_mode: rgb
brightness: 66%
red: 9.4%
green: 73.3%
blue: 94.9%
segments:
- id: leds_internal
from: 10
to: 18
- id: leds_internal
from: 0
to: 9
power_supply:
- id: led_power
pin: GPIO45
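On the media_player error mentioned at the top of this comment: one possible direction is to drop the external nabu media player and use ESPHome's built-in speaker media player instead. This is a sketch under assumptions, not something tested on this hardware. It assumes an ESPHome release that ships the speaker media player together with the mixer and resampler speaker platforms. The IDs mixing_speaker, announcement_mixing_input, media_mixing_input, announcement_resampling_speaker, media_resampling_speaker and external_media_player are hypothetical; spk and led_ring are the IDs from the configuration above.

```yaml
# Sketch only: built-in speaker media player instead of the external nabu one.
speaker:
  # Mixer in front of the existing I2S speaker (id: spk in the config above).
  - platform: mixer
    id: mixing_speaker                    # hypothetical ID
    output_speaker: spk
    source_speakers:
      - id: announcement_mixing_input     # hypothetical ID
      - id: media_mixing_input            # hypothetical ID
  # Resamplers so streams at other sample rates can still be played.
  - platform: resampler
    id: announcement_resampling_speaker   # hypothetical ID
    output_speaker: announcement_mixing_input
  - platform: resampler
    id: media_resampling_speaker          # hypothetical ID
    output_speaker: media_mixing_input

media_player:
  - platform: speaker
    id: external_media_player             # hypothetical ID
    name: Media Player
    announcement_pipeline:
      speaker: announcement_resampling_speaker
      format: FLAC
      sample_rate: 48000
      num_channels: 1
    media_pipeline:
      speaker: media_resampling_speaker
      format: FLAC
      sample_rate: 48000
      num_channels: 1
    # Turn the user-facing ring on while media plays, off when playback ends.
    on_play:
      - light.turn_on: led_ring
    on_idle:
      - light.turn_off: led_ring
```

This is not a drop-in replacement. The nabu.set_ducking actions, the files list, the voice_assistant media_player reference and every lambda that uses nabu_media_player would have to be ported as well, and media_player and nabu would come out of the external_components list.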
@Wetzel402 @indevor you may use the non-forked version, since you have separate I2S buses for input and output. Just set the microphone sample rate to 16000.
My fork resamples from 48 kHz to 16 kHz because the Respeaker Lite board uses a single I2S bus for both speaker and microphone, and the voice assistant requires 48 kHz for the speaker and 16 kHz for the mic.
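For the configuration posted above, that change could look roughly like the sketch below. It reuses the i2s_in bus, the ${dout} substitution and the mic0/mic1 IDs from that config and only changes the sample rate. It assumes the upstream (non-forked) nabu_microphone accepts the same options as the fork, which I have not verified.

```yaml
# Sketch only: microphone on its own I2S bus, running at 16 kHz directly,
# so no 48 kHz -> 16 kHz resampling is needed.
microphone:
  - platform: nabu_microphone   # assumption: taken from the upstream source, not the fork
    i2s_audio_id: i2s_in
    i2s_din_pin: ${dout}
    adc_type: external
    pdm: false
    sample_rate: 16000          # was 48000 with the resampling fork
    bits_per_sample: 32bit
    channel_0:
      id: mic0
    channel_1:
      id: mic1
```

The second external_components entry, which points at the 48kHz_mic_support branch of the fork, would then need to be replaced by a source that provides the non-forked nabu_microphone. Which repository and ref that is depends on your setup.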