@EverythingSmartHome
Last active January 21, 2025 10:09
ESP32 & ESPHome Voice Assistant
esphome:
  name: esp32-mic-speaker
  friendly_name: esp32-mic-speaker
  on_boot:
    - priority: -100
      then:
        - wait_until: api.connected
        - delay: 1s
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - voice_assistant.start_continuous:

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

# Enable logging
logger:

# Enable Home Assistant API
api:

ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32-Mic-Speaker"
    password: "9vYvAFzzPjuc"

i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO13
    pdm: false

speaker:
  - platform: i2s_audio
    id: big_speaker
    dac_type: external
    i2s_dout_pin: GPIO25
    mode: mono

voice_assistant:
  microphone: mic
  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  speaker: big_speaker
  id: assist

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);

@Djelle

Djelle commented May 16, 2024

I am not sure whether the pin-out for the ESP32 DevKit is the same as for your board, but I think it is. You should not use GPIO pins 6, 7 and 8.
https://randomnerdtutorials.com/esp32-pinout-reference-gpios/

@strusic

strusic commented May 16, 2024

This config is for the XIAO ESP32-S3.

@imonlinux

imonlinux commented May 16, 2024

Hey strusic,
I was having some of those same issues with the microphone until I enabled micro_wake_word. Below is my current testing YAML for a XIAO ESP32-S3, using duplex mode on a shared i2s_audio channel to cut down on the number of pins needed. This is working very well, especially in conjunction with Extended OpenAI Conversation. I have GPT-4o configured to "embody" HAL 9000 and it has been very entertaining. I'm printing a HAL 9000 prop replica to hold the assistant hardware.

edit: removed OTA password....
edit2: fixed persistent "connecting" led effect after boot
edit3: fixed INMP441 L/R Channel error (thanks again indevor)

substitutions:
  device_name: "test-media-assistant-v2"
  friendly_name: "Test Media Assistant V2"
  device_description: "XIAO ESP32 S3"
  esp_board: "esp32-s3-devkitc-1"
  framework_type: "esp-idf"
  din: "GPIO1"   # MAX98357A - DIN
  lrclk: "GPIO7" # MAX98357A - LRCLK / INMP441 - WS
  bclk: "GPIO8"  # MAX98357A - BCLK / INMP441 - SCK
  sd: "GPIO2"    # INMP441 - SD
  l_r: "right"   # INMP441 - L/R (3.3v = right / GND = left)
  di: "GPIO9"   # WS2812 - DI
  api_key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  # Phases of the Voice Assistant
  # IDLE: The voice assistant is ready to be triggered by a wake-word
  voice_assist_idle_phase_id: '1'
  # LISTENING: The voice assistant is ready to listen to a voice command (after being triggered by the wake word)
  voice_assist_listening_phase_id: '2'
  # THINKING: The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '3'
  # REPLYING: The voice assistant is replying to the command
  voice_assist_replying_phase_id: '4'
  # NOT_READY: The voice assistant is not ready
  voice_assist_not_ready_phase_id: '10'
  # ERROR: The voice assistant encountered an error
  voice_assist_error_phase_id: '11'
  # MUTED: The voice assistant is muted and will not reply to a wake-word
  voice_assist_muted_phase_id: '12'

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
      #type: local
      #path: /Users/siekmann/Privat/Projects/espHome/esphome_audio/esphome/components
    components: [ adf_pipeline, i2s_audio ]

esphome:
  name: ${device_name}
  comment: ${device_description}
  friendly_name: ${friendly_name}
  min_version: 2024.2.0
  platformio_options:
    build_flags: -DBOARD_HAS_PSRAM
    board_build.flash_mode: dio
    board_upload.maximum_size: 16777216
  on_boot:
    priority: 600
    then:
      # Run the script to refresh the LED status
      # If after 30 seconds, the device is still initializing (It did not yet connect to Home Assistant), turn off the init_in_progress variable and run the script to refresh the LED status
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
                            
esp32:
  board: ${esp_board}
  variant: ESP32S3
  flash_size: 16MB
  framework:
    type: ${framework_type}
    version: recommended
    sdkconfig_options:
      # need to set a s3 compatible board for the adf-sdk to compile
      # board specific code is not used though
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "512000"
      CONFIG_TCP_RECVMBOX_SIZE: "512"

logger:

globals:
  # Global initialisation variable. Initialized to true and set to false once everything is connected. Only used to have a smooth "plugging" experience
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  # Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}

psram:
  mode: octal
  speed: 80MHz

wifi:
  enable_rrm: true
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: true

ota:
  password: !secret ota_password

api:
  encryption:
    key: ${api_key}

i2s_audio:
  - id: i2s_dplx
    i2s_lrclk_pin: ${lrclk}
    i2s_bclk_pin: ${bclk}
    access_mode: duplex

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_dplx
    i2s_dout_pin: ${din}
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_dplx
    i2s_din_pin: ${sd}
    pdm: false
    channel: ${l_r}
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    keep_pipeline_alive: true
    pipeline:
      - adf_i2s_in
      - self
      
media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: media_player
    keep_pipeline_alive: true
    internal: false
    pipeline:
      - self
      - resampler
      - adf_i2s_out

micro_wake_word:
  model: okay_nabu
  on_wake_word_detected:
      - media_player.stop:
      - light.turn_on:
          id: led_ring
          blue: 0%
          red: 0%
          green: 100%
          brightness: 75%
          effect: pulse
      - voice_assistant.start:

voice_assistant:
  microphone: adf_microphone
  media_player: adf_media_player

  use_wake_word: false
  #vad_threshold: 3

  noise_suppression_level: 1
  auto_gain: 31dBFS
  volume_multiplier: 15.0

  on_client_connected:
    - lambda: id(init_in_progress) = false;
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
          - script.execute: reset_led
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};

  on_client_disconnected:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - voice_assistant.stop
    - micro_wake_word.stop
    - light.turn_on:
          id: led_ring
          blue: 0%
          red: 100%
          green: 100%
          brightness: 50%
          effect: connecting

  on_listening:
    - light.turn_on:
        id: led_ring
        blue: 100%
        red: 0%
        green: 0%
        brightness: 25%
        effect: wakeword
        
  on_tts_start:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 0%
        green: 100%
        brightness: 75%
        effect: pulse
  
  on_end:
      then:
        - light.turn_off:
            id: led_ring
        - voice_assistant.stop
        - wait_until:
            not:
              media_player.is_playing:
        - script.execute: reset_led
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - micro_wake_word.start:
  on_error:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
          - script.execute: reset_led
              
script:
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led_ring
                blue: 100%
                red: 0%
                green: 0%
                brightness: 25%
                effect: none
          else:
            - light.turn_off: led_ring
 
button:
  - platform: restart
    id: restart_btn
    name: "${friendly_name} REBOOT"
            
switch:
  - platform: template
    name: Enable Voice Assistant
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    icon: mdi:assistant
    # When the switch is turned on (on Home Assistant):
    # Start the voice assistant component
    # Set the correct phase and run the script to refresh the LED status
    on_turn_on:
      - logger.log: "switch on"
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - logger.log: "condition 1"
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - voice_assistant.stop
            - delay: 1s
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - logger.log: "Starting MWW"
                  #- voice_assistant.start_continuous
                  - micro_wake_word.start:
      - script.execute: reset_led
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - voice_assistant.stop
            - micro_wake_word.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
      - script.execute: reset_led

  - platform: template
    name: Pipeline
    id: pipeline_switch
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF

    on_turn_off:
      - media_player.stop

    on_turn_on:
      - media_player.play_media: "https://dl.espressif.com/dl/audio/ff-16b-2c-44100hz.mp3"

light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "${friendly_name} Light"
    pin: ${di}
    num_leds: 16
    rmt_channel: 0
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 12
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 12
          add_led_interval: 100ms
          reverse: true

@strusic

strusic commented May 17, 2024

I was trying to compile this code, but my Home Assistant crashes when the build reaches this line:

Compiling .pioenvs/esp32-s3-voice-assistant/components/esp-tflite-micro/tensorflow/lite/micro/micro_allocation_info.o

I also tried compiling it on Windows, but after flashing, the device does not connect to HA.

@imonlinux

Here are a few things that I do when I'm having issues flashing one of these:

  • In ESPHome dashboard, click Clean Build Files and then recompile
  • Make sure that your USB port supplies enough power for the device (my laptop's USB port is not sufficient and results in a corrupt flash, so I purchased a powered USB hub)
  • Use esptool to erase the flash completely (for the ESP32-S3 used here: esptool --chip esp32s3 erase_flash)

I compiled that exact config today using the following:
ESPHome = 2024.5.0
HA Core = 2024.5.3
Supervisor = 2024.05.1
HASSOS = 12.3

@strusic

strusic commented May 17, 2024

My ESP is well powered. The issue I have is that compiling this YAML crashes the whole HA OS. I am running an RPi 4B 4GB with HAOS on an SSD. I've just added 2GB of swap and an additional fan, so maybe that helps. Compiling this YAML with micro_wake_word is crazy. I've never run into anything like it.

Also, there are a lot of warnings while compiling, like:

Compiling .pioenvs/esp32-s3-voice-assistant/components/esp-tflite-micro/tensorflow/lite/micro/kernels/mirror_pad.o
In file included from components/esp-tflite-micro/tensorflow/lite/micro/kernels/lstm_eval.cc:25:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h: In lambda function:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:151:34: warning: declaration of 'const tflite::ArithmeticParams& params' shadows a parameter [-Wshadow]
          const uint8_t input2_val) {
                                  ^
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:126:56: note: shadowed declaration is here
 inline void BroadcastMul6DSlow(const ArithmeticParams& params,
                                ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h: In lambda function:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:210:28: warning: declaration of 'const tflite::ArithmeticParams& params' shadows a parameter [-Wshadow]
          const T input2_val) {
                            ^
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:171:44: note: shadowed declaration is here
 BroadcastMul6DSlow(const ArithmeticParams& params,
                    ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h: In lambda function:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:250:46: warning: declaration of 'const tflite::ArithmeticParams& params' shadows a parameter [-Wshadow]
          const std::complex<float> input2_val) {
                                              ^
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:221:56: note: shadowed declaration is here
 inline void BroadcastMul6DSlow(const ArithmeticParams& params,

Is this normal?

@imonlinux

100%

You will see all kinds of warnings while it is compiling. Good luck.
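
If the crash on the Raspberry Pi comes from the build exhausting memory (an assumption, the thread does not confirm the cause), one thing that may be worth trying is limiting how many compile processes run in parallel. The fragment below is a sketch meant to be merged into the existing esphome: block of the config above; compile_process_limit is an ESPHome core option, and the value 1 is only an illustrative choice.

```yaml
esphome:
  name: ${device_name}
  # Assumption: the host runs out of memory because the build compiles on
  # every CPU core at once. A single compile process is slower but should
  # lower peak memory use during the TensorFlow Lite Micro compile step.
  compile_process_limit: 1
```

Compiling on a more capable machine and uploading the resulting firmware is another option, as already tried on Windows earlier in the thread.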

@strusic

strusic commented May 17, 2024

I successfully flashed this YAML, but there is a lot of crackling and distortion when playing anything on the speaker. I am using a MAX98357A as the amp and an I2S MEMS microphone, the DFRobot SEN0526.

@indevor

indevor commented May 17, 2024

l_r: "right" # INMP441 - L/R (GND = right / 3.3v = left)
di: "GPIO9" # WS2812 - DI

Funny, you're wrong again. Maybe you're copying from the wrong source.
According to the datasheet for this microphone:

Left/Right Channel Select. When set low, the microphone outputs its signal in the left channel
of the I²S frame. When set high, the microphone outputs its signal in the right channel.
GND = LEFT
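
Following the datasheet excerpt above, the wiring of the INMP441 L/R pin and the l_r substitution from the config would pair up roughly as in this sketch. It assumes the L/R pin is tied to GND; the substitution name l_r comes from the config earlier in the thread.

```yaml
substitutions:
  # INMP441 L/R pin tied to GND  -> microphone data is in the left channel
  # INMP441 L/R pin tied to 3.3V -> microphone data is in the right channel
  l_r: "left"   # assumes L/R is wired to GND; use "right" if it is wired to 3.3V
```

If the selected channel does not match the wiring, the microphone input reads the empty half of the I²S frame, so this is worth checking first when the microphone appears silent.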

@indevor

indevor commented May 17, 2024

INFO ESPHome 2024.5.0
INFO Reading configuration /config/esphome/test.yaml...
WARNING GPIO3 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
WARNING GPIO45 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Generating C++ source...
INFO Updating https://github.com/espressif/esp-adf.git@v2.5
INFO Updating submodules (components/esp-adf-libs, components/esp-sr) for https://github.com/espressif/esp-adf.git@v2.5
Traceback (most recent call last):
  File "/usr/local/bin/esphome", line 33, in <module>
    sys.exit(load_entry_point('esphome', 'console_scripts', 'esphome')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 1065, in main
    return run_esphome(sys.argv)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 1052, in run_esphome
    rc = POST_CONFIG_ACTIONS[args.command](args, config)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 479, in command_run
    exit_code = write_cpp(config)
                ^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 193, in write_cpp
    return write_cpp_file()
           ^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 211, in write_cpp_file
    writer.write_cpp(code_s)
  File "/esphome/esphome/writer.py", line 344, in write_cpp
    copy_src_tree()
  File "/esphome/esphome/writer.py", line 297, in copy_src_tree
    copy_files()
  File "/esphome/esphome/components/esp32/__init__.py", line 684, in copy_files
    repo_dir, _ = git.clone_or_update(
                  ^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/git.py", line 111, in clone_or_update
    run_git_command(
  File "/esphome/esphome/git.py", line 31, in run_git_command
    raise cv.Invalid(lines[-1][len("fatal: ") :])
voluptuous.error.Invalid: Unable to find current revision in submodule path 'components/esp-adf-libs'

@imonlinux

imonlinux commented May 17, 2024

Hey indevor,

I have so many versions of this config running on different ESP32 boards now that I must have pulled that old config back in at some point. The odd thing is that this specific config, running on the XIAO ESP32-S3 with an INMP441 whose L/R pin is grounded and which is set to the right channel, is working perfectly. When I'm back home I will change it back to left and see if it continues to work. I was 100% wrong. I have the INMP441 L/R pin set high. That is why selecting the right channel works, but my note in the substitutions was backwards again. Thanks again, indevor!

I'm not sure what your last post was indicating. I have moved away from using GPIO3 and GPIO45 because of these warnings and am now using the single i2s_audio channel in duplex mode. I'm also not sure where that voluptuous.error is coming from in your run. I just reran this config after doing a Clean Build Files and it compiles as expected.

edit: typo
edit2: added completion of build
edit3: confirmed that INMP441 L/R channel is set high and not low so right channel is working and fixed in config above

INFO ESPHome 2024.5.0
INFO Reading configuration /config/esphome/test-media-assistant-v2.yaml...
INFO Generating C++ source...
INFO Updating https://github.com/espressif/esp-adf.git@v2.5
INFO Updating submodules (components/esp-adf-libs, components/esp-sr) for https://github.com/espressif/esp-adf.git@v2.5
INFO Updating https://github.com/espressif/esp-tflite-micro@None
INFO Compiling app...
Processing test-media-assistant-v2 (board: esp32-s3-devkitc-1; framework: espidf; platform: platformio/espressif32@5.4.0)
--------------------------------------------------------------------------------
Library Manager: Installing esphome/noise-c @ 0.1.4
INFO Installing esphome/noise-c @ 0.1.4
Unpacking  [####################################]  100%
Library Manager: noise-c@0.1.4 has been installed!
INFO noise-c@0.1.4 has been installed!
Library Manager: Resolving dependencies...
INFO Resolving dependencies...
Library Manager: Installing esphome/libsodium @ 1.10018.1
INFO Installing esphome/libsodium @ 1.10018.1
Unpacking  [####################################]  100%
Library Manager: libsodium@1.10018.1 has been installed!
INFO libsodium@1.10018.1 has been installed!
HARDWARE: ESP32S3 240MHz, 320KB RAM, 16MB Flash
 - framework-espidf @ 3.40407.0 (4.4.7) 
 - tool-cmake @ 3.16.4 
 - tool-ninja @ 1.7.1 
 - toolchain-esp32ulp @ 2.35.0-20220830 
 - toolchain-riscv32-esp @ 8.4.0+2021r2-patch5 
 - toolchain-xtensa-esp32s3 @ 8.4.0+2021r2-patch5
Reading CMake configuration...
Generating assembly for certificate bundle...
Dependency Graph
|-- noise-c @ 0.1.4
Generating assembly for .pioenvs/test-media-assistant-v2/duer_profile.S
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_audio_element.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_audio_process.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_audio_sinks.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_audio_sources.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_pipeline.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_pipeline_controller.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/media_player/adf_media_player.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/microphone/esp_adf_microphone.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_connection.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_frame_helper.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_pb2.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_pb2_service.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_server.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/list_entities.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/proto.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/subscribe_state.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/user_services.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/button/button.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/esp32/core.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/esp32/gpio.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/esp32/preferences.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/esp32_rmt_led_strip/led_strip.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/adf_pipeline/adf_i2s_in.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/adf_pipeline/adf_i2s_out.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/adf_pipeline/i2s_stream_mod.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/external_adc.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/external_dac.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/i2s_audio.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/addressable_light.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/automation.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/esp_color_correction.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/esp_hsv_color.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/esp_range_view.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/light_call.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/light_json_schema.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/light_output.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/light_state.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_esp32.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_esp8266.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_host.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_libretiny.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_rp2040.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/md5/md5.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_component.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_esp32.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_esp8266.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_host.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_libretiny.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_rp2040.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/media_player/media_player.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/micro_wake_word/micro_wake_word.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/network/util.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_arduino_esp32.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_arduino_esp8266.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_arduino_libretiny.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_arduino_rp2040.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_esp_idf.o
.
.
.
.
Linking .pioenvs/test-media-assistant-v2/firmware.elf
/data/cache/platformio/packages/toolchain-xtensa-esp32s3/bin/../lib/gcc/xtensa-esp32s3-elf/8.4.0/../../../../xtensa-esp32s3-elf/bin/ld: missing --end-group; added as last command line option
RAM:   [=         ]  12.0% (used 39348 bytes from 327680 bytes)
Flash: [==        ]  17.8% (used 1444629 bytes from 8126464 bytes)
Building .pioenvs/test-media-assistant-v2/firmware.bin
Creating esp32s3 image...
Successfully created esp32s3 image.
esp32_create_combined_bin([".pioenvs/test-media-assistant-v2/firmware.bin"], [".pioenvs/test-media-assistant-v2/firmware.elf"])
Wrote 0x170c80 bytes to file /data/build/test-media-assistant-v2/.pioenvs/test-media-assistant-v2/firmware-factory.bin, ready to flash to offset 0x0
======================== [SUCCESS] Took 1629.69 seconds ========================
INFO Successfully compiled program.

@imonlinux

imonlinux commented May 17, 2024

Hey strusic,

I haven't used that mic board before, but I am using the same output amp. What kind of speaker are you using? I have tried a few and am having a lot of success with this one running at 50% volume:

https://www.amazon.com/dp/B01CHYIU26

When I get home, I will record a video of it working as a reference and link it here.

edit: included volume level

@strusic

strusic commented May 17, 2024

I am using a speaker from an old Bluetooth speaker. It sounded good when I was using only the I2S media player with an ESP32-S2 mini. I plan to fit all the electronics into the case of that speaker.

@imonlinux

You may want to take a look at gnumpi's config for Shared I2S-Port with Exclusive Access instead of duplex mode.

https://github.com/gnumpi/esphome_audio
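
As a rough sketch of what a shared port could look like with those components: one `i2s_audio` entry that both the output and the input element reference. The ids and pins are placeholders. The `access_mode` option is written from memory of that repository's README and should be treated as an assumption; a device log later in this thread reports `AccessMode: exclusive` for a config that does not set it, so `exclusive` appears to be the default.

```yaml
i2s_audio:
  - id: i2s_shared             # placeholder id, one port for mic and speaker
    i2s_lrclk_pin: GPIO7       # placeholder pins
    i2s_bclk_pin: GPIO16
    # Assumption: exclusive means only one pipeline uses the port at a time.
    # The README describes a duplex mode as the alternative.
    access_mode: exclusive

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO17       # placeholder pin

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO15        # placeholder pin
    pdm: false
    channel: right
    bits_per_sample: 32bit
```

Check the README linked above for the exact option names before copying this.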

@strusic

strusic commented May 18, 2024

I kind of managed to get it working, but it sounds like it's coming from hell

20240518_135917.mp4

@imonlinux

Wow. That's rough. If you changed to exclusive mode, did you include the resampler in the pipeline for the media_player?

media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: s3-dev_media_player
    internal: false
    keep_pipeline_alive: true
    pipeline:
      - self
      - resampler
      - adf_i2s_out

This is mine in duplex mode.

20240518_092752.1.mp4

@HA-TB303

HA-TB303 commented May 30, 2024

Hey, based on the config above I have been cooking up this:

substitutions:
  device_name: "test-01"
  friendly_name: "test-01"
  device_description: "esp32s3"
  esp_board: "esp32-s3-devkitc-1"
  # Phases of the Voice Assistant
  # IDLE: The voice assistant is ready to be triggered by a wake-word
  voice_assist_idle_phase_id: '1'
  # LISTENING: The voice assistant is ready to listen to a voice command (after being triggered by the wake word)
  voice_assist_listening_phase_id: '2'
  # THINKING: The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '3'
  # REPLYING: The voice assistant is replying to the command
  voice_assist_replying_phase_id: '4'
  # NOT_READY: The voice assistant is not ready
  voice_assist_not_ready_phase_id: '10'
  # ERROR: The voice assistant encountered an error
  voice_assist_error_phase_id: '11'
  # MUTED: The voice assistant is muted and will not reply to a wake-word
  voice_assist_muted_phase_id: '12'

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
      #type: local
      #path: /Users/siekmann/Privat/Projects/espHome/esphome_audio/esphome/components
    components: [ adf_pipeline, i2s_audio ]

esphome:
  name: ${device_name}
  comment: ${device_description}
  friendly_name: ${friendly_name}
  min_version: 2024.2.0
  platformio_options:
    build_flags: -DBOARD_HAS_PSRAM
    board_build.flash_mode: dio
    board_upload.maximum_size: 16777216
  on_boot:
    priority: 600
    then:
      # If the device is still initializing after 30 seconds (it has not yet connected to Home Assistant),
      # clear the init_in_progress flag
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
                            
esp32:
  board: ${esp_board}
  variant: ESP32S3
  flash_size: 16MB
  framework:
    type: esp-idf
    sdkconfig_options:
      # need to set a s3 compatible board for the adf-sdk to compile
      # board specific code is not used though
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "512000"
      CONFIG_TCP_RECVMBOX_SIZE: "512"

logger:

globals:
  # Global initialisation variable. Initialized to true and set to false once everything is connected. Only used to have a smooth "plugging" experience
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  # Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}

psram:
  mode: octal
  speed: 80MHz

wifi:
  enable_rrm: true
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: true

ota:
  password: !secret ota_password

api:
  encryption:
    key: !secret api_key

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO7
    i2s_bclk_pin: GPIO16
  - id: i2s_out
    i2s_lrclk_pin: GPIO8
    i2s_bclk_pin: GPIO18

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO17

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO15
    pdm: false
    channel: right
    sample_rate: 16000
    bits_per_sample: 32bit

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    keep_pipeline_alive: true
    pipeline:
      - adf_i2s_in
      - self
      
media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: media_player
    keep_pipeline_alive: true
    internal: false
    pipeline:
      - self
      - resampler
      - adf_i2s_out

micro_wake_word:
  model: okay_nabu
  on_wake_word_detected:
      - media_player.stop:
      - light.turn_on:
          id: led_ring
          blue: 0%
          red: 0%
          green: 100%
          brightness: 75%
          effect: pulse
      - voice_assistant.start:

voice_assistant:
  microphone: adf_microphone
  media_player: adf_media_player
  use_wake_word: false
  #vad_threshold: 3
  #noise_suppression_level: 1
  auto_gain: 31dBFS
  #volume_multiplier: 15.0

  on_client_connected:
    - lambda: id(init_in_progress) = false;
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
          - script.execute: reset_led
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};

  on_client_disconnected:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - voice_assistant.stop
    - micro_wake_word.stop
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 100%
        green: 100%
        brightness: 50%
        effect: connecting

  on_listening:
    - light.turn_on:
        id: led_ring
        blue: 100%
        red: 0%
        green: 0%
        brightness: 25%
        effect: wakeword
        
  on_tts_start:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 0%
        green: 100%
        brightness: 75%
        effect: pulse
  
  on_end:
      then:
        - light.turn_off:
            id: led_ring
        - voice_assistant.stop
        - wait_until:
            not:
              media_player.is_playing:
        - script.execute: reset_led
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - micro_wake_word.start:
  on_error:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
          - script.execute: reset_led
              
script:
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led_ring
                blue: 100%
                red: 0%
                green: 0%
                brightness: 25%
                effect: none
          else:
            - light.turn_off: led_ring
 
button:
  - platform: restart
    id: restart_btn
    name: "${friendly_name} REBOOT"
            
switch:
  - platform: template
    name: Enable Voice Assistant
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    icon: mdi:assistant
    # When the switch is turned on (on Home Assistant):
    # Start the voice assistant component
    # Set the correct phase and run the script to refresh the LED status
    on_turn_on:
      - logger.log: "switch on"
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - logger.log: "condition 1"
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - voice_assistant.stop
            - delay: 1s
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - logger.log: "Starting MWW"
                  #- voice_assistant.start_continuous
                  - micro_wake_word.start:
      - script.execute: reset_led
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - voice_assistant.stop
            - micro_wake_word.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
      - script.execute: reset_led

  - platform: template
    name: Pipeline
    id: pipeline_switch
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF

    on_turn_off:
      - media_player.stop

light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "${friendly_name} Front LED"
    pin: GPIO05
    num_leds: 1
    rmt_channel: 1
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 1
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 1
          add_led_interval: 100ms
          reverse: true

This works extremely well, but just once ;)

The second time, it detects the wake word, starts blinking green, does not process the request and keeps blinking.
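
One way to narrow this down, offered as a debugging sketch rather than a known fix: raise the global log level so the state changes of `voice_assistant` and `micro_wake_word` on the second attempt are logged in more detail, and quiet the `light` tag so the LED messages do not drown them out. This would replace the bare `logger:` entry in the config above. Note that ESPHome only allows per-tag levels that are less verbose than the global level.

```yaml
logger:
  # Global level; VERBOSE adds overhead, so revert to DEBUG once done
  level: VERBOSE
  logs:
    # Per-tag override, must not be more verbose than the global level
    light: INFO
```

Comparing the state transitions of the first and second wake word detection in the resulting log should show where the second run stops.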

LOG (after trying twice)

INFO ESPHome 2024.5.4
[21:08:48][I][app:100]: ESPHome version 2024.5.4 compiled on May 30 2024, 21:05:15
[21:08:48][C][wifi:580]: WiFi:
[21:08:48][C][wifi:408]:   Local MAC: 3C:84:27:CC:42:0C
[21:08:48][C][wifi:413]:   SSID: [redacted]
[21:08:48][C][wifi:416]:   IP Address: 192.168.207.246
[21:08:48][C][wifi:420]:   BSSID: [redacted]
[21:08:48][C][wifi:421]:   Hostname: 'test-01'
[21:08:48][C][wifi:423]:   Signal strength: -60 dB ▂▄▆█
[21:08:48][C][wifi:427]:   Channel: 11
[21:08:48][C][wifi:428]:   Subnet: 255.255.255.0
[21:08:48][C][wifi:429]:   Gateway: 192.168.207.1
[21:08:48][C][wifi:430]:   DNS1: 192.168.207.130
[21:08:48][C][wifi:431]:   DNS2: 0.0.0.0
[21:08:48][C][wifi:433]:   BTM: disabled
[21:08:48][C][wifi:434]:   RRM: enabled
[21:08:48][C][logger:185]: Logger:
[21:08:48][C][logger:186]:   Level: DEBUG
[21:08:48][C][logger:188]:   Log Baud Rate: 115200
[21:08:48][C][logger:189]:   Hardware UART: USB_SERIAL_JTAG
[21:08:48][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[21:08:48][C][esp32_rmt_led_strip:176]:   Pin: 5
[21:08:48][C][esp32_rmt_led_strip:177]:   Channel: 1
[21:08:48][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[21:08:48][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[21:08:48][C][esp32_rmt_led_strip:204]:   Number of LEDs: 1
[21:08:48][C][light:103]: Light 'test-01 Front LED'
[21:08:48][C][light:105]:   Default Transition Length: 0.0s
[21:08:48][C][light:106]:   Gamma Correct: 2.80
[21:08:48][C][template.switch:068]: Template Switch 'Pipeline'
[21:08:48][C][template.switch:091]:   Restore Mode: restore defaults to OFF
[21:08:48][C][template.switch:057]:   Optimistic: YES
[21:08:48][C][template.switch:068]: Template Switch 'Enable Voice Assistant'
[21:08:48][C][template.switch:070]:   Icon: 'mdi:assistant'
[21:08:48][C][template.switch:091]:   Restore Mode: restore defaults to ON
[21:08:48][C][template.switch:057]:   Optimistic: YES
[21:08:48][C][psram:020]: PSRAM:
[21:08:48][C][psram:021]:   Available: YES
[21:08:48][C][psram:024]:   Size: 8191 KB
[21:08:48][C][i2s_audio:028]: I2SController:
[21:08:48][C][i2s_audio:029]:   AccessMode: exclusive
[21:08:48][C][i2s_audio:030]:   Port: 0
[21:08:48][C][i2s_audio:032]:   Reader registered.
[21:08:48][C][i2s_audio:028]: I2SController:
[21:08:48][C][i2s_audio:029]:   AccessMode: exclusive
[21:08:48][C][i2s_audio:030]:   Port: 1
[21:08:48][C][i2s_audio:035]:   Writer registered.
[21:08:48][C][i2s_audio:138]: I2S-Writer (Initial-CFG):
[21:08:48][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[21:08:48][C][i2s_audio:141]:   channel_fmt: 0 channels: 2
[21:08:48][C][i2s_audio:142]:   use_apll: no, use_pdm: no
[21:08:48][C][i2s_audio:135]: I2S-Reader (Initial-CFG):
[21:08:48][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[21:08:48][C][i2s_audio:141]:   channel_fmt: 3 channels: 1
[21:08:48][C][i2s_audio:142]:   use_apll: no, use_pdm: no
[21:08:48][C][restart.button:017]: Restart Button 'test-01 REBOOT'
[21:08:48][C][mdns:115]: mDNS:
[21:08:48][C][mdns:116]:   Hostname: test-01
[21:08:48][C][ota:096]: Over-The-Air Updates:
[21:08:48][C][ota:097]:   Address: test-01.local:3232
[21:08:48][C][ota:100]:   Using Password.
[21:08:48][C][ota:103]:   OTA version: 2.
[21:08:48][C][api:139]: API Server:
[21:08:48][C][api:140]:   Address: test-01.local:6053
[21:08:48][C][api:142]:   Using noise encryption: YES
[21:08:48][C][micro_wake_word:057]: microWakeWord:
[21:08:48][C][micro_wake_word:058]:   Wake Word: okay nabu
[21:08:48][C][micro_wake_word:059]:   Probability cutoff: 0.500
[21:08:48][C][micro_wake_word:060]:   Sliding window size: 10
[21:08:48][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[21:08:48][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[21:08:48][C][adf_media_player:018]:   Number of ASPComponents: 3
[21:08:51][D][api:102]: Accepted 192.168.207.101
[21:08:51][D][api.connection:1321]: Home Assistant 2024.5.5 (192.168.207.101): Connected successfully
[21:08:51][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[21:08:51][D][light:036]: 'test-01 Front LED' Setting:
[21:08:51][D][light:051]:   Brightness: 25%
[21:08:51][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[21:08:51][D][micro_wake_word:115]: Starting Microphone
[21:08:51][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[21:08:51][D][esp-idf:000]: I (10976) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[21:08:52][D][i2s_audio:072]: Installing driver : yes
[21:08:52][D][esp_adf_pipeline:358]: pipeline tag 0, i2s_in
[21:08:52][D][esp_adf_pipeline:358]: pipeline tag 1, pcm_reader
[21:08:52][D][esp-idf:000]: I (10988) AUDIO_PIPELINE: link el->rb, el:0x3d8178b0, tag:i2s_in, rb:0x3d817b8c

[21:08:52][D][esp_adf_pipeline:370]: Setting up event listener.
[21:08:52][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[21:08:52][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:08:52][D][esp_audio_sinks:053]: Set bitdepth to 32
[21:08:52][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[21:08:52][D][esp-idf:000]: I (11005) AUDIO_ELEMENT: [i2s_in-0x3d8178b0] Element task created

[21:08:52][D][esp-idf:000]: I (11007) AUDIO_ELEMENT: [pcm_reader-0x3d817a44] Element task created

[21:08:52][D][esp-idf:000]: I (11009) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8451851 Bytes, Inter:163104 Bytes, Dram:163104 Bytes


[21:08:52][D][esp-idf:000][i2s_in]: I (11012) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[21:08:52][D][esp-idf:000]: I (11015) AUDIO_PIPELINE: Pipeline started

[21:08:52][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[21:08:52][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[21:08:52][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[21:08:52][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[21:08:52][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[21:08:52][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[21:08:52][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[21:09:14][D][micro_wake_word:362]: Wake word sliding average probability is 0.522 and most recent probability is 1.000
[21:09:14][D][micro_wake_word:128]: Wake Word Detected
[21:09:14][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[21:09:14][D][micro_wake_word:134]: Stopping Microphone
[21:09:14][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[21:09:14][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:09:14][D][esp-idf:000][i2s_in]: W (33041) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:14][D][esp-idf:000][i2s_in]: W (33044) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:14][D][esp-idf:000][i2s_in]: W (33047) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:14][D][esp-idf:000][i2s_in]: W (33050) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:14][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[21:09:14][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[21:09:14][D][media_player:061]: 'media_player' - Setting
[21:09:14][D][media_player:065]:   Command: STOP
[21:09:14][D][esp_adf_pipeline:085]: Called 'stop' while in UNINITIALIZED state.
[21:09:14][D][light:036]: 'test-01 Front LED' Setting:
[21:09:14][D][light:051]:   Brightness: 75%
[21:09:14][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[21:09:14][D][light:109]:   Effect: 'Pulse'
[21:09:14][D][voice_assistant:502]: State changed from IDLE to START_MICROPHONE
[21:09:14][D][voice_assistant:508]: Desired state set to START_PIPELINE
[21:09:14][D][voice_assistant:220]: Starting Microphone
[21:09:14][D][ring_buffer:024]: Created ring buffer with size 32768
[21:09:14][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[21:09:14][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[21:09:14][D][voice_assistant:502]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:09:14][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[21:09:14][D][esp-idf:000]: I (33111) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8422123 Bytes, Inter:168276 Bytes, Dram:168276 Bytes


[21:09:14][D][esp-idf:000][i2s_in]: I (33115) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[21:09:14][D][esp-idf:000]: I (33117) AUDIO_PIPELINE: Pipeline started

[21:09:14][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[21:09:14][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[21:09:14][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[21:09:14][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[21:09:14][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[21:09:14][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[21:09:14][D][voice_assistant:502]: State changed from STARTING_MICROPHONE to START_PIPELINE
[21:09:14][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[21:09:14][D][voice_assistant:274]: Requesting start...
[21:09:14][D][voice_assistant:502]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:09:14][D][voice_assistant:523]: Client started, streaming microphone
[21:09:14][D][voice_assistant:502]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[21:09:14][D][voice_assistant:508]: Desired state set to STREAMING_MICROPHONE
[21:09:14][D][voice_assistant:625]: Event Type: 1
[21:09:14][D][voice_assistant:628]: Assist Pipeline running
[21:09:14][D][voice_assistant:625]: Event Type: 3
[21:09:14][D][voice_assistant:639]: STT started
[21:09:14][D][light:036]: 'test-01 Front LED' Setting:
[21:09:14][D][light:051]:   Brightness: 25%
[21:09:14][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[21:09:14][D][light:109]:   Effect: 'Wakeword'
[21:09:14][D][voice_assistant:625]: Event Type: 11
[21:09:14][D][voice_assistant:779]: Starting STT by VAD
[21:09:16][D][voice_assistant:625]: Event Type: 12
[21:09:16][D][voice_assistant:783]: STT by VAD end
[21:09:16][D][voice_assistant:502]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:09:16][D][voice_assistant:508]: Desired state set to AWAITING_RESPONSE
[21:09:16][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[21:09:16][D][voice_assistant:502]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:09:16][D][esp-idf:000][i2s_in]: W (35138) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:16][D][esp-idf:000][i2s_in]: W (35141) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:16][D][esp-idf:000][i2s_in]: W (35146) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:16][D][esp-idf:000][i2s_in]: W (35150) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:16][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[21:09:16][D][voice_assistant:502]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[21:09:16][D][voice_assistant:625]: Event Type: 4
[21:09:16][D][voice_assistant:653]: Speech recognised as: "Zet licht in het kantoor uit?"
[21:09:16][D][voice_assistant:625]: Event Type: 5
[21:09:16][D][voice_assistant:658]: Intent started
[21:09:19][D][voice_assistant:625]: Event Type: 6
[21:09:19][D][voice_assistant:625]: Event Type: 7
[21:09:19][D][voice_assistant:681]: Response: "Het licht in het kantoor is uitgeschakeld."
[21:09:19][D][light:036]: 'test-01 Front LED' Setting:
[21:09:19][D][light:051]:   Brightness: 75%
[21:09:19][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[21:09:19][D][light:109]:   Effect: 'Pulse'
[21:09:19][D][voice_assistant:625]: Event Type: 8
[21:09:19][D][voice_assistant:701]: Response URL: "http://192.168.207.101:8123/api/tts_proxy/80ccb868b538746ae77149736cbb3987d8494075_nl-nl_f57ed9f7cb_tts.home_assistant_cloud.mp3"
[21:09:19][D][voice_assistant:502]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[21:09:19][D][voice_assistant:508]: Desired state set to STREAMING_RESPONSE
[21:09:19][D][media_player:061]: 'media_player' - Setting
[21:09:19][D][media_player:068]:   Media URL: http://192.168.207.101:8123/api/tts_proxy/80ccb868b538746ae77149736cbb3987d8494075_nl-nl_f57ed9f7cb_tts.home_assistant_cloud.mp3
[21:09:19][D][media_player:074]:  Announcement: yes
[21:09:19][D][adf_media_player:030]: Got control call in state 1
[21:09:19][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[21:09:19][D][esp-idf:000]: I (38335) MP3_DECODER: MP3 init

[21:09:19][D][esp-idf:000]: I (38339) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4

[21:09:19][D][i2s_audio:072]: Installing driver : yes
[21:09:19][D][esp_adf_pipeline:358]: pipeline tag 0, http
[21:09:19][D][esp_adf_pipeline:358]: pipeline tag 1, decoder
[21:09:19][D][esp_adf_pipeline:358]: pipeline tag 2, resampler
[21:09:19][D][esp_adf_pipeline:358]: pipeline tag 3, i2s_out
[21:09:19][D][esp-idf:000]: I (38356) AUDIO_PIPELINE: link el->rb, el:0x3d820ed8, tag:http, rb:0x3d8215a8

[21:09:19][D][esp-idf:000]: I (38358) AUDIO_PIPELINE: link el->rb, el:0x3d8210dc, tag:decoder, rb:0x3d8225e8

[21:09:19][D][esp-idf:000]: I (38361) AUDIO_PIPELINE: link el->rb, el:0x3d821278, tag:resampler, rb:0x3d823628

[21:09:19][D][esp_adf_pipeline:370]: Setting up event listener.
[21:09:19][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[21:09:19][I][adf_media_player:135]: got new pipeline state: 1
[21:09:19][D][adf_i2s_out:127]: Set final i2s settings: 16000
[21:09:19][D][esp_audio_processors:079]: New settings: SRC: rate: 16000, ch: 2 DST: rate: 16000, ch: 2 
[21:09:19][W][component:237]: Component voice_assistant took a long time for an operation (55 ms).
[21:09:19][W][component:238]: Components should block for at most 30 ms.
[21:09:19][D][voice_assistant:625]: Event Type: 2
[21:09:19][D][voice_assistant:715]: Assist Pipeline ended
[21:09:19][D][esp-idf:000]: I (38392) AUDIO_THREAD: The http task allocate stack on external memory

[21:09:19][D][esp-idf:000]: I (38394) AUDIO_ELEMENT: [http-0x3d820ed8] Element task created

[21:09:19][D][esp-idf:000]: I (38398) AUDIO_THREAD: The decoder task allocate stack on external memory

[21:09:19][D][esp-idf:000]: I (38401) AUDIO_ELEMENT: [decoder-0x3d8210dc] Element task created

[21:09:19][D][esp-idf:000][http]: I (38404) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[21:09:19][D][esp-idf:000][decoder]: I (38407) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[21:09:19][D][esp_aud:000]d]: 
ERROR Fatal error: protocol.data_received() call failed.
protocol: <aioesphomeapi._frame_helper.noise.APINoiseFrameHelper object at 0x7fa4dc080670>
transport: <_SelectorSocketTransport fd=6 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/selector_events.py", line 1009, in _read_ready__data_received
    self._protocol.data_received(data)
  File "aioesphomeapi/_frame_helper/noise.py", line 136, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 163, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 319, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper._handle_frame
  File "/usr/local/lib/python3.11/dist-packages/noise/state.py", line 74, in decrypt_with_ad
    plaintext = self.cipher.decrypt(self.k, self.n, ad, ciphertext)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/noise/backends/default/ciphers.py", line 13, in decrypt
    return self.cipher.decrypt(nonce=self.format_nonce(n), data=ciphertext, associated_data=ad)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/chacha20poly1305_reuseable/__init__.py", line 127, in chacha20poly1305_reuseable.ChaCha20Poly1305Reusable.decrypt
  File "src/chacha20poly1305_reuseable/__init__.py", line 147, in chacha20poly1305_reuseable.ChaCha20Poly1305Reusable.decrypt
  File "src/chacha20poly1305_reuseable/__init__.py", line 263, in chacha20poly1305_reuseable._decrypt_with_fixed_nonce_len
  File "src/chacha20poly1305_reuseable/__init__.py", line 273, in chacha20poly1305_reuseable._decrypt_data
cryptography.exceptions.InvalidTag
WARNING test-01 @ 192.168.207.246: Connection error occurred: test-01 @ 192.168.207.246: Invalid encryption key: received_name=test-01
INFO Processing unexpected disconnect from ESPHome API for test-01 @ 192.168.207.246
WARNING Disconnected from API
INFO Successfully connected to test-01 @ 192.168.207.246 in 0.004s
INFO Successful handshake with test-01 @ 192.168.207.246 in 0.105s
[21:09:19][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[21:09:19][D][esp_adf_pipeline:131]: Check element [http] status, 2
[21:09:19][I][esp_adf_pipeline:214]: [ resampler ] status: 12
[21:09:19][D][esp_adf_pipeline:131]: Check element [http] status, 2
[21:09:19][D][esp-idf:000][http]: I (38612) HTTP_CLIENT: Body received in fetch header state, 0x3fcc511b, 1841

[21:09:19][D][esp-idf:000][http]: I (38617) HTTP_STREAM: total_bytes=23199

[21:09:19][I][esp_adf_pipeline:214]: [ http ] status: 12
[21:09:19][D][esp_adf_pipeline:131]: Check element [http] status, 3
[21:09:19][D][esp_adf_pipeline:131]: Check element [decoder] status, 2
[21:09:19][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[21:09:19][D][esp_adf_pipeline:131]: Check element [http] status, 3
[21:09:19][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[21:09:19][D][esp_adf_pipeline:131]: Check element [resampler] status, 3
[21:09:19][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[21:09:19][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[21:09:19][I][adf_media_player:135]: got new pipeline state: 3
[21:09:19][D][adf_i2s_out:127]: Set final i2s settings: 24000
[21:09:19][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 24000, ch: 1 
[21:09:19][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[21:09:19][D][adf_i2s_out:127]: Set final i2s settings: 24000
[21:09:19][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 24000, ch: 1 
[21:09:22][D][esp-idf:000][http]: W (41118) HTTP_STREAM: No more data,errno:0, total_bytes:23199, rlen = 0

[21:09:22][D][esp-idf:000][http]: I (41123) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[21:09:22][I][esp_adf_pipeline:214]: [ http ] status: 15
[21:09:22][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[21:09:22][I][adf_media_player:135]: got new pipeline state: 4
[21:09:22][D][esp-idf:000][decoder]: I (41799) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[21:09:23][D][esp-idf:000][decoder]: I (42161) MP3_DECODER: Closed

[21:09:23][D][esp-idf:000][resampler]: I (42267) AUDIO_ELEMENT: IN-[resampler] AEL_IO_DONE,-2

[21:09:23][D][esp-idf:000][i2s_out]: I (42330) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2

[21:09:23][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[21:09:23][I][adf_media_player:135]: got new pipeline state: 5
[21:09:23][D][light:036]: 'test-01 Front LED' Setting:
[21:09:23][D][light:047]:   State: ON
[21:09:23][D][light:051]:   Brightness: 25%
[21:09:23][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[21:09:23][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[21:09:23][D][micro_wake_word:115]: Starting Microphone
[21:09:23][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[21:09:23][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[21:09:23][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:09:23][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[21:09:23][D][esp-idf:000]: I (42537) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8376627 Bytes, Inter:152808 Bytes, Dram:152808 Bytes


[21:09:23][D][esp-idf:000][i2s_in]: I (42539) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[21:09:23][D][esp-idf:000]: I (42543) AUDIO_PIPELINE: Pipeline started

[21:09:23][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[21:09:23][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[21:09:23][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[21:09:23][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[21:09:23][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[21:09:23][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[21:09:23][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[21:09:23][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[21:09:25][D][micro_wake_word:362]: Wake word sliding average probability is 0.585 and most recent probability is 1.000
[21:09:25][D][micro_wake_word:128]: Wake Word Detected
[21:09:25][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[21:09:25][D][micro_wake_word:134]: Stopping Microphone
[21:09:25][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[21:09:25][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:09:25][D][esp-idf:000][i2s_in]: W (44496) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:25][D][esp-idf:000][i2s_in]: W (44499) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:25][D][esp-idf:000][i2s_in]: W (44503) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:25][D][esp-idf:000][i2s_in]: W (44506) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:25][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[21:09:25][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[21:09:25][D][media_player:061]: 'media_player' - Setting
[21:09:25][D][media_player:065]:   Command: STOP
[21:09:25][D][esp_adf_pipeline:085]: Called 'stop' while in STOPPED state.
[21:09:25][D][light:036]: 'test-01 Front LED' Setting:
[21:09:25][D][light:051]:   Brightness: 75%
[21:09:25][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[21:09:25][D][light:109]:   Effect: 'Pulse'

Does anyone know what's going on here?
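
One observation from the log above: after the second "Wake Word Detected" at 21:09:25, the media_player STOP and the LED change are logged, but no `voice_assistant` state change follows, unlike the first detection at 21:09:14. A minimal diagnostic sketch, assuming the `micro_wake_word` block from the posted config, would log whether `voice_assistant` still reports itself as running at that moment. The log messages are made up for illustration, and this is not a confirmed fix:

```yaml
# Diagnostic sketch only: replaces the on_wake_word_detected handler
# of the posted config (LED actions omitted for brevity).
micro_wake_word:
  model: okay_nabu
  on_wake_word_detected:
    - if:
        condition:
          voice_assistant.is_running:
        then:
          # Hypothetical message: the previous run may not have returned to idle
          - logger.log: "Wake word detected while voice_assistant is still running"
        else:
          - logger.log: "Wake word detected, voice_assistant is idle"
    - voice_assistant.start:
```

If the first message shows up on the second attempt, the previous run probably never went back to idle, which would narrow down where to look next.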

indevor commented May 30, 2024

Hey, based on the config above I have been cooking this:

substitutions:
  device_name: "test-01"
  friendly_name: "test-01"
  device_description: "esp32s3"
  esp_board: "esp32-s3-devkitc-1"
  # Phases of the Voice Assistant
  # IDLE: The voice assistant is ready to be triggered by a wake-word
  voice_assist_idle_phase_id: '1'
  # LISTENING: The voice assistant is ready to listen to a voice command (after being triggered by the wake word)
  voice_assist_listening_phase_id: '2'
  # THINKING: The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '3'
  # REPLYING: The voice assistant is replying to the command
  voice_assist_replying_phase_id: '4'
  # NOT_READY: The voice assistant is not ready
  voice_assist_not_ready_phase_id: '10'
  # ERROR: The voice assistant encountered an error
  voice_assist_error_phase_id: '11'
  # MUTED: The voice assistant is muted and will not reply to a wake-word
  voice_assist_muted_phase_id: '12'

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
      #type: local
      #path: /Users/siekmann/Privat/Projects/espHome/esphome_audio/esphome/components
    components: [ adf_pipeline, i2s_audio ]

esphome:
  name: ${device_name}
  comment: ${device_description}
  friendly_name: ${friendly_name}
  min_version: 2024.2.0
  platformio_options:
    build_flags: -DBOARD_HAS_PSRAM
    board_build.flash_mode: dio
    board_upload.maximum_size: 16777216
  on_boot:
    priority: 600
    then:
      # If the device is still initializing after 30 seconds (it has not yet connected to Home Assistant), clear the init_in_progress flag
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
                            
esp32:
  board: ${esp_board}
  variant: ESP32S3
  flash_size: 16MB
  framework:
    type: esp-idf
    sdkconfig_options:
      # need to set a s3 compatible board for the adf-sdk to compile
      # board specific code is not used though
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "512000"
      CONFIG_TCP_RECVMBOX_SIZE: "512"

logger:

globals:
  # Global initialisation variable. Initialized to true and set to false once everything is connected. Only used to have a smooth "plugging" experience
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  # Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}

psram:
  mode: octal
  speed: 80MHz

wifi:
  enable_rrm: true
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: true

ota:
  password: !secret ota_password

api:
  encryption:
    key: !secret api_key

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO7
    i2s_bclk_pin: GPIO16
  - id: i2s_out
    i2s_lrclk_pin: GPIO8
    i2s_bclk_pin: GPIO18

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO17

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO15
    pdm: false
    channel: right
    sample_rate: 16000
    bits_per_sample: 32bit

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    keep_pipeline_alive: true
    pipeline:
      - adf_i2s_in
      - self
      
media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: media_player
    keep_pipeline_alive: true
    internal: false
    pipeline:
      - self
      - resampler
      - adf_i2s_out

micro_wake_word:
  model: okay_nabu
  on_wake_word_detected:
      - media_player.stop:
      - light.turn_on:
          id: led_ring
          blue: 0%
          red: 0%
          green: 100%
          brightness: 75%
          effect: pulse
      - voice_assistant.start:

voice_assistant:
  microphone: adf_microphone
  media_player: adf_media_player
  use_wake_word: false
  #vad_threshold: 3
  #noise_suppression_level: 1
  auto_gain: 31dBFS
  #volume_multiplier: 15.0

  on_client_connected:
    - lambda: id(init_in_progress) = false;
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
          - script.execute: reset_led
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};

  on_client_disconnected:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - voice_assistant.stop
    - micro_wake_word.stop
    - light.turn_on:
          id: led_ring
          blue: 0%
          red: 100%
          green: 100%
          brightness: 50%
          effect: connecting

  on_listening:
    - light.turn_on:
        id: led_ring
        blue: 100%
        red: 0%
        green: 0%
        brightness: 25%
        effect: wakeword
        
  on_tts_start:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 0%
        green: 100%
        brightness: 75%
        effect: pulse
  
  on_end:
      then:
        - light.turn_off:
            id: led_ring
        - voice_assistant.stop
        - wait_until:
            not:
              media_player.is_playing:
        - script.execute: reset_led
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - micro_wake_word.start:
  on_error:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
          - script.execute: reset_led
              
script:
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led_ring
                blue: 100%
                red: 0%
                green: 0%
                brightness: 25%
                effect: none
          else:
            - light.turn_off: led_ring
 
button:
  - platform: restart
    id: restart_btn
    name: "${friendly_name} REBOOT"
            
switch:
  - platform: template
    name: Enable Voice Assistant
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    icon: mdi:assistant
    # When the switch is turned on (in Home Assistant):
    # Stop any running voice assistant session, then start wake-word detection
    # Set the correct phase and run the script to refresh the LED status
    on_turn_on:
      - logger.log: "switch on"
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - logger.log: "condition 1"
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - voice_assistant.stop
            - delay: 1s
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - logger.log: "Starting MWW"
                  #- voice_assistant.start_continuous
                  - micro_wake_word.start:
      - script.execute: reset_led
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - voice_assistant.stop
            - micro_wake_word.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
      - script.execute: reset_led

  - platform: template
    name: Pipeline
    id: pipeline_switch
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF

    on_turn_off:
      - media_player.stop

light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "${friendly_name} Front LED"
    pin: GPIO05
    num_leds: 1
    rmt_channel: 1
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 1
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 1
          add_led_interval: 100ms
          reverse: true

This works extremely well, but just once ;)

The second time, it detects the wake word and starts blinking green, but it never processes the request and just keeps blinking.
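
To see where the second run stalls, one option is to log every `voice_assistant` trigger. The fragment below is a sketch under assumptions, not a fix: it only adds triggers that the config above does not define yet, and the message strings are arbitrary. It is meant to be merged into the existing `voice_assistant:` block rather than pasted as a second one:

```yaml
# Debug sketch: merge these triggers into the existing voice_assistant: block.
voice_assistant:
  on_start:
    - logger.log: "VA on_start"
  on_stt_end:
    # x holds the recognised text
    - logger.log:
        format: "VA on_stt_end: %s"
        args: ['x.c_str()']
  on_tts_end:
    # x holds the URL of the TTS response
    - logger.log:
        format: "VA on_tts_end: %s"
        args: ['x.c_str()']
```

A `logger.log` line can be added the same way as the first action of the existing `on_end` and `on_error` handlers. If none of these messages appear on the second attempt, the pipeline is not being started at all.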

LOG (after trying twice)

INFO ESPHome 2024.5.4
[21:08:48][I][app:100]: ESPHome version 2024.5.4 compiled on May 30 2024, 21:05:15
[21:08:48][C][wifi:580]: WiFi:
[21:08:48][C][wifi:408]:   Local MAC: 3C:84:27:CC:42:0C
[21:08:48][C][wifi:413]:   SSID: [redacted]
[21:08:48][C][wifi:416]:   IP Address: 192.168.207.246
[21:08:48][C][wifi:420]:   BSSID: [redacted]
[21:08:48][C][wifi:421]:   Hostname: 'test-01'
[21:08:48][C][wifi:423]:   Signal strength: -60 dB ▂▄▆█
[21:08:48][C][wifi:427]:   Channel: 11
[21:08:48][C][wifi:428]:   Subnet: 255.255.255.0
[21:08:48][C][wifi:429]:   Gateway: 192.168.207.1
[21:08:48][C][wifi:430]:   DNS1: 192.168.207.130
[21:08:48][C][wifi:431]:   DNS2: 0.0.0.0
[21:08:48][C][wifi:433]:   BTM: disabled
[21:08:48][C][wifi:434]:   RRM: enabled
[21:08:48][C][logger:185]: Logger:
[21:08:48][C][logger:186]:   Level: DEBUG
[21:08:48][C][logger:188]:   Log Baud Rate: 115200
[21:08:48][C][logger:189]:   Hardware UART: USB_SERIAL_JTAG
[21:08:48][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[21:08:48][C][esp32_rmt_led_strip:176]:   Pin: 5
[21:08:48][C][esp32_rmt_led_strip:177]:   Channel: 1
[21:08:48][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[21:08:48][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[21:08:48][C][esp32_rmt_led_strip:204]:   Number of LEDs: 1
[21:08:48][C][light:103]: Light 'test-01 Front LED'
[21:08:48][C][light:105]:   Default Transition Length: 0.0s
[21:08:48][C][light:106]:   Gamma Correct: 2.80
[21:08:48][C][template.switch:068]: Template Switch 'Pipeline'
[21:08:48][C][template.switch:091]:   Restore Mode: restore defaults to OFF
[21:08:48][C][template.switch:057]:   Optimistic: YES
[21:08:48][C][template.switch:068]: Template Switch 'Enable Voice Assistant'
[21:08:48][C][template.switch:070]:   Icon: 'mdi:assistant'
[21:08:48][C][template.switch:091]:   Restore Mode: restore defaults to ON
[21:08:48][C][template.switch:057]:   Optimistic: YES
[21:08:48][C][psram:020]: PSRAM:
[21:08:48][C][psram:021]:   Available: YES
[21:08:48][C][psram:024]:   Size: 8191 KB
[21:08:48][C][i2s_audio:028]: I2SController:
[21:08:48][C][i2s_audio:029]:   AccessMode: exclusive
[21:08:48][C][i2s_audio:030]:   Port: 0
[21:08:48][C][i2s_audio:032]:   Reader registered.
[21:08:48][C][i2s_audio:028]: I2SController:
[21:08:48][C][i2s_audio:029]:   AccessMode: exclusive
[21:08:48][C][i2s_audio:030]:   Port: 1
[21:08:48][C][i2s_audio:035]:   Writer registered.
[21:08:48][C][i2s_audio:138]: I2S-Writer (Initial-CFG):
[21:08:48][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[21:08:48][C][i2s_audio:141]:   channel_fmt: 0 channels: 2
[21:08:48][C][i2s_audio:142]:   use_apll: no, use_pdm: no
[21:08:48][C][i2s_audio:135]: I2S-Reader (Initial-CFG):
[21:08:48][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[21:08:48][C][i2s_audio:141]:   channel_fmt: 3 channels: 1
[21:08:48][C][i2s_audio:142]:   use_apll: no, use_pdm: no
[21:08:48][C][restart.button:017]: Restart Button 'test-01 REBOOT'
[21:08:48][C][mdns:115]: mDNS:
[21:08:48][C][mdns:116]:   Hostname: test-01
[21:08:48][C][ota:096]: Over-The-Air Updates:
[21:08:48][C][ota:097]:   Address: test-01.local:3232
[21:08:48][C][ota:100]:   Using Password.
[21:08:48][C][ota:103]:   OTA version: 2.
[21:08:48][C][api:139]: API Server:
[21:08:48][C][api:140]:   Address: test-01.local:6053
[21:08:48][C][api:142]:   Using noise encryption: YES
[21:08:48][C][micro_wake_word:057]: microWakeWord:
[21:08:48][C][micro_wake_word:058]:   Wake Word: okay nabu
[21:08:48][C][micro_wake_word:059]:   Probability cutoff: 0.500
[21:08:48][C][micro_wake_word:060]:   Sliding window size: 10
[21:08:48][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[21:08:48][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[21:08:48][C][adf_media_player:018]:   Number of ASPComponents: 3
[21:08:51][D][api:102]: Accepted 192.168.207.101
[21:08:51][D][api.connection:1321]: Home Assistant 2024.5.5 (192.168.207.101): Connected successfully
[21:08:51][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[21:08:51][D][light:036]: 'test-01 Front LED' Setting:
[21:08:51][D][light:051]:   Brightness: 25%
[21:08:51][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[21:08:51][D][micro_wake_word:115]: Starting Microphone
[21:08:51][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[21:08:51][D][esp-idf:000]: I (10976) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4

[21:08:52][D][i2s_audio:072]: Installing driver : yes
[21:08:52][D][esp_adf_pipeline:358]: pipeline tag 0, i2s_in
[21:08:52][D][esp_adf_pipeline:358]: pipeline tag 1, pcm_reader
[21:08:52][D][esp-idf:000]: I (10988) AUDIO_PIPELINE: link el->rb, el:0x3d8178b0, tag:i2s_in, rb:0x3d817b8c

[21:08:52][D][esp_adf_pipeline:370]: Setting up event listener.
[21:08:52][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[21:08:52][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:08:52][D][esp_audio_sinks:053]: Set bitdepth to 32
[21:08:52][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[21:08:52][D][esp-idf:000]: I (11005) AUDIO_ELEMENT: [i2s_in-0x3d8178b0] Element task created

[21:08:52][D][esp-idf:000]: I (11007) AUDIO_ELEMENT: [pcm_reader-0x3d817a44] Element task created

[21:08:52][D][esp-idf:000]: I (11009) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8451851 Bytes, Inter:163104 Bytes, Dram:163104 Bytes


[21:08:52][D][esp-idf:000][i2s_in]: I (11012) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[21:08:52][D][esp-idf:000]: I (11015) AUDIO_PIPELINE: Pipeline started

[21:08:52][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[21:08:52][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[21:08:52][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[21:08:52][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[21:08:52][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[21:08:52][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[21:08:52][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[21:09:14][D][micro_wake_word:362]: Wake word sliding average probability is 0.522 and most recent probability is 1.000
[21:09:14][D][micro_wake_word:128]: Wake Word Detected
[21:09:14][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[21:09:14][D][micro_wake_word:134]: Stopping Microphone
[21:09:14][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[21:09:14][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:09:14][D][esp-idf:000][i2s_in]: W (33041) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:14][D][esp-idf:000][i2s_in]: W (33044) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:14][D][esp-idf:000][i2s_in]: W (33047) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:14][D][esp-idf:000][i2s_in]: W (33050) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:14][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[21:09:14][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[21:09:14][D][media_player:061]: 'media_player' - Setting
[21:09:14][D][media_player:065]:   Command: STOP
[21:09:14][D][esp_adf_pipeline:085]: Called 'stop' while in UNINITIALIZED state.
[21:09:14][D][light:036]: 'test-01 Front LED' Setting:
[21:09:14][D][light:051]:   Brightness: 75%
[21:09:14][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[21:09:14][D][light:109]:   Effect: 'Pulse'
[21:09:14][D][voice_assistant:502]: State changed from IDLE to START_MICROPHONE
[21:09:14][D][voice_assistant:508]: Desired state set to START_PIPELINE
[21:09:14][D][voice_assistant:220]: Starting Microphone
[21:09:14][D][ring_buffer:024]: Created ring buffer with size 32768
[21:09:14][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[21:09:14][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[21:09:14][D][voice_assistant:502]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:09:14][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[21:09:14][D][esp-idf:000]: I (33111) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8422123 Bytes, Inter:168276 Bytes, Dram:168276 Bytes


[21:09:14][D][esp-idf:000][i2s_in]: I (33115) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[21:09:14][D][esp-idf:000]: I (33117) AUDIO_PIPELINE: Pipeline started

[21:09:14][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[21:09:14][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[21:09:14][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[21:09:14][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[21:09:14][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[21:09:14][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[21:09:14][D][voice_assistant:502]: State changed from STARTING_MICROPHONE to START_PIPELINE
[21:09:14][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[21:09:14][D][voice_assistant:274]: Requesting start...
[21:09:14][D][voice_assistant:502]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:09:14][D][voice_assistant:523]: Client started, streaming microphone
[21:09:14][D][voice_assistant:502]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[21:09:14][D][voice_assistant:508]: Desired state set to STREAMING_MICROPHONE
[21:09:14][D][voice_assistant:625]: Event Type: 1
[21:09:14][D][voice_assistant:628]: Assist Pipeline running
[21:09:14][D][voice_assistant:625]: Event Type: 3
[21:09:14][D][voice_assistant:639]: STT started
[21:09:14][D][light:036]: 'test-01 Front LED' Setting:
[21:09:14][D][light:051]:   Brightness: 25%
[21:09:14][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[21:09:14][D][light:109]:   Effect: 'Wakeword'
[21:09:14][D][voice_assistant:625]: Event Type: 11
[21:09:14][D][voice_assistant:779]: Starting STT by VAD
[21:09:16][D][voice_assistant:625]: Event Type: 12
[21:09:16][D][voice_assistant:783]: STT by VAD end
[21:09:16][D][voice_assistant:502]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:09:16][D][voice_assistant:508]: Desired state set to AWAITING_RESPONSE
[21:09:16][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[21:09:16][D][voice_assistant:502]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:09:16][D][esp-idf:000][i2s_in]: W (35138) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:16][D][esp-idf:000][i2s_in]: W (35141) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:16][D][esp-idf:000][i2s_in]: W (35146) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:16][D][esp-idf:000][i2s_in]: W (35150) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:16][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[21:09:16][D][voice_assistant:502]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[21:09:16][D][voice_assistant:625]: Event Type: 4
[21:09:16][D][voice_assistant:653]: Speech recognised as: "Zet licht in het kantoor uit?"
[21:09:16][D][voice_assistant:625]: Event Type: 5
[21:09:16][D][voice_assistant:658]: Intent started
[21:09:19][D][voice_assistant:625]: Event Type: 6
[21:09:19][D][voice_assistant:625]: Event Type: 7
[21:09:19][D][voice_assistant:681]: Response: "Het licht in het kantoor is uitgeschakeld."
[21:09:19][D][light:036]: 'test-01 Front LED' Setting:
[21:09:19][D][light:051]:   Brightness: 75%
[21:09:19][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[21:09:19][D][light:109]:   Effect: 'Pulse'
[21:09:19][D][voice_assistant:625]: Event Type: 8
[21:09:19][D][voice_assistant:701]: Response URL: "http://192.168.207.101:8123/api/tts_proxy/80ccb868b538746ae77149736cbb3987d8494075_nl-nl_f57ed9f7cb_tts.home_assistant_cloud.mp3"
[21:09:19][D][voice_assistant:502]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[21:09:19][D][voice_assistant:508]: Desired state set to STREAMING_RESPONSE
[21:09:19][D][media_player:061]: 'media_player' - Setting
[21:09:19][D][media_player:068]:   Media URL: http://192.168.207.101:8123/api/tts_proxy/80ccb868b538746ae77149736cbb3987d8494075_nl-nl_f57ed9f7cb_tts.home_assistant_cloud.mp3
[21:09:19][D][media_player:074]:  Announcement: yes
[21:09:19][D][adf_media_player:030]: Got control call in state 1
[21:09:19][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[21:09:19][D][esp-idf:000]: I (38335) MP3_DECODER: MP3 init

[21:09:19][D][esp-idf:000]: I (38339) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4

[21:09:19][D][i2s_audio:072]: Installing driver : yes
[21:09:19][D][esp_adf_pipeline:358]: pipeline tag 0, http
[21:09:19][D][esp_adf_pipeline:358]: pipeline tag 1, decoder
[21:09:19][D][esp_adf_pipeline:358]: pipeline tag 2, resampler
[21:09:19][D][esp_adf_pipeline:358]: pipeline tag 3, i2s_out
[21:09:19][D][esp-idf:000]: I (38356) AUDIO_PIPELINE: link el->rb, el:0x3d820ed8, tag:http, rb:0x3d8215a8

[21:09:19][D][esp-idf:000]: I (38358) AUDIO_PIPELINE: link el->rb, el:0x3d8210dc, tag:decoder, rb:0x3d8225e8

[21:09:19][D][esp-idf:000]: I (38361) AUDIO_PIPELINE: link el->rb, el:0x3d821278, tag:resampler, rb:0x3d823628

[21:09:19][D][esp_adf_pipeline:370]: Setting up event listener.
[21:09:19][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[21:09:19][I][adf_media_player:135]: got new pipeline state: 1
[21:09:19][D][adf_i2s_out:127]: Set final i2s settings: 16000
[21:09:19][D][esp_audio_processors:079]: New settings: SRC: rate: 16000, ch: 2 DST: rate: 16000, ch: 2 
[21:09:19][W][component:237]: Component voice_assistant took a long time for an operation (55 ms).
[21:09:19][W][component:238]: Components should block for at most 30 ms.
[21:09:19][D][voice_assistant:625]: Event Type: 2
[21:09:19][D][voice_assistant:715]: Assist Pipeline ended
[21:09:19][D][esp-idf:000]: I (38392) AUDIO_THREAD: The http task allocate stack on external memory

[21:09:19][D][esp-idf:000]: I (38394) AUDIO_ELEMENT: [http-0x3d820ed8] Element task created

[21:09:19][D][esp-idf:000]: I (38398) AUDIO_THREAD: The decoder task allocate stack on external memory

[21:09:19][D][esp-idf:000]: I (38401) AUDIO_ELEMENT: [decoder-0x3d8210dc] Element task created

[21:09:19][D][esp-idf:000][http]: I (38404) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[21:09:19][D][esp-idf:000][decoder]: I (38407) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[21:09:19][D][esp_aud:000]d]: 
ERROR Fatal error: protocol.data_received() call failed.
protocol: <aioesphomeapi._frame_helper.noise.APINoiseFrameHelper object at 0x7fa4dc080670>
transport: <_SelectorSocketTransport fd=6 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/selector_events.py", line 1009, in _read_ready__data_received
    self._protocol.data_received(data)
  File "aioesphomeapi/_frame_helper/noise.py", line 136, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 163, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper.data_received
  File "aioesphomeapi/_frame_helper/noise.py", line 319, in aioesphomeapi._frame_helper.noise.APINoiseFrameHelper._handle_frame
  File "/usr/local/lib/python3.11/dist-packages/noise/state.py", line 74, in decrypt_with_ad
    plaintext = self.cipher.decrypt(self.k, self.n, ad, ciphertext)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/noise/backends/default/ciphers.py", line 13, in decrypt
    return self.cipher.decrypt(nonce=self.format_nonce(n), data=ciphertext, associated_data=ad)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/chacha20poly1305_reuseable/__init__.py", line 127, in chacha20poly1305_reuseable.ChaCha20Poly1305Reusable.decrypt
  File "src/chacha20poly1305_reuseable/__init__.py", line 147, in chacha20poly1305_reuseable.ChaCha20Poly1305Reusable.decrypt
  File "src/chacha20poly1305_reuseable/__init__.py", line 263, in chacha20poly1305_reuseable._decrypt_with_fixed_nonce_len
  File "src/chacha20poly1305_reuseable/__init__.py", line 273, in chacha20poly1305_reuseable._decrypt_data
cryptography.exceptions.InvalidTag
WARNING test-01 @ 192.168.207.246: Connection error occurred: test-01 @ 192.168.207.246: Invalid encryption key: received_name=test-01
INFO Processing unexpected disconnect from ESPHome API for test-01 @ 192.168.207.246
WARNING Disconnected from API
INFO Successfully connected to test-01 @ 192.168.207.246 in 0.004s
INFO Successful handshake with test-01 @ 192.168.207.246 in 0.105s
[21:09:19][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[21:09:19][D][esp_adf_pipeline:131]: Check element [http] status, 2
[21:09:19][I][esp_adf_pipeline:214]: [ resampler ] status: 12
[21:09:19][D][esp_adf_pipeline:131]: Check element [http] status, 2
[21:09:19][D][esp-idf:000][http]: I (38612) HTTP_CLIENT: Body received in fetch header state, 0x3fcc511b, 1841

[21:09:19][D][esp-idf:000][http]: I (38617) HTTP_STREAM: total_bytes=23199

[21:09:19][I][esp_adf_pipeline:214]: [ http ] status: 12
[21:09:19][D][esp_adf_pipeline:131]: Check element [http] status, 3
[21:09:19][D][esp_adf_pipeline:131]: Check element [decoder] status, 2
[21:09:19][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[21:09:19][D][esp_adf_pipeline:131]: Check element [http] status, 3
[21:09:19][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[21:09:19][D][esp_adf_pipeline:131]: Check element [resampler] status, 3
[21:09:19][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[21:09:19][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[21:09:19][I][adf_media_player:135]: got new pipeline state: 3
[21:09:19][D][adf_i2s_out:127]: Set final i2s settings: 24000
[21:09:19][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 24000, ch: 1 
[21:09:19][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=24000, bits=16, ch=1
[21:09:19][D][adf_i2s_out:127]: Set final i2s settings: 24000
[21:09:19][D][esp_audio_processors:079]: New settings: SRC: rate: 24000, ch: 1 DST: rate: 24000, ch: 1 
[21:09:22][D][esp-idf:000][http]: W (41118) HTTP_STREAM: No more data,errno:0, total_bytes:23199, rlen = 0

[21:09:22][D][esp-idf:000][http]: I (41123) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[21:09:22][I][esp_adf_pipeline:214]: [ http ] status: 15
[21:09:22][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[21:09:22][I][adf_media_player:135]: got new pipeline state: 4
[21:09:22][D][esp-idf:000][decoder]: I (41799) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[21:09:23][D][esp-idf:000][decoder]: I (42161) MP3_DECODER: Closed

[21:09:23][D][esp-idf:000][resampler]: I (42267) AUDIO_ELEMENT: IN-[resampler] AEL_IO_DONE,-2

[21:09:23][D][esp-idf:000][i2s_out]: I (42330) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2

[21:09:23][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[21:09:23][I][adf_media_player:135]: got new pipeline state: 5
[21:09:23][D][light:036]: 'test-01 Front LED' Setting:
[21:09:23][D][light:047]:   State: ON
[21:09:23][D][light:051]:   Brightness: 25%
[21:09:23][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[21:09:23][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[21:09:23][D][micro_wake_word:115]: Starting Microphone
[21:09:23][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[21:09:23][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[21:09:23][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:09:23][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[21:09:23][D][esp-idf:000]: I (42537) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8376627 Bytes, Inter:152808 Bytes, Dram:152808 Bytes


[21:09:23][D][esp-idf:000][i2s_in]: I (42539) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[21:09:23][D][esp-idf:000]: I (42543) AUDIO_PIPELINE: Pipeline started

[21:09:23][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[21:09:23][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[21:09:23][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[21:09:23][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[21:09:23][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[21:09:23][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[21:09:23][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[21:09:23][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[21:09:25][D][micro_wake_word:362]: Wake word sliding average probability is 0.585 and most recent probability is 1.000
[21:09:25][D][micro_wake_word:128]: Wake Word Detected
[21:09:25][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[21:09:25][D][micro_wake_word:134]: Stopping Microphone
[21:09:25][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[21:09:25][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:09:25][D][esp-idf:000][i2s_in]: W (44496) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:25][D][esp-idf:000][i2s_in]: W (44499) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:25][D][esp-idf:000][i2s_in]: W (44503) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:25][D][esp-idf:000][i2s_in]: W (44506) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[21:09:25][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[21:09:25][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[21:09:25][D][media_player:061]: 'media_player' - Setting
[21:09:25][D][media_player:065]:   Command: STOP
[21:09:25][D][esp_adf_pipeline:085]: Called 'stop' while in STOPPED state.
[21:09:25][D][light:036]: 'test-01 Front LED' Setting:
[21:09:25][D][light:051]:   Brightness: 75%
[21:09:25][D][light:059]:   Red: 0%, Green: 100%, Blue: 0%
[21:09:25][D][light:109]:   Effect: 'Pulse'

Does anyone know what's going on here?

Try this:

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: dev-next
    
    components: [ adf_pipeline, i2s_audio ]
    refresh: 0s

@HA-TB303

That seems to have solved the issue. Thank you!

@AapoTahkola

Regarding high-pitched sounds during playback: clicks are typically caused by signal integrity issues, since these are relatively high-speed signals. This is especially likely if the distortion stops when you touch the high-speed lines with your hand or move things around. You should be able to solve such issues by using better-quality Dupont wires with stiffer spring contacts, soldering the Dupont wires directly without the plastic housings, twisting the high-speed lines together with GND, or even wrapping grounded tinfoil around the signal wires. Another thing I noticed: the INMP441 can pick up a lot of noise if its PCB touches a vibrating surface. A good solution is probably to use some polyester fiber, like pillow stuffing, to make sure the MEMS microphone does not touch any hard surfaces in your casing.

Longer pauses, such as those over 300 ms, are most likely caused by buffering issues, which is another story altogether.

@Wetzel402

I'm trying to build this project with a Wemos D1 mini, MAX98357, ICS43434, and Adafruit 1314 speaker. I'm having trouble getting the project to build, and when I have gotten it to build, I get constant static/crackle on the speaker. Any help or advice would be greatly appreciated. Thanks!

@EverythingSmartHome
Author

I resolved this (thanks to another community member) by pulling the speaker pin low on boot using a simple output. I haven't heard the crackle since.

@Wetzel402

Wetzel402 commented Nov 20, 2024

Can I ask how? I find this helps but doesn't eliminate it...

esphome:
  name: ${device_name}
  friendly_name: ${friendly_name}
  on_boot:
    - priority: -100
      then:
        - output.turn_off: speaker_pin #suppress crackling during boot

speaker:
  - platform: i2s_audio
    id: spk
    dac_type: external
    i2s_dout_pin:
      number: ${din}
      allow_other_uses: true
    i2s_audio_id: i2s
    mode: mono

output:
  - platform: gpio
    pin: 
      number: ${din}
      allow_other_uses: true
    id: speaker_pin

Thanks!

Edit: I find I get crackling and static until the device gets to DETECTING_WAKE_WORD state.

Edit2: I actually found separating out lrclk and bclk resolved it for me.

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: ${lrclk_mic}
    i2s_bclk_pin: ${bclk_mic}
  - id: i2s_out
    i2s_lrclk_pin: ${lrclk_amp}
    i2s_bclk_pin: ${bclk_amp}

@rkhanso

rkhanso commented Nov 27, 2024

I'm getting an error when installing the .yaml on my ESP32. I've pared it down to the bare minimum to try to isolate the issue.
When I do a basic, bare-bones install of the ESP-WROOM-32 into Home Assistant using ESPHome, it works fine: I can see the Wi-Fi connecting on boot. But with the bare-bones .yaml, that's all it does.
Then I started adding parts of the .yaml above. I first tried changing the framework from Arduino to esp-idf by adding only this section:

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

and trying to install. But I get this error, which my sub-novice brain doesn't know how to troubleshoot:
INFO ESPHome 2024.11.1
INFO Reading configuration /config/esphome/voice-assistant.yaml...
INFO Generating C++ source...
INFO Compiling app...
Processing voice-assistant (board: esp32dev; framework: espidf; platform: platformio/espressif32@5.4.0)

HARDWARE: ESP32 240MHz, 320KB RAM, 4MB Flash

  • framework-espidf @ 3.40408.0 (4.4.8)
  • tool-cmake @ 3.16.9
  • tool-ninja @ 1.10.2
  • toolchain-esp32ulp @ 2.35.0-20220830
  • toolchain-xtensa-esp32 @ 8.4.0+2021r2-patch5
    Reading CMake configuration...
    -- Component directory /data/cache/platformio/packages/framework-espidf/components/expat does not contain a CMakeLists.txt file. No component will be added
    -- Building ESP-IDF components for target esp32
    -- Configuring incomplete, errors occurred!
    See also "/data/build/voice-assistant/.pioenvs/voice-assistant/CMakeFiles/CMakeOutput.log".

fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
CMake Error at /data/cache/platformio/packages/framework-espidf/tools/cmake/build.cmake:201 (message):
Failed to resolve component 'freertos'.
Call Stack (most recent call first):
/data/cache/platformio/packages/framework-espidf/tools/cmake/build.cmake:236 (__build_resolve_and_add_req)
/data/cache/platformio/packages/framework-espidf/tools/cmake/build.cmake:237 (__build_expand_requirements)
/data/cache/platformio/packages/framework-espidf/tools/cmake/build.cmake:518 (__build_expand_requirements)
/data/cache/platformio/packages/framework-espidf/tools/cmake/project.cmake:476 (idf_build_process)
CMakeLists.txt:3 (project)
========================= [FAILED] Took 12.33 seconds =========================


@pkkrusty

pkkrusty commented Jan 1, 2025

I'm having success with this setup, but noticing that the INMP441/ESP32S3 doesn't seem to be multiplying the volume of the mic even when "volume_multiplier: 15.0" is uncommented. I've tried 4x, all the way to 128x and the audio files that are passed to HA sound the same (that is, very low). This affects HA's ability to correctly do STT. I find that if I manually multiply the recorded sample, it is not at all distorted and relatively high quality, so there seems to be plenty of headroom to increase the mic volume, I just can't make the setup actually do it.

Any ideas? This is really the final sticking point for me being able to use these regularly in the house.
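
For anyone who wants to repeat the offline check described above (manually multiplying a recorded sample to see how much headroom it has), here is a minimal sketch. It assumes the recording is 16-bit signed PCM. `scale_sample` and `apply_gain` are hypothetical helper names made up for this example, not ESPHome functions, and this is not a claim about how `volume_multiplier` is implemented in the firmware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: scale one 16-bit PCM sample by `multiplier`,
// saturating at the int16 limits instead of wrapping around.
int16_t scale_sample(int16_t sample, float multiplier) {
  assert(multiplier >= 0.0f);
  const float lo = static_cast<float>(std::numeric_limits<int16_t>::min());
  const float hi = static_cast<float>(std::numeric_limits<int16_t>::max());
  const float scaled = static_cast<float>(sample) * multiplier;
  return static_cast<int16_t>(std::max(lo, std::min(hi, scaled)));
}

// Hypothetical helper: apply the same gain to a whole buffer in place.
void apply_gain(std::vector<int16_t> &samples, float multiplier) {
  for (auto &s : samples) {
    s = scale_sample(s, multiplier);
  }
}
```

If a quiet recording only starts to saturate at large multipliers, that supports the observation that there is plenty of headroom in the raw microphone data.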

@kylevidrine

I have been trying to get a voice assistant setup on an esp32 for months now with the ability to also have the media player. Is this possible? I also would like to use the same pin for both speaker and media player

@Wetzel402

> I have been trying to get a voice assistant setup on an esp32 for months now with the ability to also have the media player. Is this possible? I also would like to use the same pin for both speaker and media player

It is possible with the nabu media player...

substitutions:
  # Phases of the Voice Assistant
  # The voice assistant is ready to be triggered by a wake word
  voice_assist_idle_phase_id: '1'
  # The voice assistant is waiting for a voice command (after being triggered by the wake word)
  voice_assist_waiting_for_command_phase_id: '2'
  # The voice assistant is listening for a voice command
  voice_assist_listening_for_command_phase_id: '3'
  # The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '4'
  # The voice assistant is replying to the command
  voice_assist_replying_phase_id: '5'
  # The voice assistant is not ready
  voice_assist_not_ready_phase_id: '10'
  # The voice assistant encountered an error
  voice_assist_error_phase_id: '11'

  device_name: "assist-media-player-kitchen"
  friendly_name: "assist-media-player-kitchen"
  device_description: "assist-media-player-kitchen"
  din: "GPIO11"   # MAX98357A - DIN
  lrclk_mic: "GPIO15" # ICS43434 - LRCL
  lrclk_amp: "GPIO9" # MAX98357A - LRC
  bclk_mic: "GPIO6"  # ICS43434 - BCLK
  bclk_amp: "GPIO10"  # MAX98357A - BCLK
  dout: "GPIO7"  # ICS43434 - DOUT
  # ICS43434 - SEL low = left, high = right
  duck: "30" # db reduction for audio ducking
  vol: "90%" # volume level at boot
  pwr: "GPIO37" # charger output
  tablet_battery: "sensor.kitchen_tablet_battery_level"
  
external_components:
  - source:
      type: git
      url: https://github.com/esphome/voice-kit
      ref: dev
    components:
      - media_player
      - micro_wake_word
      - microphone
      - nabu
      - voice_assistant
    refresh: 0s
  - source:
      type: git
      url: https://github.com/formatBCE/home-assistant-voice-pe
      ref: 48kHz_mic_support
    components:
      - nabu_microphone
    refresh: 0s

esphome:
  name: ${device_name}
  friendly_name: ${friendly_name}
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
    priority: 375
    then:
      - media_player.volume_set: ${vol}
      - delay: 10min
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;

esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  flash_size: 8MB
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB: "y"
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_SPIRAM_ALLOW_STACK_EXTERNAL_MEMORY: "y"

      CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP: "y"

      # Settings based on https://github.com/espressif/esp-adf/issues/297#issuecomment-783811702
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_ESP32_WIFI_STATIC_TX_BUFFER: "y"
      CONFIG_ESP32_WIFI_TX_BUFFER_TYPE: "0"
      CONFIG_ESP32_WIFI_STATIC_TX_BUFFER_NUM: "8"
      CONFIG_ESP32_WIFI_CACHE_TX_BUFFER_NUM: "32"
      CONFIG_ESP32_WIFI_AMPDU_TX_ENABLED: "y"
      CONFIG_ESP32_WIFI_TX_BA_WIN: "16"
      CONFIG_ESP32_WIFI_AMPDU_RX_ENABLED: "y"
      CONFIG_ESP32_WIFI_RX_BA_WIN: "32"
      CONFIG_LWIP_MAX_ACTIVE_TCP: "16"
      CONFIG_LWIP_MAX_LISTENING_TCP: "16"
      CONFIG_TCP_MAXRTX: "12"
      CONFIG_TCP_SYNMAXRTX: "6"
      CONFIG_TCP_MSS: "1436"
      CONFIG_TCP_MSL: "60000"
      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "65535"  # Adjusted from linked settings to avoid compilation error
      CONFIG_TCP_RECVMBOX_SIZE: "512"
      CONFIG_TCP_QUEUE_OOSEQ: "y"
      CONFIG_TCP_OVERSIZE_MSS: "y"
      CONFIG_LWIP_WND_SCALE: "y"
      CONFIG_TCP_RCV_SCALE: "3"
      CONFIG_LWIP_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST: "y"
      CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY: "y"
  
psram:
  mode: quad # quad for N8R2 and octal for N16R8
  speed: 80MHz

globals:
  # Global initialization variable. Initialized to true and set to false once everything is connected. Only used to have a smooth "plugging" experience
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  # Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}
  # Global variable storing the first active timer
  - id: first_active_timer
    type: voice_assistant::Timer
    restore_value: false
  # Global variable storing the timer finished TTS
  - id: timer_tts_str
    type: std::string
    restore_value: false

logger:
    
captive_portal:

web_server:

api:
  encryption:
    key: !secret assist_kitchen_api

ota:
  - platform: esphome
    password: "36e011a812d79f51cb886ff8d171eda2"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  ap:
    ssid: "Esp32-Mic-Speaker"
    password: "9vYvAFzzPjuc"

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: ${lrclk_mic}
    i2s_bclk_pin: ${bclk_mic}
  - id: i2s_out
    i2s_lrclk_pin: ${lrclk_amp}
    i2s_bclk_pin: ${bclk_amp}

microphone:
 - platform: nabu_microphone
   i2s_din_pin: ${dout}
   adc_type: external
   pdm: false
   sample_rate: 48000
   bits_per_sample: 32bit
   i2s_audio_id: i2s_in
   channel_0:
     id: mic0
   channel_1:
     id: mic1

speaker:
  - platform: i2s_audio
    id: spk
    sample_rate: 48000
    i2s_dout_pin: ${din}
    bits_per_sample: 32bit
    i2s_audio_id: i2s_out
    dac_type: external
    channel: mono
    timeout: never
    buffer_duration: 100ms

media_player:
  - platform: nabu
    id: nabu_media_player
    name: Media Player
    internal: false
    speaker:
    sample_rate: 48000
    volume_increment: 0.05
    volume_min: 0.4
    volume_max: 1
    on_announcement:
      - nabu.set_ducking:
          decibel_reduction: ${duck}
          duration: 0.0s
    on_state:
      if:
        condition:
          and:
            - switch.is_off: timer_ringing
            - not:
                voice_assistant.is_running:
            - not:
                lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
        then:
          - nabu.set_ducking:
              decibel_reduction: 0
              duration: 1.0s

    files:
      - id: timer_finished_sound
        file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/timer_finished.flac
      - id: wake_word_triggered_sound
        file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/wake_word_triggered.flac
      - id: jack_connected_sound
        file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/jack_connected.flac
      - id: jack_disconnected_sound
        file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/jack_disconnected.flac

micro_wake_word:
  id: mww
  microphone: mic0
  models:
    - model: hey_jarvis
      id: hey_jarvis
    - model: https://github.com/kahrendt/microWakeWord/releases/download/stop/stop.json
      id: stop
      internal: true
  vad:
  on_wake_word_detected:
    # If a timer is ringing: Stop it, do not start the voice assistant (We can stop timer from voice!)
    - if:
        condition:
          switch.is_on: timer_ringing
        then:
          - switch.turn_off: timer_ringing
        # Start voice assistant, stop current announcement.
        else:
          - if:
              condition:
                lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
              then:
                lambda: |-
                  id(nabu_media_player)
                    ->make_call()
                    .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
                    .set_announcement(true)
                    .perform();
              else:
                - script.execute:
                    id: play_sound
                    priority: true
                    sound_file: !lambda return id(wake_word_triggered_sound);
                - delay: 500ms
                - voice_assistant.start:

voice_assistant:
  id: va
  microphone: mic1
  media_player: nabu_media_player
  micro_wake_word: mww
  noise_suppression_level: 1
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  on_client_connected:
    - lambda: id(init_in_progress) = false;
    - script.execute:
        id: play_sound
        priority: true
        sound_file: !lambda return id(jack_connected_sound);
    - micro_wake_word.start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
  on_client_disconnected:
    - voice_assistant.stop:
    - script.execute:
        id: play_sound
        priority: true
        sound_file: !lambda return id(jack_disconnected_sound);
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
  on_error:
    - if:
        condition:
          and:
            - lambda: return !id(init_in_progress);
            - lambda: return code != "duplicate_wake_up_detected";
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
  # When the voice assistant starts: Play a wake up sound, duck audio.
  on_start:
    - nabu.set_ducking:
        decibel_reduction: ${duck}   # Number of dB quieter; higher implies more quiet, 0 implies full volume
        duration: 0.0s          # The duration of the transition (default is 0)
  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_waiting_for_command_phase_id};
  on_stt_vad_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_for_command_phase_id};
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
  on_tts_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    # Start a script that would potentially enable the stop word if the response is longer than a second
    - script.execute: activate_stop_word_if_tts_step_is_long
  # When the voice assistant ends ...
  on_end:
    - wait_until:
        not:
          voice_assistant.is_running:
    # Stop ducking audio.
    - nabu.set_ducking:
        decibel_reduction: 0   # 0 dB means no reduction
        duration: 1.0s
    # Stop the script that would potentially enable the stop word if the response is longer than a second
    - script.stop: activate_stop_word_if_tts_step_is_long
    # Disable the stop word (If the timer is not ringing)
    - if:
        condition:
          switch.is_off: timer_ringing
        then:
          - lambda: id(stop).disable();
    # If the end happened because of an error, let the error phase on for a second
    - if:
        condition:
          lambda: return id(voice_assistant_phase) == ${voice_assist_error_phase_id};
        then:
          - delay: 1s
    # Reset the voice assistant phase id and reset the LED animations.
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
  on_timer_finished:
    - switch.turn_on: timer_ringing
  
button:
  - platform: restart
    id: restart_btn
    name: "${friendly_name} REBOOT"

switch:
  # Internal switch to track when a timer is ringing on the device.
  - platform: template
    id: timer_ringing
    optimistic: true
    internal: true
    restore_mode: ALWAYS_OFF
    on_turn_off:
      # Disable stop wake word
      - lambda: id(stop).disable();
      # Stop any current announcement (i.e. stop the timer ring mid-playback)
      - if:
          condition:
            lambda: return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
          then:
            lambda: |-
              id(nabu_media_player)
                ->make_call()
                .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
                .set_announcement(true)
                .perform();
      # Set back ducking ratio to zero
      - nabu.set_ducking:
          decibel_reduction: 0
          duration: 1.0s
    on_turn_on:
      # Duck audio
      - nabu.set_ducking:
          decibel_reduction: ${duck}
          duration: 0.0s
      # Enable stop wake word
      - lambda: id(stop).enable();
      # Ring timer
      - script.execute: ring_timer
      # If 15 minutes have passed and the timer is still ringing, stop it.
      - delay: 15min
      - switch.turn_off: timer_ringing
  # switch to charge tablet
  - platform: gpio
    name: ${device_name} charger
    pin: ${pwr}
    id: pwr
    icon: mdi:charging_station      

sensor:
  - platform: homeassistant
    entity_id: ${tablet_battery}
    id: battery
    internal: true
    filters:
      - heartbeat: 10s
    on_value:
      - if:
          condition:
              - lambda: "return id(battery).state < 30;" # '30' is the lower level, change if needed
          then:
            - switch.turn_on: pwr
          else:
            - if:
                condition:
                  - lambda: "return id(battery).state > 80 ;" # '80' is the upper level, change if needed
                then:
                  - switch.turn_off: pwr

script:
  # Script executed when the timer is ringing, to playback sounds.
  - id: ring_timer
    then:
      - script.execute:
          id: timer_tts
      - while:
          condition:
            switch.is_on: timer_ringing
          then: # this is required to play the output on a media player
            - script.execute:
                id: play_sound
                priority: true
                sound_file: !lambda return id(timer_finished_sound);
            - delay: 4s
            - homeassistant.service:
                service: assist_satellite.announce
                data:
                  entity_id: assist_satellite.assist_media_player_kitchen_assist_satellite
                  message: !lambda return id(timer_tts_str);
            - wait_until:
                lambda: |-
                  return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;
            - wait_until:
                not:
                  lambda: |-
                    return id(nabu_media_player)->state == media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING;

  # Script executed when we want to play sounds on the device.
  - id: play_sound
    parameters:
      priority: bool
      sound_file: "media_player::MediaFile*"
    then:
      - lambda: |-
          if (priority) {
            id(nabu_media_player)
              ->make_call()
              .set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_STOP)
              .set_announcement(true)
              .perform();
          }
          if ( (id(nabu_media_player).state != media_player::MediaPlayerState::MEDIA_PLAYER_STATE_ANNOUNCING ) || priority) {
            id(nabu_media_player)
              ->make_call()
              .set_announcement(true)
              .set_local_media_file(sound_file)
              .perform();
          }

  # Script used to fetch the first active timer (Stored in global first_active_timer)
  - id: fetch_first_active_timer
    then:
      - lambda: |
          const auto timers = id(va).get_timers();
          auto output_timer = timers.begin()->second;
          for (auto &iterable_timer : timers) {
            if (iterable_timer.second.is_active && iterable_timer.second.seconds_left <= output_timer.seconds_left) {
              output_timer = iterable_timer.second;
            }
          }
          id(first_active_timer) = output_timer;
    
  # Script used to create TTS string for timer
  - id: timer_tts
    then:
      - script.execute:
          id: fetch_first_active_timer
      - lambda: |-
          std::string finished_timer_name = id(first_active_timer).name;
          int total_seconds = id(first_active_timer).total_seconds;
          // Variables for hours, minutes, and seconds
          int hours = total_seconds / 3600;
          int minutes = (total_seconds % 3600) / 60;
          int seconds = total_seconds % 60;
          std::string finished_timer_duration;
          std::string result;
          // If more than 1 hour
          if (hours > 0) {
            finished_timer_duration = std::to_string(hours) + " hour" + (hours > 1 ? "s" : "");
            if (minutes > 0) {
              finished_timer_duration += " " + std::to_string(minutes) + " minute" + (minutes > 1 ? "s" : "");
            }
            if (seconds > 0) {
              finished_timer_duration += " " + std::to_string(seconds) + " second" + (seconds > 1 ? "s" : "");
            }
          }
          // If less than 1 hour but more than 1 minute
          else if (minutes > 0) {
            finished_timer_duration = std::to_string(minutes) + " minute" + (minutes > 1 ? "s" : "");
            if (seconds > 0) {
              finished_timer_duration += " " + std::to_string(seconds) + " second" + (seconds > 1 ? "s" : "");
            }
          }
          // If less than 1 minute
          else {
            finished_timer_duration = std::to_string(seconds) + " second" + (seconds > 1 ? "s" : "");
          }
          // Construct the final message
          if (finished_timer_name.empty()) {
            result = finished_timer_duration + " timer finished";
          }
          else {
            result = finished_timer_name + " timer finished";
          }
          id(timer_tts_str) = result;

  # Script used to activate the stop word if the TTS step is long.
  # Why is this wrapped in a script?
  #   Because we want to stop the sequence if the TTS step is faster than that.
  #   This allows us to prevent having the deactivation of the stop word before its own activation.
  - id: activate_stop_word_if_tts_step_is_long
    then:
      - delay: 1s
      # Enable stop wake word
      - lambda: id(stop).enable();

It requires an ESP32-S3.

For my build I'm also using a MAX98357A with 2 Adafruit ICS43434. There should be enough GPIO to run everything on separate pins which is also necessary for proper operation.
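
The `timer_tts` script in the config above builds the spoken "timer finished" message inside a lambda, which makes it awkward to test. Below is a minimal standalone sketch of the same idea that can be compiled and checked on a PC. `format_duration` and `timer_finished_message` are hypothetical names chosen for this example. Unlike the lambda, it pluralizes every count other than 1 (so it says "0 seconds"), and it puts a space before "timer finished".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turn a number of seconds into text such as
// "1 hour 5 minutes". Zero-valued parts are skipped, except that a
// duration under one minute always reports its seconds.
std::string format_duration(int total_seconds) {
  assert(total_seconds >= 0);
  const int hours = total_seconds / 3600;
  const int minutes = (total_seconds % 3600) / 60;
  const int seconds = total_seconds % 60;

  std::string out;
  auto append = [&out](int n, const char *unit) {
    if (!out.empty()) {
      out += " ";
    }
    out += std::to_string(n) + " " + unit + (n == 1 ? "" : "s");
  };

  if (hours > 0) append(hours, "hour");
  if (minutes > 0) append(minutes, "minute");
  if (seconds > 0 || out.empty()) append(seconds, "second");
  return out;
}

// Hypothetical helper: use the timer name if there is one,
// otherwise fall back to the formatted duration.
std::string timer_finished_message(const std::string &name, int total_seconds) {
  const std::string subject = name.empty() ? format_duration(total_seconds) : name;
  return subject + " timer finished";
}
```

Keeping the string logic in plain functions like this makes it easy to try edge cases (unnamed timers, exact hours) before pasting the result back into a lambda.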

@indevor

indevor commented Jan 18, 2025

> > I have been trying to get a voice assistant setup on an esp32 for months now with the ability to also have the media player. Is this possible? I also would like to use the same pin for both speaker and media player
>
> It is possible with the nabu media player...
>
> For my build I'm also using a MAX98357A with 2 Adafruit ICS43434. There should be enough GPIO to run everything on separate pins which is also necessary for proper operation.

Thank you for sharing the code. I built this setup on an ESP32-S3 + INMP441 + PCM5102A and it works. If someone wants to replicate it, you need 2 microphones connected in parallel. One is set to the left channel (its L/R pin is pulled to ground); the other is set to the right channel (its L/R pin is pulled to the 3.3 V supply).

However, I noticed that the gain_log2 function is not implemented on the nabu_microphone platform, so the overall volume of the recorded voice is low. As a result, you need to speak much louder from the same distance than before. Also, if you listen to the recorded voice, it is barely audible.

Do you know of any way to boost the volume beforehand within ESPHome?
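
To illustrate why the two microphones wired in parallel need opposite L/R settings: both share one I2S data line, and each one drives it only during its own channel slot, so the receiver sees interleaved frames. The sketch below splits such a buffer into two mono streams. It assumes the left sample comes first in each frame and that samples are held in 32-bit words. `split_channels` is a hypothetical name, and this is not taken from the `nabu_microphone` source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: split an interleaved stereo buffer
// (L0, R0, L1, R1, ...) into separate left and right buffers.
std::pair<std::vector<int32_t>, std::vector<int32_t>> split_channels(
    const std::vector<int32_t> &interleaved) {
  // Every frame must contain exactly one left and one right sample.
  assert(interleaved.size() % 2 == 0);

  std::vector<int32_t> left;
  std::vector<int32_t> right;
  left.reserve(interleaved.size() / 2);
  right.reserve(interleaved.size() / 2);

  for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
    left.push_back(interleaved[i]);       // microphone with L/R tied to GND
    right.push_back(interleaved[i + 1]);  // microphone with L/R tied to 3.3 V
  }
  return std::make_pair(left, right);
}
```

If both microphones were set to the same channel, they would drive the data line in the same slot and the other slot would stay empty.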

@indevor

indevor commented Jan 18, 2025

> I'm having success with this setup, but noticing that the INMP441/ESP32S3 doesn't seem to be multiplying the volume of the mic even when "volume_multiplier: 15.0" is uncommented. I've tried 4x, all the way to 128x and the audio files that are passed to HA sound the same (that is, very low). This affects HA's ability to correctly do STT. I find that if I manually multiply the recorded sample, it is not at all distorted and relatively high quality, so there seems to be plenty of headroom to increase the mic volume, I just can't make the setup actually do it.
>
> Any ideas? This is really the final sticking point for me being able to use these regularly in the house.

I think it should be implemented in the microphone domain. It was done in the fork https://github.com/gnumpi/esphome_audio with the gain_log2 setting, and the recorded sound was louder. However, this is not present in nabu_microphone.
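
For context, a rough sketch of what a setting named `gain_log2` suggests. The assumption here, based only on the name and not on the fork's source, is that a value of n multiplies each sample by 2^n. Under that assumption each step adds about 6 dB. `apply_log2_gain` and `log2_gain_to_db` are hypothetical names for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: a log2 gain of n means "multiply the sample by 2^n".
// The 16-bit input is widened to 32 bits so the result cannot overflow
// for gains below 16.
int32_t apply_log2_gain(int16_t sample, int gain_log2) {
  assert(gain_log2 >= 0 && gain_log2 < 16);
  return static_cast<int32_t>(sample) * (int32_t{1} << gain_log2);
}

// Convert a power-of-two gain into decibels: 20 * log10(2^n).
double log2_gain_to_db(int gain_log2) {
  return 20.0 * std::log10(std::pow(2.0, gain_log2));
}
```

For example, a log2 gain of 3 would be an 8x multiplier, or roughly 18 dB, under this assumption.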

@indevor

indevor commented Jan 18, 2025

Maybe @formatBCE could look at implementing gain_log2 (microphone gain) if he has the time and desire, since we are using his implementation (nabu_microphone). But I don't know how to mention him in this discussion or how to file a feature request on his GitHub.
