@EverythingSmartHome
Last active August 24, 2024 04:04
ESP32 & ESPHome Voice Assistant
esphome:
  name: esp32-mic-speaker
  friendly_name: esp32-mic-speaker
  on_boot:
    - priority: -100
      then:
        - wait_until: api.connected
        - delay: 1s
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - voice_assistant.start_continuous:

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended

# Enable logging
logger:

# Enable Home Assistant API
api:

ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32-Mic-Speaker"
    password: "9vYvAFzzPjuc"

i2s_audio:
  i2s_lrclk_pin: GPIO27
  i2s_bclk_pin: GPIO26

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO13
    pdm: false

speaker:
  - platform: i2s_audio
    id: big_speaker
    dac_type: external
    i2s_dout_pin: GPIO25
    mode: mono

voice_assistant:
  microphone: mic
  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  speaker: big_speaker
  id: assist

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);

@Sn00kiT

Sn00kiT commented Feb 16, 2024

Has anyone managed to fix the crackling noise? As soon as I switch the wake word option on, the speaker starts to make this crackling noise. Changing the bitrate didn't help. I tried another ESP32 board/mic/amp, but the problem stays the same.

Change the dout pin on the speaker. I have mine on GPIO12.

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO26
  - id: i2s_out
    i2s_lrclk_pin: GPIO32
    i2s_bclk_pin: GPIO13

microphone:
  platform: i2s_audio 
  id: external_microphone 
  adc_type: external 
  i2s_audio_id: i2s_in
  i2s_din_pin: GPIO34
  pdm: false
  bits_per_sample: 32bit


speaker:
  platform: i2s_audio 
  id: external_speaker 
  dac_type: external
  i2s_audio_id: i2s_out
  i2s_dout_pin: GPIO12
  mode: mono 

Thank you, that did the trick, and also thank you for the post after that (-: !!!

@joshfedo

@rich33584
That config seems to be working for me. I'm still hitting some issues with TTS getting cut short or the wake word getting picked up by STT. But all in all it's actually working better.

Here's a wire mapping for anyone who is going to try it out:
Mapping:

ESP32 (WROOM-32)   MAX98357A (Speaker)   I2S Microphone
GPIO33             DIN                   -
GPIO12             LRCLK                 -
GPIO13             BCLK                  -
GPIO34             -                     SD
GPIO25             -                     WS
GPIO26             -                     SCK
3.3V               VDD                   VDD
GND                GND                   GND
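
For reference, the wire mapping above could translate into ESPHome pin settings roughly as follows. This is a minimal sketch, assuming the same two-bus layout as the config quoted earlier in the thread; the IDs (`i2s_in`, `i2s_out`, `external_microphone`, `external_speaker`) are placeholders borrowed from that config, and only the pin numbers come from the table.

```yaml
# Sketch only: pin numbers are taken from the wiring table above; IDs are placeholders.
i2s_audio:
  - id: i2s_in               # microphone bus
    i2s_lrclk_pin: GPIO25    # mic WS
    i2s_bclk_pin: GPIO26     # mic SCK
  - id: i2s_out              # speaker bus
    i2s_lrclk_pin: GPIO12    # MAX98357A LRCLK
    i2s_bclk_pin: GPIO13     # MAX98357A BCLK

microphone:
  - platform: i2s_audio
    id: external_microphone
    adc_type: external
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO34      # mic SD (GPIO34 is input-only, which is fine for data in)
    pdm: false
    bits_per_sample: 32bit

speaker:
  - platform: i2s_audio
    id: external_speaker
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO33     # MAX98357A DIN
    mode: mono
```

Note that GPIO12 is a boot strapping pin on the classic ESP32, so if the board refuses to boot with the amplifier attached, moving LRCLK to another free pin may be worth a try.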

@rich33584

@rich33584 That config seems to be working for me. I'm still hitting some issues with TTS getting cut short or the wake word getting picked up by STT. But all in all it's actually working better.

I seem to have spells of hours where it works perfectly, and then hours where the TTS is broken and crappy.
Not sure if it's a wifi issue or if the ESP32 is just barely adequate to run these. I ordered some "better" ESP32 modules. I'll report back on them.

@Sn00kiT

Sn00kiT commented Feb 27, 2024

Have you seen this?
https://beta.esphome.io/components/micro_wake_word.html

Has anyone had luck with an ESP32-S3? The device seems more capable of handling the workload.
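
For anyone curious what the linked component looks like in a config, here is a minimal sketch. It assumes the option names documented around ESPHome 2024.2 (`model`, `on_wake_word_detected`); later releases changed some of these, so check the linked page for your version. The component requires the esp-idf framework, and `mic_i2s` is a placeholder for the ID of your own microphone.

```yaml
# Sketch only, assuming ESPHome 2024.2-era syntax; option names may differ in newer releases.
micro_wake_word:
  model: okay_nabu            # one of the wake word models shipped with ESPHome at the time
  on_wake_word_detected:
    - voice_assistant.start:  # hand over to the assist pipeline once the wake word is heard

voice_assistant:
  microphone: mic_i2s         # placeholder ID; use your own microphone ID
```

With on-device detection the audio is not streamed to Home Assistant continuously, which may reduce the load that seems to cause trouble on the plain ESP32.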

In case someone needs a working config without crackling speakers, in correct formatting:

esphome:
  name: ha-mic-speaker01
  friendly_name: ha-mic-speaker01

esp32:
  board: esp32dev
  framework:
    type: esp-idf

# Enable logging
logger:

web_server:

# Enable Home Assistant API
api:
  encryption:
    key: "paste_your_key_here"

ota:
  password: "paste_your_key_here"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Ha-Mic-Speaker01"
    password: "paste_your_key_here"

captive_portal:

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26   #WS / LRC
    i2s_bclk_pin: GPIO25    #SCK /BCLK

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: right
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO14    #SD

speaker:
  - platform: i2s_audio
    id: my_speaker
    dac_type: external
    i2s_dout_pin: GPIO12   #DIN 
    mode: mono
    i2s_audio_id: i2s_in


voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  use_wake_word: false
  speaker: my_speaker
  
  on_error: 
   - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - switch.turn_off: use_wake_word
          - switch.turn_on: use_wake_word      

  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:

  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
  

binary_sensor:
  - platform: status
    name: API Connection
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:


switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);

@jake8796

PS D:\Projects\LocalVoiceAssistant> esphome run voiceAssistantESP32S3.yaml
INFO ESPHome 2024.2.2
INFO Reading configuration voiceAssistantESP32S3.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
Failed config

media_player.i2s_audio: [source voiceAssistantESP32S3.yaml:74]

This feature is only available with frameworks ['arduino'].
platform: i2s_audio
name: esp_speaker
id: media_player_speaker
dac_type: external
i2s_audio_id: i2s_out
i2s_dout_pin: GPIO16
mode: mono

Any ideas why I might be getting this error?

@LumCrafter

INFO Resolving IP address of esp32-mic-speaker.local
ERROR Error resolving IP address of esp32-mic-speaker.local. Is it connected to WiFi?
ERROR (If this error persists, please set a static IP address: https://esphome.io/components/wifi.html#manual-ips)
ERROR Error resolving IP address: Error resolving address with mDNS: Did not respond. Maybe the device is offline., [Errno -5] No address associated with hostname

I keep getting this error. What do I do?
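
The error message itself suggests setting a static IP address. A minimal sketch of what that could look like is below; the three addresses are example values only and must be replaced with ones that fit your own network.

```yaml
# Sketch only: the addresses are examples, not values from this device.
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.0.144   # pick a free address outside your router's DHCP range
    gateway: 192.168.0.1       # usually the router's address
    subnet: 255.255.255.0
```

If the device is online but mDNS simply does not resolve on your machine, passing the address directly may also help, for example `esphome run your-config.yaml --device 192.168.0.144` (file name and address are placeholders).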

@Xornop

Xornop commented Mar 27, 2024

My setup has a screeching sound instead of a crackle when playing back a response. I'm using the setup posted above, but with different GPIO pins.

Any ideas?

@Djelle

Djelle commented Mar 29, 2024

I have discovered that many of the problems mentioned above are due to interference noise. I had a lot of crackling noise with 10 cm wires on my test setup, so I had to go and study on the net.

I2S is meant to be used between components on a PCB. The protocol has no error detection or correction, so any bit errors result in some kind of static noise. The solution is to keep the I2S connections as short as possible. There is no room for the pins on the ESP32 in my box, so I cut them off, though I left the plastic part of the pin row. I clamped the MAX98357A board onto the back of the ESP32 PCB, so I only needed 5 mm of wire. I placed the INMP441 on the side edge of the ESP32 so they form a T (because that works with the box I use). And now it works without any noise.

I don't know how long the wires can be without shielding. But if you need longer wires, they must be individually shielded (only the I2S wires). The shields should be connected to ground on the ESP32 (so connected at one end only). I have no idea how long they can then be, but not especially long. Maybe 20-30 cm?

/Djelle

@Battman2013

Battman2013 commented Mar 29, 2024

PS D:\Projects\LocalVoiceAssistant> esphome run voiceAssistantESP32S3.yaml
INFO ESPHome 2024.2.2
INFO Reading configuration voiceAssistantESP32S3.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
Failed config

media_player.i2s_audio: [source voiceAssistantESP32S3.yaml:74]

This feature is only available with frameworks ['arduino'].
platform: i2s_audio
name: esp_speaker
id: media_player_speaker
dac_type: external
i2s_audio_id: i2s_out
i2s_dout_pin: GPIO16
mode: mono

Any ideas why I might be getting this error?

As the error message says, the media_player is available in the arduino framework only.

esp32:
  board: esp32dev # board type
  framework:
    type: arduino

@edwardtich1

edwardtich1 commented Apr 9, 2024

The sound is interrupted when playing through the speaker

here is my code

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO25

media_player:
  - platform: i2s_audio
    name: "Informer"
    id: media_player_speaker
    i2s_audio_id: i2s_in
    dac_type: external
    i2s_dout_pin: GPIO27
    mode: mono
    on_pause:
      - media_player.stop

here is my log

13:32:01 [W] [component:232] Component i2s_audio.media_player took a long time for an operation (167 ms).
13:32:01 [W] [component:233] Components should block for at most 30 ms.
13:32:01 [W] [component:233] Components should block for at most 30 ms.

please help!

@imonlinux

imonlinux commented Apr 17, 2024

I was doing some serious googling and ran across this GitHub repo: https://github.com/gnumpi/esphome_audio

Using a lot of what was posted in this thread and the work that gnumpi has done on media-player for esp-idf, I have put together this config running on an esp-wroom-32. Still testing (and I think I'm going to need an S3 board to make this reliable), but I did get it to work last night with a mic and speaker (4Ohm, 3W) with great sound.

Next I am going to add a 16bit led ring and see if I can get even better results on an S3 board.

substitutions:
  device_name: test-media-assistant
  friendly_name: Test Assistant
  device_description: "ESP WROOM 32"
  api_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
    components: [ adf_pipeline, i2s_audio ]

esphome:
  name: ${device_name}
  comment: ${device_description}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
  on_boot:
     - priority: -100
       then:
         - wait_until: api.connected
         - delay: 1s
         - if:
             condition:
               switch.is_on: use_wake_word
             then:
               - voice_assistant.start_continuous:

esp32:
  board: esp32dev
  framework:
    type: esp-idf

logger:

api:
  encryption:
    key: ${api_key}

ota:
  password: !secret ota_password

wifi:
  enable_rrm: true
  ssid: !secret wifi_ssid
  password: !secret wifi_password

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO21 # INMP441 WS 
    i2s_bclk_pin: GPIO22  # INMP441 SCK 
  - id: i2s_out
    i2s_lrclk_pin: GPIO25 # PCM5102 LCK 
    i2s_bclk_pin: GPIO26  # PCM5102 BCK
    
adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO33 # PCM5102 DIN

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO13 #INMP441 SD
    pdm: false
    channel: right
    sample_rate: 16000
    bits_per_sample: 32bit

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    gain_log2: 3
    keep_pipeline_alive: true
    pipeline:
      - adf_i2s_in
      - self

media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: s3-dev_media_player
    keep_pipeline_alive: true
    internal: false
    pipeline:
      - self
      - adf_i2s_out

voice_assistant:
  microphone: adf_microphone
  use_wake_word: false
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 3.0
  media_player: adf_media_player
  id: assist

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(assist).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(assist).set_use_wake_word(false);

@Battman2013

Hi imonlinux, many thanks for sharing your config. Good to know that the media_player component can also be configured with the idf framework. I have tried this with a wroom32 board using very short wires, but the noise almost drowns out the speech.
Earlier I had some success with an S3 board, but using the arduino framework. The output sound was clear, but the media_player function was not stable. And I could not figure out how to get the mic to work with the assist pipeline. Physically the mic was OK (I tested it with a separate Arduino test program on the same board), but Home Assistant did not recognize it.
I plan to try your config on the S3 board, if possible, to see how it performs...

@imonlinux

imonlinux commented Apr 23, 2024

Hey Battman2013, I have been hacking away at a config for the ESP32 S3 N16R8. I purchased some from Amazon and the following config is working pretty well. While troubleshooting why the MAX98357A only worked on the 3.3v pin, I ran across this thread:
https://forum.arduino.cc/t/chinese-esp32-s3-5v-pin-warning/1192758
Once I bridged the IN-OUT pads, the 5v pin started supplying 5v. Also, I'm only using the onboard LED for the light, saving my 16bit rings for boards that don't have one, like the XIAO ESP32 S3. Still troubleshooting the mic on that one.

Next up is local wake word.

Edited to correct the INMP441 channel reference in substitutions.

substitutions:
  device_name: "esp32-s3-n16r8-media-assist"
  friendly_name: "Media Assistant"
  device_description: "ESP32 S3 N16R8"
  api_key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  esp_board: "esp32-s3-devkitc-1"
  framework_type: "esp-idf"
  din: "GPIO10"   # MAX98357A - DIN
  lrclk: "GPIO8" # MAX98357A - LRCLK
  bclk: "GPIO9"  # MAX98357A - BCLK
  sd: "GPIO4"    # INMP441 - SD
  ws: "GPIO5"    # INMP441 - WS
  sck: "GPIO6"   # INMP441 - SCK
  l_r: "right"   # INMP441 - L/R (GND = left / 3.3v = right)
  di: "GPIO48"   # WS2812 - DI
  # Phases of the Voice Assistant
  # IDLE: The voice assistant is ready to be triggered by a wake-word
  voice_assist_idle_phase_id: '1'
  # LISTENING: The voice assistant is ready to listen to a voice command (after being triggered by the wake word)
  voice_assist_listening_phase_id: '2'
  # THINKING: The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '3'
  # REPLYING: The voice assistant is replying to the command
  voice_assist_replying_phase_id: '4'
  # NOT_READY: The voice assistant is not ready
  voice_assist_not_ready_phase_id: '10'
  # ERROR: The voice assistant encountered an error
  voice_assist_error_phase_id: '11'
  # MUTED: The voice assistant is muted and will not reply to a wake-word
  voice_assist_muted_phase_id: '12'

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
    components: [ adf_pipeline, i2s_audio ]

esphome:
  name: ${device_name}
  comment: ${device_description}
  friendly_name: ${friendly_name}
  min_version: 2024.1.0
  name_add_mac_suffix: false
  platformio_options:
    board_build.flash_mode: dio
    board_upload.maximum_size: 16777216
  on_boot:
    - priority: 600
      then:
        - light.turn_on:
            id: led_ring
            brightness: 70%
            effect: connecting

esp32:
  board: ${esp_board}
  variant: ESP32S3
  flash_size: 16MB
  framework:
    type: ${framework_type}
    version: recommended
    sdkconfig_options:
      # need to set a s3 compatible board for the adf-sdk to compile
      # board specific code is not used though
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "512000"
      CONFIG_TCP_RECVMBOX_SIZE: "512"

psram:
  mode: octal
  speed: 80MHz

wifi:
  enable_rrm: true
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: true

ota:
  password: !secret ota_password

api:
  encryption:
    key: ${api_key}
  on_client_connected:
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - delay: 1s
            - voice_assistant.start_continuous:
            - delay: 1s
            - voice_assistant.stop:
            - delay: 2s
            - voice_assistant.start_continuous:
            - script.execute: reset_led
  on_client_disconnected:
    then:
      - light.turn_on:
          id: led_ring
          blue: 0%
          red: 100%
          green: 100%
          brightness: 50%
          effect: connecting

logger:

button:
  - platform: restart
    id: restart_btn
    name: "${friendly_name} REBOOT"

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: ${ws}
    i2s_bclk_pin: ${sck}
  - id: i2s_out
    i2s_lrclk_pin: ${lrclk}
    i2s_bclk_pin: ${bclk}

adf_pipeline:
  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_in
    i2s_din_pin: ${sd}
    pdm: false
    channel: left
    sample_rate: 16000
    bits_per_sample: 32bit
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: ${din}

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    gain_log2: 3
    keep_pipeline_alive: false
    pipeline:
      - adf_i2s_in
      - self
      
media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: media_player
    keep_pipeline_alive: false
    internal: false
    pipeline:
      - self
      - adf_i2s_out

voice_assistant:
  id: voice_asst
  microphone: adf_microphone
  media_player: adf_media_player
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 15
  use_wake_word: false
  on_listening:
    - light.turn_on:
        id: led_ring
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: wakeword
  on_tts_start:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 0%
        green: 100%
        brightness: 75%
        effect: pulse
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          media_player.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }

script:
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led_ring
                blue: 100%
                red: 0%
                green: 0%
                brightness: 50%
                effect: none
          else:
            - light.turn_off: led_ring

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(voice_asst).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - script.execute: reset_led

light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "${friendly_name} Light"
    pin: ${di}
    num_leds: 16
    rmt_channel: 0
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 12
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 12
          add_led_interval: 100ms
          reverse: true

@indevor

indevor commented May 4, 2024

l_r: "right" # INMP441 - L/R (GND = right / 3.3v = left)

Your description of the script contains an error

Left/Right Channel Select. When set low, the microphone outputs its signal in the left channel
of the I²S frame. When set high, the microphone outputs its signal in the right channel.

GND = LEFT
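
In config terms, the datasheet rule quoted above would map onto the microphone's `channel` option roughly as follows. This is a sketch under the assumption that the stock `i2s_audio` microphone platform is used; the ID and pin are placeholders.

```yaml
# Sketch only: ID and pin are placeholders; channel follows the INMP441 L/R wiring.
microphone:
  - platform: i2s_audio
    id: mic_i2s              # placeholder ID
    adc_type: external
    pdm: false
    i2s_din_pin: GPIO14      # placeholder pin for INMP441 SD
    bits_per_sample: 32bit
    channel: left            # L/R tied to GND -> left; use "right" if L/R is tied to 3.3 V
```

If the microphone stays silent with the value that matches the wiring, trying the other channel is a cheap test before suspecting the hardware.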

@imonlinux

Hey indevor. Thanks for catching that mistake.

I have updated my previous post with this correction. Explains why I was fighting with my microphone on the XIAO ESP32 S3 board.

@indevor

indevor commented May 9, 2024

CONFIG_TCP_SND_BUF_DEFAULT: "65535"
CONFIG_TCP_WND_DEFAULT: "512000"
CONFIG_TCP_RECVMBOX_SIZE: "512"

Could you explain how this works?
Or share a link to a primary source?

@imonlinux

Hey indevor, those are pulled straight out of the example provided by gnumpi. I haven't tested to see if removing them affects the functionality of the device using gnumpi's work.

https://github.com/gnumpi/esphome_audio/blob/main/examples/esp32-s3-N16R8-adf.yaml
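
As a rough guide to what those options appear to tune, here is the same block with comments added. The descriptions are a best-effort reading of the ESP-IDF Kconfig help, not something confirmed by the example's author, and the names look like older spellings (current ESP-IDF documents them with `LWIP_` and `ESP_WIFI_` prefixes), so whether every line takes effect on a given framework version is an open assumption.

```yaml
# Sketch only: comments are an interpretation, values are copied from the configs in this thread.
esp32:
  framework:
    type: esp-idf
    sdkconfig_options:
      # Wi-Fi driver: number of statically allocated RX buffers
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      # Wi-Fi driver: upper limit for dynamically allocated RX buffers
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      # lwIP: mailbox (queue) size of the TCP/IP task
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"
      # lwIP: default TCP send buffer size, in bytes
      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      # lwIP: default TCP receive window size, in bytes
      CONFIG_TCP_WND_DEFAULT: "512000"
      # lwIP: per-connection TCP receive mailbox size
      CONFIG_TCP_RECVMBOX_SIZE: "512"
```

Taken together they enlarge the network buffers, which plausibly helps when audio is streamed over Wi-Fi, at the cost of extra RAM.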

@strusic

strusic commented May 16, 2024

I have tried probably everything... ESP32, ESP32-S2 mini, XIAO ESP32-S3. All of them have distorted, crackling sound, and on the XIAO the mic is not working at all, dunno why.

Now my code looks like this:

substitutions:
 # Phases of the Voice Assistant
  # IDLE: The voice assistant is ready to be triggered by a wake-word
  voice_assist_idle_phase_id: '1'
  # LISTENING: The voice assistant is ready to listen to a voice command (after being triggered by the wake word)
  voice_assist_listening_phase_id: '2'
  # THINKING: The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '3'
  # REPLYING: The voice assistant is replying to the command
  voice_assist_replying_phase_id: '4'
  # NOT_READY: The voice assistant is not ready
  voice_assist_not_ready_phase_id: '10'
  # ERROR: The voice assistant encountered an error
  voice_assist_error_phase_id: '11'
  # MUTED: The voice assistant is muted and will not reply to a wake-word
  voice_assist_muted_phase_id: '12'

esphome:
  name: esp32-s3-voice-assistant
  friendly_name: ESP32-S3 Voice Assistant
  min_version: 2024.1.0
  platformio_options:
    build_flags: -DBOARD_HAS_PSRAM
    board_build.flash_mode: dio
    board_build.mcu: esp32s3
  on_boot:
    - priority: 600
      then:
        - light.turn_on:
            id: led_ring
            brightness: 70%
            effect: connecting

psram:
  mode: octal
  speed: 80MHz

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
      #type: local
      #path: /Users/siekmann/Privat/Projects/espHome/esphome_audio/esphome/components
    components: [ adf_pipeline, i2s_audio ]

esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  flash_size: 8MB
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      # need to set a s3 compatible board for the adf-sdk to compile
      # board specific code is not used though
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "512000"
      CONFIG_TCP_RECVMBOX_SIZE: "512"

logger:

# Enable Home Assistant API
api:
  encryption:
    key: DELETED
  on_client_connected:
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - delay: 1s
            - voice_assistant.start_continuous:
            - delay: 1s
            - voice_assistant.stop:
            - delay: 2s
            - voice_assistant.start_continuous:
            - script.execute: reset_led
  on_client_disconnected:
    then:
      - light.turn_on:
          id: led_ring
          blue: 0%
          red: 100%
          green: 100%
          brightness: 50%
          effect: connecting

ota:
  password: "c04fc98d47fab50835359a772849716b"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.0.144
    gateway: 192.168.0.1
    subnet: 255.255.255.0
  fast_connect: true

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32-S3-Voice-Assistant"
    password: "494Uimj1IQc9"

captive_portal:

button:
  - platform: restart
    id: restart_btn
    name: "REBOOT"

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO7
    i2s_bclk_pin: GPIO8
  - id: i2s_out
    i2s_lrclk_pin: GPIO6
    i2s_bclk_pin: GPIO5

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO4

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO9
    pdm: false
    channel: left
    sample_rate: 16000
    bits_per_sample: 32bit

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    gain_log2: 3
    keep_pipeline_alive: true
    pipeline:
      - adf_i2s_in
      - self

media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: adf_media_player
    keep_pipeline_alive: false
    internal: false
    pipeline:
      - self
      - adf_i2s_out

voice_assistant:
  id: voice_asst
  microphone: adf_microphone
  media_player: adf_media_player
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 15
  use_wake_word: false

  on_listening:
    - light.turn_on:
        id: led_ring
        blue: 100%
        red: 0%
        green: 0%
        brightness: 100%
        effect: wakeword
  on_tts_start:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 0%
        green: 100%
        brightness: 75%
        effect: pulse
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          media_player.is_playing:
    - script.execute: reset_led
  on_error:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }

script:
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led_ring
                blue: 100%
                red: 0%
                green: 0%
                brightness: 50%
                effect: none
          else:
            - light.turn_off: led_ring

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(voice_asst).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - script.execute: reset_led

light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "Light"
    pin: GPIO1
    num_leds: 8
    rmt_channel: 0
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 12
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 12
          add_led_interval: 100ms
          reverse: true

I don't know how to properly set this up right now :/

@Djelle

Djelle commented May 16, 2024

I am not sure if the pin-out for the ESP32 DevKit is the same as for your board, but I think it is. You should not use GPIO pins 6, 7 and 8.
https://randomnerdtutorials.com/esp32-pinout-reference-gpios/

@strusic

strusic commented May 16, 2024

This config is for xiao esp32-s3

@imonlinux

imonlinux commented May 16, 2024

Hey strusic,
I was having some of those same issues with the microphone until I enabled micro_wake_word. Below is my current testing yaml for a XIAO ESP32-S3 using duplex mode on a shared i2s_audio channel in order to cut down on the number of pins needed. This is working very well especially in conjunction with Extended OpenAI Conversation. I have ChatGPT 4.o configured to "embody" HAL 9000 and it has been very entertaining. Printing a HAL 9000 prop replica to hold the assistant hardware.

edit: removed OTA password....
edit2: fixed persistent "connecting" led effect after boot
edit3: fixed INMP441 L/R Channel error (thanks again indevor)

substitutions:
  device_name: "test-media-assistant-v2"
  friendly_name: "Test Media Assistant V2"
  device_description: "XIAO ESP32 S3"
  esp_board: "esp32-s3-devkitc-1"
  framework_type: "esp-idf"
  din: "GPIO1"   # MAX98357A - DIN
  lrclk: "GPIO7" # MAX98357A - LRCLK / INMP441 - WS
  bclk: "GPIO8"  # MAX98357A - BCLK / INMP441 - SCK
  sd: "GPIO2"    # INMP441 - SD
  l_r: "right"   # INMP441 - L/R (3.3v = right / GND = left)
  di: "GPIO9"   # WS2812 - DI
  api_key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  # Phases of the Voice Assistant
  # IDLE: The voice assistant is ready to be triggered by a wake-word
  voice_assist_idle_phase_id: '1'
  # LISTENING: The voice assistant is ready to listen to a voice command (after being triggered by the wake word)
  voice_assist_listening_phase_id: '2'
  # THINKING: The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '3'
  # REPLYING: The voice assistant is replying to the command
  voice_assist_replying_phase_id: '4'
  # NOT_READY: The voice assistant is not ready
  voice_assist_not_ready_phase_id: '10'
  # ERROR: The voice assistant encountered an error
  voice_assist_error_phase_id: '11'
  # MUTED: The voice assistant is muted and will not reply to a wake-word
  voice_assist_muted_phase_id: '12'

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: main
      #type: local
      #path: /Users/siekmann/Privat/Projects/espHome/esphome_audio/esphome/components
    components: [ adf_pipeline, i2s_audio ]

esphome:
  name: ${device_name}
  comment: ${device_description}
  friendly_name: ${friendly_name}
  min_version: 2024.2.0
  platformio_options:
    build_flags: -DBOARD_HAS_PSRAM
    board_build.flash_mode: dio
    board_upload.maximum_size: 16777216
  on_boot:
    priority: 600
    then:
      # Run the script to refresh the LED status
      # If after 30 seconds, the device is still initializing (It did not yet connect to Home Assistant), turn off the init_in_progress variable and run the script to refresh the LED status
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
                            
esp32:
  board: ${esp_board}
  variant: ESP32S3
  flash_size: 16MB
  framework:
    type: ${framework_type}
    version: recommended
    sdkconfig_options:
      # need to set a s3 compatible board for the adf-sdk to compile
      # board specific code is not used though
      CONFIG_ESP32_S3_BOX_BOARD: "y"
      CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM: "16"
      CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM: "512"
      CONFIG_TCPIP_RECVMBOX_SIZE: "512"

      CONFIG_TCP_SND_BUF_DEFAULT: "65535"
      CONFIG_TCP_WND_DEFAULT: "512000"
      CONFIG_TCP_RECVMBOX_SIZE: "512"

logger:

globals:
  # Global initialisation variable. Initialized to true and set to false once everything is connected. Only used to have a smooth "plugging" experience
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  # Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}

psram:
  mode: octal
  speed: 80MHz

wifi:
  enable_rrm: true
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: true

ota:
  password: !secret ota_password

api:
  encryption:
    key: ${api_key}

i2s_audio:
  - id: i2s_dplx
    i2s_lrclk_pin: ${lrclk}
    i2s_bclk_pin: ${bclk}
    access_mode: duplex

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_dplx
    i2s_dout_pin: ${din}
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_dplx
    i2s_din_pin: ${sd}
    pdm: false
    channel: ${l_r}
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    keep_pipeline_alive: true
    pipeline:
      - adf_i2s_in
      - self
      
media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: media_player
    keep_pipeline_alive: true
    internal: false
    pipeline:
      - self
      - resampler
      - adf_i2s_out

micro_wake_word:
  model: okay_nabu
  on_wake_word_detected:
      - media_player.stop:
      - light.turn_on:
          id: led_ring
          blue: 0%
          red: 0%
          green: 100%
          brightness: 75%
          effect: pulse
      - voice_assistant.start:

voice_assistant:
  microphone: adf_microphone
  media_player: adf_media_player

  use_wake_word: false
  #vad_threshold: 3

  noise_suppression_level: 1
  auto_gain: 31dBFS
  volume_multiplier: 15.0

  on_client_connected:
    - lambda: id(init_in_progress) = false;
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
          - script.execute: reset_led
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};

  on_client_disconnected:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - voice_assistant.stop
    - micro_wake_word.stop
    - light.turn_on:
          id: led_ring
          blue: 0%
          red: 100%
          green: 100%
          brightness: 50%
          effect: connecting

  on_listening:
    - light.turn_on:
        id: led_ring
        blue: 100%
        red: 0%
        green: 0%
        brightness: 25%
        effect: wakeword
        
  on_tts_start:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 0%
        green: 100%
        brightness: 75%
        effect: pulse
  
  on_end:
      then:
        - light.turn_off:
            id: led_ring
        - voice_assistant.stop
        - wait_until:
            not:
              media_player.is_playing:
        - script.execute: reset_led
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - micro_wake_word.start:
  on_error:
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 100%
        green: 0%
        brightness: 100%
        effect: none
    - delay: 1s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start:
          - script.execute: reset_led
              
script:
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led_ring
                blue: 100%
                red: 0%
                green: 0%
                brightness: 25%
                effect: none
          else:
            - light.turn_off: led_ring
 
button:
  - platform: restart
    id: restart_btn
    name: "${friendly_name} REBOOT"
            
switch:
  - platform: template
    name: Enable Voice Assistant
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    icon: mdi:assistant
    # When the switch is turned on (on Home Assistant):
    # Start the voice assistant component
    # Set the correct phase and run the script to refresh the LED status
    on_turn_on:
      - logger.log: "switch on"
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - logger.log: "condition 1"
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - voice_assistant.stop
            - delay: 1s
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - logger.log: "Starting MWW"
                  #- voice_assistant.start_continuous
                  - micro_wake_word.start:
      - script.execute: reset_led
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - voice_assistant.stop
            - micro_wake_word.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
      - script.execute: reset_led

  - platform: template
    name: Pipeline
    id: pipeline_switch
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF

    on_turn_off:
      - media_player.stop

    on_turn_on:
      - media_player.play_media: "https://dl.espressif.com/dl/audio/ff-16b-2c-44100hz.mp3"

light:
  - platform: esp32_rmt_led_strip
    id: led_ring
    name: "${friendly_name} Light"
    pin: ${di}
    num_leds: 16
    rmt_channel: 0
    rgb_order: GRB
    chipset: ws2812
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
      - addressable_twinkle:
          name: "Working"
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_color_wipe:
          name: "Wakeword"
          colors:
            - red: 0%
              green: 50%
              blue: 0%
              num_leds: 12
          add_led_interval: 20ms
          reverse: false
      - addressable_color_wipe:
          name: "Connecting"
          colors:
            - red: 60%
              green: 60%
              blue: 60%
              num_leds: 12
            - red: 60%
              green: 60%
              blue: 0%
              num_leds: 12
          add_led_interval: 100ms
          reverse: true
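
One observation on the config above: voice_assistant_phase is written in several places but never read, because reset_led only checks the use_wake_word switch. If you want the LED to follow the phase instead, a minimal sketch could look like the block below. The script name reset_led_by_phase is hypothetical and not part of the original config; the colors are copied from reset_led. It would need to be merged into the existing script: list rather than added as a second script: key.

```yaml
script:
  # Hypothetical variant of reset_led that reads voice_assistant_phase.
  - id: reset_led_by_phase
    then:
      - if:
          condition:
            lambda: return id(voice_assistant_phase) == ${voice_assist_idle_phase_id};
          then:
            # Idle and waiting for the wake word: dim solid blue, as reset_led does.
            - light.turn_on:
                id: led_ring
                blue: 100%
                red: 0%
                green: 0%
                brightness: 25%
                effect: none
          else:
            # Muted, not ready, or any other phase: LEDs off.
            - light.turn_off: led_ring
```

The existing script.execute: reset_led calls would then point at the new id. This is untested on hardware, so treat it as a starting point only.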

@strusic

strusic commented May 17, 2024

I was trying to compile this code, but my Home Assistant crashes when the build reaches this line:

Compiling .pioenvs/esp32-s3-voice-assistant/components/esp-tflite-micro/tensorflow/lite/micro/micro_allocation_info.o

I also tried compiling it on Windows, but after flashing, the device does not connect to HA.

@imonlinux

Here are a few things that I do when I'm having issues flashing one of these:

  • In ESPHome dashboard, click Clean Build Files and then recompile
  • Make sure that your USB has enough power for the device (My laptop USB port is not sufficient and will result in a corrupt flash. I purchased a powered USB hub.)
  • Use esptool to erase the ESP32 (esptool --chip esp32 erase_flash; for an ESP32-S3 board such as the XIAO, pass --chip esp32s3 or --chip auto instead)

I compiled that exact config today using the following:
ESPHome = 2024.5.0
HA Core = 2024.5.3
Supervisor = 2024.05.1
HASSOS = 12.3

@strusic

strusic commented May 17, 2024

My ESP is well powered. The issue is that compiling this YAML crashes the whole HA OS. I am running an RPi 4B (4GB) with HAOS on an SSD. I've now added 2GB of swap and an additional fan; maybe that helps. Compiling this YAML with micro_wake_word is crazy. I've never run into anything like that.

Also, there are a lot of warnings while compiling, like:

Compiling .pioenvs/esp32-s3-voice-assistant/components/esp-tflite-micro/tensorflow/lite/micro/kernels/mirror_pad.o
In file included from components/esp-tflite-micro/tensorflow/lite/micro/kernels/lstm_eval.cc:25:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h: In lambda function:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:151:34: warning: declaration of 'const tflite::ArithmeticParams& params' shadows a parameter [-Wshadow]
          const uint8_t input2_val) {
                                  ^
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:126:56: note: shadowed declaration is here
 inline void BroadcastMul6DSlow(const ArithmeticParams& params,
                                ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h: In lambda function:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:210:28: warning: declaration of 'const tflite::ArithmeticParams& params' shadows a parameter [-Wshadow]
          const T input2_val) {
                            ^
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:171:44: note: shadowed declaration is here
 BroadcastMul6DSlow(const ArithmeticParams& params,
                    ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h: In lambda function:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:250:46: warning: declaration of 'const tflite::ArithmeticParams& params' shadows a parameter [-Wshadow]
          const std::complex<float> input2_val) {
                                              ^
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:221:56: note: shadowed declaration is here
 inline void BroadcastMul6DSlow(const ArithmeticParams& params,

Is this normal?

@imonlinux

100%

You will see all kinds of warnings while it is compiling. Good luck.
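
For the crashes on the Raspberry Pi mentioned above, one thing that may help (besides swap) is limiting how many compile jobs ESPHome runs in parallel, since each parallel job needs its own memory. A minimal sketch, assuming your ESPHome version supports the compile_process_limit option under esphome: (the value 1 is just a conservative guess, not something tested in this thread):

```yaml
esphome:
  name: ${device_name}
  friendly_name: ${friendly_name}
  # Assumption: run one compile process at a time to reduce peak RAM use
  # on a small host. The build will take longer.
  compile_process_limit: 1
```

Only the compile_process_limit line is new; the other keys stay as they are in the config above.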

@strusic

strusic commented May 17, 2024

I successfully flashed this YAML, but there is a lot of crackling and distortion when playing anything on the speaker. I am using a MAX98357A as the amp and a DFRobot SEN0526 I2S MEMS microphone.

@indevor

indevor commented May 17, 2024

l_r: "right" # INMP441 - L/R (GND = right / 3.3v = left)
di: "GPIO9" # WS2812 - DI

Funny, you're wrong again. Maybe you're copying from the wrong source.
According to the document for this microphone:

Left/Right Channel Select. When set low, the microphone outputs its signal in the left channel
of the I²S frame. When set high, the microphone outputs its signal in the right channel.
GND = LEFT
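
Going by the datasheet text quoted above, the substitution should match how the L/R pin is actually wired. A small sketch of the two options, reusing the l_r substitution name from the posted config (pick one, the other stays commented out):

```yaml
substitutions:
  # INMP441 L/R pin tied to GND: the mic outputs on the left channel.
  l_r: "left"
  # INMP441 L/R pin tied to 3.3V: the mic outputs on the right channel.
  # l_r: "right"
```

The value ends up in channel: ${l_r} on the audio_in entry of adf_pipeline, so a mismatch between wiring and this setting would presumably leave the microphone reading the empty channel.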

@indevor

indevor commented May 17, 2024

INFO ESPHome 2024.5.0
INFO Reading configuration /config/esphome/test.yaml...
WARNING GPIO3 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
WARNING GPIO45 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Generating C++ source...
INFO Updating https://github.com/espressif/esp-adf.git@v2.5
INFO Updating submodules (components/esp-adf-libs, components/esp-sr) for https://github.com/espressif/esp-adf.git@v2.5
Traceback (most recent call last):
  File "/usr/local/bin/esphome", line 33, in <module>
    sys.exit(load_entry_point('esphome', 'console_scripts', 'esphome')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 1065, in main
    return run_esphome(sys.argv)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 1052, in run_esphome
    rc = POST_CONFIG_ACTIONS[args.command](args, config)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 479, in command_run
    exit_code = write_cpp(config)
                ^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 193, in write_cpp
    return write_cpp_file()
           ^^^^^^^^^^^^^^^^
  File "/esphome/esphome/__main__.py", line 211, in write_cpp_file
    writer.write_cpp(code_s)
  File "/esphome/esphome/writer.py", line 344, in write_cpp
    copy_src_tree()
  File "/esphome/esphome/writer.py", line 297, in copy_src_tree
    copy_files()
  File "/esphome/esphome/components/esp32/__init__.py", line 684, in copy_files
    repo_dir, _ = git.clone_or_update(
                  ^^^^^^^^^^^^^^^^^^^^
  File "/esphome/esphome/git.py", line 111, in clone_or_update
    run_git_command(
  File "/esphome/esphome/git.py", line 31, in run_git_command
    raise cv.Invalid(lines[-1][len("fatal: ") :])
voluptuous.error.Invalid: Unable to find current revision in submodule path 'components/esp-adf-libs'

@imonlinux

imonlinux commented May 17, 2024

Hey indevor,

I have so many versions of this config running on different ESP32 boards now that I must have pulled that old config back in at some point. The odd thing was that this specific config, running on the XIAO ESP32-S3 with an INMP441 that I thought had a grounded L/R pin, was working perfectly with the right channel selected. It turns out I was 100% wrong: I have the INMP441 L/R pin set high. That is why selecting the right channel works, but my note in substitutions was backwards again. Thanks again indevor!

Not sure what your last post was indicating, I have moved away from using GPIO3 and GPIO45 due to these warnings and am now using the single i2s_audio channel in duplex mode. Not sure where that voluptuous.error is coming from on your run. I just reran this config after doing a Clean Build Files and it compiles as expected.

edit: typo
edit2: added completion of build
edit3: confirmed that INMP441 L/R channel is set high and not low so right channel is working and fixed in config above

INFO ESPHome 2024.5.0
INFO Reading configuration /config/esphome/test-media-assistant-v2.yaml...
INFO Generating C++ source...
INFO Updating https://github.com/espressif/esp-adf.git@v2.5
INFO Updating submodules (components/esp-adf-libs, components/esp-sr) for https://github.com/espressif/esp-adf.git@v2.5
INFO Updating https://github.com/espressif/esp-tflite-micro@None
INFO Compiling app...
Processing test-media-assistant-v2 (board: esp32-s3-devkitc-1; framework: espidf; platform: platformio/espressif32@5.4.0)
--------------------------------------------------------------------------------
Library Manager: Installing esphome/noise-c @ 0.1.4
INFO Installing esphome/noise-c @ 0.1.4
Unpacking  [####################################]  100%
Library Manager: noise-c@0.1.4 has been installed!
INFO noise-c@0.1.4 has been installed!
Library Manager: Resolving dependencies...
INFO Resolving dependencies...
Library Manager: Installing esphome/libsodium @ 1.10018.1
INFO Installing esphome/libsodium @ 1.10018.1
Unpacking  [####################################]  100%
Library Manager: libsodium@1.10018.1 has been installed!
INFO libsodium@1.10018.1 has been installed!
HARDWARE: ESP32S3 240MHz, 320KB RAM, 16MB Flash
 - framework-espidf @ 3.40407.0 (4.4.7) 
 - tool-cmake @ 3.16.4 
 - tool-ninja @ 1.7.1 
 - toolchain-esp32ulp @ 2.35.0-20220830 
 - toolchain-riscv32-esp @ 8.4.0+2021r2-patch5 
 - toolchain-xtensa-esp32s3 @ 8.4.0+2021r2-patch5
Reading CMake configuration...
Generating assembly for certificate bundle...
Dependency Graph
|-- noise-c @ 0.1.4
Generating assembly for .pioenvs/test-media-assistant-v2/duer_profile.S
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_audio_element.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_audio_process.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_audio_sinks.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_audio_sources.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_pipeline.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/adf_pipeline_controller.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/media_player/adf_media_player.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/adf_pipeline/microphone/esp_adf_microphone.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_connection.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_frame_helper.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_pb2.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_pb2_service.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/api_server.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/list_entities.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/proto.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/subscribe_state.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/api/user_services.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/button/button.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/esp32/core.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/esp32/gpio.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/esp32/preferences.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/esp32_rmt_led_strip/led_strip.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/adf_pipeline/adf_i2s_in.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/adf_pipeline/adf_i2s_out.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/adf_pipeline/i2s_stream_mod.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/external_adc.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/external_dac.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/i2s_audio/i2s_audio.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/addressable_light.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/automation.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/esp_color_correction.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/esp_hsv_color.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/esp_range_view.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/light_call.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/light_json_schema.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/light_output.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/light/light_state.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_esp32.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_esp8266.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_host.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_libretiny.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/logger/logger_rp2040.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/md5/md5.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_component.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_esp32.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_esp8266.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_host.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_libretiny.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/mdns/mdns_rp2040.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/media_player/media_player.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/micro_wake_word/micro_wake_word.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/network/util.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_arduino_esp32.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_arduino_esp8266.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_arduino_libretiny.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_arduino_rp2040.o
Compiling .pioenvs/test-media-assistant-v2/src/esphome/components/ota/ota_backend_esp_idf.o
.
.
.
.
Linking .pioenvs/test-media-assistant-v2/firmware.elf
/data/cache/platformio/packages/toolchain-xtensa-esp32s3/bin/../lib/gcc/xtensa-esp32s3-elf/8.4.0/../../../../xtensa-esp32s3-elf/bin/ld: missing --end-group; added as last command line option
RAM:   [=         ]  12.0% (used 39348 bytes from 327680 bytes)
Flash: [==        ]  17.8% (used 1444629 bytes from 8126464 bytes)
Building .pioenvs/test-media-assistant-v2/firmware.bin
Creating esp32s3 image...
Successfully created esp32s3 image.
esp32_create_combined_bin([".pioenvs/test-media-assistant-v2/firmware.bin"], [".pioenvs/test-media-assistant-v2/firmware.elf"])
Wrote 0x170c80 bytes to file /data/build/test-media-assistant-v2/.pioenvs/test-media-assistant-v2/firmware-factory.bin, ready to flash to offset 0x0
======================== [SUCCESS] Took 1629.69 seconds ========================
INFO Successfully compiled program.

@imonlinux
imonlinux commented May 17, 2024

Hey strusic,

I haven't used that mic board before, but I am using the same output amp. What kind of speaker are you using? I have tried a few and am having a lot of success with this one running at 50% volume:

https://www.amazon.com/dp/B01CHYIU26

When I get home, I will record a video of it working as a reference and link it here.

edit: included volume level
