This note covers the Caller Assistant project, my first voice AI project. I built a voice AI assistant using Twilio for telephony, OpenAI as the LLM, Deepgram as the speech-to-text provider, and Amazon Polly (through Twilio) as the text-to-speech provider. The goal was to navigate an IVR and reach a human agent to verify information on a customer's profile. The customers have information such as health records and insurance details, and they are associated with companies that want to verify those records with healthcare agencies. The assistant navigates the IVR, connects with a human agent, and checks whether the information available to us matches the healthcare agency's records. If it does not, the assistant asks for the best way to update the records and confirms everything. At the end of the call, it sends an end-of-call report to the customer, provider, or customer manager here to update the customer
Impromptu Google Meet Meeting - September 24
0:03 - shubham bajaj (shmbajaj32@gmail.com) Okay, this project was named Caller Assistant Prompt, and it had three steps to achieving success. The first one was to navigate an IVR. Basically, this means we had a decision tree where we listed the different key presses or content to be spoken to navigate the IVR. Once that is done, we identify whether the call is connected to a human or not. If it's connected to a human, we move to step 2, where we actually verify the information about our customer's customer on file with healthcare agencies. Finally, at the end, we send the structured output, or end-of-call report, back to the customer. Now, when I say healthcare agency here, what I mean is that our customer has a few clients, and they want to verify whether their clients' customers' current data matches the records of these healthcare agencies or healthcare insurance companies. If it's out of date or it's not moved to in progress for insura
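The IVR decision tree and human-detection step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `IVR_TREE` entries, the `HUMAN_HINTS` phrases, and the `nextIvrStep` function are all hypothetical, and a real system would probably need something sturdier than regular expressions to tell a human from a recording.

```javascript
// Hypothetical decision tree: each node matches a phrase heard in the
// IVR prompt and maps it to an action (press keys or speak a phrase).
const IVR_TREE = [
  { match: /for english/i, action: { type: 'dtmf', digits: '1' } },
  { match: /if you are a provider/i, action: { type: 'dtmf', digits: '2' } },
  { match: /member id/i, action: { type: 'speak', text: 'MEMBER_ID_PLACEHOLDER' } },
];

// Hypothetical phrases suggesting that a live agent has picked up.
const HUMAN_HINTS = [/how (can|may) i help/i, /my name is/i];

// Decide the next step from the latest speech-to-text transcript.
function nextIvrStep(transcript) {
  if (HUMAN_HINTS.some((re) => re.test(transcript))) {
    return { type: 'human_detected' }; // hand over to step 2 (verification)
  }
  const node = IVR_TREE.find((n) => n.match.test(transcript));
  return node ? node.action : { type: 'wait' }; // no match: keep listening
}
```

For example, `nextIvrStep('For English, press 1.')` returns a DTMF action with digit `1`, while a greeting such as "my name is Dana, how can I help you?" returns `human_detected`.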
SIP (Session Initiation Protocol) communication requires mutual trust between parties. When you create SIP credentials, you're establishing a bidirectional security relationship.
Every SIP connection involves two fundamental security layers:
SIP credential creation fails when you use a hostname (domain name) for a gateway that receives inbound calls.
The SIP infrastructure (Jambonz) was upgraded and now enforces validation rules. Previously, hostname configurations were accepted but never worked for inbound calls. Now the system rejects invalid configurations immediately.
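The validation rule described above can be approximated with a small pre-check before submitting a gateway. This is only a sketch: the `validateGateway` function and the `address` and `inbound` field names are hypothetical and are not Jambonz's actual API, and the check covers plain IPv4 addresses only (no IPv6 or CIDR ranges).

```javascript
// Matches a dotted-quad IPv4 address with each octet in 0-255.
const IPV4 = /^(25[0-5]|2[0-4]\d|1?\d?\d)(\.(25[0-5]|2[0-4]\d|1?\d?\d)){3}$/;

// Hypothetical pre-check: a gateway that receives inbound calls must be
// given as an IP address; a hostname is only acceptable for outbound use.
function validateGateway(gateway) {
  const isIp = IPV4.test(gateway.address);
  if (gateway.inbound && !isIp) {
    return {
      ok: false,
      error: `Inbound gateway "${gateway.address}" must be an IP address, not a hostname.`,
    };
  }
  return { ok: true };
}
```

With this check, `validateGateway({ address: 'sip.example.com', inbound: true })` is rejected, while the same hostname with `inbound: false` passes.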
You are a Pronunciation Transformation Assistant for text-to-speech (TTS) models. Your workflow always follows these steps:
- Context Intake:
  - Ask the user to provide the specific pronunciation transformation task and include a few representative examples.
  - The context can be any domain (e.g., numbers, dates, abbreviations, measurements, technical terms).
- Problem Clarification:
  - Restate the problem in your own words to confirm understanding.
  - Ask clarifying questions if the context is ambiguous or incomplete.
export default {
  async fetch(request) {
    const url = new URL(request.url);
    // Handle CORS preflight
    if (request.method === 'OPTIONS') {
      return new Response(null, {
        status: 204,
        headers: {
          'Access-Control-Allow-Origin': '*',
const CONFIG = {
  MODEL_NAME: 'static-model',
  DELAYS: {
    WORD: 20,
    SENTENCE: 50,
    TOOL_CALL: 50,
  },
  RESPONSES: {
    NORMAL: [
      'Hello! This is a static response.',
- Query Formation:
  - When a user sends a message, Vapi extracts the most recent user queries (up to the last 3).
  - These messages are joined with periods to form a single query string for Trieve.
- Knowledge Base Retrieval:
  - Vapi calls the internal function which:
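The query formation step described above (take up to the last 3 user messages and join them with periods) can be sketched as follows. This is an illustration under assumptions, not Vapi's actual implementation: the `buildQuery` function, the `{ role, content }` message shape, and the exact `'. '` separator are all hypothetical.

```javascript
// Build a single query string from the most recent user messages.
// Assumes messages look like { role: 'user' | 'assistant', content: string }.
function buildQuery(messages, maxQueries = 3) {
  return messages
    .filter((m) => m.role === 'user')      // ignore assistant/system turns
    .slice(-maxQueries)                    // keep only the most recent ones
    .map((m) => m.content.trim())
    .filter((text) => text.length > 0)     // drop empty messages
    .join('. ');                           // join with periods (assumed separator)
}
```

A conversation with four user messages would therefore contribute only its last three to the query string.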