
@simonw
Created October 17, 2024 19:09

Simple Audio Recorder and Player

human (Oct 17, 2024, 11:55 AM)

Build an artifact - no React - that lets me click a button to start recording, shows a counter running up, then lets me click again to stop. I can then play back the recording in an audio element. The recording should be a WAV

assistant (Oct 17, 2024, 11:55 AM)

Certainly! I'll create an HTML artifact with JavaScript that implements a simple audio recorder and player using the MediaRecorder API. This will let you record audio, watch a running timer, and play back the recording.

Audio Recorder and Player

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Audio Recorder and Player</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh;
            margin: 0;
            background-color: #f0f0f0;
        }
        .container {
            text-align: center;
            background-color: white;
            padding: 2rem;
            border-radius: 8px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
        }
        button {
            font-size: 1rem;
            padding: 0.5rem 1rem;
            margin: 0.5rem;
            cursor: pointer;
        }
        #timer {
            font-size: 1.5rem;
            margin: 1rem 0;
        }
        #audioPlayback {
            margin-top: 1rem;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>Audio Recorder</h1>
        <button id="recordButton">Start Recording</button>
        <div id="timer">00:00</div>
        <audio id="audioPlayback" controls></audio>
    </div>

    <script>
        let mediaRecorder;
        let audioChunks = [];
        let startTime;
        let timerInterval;
        const recordButton = document.getElementById('recordButton');
        const timer = document.getElementById('timer');
        const audioPlayback = document.getElementById('audioPlayback');

        recordButton.addEventListener('click', toggleRecording);

        function toggleRecording() {
            if (mediaRecorder && mediaRecorder.state === 'recording') {
                stopRecording();
            } else {
                startRecording();
            }
        }

        async function startRecording() {
            const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
            mediaRecorder = new MediaRecorder(stream);
            audioChunks = [];

            mediaRecorder.addEventListener('dataavailable', event => {
                audioChunks.push(event.data);
            });

            mediaRecorder.addEventListener('stop', () => {
                const audioBlob = new Blob(audioChunks, { type: 'audio/wav' });
                const audioUrl = URL.createObjectURL(audioBlob);
                audioPlayback.src = audioUrl;
            });

            mediaRecorder.start();
            startTime = Date.now();
            updateTimer();
            timerInterval = setInterval(updateTimer, 1000);
            recordButton.textContent = 'Stop Recording';
        }

        function stopRecording() {
            mediaRecorder.stop();
            clearInterval(timerInterval);
            recordButton.textContent = 'Start Recording';
        }

        function updateTimer() {
            const elapsed = Math.floor((Date.now() - startTime) / 1000);
            const minutes = Math.floor(elapsed / 60).toString().padStart(2, '0');
            const seconds = (elapsed % 60).toString().padStart(2, '0');
            timer.textContent = `${minutes}:${seconds}`;
        }
    </script>
</body>
</html>

I've created an HTML artifact that includes a simple audio recorder and player. Here's a breakdown of its features:

  1. A "Start Recording" button that toggles to "Stop Recording" when active.
  2. A timer that shows the duration of the recording.
  3. An audio element for playback of the recorded audio.

To use this recorder:

  1. Click "Start Recording" to begin. The browser will ask for permission to use your microphone.
  2. Speak into your microphone. You'll see the timer counting up.
  3. Click "Stop Recording" when you're done.
  4. The recorded audio will automatically load into the audio player, where you can play it back.

Note that this implementation uses the MediaRecorder API, which creates audio in the WebM format by default. While we've set the MIME type to 'audio/wav', browser support for direct WAV recording can vary. For guaranteed WAV output, you'd need to implement a conversion step, which is beyond the scope of this simple example.
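As a hedged illustration of that caveat (not part of the original artifact), a small helper can probe which container the current browser's `MediaRecorder` actually accepts; the helper name and candidate list here are my own:

```javascript
// Sketch (the candidate list is illustrative): return the first MIME type
// the predicate accepts, or '' to fall back to the browser's default container.
function pickMimeType(candidates, isSupported) {
    for (const type of candidates) {
        if (isSupported(type)) return type;
    }
    return '';
}

// In a browser, pass MediaRecorder.isTypeSupported as the predicate:
//   const mimeType = pickMimeType(
//       ['audio/wav', 'audio/webm;codecs=opus', 'audio/webm'],
//       (t) => MediaRecorder.isTypeSupported(t)
//   );
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
// Most browsers report no support for 'audio/wav', which is why guaranteed
// WAV output needs a conversion step.
```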

Would you like me to explain any part of the code in more detail?

human (Oct 17, 2024, 12:03 PM)

Now add a textarea input called "prompt" and a button which, when clicked, submits the prompt and the base64 encoded audio file using fetch() to this URL

The JSON that comes back should be displayed on the page, pretty-printed

The API key should come from localStorage - if localStorage does not have it ask the user for it with prompt()

# Construct the JSON payload
JSON_PAYLOAD=$(jq -n \
    --arg model "gpt-4o-audio-preview" \
    --arg text "$TEXT_PROMPT" \
    --arg audio "$AUDIO_BASE64" \
    '{
        model: $model,
        modalities: ["text"],
        messages: [
            {
                role: "user",
                content: [
                    {type: "text", text: $text},
                    {
                        type: "input_audio",
                        input_audio: {
                            data: $audio,
                            format: "wav"
                        }
                    }
                ]
            }
        ]
    }')

# Make the API call
curl -s "https://api.openai.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d "$JSON_PAYLOAD" | jq .

assistant (Oct 17, 2024, 12:03 PM)

Certainly! I'll modify the existing artifact to include a textarea for the prompt, a button to submit the data, and functionality to handle the API request and display the response. I'll also add the logic to manage the API key using localStorage.

Audio Recorder and Player with API Integration

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Audio Recorder and API Integration</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            display: flex;
            justify-content: center;
            align-items: center;
            min-height: 100vh;
            margin: 0;
            background-color: #f0f0f0;
        }
        .container {
            text-align: center;
            background-color: white;
            padding: 2rem;
            border-radius: 8px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
            max-width: 600px;
            width: 100%;
        }
        button {
            font-size: 1rem;
            padding: 0.5rem 1rem;
            margin: 0.5rem;
            cursor: pointer;
        }
        #timer {
            font-size: 1.5rem;
            margin: 1rem 0;
        }
        #audioPlayback, #prompt {
            margin-top: 1rem;
            width: 100%;
        }
        #prompt {
            height: 100px;
            resize: vertical;
        }
        #apiResponse {
            margin-top: 1rem;
            text-align: left;
            white-space: pre-wrap;
            background-color: #f8f8f8;
            padding: 1rem;
            border-radius: 4px;
            overflow-x: auto;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>Audio Recorder and API Integration</h1>
        <button id="recordButton">Start Recording</button>
        <div id="timer">00:00</div>
        <audio id="audioPlayback" controls></audio>
        <textarea id="prompt" placeholder="Enter your prompt here"></textarea>
        <button id="submitButton">Submit to API</button>
        <div id="apiResponse"></div>
    </div>

    <script>
        let mediaRecorder;
        let audioChunks = [];
        let startTime;
        let timerInterval;
        let audioBlob;
        const recordButton = document.getElementById('recordButton');
        const timer = document.getElementById('timer');
        const audioPlayback = document.getElementById('audioPlayback');
        const promptTextarea = document.getElementById('prompt');
        const submitButton = document.getElementById('submitButton');
        const apiResponse = document.getElementById('apiResponse');

        recordButton.addEventListener('click', toggleRecording);
        submitButton.addEventListener('click', submitToAPI);

        function toggleRecording() {
            if (mediaRecorder && mediaRecorder.state === 'recording') {
                stopRecording();
            } else {
                startRecording();
            }
        }

        async function startRecording() {
            const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
            mediaRecorder = new MediaRecorder(stream);
            audioChunks = [];

            mediaRecorder.addEventListener('dataavailable', event => {
                audioChunks.push(event.data);
            });

            mediaRecorder.addEventListener('stop', () => {
                audioBlob = new Blob(audioChunks, { type: 'audio/wav' });
                const audioUrl = URL.createObjectURL(audioBlob);
                audioPlayback.src = audioUrl;
            });

            mediaRecorder.start();
            startTime = Date.now();
            updateTimer();
            timerInterval = setInterval(updateTimer, 1000);
            recordButton.textContent = 'Stop Recording';
        }

        function stopRecording() {
            mediaRecorder.stop();
            clearInterval(timerInterval);
            recordButton.textContent = 'Start Recording';
        }

        function updateTimer() {
            const elapsed = Math.floor((Date.now() - startTime) / 1000);
            const minutes = Math.floor(elapsed / 60).toString().padStart(2, '0');
            const seconds = (elapsed % 60).toString().padStart(2, '0');
            timer.textContent = `${minutes}:${seconds}`;
        }

        async function submitToAPI() {
            if (!audioBlob) {
                alert('Please record audio first.');
                return;
            }

            const apiKey = getAPIKey();
            if (!apiKey) {
                alert('API Key is required.');
                return;
            }

            const base64Audio = await blobToBase64(audioBlob);
            const prompt = promptTextarea.value;

            const payload = {
                model: "gpt-4o-audio-preview",
                modalities: ["text"],
                messages: [
                    {
                        role: "user",
                        content: [
                            {type: "text", text: prompt},
                            {
                                type: "input_audio",
                                input_audio: {
                                    data: base64Audio,
                                    format: "wav"
                                }
                            }
                        ]
                    }
                ]
            };

            try {
                const response = await fetch('https://api.openai.com/v1/chat/completions', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                        'Authorization': `Bearer ${apiKey}`
                    },
                    body: JSON.stringify(payload)
                });

                const data = await response.json();
                apiResponse.textContent = JSON.stringify(data, null, 2);
            } catch (error) {
                console.error('Error:', error);
                apiResponse.textContent = `Error: ${error.message}`;
            }
        }

        function getAPIKey() {
            let apiKey = localStorage.getItem('openai_api_key');
            if (!apiKey) {
                apiKey = prompt('Please enter your OpenAI API Key:');
                if (apiKey) {
                    localStorage.setItem('openai_api_key', apiKey);
                }
            }
            return apiKey;
        }

        function blobToBase64(blob) {
            return new Promise((resolve, reject) => {
                const reader = new FileReader();
                reader.onloadend = () => resolve(reader.result.split(',')[1]);
                reader.onerror = reject;
                reader.readAsDataURL(blob);
            });
        }
    </script>
</body>
</html>

I've updated the artifact with the following new features:

  1. Added a textarea for entering the prompt.
  2. Added a "Submit to API" button.
  3. Implemented the submitToAPI() function that:
    • Checks if audio has been recorded.
    • Retrieves the API key from localStorage or prompts the user if it's not found.
    • Converts the audio blob to base64.
    • Constructs the JSON payload as specified.
    • Sends a POST request to the OpenAI API.
    • Displays the JSON response on the page.
  4. Added error handling and user feedback.

To use this updated version:

  1. Record audio as before.
  2. Enter a prompt in the textarea.
  3. Click "Submit to API".
  4. If it's your first time, you'll be prompted to enter your OpenAI API key.
  5. The API response will be displayed in a formatted JSON block below the submit button.

Note that this implementation assumes the API endpoint accepts WAV format directly. If you encounter issues, you might need to implement a conversion step to ensure the audio is in the correct format for the API.
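For reference, the FileReader-based `blobToBase64` above can also be written with `Blob.arrayBuffer()`. This variant is a sketch of mine, not part of the artifact, and assumes an environment where `Blob` and `btoa` are available (modern browsers, or Node 18+):

```javascript
// Sketch: base64-encode a Blob without FileReader. The per-byte loop avoids
// the argument-count limits that String.fromCharCode(...bytes) can hit on
// large recordings.
async function blobToBase64Alt(blob) {
    const bytes = new Uint8Array(await blob.arrayBuffer());
    let binary = '';
    for (const b of bytes) binary += String.fromCharCode(b);
    return btoa(binary);
}
```

Either version yields the same base64 payload for the `input_audio` field.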

Is there anything else you'd like me to explain or modify in this implementation?

human (Oct 17, 2024, 12:05 PM)

I got this error: The data provided for 'input_audio' is not of valid wav format.

assistant (Oct 17, 2024, 12:05 PM)

I apologize for the confusion. It seems the API is expecting a specific WAV format that our current implementation isn't providing. Let's modify the code to ensure we're sending a proper WAV file. We'll need to add a step to convert the audio to a WAV format explicitly.

Here's an updated version of the artifact that includes WAV conversion:

Audio Recorder and Player with API Integration (WAV format)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Audio Recorder and API Integration</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            display: flex;
            justify-content: center;
            align-items: center;
            min-height: 100vh;
            margin: 0;
            background-color: #f0f0f0;
        }
        .container {
            text-align: center;
            background-color: white;
            padding: 2rem;
            border-radius: 8px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
            max-width: 600px;
            width: 100%;
        }
        button {
            font-size: 1rem;
            padding: 0.5rem 1rem;
            margin: 0.5rem;
            cursor: pointer;
        }
        #timer {
            font-size: 1.5rem;
            margin: 1rem 0;
        }
        #audioPlayback, #prompt {
            margin-top: 1rem;
            width: 100%;
        }
        #prompt {
            height: 100px;
            resize: vertical;
        }
        #apiResponse {
            margin-top: 1rem;
            text-align: left;
            white-space: pre-wrap;
            background-color: #f8f8f8;
            padding: 1rem;
            border-radius: 4px;
            overflow-x: auto;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>Audio Recorder and API Integration</h1>
        <button id="recordButton">Start Recording</button>
        <div id="timer">00:00</div>
        <audio id="audioPlayback" controls></audio>
        <textarea id="prompt" placeholder="Enter your prompt here"></textarea>
        <button id="submitButton">Submit to API</button>
        <div id="apiResponse"></div>
    </div>


    <script>
        let mediaRecorder;
        let audioContext;
        let audioChunks = [];
        let startTime;
        let timerInterval;
        let audioBlob;
        const recordButton = document.getElementById('recordButton');
        const timer = document.getElementById('timer');
        const audioPlayback = document.getElementById('audioPlayback');
        const promptTextarea = document.getElementById('prompt');
        const submitButton = document.getElementById('submitButton');
        const apiResponse = document.getElementById('apiResponse');

        recordButton.addEventListener('click', toggleRecording);
        submitButton.addEventListener('click', submitToAPI);

        function toggleRecording() {
            if (mediaRecorder && mediaRecorder.state === 'recording') {
                stopRecording();
            } else {
                startRecording();
            }
        }

        async function startRecording() {
            const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
            audioContext = new (window.AudioContext || window.webkitAudioContext)();
            const source = audioContext.createMediaStreamSource(stream);
            const processor = audioContext.createScriptProcessor(1024, 1, 1);

            source.connect(processor);
            processor.connect(audioContext.destination);

            audioChunks = [];

            processor.onaudioprocess = (e) => {
                const inputData = e.inputBuffer.getChannelData(0);
                audioChunks.push(new Float32Array(inputData));
            };

            startTime = Date.now();
            updateTimer();
            timerInterval = setInterval(updateTimer, 1000);
            recordButton.textContent = 'Stop Recording';
        }

        function stopRecording() {
            if (audioContext) {
                audioContext.close();
                audioContext = null;
            }
            clearInterval(timerInterval);
            recordButton.textContent = 'Start Recording';

            // Convert to WAV
            const wavBlob = createWavBlob(audioChunks);
            audioBlob = wavBlob;

            const audioUrl = URL.createObjectURL(audioBlob);
            audioPlayback.src = audioUrl;
        }

        function updateTimer() {
            const elapsed = Math.floor((Date.now() - startTime) / 1000);
            const minutes = Math.floor(elapsed / 60).toString().padStart(2, '0');
            const seconds = (elapsed % 60).toString().padStart(2, '0');
            timer.textContent = `${minutes}:${seconds}`;
        }

        function createWavBlob(audioChunks) {
            const sampleRate = 44100;
            const numChannels = 1;
            const bitsPerSample = 16;
            const bytesPerSample = bitsPerSample / 8;
            const blockAlign = numChannels * bytesPerSample;

            const buffer = mergeAudioBuffers(audioChunks);
            const dataLength = buffer.length * bytesPerSample;
            const wavDataLength = 36 + dataLength;

            const headerBuffer = new ArrayBuffer(44);
            const view = new DataView(headerBuffer);

            writeString(view, 0, 'RIFF');
            view.setUint32(4, wavDataLength, true);
            writeString(view, 8, 'WAVE');
            writeString(view, 12, 'fmt ');
            view.setUint32(16, 16, true);
            view.setUint16(20, 1, true);
            view.setUint16(22, numChannels, true);
            view.setUint32(24, sampleRate, true);
            view.setUint32(28, sampleRate * blockAlign, true);
            view.setUint16(32, blockAlign, true);
            view.setUint16(34, bitsPerSample, true);
            writeString(view, 36, 'data');
            view.setUint32(40, dataLength, true);

            // Concatenate the 44-byte header and the 16-bit samples directly.
            // (Sizing an Int16Array by byte count would allocate twice the
            // needed space, producing a blob longer than the header declares.)
            return new Blob([headerBuffer, convertToInt16(buffer)], { type: 'audio/wav' });
        }

        function writeString(view, offset, string) {
            for (let i = 0; i < string.length; i++) {
                view.setUint8(offset + i, string.charCodeAt(i));
            }
        }

        function mergeAudioBuffers(buffers) {
            let totalLength = 0;
            for (let buffer of buffers) {
                totalLength += buffer.length;
            }
            const result = new Float32Array(totalLength);
            let offset = 0;
            for (let buffer of buffers) {
                result.set(buffer, offset);
                offset += buffer.length;
            }
            return result;
        }

        function convertToInt16(float32Array) {
            const int16Array = new Int16Array(float32Array.length);
            for (let i = 0; i < float32Array.length; i++) {
                const s = Math.max(-1, Math.min(1, float32Array[i]));
                int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
            }
            return int16Array;
        }

        async function submitToAPI() {
            if (!audioBlob) {
                alert('Please record audio first.');
                return;
            }

            const apiKey = getAPIKey();
            if (!apiKey) {
                alert('API Key is required.');
                return;
            }

            const base64Audio = await blobToBase64(audioBlob);
            const prompt = promptTextarea.value;

            const payload = {
                model: "gpt-4o-audio-preview",
                modalities: ["text"],
                messages: [
                    {
                        role: "user",
                        content: [
                            {type: "text", text: prompt},
                            {
                                type: "input_audio",
                                input_audio: {
                                    data: base64Audio,
                                    format: "wav"
                                }
                            }
                        ]
                    }
                ]
            };

            try {
                const response = await fetch('https://api.openai.com/v1/chat/completions', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                        'Authorization': `Bearer ${apiKey}`
                    },
                    body: JSON.stringify(payload)
                });

                const data = await response.json();
                apiResponse.textContent = JSON.stringify(data, null, 2);
            } catch (error) {
                console.error('Error:', error);
                apiResponse.textContent = `Error: ${error.message}`;
            }
        }

        function getAPIKey() {
            let apiKey = localStorage.getItem('openai_api_key');
            if (!apiKey) {
                apiKey = prompt('Please enter your OpenAI API Key:');
                if (apiKey) {
                    localStorage.setItem('openai_api_key', apiKey);
                }
            }
            return apiKey;
        }

        function blobToBase64(blob) {
            return new Promise((resolve, reject) => {
                const reader = new FileReader();
                reader.onloadend = () => resolve(reader.result.split(',')[1]);
                reader.onerror = reject;
                reader.readAsDataURL(blob);
            });
        }
    </script>
</body>
</html>

In this updated version, I've made the following changes:

  1. Replaced the MediaRecorder API with a custom recording solution using the Web Audio API. This allows us to have more control over the audio data.

  2. Implemented a custom WAV encoder. The createWavBlob function now creates a proper WAV file with the correct headers and audio data.

  3. Modified the recording process to collect raw audio data (Float32Array) instead of encoded chunks.

  4. Added functions to merge audio buffers and convert Float32Array to Int16Array for WAV encoding.

  5. Updated the stopRecording function to create the WAV blob when recording stops.

The rest of the functionality (API submission, UI, etc.) remains the same. This implementation should now produce a valid WAV file that the API can accept.

Key points to note:

  1. The audio is recorded at 44.1kHz, 16-bit, mono. This is a common format that should be widely accepted.
  2. The WAV file is created entirely in JavaScript, ensuring compatibility across browsers.
  3. The audio quality should be good, but if you need to adjust parameters like sample rate or bit depth, you can modify the createWavBlob function.

This solution should resolve the "not of valid wav format" error you encountered. If you still face issues, please let me know, and we can further troubleshoot or adjust the implementation.
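To sanity-check those parameters, here is a standalone sketch of the same header math used in `createWavBlob` (44.1 kHz, mono, 16-bit PCM). The function name is my own, and it builds only the 44-byte header, not the sample data:

```javascript
// Sketch: build a 44-byte WAV header for 44.1 kHz mono 16-bit PCM and
// expose it as a DataView so the fields can be read back and verified.
function buildWavHeader(sampleCount) {
    const sampleRate = 44100, numChannels = 1, bitsPerSample = 16;
    const blockAlign = numChannels * (bitsPerSample / 8);
    const dataLength = sampleCount * blockAlign;
    const view = new DataView(new ArrayBuffer(44));
    const writeString = (offset, s) => {
        for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
    };
    writeString(0, 'RIFF');
    view.setUint32(4, 36 + dataLength, true);          // file size minus 8 bytes
    writeString(8, 'WAVE');
    writeString(12, 'fmt ');
    view.setUint32(16, 16, true);                      // fmt chunk size
    view.setUint16(20, 1, true);                       // audio format: PCM
    view.setUint16(22, numChannels, true);
    view.setUint32(24, sampleRate, true);
    view.setUint32(28, sampleRate * blockAlign, true); // byte rate
    view.setUint16(32, blockAlign, true);
    view.setUint16(34, bitsPerSample, true);
    writeString(36, 'data');
    view.setUint32(40, dataLength, true);
    return view;
}

const v = buildWavHeader(44100); // one second of audio
// Byte rate for 44.1 kHz mono 16-bit is 44100 * 2 = 88200.
```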

human (Oct 17, 2024, 12:07 PM)

The record and stop button do not work any more.

assistant (Oct 17, 2024, 12:07 PM)

I apologize for the oversight. You're right, and I'll fix that immediately. The issue is likely due to the change in how we're handling the audio recording. Let's update the code to ensure the record and stop buttons work correctly with our new audio recording method.

Audio Recorder and Player with API Integration (WAV format, fixed buttons)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Audio Recorder and API Integration</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            display: flex;
            justify-content: center;
            align-items: center;
            min-height: 100vh;
            margin: 0;
            background-color: #f0f0f0;
        }
        .container {
            text-align: center;
            background-color: white;
            padding: 2rem;
            border-radius: 8px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
            max-width: 600px;
            width: 100%;
        }
        button {
            font-size: 1rem;
            padding: 0.5rem 1rem;
            margin: 0.5rem;
            cursor: pointer;
        }
        #timer {
            font-size: 1.5rem;
            margin: 1rem 0;
        }
        #audioPlayback, #prompt {
            margin-top: 1rem;
            width: 100%;
        }
        #prompt {
            height: 100px;
            resize: vertical;
        }
        #apiResponse {
            margin-top: 1rem;
            text-align: left;
            white-space: pre-wrap;
            background-color: #f8f8f8;
            padding: 1rem;
            border-radius: 4px;
            overflow-x: auto;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>Audio Recorder and API Integration</h1>
        <button id="recordButton">Start Recording</button>
        <div id="timer">00:00</div>
        <audio id="audioPlayback" controls></audio>
        <textarea id="prompt" placeholder="Enter your prompt here"></textarea>
        <button id="submitButton">Submit to API</button>
        <div id="apiResponse"></div>
    </div>

    <script>
        let audioContext;
        let recorder;
        let audioChunks = [];
        let startTime;
        let timerInterval;
        let audioBlob;
        let isRecording = false;
        const recordButton = document.getElementById('recordButton');
        const timer = document.getElementById('timer');
        const audioPlayback = document.getElementById('audioPlayback');
        const promptTextarea = document.getElementById('prompt');
        const submitButton = document.getElementById('submitButton');
        const apiResponse = document.getElementById('apiResponse');

        recordButton.addEventListener('click', toggleRecording);
        submitButton.addEventListener('click', submitToAPI);

        async function toggleRecording() {
            if (!isRecording) {
                await startRecording();
            } else {
                stopRecording();
            }
        }

        async function startRecording() {
            try {
                const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
                audioContext = new (window.AudioContext || window.webkitAudioContext)();
                const source = audioContext.createMediaStreamSource(stream);
                const processor = audioContext.createScriptProcessor(1024, 1, 1);

                source.connect(processor);
                processor.connect(audioContext.destination);

                audioChunks = [];

                processor.onaudioprocess = (e) => {
                    const inputData = e.inputBuffer.getChannelData(0);
                    audioChunks.push(new Float32Array(inputData));
                };

                isRecording = true;
                startTime = Date.now();
                updateTimer();
                timerInterval = setInterval(updateTimer, 1000);
                recordButton.textContent = 'Stop Recording';
            } catch (error) {
                console.error('Error starting recording:', error);
                alert('Error starting recording. Please make sure you have given permission to use the microphone.');
            }
        }

        function stopRecording() {
            if (audioContext) {
                audioContext.close();
                audioContext = null;
            }
            clearInterval(timerInterval);
            recordButton.textContent = 'Start Recording';
            isRecording = false;

            // Convert to WAV
            const wavBlob = createWavBlob(audioChunks);
            audioBlob = wavBlob;

            const audioUrl = URL.createObjectURL(audioBlob);
            audioPlayback.src = audioUrl;
        }

        function updateTimer() {
            const elapsed = Math.floor((Date.now() - startTime) / 1000);
            const minutes = Math.floor(elapsed / 60).toString().padStart(2, '0');
            const seconds = (elapsed % 60).toString().padStart(2, '0');
            timer.textContent = `${minutes}:${seconds}`;
        }

        function createWavBlob(audioChunks) {
            const sampleRate = 44100;
            const numChannels = 1;
            const bitsPerSample = 16;
            const bytesPerSample = bitsPerSample / 8;
            const blockAlign = numChannels * bytesPerSample;

            const buffer = mergeAudioBuffers(audioChunks);
            const dataLength = buffer.length * bytesPerSample;
            const wavDataLength = 36 + dataLength;

            const headerBuffer = new ArrayBuffer(44);
            const view = new DataView(headerBuffer);

            writeString(view, 0, 'RIFF');
            view.setUint32(4, wavDataLength, true);            // RIFF chunk size = file size - 8
            writeString(view, 8, 'WAVE');
            writeString(view, 12, 'fmt ');
            view.setUint32(16, 16, true);                      // fmt chunk size for PCM
            view.setUint16(20, 1, true);                       // audio format: 1 = linear PCM
            view.setUint16(22, numChannels, true);
            view.setUint32(24, sampleRate, true);
            view.setUint32(28, sampleRate * blockAlign, true); // byte rate
            view.setUint16(32, blockAlign, true);
            view.setUint16(34, bitsPerSample, true);
            writeString(view, 36, 'data');
            view.setUint32(40, dataLength, true);              // PCM data size in bytes

            // Int16Array lengths are in 16-bit elements, not bytes, so divide
            // the total byte count by two to avoid doubling the file size.
            const wavBuffer = new Int16Array((headerBuffer.byteLength + dataLength) / 2);
            wavBuffer.set(new Int16Array(headerBuffer));
            wavBuffer.set(convertToInt16(buffer), headerBuffer.byteLength / 2);

            return new Blob([wavBuffer], { type: 'audio/wav' });
        }

        function writeString(view, offset, string) {
            for (let i = 0; i < string.length; i++) {
                view.setUint8(offset + i, string.charCodeAt(i));
            }
        }

        function mergeAudioBuffers(buffers) {
            let totalLength = 0;
            for (let buffer of buffers) {
                totalLength += buffer.length;
            }
            const result = new Float32Array(totalLength);
            let offset = 0;
            for (let buffer of buffers) {
                result.set(buffer, offset);
                offset += buffer.length;
            }
            return result;
        }

        function convertToInt16(float32Array) {
            const int16Array = new Int16Array(float32Array.length);
            for (let i = 0; i < float32Array.length; i++) {
                // Clamp to [-1, 1], then scale to the signed 16-bit range
                const s = Math.max(-1, Math.min(1, float32Array[i]));
                int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
            }
            return int16Array;
        }

        async function submitToAPI() {
            if (!audioBlob) {
                alert('Please record audio first.');
                return;
            }

            const apiKey = getAPIKey();
            if (!apiKey) {
                alert('API Key is required.');
                return;
            }

            const base64Audio = await blobToBase64(audioBlob);
            const prompt = promptTextarea.value;

            const payload = {
                model: "gpt-4o-audio-preview",
                modalities: ["text"],
                messages: [
                    {
                        role: "user",
                        content: [
                            {type: "text", text: prompt},
                            {
                                type: "input_audio",
                                input_audio: {
                                    data: base64Audio,
                                    format: "wav"
                                }
                            }
                        ]
                    }
                ]
            };

            try {
                const response = await fetch('https://api.openai.com/v1/chat/completions', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                        'Authorization': `Bearer ${apiKey}`
                    },
                    body: JSON.stringify(payload)
                });

                const data = await response.json();
                apiResponse.textContent = JSON.stringify(data, null, 2);
            } catch (error) {
                console.error('Error:', error);
                apiResponse.textContent = `Error: ${error.message}`;
            }
        }

        function getAPIKey() {
            let apiKey = localStorage.getItem('openai_api_key');
            if (!apiKey) {
                apiKey = prompt('Please enter your OpenAI API Key:');
                if (apiKey) {
                    localStorage.setItem('openai_api_key', apiKey);
                }
            }
            return apiKey;
        }

        function blobToBase64(blob) {
            return new Promise((resolve, reject) => {
                const reader = new FileReader();
                // reader.result is a data: URL; strip the "data:...;base64," prefix
                reader.onloadend = () => resolve(reader.result.split(',')[1]);
                reader.onerror = reject;
                reader.readAsDataURL(blob);
            });
        }
    </script>
</body>
</html>

I've made the following changes to fix the record and stop functionality:

  1. Added an isRecording flag to keep track of the recording state.

  2. Modified the toggleRecording function to properly start and stop recording based on the isRecording state.

  3. Updated the startRecording function to set up the audio context and start recording only when it's not already recording.

  4. Wrapped the recording setup in a try-catch block to handle potential errors (like denied microphone permissions) more gracefully.

  5. Made sure the stopRecording function properly cleans up the audio context and updates the UI.

These changes should resolve the issues with the record and stop buttons. Now, when you click "Start Recording", it should begin recording, and when you click "Stop Recording", it should stop the recording and prepare the audio for playback and submission to the API.
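The toggle flow those changes implement boils down to this pattern (a minimal sketch; the flag and function names match the artifact, but the browser-only bodies — microphone access, AudioContext, timer — are stubbed out so it runs anywhere):

```javascript
// Sketch of the record/stop toggle. isRecording, startRecording,
// stopRecording, and toggleRecording mirror the artifact's names;
// the stub bodies stand in for the browser-only setup and teardown.
let isRecording = false;

function startRecording() {
    // Artifact version: requests the microphone, creates the
    // AudioContext, and starts the timer interval.
    console.log('recording started');
}

function stopRecording() {
    // Artifact version: closes the AudioContext, clears the timer,
    // and builds the WAV blob for the <audio> element.
    console.log('recording stopped');
}

function toggleRecording() {
    if (!isRecording) {
        try {
            startRecording();
            isRecording = true;
        } catch (error) {
            console.error('Error starting recording:', error);
        }
    } else {
        stopRecording();
        isRecording = false;
    }
}
```

Because the flag only flips to `true` after `startRecording()` succeeds, a denied microphone permission leaves the UI in its "Start Recording" state rather than stuck mid-toggle.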

Here's a summary of how it should work now:

  1. Click "Start Recording" to begin recording. The button text will change to "Stop Recording".
  2. Speak into your microphone. The timer will start counting up.
  3. Click "Stop Recording" when you're done. The recorded audio will be processed and made available for playback.
  4. You can then play back the audio using the audio controls.
  5. Enter your prompt in the textarea.
  6. Click "Submit to API" to send the audio and prompt to the OpenAI API.

If you encounter any further issues or if something doesn't work as expected, please let me know, and I'll be happy to help you troubleshoot and fix it.
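As a standalone sanity check, the 44-byte PCM header layout that createWavBlob writes can be exercised outside the browser. This sketch extracts the header-writing logic into a helper (`buildWavHeader` is a stand-in name, not part of the artifact) so the field offsets can be verified in Node:

```javascript
// Builds the canonical 44-byte PCM WAV header for a mono 16-bit clip,
// using the same field offsets and little-endian writes as the artifact.
function buildWavHeader(numSamples, sampleRate = 44100, numChannels = 1, bitsPerSample = 16) {
    const bytesPerSample = bitsPerSample / 8;
    const blockAlign = numChannels * bytesPerSample;
    const dataLength = numSamples * bytesPerSample;
    const buf = new ArrayBuffer(44);
    const view = new DataView(buf);
    const writeString = (offset, s) => {
        for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
    };
    writeString(0, 'RIFF');
    view.setUint32(4, 36 + dataLength, true);           // RIFF chunk size = file size - 8
    writeString(8, 'WAVE');
    writeString(12, 'fmt ');
    view.setUint32(16, 16, true);                       // fmt chunk size for PCM
    view.setUint16(20, 1, true);                        // audio format: 1 = linear PCM
    view.setUint16(22, numChannels, true);
    view.setUint32(24, sampleRate, true);
    view.setUint32(28, sampleRate * blockAlign, true);  // byte rate
    view.setUint16(32, blockAlign, true);
    view.setUint16(34, bitsPerSample, true);
    writeString(36, 'data');
    view.setUint32(40, dataLength, true);               // PCM data size in bytes
    return buf;
}

// Four 16-bit mono samples -> 8 bytes of data, 52 bytes of file.
const header = new DataView(buildWavHeader(4));
console.log(header.getUint32(4, true));   // 44 (36 + 8)
console.log(header.getUint32(40, true));  // 8
```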
