
@simonw
Last active October 22, 2024 02:04

HTML Text to URL Extractor

human (Oct 15, 2024, 04:12 PM)

Build an artifact - no react - that lets me paste in rich HTML text and when I do it outputs a list of all of the URLs from links that I pasted in in a textarea

assistant (Oct 15, 2024, 04:12 PM)

Certainly, I can help you create an HTML artifact that allows you to paste rich HTML text and extract URLs from the links within it. Here's a simple implementation:

URL Extractor from Rich HTML

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>URL Extractor</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
        }
        textarea {
            width: 100%;
            height: 200px;
            margin-bottom: 10px;
        }
        #output {
            white-space: pre-wrap;
            border: 1px solid #ccc;
            padding: 10px;
            background-color: #f0f0f0;
        }
    </style>
</head>
<body>
    <h1>URL Extractor</h1>
    <p>Paste your rich HTML text below:</p>
    <textarea id="input" placeholder="Paste your HTML here"></textarea>
    <button onclick="extractURLs()">Extract URLs</button>
    <h2>Extracted URLs:</h2>
    <div id="output"></div>

    <script>
        function extractURLs() {
            const input = document.getElementById('input').value;
            const parser = new DOMParser();
            const doc = parser.parseFromString(input, 'text/html');
            const links = doc.getElementsByTagName('a');
            const urls = Array.from(links).map(link => link.href).filter(url => url.startsWith('http'));
            
            const output = document.getElementById('output');
            output.textContent = urls.join('\n');
        }
    </script>
</body>
</html>

This HTML page provides a simple interface for extracting URLs from pasted rich HTML text. Here's how it works:

  1. There's a textarea where you can paste your rich HTML content.
  2. When you click the "Extract URLs" button, it triggers the extractURLs() function.
  3. The function uses DOMParser to parse the input HTML.
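  4. It then finds all <a> tags in the parsed HTML and reads each one's href property, which the browser returns as an absolute URL.
  5. The extracted URLs are filtered to include only those starting with "http", which drops mailto:, javascript: and similar non-web links. Because link.href is already resolved, a relative link such as /about is resolved against the extractor page's own address, so it is filtered out only when the page is opened as a local file.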
  6. Finally, the URLs are displayed in the output div, one per line.
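Steps 4 and 5 can be sketched without a browser. The function below, `filterWebUrls`, is a hypothetical helper that assumes the raw `href` attribute strings have already been collected (for example with `link.getAttribute('href')`) and that a base URL is known. It resolves each value with the standard `URL` constructor and keeps only `http:` and `https:` results, which is stricter than `startsWith('http')` (that check would also accept an unrelated scheme whose name happens to begin with "http").

```javascript
// Hypothetical helper: resolve raw href strings against a base URL and keep
// only http(s) links. Assumes the hrefs were already read from <a> elements.
function filterWebUrls(rawHrefs, baseUrl) {
  const urls = [];
  for (const raw of rawHrefs) {
    let parsed;
    try {
      // Relative values such as "/relative/path" are resolved against baseUrl,
      // similar to what the link.href property does in the browser.
      parsed = new URL(raw, baseUrl);
    } catch {
      continue; // skip values that cannot be parsed as URLs at all
    }
    if (parsed.protocol === 'http:' || parsed.protocol === 'https:') {
      urls.push(parsed.href);
    }
  }
  return urls;
}

const hrefs = [
  'https://example.com/a',
  '/relative/path',
  'mailto:someone@example.com',
  'javascript:void(0)',
];
console.log(filterWebUrls(hrefs, 'https://example.org/page'));
```

Passing the base URL explicitly makes the relative-link behaviour a deliberate choice rather than a side effect of where the extractor page happens to be hosted.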

You can copy this HTML and save it as a .html file, then open it in a web browser to use the URL extractor. Let me know if you need any modifications or have any questions!

human (Oct 15, 2024, 04:13 PM)

no I don't want to paste HTML, I want to paste a rich text clipboard with fancy rich text on it and have that get treated as HTML and then a href parsed out of it with the DOM

It shouldn't have a submit button either - onpaste it should do the thing

assistant (Oct 15, 2024, 04:13 PM)

I apologize for the misunderstanding. You're right: you want to paste rich text straight from the clipboard, not raw HTML source, and have it parsed as HTML behind the scenes. Let's create a new artifact that addresses these requirements:

URL Extractor from Rich Text Clipboard

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Rich Text URL Extractor</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
        }
        #input {
            width: 100%;
            height: 200px;
            margin-bottom: 10px;
            border: 1px solid #ccc;
            padding: 10px;
        }
        #output {
            white-space: pre-wrap;
            border: 1px solid #ccc;
            padding: 10px;
            background-color: #f0f0f0;
        }
    </style>
</head>
<body>
    <h1>Rich Text URL Extractor</h1>
    <p>Paste your rich text content below:</p>
    <div id="input" contenteditable="true"></div>
    <h2>Extracted URLs:</h2>
    <div id="output"></div>

    <script>
        const input = document.getElementById('input');
        const output = document.getElementById('output');

        input.addEventListener('paste', function(e) {
            e.preventDefault();
            
            // Get clipboard data as HTML
            const clipboardData = e.clipboardData || window.clipboardData;
            const pastedData = clipboardData.getData('text/html');

            // Create a temporary element to hold the pasted content
            const temp = document.createElement('div');
            temp.innerHTML = pastedData;

            // Extract URLs from links
            const links = temp.getElementsByTagName('a');
            const urls = Array.from(links)
                .map(link => link.href)
                .filter(url => url.startsWith('http'));

            // Display the extracted URLs
            output.textContent = urls.join('\n');

            // Insert the rich text content into the input div
            input.innerHTML = temp.innerHTML;
        });
    </script>
</body>
</html>

This new version addresses your requirements:

  1. It uses a contenteditable div instead of a textarea, which allows for rich text input.
  2. The URL extraction happens automatically on paste, without needing a submit button.
  3. It captures the pasted content as HTML from the clipboard.
  4. The pasted rich text is displayed in the input div, preserving its formatting.
  5. URLs are extracted from the HTML representation of the pasted content.

Here's how it works:

  1. When you paste rich text into the input div, the paste event is intercepted.
  2. The clipboard data is retrieved as HTML.
  3. A temporary div is created to hold this HTML content.
  4. URLs are extracted from all <a> tags in this temporary div.
  5. The extracted URLs are displayed in the output div.
  6. The rich text content is then inserted into the input div, preserving its formatting.
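One detail of step 2: `getData('text/html')` returns an empty string, not `null`, when the clipboard carries no HTML flavour (for example, text copied from a plain-text editor). The sketch below is a hypothetical guard, `getPastedHtml`, plus an assumed stand-in for a `DataTransfer` object so it can run outside a browser. In the handler, a `null` result could be used to hide the output area rather than parse an empty string.

```javascript
// Hypothetical guard for the paste handler: returns the pasted HTML, or null
// when the clipboard has no "text/html" data. DataTransfer.getData() returns
// an empty string (not null) for a format that is not present.
function getPastedHtml(clipboardData) {
  if (!clipboardData) return null;
  const html = clipboardData.getData('text/html');
  return html === '' ? null : html;
}

// Minimal stand-in for a DataTransfer object, for trying this outside a browser.
function fakeClipboard(formats) {
  return { getData: (type) => formats[type] ?? '' };
}

console.log(getPastedHtml(fakeClipboard({ 'text/html': '<a href="https://example.com">x</a>' })));
console.log(getPastedHtml(fakeClipboard({ 'text/plain': 'just text' })));
```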

You can copy this HTML and save it as a .html file, then open it in a web browser to use the Rich Text URL extractor. Simply paste your rich text content with links into the input area, and you'll see the extracted URLs appear below.

human (Oct 15, 2024, 04:13 PM)

that's nearly it, but make sure that the HTML I paste in is not displayed at all, just the extracted links

assistant (Oct 15, 2024, 04:13 PM)

Certainly, I'll modify the artifact to hide the pasted HTML content and only show the extracted links. Here's the updated version:

URL Extractor from Hidden Rich Text

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Hidden Rich Text URL Extractor</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
        }
        #input {
            width: 100%;
            height: 200px;
            margin-bottom: 10px;
            border: 1px solid #ccc;
            padding: 10px;
            background-color: #f0f0f0;
        }
        #output {
            white-space: pre-wrap;
            border: 1px solid #ccc;
            padding: 10px;
            background-color: #fff;
        }
    </style>
</head>
<body>
    <h1>Hidden Rich Text URL Extractor</h1>
    <p>Paste your rich text content below (it will not be displayed):</p>
    <div id="input" contenteditable="true"></div>
    <h2>Extracted URLs:</h2>
    <div id="output"></div>

    <script>
        const input = document.getElementById('input');
        const output = document.getElementById('output');

        input.addEventListener('paste', function(e) {
            e.preventDefault();
            
            // Get clipboard data as HTML
            const clipboardData = e.clipboardData || window.clipboardData;
            const pastedData = clipboardData.getData('text/html');

            // Create a temporary element to hold the pasted content
            const temp = document.createElement('div');
            temp.innerHTML = pastedData;

            // Extract URLs from links
            const links = temp.getElementsByTagName('a');
            const urls = Array.from(links)
                .map(link => link.href)
                .filter(url => url.startsWith('http'));

            // Display the extracted URLs
            output.textContent = urls.join('\n');

            // Clear the input div and show a message
            input.textContent = 'Content pasted. URLs extracted.';
        });

        // Clear the input message when focused
        input.addEventListener('focus', function() {
            if (input.textContent === 'Content pasted. URLs extracted.') {
                input.textContent = '';
            }
        });
    </script>
</body>
</html>

This updated version addresses your requirement to hide the pasted HTML content. Here are the key changes and how it works:

  1. The pasted HTML content is no longer displayed in the input div.
  2. After pasting, the input div shows a message "Content pasted. URLs extracted." instead of the actual content.
  3. The extracted URLs are still displayed in the output div below.
  4. The confirmation message is cleared when the input div gains focus, ready for new content to be pasted.

To use this:

  1. Copy this HTML and save it as a .html file, then open it in a web browser.
  2. Copy some rich text content with links from any source (e.g., a web page, a Word document).
  3. Click on the input area and paste (Ctrl+V or Command+V) your copied content.
  4. The input area will show "Content pasted. URLs extracted.", and the extracted URLs will appear in the output area below.
  5. To process new content, just paste again: each paste replaces the previous results, and the confirmation message clears whenever the input area gains focus.

This version keeps the functionality of extracting URLs from rich text content while ensuring that the pasted content itself remains hidden. Let me know if you need any further modifications or have any questions!

human (Oct 15, 2024, 04:15 PM)

make it so the output is a textarea too, and add a "copy to clipboard" button which changes text to "Copied" and then back to normal 1.5s later after it is clicked

Make the initial textarea you paste into quite a bit smaller

Hide the Extracted URLs heading and the output textarea until there is something to display

assistant (Oct 15, 2024, 04:15 PM)

Certainly! I'll update the artifact with these changes. Here's the revised version:

Enhanced URL Extractor from Hidden Rich Text

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Enhanced Hidden Rich Text URL Extractor</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
        }
        #input {
            width: 100%;
            height: 100px;
            margin-bottom: 10px;
            border: 1px solid #ccc;
            padding: 10px;
            background-color: #f0f0f0;
        }
        #output-container {
            display: none;
        }
        #output {
            width: 100%;
            height: 150px;
            margin-bottom: 10px;
            border: 1px solid #ccc;
            padding: 10px;
            background-color: #fff;
        }
        #copy-button {
            padding: 5px 10px;
            cursor: pointer;
        }
    </style>
</head>
<body>
    <h1>Enhanced Hidden Rich Text URL Extractor</h1>
    <p>Paste your rich text content below (it will not be displayed):</p>
    <div id="input" contenteditable="true"></div>
    <div id="output-container">
        <h2>Extracted URLs:</h2>
        <textarea id="output" readonly></textarea>
        <button id="copy-button">Copy to clipboard</button>
    </div>

    <script>
        const input = document.getElementById('input');
        const outputContainer = document.getElementById('output-container');
        const output = document.getElementById('output');
        const copyButton = document.getElementById('copy-button');

        input.addEventListener('paste', function(e) {
            e.preventDefault();
            
            const clipboardData = e.clipboardData || window.clipboardData;
            const pastedData = clipboardData.getData('text/html');

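            // Parse into a separate, inert document: unlike setting innerHTML on an
            // element of this page, DOMParser does not load images or run their
            // inline event handlers, so pasted markup cannot execute script here.
            const doc = new DOMParser().parseFromString(pastedData, 'text/html');

            const links = doc.getElementsByTagName('a');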
            const urls = Array.from(links)
                .map(link => link.href)
                .filter(url => url.startsWith('http'));

            if (urls.length > 0) {
                output.value = urls.join('\n');
                outputContainer.style.display = 'block';
            } else {
                outputContainer.style.display = 'none';
            }

            input.textContent = 'Content pasted. URLs extracted.';
        });

        input.addEventListener('focus', function() {
            if (input.textContent === 'Content pasted. URLs extracted.') {
                input.textContent = '';
            }
        });

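        // Capture the label once; reading it inside the handler would pick up
        // "Copied!" if the button is clicked again within 1.5 seconds.
        const originalText = copyButton.textContent;
        let resetTimer = null;

        copyButton.addEventListener('click', function() {
            output.select();
            document.execCommand('copy');

            copyButton.textContent = 'Copied!';

            // Restart the 1.5s countdown on every click
            clearTimeout(resetTimer);
            resetTimer = setTimeout(() => {
                copyButton.textContent = originalText;
            }, 1500);
        });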
    </script>
</body>
</html>

Here's a summary of the changes and how the updated version works:

  1. The input area is now smaller (height: 100px instead of 200px).

  2. The output is now a textarea instead of a div, making it easier to select and copy the content.

  3. Added a "Copy to clipboard" button:

    • When clicked, it copies the content of the output textarea.
    • The button text changes to "Copied!" for 1.5 seconds before reverting to "Copy to clipboard".
  4. The "Extracted URLs" heading and the output textarea are hidden initially:

    • They only appear when there are URLs to display.
    • If no URLs are found in the pasted content, the output area remains hidden.
  5. The functionality for pasting and extracting URLs remains the same.
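A detail worth getting right in this kind of "Copied!" feedback: the label to restore should be captured once, not read from the button on each click, because a second click within 1.5 seconds would otherwise capture "Copied!" itself and the button would never revert. The sketch below isolates that logic. `createLabelFlasher` and `fakeTimers` are hypothetical names, and the timer functions are injected only so the behaviour can be checked without a browser.

```javascript
// Hypothetical helper for the "Copied!" feedback. Timer functions are passed in
// so the behaviour can be checked without a browser.
function createLabelFlasher(setLabel, originalLabel, delayMs, timers = { setTimeout, clearTimeout }) {
  let pending = null;
  return function flash(text) {
    setLabel(text);
    // Cancel an earlier reset so rapid clicks cannot leave the label stuck.
    if (pending !== null) timers.clearTimeout(pending);
    pending = timers.setTimeout(() => {
      pending = null;
      setLabel(originalLabel);
    }, delayMs);
  };
}

// Manual fake timers: queued callbacks run only when runAll() is called.
function fakeTimers() {
  const queue = new Map();
  let nextId = 1;
  return {
    setTimeout(fn) { const id = nextId++; queue.set(id, fn); return id; },
    clearTimeout(id) { queue.delete(id); },
    runAll() { for (const [id, fn] of [...queue]) { queue.delete(id); fn(); } },
  };
}

let label = 'Copy to clipboard';
const timers = fakeTimers();
const flash = createLabelFlasher((t) => { label = t; }, 'Copy to clipboard', 1500, timers);
flash('Copied!');
flash('Copied!'); // second click before the first reset fires
console.log(label); // "Copied!"
timers.runAll();
console.log(label); // "Copy to clipboard"
```

In the page itself, `setLabel` would assign `copyButton.textContent` and the real `setTimeout`/`clearTimeout` would be used.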

To use this updated version:

  1. Copy the HTML and save it as a .html file, then open it in a web browser.
  2. Copy some rich text content with links from any source.
  3. Click on the input area and paste (Ctrl+V or Command+V) your copied content.
  4. If URLs are found, they will appear in the output textarea below. If not, no output area will be shown.
  5. To copy the extracted URLs, click the "Copy to clipboard" button.
  6. To process new content, just paste again: each paste replaces the previous results, and the confirmation message clears whenever the input area gains focus.

This enhanced version provides a more streamlined user experience with the requested features. Let me know if you need any further adjustments or have any questions!

human (Oct 15, 2024, 04:17 PM)

The page title and first heading should say "Extract URLs" - then the help copy should say "Copy content from a web page and paste here to extract linked URLs"

The Extracted URLs: heading should just say "Extracted" - no colon

Make sure to have box-sizing: border-box on everything and set the max-width on the textareas to 100% - make sure it looks OK on mobile

assistant (Oct 15, 2024, 04:17 PM)

Certainly! I'll make these adjustments to improve the user experience and ensure it looks good on mobile devices. Here's the updated version:

Extract URLs from Rich Text

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Extract URLs</title>
    <style>
        * {
            box-sizing: border-box;
        }
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
        }
        #input {
            width: 100%;
            max-width: 100%;
            height: 100px;
            margin-bottom: 10px;
            border: 1px solid #ccc;
            padding: 10px;
            background-color: #f0f0f0;
        }
        #output-container {
            display: none;
        }
        #output {
            width: 100%;
            max-width: 100%;
            height: 150px;
            margin-bottom: 10px;
            border: 1px solid #ccc;
            padding: 10px;
            background-color: #fff;
        }
        #copy-button {
            padding: 5px 10px;
            cursor: pointer;
        }
        @media (max-width: 600px) {
            body {
                padding: 10px;
            }
            #input, #output {
                height: 120px;
            }
        }
    </style>
</head>
<body>
    <h1>Extract URLs</h1>
    <p>Copy content from a web page and paste here to extract linked URLs</p>
    <div id="input" contenteditable="true"></div>
    <div id="output-container">
        <h2>Extracted</h2>
        <textarea id="output" readonly></textarea>
        <button id="copy-button">Copy to clipboard</button>
    </div>

    <script>
        const input = document.getElementById('input');
        const outputContainer = document.getElementById('output-container');
        const output = document.getElementById('output');
        const copyButton = document.getElementById('copy-button');

        input.addEventListener('paste', function(e) {
            e.preventDefault();
            
            const clipboardData = e.clipboardData || window.clipboardData;
            const pastedData = clipboardData.getData('text/html');

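            // Parse into a separate, inert document: unlike setting innerHTML on an
            // element of this page, DOMParser does not load images or run their
            // inline event handlers, so pasted markup cannot execute script here.
            const doc = new DOMParser().parseFromString(pastedData, 'text/html');

            const links = doc.getElementsByTagName('a');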
            const urls = Array.from(links)
                .map(link => link.href)
                .filter(url => url.startsWith('http'));

            if (urls.length > 0) {
                output.value = urls.join('\n');
                outputContainer.style.display = 'block';
            } else {
                outputContainer.style.display = 'none';
            }

            input.textContent = 'Content pasted. URLs extracted.';
        });

        input.addEventListener('focus', function() {
            if (input.textContent === 'Content pasted. URLs extracted.') {
                input.textContent = '';
            }
        });

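        // Capture the label once; reading it inside the handler would pick up
        // "Copied!" if the button is clicked again within 1.5 seconds.
        const originalText = copyButton.textContent;
        let resetTimer = null;

        copyButton.addEventListener('click', function() {
            output.select();
            document.execCommand('copy');

            copyButton.textContent = 'Copied!';

            // Restart the 1.5s countdown on every click
            clearTimeout(resetTimer);
            resetTimer = setTimeout(() => {
                copyButton.textContent = originalText;
            }, 1500);
        });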
    </script>
</body>
</html>

Here's a summary of the changes made:

  1. Updated the page title and main heading to "Extract URLs".

  2. Changed the help text to "Copy content from a web page and paste here to extract linked URLs".

  3. Changed the "Extracted URLs:" heading to simply "Extracted" (no colon).

  4. Added box-sizing: border-box to all elements using the universal selector *.

  5. Set max-width: 100% on both input and output areas to ensure they don't overflow on mobile devices.

  6. Added a media query for viewports 600px wide or narrower that reduces the page padding and sets both boxes to a 120px height for a better mobile experience.

  7. Kept width: 100% on both boxes so they fill the available column (up to the body's 800px max-width) on any screen size.
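The reason item 4 matters on mobile is simple arithmetic. `#input` is a `div` with `width: 100%`, `padding: 10px` and a 1px border; under the default `content-box` sizing, padding and border are added on top of the declared width, so the box ends up 22px wider than its container and overflows on a narrow screen. With `border-box`, the declared width already includes them. `outerWidth` is a hypothetical helper, and the 320px container is an assumed phone-width example.

```javascript
// Hypothetical helper: outer width of a box whose CSS width is `width`,
// with equal left/right padding and borders (all in px).
function outerWidth(width, padding, border, boxSizing) {
  if (boxSizing === 'border-box') {
    // width already includes padding and border (assuming it is large
    // enough to contain them).
    return width;
  }
  // content-box (the CSS default): padding and border are added to width.
  return width + 2 * padding + 2 * border;
}

const parentWidth = 320; // assumed narrow phone layout
console.log(outerWidth(parentWidth, 10, 1, 'content-box')); // 342
console.log(outerWidth(parentWidth, 10, 1, 'border-box'));  // 320
```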

These changes improve the user interface, make the purpose of the tool clearer, and ensure that it looks good and functions well on both desktop and mobile devices.

To use this updated version:

  1. Copy the HTML and save it as a .html file, then open it in a web browser (works on both desktop and mobile).
  2. Copy some content with links from any web page.
  3. Click on the input area and paste (Ctrl+V, Command+V, or long-press and paste on mobile) your copied content.
  4. If URLs are found, they will appear in the output textarea below. If not, no output area will be shown.
  5. To copy the extracted URLs, click the "Copy to clipboard" button.
  6. To process new content, just paste again: each paste replaces the previous results, and the confirmation message clears whenever the input area gains focus.
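Content copied from a web page often links the same URL more than once (a headline and a "read more" link, for example), so the extracted list can contain repeats. If that is unwanted, an order-preserving de-duplication step could be applied to `urls` before joining them. This is an optional addition, not part of the tool above, and `uniqueInOrder` is a hypothetical name.

```javascript
// Hypothetical optional step: drop repeated URLs, keeping first-seen order.
// A Set iterates in insertion order, so spreading it preserves that order.
function uniqueInOrder(urls) {
  return [...new Set(urls)];
}

const extracted = [
  'https://example.com/story',
  'https://example.com/about',
  'https://example.com/story',
];
console.log(uniqueInOrder(extracted).join('\n'));
```

In the paste handler this would become `output.value = uniqueInOrder(urls).join('\n');`. The comparison is exact string equality, so URLs that differ only by a trailing slash or a #fragment are still treated as distinct.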

This version should now be more user-friendly and responsive across different devices. Let me know if you need any further adjustments or have any questions!
