@simonw
Created November 12, 2024 16:03

Video Frame Extraction with Bounding Boxes

human (Nov 12, 2024, 07:55 AM)

#!/bin/zsh

# Check if the video filename is provided as an argument
if [ $# -eq 0 ]; then
    echo "Please provide the video filename as an argument."
    exit 1
fi

video_filename=$1

# Function to convert time in seconds to HH:MM:SS format
function seconds_to_hms {
    local total_seconds=$1
    local hours=$((total_seconds / 3600))
    local minutes=$(((total_seconds % 3600) / 60))
    local seconds=$((total_seconds % 60))
    printf "%02d:%02d:%02d\n" $hours $minutes $seconds
}

# Function to capture frame
function capture_frame {
    local current_time=$1
    local formatted_time=$(seconds_to_hms $current_time)
    local filename="/tmp/frame_${formatted_time//:/}.jpg"
    
    # Run ffmpeg to capture the frame
    ffmpeg -ss $formatted_time -i "$video_filename" -frames:v 1 $filename
    
    echo "Saved frame to $filename"
}

# Main loop
while true; do
    echo "Press any key to capture the current video frame..."
    read -sk 1 key

    # Use osascript to get the current time from QuickTime Player
    current_time=$(osascript -e 'tell application "QuickTime Player" to tell document 1 to get current time')

    # Capture the frame at the current time
    capture_frame $current_time
done



Add a feature to this script so I can optionally start it with one or more --box '0,30,70,90' options. Each option specifies a bounding box on the captured frame. If the command was started with any of those options, each frame is output once for the original and then once for each bounding box specification, with the bbox coords as a suffix in the captured filename

Design a syntax for bounding boxes that makes sense and is consistent with various other tools

assistant (Nov 12, 2024, 07:55 AM)

I'll help modify the script to add bounding box support. I'll use the common x1,y1,x2,y2 format where coordinates are specified as percentages (0-100) of the image dimensions, which is consistent with many computer vision and image processing tools.

create Frame Capture Script with Bounding Box Support

#!/bin/zsh

# Initialize array for storing bounding boxes
typeset -a bounding_boxes

# Parse command line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        --box)
            if [[ $2 =~ ^[0-9]+,[0-9]+,[0-9]+,[0-9]+$ ]]; then
                # Validate coordinates are within 0-100 range
                local coords=(${(s:,:)2})
                local valid=true
                for coord in $coords; do
                    if (( coord < 0 || coord > 100 )); then
                        valid=false
                        break
                    fi
                done
                if $valid; then
                    bounding_boxes+=($2)
                else
                    echo "Error: Bounding box coordinates must be between 0 and 100"
                    exit 1
                fi
            else
                echo "Error: Invalid bounding box format. Use: --box 'x1,y1,x2,y2' where coordinates are percentages (0-100)"
                exit 1
            fi
            shift 2
            ;;
        *)
            video_filename=$1
            shift
            ;;
    esac
done

# Check if the video filename is provided
if [[ -z $video_filename ]]; then
    echo "Please provide the video filename as an argument."
    exit 1
fi

# Function to convert time in seconds to HH:MM:SS format
function seconds_to_hms {
    local total_seconds=$1
    local hours=$((total_seconds / 3600))
    local minutes=$(((total_seconds % 3600) / 60))
    local seconds=$((total_seconds % 60))
    printf "%02d:%02d:%02d\n" $hours $minutes $seconds
}

# Function to capture frame with optional bounding box
function capture_frame {
    local current_time=$1
    local bbox=$2
    local formatted_time=$(seconds_to_hms $current_time)
    
    # Base filename without bbox suffix
    local base_filename="/tmp/frame_${formatted_time//:/}"
    
    if [[ -z $bbox ]]; then
        # Capture full frame
        local filename="${base_filename}.jpg"
        ffmpeg -y -ss $formatted_time -i "$video_filename" -frames:v 1 "$filename"
        echo "Saved full frame to $filename"
    else
        # Capture cropped frame using the bounding box
        local coords=(${(s:,:)bbox})
        local x1=$coords[1]
        local y1=$coords[2]
        local x2=$coords[3]
        local y2=$coords[4]
        
        # Calculate width and height as percentages
        local width=$((x2 - x1))
        local height=$((y2 - y1))
        
        local filename="${base_filename}_bbox${x1}_${y1}_${x2}_${y2}.jpg"
        ffmpeg -y -ss $formatted_time -i "$video_filename" -frames:v 1 \
            -filter:v "crop=iw*${width}/100:ih*${height}/100:iw*${x1}/100:ih*${y1}/100" \
            "$filename"
        echo "Saved cropped frame to $filename"
    fi
}

# Main loop
while true; do
    echo "Press any key to capture the current video frame..."
    read -sk 1 key

    # Use osascript to get the current time from QuickTime Player
    current_time=$(osascript -e 'tell application "QuickTime Player" to tell document 1 to get current time')

    # Capture the full frame first
    capture_frame $current_time
    
    # Then capture each bounding box if any were specified
    for bbox in $bounding_boxes; do
        capture_frame $current_time $bbox
    done
done

I've modified the script to add bounding box support with these key changes:

  1. Added support for multiple --box arguments using the format x1,y1,x2,y2 where:

    • x1,y1: Top-left corner coordinates as percentages (0-100) of image width/height
    • x2,y2: Bottom-right corner coordinates as percentages (0-100) of image width/height
  2. Added validation to ensure:

    • Correct number format (comma-separated integers)
    • Values are within 0-100 range
    • Proper syntax for the bounding box argument
  3. Modified the capture_frame function to:

    • Capture the full frame when no bbox is specified
    • Use ffmpeg's crop filter to extract the bounded region when a bbox is specified
    • Include bbox coordinates in the filename for cropped versions
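
As a rough illustration of the validation rules listed above, here is a minimal sketch in JavaScript (the language of the drawing tool later in this conversation) rather than zsh. `parseBox` is a hypothetical helper name, and the final check that x1 < x2 and y1 < y2 is an added assumption that goes beyond the three checks listed.

```javascript
// Hypothetical helper: parse and validate a --box value such as "20,30,80,90".
// Returns {x1, y1, x2, y2} as integer percentages, or throws on bad input.
function parseBox(value) {
  // Same shape check as the script's regex: four comma-separated integers.
  if (!/^\d+,\d+,\d+,\d+$/.test(value)) {
    throw new Error("Invalid bounding box format. Use: x1,y1,x2,y2");
  }
  const [x1, y1, x2, y2] = value.split(",").map(Number);
  // The regex already rules out negative numbers, so only the upper bound matters.
  if ([x1, y1, x2, y2].some((coord) => coord > 100)) {
    throw new Error("Bounding box coordinates must be between 0 and 100");
  }
  // Assumption (not one of the listed checks): reject empty or inverted boxes.
  if (x1 >= x2 || y1 >= y2) {
    throw new Error("Expected x1 < x2 and y1 < y2");
  }
  return { x1, y1, x2, y2 };
}

console.log(parseBox("20,30,80,90"));
```

Rejecting inverted boxes up front would avoid handing ffmpeg a zero or negative crop size, but whether that belongs in the script is a design choice.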

Example usage:

# Capture full frames only
./script.sh video.mp4

# Capture full frames plus one bounded region
./script.sh --box '20,30,80,90' video.mp4

# Capture full frames plus multiple bounded regions
./script.sh --box '20,30,80,90' --box '0,0,50,50' video.mp4

The output filenames will be:

  • Full frame: /tmp/frame_HHMMSS.jpg
  • Cropped frame: /tmp/frame_HHMMSS_bbox20_30_80_90.jpg

The script passes percentage-based expressions to ffmpeg's crop filter, so the same --box value selects the same region regardless of the video's resolution.
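
To make the percentage arithmetic concrete, the following sketch rebuilds the crop expression in JavaScript and resolves it for one example frame size. `cropFilter` and `cropPixels` are hypothetical helper names, and the 1920x1080 frame is an assumed example, not something taken from the script.

```javascript
// Hypothetical helper: build an ffmpeg crop filter string from a percentage box.
// The crop filter takes crop=width:height:x:y; iw and ih stand for the input
// frame's width and height, so the expression adapts to any resolution.
function cropFilter({ x1, y1, x2, y2 }) {
  const width = x2 - x1;
  const height = y2 - y1;
  return `crop=iw*${width}/100:ih*${height}/100:iw*${x1}/100:ih*${y1}/100`;
}

// Resolve the same box to pixels for a known frame size (illustration only).
function cropPixels({ x1, y1, x2, y2 }, frameWidth, frameHeight) {
  return {
    width: (frameWidth * (x2 - x1)) / 100,
    height: (frameHeight * (y2 - y1)) / 100,
    x: (frameWidth * x1) / 100,
    y: (frameHeight * y1) / 100,
  };
}

const box = { x1: 20, y1: 30, x2: 80, y2: 90 };
console.log(cropFilter(box));
// crop=iw*60/100:ih*60/100:iw*20/100:ih*30/100
console.log(cropPixels(box, 1920, 1080));
// { width: 1152, height: 648, x: 384, y: 324 }
```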

human (Nov 12, 2024, 07:57 AM)

now build me an artifact (no react) that I can copy and paste an image onto - or select to open a file, or drag and drop the image - and it displays that image. Then I can drag a box on that image (and then resize that box by further dragging the corners and lines) and as I drag the box the bbox syntax for that box is updated below the image

assistant (Nov 12, 2024, 07:57 AM)

I'll create an HTML artifact with vanilla JavaScript to handle image upload/paste and interactive bounding box creation.

create Bounding Box Drawing Tool

<!DOCTYPE html>
<html>
<head>
    <style>
        .container {
            max-width: 1200px;
            margin: 20px auto;
            padding: 20px;
            font-family: system-ui, -apple-system, sans-serif;
        }
        
        .image-container {
            position: relative;
            display: inline-block;
            margin: 20px 0;
            border: 2px dashed #ccc;
            min-width: 320px;
            min-height: 240px;
            cursor: crosshair;
        }
        
        .image-container.dragover {
            border-color: #4CAF50;
            background: rgba(76, 175, 80, 0.1);
        }
        
        #targetImage {
            max-width: 100%;
            display: block;
        }
        
        .bbox {
            position: absolute;
            border: 2px solid #4CAF50;
            background: rgba(76, 175, 80, 0.1);
            cursor: move;
        }
        
        .resize-handle {
            width: 10px;
            height: 10px;
            background: white;
            border: 2px solid #4CAF50;
            position: absolute;
        }
        
        .nw { top: -5px; left: -5px; cursor: nw-resize; }
        .ne { top: -5px; right: -5px; cursor: ne-resize; }
        .sw { bottom: -5px; left: -5px; cursor: sw-resize; }
        .se { bottom: -5px; right: -5px; cursor: se-resize; }
        
        .n { top: -5px; left: 50%; margin-left: -5px; cursor: n-resize; }
        .s { bottom: -5px; left: 50%; margin-left: -5px; cursor: s-resize; }
        .e { right: -5px; top: 50%; margin-top: -5px; cursor: e-resize; }
        .w { left: -5px; top: 50%; margin-top: -5px; cursor: w-resize; }
        
        .output {
            font-family: monospace;
            margin-top: 10px;
            padding: 10px;
            background: #f5f5f5;
            border-radius: 4px;
        }
        
        .instructions {
            margin: 20px 0;
            padding: 15px;
            background: #f8f9fa;
            border-radius: 4px;
            line-height: 1.5;
        }
    </style>
</head>
<body>
    <div class="container">
        <h2>Bounding Box Drawing Tool</h2>
        
        <div class="instructions">
            <p><strong>Instructions:</strong></p>
            <ul>
                <li>Paste an image (Ctrl+V/Cmd+V), drag & drop, or click to select a file</li>
                <li>Click and drag to create a bounding box</li>
                <li>Drag the corners or edges to resize</li>
                <li>Drag the box to move it</li>
                <li>Coordinates are in percentages of image dimensions</li>
            </ul>
        </div>
        
        <div id="imageContainer" class="image-container">
            <img id="targetImage" style="display: none">
            <input type="file" id="fileInput" style="display: none" accept="image/*">
        </div>
        
        <div class="output">
            <code>--box '<span id="bboxOutput">0,0,0,0</span>'</code>
        </div>
    </div>

    <script>
        const container = document.getElementById('imageContainer');
        const img = document.getElementById('targetImage');
        const fileInput = document.getElementById('fileInput');
        const output = document.getElementById('bboxOutput');
        let bbox = null;
        let isDragging = false;
        let isResizing = false;
        let currentHandle = null;
        let startX, startY;
        let originalBox = {};

        // Handle image upload via click
        container.addEventListener('click', () => {
            if (!img.src) fileInput.click();
        });

        fileInput.addEventListener('change', (e) => {
            const file = e.target.files[0];
            if (file) loadImage(file);
        });

        // Handle drag and drop
        container.addEventListener('dragover', (e) => {
            e.preventDefault();
            container.classList.add('dragover');
        });

        container.addEventListener('dragleave', () => {
            container.classList.remove('dragover');
        });

        container.addEventListener('drop', (e) => {
            e.preventDefault();
            container.classList.remove('dragover');
            const file = e.dataTransfer.files[0];
            if (file && file.type.startsWith('image/')) {
                loadImage(file);
            }
        });

        // Handle paste
        document.addEventListener('paste', (e) => {
            const items = e.clipboardData.items;
            for (let item of items) {
                if (item.type.startsWith('image/')) {
                    const file = item.getAsFile();
                    loadImage(file);
                    break;
                }
            }
        });

        function loadImage(file) {
            const reader = new FileReader();
            reader.onload = (e) => {
                img.src = e.target.result;
                img.style.display = 'block';
                if (bbox) bbox.remove();
                bbox = null;
            };
            reader.readAsDataURL(file);
        }

        // Handle box creation
        container.addEventListener('mousedown', (e) => {
            if (!img.src || e.target !== img) return;
            if (bbox) bbox.remove();
            
            const rect = img.getBoundingClientRect();
            startX = e.clientX - rect.left;
            startY = e.clientY - rect.top;
            
            bbox = document.createElement('div');
            bbox.className = 'bbox';
            container.appendChild(bbox);
            
            isDragging = true;
            updateBox(e.clientX - rect.left, e.clientY - rect.top);
        });

        // Handle box resizing
        function createResizeHandles() {
            const positions = ['nw', 'n', 'ne', 'e', 'se', 's', 'sw', 'w'];
            positions.forEach(pos => {
                const handle = document.createElement('div');
                handle.className = `resize-handle ${pos}`;
                handle.setAttribute('data-handle', pos);
                bbox.appendChild(handle);
                
                handle.addEventListener('mousedown', (e) => {
                    e.stopPropagation();
                    isResizing = true;
                    currentHandle = pos;
                    const rect = bbox.getBoundingClientRect();
                    originalBox = {
                        left: rect.left,
                        top: rect.top,
                        width: rect.width,
                        height: rect.height
                    };
                    startX = e.clientX;
                    startY = e.clientY;
                });
            });
        }

        // Handle mouse movement
        document.addEventListener('mousemove', (e) => {
            if (!bbox || (!isDragging && !isResizing)) return;
            
            const rect = img.getBoundingClientRect();
            if (isDragging) {
                updateBox(e.clientX - rect.left, e.clientY - rect.top);
            } else if (isResizing) {
                handleResize(e, rect);
            }
        });

        // Handle mouse up
        document.addEventListener('mouseup', () => {
            if (isDragging) {
                isDragging = false;
                createResizeHandles();
            }
            isResizing = false;
            currentHandle = null;
        });

        function updateBox(currentX, currentY) {
            const rect = img.getBoundingClientRect();
            
            let left = Math.min(startX, currentX);
            let top = Math.min(startY, currentY);
            let width = Math.abs(currentX - startX);
            let height = Math.abs(currentY - startY);
            
            // Constrain to image boundaries
            left = Math.max(0, Math.min(left, rect.width));
            top = Math.max(0, Math.min(top, rect.height));
            width = Math.min(width, rect.width - left);
            height = Math.min(height, rect.height - top);
            
            bbox.style.left = left + 'px';
            bbox.style.top = top + 'px';
            bbox.style.width = width + 'px';
            bbox.style.height = height + 'px';
            
            updateOutput();
        }

        function handleResize(e, imgRect) {
            const rect = bbox.getBoundingClientRect();
            const deltaX = e.clientX - startX;
            const deltaY = e.clientY - startY;
            
            let newLeft = rect.left - imgRect.left;
            let newTop = rect.top - imgRect.top;
            let newWidth = rect.width;
            let newHeight = rect.height;
            
            switch(currentHandle) {
                case 'nw':
                    newLeft += deltaX;
                    newTop += deltaY;
                    newWidth -= deltaX;
                    newHeight -= deltaY;
                    break;
                case 'n':
                    newTop += deltaY;
                    newHeight -= deltaY;
                    break;
                case 'ne':
                    newWidth += deltaX;
                    newTop += deltaY;
                    newHeight -= deltaY;
                    break;
                case 'e':
                    newWidth += deltaX;
                    break;
                case 'se':
                    newWidth += deltaX;
                    newHeight += deltaY;
                    break;
                case 's':
                    newHeight += deltaY;
                    break;
                case 'sw':
                    newLeft += deltaX;
                    newWidth -= deltaX;
                    newHeight += deltaY;
                    break;
                case 'w':
                    newLeft += deltaX;
                    newWidth -= deltaX;
                    break;
            }
            
            // Constrain to image boundaries
            newLeft = Math.max(0, Math.min(newLeft, imgRect.width - newWidth));
            newTop = Math.max(0, Math.min(newTop, imgRect.height - newHeight));
            newWidth = Math.max(10, Math.min(newWidth, imgRect.width - newLeft));
            newHeight = Math.max(10, Math.min(newHeight, imgRect.height - newTop));
            
            bbox.style.left = newLeft + 'px';
            bbox.style.top = newTop + 'px';
            bbox.style.width = newWidth + 'px';
            bbox.style.height = newHeight + 'px';
            
            startX = e.clientX;
            startY = e.clientY;
            
            updateOutput();
        }

        function updateOutput() {
            const rect = bbox.getBoundingClientRect();
            const imgRect = img.getBoundingClientRect();
            
            const x1 = Math.round((rect.left - imgRect.left) / imgRect.width * 100);
            const y1 = Math.round((rect.top - imgRect.top) / imgRect.height * 100);
            const x2 = Math.round((rect.right - imgRect.left) / imgRect.width * 100);
            const y2 = Math.round((rect.bottom - imgRect.top) / imgRect.height * 100);
            
            output.textContent = `${x1},${y1},${x2},${y2}`;
        }
    </script>
</body>
</html>

I've created an interactive bounding box drawing tool with these features:

  1. Multiple ways to load an image:

    • Paste an image (Ctrl+V/Cmd+V)
    • Drag and drop an image file
    • Click to select a file from your computer
  2. Bounding box interaction:

    • Click and drag on the image to create a box
    • Move the box by dragging it
    • Resize the box using 8 handles:
      • 4 corner handles (NW, NE, SW, SE)
      • 4 edge handles (N, S, E, W)
  3. Real-time output:

    • Shows the bounding box coordinates in the --box format
    • Coordinates are in percentages of image dimensions (0-100)
    • Automatically updates as you move or resize the box
  4. Constraints and features:

    • Box is constrained to image boundaries
    • Minimum box size of 10px to prevent tiny boxes
    • Visual feedback for drag and drop
    • Responsive design that works with different image sizes
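
The pixel-to-percentage conversion behind the real-time output can be sketched as a pure function, which is easier to check outside the browser. `toPercentBox` is a hypothetical name; its arguments are plain objects with the same fields that getBoundingClientRect() returns, and the numbers in the example are made up.

```javascript
// Hypothetical pure version of the tool's updateOutput(). Both arguments are
// plain objects in viewport pixels, shaped like getBoundingClientRect() results.
function toPercentBox(boxRect, imgRect) {
  // Convert one viewport coordinate to a rounded percentage of the image size.
  const pct = (value, origin, size) => Math.round(((value - origin) / size) * 100);
  const x1 = pct(boxRect.left, imgRect.left, imgRect.width);
  const y1 = pct(boxRect.top, imgRect.top, imgRect.height);
  const x2 = pct(boxRect.right, imgRect.left, imgRect.width);
  const y2 = pct(boxRect.bottom, imgRect.top, imgRect.height);
  return `${x1},${y1},${x2},${y2}`;
}

// An 800x600 image displayed at (100, 50), with a box from (260, 230) to (740, 590).
const imgRect = { left: 100, top: 50, width: 800, height: 600 };
const boxRect = { left: 260, top: 230, right: 740, bottom: 590 };
console.log(toPercentBox(boxRect, imgRect)); // 20,30,80,90
```

Because the result is a percentage of the displayed size, it does not matter whether the browser has scaled the image down to fit the page.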

You can copy this HTML directly into a file and open it in a browser to use it. The tool will help you quickly generate the correct --box syntax for use with the shell script we created earlier.

Let me know if you'd like any adjustments to the interface or functionality!

human (Nov 12, 2024, 07:58 AM)

the resizing of the frame is a bit buggy, dragging the left square causes the right square to move too when it should just resize the position of the left border

assistant (Nov 12, 2024, 07:58 AM)

Ah yes, I see the issue. The problem is in the resize calculations - let me fix the resizing logic to properly handle each edge independently.

update bbox-drawer

Find this:

        function handleResize(e, imgRect) {
            const rect = bbox.getBoundingClientRect();
            const deltaX = e.clientX - startX;
            const deltaY = e.clientY - startY;
            
            let newLeft = rect.left - imgRect.left;
            let newTop = rect.top - imgRect.top;
            let newWidth = rect.width;
            let newHeight = rect.height;
            
            switch(currentHandle) {
                case 'nw':
                    newLeft += deltaX;
                    newTop += deltaY;
                    newWidth -= deltaX;
                    newHeight -= deltaY;
                    break;
                case 'n':
                    newTop += deltaY;
                    newHeight -= deltaY;
                    break;
                case 'ne':
                    newWidth += deltaX;
                    newTop += deltaY;
                    newHeight -= deltaY;
                    break;
                case 'e':
                    newWidth += deltaX;
                    break;
                case 'se':
                    newWidth += deltaX;
                    newHeight += deltaY;
                    break;
                case 's':
                    newHeight += deltaY;
                    break;
                case 'sw':
                    newLeft += deltaX;
                    newWidth -= deltaX;
                    newHeight += deltaY;
                    break;
                case 'w':
                    newLeft += deltaX;
                    newWidth -= deltaX;
                    break;
            }
            
            // Constrain to image boundaries
            newLeft = Math.max(0, Math.min(newLeft, imgRect.width - newWidth));
            newTop = Math.max(0, Math.min(newTop, imgRect.height - newHeight));
            newWidth = Math.max(10, Math.min(newWidth, imgRect.width - newLeft));
            newHeight = Math.max(10, Math.min(newHeight, imgRect.height - newTop));
            
            bbox.style.left = newLeft + 'px';
            bbox.style.top = newTop + 'px';
            bbox.style.width = newWidth + 'px';
            bbox.style.height = newHeight + 'px';

Replace with this:

        function handleResize(e, imgRect) {
            const rect = bbox.getBoundingClientRect();
            const deltaX = e.clientX - startX;
            const deltaY = e.clientY - startY;
            
            // Get current dimensions relative to the image
            let left = rect.left - imgRect.left;
            let top = rect.top - imgRect.top;
            let right = left + rect.width;
            let bottom = top + rect.height;
            
            // Update the appropriate edges based on which handle is being dragged
            switch(currentHandle) {
                case 'nw':
                    left = Math.min(right - 10, left + deltaX);
                    top = Math.min(bottom - 10, top + deltaY);
                    break;
                case 'n':
                    top = Math.min(bottom - 10, top + deltaY);
                    break;
                case 'ne':
                    right = Math.max(left + 10, right + deltaX);
                    top = Math.min(bottom - 10, top + deltaY);
                    break;
                case 'e':
                    right = Math.max(left + 10, right + deltaX);
                    break;
                case 'se':
                    right = Math.max(left + 10, right + deltaX);
                    bottom = Math.max(top + 10, bottom + deltaY);
                    break;
                case 's':
                    bottom = Math.max(top + 10, bottom + deltaY);
                    break;
                case 'sw':
                    left = Math.min(right - 10, left + deltaX);
                    bottom = Math.max(top + 10, bottom + deltaY);
                    break;
                case 'w':
                    left = Math.min(right - 10, left + deltaX);
                    break;
            }
            
            // Constrain to image boundaries
            left = Math.max(0, left);
            top = Math.max(0, top);
            right = Math.min(imgRect.width, right);
            bottom = Math.min(imgRect.height, bottom);
            
            // Update box position and dimensions
            bbox.style.left = left + 'px';
            bbox.style.top = top + 'px';
            bbox.style.width = (right - left) + 'px';
            bbox.style.height = (bottom - top) + 'px';

I've updated the resize handling logic to:

  1. Work with edge positions (left, right, top, bottom) instead of position and dimensions
  2. Update only the relevant edges based on which handle is being dragged
  3. Maintain minimum size (10px) by limiting how close edges can get to each other
  4. Properly constrain to image boundaries

The resizing should now work more intuitively, with each handle only affecting its corresponding edges. For example:

  • Dragging the left handle only moves the left edge
  • Dragging corner handles properly affects both corresponding edges
  • The box maintains its minimum size when resizing
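
The edge-based approach described above can be sketched as a pure function. `resizeBox` and `MIN_SIZE` are hypothetical names, and the sketch replaces the switch statement with a check on the letters in the handle name, which is a different formulation of the same idea rather than the code from the artifact.

```javascript
const MIN_SIZE = 10; // minimum box size in pixels, matching the tool

// Hypothetical pure version of the edge-based resize. box is
// {left, top, right, bottom} in pixels relative to the image; handle is one
// of nw, n, ne, e, se, s, sw, w.
function resizeBox(box, handle, deltaX, deltaY, imgWidth, imgHeight) {
  let { left, top, right, bottom } = box;
  // Each letter in the handle name moves exactly one edge; the opposite edge
  // is never touched, which is what keeps the right border still when the
  // left handle is dragged.
  if (handle.includes("w")) left = Math.min(right - MIN_SIZE, left + deltaX);
  if (handle.includes("e")) right = Math.max(left + MIN_SIZE, right + deltaX);
  if (handle.includes("n")) top = Math.min(bottom - MIN_SIZE, top + deltaY);
  if (handle.includes("s")) bottom = Math.max(top + MIN_SIZE, bottom + deltaY);
  // Constrain to image boundaries.
  return {
    left: Math.max(0, left),
    top: Math.max(0, top),
    right: Math.min(imgWidth, right),
    bottom: Math.min(imgHeight, bottom),
  };
}

const start = { left: 100, top: 100, right: 300, bottom: 200 };
console.log(resizeBox(start, "w", -30, 0, 400, 300));
// { left: 70, top: 100, right: 300, bottom: 200 }
```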

Let me know if you'd like any other adjustments to the behavior!

human (Nov 12, 2024, 08:00 AM)

dragging to move the box doesn't work. dragging on the image often causes the image itself to be picked up as a draggable - it needs some sort of CSS to stop it being a selectable thing

assistant (Nov 12, 2024, 08:00 AM)

Good catches! Let's fix both the box dragging and prevent the image from being dragged.

update bbox-drawer

Find this:

#targetImage {
            max-width: 100%;
            display: block;
        }

Replace with this:

#targetImage {
            max-width: 100%;
            display: block;
            user-select: none;
            -webkit-user-drag: none;
            -webkit-user-select: none;
            -moz-user-select: none;
            -ms-user-select: none;
        }

update bbox-drawer

Find this:

        // Handle mouse movement
        document.addEventListener('mousemove', (e) => {
            if (!bbox || (!isDragging && !isResizing)) return;
            
            const rect = img.getBoundingClientRect();
            if (isDragging) {
                updateBox(e.clientX - rect.left, e.clientY - rect.top);
            } else if (isResizing) {
                handleResize(e, rect);
            }
        });

Replace with this:

        let isMoving = false;
        let moveStartX, moveStartY;
        let moveStartLeft, moveStartTop;

        // Add mousedown handler for box movement
        bbox && bbox.addEventListener('mousedown', (e) => {
            if (e.target === bbox) {
                isMoving = true;
                moveStartX = e.clientX;
                moveStartY = e.clientY;
                moveStartLeft = bbox.offsetLeft;
                moveStartTop = bbox.offsetTop;
                e.preventDefault();
            }
        });

        // Handle mouse movement
        document.addEventListener('mousemove', (e) => {
            if (!bbox) return;
            
            const rect = img.getBoundingClientRect();
            
            if (isDragging) {
                updateBox(e.clientX - rect.left, e.clientY - rect.top);
            } else if (isResizing) {
                handleResize(e, rect);
            } else if (isMoving) {
                const deltaX = e.clientX - moveStartX;
                const deltaY = e.clientY - moveStartY;
                
                let newLeft = moveStartLeft + deltaX;
                let newTop = moveStartTop + deltaY;
                
                // Constrain to image boundaries
                newLeft = Math.max(0, Math.min(newLeft, rect.width - bbox.offsetWidth));
                newTop = Math.max(0, Math.min(newTop, rect.height - bbox.offsetHeight));
                
                bbox.style.left = newLeft + 'px';
                bbox.style.top = newTop + 'px';
                
                updateOutput();
            }
        });

update bbox-drawer

Find this:

        // Handle mouse up
        document.addEventListener('mouseup', () => {
            if (isDragging) {
                isDragging = false;
                createResizeHandles();
            }
            isResizing = false;
            currentHandle = null;
        });

Replace with this:

        // Handle mouse up
        document.addEventListener('mouseup', () => {
            if (isDragging) {
                isDragging = false;
                createResizeHandles();
            }
            isResizing = false;
            isMoving = false;
            currentHandle = null;
        });

update bbox-drawer

Find this:

            bbox = document.createElement('div');
            bbox.className = 'bbox';
            container.appendChild(bbox);

Replace with this:

            bbox = document.createElement('div');
            bbox.className = 'bbox';
            container.appendChild(bbox);

            // Add move handler to the new box. This must run inside the
            // container's mousedown handler, because bbox is null until a
            // box has been drawn.
            bbox.addEventListener('mousedown', (e) => {
                if (e.target === bbox) {
                    isMoving = true;
                    moveStartX = e.clientX;
                    moveStartY = e.clientY;
                    moveStartLeft = bbox.offsetLeft;
                    moveStartTop = bbox.offsetTop;
                    e.preventDefault();
                }
            });

I've made the following changes:

  1. Added CSS to prevent the image from being draggable or selectable:

    • Added user-select: none and -webkit-user-drag: none properties
    • Added vendor prefixes for cross-browser compatibility
  2. Fixed the box movement functionality:

    • Added proper event handling for box movement
    • Track the starting position of both the mouse and the box
    • Calculate and apply the movement delta while constraining to image boundaries
    • Added proper cleanup in the mouseup handler
  3. Added the movement handler to newly created boxes:

    • Added the mousedown event listener when creating new boxes
    • Ensured the handler only triggers when clicking the box itself (not resize handles)

Now you should be able to:

  1. Click and drag anywhere on the box (except handles) to move it
  2. The box will stay within the image boundaries while moving
  3. The image won't be selectable or draggable anymore
  4. The resize handles will still work as before
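
The clamping step that keeps a moved box inside the image can be sketched on its own. `moveBox` is a hypothetical name and the numbers in the example are invented; the arithmetic mirrors the mousemove handler above.

```javascript
// Hypothetical pure version of the move step: returns the box's new top-left
// corner, clamped so the whole box stays inside the image.
function moveBox(start, deltaX, deltaY, boxWidth, boxHeight, imgWidth, imgHeight) {
  const clamp = (value, min, max) => Math.max(min, Math.min(value, max));
  return {
    left: clamp(start.left + deltaX, 0, imgWidth - boxWidth),
    top: clamp(start.top + deltaY, 0, imgHeight - boxHeight),
  };
}

// A 200x100 box starting at (50, 40) inside an 800x600 image, dragged far to
// the right and slightly past the top edge.
console.log(moveBox({ left: 50, top: 40 }, 1000, -100, 200, 100, 800, 600));
// { left: 600, top: 0 }
```

Tracking the starting position and applying the total delta, rather than accumulating small per-event deltas, means the box does not drift once the pointer leaves the image and comes back.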

Let me know if you'd like any adjustments to the movement behavior or other aspects of the tool!

human (Nov 12, 2024, 08:02 AM)

Build a version that uses cropperjs

assistant (Nov 12, 2024, 08:02 AM)

I'll create a version using the popular Cropper.js library, which provides a robust, feature-rich cropping interface.

create Untitled

<!DOCTYPE html>
<html>
<head>
    <title>BBox Tool - CropperJS Version</title>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/cropperjs/1.6.1/cropper.min.css">
    <style>
        .container {
            max-width: 1200px;
            margin: 20px auto;
            padding: 20px;
            font-family: system-ui, -apple-system, sans-serif;
        }
        
        .image-container {
            position: relative;
            display: inline-block;
            margin: 20px 0;
            border: 2px dashed #ccc;
            min-width: 320px;
            min-height: 240px;
            cursor: crosshair;
        }
        
        .image-container.dragover {
            border-color: #4CAF50;
            background: rgba(76, 175, 80, 0.1);
        }
        
        #targetImage {
            max-width: 100%;
            display: block;
        }
        
        .output {
            font-family: monospace;
            margin-top: 10px;
            padding: 10px;
            background: #f5f5f5;
            border-radius: 4px;
        }
        
        .instructions {
            margin: 20px 0;
            padding: 15px;
            background: #f8f9fa;
            border-radius: 4px;
            line-height: 1.5;
        }

        /* Remove the default outline Cropper.js draws around the view box */
        .cropper-view-box {
            outline: none !important;
        }
        
        /* Customize the crop box appearance */
        .cropper-crop-box {
            background: rgba(76, 175, 80, 0.1) !important;
        }
        
        .cropper-line, .cropper-point {
            background-color: #4CAF50 !important;
        }
        
        .cropper-point {
            width: 10px !important;
            height: 10px !important;
            opacity: 1 !important;
        }
        
        .cropper-point.point-n,
        .cropper-point.point-s,
        .cropper-point.point-e,
        .cropper-point.point-w {
            width: 10px !important;
            height: 10px !important;
        }
    </style>
</head>
<body>
    <div class="container">
        <h2>Bounding Box Drawing Tool (CropperJS)</h2>
        
        <div class="instructions">
            <p><strong>Instructions:</strong></p>
            <ul>
                <li>Paste an image (Ctrl+V/Cmd+V), drag &amp; drop, or click to select a file</li>
                <li>Click and drag to create a bounding box</li>
                <li>Drag the corners or edges to resize</li>
                <li>Drag inside the box to move it</li>
                <li>Coordinates are in percentages of image dimensions</li>
            </ul>
        </div>
        
        <div id="imageContainer" class="image-container">
            <img id="targetImage" style="max-width: 100%;">
            <input type="file" id="fileInput" style="display: none" accept="image/*">
        </div>
        
        <div class="output">
            <code>--box '<span id="bboxOutput">0,0,0,0</span>'</code>
        </div>
    </div>

    <script src="https://cdnjs.cloudflare.com/ajax/libs/cropperjs/1.6.1/cropper.min.js"></script>
    <script>
        const container = document.getElementById('imageContainer');
        const img = document.getElementById('targetImage');
        const fileInput = document.getElementById('fileInput');
        const output = document.getElementById('bboxOutput');
        let cropper = null;

        // Handle image upload via click
        container.addEventListener('click', () => {
            if (!img.src) fileInput.click();
        });

        fileInput.addEventListener('change', (e) => {
            const file = e.target.files[0];
            if (file) loadImage(file);
        });

        // Handle drag and drop
        container.addEventListener('dragover', (e) => {
            e.preventDefault();
            container.classList.add('dragover');
        });

        container.addEventListener('dragleave', () => {
            container.classList.remove('dragover');
        });

        container.addEventListener('drop', (e) => {
            e.preventDefault();
            container.classList.remove('dragover');
            const file = e.dataTransfer.files[0];
            if (file && file.type.startsWith('image/')) {
                loadImage(file);
            }
        });

        // Handle paste
        document.addEventListener('paste', (e) => {
            const items = e.clipboardData ? e.clipboardData.items : [];
            for (const item of items) {
                if (item.type.startsWith('image/')) {
                    // getAsFile() returns null when the item is not a file
                    const file = item.getAsFile();
                    if (file) {
                        loadImage(file);
                        break;
                    }
                }
            }
        });

        function loadImage(file) {
            const reader = new FileReader();
            reader.onload = (e) => {
                img.src = e.target.result;
                img.style.display = 'block';
                
                // Destroy existing cropper if it exists
                if (cropper) {
                    cropper.destroy();
                }
                
                // Initialize Cropper.js
                cropper = new Cropper(img, {
                    viewMode: 1,
                    dragMode: 'crop',
                    autoCrop: true,
                    movable: true,
                    scalable: false,
                    zoomable: false,
                    rotatable: false,
                    cropBoxResizable: true,
                    cropBoxMovable: true,
                    background: false,
                    toggleDragModeOnDblclick: false,
                    crop: updateOutput
                });
            };
            reader.readAsDataURL(file);
        }

        function updateOutput(e) {
            const imageData = cropper.getImageData();
            
            // Calculate percentages
            const x1 = Math.round((e.detail.x / imageData.naturalWidth) * 100);
            const y1 = Math.round((e.detail.y / imageData.naturalHeight) * 100);
            const x2 = Math.round(((e.detail.x + e.detail.width) / imageData.naturalWidth) * 100);
            const y2 = Math.round(((e.detail.y + e.detail.height) / imageData.naturalHeight) * 100);
            
            output.textContent = `${x1},${y1},${x2},${y2}`;
        }
    </script>
</body>
</html>
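
The percentage arithmetic inside `updateOutput` can also be isolated from the DOM, which makes it easier to check by hand. The sketch below assumes a rectangle with the same `x`, `y`, `width` and `height` fields (in natural image pixels) that the crop event's `detail` carries in the page above; the function name `toBoxString` and the extra clamping to 0 to 100 are additions for illustration, not part of the original page.

```javascript
// Hypothetical helper mirroring the math in updateOutput().
// rect: { x, y, width, height } in natural image pixels.
function toBoxString(rect, naturalWidth, naturalHeight) {
    // Convert a pixel value to a whole-number percentage of the given dimension
    const pct = (value, total) => Math.round((value / total) * 100);
    // Assumption: keep results inside 0..100 in case rounding pushes them out
    const clamp = (n) => Math.min(100, Math.max(0, n));

    return [
        pct(rect.x, naturalWidth),                // x1: left edge
        pct(rect.y, naturalHeight),               // y1: top edge
        pct(rect.x + rect.width, naturalWidth),   // x2: right edge
        pct(rect.y + rect.height, naturalHeight), // y2: bottom edge
    ].map(clamp).join(',');
}

// Example: a 448x288 crop starting at (0, 144) on a 640x480 image
console.log(toBoxString({ x: 0, y: 144, width: 448, height: 288 }, 640, 480));
```

Because the values are rounded to whole percentages, the box can be off by up to half a percent of the image size on each edge.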

I've created a version using Cropper.js that has several advantages over the custom implementation:

  1. More robust handling of:

    • Box creation and resizing
    • Edge and corner dragging
    • Boundary constraints
    • Touch events (mobile support)
    • High DPI displays
  2. Better visual feedback:

    • Smoother resize handles
    • Grid overlay while moving/resizing
    • Better handle hit areas
    • Consistent behavior across browsers
  3. Same features as before:

    • Multiple ways to load images (paste/drop/select)
    • Percentage-based coordinates
    • Real-time output in the --box format
  4. Customizations I've added to the default Cropper.js behavior:

    • Disabled zoom/rotate/scale features to focus on bbox creation
    • Customized colors to match our previous version
    • Larger, more visible handles
    • Left the aspect ratio unconstrained (no aspectRatio option is set)
    • Removed unnecessary UI elements
    • Added clear instructions
  5. Improved CSS styling:

    • Better handle visibility
    • Custom colors for the crop box and handles
    • Larger grab areas for handles
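
For the reverse direction, here is a hedged sketch of how a consumer of the `--box` value might turn it back into a pixel rectangle. It assumes the value is `x1,y1,x2,y2` in whole or fractional percentages of the image dimensions, as this tool emits; `parseBox` is a hypothetical name, and how the capture script itself interprets the option is not shown in this part of the conversation.

```javascript
// Hypothetical parser for a box spec such as "0,30,70,90".
// Returns { x, y, width, height } in pixels for an image of the given size.
function parseBox(spec, imageWidth, imageHeight) {
    // Empty fields become NaN so that "0,,70,90" is rejected rather than read as 0
    const parts = spec.split(',').map((p) => (p.trim() === '' ? NaN : Number(p)));

    if (parts.length !== 4 || parts.some((n) => !Number.isFinite(n) || n < 0 || n > 100)) {
        throw new Error(`Invalid box "${spec}": expected x1,y1,x2,y2 as percentages from 0 to 100`);
    }

    const [x1, y1, x2, y2] = parts;
    if (x2 <= x1 || y2 <= y1) {
        throw new Error(`Invalid box "${spec}": x2 and y2 must be greater than x1 and y1`);
    }

    // Round each edge to a pixel, then derive the size from the rounded edges
    const x = Math.round((x1 / 100) * imageWidth);
    const y = Math.round((y1 / 100) * imageHeight);
    return {
        x,
        y,
        width: Math.round((x2 / 100) * imageWidth) - x,
        height: Math.round((y2 / 100) * imageHeight) - y,
    };
}

// Example: the box from the original request applied to a 640x480 frame
console.log(parseBox('0,30,70,90', 640, 480));
```

Validating the ordering of the corners up front gives a clear error message instead of a zero or negative crop size further down the pipeline.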

The behavior should be smoother and more reliable than our custom implementation. Let me know if you'd like any adjustments to the behavior or appearance!
