╭────────────────────────────────────────────────╮ | |
│ ✻ Welcome to Claude Code research preview! │ | |
│ │ | |
│ /help for help │ | |
│ │ | |
│ cwd: /Users/simon/Dropbox/Development/tools │ | |
╰────────────────────────────────────────────────╯ | |
> Almost all of the HTML files in this directory were created using Claude prompts, and the details of | |
those prompts are linked in the commit messages. Build a Python script that checks the commit history | |
for each HTML file in turn and extracts any URLs from those commit messages into a list. It should | |
then output a JSON file with this structure: {"pages": {"name-of-file.html": ["url"], | |
{"name-of-file-2.html": ["url1", "url2"], ... - as you can see, some files may have more than one URL | |
in their commit history. The script should be called gather_links.py and it should save a JSON file | |
called gathered_links.json | |
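For clarity, the structure sketched in that prompt, written out as well-formed JSON (the file names and URLs are the prompt's own placeholders):

    {
      "pages": {
        "name-of-file.html": ["url"],
        "name-of-file-2.html": ["url1", "url2"]
      }
    }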
⏺ I'll build a Python script that extracts URLs from commit messages for HTML files. Let me first | |
check if we have the git history available and examine some commit messages to understand the | |
structure. | |
⏺ Bash(git log -n 5 --oneline)… | |
⎿ a282f7f Link to json schema builder | |
49c382e Paste rich text in README | |
b4e3e15 Encrypt / decrypt message | |
0a186e5 Create json-schema-builder.html | |
63852fe Maybe fix for Windows? | |
⏺ Let me check a specific commit message to understand how the URLs are formatted: | |
⏺ Bash(git show a282f7f)… | |
⎿ commit a282f7fc49d3b106d8cbe625913596c27383e8f5 | |
Author: Simon Willison <swillison@gmail.com> | |
Date: Fri Mar 7 20:36:22 2025 -0600 | |
Link to json schema builder | |
diff --git a/README.md b/README.md | |
index cd2d5a0..8c70ae9 100644 | |
--- a/README.md | |
+++ b/README.md | |
@@ -42,6 +42,7 @@ This collection is partly **an experiment** in how much it's possible to get | |
don | |
- [Claude Token Counter](https://tools.simonwillison.net/claude-token-counter) - counts the | |
number of tokens in a Claude prompt | |
- [OpenAI audio input](https://tools.simonwillison.net/openai-audio) - record audio through the | |
microphone and send it to OpenAI's audio model | |
- [OpenAI audio output](https://tools.simonwillison.net/openai-audio-output) - run prompts | |
against OpenAI that produce audio output and listen to it or download it from the browser | |
+- [JSON schema builder](https://tools.simonwillison.net/json-schema-builder) - interactive tool | |
for building a JSON schema | |
## Miscellaneous | |
⏺ Now let me check a commit that created an HTML file to see if it contains a URL to the Claude | |
prompt: | |
⏺ Bash(git show 0a186e5)… | |
⎿ commit 0a186e5a3a939fc9e9679ecc31068494c1368b99 | |
Author: Simon Willison <swillison@gmail.com> | |
Date: Fri Mar 7 18:17:52 2025 -0600 | |
Create json-schema-builder.html | |
https://claude.ai/share/c84c867a-dbaf-41c2-a6f1-ae7dad002485 | |
diff --git a/json-schema-builder.html b/json-schema-builder.html | |
new file mode 100644 | |
index 0000000..cfcc9cf | |
--- /dev/null | |
+++ b/json-schema-builder.html | |
@@ -0,0 +1,609 @@ | |
+<!DOCTYPE html> | |
+<html lang="en"> | |
+<head> | |
+ <meta charset="UTF-8"> | |
+ <meta name="viewport" content="width=device-width, initial-scale=1.0"> | |
+ <title>JSON Schema Builder</title> | |
+ <style> | |
+ * { | |
+ box-sizing: border-box; | |
+ } | |
+ | |
+ body { | |
+ font-family: Helvetica, Arial, sans-serif; | |
+ margin: 0; | |
+ padding: 20px; | |
+ background-color: #f5f5f5; | |
+ } | |
+ | |
+ h1, h2, h3 { | |
+ font-weight: 500; | |
+ } | |
+ | |
+ .container { | |
+ max-width: 100%; | |
+ margin: 0 auto; | |
+ background-color: white; | |
+ border-radius: 8px; | |
+ box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1); | |
+ padding: 20px; | |
+ } | |
+ | |
+ .schema-builder { | |
+ display: flex; | |
+ flex-direction: column; | |
+ gap: 20px; | |
+ } | |
+ | |
+ .editor-panel { | |
+ width: 100%; | |
+ padding: 15px; | |
+ border: 1px solid #ddd; | |
+ border-radius: 4px; | |
+ } | |
+ | |
+ .output-panel { | |
+ width: 100%; | |
+ position: relative; | |
+ } | |
+ | |
+ .schema-output { | |
+ width: 100%; | |
+ height: 300px; | |
+ background-color: #f8f8f8; | |
+ border: 1px solid #ddd; | |
+ border-radius: 4px; | |
+ padding: 15px; | |
+ font-family: monospace; | |
+ overflow-y: auto; | |
+ white-space: pre-wrap; | |
+ } | |
+ | |
+ .property-list { | |
+ margin-top: 20px; | |
+ } | |
+ | |
+ .property-item { | |
+ border: 1px solid #ddd; | |
+ border-radius: 4px; | |
+ padding: 15px; | |
+ margin-bottom: 15px; | |
+ background-color: #f9f9f9; | |
+ } | |
+ | |
+ .nested-properties { | |
+ margin-left: 20px; | |
+ margin-top: 15px; | |
+ padding-left: 10px; | |
+ border-left: 2px solid #2196F3; | |
+ } | |
+ | |
+ .actions { | |
+ margin-top: 15px; | |
+ display: flex; | |
+ gap: 10px; | |
+ } | |
+ | |
+ button { | |
+ padding: 8px 12px; | |
+ background-color: #2196F3; | |
+ color: white; | |
+ border: none; | |
+ border-radius: 4px; | |
+ cursor: pointer; | |
+ font-size: 14px; | |
+ } | |
+ | |
+ button:hover { | |
+ background-color: #0b7dda; | |
+ } | |
+ | |
+ button.secondary { | |
+ background-color: #f1f1f1; | |
+ color: #333; | |
+ border: 1px solid #ddd; | |
+ } | |
+ | |
+ button.secondary:hover { | |
+ background-color: #e1e1e1; | |
+ } | |
+ | |
+ button.remove { | |
+ background-color: #f44336; | |
+ } | |
+ | |
+ button.remove:hover { | |
+ background-color: #d32f2f; | |
+ } | |
+ | |
+ button.copy { | |
+ position: absolute; | |
+ top: 10px; | |
+ right: 10px; | |
+ z-index: 10; | |
+ } | |
+ | |
+ .form-group { | |
+ margin-bottom: 15px; | |
+ } | |
+ | |
+ label { | |
+ display: block; | |
+ margin-bottom: 5px; | |
+ font-weight: bold; | |
+ } | |
+ | |
+ input, select, textarea { | |
+ width: 100%; | |
+ padding: 8px; | |
+ border: 1px solid #ddd; | |
+ border-radius: 4px; | |
+ font-size: 16px; | |
+ font-family: Helvetica, Arial, sans-serif; | |
+ } | |
+ | |
+ .checkbox-group { | |
+ display: flex; | |
+ align-items: center; | |
+ gap: 5px; | |
+ } | |
+ | |
+ .checkbox-group input { | |
+ width: auto; | |
+ } | |
+ | |
+ .alert { | |
+ padding: 10px 15px; | |
+ background-color: #f8d7da; | |
+ color: #721c24; | |
+ border: 1px solid #f5c6cb; | |
+ border-radius: 4px; | |
+ margin-bottom: 15px; | |
+ display: none; | |
+ } | |
+ | |
+ .success { | |
+ background-color: #d4edda; | |
+ color: #155724; | |
+ border-color: #c3e6cb; | |
+ } | |
+ | |
+ @media (min-width: 768px) { | |
+ .schema-output { | |
+ height: 400px; | |
+ } | |
+ } | |
+ </style> | |
+</head> | |
+<body> | |
+ <div class="container"> | |
+ <h1>JSON Schema Builder</h1> | |
+ <p>Build your JSON schema by adding properties and nested objects.</p> | |
+ | |
+ <div id="alert" class="alert"></div> | |
+ | |
+ <div class="schema-builder"> | |
+ <div class="editor-panel"> | |
+ <h2>Schema properties</h2> | |
+ | |
+ <div class="form-group"> | |
+ <label for="schema-title">Schema title</label> | |
+ <input type="text" id="schema-title" placeholder="Enter schema title"> | |
+ </div> | |
+ | |
+ <div class="form-group"> | |
+ <label for="schema-description">Schema description</label> | |
+ <textarea id="schema-description" rows="2" placeholder="Enter schema | |
description"></textarea> | |
+ </div> | |
+ | |
+ <div class="property-list" id="property-list"> | |
+ <!-- Property items will be added here --> | |
+ </div> | |
+ | |
+ <button id="add-property">Add property</button> | |
+ </div> | |
+ | |
+ <div class="output-panel"> | |
+ <h2>Schema output</h2> | |
+ <button id="copy-button" class="copy">Copy</button> | |
+ <div id="schema-output" class="schema-output"></div> | |
+ </div> | |
+ </div> | |
+ </div> | |
+ | |
+ <script type="module"> | |
+// Main schema object | |
+const schema = { | |
+ type: "object", | |
+ properties: {} | |
+}; | |
+ | |
+// DOM elements | |
+const schemaTitle = document.getElementById("schema-title"); | |
+const schemaDescription = document.getElementById("schema-description"); | |
+const propertyList = document.getElementById("property-list"); | |
+const addPropertyButton = document.getElementById("add-property"); | |
+const schemaOutput = document.getElementById("schema-output"); | |
+const copyButton = document.getElementById("copy-button"); | |
+const alertElement = document.getElementById("alert"); | |
+ | |
+// Event listeners | |
+schemaTitle.addEventListener("input", updateSchema); | |
+schemaDescription.addEventListener("input", updateSchema); | |
+addPropertyButton.addEventListener("click", addProperty); | |
+copyButton.addEventListener("click", copySchema); | |
+ | |
+// Initialize | |
+loadFromHash(); | |
+window.addEventListener("hashchange", loadFromHash); | |
+ | |
+// Add a new property | |
+function addProperty(event, parentElement = propertyList, parentPath = null) { | |
+ const propertyItem = document.createElement("div"); | |
+ propertyItem.className = "property-item"; | |
+ | |
+ const propertyId = "prop-" + Date.now() + "-" + Math.floor(Math.random() * 1000); | |
+ propertyItem.dataset.id = propertyId; | |
+ | |
+ propertyItem.innerHTML = ` | |
+ <div class="form-group"> | |
+ <label for="${propertyId}-name">Property name</label> | |
+ <input type="text" id="${propertyId}-name" class="property-name" placeholder="Enter | |
property name"> | |
+ </div> | |
+ | |
+ <div class="form-group"> | |
+ <label for="${propertyId}-type">Property type</label> | |
+ <select id="${propertyId}-type" class="property-type"> | |
+ <option value="string">String</option> | |
+ <option value="number">Number</option> | |
+ <option value="integer">Integer</option> | |
+ <option value="boolean">Boolean</option> | |
+ <option value="object">Object</option> | |
+ <option value="array">Array</option> | |
+ </select> | |
+ </div> | |
+ | |
+ <div class="form-group"> | |
+ <label for="${propertyId}-description">Description</label> | |
+ <input type="text" id="${propertyId}-description" class="property-description" | |
placeholder="Property description"> | |
+ </div> | |
+ | |
+ <div class="form-group array-item-type-container" style="display: none;"> | |
+ <label for="${propertyId}-array-type">Array items type</label> | |
+ <select id="${propertyId}-array-type" class="array-item-type"> | |
+ <option value="string">String</option> | |
+ <option value="number">Number</option> | |
+ <option value="integer">Integer</option> | |
+ <option value="boolean">Boolean</option> | |
+ <option value="object">Object</option> | |
+ </select> | |
+ </div> | |
+ | |
+ <div class="form-group"> | |
+ <div class="checkbox-group"> | |
+ <input type="checkbox" id="${propertyId}-required" class="property-required"> | |
+ <label for="${propertyId}-required">Required</label> | |
+ </div> | |
+ </div> | |
+ | |
+ <div class="nested-properties" id="${propertyId}-nested" style="display: none;"></div> | |
+ | |
+ <div class="actions"> | |
+ <button class="add-nested" style="display: none;">Add nested property</button> | |
+ <button class="remove">Remove</button> | |
+ </div> | |
+ `; | |
+ | |
+ parentElement.appendChild(propertyItem); | |
+ | |
+ // Event listeners for the new property | |
+ const typeSelect = propertyItem.querySelector(".property-type"); | |
+ const arrayTypeContainer = propertyItem.querySelector(".array-item-type-container"); | |
+ const arrayTypeSelect = propertyItem.querySelector(".array-item-type"); | |
+ const nestedContainer = propertyItem.querySelector(".nested-properties"); | |
+ const addNestedButton = propertyItem.querySelector(".add-nested"); | |
+ const removeButton = propertyItem.querySelector(".remove"); | |
+ | |
+ const nameInput = propertyItem.querySelector(".property-name"); | |
+ const descriptionInput = propertyItem.querySelector(".property-description"); | |
+ const requiredCheckbox = propertyItem.querySelector(".property-required"); | |
+ | |
+ // Handle property type change | |
+ typeSelect.addEventListener("change", function() { | |
+ const isObject = this.value === "object"; | |
+ const isArray = this.value === "array"; | |
+ | |
+ nestedContainer.style.display = isObject ? "block" : "none"; | |
+ addNestedButton.style.display = isObject ? "inline-block" : "none"; | |
+ arrayTypeContainer.style.display = isArray ? "block" : "none"; | |
+ | |
+ updateSchema(); | |
+ }); | |
+ | |
+ // Handle array item type change | |
+ arrayTypeSelect.addEventListener("change", function() { | |
+ const isObject = this.value === "object"; | |
+ | |
+ if (isObject && typeSelect.value === "array") { | |
+ nestedContainer.style.display = "block"; | |
+ addNestedButton.style.display = "inline-block"; | |
+ } else { | |
+ nestedContainer.style.display = "none"; | |
+ addNestedButton.style.display = "none"; | |
+ } | |
+ | |
+ updateSchema(); | |
+ }); | |
+ | |
+ // Add nested property | |
+ addNestedButton.addEventListener("click", function(event) { | |
+ event.preventDefault(); | |
+ addProperty(event, nestedContainer, propertyId); | |
+ }); | |
+ | |
+ // Remove property | |
+ removeButton.addEventListener("click", function() { | |
+ propertyItem.remove(); | |
+ updateSchema(); | |
+ }); | |
+ | |
+ // Update schema on input change | |
+ nameInput.addEventListener("input", updateSchema); | |
+ descriptionInput.addEventListener("input", updateSchema); | |
+ requiredCheckbox.addEventListener("change", updateSchema); | |
+ | |
+ updateSchema(); | |
+ return propertyItem; | |
+} | |
+ | |
+// Update schema JSON based on form inputs | |
+function updateSchema() { | |
+ // Update title and description | |
+ if (schemaTitle.value) { | |
+ schema.title = schemaTitle.value; | |
+ } else { | |
+ delete schema.title; | |
+ } | |
+ | |
+ if (schemaDescription.value) { | |
+ schema.description = schemaDescription.value; | |
+ } else { | |
+ delete schema.description; | |
+ } | |
+ | |
+ // Reset properties and required | |
+ schema.properties = {}; | |
+ schema.required = []; | |
+ | |
+ // Process all property items | |
+ processPropertyItems(propertyList.children, schema.properties, schema.required); | |
+ | |
+ // Remove required if empty | |
+ if (schema.required.length === 0) { | |
+ delete schema.required; | |
+ } | |
+ | |
+ // Update the output | |
+ schemaOutput.textContent = JSON.stringify(schema, null, 2); | |
+ | |
+ // Update URL hash | |
+ updateUrlHash(); | |
+} | |
+ | |
+// Process property items recursively | |
+function processPropertyItems(items, propertiesObj, requiredArr) { | |
+ Array.from(items).forEach(item => { | |
+ if (!item.classList.contains("property-item")) return; | |
+ | |
+ const nameInput = item.querySelector(".property-name"); | |
+ const typeSelect = item.querySelector(".property-type"); | |
+ const descriptionInput = item.querySelector(".property-description"); | |
+ const requiredCheckbox = item.querySelector(".property-required"); | |
+ const arrayTypeSelect = item.querySelector(".array-item-type"); | |
+ const nestedContainer = item.querySelector(".nested-properties"); | |
+ | |
+ const name = nameInput.value.trim(); | |
+ if (!name) return; | |
+ | |
+ const type = typeSelect.value; | |
+ | |
+ // Create property definition | |
+ const property = { | |
+ type: type | |
+ }; | |
+ | |
+ // Add description if provided | |
+ if (descriptionInput.value.trim()) { | |
+ property.description = descriptionInput.value.trim(); | |
+ } | |
+ | |
+ // Handle arrays | |
+ if (type === "array") { | |
+ const itemsType = arrayTypeSelect.value; | |
+ | |
+ if (itemsType === "object") { | |
+ property.items = { | |
+ type: "object", | |
+ properties: {} | |
+ }; | |
+ | |
+ const nestedRequired = []; | |
+ processPropertyItems(nestedContainer.children, property.items.properties, | |
nestedRequired); | |
+ | |
+ if (nestedRequired.length > 0) { | |
+ property.items.required = nestedRequired; | |
+ } | |
+ } else { | |
+ property.items = { type: itemsType }; | |
+ } | |
+ } | |
+ | |
+ // Handle objects | |
+ if (type === "object") { | |
+ property.properties = {}; | |
+ const nestedRequired = []; | |
+ | |
+ processPropertyItems(nestedContainer.children, property.properties, nestedRequired); | |
+ | |
+ if (nestedRequired.length > 0) { | |
+ property.required = nestedRequired; | |
+ } | |
+ } | |
+ | |
+ // Add to properties | |
+ propertiesObj[name] = property; | |
+ | |
+ // Add to required array if checked | |
+ if (requiredCheckbox.checked) { | |
+ requiredArr.push(name); | |
+ } | |
+ }); | |
+} | |
+ | |
+// Copy schema to clipboard | |
+function copySchema() { | |
+ const textarea = document.createElement("textarea"); | |
+ textarea.value = schemaOutput.textContent; | |
+ document.body.appendChild(textarea); | |
+ textarea.select(); | |
+ | |
+ try { | |
+ document.execCommand("copy"); | |
+ showAlert("Schema copied to clipboard!", true); | |
+ } catch (err) { | |
+ showAlert("Failed to copy schema", false); | |
+ } | |
+ | |
+ document.body.removeChild(textarea); | |
+} | |
+ | |
+// Show alert message | |
+function showAlert(message, isSuccess = false) { | |
+ alertElement.textContent = message; | |
+ alertElement.style.display = "block"; | |
+ | |
+ if (isSuccess) { | |
+ alertElement.classList.add("success"); | |
+ } else { | |
+ alertElement.classList.remove("success"); | |
+ } | |
+ | |
+ setTimeout(() => { | |
+ alertElement.style.display = "none"; | |
+ }, 3000); | |
+} | |
+ | |
+// Update URL hash with schema | |
+function updateUrlHash() { | |
+ try { | |
+ const hashValue = btoa(JSON.stringify(schema)); | |
+ window.history.replaceState(null, null, "#" + hashValue); | |
+ } catch (e) { | |
+ console.error("Error updating URL hash:", e); | |
+ } | |
+} | |
+ | |
+// Load schema from URL hash | |
+function loadFromHash() { | |
+ if (!window.location.hash) return; | |
+ | |
+ try { | |
+ const hash = window.location.hash.substring(1); | |
+ const decodedSchema = JSON.parse(atob(hash)); | |
+ | |
+ // Reset form | |
+ propertyList.innerHTML = ""; | |
+ | |
+ // Set title and description | |
+ if (decodedSchema.title) { | |
+ schemaTitle.value = decodedSchema.title; | |
+ } else { | |
+ schemaTitle.value = ""; | |
+ } | |
+ | |
+ if (decodedSchema.description) { | |
+ schemaDescription.value = decodedSchema.description; | |
+ } else { | |
+ schemaDescription.value = ""; | |
+ } | |
+ | |
+ // Rebuild properties | |
+ if (decodedSchema.properties) { | |
+ buildPropertiesFromSchema(decodedSchema.properties, propertyList, decodedSchema.required | |
|| []); | |
+ } | |
+ | |
+ // Update output | |
+ schema.title = decodedSchema.title; | |
+ schema.description = decodedSchema.description; | |
+ schema.properties = decodedSchema.properties || {}; | |
+ schema.required = decodedSchema.required || []; | |
+ | |
+ schemaOutput.textContent = JSON.stringify(schema, null, 2); | |
+ } catch (e) { | |
+ console.error("Error loading from hash:", e); | |
+ showAlert("Failed to load schema from URL", false); | |
+ } | |
+} | |
+ | |
+// Build properties from loaded schema | |
+function buildPropertiesFromSchema(properties, container, requiredArr) { | |
+ for (const propName in properties) { | |
+ const property = properties[propName]; | |
+ const propItem = addProperty(null, container); | |
+ | |
+ // Set basic property fields | |
+ propItem.querySelector(".property-name").value = propName; | |
+ propItem.querySelector(".property-type").value = property.type; | |
+ | |
+ if (property.description) { | |
+ propItem.querySelector(".property-description").value = property.description; | |
+ } | |
+ | |
+ if (requiredArr.includes(propName)) { | |
+ propItem.querySelector(".property-required").checked = true; | |
+ } | |
+ | |
+ // Handle arrays | |
+ if (property.type === "array" && property.items) { | |
+ const arrayTypeContainer = propItem.querySelector(".array-item-type-container"); | |
+ const arrayTypeSelect = propItem.querySelector(".array-item-type"); | |
+ const nestedContainer = propItem.querySelector(".nested-properties"); | |
+ const addNestedButton = propItem.querySelector(".add-nested"); | |
+ | |
+ arrayTypeContainer.style.display = "block"; | |
+ | |
+ if (property.items.type) { | |
+ arrayTypeSelect.value = property.items.type; | |
+ | |
+ if (property.items.type === "object" && property.items.properties) { | |
+ nestedContainer.style.display = "block"; | |
+ addNestedButton.style.display = "inline-block"; | |
+ | |
+ buildPropertiesFromSchema( | |
+ property.items.properties, | |
+ nestedContainer, | |
+ property.items.required || [] | |
+ ); | |
+ } | |
+ } | |
+ } | |
+ | |
+ // Handle objects | |
+ if (property.type === "object" && property.properties) { | |
+ const nestedContainer = propItem.querySelector(".nested-properties"); | |
+ const addNestedButton = propItem.querySelector(".add-nested"); | |
+ | |
+ nestedContainer.style.display = "block"; | |
+ addNestedButton.style.display = "inline-block"; | |
+ | |
+ buildPropertiesFromSchema( | |
+ property.properties, | |
+ nestedContainer, | |
+ property.required || [] | |
+ ); | |
+ } | |
+ } | |
+} | |
+ </script> | |
+</body> | |
+</html> | |
⏺ Now I'll get a list of HTML files in the directory to process: | |
⏺ Search(pattern: "*.html", path: "undefined")… | |
⎿ Found 78 files | |
⏺ Write(file_path: /Users/simon/Dropbox/Development/tools/gather_links.py)… | |
⎿ Wrote 77 lines to /Users/simon/Dropbox/Development/tools/gather_links.py | |
#!/usr/bin/env python3 | |
import json | |
import os | |
import re | |
import subprocess | |
from pathlib import Path | |
def get_file_commit_history(file_path): | |
""" | |
Get the commit history for a specific file. | |
Returns a list of commit messages. | |
""" | |
try: | |
result = subprocess.run( | |
["git", "log", "--format=%B", "--", file_path], | |
capture_output=True, | |
text=True, | |
check=True | |
) | |
return result.stdout.strip().split("\n\n") | |
except subprocess.CalledProcessError: | |
print(f"Error getting commit history for {file_path}") | |
return [] | |
def extract_urls(text): | |
""" | |
Extract URLs from text using regex pattern. | |
Returns a list of URLs. | |
""" | |
# Pattern for URLs, specifically looking for claude.ai and other common URLs | |
url_pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+' | |
return re.findall(url_pattern, text) | |
def main(): | |
# Get current directory | |
current_dir = Path.cwd() | |
# Find all HTML files | |
html_files = list(current_dir.glob("*.html")) | |
# Dictionary to store results | |
results = {"pages": {}} | |
# Process each HTML file | |
for html_file in html_files: | |
file_name = html_file.name | |
print(f"Processing {file_name}...") | |
# Get commit history for this file | |
commit_messages = get_file_commit_history(html_file) | |
# Extract URLs from commit messages | |
all_urls = [] | |
for message in commit_messages: | |
urls = extract_urls(message) | |
all_urls.extend(urls) | |
# Remove duplicates but preserve order | |
unique_urls = [] | |
for url in all_urls: | |
if url not in unique_urls: | |
unique_urls.append(url) | |
# Add to results if any URLs were found | |
if unique_urls: | |
results["pages"][file_name] = unique_urls | |
# Save results to JSON file | |
with open("gathered_links.json", "w") as f: | |
json.dump(results, f, indent=2) | |
print(f"Processed {len(html_files)} files") | |
print(f"Found URLs for {len(results['pages'])} files") | |
print("Results saved to gathered_links.json") | |
if __name__ == "__main__": | |
main() | |
⏺ Let's make the script executable: | |
⏺ Bash(chmod +x /Users/simon/Dropbox/Development/tools/gather_links.py)… | |
⎿ (No content) | |
⏺ Now let's run the script to generate the JSON file: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./gather_links.py)… | |
⎿ Processing aria-live-regions.html... | |
Processing iframe-api-explorer.html... | |
Processing render-markdown.html... | |
Processing nav-for-headings.html... | |
Processing bluesky-firehose.html... | |
Processing image-resize-quality.html... | |
Processing audio-spectrum.html... | |
Processing prompts-js.html... | |
Processing apsw-query.html... | |
Processing svg-render.html... | |
Processing animated-rainbow-border.html... | |
Processing yaml-explorer.html... | |
Processing user-agent.html... | |
Processing writing-style.html... | |
Processing youtube-thumbnails.html... | |
Processing escape-entities.html... | |
Processing bbox-cropper.html... | |
Processing progress.html... | |
Processing bluesky-timeline.html... | |
Processing word-counter.html... | |
Processing clipboard-viewer.html... | |
Processing markdown-math.html... | |
Processing json-schema-builder.html... | |
Processing jina-reader.html... | |
Processing link-temp.html... | |
Processing openfreemap-demo.html... | |
Processing bluesky-resolve.html... | |
Processing pomodoro.html... | |
Processing claude-token-counter.html... | |
Processing image-to-jpeg.html... | |
Processing csv-marker-map.html... | |
Processing gemini-chat.html... | |
Processing image-to-svg.html... | |
Processing tiff-orientation.html... | |
Processing iframe-sandbox.html... | |
Processing timezones.html... | |
Processing click-grid-to-expand.html... | |
Processing php-deserializer.html... | |
Processing github-api-write.html... | |
Processing exif.html... | |
Processing qr.html... | |
Processing unix-timestamp.html... | |
Processing arena-animated.html... | |
Processing openai-audio.html... | |
Processing svg-progressive-render.html... | |
Processing github-issue.html... | |
Processing social-media-cropper.html... | |
Processing llm-prices.html... | |
Processing encrypt.html... | |
Processing extract-urls.html... | |
Processing paste-rich-text.html... | |
Processing github-issue-to-markdown.html... | |
Processing openai-audio-output.html... | |
Processing pdf-ocr.html... | |
Processing render-claude-citations.html... | |
Processing chrome-prompt-playground.html... | |
Processing haiku.html... | |
Processing sql-pretty-printer.html... | |
Processing event-planner.html... | |
Processing sqlite-wasm.html... | |
Processing openai-stream.html... | |
Processing california-clock-change.html... | |
Processing gpt-4o-audio-player.html... | |
Processing jina-embeddings-image-token-calculator.html... | |
Processing text-wrap-balance-nav.html... | |
Processing gemini-bbox.html... | |
Processing schema-dsl.html... | |
Processing pipfile.html... | |
Processing compare-pdfs.html... | |
Processing json-to-yaml.html... | |
Processing openai-webrtc.html... | |
Processing gemini-bbox-tool.html... | |
Processing mdn-timelines.html... | |
Processing box-shadow.html... | |
Processing ares.html... | |
Processing zip-wheel-explorer.html... | |
Processing ocr.html... | |
Processing svg-sandbox.html... | |
Processed 78 files | |
Found URLs for 69 files | |
Results saved to gathered_links.json | |
⏺ Let's check the content of the generated JSON file: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && head -n 20 gathered_links.json)… | |
⎿ { | |
"pages": { | |
"aria-live-regions.html": [ | |
"https://gist.github.com" | |
], | |
"iframe-api-explorer.html": [ | |
"https://gist.github.com" | |
], | |
"render-markdown.html": [ | |
"https://github.com" | |
], | |
"nav-for-headings.html": [ | |
"https://gist.github.com" | |
], | |
"bluesky-firehose.html": [ | |
"https://gist.github.com" | |
], | |
"image-resize-quality.html": [ | |
"https://gist.github.com", | |
"https://tools.simonwillison.net", | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && grep -A 5 "json-schema-builder.html" … | |
gathered_links.json) | |
⎿ No (tell Claude what to do differently) | |
> It looks like it just got the start of the URLs, it should be getting the whole URLs which might be | |
to different websites - so just get anything that starts https:// and ends with whitespace or the end | |
of the commit message | |
⏺ I'll update the script to correctly capture the full URLs: | |
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/gather_links.py)… | |
⎿ Updated /Users/simon/Dropbox/Development/tools/gather_links.py with 2 additions and 2 removals | |
27 Extract URLs from text using regex pattern. | |
28 Returns a list of URLs. | |
29 """ | |
30 # Pattern for URLs, specifically looking for claude.ai and other common URLs | |
31 url_pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+' | |
30 # Pattern for URLs that captures the full URL until whitespace or end of string | |
31 url_pattern = r'(https?://[^\s]+)' | |
32 return re.findall(url_pattern, text) | |
33 | |
34 def main(): | |
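A minimal sketch (reusing the json-schema-builder.html commit message shown earlier) of why this change matters: the old character class [-\w.] never matches "/", so matching stopped at the hostname, while the new pattern runs to the next whitespace.

    import re

    message = "Create json-schema-builder.html\n\nhttps://claude.ai/share/c84c867a-dbaf-41c2-a6f1-ae7dad002485"

    old_pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'
    new_pattern = r'(https?://[^\s]+)'

    print(re.findall(old_pattern, message))  # ['https://claude.ai'] - truncated at the first "/"
    print(re.findall(new_pattern, message))  # the full share URL, up to the next whitespace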
⏺ Now let's run the script again: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./gather_links.py)… | |
⎿ Processing aria-live-regions.html... | |
Processing iframe-api-explorer.html... | |
Processing render-markdown.html... | |
Processing nav-for-headings.html... | |
Processing bluesky-firehose.html... | |
Processing image-resize-quality.html... | |
Processing audio-spectrum.html... | |
Processing prompts-js.html... | |
Processing apsw-query.html... | |
Processing svg-render.html... | |
Processing animated-rainbow-border.html... | |
Processing yaml-explorer.html... | |
Processing user-agent.html... | |
Processing writing-style.html... | |
Processing youtube-thumbnails.html... | |
Processing escape-entities.html... | |
Processing bbox-cropper.html... | |
Processing progress.html... | |
Processing bluesky-timeline.html... | |
Processing word-counter.html... | |
Processing clipboard-viewer.html... | |
Processing markdown-math.html... | |
Processing json-schema-builder.html... | |
Processing jina-reader.html... | |
Processing link-temp.html... | |
Processing openfreemap-demo.html... | |
Processing bluesky-resolve.html... | |
Processing pomodoro.html... | |
Processing claude-token-counter.html... | |
Processing image-to-jpeg.html... | |
Processing csv-marker-map.html... | |
Processing gemini-chat.html... | |
Processing image-to-svg.html... | |
Processing tiff-orientation.html... | |
Processing iframe-sandbox.html... | |
Processing timezones.html... | |
Processing click-grid-to-expand.html... | |
Processing php-deserializer.html... | |
Processing github-api-write.html... | |
Processing exif.html... | |
Processing qr.html... | |
Processing unix-timestamp.html... | |
Processing arena-animated.html... | |
Processing openai-audio.html... | |
Processing svg-progressive-render.html... | |
Processing github-issue.html... | |
Processing social-media-cropper.html... | |
Processing llm-prices.html... | |
Processing encrypt.html... | |
Processing extract-urls.html... | |
Processing paste-rich-text.html... | |
Processing github-issue-to-markdown.html... | |
Processing openai-audio-output.html... | |
Processing pdf-ocr.html... | |
Processing render-claude-citations.html... | |
Processing chrome-prompt-playground.html... | |
Processing haiku.html... | |
Processing sql-pretty-printer.html... | |
Processing event-planner.html... | |
Processing sqlite-wasm.html... | |
Processing openai-stream.html... | |
Processing california-clock-change.html... | |
Processing gpt-4o-audio-player.html... | |
Processing jina-embeddings-image-token-calculator.html... | |
Processing text-wrap-balance-nav.html... | |
Processing gemini-bbox.html... | |
Processing schema-dsl.html... | |
Processing pipfile.html... | |
Processing compare-pdfs.html... | |
Processing json-to-yaml.html... | |
Processing openai-webrtc.html... | |
Processing gemini-bbox-tool.html... | |
Processing mdn-timelines.html... | |
Processing box-shadow.html... | |
Processing ares.html... | |
Processing zip-wheel-explorer.html... | |
Processing ocr.html... | |
Processing svg-sandbox.html... | |
Processed 78 files | |
Found URLs for 69 files | |
Results saved to gathered_links.json | |
⏺ Let's check the content of the JSON file now: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && cat gathered_links.json | jq … | |
'.pages["json-schema-builder.html"]') | |
⎿ [ | |
"https://claude.ai/share/c84c867a-dbaf-41c2-a6f1-ae7dad002485" | |
] | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && cat gathered_links.json | jq … | |
'.pages["encrypt.html"]') | |
⎿ No (tell Claude what to do differently) | |
> Update the script - I want to capture the full commit messages AND the URLs - the new format should | |
be {"pages": {"aria-live-regions.html": {"commits": [{"hash": hash, "message": message, "date": iso | |
formatted date], "urls": [list of URLs like before] | |
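Written out as well-formed JSON, the requested structure (placeholder values taken from the prompt) would look roughly like:

    {
      "pages": {
        "aria-live-regions.html": {
          "commits": [
            {"hash": "hash", "message": "message", "date": "iso formatted date"}
          ],
          "urls": ["url1", "url2"]
        }
      }
    }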
⏺ I'll update the script to capture commit messages, hashes, dates, and URLs in the requested format: | |
⏺ Write(file_path: /Users/simon/Dropbox/Development/tools/gather_links.py)… | |
⎿ Updated /Users/simon/Dropbox/Development/tools/gather_links.py with 59 additions and 15 removals | |
4 import re | |
5 import subprocess | |
6 from pathlib import Path | |
7 from datetime import datetime | |
8 | |
9 def get_file_commit_history(file_path): | |
9 def get_file_commit_details(file_path): | |
10 """ | |
11 Get the commit history for a specific file. | |
12 Returns a list of commit messages. | |
11 Get the commit details for a specific file. | |
12 Returns a list of dictionaries with hash, message, and date. | |
13 """ | |
14 try: | |
15 # Get commit hashes, dates, and messages | |
16 result = subprocess.run( | |
17 ["git", "log", "--format=%B", "--", file_path], | |
17 ["git", "log", "--format=%H|%aI|%B", "--", file_path], | |
18 capture_output=True, | |
19 text=True, | |
20 check=True | |
21 ) | |
22 return result.stdout.strip().split("\n\n") | |
23 except subprocess.CalledProcessError: | |
24 print(f"Error getting commit history for {file_path}") | |
22 | |
23 commits = [] | |
24 # Split by commit separator | |
25 raw_commits = result.stdout.strip().split("\n\n\n") | |
26 | |
27 for raw_commit in raw_commits: | |
28 if not raw_commit.strip(): | |
29 continue | |
30 | |
31 # Each commit starts with hash|date | |
32 lines = raw_commit.strip().split("\n") | |
33 if not lines: | |
34 continue | |
35 | |
36 first_line = lines[0] | |
37 parts = first_line.split("|", 2) | |
38 | |
39 if len(parts) < 3: | |
40 continue | |
41 | |
42 commit_hash = parts[0] | |
43 commit_date = parts[1] | |
44 | |
45 # The message is the first part and the rest of the lines | |
46 message_first_line = parts[2] | |
47 message_rest = "\n".join(lines[1:]) | |
48 full_message = message_first_line | |
49 if message_rest: | |
50 full_message += "\n" + message_rest | |
51 | |
52 commits.append({ | |
53 "hash": commit_hash, | |
54 "date": commit_date, | |
55 "message": full_message | |
56 }) | |
57 | |
58 return commits | |
59 except subprocess.CalledProcessError as e: | |
60 print(f"Error getting commit history for {file_path}: {e}") | |
61 return [] | |
62 | |
63 def extract_urls(text): | |
... | |
46 file_name = html_file.name | |
47 print(f"Processing {file_name}...") | |
48 | |
49 # Get commit history for this file | |
50 commit_messages = get_file_commit_history(html_file) | |
49 # Get commit details for this file | |
50 commits = get_file_commit_details(html_file) | |
51 | |
52 if not commits: | |
53 continue | |
54 | |
55 # Extract URLs from commit messages | |
56 all_urls = [] | |
57 for message in commit_messages: | |
58 urls = extract_urls(message) | |
57 for commit in commits: | |
58 urls = extract_urls(commit["message"]) | |
59 all_urls.extend(urls) | |
60 | |
61 # Remove duplicates but preserve order | |
... | |
61 if url not in unique_urls: | |
62 unique_urls.append(url) | |
63 | |
64 # Add to results if any URLs were found | |
65 if unique_urls: | |
66 results["pages"][file_name] = unique_urls | |
64 # Add to results if any commits were found | |
65 if commits: | |
66 results["pages"][file_name] = { | |
67 "commits": commits, | |
68 "urls": unique_urls | |
69 } | |
70 | |
71 # Save results to JSON file | |
72 with open("gathered_links.json", "w") as f: | |
73 json.dump(results, f, indent=2) | |
74 | |
75 print(f"Processed {len(html_files)} files") | |
76 print(f"Found URLs for {len(results['pages'])} files") | |
76 print(f"Found details for {len(results['pages'])} files") | |
77 print("Results saved to gathered_links.json") | |
78 | |
79 if __name__ == "__main__": | |
⏺ Let's run the updated script: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./gather_links.py)… | |
⎿ Processing aria-live-regions.html... | |
Processing iframe-api-explorer.html... | |
Processing render-markdown.html... | |
Processing nav-for-headings.html... | |
Processing bluesky-firehose.html... | |
Processing image-resize-quality.html... | |
Processing audio-spectrum.html... | |
Processing prompts-js.html... | |
Processing apsw-query.html... | |
Processing svg-render.html... | |
Processing animated-rainbow-border.html... | |
Processing yaml-explorer.html... | |
Processing user-agent.html... | |
Processing writing-style.html... | |
Processing youtube-thumbnails.html... | |
Processing escape-entities.html... | |
Processing bbox-cropper.html... | |
Processing progress.html... | |
Processing bluesky-timeline.html... | |
Processing word-counter.html... | |
Processing clipboard-viewer.html... | |
Processing markdown-math.html... | |
Processing json-schema-builder.html... | |
Processing jina-reader.html... | |
Processing link-temp.html... | |
Processing openfreemap-demo.html... | |
Processing bluesky-resolve.html... | |
Processing pomodoro.html... | |
Processing claude-token-counter.html... | |
Processing image-to-jpeg.html... | |
Processing csv-marker-map.html... | |
Processing gemini-chat.html... | |
Processing image-to-svg.html... | |
Processing tiff-orientation.html... | |
Processing iframe-sandbox.html... | |
Processing timezones.html... | |
Processing click-grid-to-expand.html... | |
Processing php-deserializer.html... | |
Processing github-api-write.html... | |
Processing exif.html... | |
Processing qr.html... | |
Processing unix-timestamp.html... | |
Processing arena-animated.html... | |
Processing openai-audio.html... | |
Processing svg-progressive-render.html... | |
Processing github-issue.html... | |
Processing social-media-cropper.html... | |
Processing llm-prices.html... | |
Processing encrypt.html... | |
Processing extract-urls.html... | |
Processing paste-rich-text.html... | |
Processing github-issue-to-markdown.html... | |
Processing openai-audio-output.html... | |
Processing pdf-ocr.html... | |
Processing render-claude-citations.html... | |
Processing chrome-prompt-playground.html... | |
Processing haiku.html... | |
Processing sql-pretty-printer.html... | |
Processing event-planner.html... | |
Processing sqlite-wasm.html... | |
Processing openai-stream.html... | |
Processing california-clock-change.html... | |
Processing gpt-4o-audio-player.html... | |
Processing jina-embeddings-image-token-calculator.html... | |
Processing text-wrap-balance-nav.html... | |
Processing gemini-bbox.html... | |
Processing schema-dsl.html... | |
Processing pipfile.html... | |
Processing compare-pdfs.html... | |
Processing json-to-yaml.html... | |
Processing openai-webrtc.html... | |
Processing gemini-bbox-tool.html... | |
Processing mdn-timelines.html... | |
Processing box-shadow.html... | |
Processing ares.html... | |
Processing zip-wheel-explorer.html... | |
Processing ocr.html... | |
Processing svg-sandbox.html... | |
Processed 78 files | |
Found details for 77 files | |
Results saved to gathered_links.json | |
⏺ Let's check the JSON output: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && cat gathered_links.json | jq … | |
'.pages["json-schema-builder.html"]') | |
⎿ No (tell Claude what to do differently) | |
> /cost | |
⎿ Total cost: $0.2595 | |
Total duration (API): 2m 36.4s | |
Total duration (wall): 8m 57.6s | |
> This is working great. Write me a new script called build_colophon.py which looks through that | |
gathered JSON file and builds and saves an HTML page. The page should be mobile friendly and should | |
list every page - with a link to that page - and for each one display the commit messages neatly | |
(convert newlines to br and linkify URLs but no other formatting) - plus the commit message dates and | |
links to the commits themselves which are in https://github.com/simonw/tools | |
⏺ I'll create a script to build a mobile-friendly HTML colophon page from the gathered links JSON | |
file. | |
⏺ Write(file_path: /Users/simon/Dropbox/Development/tools/build_colophon.py)… | |
⎿ Wrote 174 lines to /Users/simon/Dropbox/Development/tools/build_colophon.py | |
#!/usr/bin/env python3 | |
import json | |
import re | |
from datetime import datetime | |
import html | |
from pathlib import Path | |
def linkify_urls(text): | |
"""Convert URLs in text to clickable links.""" | |
url_pattern = r'(https?://[^\s]+)' | |
return re.sub(url_pattern, r'<a href="\1" target="_blank">\1</a>', text) | |
def format_commit_message(message): | |
"""Format commit message with line breaks and linkified URLs.""" | |
# Escape HTML entities | |
escaped = html.escape(message) | |
# Convert newlines to <br> | |
with_breaks = escaped.replace('\n', '<br>') | |
# Linkify URLs | |
return linkify_urls(with_breaks) | |
def build_colophon(): | |
# Load the gathered_links.json file | |
try: | |
with open('gathered_links.json', 'r') as f: | |
data = json.load(f) | |
except FileNotFoundError: | |
print("Error: gathered_links.json not found. Run gather_links.py first.") | |
return | |
pages = data.get('pages', {}) | |
if not pages: | |
print("No pages found in gathered_links.json") | |
return | |
# Sort pages by name | |
sorted_pages = sorted(pages.items()) | |
# Start building the HTML | |
html_content = '''<!DOCTYPE html> | |
<html lang="en"> | |
<head> | |
<meta charset="UTF-8"> | |
<meta name="viewport" content="width=device-width, initial-scale=1.0"> | |
<title>Tools Colophon</title> | |
<style> | |
body { | |
font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, | |
Helvetica, Arial, sans-serif; | |
line-height: 1.5; | |
max-width: 800px; | |
margin: 0 auto; | |
padding: 1rem; | |
color: #1a1a1a; | |
} | |
h1 { | |
border-bottom: 2px solid #f0f0f0; | |
padding-bottom: 0.5rem; | |
margin-top: 2rem; | |
} | |
h2 { | |
margin-top: 2rem; | |
font-size: 1.4rem; | |
} | |
a { | |
color: #0066cc; | |
text-decoration: none; | |
} | |
a:hover { | |
text-decoration: underline; | |
} | |
.tool { | |
margin-bottom: 2rem; | |
border-bottom: 1px solid #f0f0f0; | |
padding-bottom: 1rem; | |
} | |
.tool-name { | |
font-weight: bold; | |
} | |
.commit { | |
background-color: #f8f9fa; | |
padding: 1rem; | |
margin: 0.75rem 0; | |
border-radius: 4px; | |
border-left: 3px solid #ddd; | |
} | |
.commit-hash { | |
font-family: monospace; | |
color: #666; | |
font-size: 0.85rem; | |
} | |
.commit-date { | |
color: #666; | |
font-size: 0.85rem; | |
margin-left: 0.5rem; | |
} | |
.commit-message { | |
margin-top: 0.5rem; | |
} | |
.urls { | |
margin-top: 1rem; | |
} | |
@media (max-width: 600px) { | |
body { | |
padding: 0.5rem; | |
} | |
.commit { | |
padding: 0.75rem; | |
} | |
} | |
</style> | |
</head> | |
<body> | |
<h1>Tools Colophon</h1> | |
<p>This page documents the creation of the tools on <a | |
href="https://tools.simonwillison.net/">tools.simonwillison.net</a>, | |
including links to the Claude conversations used to build them.</p> | |
''' | |
# Add each page with its commits | |
for page_name, page_data in sorted_pages: | |
tool_url = f"https://tools.simonwillison.net/{page_name.replace('.html', '')}" | |
commits = page_data.get('commits', []) | |
html_content += f''' | |
<div class="tool"> | |
<h2 class="tool-name"><a href="{tool_url}">{page_name}</a></h2> | |
''' | |
# Add each commit | |
for commit in commits: | |
commit_hash = commit.get('hash', '') | |
short_hash = commit_hash[:7] if commit_hash else 'unknown' | |
commit_date = commit.get('date', '') | |
# Format the date | |
formatted_date = '' | |
if commit_date: | |
try: | |
dt = datetime.fromisoformat(commit_date) | |
formatted_date = dt.strftime('%B %d, %Y') | |
except ValueError: | |
formatted_date = commit_date | |
commit_message = commit.get('message', '') | |
formatted_message = format_commit_message(commit_message) | |
commit_url = f"https://github.com/simonw/tools/commit/{commit_hash}" | |
html_content += f''' | |
<div class="commit"> | |
<div> | |
<a href="{commit_url}" class="commit-hash" target="_blank">{short_hash}</a> | |
<span class="commit-date">{formatted_date}</span> | |
</div> | |
<div class="commit-message">{formatted_message}</div> | |
</div> | |
''' | |
html_content += ''' | |
</div> | |
''' | |
# Close the HTML | |
html_content += ''' | |
</body> | |
</html> | |
''' | |
# Write the HTML to colophon.html | |
with open('colophon.html', 'w') as f: | |
f.write(html_content) | |
print("Colophon page built successfully as colophon.html") | |
if __name__ == "__main__": | |
build_colophon() | |
⏺ Let's make the script executable: | |
⏺ Bash(chmod +x /Users/simon/Dropbox/Development/tools/build_colophon.py)… | |
⎿ (No content) | |
⏺ Now let's run the script to build the colophon page: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./build_colophon.py)… | |
⎿ Colophon page built successfully as colophon.html | |
⏺ Let's check the beginning of the generated HTML page: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && head -n 20 colophon.html)… | |
⎿ <!DOCTYPE html> | |
<html lang="en"> | |
<head> | |
<meta charset="UTF-8"> | |
<meta name="viewport" content="width=device-width, initial-scale=1.0"> | |
<title>Tools Colophon</title> | |
<style> | |
body { | |
font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, | |
Helvetica, Arial, sans-serif; | |
line-height: 1.5; | |
max-width: 800px; | |
margin: 0 auto; | |
padding: 1rem; | |
color: #1a1a1a; | |
} | |
h1 { | |
border-bottom: 2px solid #f0f0f0; | |
padding-bottom: 0.5rem; | |
margin-top: 2rem; | |
} | |
⏺ Let's check one of the pages to see how the commits are displayed: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && grep -A 20 "json-schema-builder.html" colophon.html)… | |
⎿ <h2 class="tool-name"><a | |
href="https://tools.simonwillison.net/json-schema-builder">json-schema-builder.html</a></h2> | |
<div class="commit"> | |
<div> | |
<a | |
href="https://github.com/simonw/tools/commit/0a186e5a3a939fc9e9679ecc31068494c1368b99" | |
class="commit-hash" target="_blank">0a186e5</a> | |
<span class="commit-date">March 07, 2025</span> | |
</div> | |
<div class="commit-message">Create json-schema-builder.html<br><br><a | |
href="https://claude.ai/share/c84c867a-dbaf-41c2-a6f1-ae7dad002485" | |
target="_blank">https://claude.ai/share/c84c867a-dbaf-41c2-a6f1-ae7dad002485</a></div> | |
</div> | |
</div> | |
<div class="tool"> | |
<h2 class="tool-name"><a | |
href="https://tools.simonwillison.net/json-to-yaml">json-to-yaml.html</a></h2> | |
<div class="commit"> | |
<div> | |
<a | |
href="https://github.com/simonw/tools/commit/5426b541ed61cf6a9d0b74c3acce637ca60a825a" | |
class="commit-hash" target="_blank">5426b54</a> | |
<span class="commit-date">December 15, 2024</span> | |
</div> | |
<div class="commit-message">json-to-yaml.html<br><br><a | |
href="https://gist.github.com/simonw/95c26586503afb22e60bcca9c721ee16" | |
target="_blank">https://gist.github.com/simonw/95c26586503afb22e60bcca9c721ee16</a></div> | |
</div> | |
</div> | |
<div class="tool"> | |
<h2 class="tool-name"><a | |
href="https://tools.simonwillison.net/link-temp">link-temp.html</a></h2> | |
⏺ The scripts look good! Here's a summary of what we've built: | |
1. gather_links.py: | |
- Scans git history for all HTML files | |
- Extracts commit hashes, dates, messages, and URLs | |
- Creates a structured JSON file with all this information | |
2. build_colophon.py: | |
- Reads the JSON file produced by the first script | |
- Builds a mobile-friendly HTML page | |
- Lists all tools with links to their pages | |
- Shows nicely formatted commit messages with dates and linked URLs | |
- Links to the GitHub commits | |
The colophon page provides a comprehensive overview of how each tool was created, with links to the | |
Claude conversations that were used to build them. | |
> it's not working right. ocr.html had a bunch of commits but in colophon.html there is only one link | |
and heading for the first commit and the rest are shown within that same block - there should be | |
separate HTML chunks with links and formatted dates for each of the other commits. Also the neatly | |
formatted date should include the HH:MM as well as the date | |
⏺ I'll fix the issues in the build_colophon.py script to properly display multiple commits per tool | |
and include time in the formatted dates. | |
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/build_colophon.py)… | |
⎿ Updated /Users/simon/Dropbox/Development/tools/build_colophon.py with 2 additions and 2 removals | |
131 short_hash = commit_hash[:7] if commit_hash else 'unknown' | |
132 commit_date = commit.get('date', '') | |
133 | |
134 # Format the date | |
134 # Format the date with time | |
135 formatted_date = '' | |
136 if commit_date: | |
137 try: | |
138 dt = datetime.fromisoformat(commit_date) | |
139 formatted_date = dt.strftime('%B %d, %Y') | |
139 formatted_date = dt.strftime('%B %d, %Y %H:%M') | |
140 except ValueError: | |
141 formatted_date = commit_date | |
142 | |
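For an illustrative %aI value (the json-schema-builder.html commit date seen earlier), the new format string renders like this:

    from datetime import datetime

    dt = datetime.fromisoformat("2025-03-07T18:17:52-06:00")  # illustrative ISO author date
    print(dt.strftime("%B %d, %Y %H:%M"))  # March 07, 2025 18:17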
⏺ Now let's check gather_links.py to make sure we're correctly parsing all commits for each file: | |
⏺ Read(file_path: "/Users/simon/Dropbox/Development/tools/gather_links.py")… | |
⎿ #!/usr/bin/env python3 | |
import json | |
import os | |
import re | |
import subprocess | |
from pathlib import Path | |
from datetime import datetime | |
def get_file_commit_details(file_path): | |
""" | |
Get the commit details for a specific file. | |
Returns a list of dictionaries with hash, message, and date. | |
""" | |
try: | |
# Get commit hashes, dates, and messages | |
result = subprocess.run( | |
["git", "log", "--format=%H|%aI|%B", "--", file_path], | |
capture_output=True, | |
text=True, | |
check=True | |
) | |
commits = [] | |
# Split by commit separator | |
raw_commits = result.stdout.strip().split("\n\n\n") | |
for raw_commit in raw_commits: | |
if not raw_commit.strip(): | |
continue | |
# Each commit starts with hash|date | |
lines = raw_commit.strip().split("\n") | |
if not lines: | |
continue | |
first_line = lines[0] | |
parts = first_line.split("|", 2) | |
if len(parts) < 3: | |
continue | |
commit_hash = parts[0] | |
commit_date = parts[1] | |
# The message is the first part and the rest of the lines | |
message_first_line = parts[2] | |
message_rest = "\n".join(lines[1:]) | |
full_message = message_first_line | |
if message_rest: | |
full_message += "\n" + message_rest | |
commits.append({ | |
"hash": commit_hash, | |
"date": commit_date, | |
"message": full_message | |
}) | |
return commits | |
except subprocess.CalledProcessError as e: | |
print(f"Error getting commit history for {file_path}: {e}") | |
return [] | |
def extract_urls(text): | |
""" | |
Extract URLs from text using regex pattern. | |
Returns a list of URLs. | |
""" | |
# Pattern for URLs that captures the full URL until whitespace or end of string | |
url_pattern = r'(https?://[^\s]+)' | |
return re.findall(url_pattern, text) | |
def main(): | |
# Get current directory | |
current_dir = Path.cwd() | |
# Find all HTML files | |
html_files = list(current_dir.glob("*.html")) | |
# Dictionary to store results | |
results = {"pages": {}} | |
# Process each HTML file | |
for html_file in html_files: | |
file_name = html_file.name | |
print(f"Processing {file_name}...") | |
# Get commit details for this file | |
commits = get_file_commit_details(html_file) | |
if not commits: | |
continue | |
# Extract URLs from commit messages | |
all_urls = [] | |
for commit in commits: | |
urls = extract_urls(commit["message"]) | |
all_urls.extend(urls) | |
# Remove duplicates but preserve order | |
unique_urls = [] | |
for url in all_urls: | |
if url not in unique_urls: | |
unique_urls.append(url) | |
# Add to results if any commits were found | |
if commits: | |
results["pages"][file_name] = { | |
"commits": commits, | |
"urls": unique_urls | |
} | |
# Save results to JSON file | |
with open("gathered_links.json", "w") as f: | |
json.dump(results, f, indent=2) | |
print(f"Processed {len(html_files)} files") | |
print(f"Found details for {len(results['pages'])} files") | |
print("Results saved to gathered_links.json") | |
if __name__ == "__main__": | |
main() | |
⏺ Let's check how git is outputting the log format to identify any issues: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && git log -n 1 --format="%H|%aI|%B" -- ocr.html)… | |
⎿ c1233dfcddff62241dac9dd5d99e0726557350a8|2024-05-20T08:55:05-04:00|Upgrade PDF.js | |
Address https://codeanlabs.com/blog/research/cve-2024-4367-arbitrary-js-execution-in-pdf-js/ | |
⏺ The issue is in how we're parsing the git log output. The commit message separator needs to be | |
adjusted. Let's fix the gather_links.py script: | |
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/gather_links.py)… | |
⎿ Updated /Users/simon/Dropbox/Development/tools/gather_links.py with 15 additions and 19 removals | |
12 Returns a list of dictionaries with hash, message, and date. | |
13 """ | |
14 try: | |
15 # Get commit hashes, dates, and messages | |
15 # Get each commit as a separate record with its hash, date, and message | |
16 result = subprocess.run( | |
17 ["git", "log", "--format=%H|%aI|%B", "--", file_path], | |
17 ["git", "log", "--format=%H|%aI|%B%x00", "--", file_path], | |
18 capture_output=True, | |
19 text=True, | |
20 check=True | |
21 ) | |
22 | |
23 commits = [] | |
24 # Split by commit separator | |
25 raw_commits = result.stdout.strip().split("\n\n\n") | |
24 # Split by NULL character which safely separates commits | |
25 raw_commits = result.stdout.strip().split("\0") | |
26 | |
27 for raw_commit in raw_commits: | |
28 if not raw_commit.strip(): | |
29 continue | |
30 | |
31 # Each commit starts with hash|date | |
32 lines = raw_commit.strip().split("\n") | |
33 if not lines: | |
31 # Find the first pipe to extract commit hash | |
32 first_pipe = raw_commit.find('|') | |
33 if first_pipe == -1: | |
34 continue | |
35 | |
36 first_line = lines[0] | |
37 parts = first_line.split("|", 2) | |
36 commit_hash = raw_commit[:first_pipe] | |
37 | |
38 if len(parts) < 3: | |
38 # Find the second pipe to extract date | |
39 second_pipe = raw_commit.find('|', first_pipe + 1) | |
40 if second_pipe == -1: | |
41 continue | |
42 | |
43 commit_hash = parts[0] | |
44 commit_date = parts[1] | |
43 commit_date = raw_commit[first_pipe + 1:second_pipe] | |
44 | |
45 # The message is the first part and the rest of the lines | |
46 message_first_line = parts[2] | |
47 message_rest = "\n".join(lines[1:]) | |
48 full_message = message_first_line | |
49 if message_rest: | |
50 full_message += "\n" + message_rest | |
45 # The rest is the commit message | |
46 commit_message = raw_commit[second_pipe + 1:] | |
47 | |
48 commits.append({ | |
49 "hash": commit_hash, | |
50 "date": commit_date, | |
51 "message": full_message | |
51 "message": commit_message | |
52 }) | |
53 | |
54 return commits | |
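In short: with --format=%H|%aI|%B alone, consecutive commits are separated only by blank lines, so splitting on "\n\n\n" breaks whenever a commit body itself contains a blank line (as the ocr.html commit above does). Appending %x00 emits a NUL byte after each record, which can never appear inside a commit message. A compressed sketch of the same parsing approach, run from the repository root:

    import subprocess

    result = subprocess.run(
        ["git", "log", "--format=%H|%aI|%B%x00", "--", "ocr.html"],
        capture_output=True, text=True, check=True,
    )
    for record in result.stdout.split("\0"):
        record = record.strip()
        if not record:
            continue
        # hash | ISO author date | full message (which may span many lines)
        commit_hash, commit_date, message = record.split("|", 2)
        print(commit_hash[:7], commit_date, message.splitlines()[0])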
⏺ Now let's run the updated scripts: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./gather_links.py)… | |
⎿ Processing aria-live-regions.html... | |
Processing iframe-api-explorer.html... | |
Processing render-markdown.html... | |
Processing nav-for-headings.html... | |
Processing bluesky-firehose.html... | |
Processing image-resize-quality.html... | |
Processing audio-spectrum.html... | |
Processing prompts-js.html... | |
Processing apsw-query.html... | |
Processing svg-render.html... | |
Processing animated-rainbow-border.html... | |
Processing yaml-explorer.html... | |
Processing user-agent.html... | |
Processing writing-style.html... | |
Processing youtube-thumbnails.html... | |
Processing escape-entities.html... | |
Processing bbox-cropper.html... | |
Processing progress.html... | |
Processing bluesky-timeline.html... | |
Processing word-counter.html... | |
Processing clipboard-viewer.html... | |
Processing markdown-math.html... | |
Processing json-schema-builder.html... | |
Processing jina-reader.html... | |
Processing link-temp.html... | |
Processing openfreemap-demo.html... | |
Processing bluesky-resolve.html... | |
Processing pomodoro.html... | |
Processing claude-token-counter.html... | |
Processing image-to-jpeg.html... | |
Processing csv-marker-map.html... | |
Processing gemini-chat.html... | |
Processing image-to-svg.html... | |
Processing tiff-orientation.html... | |
Processing iframe-sandbox.html... | |
Processing timezones.html... | |
Processing click-grid-to-expand.html... | |
Processing php-deserializer.html... | |
Processing github-api-write.html... | |
Processing exif.html... | |
Processing qr.html... | |
Processing unix-timestamp.html... | |
Processing arena-animated.html... | |
Processing openai-audio.html... | |
Processing svg-progressive-render.html... | |
Processing github-issue.html... | |
Processing social-media-cropper.html... | |
Processing llm-prices.html... | |
Processing encrypt.html... | |
Processing extract-urls.html... | |
Processing paste-rich-text.html... | |
Processing github-issue-to-markdown.html... | |
Processing openai-audio-output.html... | |
Processing pdf-ocr.html... | |
Processing render-claude-citations.html... | |
Processing chrome-prompt-playground.html... | |
Processing haiku.html... | |
Processing sql-pretty-printer.html... | |
Processing event-planner.html... | |
Processing sqlite-wasm.html... | |
Processing openai-stream.html... | |
Processing california-clock-change.html... | |
Processing gpt-4o-audio-player.html... | |
Processing jina-embeddings-image-token-calculator.html... | |
Processing text-wrap-balance-nav.html... | |
Processing gemini-bbox.html... | |
Processing schema-dsl.html... | |
Processing pipfile.html... | |
Processing compare-pdfs.html... | |
Processing json-to-yaml.html... | |
Processing colophon.html... | |
Processing openai-webrtc.html... | |
Processing gemini-bbox-tool.html... | |
Processing mdn-timelines.html... | |
Processing box-shadow.html... | |
Processing ares.html... | |
Processing zip-wheel-explorer.html... | |
Processing ocr.html... | |
Processing svg-sandbox.html... | |
Processed 79 files | |
Found details for 77 files | |
Results saved to gathered_links.json | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./build_colophon.py)… | |
⎿ Colophon page built successfully as colophon.html | |
⏺ Let's check the ocr.html entry now: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && grep -A 25 "ocr.html" colophon.html)… | |
⎿ No (tell Claude what to do differently) | |
> it's almost perfect, but each page should have the commits displayed in the opposite order - oldest | |
first | |
⏺ I'll update the script to display the commits in chronological order (oldest first) for each page: | |
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/build_colophon.py)… | |
⎿ Updated /Users/simon/Dropbox/Development/tools/build_colophon.py with 3 additions | |
120 tool_url = f"https://tools.simonwillison.net/{page_name.replace('.html', '')}" | |
121 commits = page_data.get('commits', []) | |
122 | |
123 # Reverse the commits list to show oldest first | |
124 commits = list(reversed(commits)) | |
125 | |
126 html_content += f''' | |
127 <div class="tool"> | |
128 <h2 class="tool-name"><a href="{tool_url}">{page_name}</a></h2> | |
⏺ Now let's run the script to generate the updated colophon.html: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./build_colophon.py)… | |
⎿ Colophon page built successfully as colophon.html | |
⏺ Perfect! Now we have: | |
1. Both scripts correctly working together | |
2. The colophon page displaying multiple commits per tool | |
3. Date and time shown in the formatted date | |
4. Commits displayed in chronological order (oldest first) | |
The complete solution includes: | |
1. gather_links.py - A script that scans git history for all HTML files, extracting commit details | |
and URLs. | |
2. build_colophon.py - A script that builds a beautiful HTML page showing the history of each tool | |
with links to the Claude conversations that created them. | |
The scripts work in sequence: | |
1. Run gather_links.py to generate the JSON file with all the data | |
2. Run build_colophon.py to build the HTML colophon page | |
The resulting colophon page provides a comprehensive overview of how each tool was created, showing | |
commits in chronological order with proper formatting and all necessary links. | |
> One last change - the pages are currently listed alphabetically, lets instead list them with the most | |
recently modified at the top | |
⏺ I'll update the build_colophon.py script to sort the pages by their most recent modification date | |
(newest first): | |
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/build_colophon.py)… | |
⎿ Updated /Users/simon/Dropbox/Development/tools/build_colophon.py with 11 additions and 2 | |
removals | |
33 print("No pages found in gathered_links.json") | |
34 return | |
35 | |
36 # Sort pages by name | |
37 sorted_pages = sorted(pages.items()) | |
36 # Sort pages by most recent commit date (newest first) | |
37 def get_most_recent_date(page_data): | |
38 commits = page_data.get('commits', []) | |
39 if not commits: | |
40 return "0000-00-00T00:00:00" | |
41 | |
42 # Find the most recent commit date | |
43 dates = [commit.get('date', "0000-00-00T00:00:00") for commit in commits] | |
44 return max(dates) if dates else "0000-00-00T00:00:00" | |
45 | |
46 sorted_pages = sorted(pages.items(), key=lambda x: get_most_recent_date(x[1]), reverse | |
=True) | |
47 | |
48 # Start building the HTML | |
49 html_content = '''<!DOCTYPE html> | |
50 <html lang="en"> | |
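A quick check of the new ordering on made-up data, using the same key function as the script:

    pages = {
        "a.html": {"commits": [{"date": "2024-01-01T00:00:00+00:00"}]},
        "b.html": {"commits": [{"date": "2024-12-15T10:00:00+00:00"},
                               {"date": "2025-03-07T18:17:52-06:00"}]},
    }

    def get_most_recent_date(page_data):
        dates = [c.get("date", "0000-00-00T00:00:00") for c in page_data.get("commits", [])]
        return max(dates) if dates else "0000-00-00T00:00:00"

    sorted_pages = sorted(pages.items(), key=lambda x: get_most_recent_date(x[1]), reverse=True)
    print([name for name, _ in sorted_pages])  # ['b.html', 'a.html'] - most recently touched first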
⏺ Now let's run the updated script to generate the colophon with newest pages at the top: | |
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./build_colophon.py)… | |
⎿ Colophon page built successfully as colophon.html | |
⏺ Perfect! Now the colophon page: | |
1. Lists tools in order of most recent modification (newest first) | |
2. For each tool, shows the commits in chronological order (oldest first) | |
3. Shows timestamps and dates for each commit | |
4. Links to both the tool itself and the relevant Claude conversations | |
The complete solution with both scripts now provides a comprehensive overview of how each tool was | |
created, making it easy to browse and learn from the history of these projects. | |
> /cost | |
⎿ Total cost: $0.61 | |
Total duration (API): 5m 31.2s | |
Total duration (wall): 17m 18.7s | |
╭───────────────────────────────────────────────────────────────────────────────────────────────────────╮ | |
│ > │ | |
╰───────────────────────────────────────────────────────────────────────────────────────────────────────╯ | |
Press Ctrl-C again to exit \⏎ for newline | |
Total cost: $0.61 | |
Total duration (API): 5m 31.2s | |
Total duration (wall): 17m 27.1s | |
(base) simon@Simons-MacBook-Pro-7 tools % | |