
@simonw
Created March 10, 2025 05:19
╭──────────────────────────────────────────────────╮
│ ✻ Welcome to Claude Code research preview!       │
│                                                  │
│   /help for help                                 │
│                                                  │
│   cwd: /Users/simon/Dropbox/Development/tools    │
╰──────────────────────────────────────────────────╯
> Almost all of the HTML files in this directory were created using Claude prompts, and the details of
those prompts are linked in the commit messages. Build a Python script that checks the commit history
for each HTML file in turn and extracts any URLs from those commit messages into a list. It should
then output a JSON file with this structure: {"pages": {"name-of-file.html": ["url"],
{"name-of-file-2.html": ["url1", "url2"], ... - as you can see, some files may have more than one URL
in their commit history. The script should be called gather_links.py and it should save a JSON file
called gathered_links.json
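(Editor's note: the prompt's inline JSON example contains a stray opening brace. The intended output shape, written as a Python literal with placeholder filenames and URLs, is:)

```python
# Target structure for gathered_links.json; keys and values here are
# placeholders from the prompt, not real filenames or URLs.
expected = {
    "pages": {
        "name-of-file.html": ["url"],
        "name-of-file-2.html": ["url1", "url2"],
    }
}
```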
⏺ I'll build a Python script that extracts URLs from commit messages for HTML files. Let me first
check if we have the git history available and examine some commit messages to understand the
structure.
⏺ Bash(git log -n 5 --oneline)…
  ⎿  a282f7f Link to json schema builder
49c382e Paste rich text in README
b4e3e15 Encrypt / decrypt message
0a186e5 Create json-schema-builder.html
63852fe Maybe fix for Windows?
⏺ Let me check a specific commit message to understand how the URLs are formatted:
⏺ Bash(git show a282f7f)…
  ⎿  commit a282f7fc49d3b106d8cbe625913596c27383e8f5
Author: Simon Willison <swillison@gmail.com>
Date: Fri Mar 7 20:36:22 2025 -0600
Link to json schema builder
diff --git a/README.md b/README.md
index cd2d5a0..8c70ae9 100644
--- a/README.md
+++ b/README.md
@@ -42,6 +42,7 @@ This collection is partly **an experiment** in how much it's possible to get
don
- [Claude Token Counter](https://tools.simonwillison.net/claude-token-counter) - counts the
number of tokens in a Claude prompt
- [OpenAI audio input](https://tools.simonwillison.net/openai-audio) - record audio through the
microphone and send it to OpenAI's audio model
- [OpenAI audio output](https://tools.simonwillison.net/openai-audio-output) - run prompts
against OpenAI that produce audio output and listen to it or download it from the browser
+- [JSON schema builder](https://tools.simonwillison.net/json-schema-builder) - interactive tool
for building a JSON schema
## Miscellaneous
⏺ Now let me check a commit that created an HTML file to see if it contains a URL to the Claude
prompt:
⏺ Bash(git show 0a186e5)…
  ⎿  commit 0a186e5a3a939fc9e9679ecc31068494c1368b99
Author: Simon Willison <swillison@gmail.com>
Date: Fri Mar 7 18:17:52 2025 -0600
Create json-schema-builder.html
https://claude.ai/share/c84c867a-dbaf-41c2-a6f1-ae7dad002485
diff --git a/json-schema-builder.html b/json-schema-builder.html
new file mode 100644
index 0000000..cfcc9cf
--- /dev/null
+++ b/json-schema-builder.html
@@ -0,0 +1,609 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>JSON Schema Builder</title>
+ <style>
+ * {
+ box-sizing: border-box;
+ }
+
+ body {
+ font-family: Helvetica, Arial, sans-serif;
+ margin: 0;
+ padding: 20px;
+ background-color: #f5f5f5;
+ }
+
+ h1, h2, h3 {
+ font-weight: 500;
+ }
+
+ .container {
+ max-width: 100%;
+ margin: 0 auto;
+ background-color: white;
+ border-radius: 8px;
+ box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
+ padding: 20px;
+ }
+
+ .schema-builder {
+ display: flex;
+ flex-direction: column;
+ gap: 20px;
+ }
+
+ .editor-panel {
+ width: 100%;
+ padding: 15px;
+ border: 1px solid #ddd;
+ border-radius: 4px;
+ }
+
+ .output-panel {
+ width: 100%;
+ position: relative;
+ }
+
+ .schema-output {
+ width: 100%;
+ height: 300px;
+ background-color: #f8f8f8;
+ border: 1px solid #ddd;
+ border-radius: 4px;
+ padding: 15px;
+ font-family: monospace;
+ overflow-y: auto;
+ white-space: pre-wrap;
+ }
+
+ .property-list {
+ margin-top: 20px;
+ }
+
+ .property-item {
+ border: 1px solid #ddd;
+ border-radius: 4px;
+ padding: 15px;
+ margin-bottom: 15px;
+ background-color: #f9f9f9;
+ }
+
+ .nested-properties {
+ margin-left: 20px;
+ margin-top: 15px;
+ padding-left: 10px;
+ border-left: 2px solid #2196F3;
+ }
+
+ .actions {
+ margin-top: 15px;
+ display: flex;
+ gap: 10px;
+ }
+
+ button {
+ padding: 8px 12px;
+ background-color: #2196F3;
+ color: white;
+ border: none;
+ border-radius: 4px;
+ cursor: pointer;
+ font-size: 14px;
+ }
+
+ button:hover {
+ background-color: #0b7dda;
+ }
+
+ button.secondary {
+ background-color: #f1f1f1;
+ color: #333;
+ border: 1px solid #ddd;
+ }
+
+ button.secondary:hover {
+ background-color: #e1e1e1;
+ }
+
+ button.remove {
+ background-color: #f44336;
+ }
+
+ button.remove:hover {
+ background-color: #d32f2f;
+ }
+
+ button.copy {
+ position: absolute;
+ top: 10px;
+ right: 10px;
+ z-index: 10;
+ }
+
+ .form-group {
+ margin-bottom: 15px;
+ }
+
+ label {
+ display: block;
+ margin-bottom: 5px;
+ font-weight: bold;
+ }
+
+ input, select, textarea {
+ width: 100%;
+ padding: 8px;
+ border: 1px solid #ddd;
+ border-radius: 4px;
+ font-size: 16px;
+ font-family: Helvetica, Arial, sans-serif;
+ }
+
+ .checkbox-group {
+ display: flex;
+ align-items: center;
+ gap: 5px;
+ }
+
+ .checkbox-group input {
+ width: auto;
+ }
+
+ .alert {
+ padding: 10px 15px;
+ background-color: #f8d7da;
+ color: #721c24;
+ border: 1px solid #f5c6cb;
+ border-radius: 4px;
+ margin-bottom: 15px;
+ display: none;
+ }
+
+ .success {
+ background-color: #d4edda;
+ color: #155724;
+ border-color: #c3e6cb;
+ }
+
+ @media (min-width: 768px) {
+ .schema-output {
+ height: 400px;
+ }
+ }
+ </style>
+</head>
+<body>
+ <div class="container">
+ <h1>JSON Schema Builder</h1>
+ <p>Build your JSON schema by adding properties and nested objects.</p>
+
+ <div id="alert" class="alert"></div>
+
+ <div class="schema-builder">
+ <div class="editor-panel">
+ <h2>Schema properties</h2>
+
+ <div class="form-group">
+ <label for="schema-title">Schema title</label>
+ <input type="text" id="schema-title" placeholder="Enter schema title">
+ </div>
+
+ <div class="form-group">
+ <label for="schema-description">Schema description</label>
+ <textarea id="schema-description" rows="2" placeholder="Enter schema
description"></textarea>
+ </div>
+
+ <div class="property-list" id="property-list">
+ <!-- Property items will be added here -->
+ </div>
+
+ <button id="add-property">Add property</button>
+ </div>
+
+ <div class="output-panel">
+ <h2>Schema output</h2>
+ <button id="copy-button" class="copy">Copy</button>
+ <div id="schema-output" class="schema-output"></div>
+ </div>
+ </div>
+ </div>
+
+ <script type="module">
+// Main schema object
+const schema = {
+ type: "object",
+ properties: {}
+};
+
+// DOM elements
+const schemaTitle = document.getElementById("schema-title");
+const schemaDescription = document.getElementById("schema-description");
+const propertyList = document.getElementById("property-list");
+const addPropertyButton = document.getElementById("add-property");
+const schemaOutput = document.getElementById("schema-output");
+const copyButton = document.getElementById("copy-button");
+const alertElement = document.getElementById("alert");
+
+// Event listeners
+schemaTitle.addEventListener("input", updateSchema);
+schemaDescription.addEventListener("input", updateSchema);
+addPropertyButton.addEventListener("click", addProperty);
+copyButton.addEventListener("click", copySchema);
+
+// Initialize
+loadFromHash();
+window.addEventListener("hashchange", loadFromHash);
+
+// Add a new property
+function addProperty(event, parentElement = propertyList, parentPath = null) {
+ const propertyItem = document.createElement("div");
+ propertyItem.className = "property-item";
+
+ const propertyId = "prop-" + Date.now() + "-" + Math.floor(Math.random() * 1000);
+ propertyItem.dataset.id = propertyId;
+
+ propertyItem.innerHTML = `
+ <div class="form-group">
+ <label for="${propertyId}-name">Property name</label>
+ <input type="text" id="${propertyId}-name" class="property-name" placeholder="Enter
property name">
+ </div>
+
+ <div class="form-group">
+ <label for="${propertyId}-type">Property type</label>
+ <select id="${propertyId}-type" class="property-type">
+ <option value="string">String</option>
+ <option value="number">Number</option>
+ <option value="integer">Integer</option>
+ <option value="boolean">Boolean</option>
+ <option value="object">Object</option>
+ <option value="array">Array</option>
+ </select>
+ </div>
+
+ <div class="form-group">
+ <label for="${propertyId}-description">Description</label>
+ <input type="text" id="${propertyId}-description" class="property-description"
placeholder="Property description">
+ </div>
+
+ <div class="form-group array-item-type-container" style="display: none;">
+ <label for="${propertyId}-array-type">Array items type</label>
+ <select id="${propertyId}-array-type" class="array-item-type">
+ <option value="string">String</option>
+ <option value="number">Number</option>
+ <option value="integer">Integer</option>
+ <option value="boolean">Boolean</option>
+ <option value="object">Object</option>
+ </select>
+ </div>
+
+ <div class="form-group">
+ <div class="checkbox-group">
+ <input type="checkbox" id="${propertyId}-required" class="property-required">
+ <label for="${propertyId}-required">Required</label>
+ </div>
+ </div>
+
+ <div class="nested-properties" id="${propertyId}-nested" style="display: none;"></div>
+
+ <div class="actions">
+ <button class="add-nested" style="display: none;">Add nested property</button>
+ <button class="remove">Remove</button>
+ </div>
+ `;
+
+ parentElement.appendChild(propertyItem);
+
+ // Event listeners for the new property
+ const typeSelect = propertyItem.querySelector(".property-type");
+ const arrayTypeContainer = propertyItem.querySelector(".array-item-type-container");
+ const arrayTypeSelect = propertyItem.querySelector(".array-item-type");
+ const nestedContainer = propertyItem.querySelector(".nested-properties");
+ const addNestedButton = propertyItem.querySelector(".add-nested");
+ const removeButton = propertyItem.querySelector(".remove");
+
+ const nameInput = propertyItem.querySelector(".property-name");
+ const descriptionInput = propertyItem.querySelector(".property-description");
+ const requiredCheckbox = propertyItem.querySelector(".property-required");
+
+ // Handle property type change
+ typeSelect.addEventListener("change", function() {
+ const isObject = this.value === "object";
+ const isArray = this.value === "array";
+
+ nestedContainer.style.display = isObject ? "block" : "none";
+ addNestedButton.style.display = isObject ? "inline-block" : "none";
+ arrayTypeContainer.style.display = isArray ? "block" : "none";
+
+ updateSchema();
+ });
+
+ // Handle array item type change
+ arrayTypeSelect.addEventListener("change", function() {
+ const isObject = this.value === "object";
+
+ if (isObject && typeSelect.value === "array") {
+ nestedContainer.style.display = "block";
+ addNestedButton.style.display = "inline-block";
+ } else {
+ nestedContainer.style.display = "none";
+ addNestedButton.style.display = "none";
+ }
+
+ updateSchema();
+ });
+
+ // Add nested property
+ addNestedButton.addEventListener("click", function(event) {
+ event.preventDefault();
+ addProperty(event, nestedContainer, propertyId);
+ });
+
+ // Remove property
+ removeButton.addEventListener("click", function() {
+ propertyItem.remove();
+ updateSchema();
+ });
+
+ // Update schema on input change
+ nameInput.addEventListener("input", updateSchema);
+ descriptionInput.addEventListener("input", updateSchema);
+ requiredCheckbox.addEventListener("change", updateSchema);
+
+ updateSchema();
+ return propertyItem;
+}
+
+// Update schema JSON based on form inputs
+function updateSchema() {
+ // Update title and description
+ if (schemaTitle.value) {
+ schema.title = schemaTitle.value;
+ } else {
+ delete schema.title;
+ }
+
+ if (schemaDescription.value) {
+ schema.description = schemaDescription.value;
+ } else {
+ delete schema.description;
+ }
+
+ // Reset properties and required
+ schema.properties = {};
+ schema.required = [];
+
+ // Process all property items
+ processPropertyItems(propertyList.children, schema.properties, schema.required);
+
+ // Remove required if empty
+ if (schema.required.length === 0) {
+ delete schema.required;
+ }
+
+ // Update the output
+ schemaOutput.textContent = JSON.stringify(schema, null, 2);
+
+ // Update URL hash
+ updateUrlHash();
+}
+
+// Process property items recursively
+function processPropertyItems(items, propertiesObj, requiredArr) {
+ Array.from(items).forEach(item => {
+ if (!item.classList.contains("property-item")) return;
+
+ const nameInput = item.querySelector(".property-name");
+ const typeSelect = item.querySelector(".property-type");
+ const descriptionInput = item.querySelector(".property-description");
+ const requiredCheckbox = item.querySelector(".property-required");
+ const arrayTypeSelect = item.querySelector(".array-item-type");
+ const nestedContainer = item.querySelector(".nested-properties");
+
+ const name = nameInput.value.trim();
+ if (!name) return;
+
+ const type = typeSelect.value;
+
+ // Create property definition
+ const property = {
+ type: type
+ };
+
+ // Add description if provided
+ if (descriptionInput.value.trim()) {
+ property.description = descriptionInput.value.trim();
+ }
+
+ // Handle arrays
+ if (type === "array") {
+ const itemsType = arrayTypeSelect.value;
+
+ if (itemsType === "object") {
+ property.items = {
+ type: "object",
+ properties: {}
+ };
+
+ const nestedRequired = [];
+ processPropertyItems(nestedContainer.children, property.items.properties,
nestedRequired);
+
+ if (nestedRequired.length > 0) {
+ property.items.required = nestedRequired;
+ }
+ } else {
+ property.items = { type: itemsType };
+ }
+ }
+
+ // Handle objects
+ if (type === "object") {
+ property.properties = {};
+ const nestedRequired = [];
+
+ processPropertyItems(nestedContainer.children, property.properties, nestedRequired);
+
+ if (nestedRequired.length > 0) {
+ property.required = nestedRequired;
+ }
+ }
+
+ // Add to properties
+ propertiesObj[name] = property;
+
+ // Add to required array if checked
+ if (requiredCheckbox.checked) {
+ requiredArr.push(name);
+ }
+ });
+}
+
+// Copy schema to clipboard
+function copySchema() {
+ const textarea = document.createElement("textarea");
+ textarea.value = schemaOutput.textContent;
+ document.body.appendChild(textarea);
+ textarea.select();
+
+ try {
+ document.execCommand("copy");
+ showAlert("Schema copied to clipboard!", true);
+ } catch (err) {
+ showAlert("Failed to copy schema", false);
+ }
+
+ document.body.removeChild(textarea);
+}
+
+// Show alert message
+function showAlert(message, isSuccess = false) {
+ alertElement.textContent = message;
+ alertElement.style.display = "block";
+
+ if (isSuccess) {
+ alertElement.classList.add("success");
+ } else {
+ alertElement.classList.remove("success");
+ }
+
+ setTimeout(() => {
+ alertElement.style.display = "none";
+ }, 3000);
+}
+
+// Update URL hash with schema
+function updateUrlHash() {
+ try {
+ const hashValue = btoa(JSON.stringify(schema));
+ window.history.replaceState(null, null, "#" + hashValue);
+ } catch (e) {
+ console.error("Error updating URL hash:", e);
+ }
+}
+
+// Load schema from URL hash
+function loadFromHash() {
+ if (!window.location.hash) return;
+
+ try {
+ const hash = window.location.hash.substring(1);
+ const decodedSchema = JSON.parse(atob(hash));
+
+ // Reset form
+ propertyList.innerHTML = "";
+
+ // Set title and description
+ if (decodedSchema.title) {
+ schemaTitle.value = decodedSchema.title;
+ } else {
+ schemaTitle.value = "";
+ }
+
+ if (decodedSchema.description) {
+ schemaDescription.value = decodedSchema.description;
+ } else {
+ schemaDescription.value = "";
+ }
+
+ // Rebuild properties
+ if (decodedSchema.properties) {
+ buildPropertiesFromSchema(decodedSchema.properties, propertyList, decodedSchema.required
|| []);
+ }
+
+ // Update output
+ schema.title = decodedSchema.title;
+ schema.description = decodedSchema.description;
+ schema.properties = decodedSchema.properties || {};
+ schema.required = decodedSchema.required || [];
+
+ schemaOutput.textContent = JSON.stringify(schema, null, 2);
+ } catch (e) {
+ console.error("Error loading from hash:", e);
+ showAlert("Failed to load schema from URL", false);
+ }
+}
+
+// Build properties from loaded schema
+function buildPropertiesFromSchema(properties, container, requiredArr) {
+ for (const propName in properties) {
+ const property = properties[propName];
+ const propItem = addProperty(null, container);
+
+ // Set basic property fields
+ propItem.querySelector(".property-name").value = propName;
+ propItem.querySelector(".property-type").value = property.type;
+
+ if (property.description) {
+ propItem.querySelector(".property-description").value = property.description;
+ }
+
+ if (requiredArr.includes(propName)) {
+ propItem.querySelector(".property-required").checked = true;
+ }
+
+ // Handle arrays
+ if (property.type === "array" && property.items) {
+ const arrayTypeContainer = propItem.querySelector(".array-item-type-container");
+ const arrayTypeSelect = propItem.querySelector(".array-item-type");
+ const nestedContainer = propItem.querySelector(".nested-properties");
+ const addNestedButton = propItem.querySelector(".add-nested");
+
+ arrayTypeContainer.style.display = "block";
+
+ if (property.items.type) {
+ arrayTypeSelect.value = property.items.type;
+
+ if (property.items.type === "object" && property.items.properties) {
+ nestedContainer.style.display = "block";
+ addNestedButton.style.display = "inline-block";
+
+ buildPropertiesFromSchema(
+ property.items.properties,
+ nestedContainer,
+ property.items.required || []
+ );
+ }
+ }
+ }
+
+ // Handle objects
+ if (property.type === "object" && property.properties) {
+ const nestedContainer = propItem.querySelector(".nested-properties");
+ const addNestedButton = propItem.querySelector(".add-nested");
+
+ nestedContainer.style.display = "block";
+ addNestedButton.style.display = "inline-block";
+
+ buildPropertiesFromSchema(
+ property.properties,
+ nestedContainer,
+ property.required || []
+ );
+ }
+ }
+}
+ </script>
+</body>
+</html>
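(Editor's note: the generated page persists state in the URL fragment — `updateUrlHash()` calls `btoa(JSON.stringify(schema))` and `loadFromHash()` reverses it with `atob()` plus `JSON.parse()`. A quick sketch of the same round-trip in Python, using a made-up example schema; one caveat the page doesn't handle is that `btoa` throws on characters outside Latin-1, which encoding to UTF-8 bytes sidesteps:)

```python
import base64
import json

# Example schema (made up for illustration).
schema = {"type": "object", "properties": {"name": {"type": "string"}}}

# Equivalent of btoa(JSON.stringify(schema)) in the page's JavaScript.
encoded = base64.b64encode(json.dumps(schema).encode("utf-8")).decode("ascii")

# Equivalent of JSON.parse(atob(hash)).
decoded = json.loads(base64.b64decode(encoded))
assert decoded == schema
```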
⏺ Now I'll get a list of HTML files in the directory to process:
⏺ Search(pattern: "*.html", path: "undefined")…
  ⎿  Found 78 files
⏺ Write(file_path: /Users/simon/Dropbox/Development/tools/gather_links.py)…
⎿ Wrote 77 lines to /Users/simon/Dropbox/Development/tools/gather_links.py
#!/usr/bin/env python3
import json
import os
import re
import subprocess
from pathlib import Path


def get_file_commit_history(file_path):
    """
    Get the commit history for a specific file.
    Returns a list of commit messages.
    """
    try:
        result = subprocess.run(
            ["git", "log", "--format=%B", "--", file_path],
            capture_output=True,
            text=True,
            check=True
        )
        return result.stdout.strip().split("\n\n")
    except subprocess.CalledProcessError:
        print(f"Error getting commit history for {file_path}")
        return []


def extract_urls(text):
    """
    Extract URLs from text using regex pattern.
    Returns a list of URLs.
    """
    # Pattern for URLs, specifically looking for claude.ai and other common URLs
    url_pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'
    return re.findall(url_pattern, text)


def main():
    # Get current directory
    current_dir = Path.cwd()

    # Find all HTML files
    html_files = list(current_dir.glob("*.html"))

    # Dictionary to store results
    results = {"pages": {}}

    # Process each HTML file
    for html_file in html_files:
        file_name = html_file.name
        print(f"Processing {file_name}...")

        # Get commit history for this file
        commit_messages = get_file_commit_history(html_file)

        # Extract URLs from commit messages
        all_urls = []
        for message in commit_messages:
            urls = extract_urls(message)
            all_urls.extend(urls)

        # Remove duplicates but preserve order
        unique_urls = []
        for url in all_urls:
            if url not in unique_urls:
                unique_urls.append(url)

        # Add to results if any URLs were found
        if unique_urls:
            results["pages"][file_name] = unique_urls

    # Save results to JSON file
    with open("gathered_links.json", "w") as f:
        json.dump(results, f, indent=2)

    print(f"Processed {len(html_files)} files")
    print(f"Found URLs for {len(results['pages'])} files")
    print("Results saved to gathered_links.json")


if __name__ == "__main__":
    main()
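(Editor's note: the character class `[-\w.]` in this first pattern does not include `/`, so a match stops at the end of the hostname — which is exactly the truncation the user reports further down. Demonstrated with a made-up commit message:)

```python
import re

# The script's first URL pattern: stops at any character outside [-\w.],
# including "/", so only scheme + host survive.
pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'
message = "Create example.html\nhttps://claude.ai/share/abc123"
print(re.findall(pattern, message))  # ['https://claude.ai'] — the path is lost
```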
⏺ Let's make the script executable:
⏺ Bash(chmod +x /Users/simon/Dropbox/Development/tools/gather_links.py)…
  ⎿  (No content)
⏺ Now let's run the script to generate the JSON file:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./gather_links.py)…
  ⎿  Processing aria-live-regions.html...
Processing iframe-api-explorer.html...
Processing render-markdown.html...
Processing nav-for-headings.html...
Processing bluesky-firehose.html...
Processing image-resize-quality.html...
Processing audio-spectrum.html...
Processing prompts-js.html...
Processing apsw-query.html...
Processing svg-render.html...
Processing animated-rainbow-border.html...
Processing yaml-explorer.html...
Processing user-agent.html...
Processing writing-style.html...
Processing youtube-thumbnails.html...
Processing escape-entities.html...
Processing bbox-cropper.html...
Processing progress.html...
Processing bluesky-timeline.html...
Processing word-counter.html...
Processing clipboard-viewer.html...
Processing markdown-math.html...
Processing json-schema-builder.html...
Processing jina-reader.html...
Processing link-temp.html...
Processing openfreemap-demo.html...
Processing bluesky-resolve.html...
Processing pomodoro.html...
Processing claude-token-counter.html...
Processing image-to-jpeg.html...
Processing csv-marker-map.html...
Processing gemini-chat.html...
Processing image-to-svg.html...
Processing tiff-orientation.html...
Processing iframe-sandbox.html...
Processing timezones.html...
Processing click-grid-to-expand.html...
Processing php-deserializer.html...
Processing github-api-write.html...
Processing exif.html...
Processing qr.html...
Processing unix-timestamp.html...
Processing arena-animated.html...
Processing openai-audio.html...
Processing svg-progressive-render.html...
Processing github-issue.html...
Processing social-media-cropper.html...
Processing llm-prices.html...
Processing encrypt.html...
Processing extract-urls.html...
Processing paste-rich-text.html...
Processing github-issue-to-markdown.html...
Processing openai-audio-output.html...
Processing pdf-ocr.html...
Processing render-claude-citations.html...
Processing chrome-prompt-playground.html...
Processing haiku.html...
Processing sql-pretty-printer.html...
Processing event-planner.html...
Processing sqlite-wasm.html...
Processing openai-stream.html...
Processing california-clock-change.html...
Processing gpt-4o-audio-player.html...
Processing jina-embeddings-image-token-calculator.html...
Processing text-wrap-balance-nav.html...
Processing gemini-bbox.html...
Processing schema-dsl.html...
Processing pipfile.html...
Processing compare-pdfs.html...
Processing json-to-yaml.html...
Processing openai-webrtc.html...
Processing gemini-bbox-tool.html...
Processing mdn-timelines.html...
Processing box-shadow.html...
Processing ares.html...
Processing zip-wheel-explorer.html...
Processing ocr.html...
Processing svg-sandbox.html...
Processed 78 files
Found URLs for 69 files
Results saved to gathered_links.json
⏺ Let's check the content of the generated JSON file:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && head -n 20 gathered_links.json)…
  ⎿  {
"pages": {
"aria-live-regions.html": [
"https://gist.github.com"
],
"iframe-api-explorer.html": [
"https://gist.github.com"
],
"render-markdown.html": [
"https://github.com"
],
"nav-for-headings.html": [
"https://gist.github.com"
],
"bluesky-firehose.html": [
"https://gist.github.com"
],
"image-resize-quality.html": [
"https://gist.github.com",
"https://tools.simonwillison.net",
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && grep -A 5 "json-schema-builder.html" …
gathered_links.json)
  ⎿  No (tell Claude what to do differently)
> It looks like it just got the start of the URLs, it should be getting the whole URLs which might be
to different websites - so just get anything that starts https:// and ends with whitespace or the end
of the commit message
⏺ I'll update the script to correctly capture the full URLs:
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/gather_links.py)…
⎿ Updated /Users/simon/Dropbox/Development/tools/gather_links.py with 2 additions and 2 removals
27 Extract URLs from text using regex pattern.
28 Returns a list of URLs.
29 """
30 # Pattern for URLs, specifically looking for claude.ai and other common URLs
31 url_pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'
30 # Pattern for URLs that captures the full URL until whitespace or end of string
31 url_pattern = r'(https?://[^\s]+)'
32 return re.findall(url_pattern, text)
33
34 def main():
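(Editor's note: the revised pattern runs to the next whitespace character, so full URLs including their paths survive. Same made-up commit message as before:)

```python
import re

# The replacement pattern: greedy up to whitespace or end of string,
# so the whole URL is captured.
pattern = r'(https?://[^\s]+)'
message = "Create example.html\nhttps://claude.ai/share/abc123"
print(re.findall(pattern, message))  # ['https://claude.ai/share/abc123']
```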
⏺ Now let's run the script again:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./gather_links.py)…
  ⎿  Processing aria-live-regions.html...
Processing iframe-api-explorer.html...
Processing render-markdown.html...
Processing nav-for-headings.html...
Processing bluesky-firehose.html...
Processing image-resize-quality.html...
Processing audio-spectrum.html...
Processing prompts-js.html...
Processing apsw-query.html...
Processing svg-render.html...
Processing animated-rainbow-border.html...
Processing yaml-explorer.html...
Processing user-agent.html...
Processing writing-style.html...
Processing youtube-thumbnails.html...
Processing escape-entities.html...
Processing bbox-cropper.html...
Processing progress.html...
Processing bluesky-timeline.html...
Processing word-counter.html...
Processing clipboard-viewer.html...
Processing markdown-math.html...
Processing json-schema-builder.html...
Processing jina-reader.html...
Processing link-temp.html...
Processing openfreemap-demo.html...
Processing bluesky-resolve.html...
Processing pomodoro.html...
Processing claude-token-counter.html...
Processing image-to-jpeg.html...
Processing csv-marker-map.html...
Processing gemini-chat.html...
Processing image-to-svg.html...
Processing tiff-orientation.html...
Processing iframe-sandbox.html...
Processing timezones.html...
Processing click-grid-to-expand.html...
Processing php-deserializer.html...
Processing github-api-write.html...
Processing exif.html...
Processing qr.html...
Processing unix-timestamp.html...
Processing arena-animated.html...
Processing openai-audio.html...
Processing svg-progressive-render.html...
Processing github-issue.html...
Processing social-media-cropper.html...
Processing llm-prices.html...
Processing encrypt.html...
Processing extract-urls.html...
Processing paste-rich-text.html...
Processing github-issue-to-markdown.html...
Processing openai-audio-output.html...
Processing pdf-ocr.html...
Processing render-claude-citations.html...
Processing chrome-prompt-playground.html...
Processing haiku.html...
Processing sql-pretty-printer.html...
Processing event-planner.html...
Processing sqlite-wasm.html...
Processing openai-stream.html...
Processing california-clock-change.html...
Processing gpt-4o-audio-player.html...
Processing jina-embeddings-image-token-calculator.html...
Processing text-wrap-balance-nav.html...
Processing gemini-bbox.html...
Processing schema-dsl.html...
Processing pipfile.html...
Processing compare-pdfs.html...
Processing json-to-yaml.html...
Processing openai-webrtc.html...
Processing gemini-bbox-tool.html...
Processing mdn-timelines.html...
Processing box-shadow.html...
Processing ares.html...
Processing zip-wheel-explorer.html...
Processing ocr.html...
Processing svg-sandbox.html...
Processed 78 files
Found URLs for 69 files
Results saved to gathered_links.json
⏺ Let's check the content of the JSON file now:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && cat gathered_links.json | jq …
'.pages["json-schema-builder.html"]')
  ⎿  [
"https://claude.ai/share/c84c867a-dbaf-41c2-a6f1-ae7dad002485"
]
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && cat gathered_links.json | jq …
'.pages["encrypt.html"]')
  ⎿  No (tell Claude what to do differently)
> Update the script - I want to capture the full commit messages AND the URLs - the new format should
be {"pages": {"aria-live-regions.html": {"commits": [{"hash": hash, "message": message, "date": iso
formatted date], "urls": [list of URLs like before]
⏺ I'll update the script to capture commit messages, hashes, dates, and URLs in the requested format:
⏺ Write(file_path: /Users/simon/Dropbox/Development/tools/gather_links.py)…
⎿ Updated /Users/simon/Dropbox/Development/tools/gather_links.py with 59 additions and 15 removals
4 import re
5 import subprocess
6 from pathlib import Path
7 from datetime import datetime
8
9 def get_file_commit_history(file_path):
9 def get_file_commit_details(file_path):
10 """
11 Get the commit history for a specific file.
12 Returns a list of commit messages.
11 Get the commit details for a specific file.
12 Returns a list of dictionaries with hash, message, and date.
13 """
14 try:
15 # Get commit hashes, dates, and messages
16 result = subprocess.run(
17 ["git", "log", "--format=%B", "--", file_path],
17 ["git", "log", "--format=%H|%aI|%B", "--", file_path],
18 capture_output=True,
19 text=True,
20 check=True
21 )
22 return result.stdout.strip().split("\n\n")
23 except subprocess.CalledProcessError:
24 print(f"Error getting commit history for {file_path}")
22
23 commits = []
24 # Split by commit separator
25 raw_commits = result.stdout.strip().split("\n\n\n")
26
27 for raw_commit in raw_commits:
28 if not raw_commit.strip():
29 continue
30
31 # Each commit starts with hash|date
32 lines = raw_commit.strip().split("\n")
33 if not lines:
34 continue
35
36 first_line = lines[0]
37 parts = first_line.split("|", 2)
38
39 if len(parts) < 3:
40 continue
41
42 commit_hash = parts[0]
43 commit_date = parts[1]
44
45 # The message is the first part and the rest of the lines
46 message_first_line = parts[2]
47 message_rest = "\n".join(lines[1:])
48 full_message = message_first_line
49 if message_rest:
50 full_message += "\n" + message_rest
51
52 commits.append({
53 "hash": commit_hash,
54 "date": commit_date,
55 "message": full_message
56 })
57
58 return commits
59 except subprocess.CalledProcessError as e:
60 print(f"Error getting commit history for {file_path}: {e}")
61 return []
62
63 def extract_urls(text):
...
46 file_name = html_file.name
47 print(f"Processing {file_name}...")
48
49 # Get commit history for this file
50 commit_messages = get_file_commit_history(html_file)
49 # Get commit details for this file
50 commits = get_file_commit_details(html_file)
51
52 if not commits:
53 continue
54
55 # Extract URLs from commit messages
56 all_urls = []
57 for message in commit_messages:
58 urls = extract_urls(message)
57 for commit in commits:
58 urls = extract_urls(commit["message"])
59 all_urls.extend(urls)
60
61 # Remove duplicates but preserve order
...
61 if url not in unique_urls:
62 unique_urls.append(url)
63
64 # Add to results if any URLs were found
65 if unique_urls:
66 results["pages"][file_name] = unique_urls
64 # Add to results if any commits were found
65 if commits:
66 results["pages"][file_name] = {
67 "commits": commits,
68 "urls": unique_urls
69 }
70
71 # Save results to JSON file
72 with open("gathered_links.json", "w") as f:
73 json.dump(results, f, indent=2)
74
75 print(f"Processed {len(html_files)} files")
76 print(f"Found URLs for {len(results['pages'])} files")
76 print(f"Found details for {len(results['pages'])} files")
77 print("Results saved to gathered_links.json")
78
79 if __name__ == "__main__":
⏺ Let's run the updated script:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./gather_links.py)…
  ⎿  Processing aria-live-regions.html...
Processing iframe-api-explorer.html...
Processing render-markdown.html...
Processing nav-for-headings.html...
Processing bluesky-firehose.html...
Processing image-resize-quality.html...
Processing audio-spectrum.html...
Processing prompts-js.html...
Processing apsw-query.html...
Processing svg-render.html...
Processing animated-rainbow-border.html...
Processing yaml-explorer.html...
Processing user-agent.html...
Processing writing-style.html...
Processing youtube-thumbnails.html...
Processing escape-entities.html...
Processing bbox-cropper.html...
Processing progress.html...
Processing bluesky-timeline.html...
Processing word-counter.html...
Processing clipboard-viewer.html...
Processing markdown-math.html...
Processing json-schema-builder.html...
Processing jina-reader.html...
Processing link-temp.html...
Processing openfreemap-demo.html...
Processing bluesky-resolve.html...
Processing pomodoro.html...
Processing claude-token-counter.html...
Processing image-to-jpeg.html...
Processing csv-marker-map.html...
Processing gemini-chat.html...
Processing image-to-svg.html...
Processing tiff-orientation.html...
Processing iframe-sandbox.html...
Processing timezones.html...
Processing click-grid-to-expand.html...
Processing php-deserializer.html...
Processing github-api-write.html...
Processing exif.html...
Processing qr.html...
Processing unix-timestamp.html...
Processing arena-animated.html...
Processing openai-audio.html...
Processing svg-progressive-render.html...
Processing github-issue.html...
Processing social-media-cropper.html...
Processing llm-prices.html...
Processing encrypt.html...
Processing extract-urls.html...
Processing paste-rich-text.html...
Processing github-issue-to-markdown.html...
Processing openai-audio-output.html...
Processing pdf-ocr.html...
Processing render-claude-citations.html...
Processing chrome-prompt-playground.html...
Processing haiku.html...
Processing sql-pretty-printer.html...
Processing event-planner.html...
Processing sqlite-wasm.html...
Processing openai-stream.html...
Processing california-clock-change.html...
Processing gpt-4o-audio-player.html...
Processing jina-embeddings-image-token-calculator.html...
Processing text-wrap-balance-nav.html...
Processing gemini-bbox.html...
Processing schema-dsl.html...
Processing pipfile.html...
Processing compare-pdfs.html...
Processing json-to-yaml.html...
Processing openai-webrtc.html...
Processing gemini-bbox-tool.html...
Processing mdn-timelines.html...
Processing box-shadow.html...
Processing ares.html...
Processing zip-wheel-explorer.html...
Processing ocr.html...
Processing svg-sandbox.html...
Processed 78 files
Found details for 77 files
Results saved to gathered_links.json
⏺ Let's check the JSON output:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && cat gathered_links.json | jq …
'.pages["json-schema-builder.html"]')
  ⎿  No (tell Claude what to do differently)
> /cost
⎿ Total cost: $0.2595
Total duration (API): 2m 36.4s
Total duration (wall): 8m 57.6s
> This is working great. Write me a new script called build_colophon.py which looks through that
gathered JSON file and builds and saves an HTML page. The page should be mobile friendly and should
list every page - with a link to that page - and for each one display the commit messages neatly
(convert newlines to br and linkify URLs but no other formatting) - plus the commit message dates and
links to the commits themselves which are in https://github.com/simonw/tools
⏺ I'll create a script to build a mobile-friendly HTML colophon page from the gathered links JSON
file.
⏺ Write(file_path: /Users/simon/Dropbox/Development/tools/build_colophon.py)…
⎿ Wrote 174 lines to /Users/simon/Dropbox/Development/tools/build_colophon.py
#!/usr/bin/env python3
import json
import re
from datetime import datetime
import html
from pathlib import Path
def linkify_urls(text):
"""Convert URLs in text to clickable links."""
url_pattern = r'(https?://[^\s]+)'
return re.sub(url_pattern, r'<a href="\1" target="_blank">\1</a>', text)
def format_commit_message(message):
"""Format commit message with line breaks and linkified URLs."""
# Escape HTML entities
escaped = html.escape(message)
# Convert newlines to <br>
with_breaks = escaped.replace('\n', '<br>')
# Linkify URLs
return linkify_urls(with_breaks)
def build_colophon():
# Load the gathered_links.json file
try:
with open('gathered_links.json', 'r') as f:
data = json.load(f)
except FileNotFoundError:
print("Error: gathered_links.json not found. Run gather_links.py first.")
return
pages = data.get('pages', {})
if not pages:
print("No pages found in gathered_links.json")
return
# Sort pages by name
sorted_pages = sorted(pages.items())
# Start building the HTML
html_content = '''<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Tools Colophon</title>
<style>
body {
font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
Helvetica, Arial, sans-serif;
line-height: 1.5;
max-width: 800px;
margin: 0 auto;
padding: 1rem;
color: #1a1a1a;
}
h1 {
border-bottom: 2px solid #f0f0f0;
padding-bottom: 0.5rem;
margin-top: 2rem;
}
h2 {
margin-top: 2rem;
font-size: 1.4rem;
}
a {
color: #0066cc;
text-decoration: none;
}
a:hover {
text-decoration: underline;
}
.tool {
margin-bottom: 2rem;
border-bottom: 1px solid #f0f0f0;
padding-bottom: 1rem;
}
.tool-name {
font-weight: bold;
}
.commit {
background-color: #f8f9fa;
padding: 1rem;
margin: 0.75rem 0;
border-radius: 4px;
border-left: 3px solid #ddd;
}
.commit-hash {
font-family: monospace;
color: #666;
font-size: 0.85rem;
}
.commit-date {
color: #666;
font-size: 0.85rem;
margin-left: 0.5rem;
}
.commit-message {
margin-top: 0.5rem;
}
.urls {
margin-top: 1rem;
}
@media (max-width: 600px) {
body {
padding: 0.5rem;
}
.commit {
padding: 0.75rem;
}
}
</style>
</head>
<body>
<h1>Tools Colophon</h1>
<p>This page documents the creation of the tools on <a
href="https://tools.simonwillison.net/">tools.simonwillison.net</a>,
including links to the Claude conversations used to build them.</p>
'''
# Add each page with its commits
for page_name, page_data in sorted_pages:
tool_url = f"https://tools.simonwillison.net/{page_name.replace('.html', '')}"
commits = page_data.get('commits', [])
html_content += f'''
<div class="tool">
<h2 class="tool-name"><a href="{tool_url}">{page_name}</a></h2>
'''
# Add each commit
for commit in commits:
commit_hash = commit.get('hash', '')
short_hash = commit_hash[:7] if commit_hash else 'unknown'
commit_date = commit.get('date', '')
# Format the date
formatted_date = ''
if commit_date:
try:
dt = datetime.fromisoformat(commit_date)
formatted_date = dt.strftime('%B %d, %Y')
except ValueError:
formatted_date = commit_date
commit_message = commit.get('message', '')
formatted_message = format_commit_message(commit_message)
commit_url = f"https://github.com/simonw/tools/commit/{commit_hash}"
html_content += f'''
<div class="commit">
<div>
<a href="{commit_url}" class="commit-hash" target="_blank">{short_hash}</a>
<span class="commit-date">{formatted_date}</span>
</div>
<div class="commit-message">{formatted_message}</div>
</div>
'''
html_content += '''
</div>
'''
# Close the HTML
html_content += '''
</body>
</html>
'''
# Write the HTML to colophon.html
with open('colophon.html', 'w') as f:
f.write(html_content)
print("Colophon page built successfully as colophon.html")
if __name__ == "__main__":
build_colophon()
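The two formatting helpers at the top of that script can be sanity-checked standalone. This sketch copies `linkify_urls` and `format_commit_message` verbatim from the listing above and runs them on a made-up commit message (the message and URL here are hypothetical, not from the repo):

```python
import html
import re

def linkify_urls(text):
    """Convert URLs in text to clickable links."""
    url_pattern = r'(https?://[^\s]+)'
    return re.sub(url_pattern, r'<a href="\1" target="_blank">\1</a>', text)

def format_commit_message(message):
    """Escape HTML, convert newlines to <br>, then linkify URLs."""
    with_breaks = html.escape(message).replace('\n', '<br>')
    return linkify_urls(with_breaks)

msg = "Create example.html\n\nhttps://claude.ai/share/abc123"
print(format_commit_message(msg))
# Create example.html<br><br><a href="https://claude.ai/share/abc123" target="_blank">https://claude.ai/share/abc123</a>
```

Note the ordering matters: escaping runs before linkification, so the injected `<a>` markup survives. One caveat: because `[^\s]+` matches everything up to whitespace, a URL followed immediately by `<br>` (i.e. a newline in the original message) would swallow that markup into the link; it works here because in this repo's commit messages the URLs sit at the end of a line.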
⏺ Let's make the script executable:
⏺ Bash(chmod +x /Users/simon/Dropbox/Development/tools/build_colophon.py)…
  ⎿  (No content)
⏺ Now let's run the script to build the colophon page:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./build_colophon.py)…
  ⎿  Colophon page built successfully as colophon.html
⏺ Let's check the beginning of the generated HTML page:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && head -n 20 colophon.html)…
  ⎿  <!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Tools Colophon</title>
<style>
body {
font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
Helvetica, Arial, sans-serif;
line-height: 1.5;
max-width: 800px;
margin: 0 auto;
padding: 1rem;
color: #1a1a1a;
}
h1 {
border-bottom: 2px solid #f0f0f0;
padding-bottom: 0.5rem;
margin-top: 2rem;
}
⏺ Let's check one of the pages to see how the commits are displayed:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && grep -A 20 "json-schema-builder.html" colophon.html)…
  ⎿  <h2 class="tool-name"><a
href="https://tools.simonwillison.net/json-schema-builder">json-schema-builder.html</a></h2>
<div class="commit">
<div>
<a
href="https://github.com/simonw/tools/commit/0a186e5a3a939fc9e9679ecc31068494c1368b99"
class="commit-hash" target="_blank">0a186e5</a>
<span class="commit-date">March 07, 2025</span>
</div>
<div class="commit-message">Create json-schema-builder.html<br><br><a
href="https://claude.ai/share/c84c867a-dbaf-41c2-a6f1-ae7dad002485"
target="_blank">https://claude.ai/share/c84c867a-dbaf-41c2-a6f1-ae7dad002485</a></div>
</div>
</div>
<div class="tool">
<h2 class="tool-name"><a
href="https://tools.simonwillison.net/json-to-yaml">json-to-yaml.html</a></h2>
<div class="commit">
<div>
<a
href="https://github.com/simonw/tools/commit/5426b541ed61cf6a9d0b74c3acce637ca60a825a"
class="commit-hash" target="_blank">5426b54</a>
<span class="commit-date">December 15, 2024</span>
</div>
<div class="commit-message">json-to-yaml.html<br><br><a
href="https://gist.github.com/simonw/95c26586503afb22e60bcca9c721ee16"
target="_blank">https://gist.github.com/simonw/95c26586503afb22e60bcca9c721ee16</a></div>
</div>
</div>
<div class="tool">
<h2 class="tool-name"><a
href="https://tools.simonwillison.net/link-temp">link-temp.html</a></h2>
⏺ The scripts look good! Here's a summary of what we've built:
1. gather_links.py:
- Scans git history for all HTML files
- Extracts commit hashes, dates, messages, and URLs
- Creates a structured JSON file with all this information
2. build_colophon.py:
- Reads the JSON file produced by the first script
- Builds a mobile-friendly HTML page
- Lists all tools with links to their pages
- Shows nicely formatted commit messages with dates and linked URLs
- Links to the GitHub commits
The colophon page provides a comprehensive overview of how each tool was created, with links to the
Claude conversations that were used to build them.
> it's not working right. ocr.html had a bunch of commits but in colophon.html there is only one link
and heading for the first commit and the rest are shown within that same block - there should be
separate HTML chunks with links and formatted dates for each of the other commits. Also the neatly
formatted date should include the HH:MM as well as the date
⏺ I'll fix the issues in the build_colophon.py script to properly display multiple commits per tool
and include time in the formatted dates.
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/build_colophon.py)…
⎿ Updated /Users/simon/Dropbox/Development/tools/build_colophon.py with 2 additions and 2 removals
131 short_hash = commit_hash[:7] if commit_hash else 'unknown'
132 commit_date = commit.get('date', '')
133
134 # Format the date
134 # Format the date with time
135 formatted_date = ''
136 if commit_date:
137 try:
138 dt = datetime.fromisoformat(commit_date)
139 formatted_date = dt.strftime('%B %d, %Y')
139 formatted_date = dt.strftime('%B %d, %Y %H:%M')
140 except ValueError:
141 formatted_date = commit_date
142
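For reference, here is what that `strftime` pattern produces for one of the ISO 8601 author dates seen later in this session's git log output (a standalone check, not part of either script):

```python
from datetime import datetime

# ISO 8601 author date, as emitted by git's %aI placeholder
iso_date = "2024-05-20T08:55:05-04:00"
dt = datetime.fromisoformat(iso_date)  # handles the -04:00 offset (Python 3.7+)
formatted = dt.strftime("%B %d, %Y %H:%M")  # full month name, day, year, HH:MM
print(formatted)  # May 20, 2024 08:55
```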
⏺ Now let's check gather_links.py to make sure we're correctly parsing all commits for each file:
⏺ Read(file_path: "/Users/simon/Dropbox/Development/tools/gather_links.py")…
  ⎿  #!/usr/bin/env python3
import json
import os
import re
import subprocess
from pathlib import Path
from datetime import datetime
def get_file_commit_details(file_path):
"""
Get the commit details for a specific file.
Returns a list of dictionaries with hash, message, and date.
"""
try:
# Get commit hashes, dates, and messages
result = subprocess.run(
["git", "log", "--format=%H|%aI|%B", "--", file_path],
capture_output=True,
text=True,
check=True
)
commits = []
# Split by commit separator
raw_commits = result.stdout.strip().split("\n\n\n")
for raw_commit in raw_commits:
if not raw_commit.strip():
continue
# Each commit starts with hash|date
lines = raw_commit.strip().split("\n")
if not lines:
continue
first_line = lines[0]
parts = first_line.split("|", 2)
if len(parts) < 3:
continue
commit_hash = parts[0]
commit_date = parts[1]
# The message is the first part and the rest of the lines
message_first_line = parts[2]
message_rest = "\n".join(lines[1:])
full_message = message_first_line
if message_rest:
full_message += "\n" + message_rest
commits.append({
"hash": commit_hash,
"date": commit_date,
"message": full_message
})
return commits
except subprocess.CalledProcessError as e:
print(f"Error getting commit history for {file_path}: {e}")
return []
def extract_urls(text):
"""
Extract URLs from text using regex pattern.
Returns a list of URLs.
"""
# Pattern for URLs that captures the full URL until whitespace or end of string
url_pattern = r'(https?://[^\s]+)'
return re.findall(url_pattern, text)
def main():
# Get current directory
current_dir = Path.cwd()
# Find all HTML files
html_files = list(current_dir.glob("*.html"))
# Dictionary to store results
results = {"pages": {}}
# Process each HTML file
for html_file in html_files:
file_name = html_file.name
print(f"Processing {file_name}...")
# Get commit details for this file
commits = get_file_commit_details(html_file)
if not commits:
continue
# Extract URLs from commit messages
all_urls = []
for commit in commits:
urls = extract_urls(commit["message"])
all_urls.extend(urls)
# Remove duplicates but preserve order
unique_urls = []
for url in all_urls:
if url not in unique_urls:
unique_urls.append(url)
# Add to results if any commits were found
if commits:
results["pages"][file_name] = {
"commits": commits,
"urls": unique_urls
}
# Save results to JSON file
with open("gathered_links.json", "w") as f:
json.dump(results, f, indent=2)
print(f"Processed {len(html_files)} files")
print(f"Found details for {len(results['pages'])} files")
print("Results saved to gathered_links.json")
if __name__ == "__main__":
main()
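The `extract_urls` regex in that script is worth a quick standalone check; the message below is modeled on the Upgrade PDF.js commit, with a hypothetical second URL added to show multiple matches:

```python
import re

def extract_urls(text):
    """Find every http(s) URL, matching greedily up to the next whitespace."""
    return re.findall(r'(https?://[^\s]+)', text)

msg = "Upgrade PDF.js\n\nAddress https://example.com/advisory and https://claude.ai/share/abc"
print(extract_urls(msg))
# ['https://example.com/advisory', 'https://claude.ai/share/abc']
```

One known limitation of the `[^\s]+` pattern: a URL written as `(https://example.com)` or ending a sentence with a period keeps that trailing punctuation in the match. The commit messages in this repo put bare URLs on their own lines, so the simple pattern suffices here.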
⏺ Let's check how git is outputting the log format to identify any issues:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && git log -n 1 --format="%H|%aI|%B" -- ocr.html)…
  ⎿  c1233dfcddff62241dac9dd5d99e0726557350a8|2024-05-20T08:55:05-04:00|Upgrade PDF.js
Address https://codeanlabs.com/blog/research/cve-2024-4367-arbitrary-js-execution-in-pdf-js/
⏺ The issue is in how we're parsing the git log output. The commit message separator needs to be
adjusted. Let's fix the gather_links.py script:
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/gather_links.py)…
⎿ Updated /Users/simon/Dropbox/Development/tools/gather_links.py with 15 additions and 19 removals
12 Returns a list of dictionaries with hash, message, and date.
13 """
14 try:
15 # Get commit hashes, dates, and messages
15 # Get each commit as a separate record with its hash, date, and message
16 result = subprocess.run(
17 ["git", "log", "--format=%H|%aI|%B", "--", file_path],
17 ["git", "log", "--format=%H|%aI|%B%x00", "--", file_path],
18 capture_output=True,
19 text=True,
20 check=True
21 )
22
23 commits = []
24 # Split by commit separator
25 raw_commits = result.stdout.strip().split("\n\n\n")
24 # Split by NULL character which safely separates commits
25 raw_commits = result.stdout.strip().split("\0")
26
27 for raw_commit in raw_commits:
28 if not raw_commit.strip():
29 continue
30
31 # Each commit starts with hash|date
32 lines = raw_commit.strip().split("\n")
33 if not lines:
31 # Find the first pipe to extract commit hash
32 first_pipe = raw_commit.find('|')
33 if first_pipe == -1:
34 continue
35
36 first_line = lines[0]
37 parts = first_line.split("|", 2)
36 commit_hash = raw_commit[:first_pipe]
37
38 if len(parts) < 3:
38 # Find the second pipe to extract date
39 second_pipe = raw_commit.find('|', first_pipe + 1)
40 if second_pipe == -1:
41 continue
42
43 commit_hash = parts[0]
44 commit_date = parts[1]
43 commit_date = raw_commit[first_pipe + 1:second_pipe]
44
45 # The message is the first part and the rest of the lines
46 message_first_line = parts[2]
47 message_rest = "\n".join(lines[1:])
48 full_message = message_first_line
49 if message_rest:
50 full_message += "\n" + message_rest
45 # The rest is the commit message
46 commit_message = raw_commit[second_pipe + 1:]
47
48 commits.append({
49 "hash": commit_hash,
50 "date": commit_date,
51 "message": full_message
51 "message": commit_message
52 })
53
54 return commits
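The reason `%x00` works where splitting on `"\n\n\n"` did not: a commit whose body is a single line contributes fewer consecutive newlines between records, so adjacent commits got merged into one block. A NUL byte never appears in a commit message, so it is a safe delimiter. This is a minimal standalone sketch of the new parsing; the hashes, dates, and URL in the simulated git output are fabricated for illustration, and each record is stripped of the stray newline git emits after the NUL:

```python
# Simulated output of `git log --format=%H|%aI|%B%x00`: each record is
# hash|date|message, terminated by a NUL byte.
raw = (
    "aaa1111|2025-03-07T20:36:22-06:00|Link to json schema builder\n\x00\n"
    "bbb2222|2024-05-20T08:55:05-04:00|Upgrade PDF.js\n\n"
    "Address https://example.com/advisory\n\x00\n"
)

commits = []
for raw_commit in raw.split("\x00"):
    raw_commit = raw_commit.strip()
    if not raw_commit:
        continue
    # The first two pipes delimit hash and date; everything after is message
    first_pipe = raw_commit.find("|")
    second_pipe = raw_commit.find("|", first_pipe + 1)
    if first_pipe == -1 or second_pipe == -1:
        continue
    commits.append({
        "hash": raw_commit[:first_pipe],
        "date": raw_commit[first_pipe + 1:second_pipe],
        "message": raw_commit[second_pipe + 1:],
    })

print(len(commits))  # 2
```

Scanning for the first two pipes (rather than splitting every line) also means a `|` inside a commit message body cannot corrupt the parse.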
⏺ Now let's run the updated scripts:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./gather_links.py)…
  ⎿  Processing aria-live-regions.html...
Processing iframe-api-explorer.html...
Processing render-markdown.html...
Processing nav-for-headings.html...
Processing bluesky-firehose.html...
Processing image-resize-quality.html...
Processing audio-spectrum.html...
Processing prompts-js.html...
Processing apsw-query.html...
Processing svg-render.html...
Processing animated-rainbow-border.html...
Processing yaml-explorer.html...
Processing user-agent.html...
Processing writing-style.html...
Processing youtube-thumbnails.html...
Processing escape-entities.html...
Processing bbox-cropper.html...
Processing progress.html...
Processing bluesky-timeline.html...
Processing word-counter.html...
Processing clipboard-viewer.html...
Processing markdown-math.html...
Processing json-schema-builder.html...
Processing jina-reader.html...
Processing link-temp.html...
Processing openfreemap-demo.html...
Processing bluesky-resolve.html...
Processing pomodoro.html...
Processing claude-token-counter.html...
Processing image-to-jpeg.html...
Processing csv-marker-map.html...
Processing gemini-chat.html...
Processing image-to-svg.html...
Processing tiff-orientation.html...
Processing iframe-sandbox.html...
Processing timezones.html...
Processing click-grid-to-expand.html...
Processing php-deserializer.html...
Processing github-api-write.html...
Processing exif.html...
Processing qr.html...
Processing unix-timestamp.html...
Processing arena-animated.html...
Processing openai-audio.html...
Processing svg-progressive-render.html...
Processing github-issue.html...
Processing social-media-cropper.html...
Processing llm-prices.html...
Processing encrypt.html...
Processing extract-urls.html...
Processing paste-rich-text.html...
Processing github-issue-to-markdown.html...
Processing openai-audio-output.html...
Processing pdf-ocr.html...
Processing render-claude-citations.html...
Processing chrome-prompt-playground.html...
Processing haiku.html...
Processing sql-pretty-printer.html...
Processing event-planner.html...
Processing sqlite-wasm.html...
Processing openai-stream.html...
Processing california-clock-change.html...
Processing gpt-4o-audio-player.html...
Processing jina-embeddings-image-token-calculator.html...
Processing text-wrap-balance-nav.html...
Processing gemini-bbox.html...
Processing schema-dsl.html...
Processing pipfile.html...
Processing compare-pdfs.html...
Processing json-to-yaml.html...
Processing colophon.html...
Processing openai-webrtc.html...
Processing gemini-bbox-tool.html...
Processing mdn-timelines.html...
Processing box-shadow.html...
Processing ares.html...
Processing zip-wheel-explorer.html...
Processing ocr.html...
Processing svg-sandbox.html...
Processed 79 files
Found details for 77 files
Results saved to gathered_links.json
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./build_colophon.py)…
  ⎿  Colophon page built successfully as colophon.html
⏺ Let's check the ocr.html entry now:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && grep -A 25 "ocr.html" colophon.html)…
  ⎿  No (tell Claude what to do differently)
> it's almost perfect, but each page should have the commits displayed in the opposite order - oldest
first
⏺ I'll update the script to display the commits in chronological order (oldest first) for each page:
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/build_colophon.py)…
⎿ Updated /Users/simon/Dropbox/Development/tools/build_colophon.py with 3 additions
120 tool_url = f"https://tools.simonwillison.net/{page_name.replace('.html', '')}"
121 commits = page_data.get('commits', [])
122
123 # Reverse the commits list to show oldest first
124 commits = list(reversed(commits))
125
126 html_content += f'''
127 <div class="tool">
128 <h2 class="tool-name"><a href="{tool_url}">{page_name}</a></h2>
⏺ Now let's run the script to generate the updated colophon.html:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./build_colophon.py)…
  ⎿  Colophon page built successfully as colophon.html
⏺ Perfect! Now we have:
1. Both scripts correctly working together
2. The colophon page displaying multiple commits per tool
3. Date and time shown in the formatted date
4. Commits displayed in chronological order (oldest first)
The complete solution includes:
1. gather_links.py - A script that scans git history for all HTML files, extracting commit details
and URLs.
2. build_colophon.py - A script that builds a beautiful HTML page showing the history of each tool
with links to the Claude conversations that created them.
The scripts work in sequence:
1. Run gather_links.py to generate the JSON file with all the data
2. Run build_colophon.py to build the HTML colophon page
The resulting colophon page provides a comprehensive overview of how each tool was created, showing
commits in chronological order with proper formatting and all necessary links.
> One last change - the pages are currently listed alphabetically, lets instead list them with the most
recently modified at the top
⏺ I'll update the build_colophon.py script to sort the pages by their most recent modification date
(newest first):
⏺ Update(file_path: /Users/simon/Dropbox/Development/tools/build_colophon.py)…
⎿ Updated /Users/simon/Dropbox/Development/tools/build_colophon.py with 11 additions and 2
removals
33 print("No pages found in gathered_links.json")
34 return
35
36 # Sort pages by name
37 sorted_pages = sorted(pages.items())
36 # Sort pages by most recent commit date (newest first)
37 def get_most_recent_date(page_data):
38 commits = page_data.get('commits', [])
39 if not commits:
40 return "0000-00-00T00:00:00"
41
42 # Find the most recent commit date
43 dates = [commit.get('date', "0000-00-00T00:00:00") for commit in commits]
44 return max(dates) if dates else "0000-00-00T00:00:00"
45
46 sorted_pages = sorted(pages.items(), key=lambda x: get_most_recent_date(x[1]), reverse
=True)
47
48 # Start building the HTML
49 html_content = '''<!DOCTYPE html>
50 <html lang="en">
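Because `%aI` produces ISO 8601 strings, comparing the raw strings gives the right ordering in most cases (timestamps with differing UTC offsets can in principle compare out of order, a caveat this approach accepts). A standalone sketch of the sort, using two real dates from this session but an otherwise made-up `pages` dict:

```python
pages = {
    "ocr.html": {"commits": [{"date": "2024-05-20T08:55:05-04:00"}]},
    "json-schema-builder.html": {"commits": [{"date": "2025-03-07T20:36:22-06:00"}]},
}

def get_most_recent_date(page_data):
    # Fall back to a sentinel that sorts before any real date
    dates = [c.get("date", "0000-00-00T00:00:00") for c in page_data.get("commits", [])]
    return max(dates) if dates else "0000-00-00T00:00:00"

# Newest-modified page first
sorted_pages = sorted(pages.items(), key=lambda x: get_most_recent_date(x[1]), reverse=True)
print([name for name, _ in sorted_pages])
# ['json-schema-builder.html', 'ocr.html']
```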
⏺ Now let's run the updated script to generate the colophon with newest pages at the top:
⏺ Bash(cd /Users/simon/Dropbox/Development/tools && ./build_colophon.py)…
  ⎿  Colophon page built successfully as colophon.html
⏺ Perfect! Now the colophon page:
1. Lists tools in order of most recent modification (newest first)
2. For each tool, shows the commits in chronological order (oldest first)
3. Shows timestamps and dates for each commit
4. Links to both the tool itself and the relevant Claude conversations
The complete solution with both scripts now provides a comprehensive overview of how each tool was
created, making it easy to browse and learn from the history of these projects.
> /cost
⎿ Total cost: $0.61
Total duration (API): 5m 31.2s
Total duration (wall): 17m 18.7s