@thomasantony
Last active January 11, 2024 13:21
Convert saved HTML transcripts from ChatGPT to Markdown
# Save the transcripts using the "Save Page WE" Chrome Extension
# This script was generated by ChatGPT
import sys

from bs4 import BeautifulSoup

# Check if a file was provided as a command line argument
if len(sys.argv) < 2:
    print("Please provide an HTML file as a command line argument.")
    sys.exit(1)

# Read the HTML file (the saved page is assumed to be UTF-8 encoded)
html_file = sys.argv[1]
with open(html_file, 'r', encoding='utf-8') as f:
    html = f.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Find all elements with a class that starts with
# 'ConversationItem__ConversationItemWrapper-'
conversation_elements = soup.find_all(
    class_=lambda c: c and c.startswith('ConversationItem__ConversationItemWrapper-')
)

# Output the conversation as Markdown quotes.
# Turns are assumed to alternate, starting with the user.
for i, element in enumerate(conversation_elements):
    text = element.get_text()
    lines = text.split('\n')
    speaker = "User" if i % 2 == 0 else "Assistant"
    print(speaker)
    for line in lines:
        print(f"> {line}")
    print()
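
The output loop above can be pulled into a small function so the Markdown formatting can be checked without a saved HTML file. This is a minimal sketch, not part of the original gist: `format_turns` is a hypothetical helper name, and it keeps the script's assumption that turns strictly alternate, starting with the user.

```python
def format_turns(turn_texts):
    """Render alternating user/assistant turns as Markdown block quotes.

    Hypothetical helper, not part of the original gist. Assumes
    even-indexed turns belong to the user, as the script does.
    """
    out = []
    for i, text in enumerate(turn_texts):
        out.append("User" if i % 2 == 0 else "Assistant")
        # Prefix every line of the turn so multi-line messages stay quoted.
        out.extend(f"> {line}" for line in text.split("\n"))
        out.append("")  # blank line between turns
    return "\n".join(out)


markdown = format_turns(["Hi there", "Hello!\nHow can I help?"])
print(markdown)
```

In the script, the list passed in would be `[element.get_text() for element in conversation_elements]`.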

turnerll commented Jun 13, 2023

Parse the HTML using BeautifulSoup
https://stackabuse.com/guide-to-parsing-html-with-beautifulsoup-in-python/

To parse HTML with BeautifulSoup, pass the HTML string and a parser name to the BeautifulSoup constructor:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_string, 'html.parser')

where:

html_string is the HTML string to be parsed
'html.parser' is the parser to use

Once you have created a BeautifulSoup object, you can use it to access the different elements of the HTML document. For example, to get the title of the document, you can use:

title = soup.title.string

To get all of the links in the document, you can use:

links = soup.find_all('a')

For more information, please see the BeautifulSoup documentation.
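
The calls above can be combined into one short runnable sketch. The sample HTML and the variable names `hrefs` and `html_string` contents are made up for illustration. The guards for a missing `<title>` or a missing `href` attribute are an addition, not something the comment above covers.

```python
from bs4 import BeautifulSoup

# Made-up sample document, for illustration only.
html_string = """
<html><head><title>Example page</title></head>
<body>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
  <a>No href</a>
</body></html>
"""

soup = BeautifulSoup(html_string, "html.parser")

# soup.title is None when the document has no <title>, so guard the access.
title = soup.title.string if soup.title else None

# find_all returns Tag objects; .get("href") gives None when the attribute is absent.
hrefs = [a.get("href") for a in soup.find_all("a")]

print(title)
print(hrefs)
```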
