@thomasantony
Last active January 11, 2024 13:21
Convert saved HTML transcripts from ChatGPT to Markdown
# Save the transcripts using the "Save Page WE" Chrome Extension
# This script was generated by ChatGPT
import sys
from bs4 import BeautifulSoup

# Check if a file was provided as a command line argument
if len(sys.argv) < 2:
    print("Please provide an HTML file as a command line argument.")
    sys.exit(1)

# Read the HTML file
html_file = sys.argv[1]
with open(html_file, 'r') as f:
    html = f.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Find all the elements with the 'ConversationItem__ConversationItemWrapper-' class
conversation_elements = soup.find_all(class_=lambda c: c and c.startswith('ConversationItem__ConversationItemWrapper-'))

# Output the conversation as a Markdown quote
for i, element in enumerate(conversation_elements):
    text = element.get_text()
    lines = text.split('\n')
    # Messages are assumed to alternate, starting with the user
    if i % 2 == 0:
        speaker = "User"
    else:
        speaker = "Assistant"
    first_line = True
    for line in lines:
        if first_line:
            print(speaker)
            first_line = False
        print(f"> {line}")
    print()
@Yanis02015

I just said "give me the Markdown code, not the result" and it gave me what I wanted.

@thomasantony
Author

This was useful back when OpenAI didn't let us save the chat history, so I used to save the page as HTML and then run it through this script to convert it. Now it is a lot easier, and there are many Chrome extensions that do a better job.

@blakeNaccarato

blakeNaccarato commented Feb 1, 2023

Edit

Never mind my original message below; the extension "Chat GPT Prompt Genius" does a good job of this by adding a "Share & Export" button to the sidebar of conversations in ChatGPT. It's available for Chrome and Firefox.

Original message


Here's a hacky attempt to update this for the latest HTML output by running "Save Page WE" on ChatGPT conversations. The `'ConversationItem__ConversationItemWrapper-'` class no longer shows up in the HTML output, so this snippet no longer works.

```Python
# Find all the elements with the 'ConversationItem__ConversationItemWrapper-' class
conversation_elements = soup.find_all(class_=lambda c: c and c.startswith('ConversationItem__ConversationItemWrapper-'))
```

I found the common element to be the following, which extracts conversations, but further processing would be necessary to Markdown-ify them, like code blocks and such.

```Python
# Find all the elements corresponding to a message in the conversation
conversation_elements = soup.find_all(
    class_=lambda c: c
    and c.startswith("min-h-[20px] flex flex-col items-start gap-4 whitespace-pre-wrap")
)
```

Here's the updated implementation that worked at the time of this comment.

```Python
# Source: https://gist.github.com/thomasantony/c2d866d1cb3fec3c532b13ce695c9438

# Save the transcripts using the "Save Page WE" Chrome Extension
# This script was generated by ChatGPT

import sys
from bs4 import BeautifulSoup

# Check if a file was provided as a command line argument
if len(sys.argv) < 2:
    print("Please provide an HTML file as a command line argument.")
    sys.exit(1)

# Read the HTML file
html_file = sys.argv[1]
with open(html_file, "r") as f:
    html = f.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

# Find all the elements corresponding to a message in the conversation
conversation_elements = soup.find_all(
    class_=lambda c: c
    and c.startswith("min-h-[20px] flex flex-col items-start gap-4 whitespace-pre-wrap")
)

# Output the conversation as a Markdown quote
for i, element in enumerate(conversation_elements):
    text = element.get_text()
    lines = text.split("\n")
    speaker = "User" if i % 2 == 0 else "Assistant"
    first_line = True
    for line in lines:
        if first_line:
            print(speaker)
            first_line = False
        print(f"> {line}")
    print()
```

</details>
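
The comment above notes that further processing would be needed to Markdown-ify code blocks. One possible way to do that is sketched below. It is only an illustration, not part of the original script: the helper name `message_to_markdown`, the `message` class, and the assumption that code appears as `<pre>` elements directly inside each message element are all hypothetical and would need checking against the actual saved HTML.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Build the fence marker programmatically so this example stays easy to embed.
FENCE = "`" * 3


def message_to_markdown(element):
    """Render one message element as Markdown, fencing any <pre> children.

    Hypothetical helper: assumes code blocks are direct <pre> children
    of the message element, which may not match the real page markup.
    """
    parts = []
    for child in element.children:
        if isinstance(child, Tag) and child.name == "pre":
            # Keep the code verbatim, trimming only surrounding newlines
            code = child.get_text().strip("\n")
            parts.append(f"{FENCE}\n{code}\n{FENCE}")
        else:
            # Plain tags and bare strings are reduced to their text
            text = child.get_text() if isinstance(child, Tag) else str(child)
            text = text.strip()
            if text:
                parts.append(text)
    return "\n\n".join(parts)


# Tiny made-up message, standing in for one element of conversation_elements
sample = '<div class="message"><p>Try this:</p><pre><code>print("hi")\n</code></pre><p>Done.</p></div>'
soup = BeautifulSoup(sample, "html.parser")
markdown = message_to_markdown(soup.find("div", class_="message"))
print(markdown)
```

In the script above, each element of `conversation_elements` could be passed through a helper like this instead of calling `element.get_text()` directly, provided the markup matches the stated assumption.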

@turnerll

turnerll commented Jun 13, 2023

Parse the HTML using BeautifulSoup
https://stackabuse.com/guide-to-parsing-html-with-beautifulsoup-in-python/

To parse HTML using BeautifulSoup, you can use the BeautifulSoup() function. The syntax is:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_string, 'html.parser')

where:

- `html_string` is the HTML string to be parsed
- `'html.parser'` is the parser to use

Once you have created a BeautifulSoup object, you can use it to access the different elements of the HTML document. For example, to get the title of the document, you can use:

title = soup.title.string

To get all of the links in the document, you can use:

links = soup.find_all('a')

For more information, please see the BeautifulSoup documentation.
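
As a small, self-contained illustration of the calls described above, the sketch below parses an inline HTML string and reads the title and the link targets. The HTML content and the `example.com` URLs are made up for the example.

```python
from bs4 import BeautifulSoup

# Made-up document, used only to exercise the calls
html_string = """
<html>
  <head><title>Example page</title></head>
  <body>
    <a href="https://example.com/a">First</a>
    <a href="https://example.com/b">Second</a>
  </body>
</html>
"""

soup = BeautifulSoup(html_string, "html.parser")

# Text of the <title> element
title = soup.title.string

# find_all returns Tag objects; .get("href") reads an attribute (None if missing)
hrefs = [a.get("href") for a in soup.find_all("a")]

print(title)
print(hrefs)
```

Note that `find_all('a')` returns the tag objects themselves, so the attribute lookup is a separate step.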
