@cs224
Created May 8, 2023 13:16
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Author: cs224\n",
"\n",
"Last updated: 2023-05-08\n",
"\n",
"Python implementation: CPython\n",
"Python version : 3.8.12\n",
"IPython version : 8.2.0\n",
"\n",
"openai : 0.27.6\n",
"redlines : 0.2.2\n",
"markdown_it: 2.2.0\n",
"mdformat : 0.7.16\n",
"\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -a 'cs224' -u -d -v -p openai,redlines,markdown_it,mdformat"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* https://daringfireball.net/projects/markdown/syntax\n",
"* https://github.com/executablebooks/markdown-it-py\n",
"* https://github.com/executablebooks/mdformat\n",
"* https://github.com/houfu/redlines"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>.container { width:70% !important; }</style>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from IPython.display import display, HTML, Markdown\n",
"from IPython.display import display_html\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"def display_side_by_side(*args):\n",
" html_str=''\n",
" for df in args:\n",
" if type(df) == np.ndarray:\n",
" df = pd.DataFrame(df)\n",
" html_str+=df.to_html()\n",
" html_str = html_str.replace('<table','<table style=\"display:inline\"')\n",
" # print(html_str)\n",
" display_html(html_str,raw=True)\n",
"\n",
"CSS = \"\"\"\n",
".output {\n",
" flex-direction: row;\n",
"}\n",
"\"\"\"\n",
"\n",
"def display_graphs_side_by_side(*args):\n",
" html_str='<table><tr>'\n",
" for g in args:\n",
" html_str += '<td>'\n",
" html_str += g._repr_svg_()\n",
" html_str += '</td>'\n",
" html_str += '</tr></table>'\n",
" display_html(html_str,raw=True)\n",
" \n",
"\n",
"display(HTML(\"<style>.container { width:70% !important; }</style>\"))"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from redlines import Redlines"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"import os\n",
"\n",
"from dotenv import load_dotenv, find_dotenv\n",
"_ = load_dotenv(find_dotenv()) # read local .env file\n",
"\n",
"openai.api_key = os.getenv('OPENAI_API_KEY')"
]
},
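A side note on the cell above: it depends on `python-dotenv`. Where that package is unavailable, a minimal stdlib-only fallback for simple `KEY=VALUE` files might look like the following sketch (`load_env_file` is a hypothetical helper, not part of this notebook):

```python
import os

def load_env_file(path='.env'):
    # Hypothetical fallback: parse simple KEY=VALUE lines into os.environ.
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without '='.
            if not line or line.startswith('#') or '=' not in line:
                continue
            key, _, value = line.partition('=')
            # Like load_dotenv's default, do not overwrite existing variables.
            os.environ.setdefault(key.strip(), value.strip())
```

Unlike `load_dotenv`, this sketch handles neither quoted values nor `export` prefixes, so it is only a stand-in for simple files.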
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def get_completion(prompt, model=\"gpt-3.5-turbo\", temperature=0): \n",
" messages = [{\"role\": \"user\", \"content\": prompt}]\n",
" response = openai.ChatCompletion.create(\n",
" model=model,\n",
" messages=messages,\n",
" temperature=temperature, \n",
" )\n",
" return response.choices[0].message[\"content\"]"
]
},
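The blog post quoted later in this notebook notes gpt-3.5-turbo's 4096-token context limit, so it can be useful to sanity-check a prompt's size before handing it to `get_completion`. A crude stdlib sketch, assuming the common rule of thumb of roughly four characters per token (the numbers here are heuristics; `tiktoken` gives exact counts):

```python
def rough_token_estimate(text, chars_per_token=4):
    # Heuristic only: assumes ~4 characters per token; real tokenizers vary.
    return max(1, len(text) // chars_per_token)

def fits_context(prompt, max_context=4096, reserve_for_reply=1024):
    # Check whether the prompt likely leaves room for the model's reply.
    return rough_token_estimate(prompt) <= max_context - reserve_for_reply
```

Splitting the blog post into per-heading sections, as done below, is what keeps each prompt comfortably inside this budget.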
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"blog_post = \"\"\"\n",
"## Rational\n",
"\n",
"Recently, I participated in the free [ChatGPT Prompt Engineering for\n",
"Developers](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers) course by Andrew Ng and Isa Fulford. This triggered\n",
"an older urge to use an AI to help me fine tune text that I writte. While simple grammar and spelling mistakes can be spoted by MS Word and similar I\n",
"wanted the AI to also make my text more compelling and enhance its appeal. The AI should help me to ensure a smooth and engaging reading\n",
"experience. Below you can read my first steps towards that goal.\n",
"\n",
"## Get Started\n",
"\n",
"Before you start you will need a paid account on [platform.openai.com](https://platform.openai.com/), because only then you'll get access to the API\n",
"version of ChatGPT. Some users already got beta access to `GPT-4`, but in general you will only have access to `gpt-3.5-turbo` with the API. For our\n",
"purposes here that is good enough. And don't worry about the costs. This is only a couple of cents even for extended uses of the API. I never managed\n",
"to get over $1 so far. Actually all my efforts to get the code for this blog post working only cost USD 0.02.\n",
"\n",
"Once you have acces to [platform.openai.com](https://platform.openai.com/) you will need to create an API key and put it in a `.env` file like so:\n",
"\n",
"```\n",
"OPENAI_API_KEY=sk-...\n",
"```\n",
"\n",
"## The Code\n",
"\n",
"### Jupyter Notebook / Python Init\n",
"\n",
"I am using a [Juypter](https://jupyter.org/) notebook and python to access the API. The below explains step by step what I was doing.\n",
"\n",
"You start by setting up the API and defining a helper function:\n",
"\n",
"```python\n",
"import openai\n",
"import os\n",
"\n",
"from dotenv import load_dotenv, find_dotenv\n",
"_ = load_dotenv(find_dotenv()) # read local .env file\n",
"\n",
"openai.api_key = os.getenv('OPENAI_API_KEY')\n",
"\n",
"def get_completion(prompt, model=\"gpt-3.5-turbo\", temperature=0):\n",
" messages = [{\"role\": \"user\", \"content\": prompt}]\n",
" response = openai.ChatCompletion.create(\n",
" model=model,\n",
" messages=messages,\n",
" temperature=temperature,\n",
" )\n",
" return response.choices[0].message[\"content\"]\n",
"```\n",
"\n",
"\n",
"After that I define a little markdown text block as a playground. **Replace the single backtick, single quote, single backtick sequence with a triple\n",
"backtick sequence**:\n",
"\n",
"\n",
"```python\n",
"mark_down = '''\n",
"# Header 1\n",
"Some text under header 1.\n",
"\n",
"## Header 2\n",
"More text under header 2.\n",
"\n",
"`' `python\n",
"import pigpio\n",
"\n",
"handle = pi.i2c_open(1, 0x58)\n",
"\n",
"def horter_byte_sequence(channel, voltage):\n",
" voltage = int(voltage * 100.0)\n",
"\n",
" output_buffer = bytearray(3)\n",
"\n",
" high_byte = voltage >> 8\n",
" low_byte = voltage & 0xFF;\n",
" output_buffer[0] = (channel & 0xFF)\n",
" output_buffer[1] = low_byte\n",
" output_buffer[2] = high_byte\n",
"\n",
" return output_buffer\n",
"\n",
"v = horter_byte_sequence(0, 5.0)\n",
"pi.i2c_write_device(handle, v)\n",
"`' `\n",
"\n",
"### Header 3\n",
"Even more text under header 3.\n",
"\n",
"## Another Header 2\n",
"Text under another header 2.\n",
"'''\n",
"```\n",
"\n",
"### Split Markdown at Headings\n",
"\n",
"It helps to have a [syntax](https://daringfireball.net/projects/markdown/syntax) reference for markdown close. As you can read on\n",
"[learn.microsoft.com](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions):\n",
"\n",
"> The token limit for gpt-35-turbo is 4096 tokens. These limits include the token count from both the message array sent and the model response. The\n",
"> number of tokens[^numtokens] in the messages array combined with the value of the max_tokens parameter must stay under these limits or you'll receive an error.\n",
"\n",
"This means that you can't send your whole blog post in one go to the chatgpt API, but you have to process it in pieces. In addition the approach to\n",
"process the blog post in pieces will also make it easier for you later to integrate the suggestions of the AI into your blog post.\n",
"\n",
"I did some searching and it was not as easy as I hoped for to find a python library that allowed me to easily split the input blog post at\n",
"headings. Finally I ended up using [markdown-it-py](https://github.com/executablebooks/markdown-it-py). But `markdown_it` is meant to be used to\n",
"translate markdown to HTML and out of the box does not work as markdown to markdown converter. After some digging I found at the bottom of its\n",
"[using](https://markdown-it-py.readthedocs.io/en/latest/using.html) documentation page that you can use\n",
"[mdformat](https://github.com/executablebooks/mdformat) in combination with `markdown_it`.\n",
"\n",
"In addition I remove any `fence` token that anyway does not belong to the standard text flow of the blog post. This results in the following helper\n",
"function:\n",
"\n",
"```python\n",
"import markdown_it\n",
"import mdformat.renderer\n",
"\n",
"def extract_md_sections(md_input_txt):\n",
" md = markdown_it.MarkdownIt()\n",
" options = {}\n",
" env = {}\n",
" md_renderer = mdformat.renderer.MDRenderer()\n",
"\n",
" tokens = md.parse(md_input_txt)\n",
" md_input_txt = md_renderer.render(tokens, options, env)\n",
" tokens = md.parse(md_input_txt)\n",
"\n",
" sections = []\n",
" current_section = []\n",
"\n",
" for token in tokens:\n",
" if token.type == 'heading_open':\n",
" if current_section:\n",
" sections.append(current_section)\n",
" current_section = [token]\n",
" elif token.type == 'fence':\n",
" continue\n",
" elif current_section is not None:\n",
" current_section.append(token)\n",
"\n",
" if current_section:\n",
" sections.append(current_section)\n",
"\n",
" sections = [md_renderer.render(section, options, env) for section in sections]\n",
" return sections\n",
"```\n",
"\n",
"If you wander about the double call to `md.parse()`: I am not sure if this is strictly necessary, but I noticed that some `fence` tokens might be\n",
"missed otherwise.\n",
"\n",
"Now you can try the splitting function on our dummy mardkown text block:\n",
"\n",
"\n",
"```python\n",
"sections = extract_md_sections(mark_down)\n",
"for section in sections:\n",
" print(section)\n",
" print('---')\n",
"```\n",
"\n",
"\n",
"And should see the following result:\n",
"```\n",
"# Header 1\n",
"\n",
"Some text under header 1.\n",
"\n",
"---\n",
"## Header 2\n",
"\n",
"More text under header 2.\n",
"\n",
"---\n",
"### Header 3\n",
"\n",
"Even more text under header 3.\n",
"\n",
"---\n",
"## Another Header 2\n",
"\n",
"Text under another header 2.\n",
"\n",
"---\n",
"```\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"mark_down = '''\n",
"# Header 1\n",
"Some text under header 1.\n",
"\n",
"## Header 2\n",
"More text under header 2.\n",
"\n",
"```python\n",
"import pigpio\n",
"\n",
"handle = pi.i2c_open(1, 0x58)\n",
"\n",
"def horter_byte_sequence(channel, voltage):\n",
" voltage = int(voltage * 100.0)\n",
"\n",
" output_buffer = bytearray(3)\n",
"\n",
" high_byte = voltage >> 8\n",
" low_byte = voltage & 0xFF;\n",
" output_buffer[0] = (channel & 0xFF)\n",
" output_buffer[1] = low_byte\n",
" output_buffer[2] = high_byte\n",
"\n",
" return output_buffer\n",
"\n",
"v = horter_byte_sequence(0, 5.0)\n",
"pi.i2c_write_device(handle, v)\n",
"```\n",
"\n",
"### Header 3\n",
"Even more text under header 3.\n",
"\n",
"## Another Header 2\n",
"Text under another header 2.\n",
"'''\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import markdown_it\n",
"import mdformat.renderer"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def extract_md_sections(md_input_txt):\n",
" md = markdown_it.MarkdownIt()\n",
" options = {}\n",
" env = {}\n",
" md_renderer = mdformat.renderer.MDRenderer()\n",
" \n",
" tokens = md.parse(md_input_txt)\n",
" md_input_txt = md_renderer.render(tokens, options, env)\n",
" tokens = md.parse(md_input_txt)\n",
" \n",
" sections = []\n",
" current_section = []\n",
"\n",
" for token in tokens:\n",
" if token.type == 'heading_open':\n",
" if current_section:\n",
" sections.append(current_section)\n",
" current_section = [token]\n",
" elif token.type == 'fence':\n",
" continue\n",
" elif current_section is not None:\n",
" current_section.append(token)\n",
"\n",
" if current_section:\n",
" sections.append(current_section)\n",
"\n",
" sections = [md_renderer.render(section, options, env) for section in sections]\n",
" return sections"
]
},
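For comparison, heading-based splitting can also be sketched with a plain regex over lines. This simplified, stdlib-only alternative does not drop fenced code blocks the way `extract_md_sections` does, and it would mis-split on `#` lines inside fences:

```python
import re

def split_md_at_headings(md_text):
    # Split markdown into sections, each starting at an ATX heading line
    # (1-6 '#' characters followed by a space).
    sections = []
    current = []
    for line in md_text.splitlines():
        if re.match(r'#{1,6} ', line):
            if current:
                sections.append('\n'.join(current).strip() + '\n')
            current = [line]
        else:
            current.append(line)
    if current:
        sections.append('\n'.join(current).strip() + '\n')
    return sections
```

The token-based approach above remains the more robust choice; this only illustrates the underlying idea.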
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# Header 1\n",
"\n",
"Some text under header 1.\n",
"\n",
"---\n",
"## Header 2\n",
"\n",
"More text under header 2.\n",
"\n",
"---\n",
"### Header 3\n",
"\n",
"Even more text under header 3.\n",
"\n",
"---\n",
"## Another Header 2\n",
"\n",
"Text under another header 2.\n",
"\n",
"---\n"
]
}
],
"source": [
"sections = extract_md_sections(mark_down)\n",
"for section in sections:\n",
" print(section)\n",
" print('---') "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Rational\n",
"\n",
"Recently, I participated in the free [ChatGPT Prompt Engineering for\n",
"Developers](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers) course by Andrew Ng and Isa Fulford. This triggered\n",
"an older urge to use an AI to help me fine tune text that I writte. While simple grammar and spelling mistakes can be spoted by MS Word and similar I\n",
"wanted the AI to also make my text more compelling and enhance its appeal. The AI should help me to ensure a smooth and engaging reading\n",
"experience. Below you can read my first steps towards that goal.\n",
"\n",
"-----------------------------------------------------------------------------------------------------------\n",
"## Get Started\n",
"\n",
"Before you start you will need a paid account on [platform.openai.com](https://platform.openai.com/), because only then you'll get access to the API\n",
"version of ChatGPT. Some users already got beta access to `GPT-4`, but in general you will only have access to `gpt-3.5-turbo` with the API. For our\n",
"purposes here that is good enough. And don't worry about the costs. This is only a couple of cents even for extended uses of the API. I never managed\n",
"to get over $1 so far. Actually all my efforts to get the code for this blog post working only cost USD 0.02.\n",
"\n",
"Once you have acces to [platform.openai.com](https://platform.openai.com/) you will need to create an API key and put it in a `.env` file like so:\n",
"\n",
"-----------------------------------------------------------------------------------------------------------\n",
"## The Code\n",
"\n",
"-----------------------------------------------------------------------------------------------------------\n",
"### Jupyter Notebook / Python Init\n",
"\n",
"I am using a [Juypter](https://jupyter.org/) notebook and python to access the API. The below explains step by step what I was doing.\n",
"\n",
"You start by setting up the API and defining a helper function:\n",
"\n",
"After that I define a little markdown text block as a playground. **Replace the single backtick, single quote, single backtick sequence with a triple\n",
"backtick sequence**:\n",
"\n",
"-----------------------------------------------------------------------------------------------------------\n",
"### Split Markdown at Headings\n",
"\n",
"It helps to have a [syntax](https://daringfireball.net/projects/markdown/syntax) reference for markdown close. As you can read on\n",
"[learn.microsoft.com](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions):\n",
"\n",
"> The token limit for gpt-35-turbo is 4096 tokens. These limits include the token count from both the message array sent and the model response. The\n",
"> number of tokens\\[^numtokens\\] in the messages array combined with the value of the max_tokens parameter must stay under these limits or you'll receive an error.\n",
"\n",
"This means that you can't send your whole blog post in one go to the chatgpt API, but you have to process it in pieces. In addition the approach to\n",
"process the blog post in pieces will also make it easier for you later to integrate the suggestions of the AI into your blog post.\n",
"\n",
"I did some searching and it was not as easy as I hoped for to find a python library that allowed me to easily split the input blog post at\n",
"headings. Finally I ended up using [markdown-it-py](https://github.com/executablebooks/markdown-it-py). But `markdown_it` is meant to be used to\n",
"translate markdown to HTML and out of the box does not work as markdown to markdown converter. After some digging I found at the bottom of its\n",
"[using](https://markdown-it-py.readthedocs.io/en/latest/using.html) documentation page that you can use\n",
"[mdformat](https://github.com/executablebooks/mdformat) in combination with `markdown_it`.\n",
"\n",
"In addition I remove any `fence` token that anyway does not belong to the standard text flow of the blog post. This results in the following helper\n",
"function:\n",
"\n",
"If you wander about the double call to `md.parse()`: I am not sure if this is strictly necessary, but I noticed that some `fence` tokens might be\n",
"missed otherwise.\n",
"\n",
"Now you can try the splitting function on our dummy mardkown text block:\n",
"\n",
"And should see the following result:\n",
"\n",
"-----------------------------------------------------------------------------------------------------------\n"
]
}
],
"source": [
"sections = extract_md_sections(blog_post)\n",
"for section in sections:\n",
" print(section)\n",
" print('-----------------------------------------------------------------------------------------------------------') "
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Rational\n",
"\n",
"Recently, I participated in the free [ChatGPT Prompt Engineering for\n",
"Developers](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers) course by Andrew Ng and Isa Fulford. This triggered\n",
"an older urge to use an AI to help me fine tune text that I writte. While simple grammar and spelling mistakes can be spoted by MS Word and similar I\n",
"wanted the AI to also make my text more compelling and enhance its appeal. The AI should help me to ensure a smooth and engaging reading\n",
"experience. Below you can read my first steps towards that goal.\n",
"\n"
]
}
],
"source": [
"i = 0\n",
"print(sections[i])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Rational\n",
"\n",
"Recently, I participated in the free ChatGPT Prompt Engineering for Developers course by Andrew Ng and Isa Fulford. This triggered an older urge to use an AI to help me fine tune text that I writte. While simple grammar and spelling mistakes can be spoted by MS Word and similar I wanted the AI to also make my text more compelling and enhance its appeal. The AI should help me to ensure a smooth and engaging reading experience. Below you can read my first steps towards that goal.\n"
]
}
],
"source": [
"# The result of this first step should not contain any markup and ignore embedded images no matter if the images are embedded via Markdown or HTML tags.\n",
"prompt = f\"\"\"\n",
"Below a text delimited by `'` is provided to you. The text is a snippet from a blog post written in a mix of Markdown and HTML markup.\n",
"\n",
"As a first step extract the pure text. In this first step keep the markup for ordered or unordered lists but pay close attention to remove all other markup and especially ignore embedded images no matter if the images are embedded via Markdown or HTML tags.\n",
"\n",
"As a second step use the output of the first step and ensure that newlines are only used to separate sections and at the end of enumeration items of an ordered or unordered list.\n",
"\n",
"Provide as your response the output of the second step.\n",
"\n",
"`'`{sections[i]}`'`\n",
"\"\"\"\n",
"\n",
"response = get_completion(prompt)\n",
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Rational\n",
"\n",
"Recently, I participated in the free ChatGPT Prompt Engineering for Developers course by Andrew Ng and Isa Fulford. This triggered an older urge to use an AI to help me fine-tune text that I write. While simple grammar and spelling mistakes can be spotted by MS Word and similar, I wanted the AI to also make my text more compelling and enhance its appeal. The AI should help me ensure a smooth and engaging reading experience. Below, you can read my first steps towards that goal.\n"
]
}
],
"source": [
"prompt = f\"\"\"Proofread and correct the following section of a blog post. Stay as close as possible to the original and only make modifications to correct grammar or spelling mistakes. Text: ```{response}```\"\"\"\n",
"response2 = get_completion(prompt)\n",
"# display(Markdown(response))\n",
"print(response2)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"## Rational\n",
"\n",
"Recently, I participated in the free ChatGPT Prompt Engineering for Developers course by Andrew Ng and Isa Fulford. This triggered an older urge to use an AI to help me <span style=\"color:red;font-weight:700;text-decoration:line-through;\">fine tune </span><span style=\"color:red;font-weight:700;\">fine-tune </span>text that I <span style=\"color:red;font-weight:700;text-decoration:line-through;\">writte. </span><span style=\"color:red;font-weight:700;\">write. </span>While simple grammar and spelling mistakes can be <span style=\"color:red;font-weight:700;text-decoration:line-through;\">spoted </span><span style=\"color:red;font-weight:700;\">spotted </span>by MS Word and <span style=\"color:red;font-weight:700;text-decoration:line-through;\">similar </span><span style=\"color:red;font-weight:700;\">similar, </span>I wanted the AI to also make my text more compelling and enhance its appeal. The AI should help me <span style=\"color:red;font-weight:700;text-decoration:line-through;\">to </span>ensure a smooth and engaging reading experience. <span style=\"color:red;font-weight:700;text-decoration:line-through;\">Below </span><span style=\"color:red;font-weight:700;\">Below, </span>you can read my first steps towards that goal."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"diff = Redlines(response,response2)\n",
"display(Markdown(diff.output_markdown))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Introduction\n",
"\n",
"As a technical hobbyist, I'm always on the lookout for ways to improve my writing skills. Recently, I had the opportunity to participate in the free ChatGPT Prompt Engineering for Developers course by Andrew Ng and Isa Fulford. This course reignited my interest in using AI to fine-tune my writing. While tools like MS Word can catch simple grammar and spelling mistakes, I wanted an AI that could take my writing to the next level. I wanted to create a more compelling and engaging reading experience for my audience. In this blog post, I'll share my first steps towards achieving that goal.\n"
]
}
],
"source": [
"# prompt = f\"\"\"\n",
"# Revise the following blog post excerpt, maintaining its original length and style, while enhancing its appeal for a technical hobbyist audience. \n",
"# Improve the reading experience to be smoother and more engaging by making a minimal set of modifications to the original. \n",
"# Text: ```{response2}```\n",
"# \"\"\"\n",
"# prompt = f\"\"\"\n",
"# Refine the following blog post excerpt with minimal alterations, preserving its original length and style, while enhancing its appeal for a discerning reader. Make limited yet impactful changes for a smooth and engaging reading experience. Text: ```{response2}```\n",
"# \"\"\"\n",
"\n",
"prompt = f\"\"\"\n",
"Below a text delimited by triple backticks is provided to you. The text is a snippet from a blog post.\n",
"\n",
"Walk through the blog post snippet paragraph by paragraph and make a few limited yet impactful changes for a smooth and engaging reading experience targeting a technical hobbyist audience.\n",
"\n",
"Text: ```{response2}```\n",
"\"\"\"\n",
"\n",
"response3 = get_completion(prompt)\n",
"print(response3)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# ------------------------------------------------------------------"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Get Started\n",
"\n",
"Before you start you will need a paid account on [platform.openai.com](https://platform.openai.com/), because only then you'll get access to the API\n",
"version of ChatGPT. Some users already got beta access to `GPT-4`, but in general you will only have access to `gpt-3.5-turbo` with the API. For our\n",
"purposes here that is good enough. And don't worry about the costs. This is only a couple of cents even for extended uses of the API. I never managed\n",
"to get over $1 so far. Actually all my efforts to get the code for this blog post working only cost USD 0.02.\n",
"\n",
"Once you have acces to [platform.openai.com](https://platform.openai.com/) you will need to create an API key and put it in a `.env` file like so:\n",
"\n"
]
}
],
"source": [
"i = 1\n",
"print(sections[i])"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Get Started\n",
"\n",
"Before you start you will need a paid account on platform.openai.com, because only then you'll get access to the API version of ChatGPT. Some users already got beta access to GPT-4, but in general you will only have access to gpt-3.5-turbo with the API. For our purposes here that is good enough. And don't worry about the costs. This is only a couple of cents even for extended uses of the API. I never managed to get over $1 so far. Actually all my efforts to get the code for this blog post working only cost USD 0.02.\n",
"\n",
"Once you have acces to platform.openai.com you will need to create an API key and put it in a `.env` file like so:\n"
]
}
],
"source": [
"# The result of this first step should not contain any markup and ignore embedded images no matter if the images are embedded via Markdown or HTML tags.\n",
"prompt = f\"\"\"\n",
"Below a text delimited by `'` is provided to you. The text is a snippet from a blog post written in a mix of Markdown and HTML markup.\n",
"\n",
"As a first step extract the pure text. In this first step keep the markup for ordered or unordered lists but pay close attention to remove all other markup and especially ignore embedded images no matter if the images are embedded via Markdown or HTML tags.\n",
"\n",
"As a second step use the output of the first step and ensure that newlines are only used to separate sections and at the end of enumeration items of an ordered or unordered list.\n",
"\n",
"Provide as your response the output of the second step.\n",
"\n",
"`'`{sections[i]}`'`\n",
"\"\"\"\n",
"\n",
"response = get_completion(prompt)\n",
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Get Started\n",
"\n",
"Before you start, you will need a paid account on platform.openai.com because only then will you get access to the API version of ChatGPT. Some users have already received beta access to GPT-4, but in general, you will only have access to gpt-3.5-turbo with the API. For our purposes here, that is good enough. And don't worry about the costs. This is only a couple of cents, even for extended uses of the API. I have never managed to spend over $1 so far. Actually, all my efforts to get the code for this blog post working only cost USD 0.02.\n",
"\n",
"Once you have access to platform.openai.com, you will need to create an API key and put it in a `.env` file like so:```\n"
]
}
],
"source": [
"prompt = f\"\"\"Proofread and correct the following section of a blog post. Stay as close as possible to the original and only make modifications to correct grammar or spelling mistakes. Text: ```{response}```\"\"\"\n",
"response2 = get_completion(prompt)\n",
"# display(Markdown(response))\n",
"print(response2)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"## Get Started\n",
"\n",
"Before you <span style=\"color:red;font-weight:700;text-decoration:line-through;\">start </span><span style=\"color:red;font-weight:700;\">start, </span>you will need a paid account on <span style=\"color:red;font-weight:700;text-decoration:line-through;\">platform.openai.com, </span><span style=\"color:red;font-weight:700;\">platform.openai.com </span>because only then <span style=\"color:red;font-weight:700;text-decoration:line-through;\">you'll </span><span style=\"color:red;font-weight:700;\">will you </span>get access to the API version of ChatGPT. Some users <span style=\"color:red;font-weight:700;\">have </span>already <span style=\"color:red;font-weight:700;text-decoration:line-through;\">got </span><span style=\"color:red;font-weight:700;\">received </span>beta access to GPT-4, but in <span style=\"color:red;font-weight:700;text-decoration:line-through;\">general </span><span style=\"color:red;font-weight:700;\">general, </span>you will only have access to gpt-3.5-turbo with the API. For our purposes <span style=\"color:red;font-weight:700;text-decoration:line-through;\">here </span><span style=\"color:red;font-weight:700;\">here, </span>that is good enough. And don't worry about the costs. This is only a couple of <span style=\"color:red;font-weight:700;text-decoration:line-through;\">cents </span><span style=\"color:red;font-weight:700;\">cents, </span>even for extended uses of the API. I <span style=\"color:red;font-weight:700;\">have </span>never managed to <span style=\"color:red;font-weight:700;text-decoration:line-through;\">get </span><span style=\"color:red;font-weight:700;\">spend </span>over $1 so far. <span style=\"color:red;font-weight:700;text-decoration:line-through;\">Actually </span><span style=\"color:red;font-weight:700;\">Actually, </span>all my efforts to get the code for this blog post working only cost USD 0.02.\n",
"\n",
"Once you have <span style=\"color:red;font-weight:700;text-decoration:line-through;\">acces </span><span style=\"color:red;font-weight:700;\">access </span>to <span style=\"color:red;font-weight:700;text-decoration:line-through;\">platform.openai.com </span><span style=\"color:red;font-weight:700;\">platform.openai.com, </span>you will need to create an API key and put it in a `.env` file like <span style=\"color:red;font-weight:700;text-decoration:line-through;\">so:</span><span style=\"color:red;font-weight:700;\">so:```</span>"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"diff = Redlines(response,response2)\n",
"display(Markdown(diff.output_markdown))"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## Getting Started with ChatGPT API\n",
"\n",
"To get started with ChatGPT API, you will need a paid account on platform.openai.com. This will give you access to the API version of ChatGPT. Although some users have already received beta access to GPT-4, for our purposes here, gpt-3.5-turbo with the API is good enough. Don't worry about the costs, as it only costs a couple of cents, even for extended uses of the API. In fact, I have never spent over $1 so far. To get the code for this blog post working, I only spent USD 0.02.\n",
"\n",
"To create an API key, you need to log in to platform.openai.com and follow the instructions. Once you have created the API key, put it in a `.env` file as shown below:`````\n"
]
}
],
"source": [
"# prompt = f\"\"\"\n",
"# Revise the following blog post excerpt, maintaining its original length and style, while enhancing its appeal for a technical hobbyist audience. \n",
"# Improve the reading experience to be smoother and more engaging by making a minimal set of modifications to the original. \n",
"# Text: ```{response2}```\n",
"# \"\"\"\n",
"# prompt = f\"\"\"\n",
"# Refine the following blog post excerpt with minimal alterations, preserving its original length and style, while enhancing its appeal for a discerning reader. Make limited yet impactful changes for a smooth and engaging reading experience. Text: ```{response2}```\n",
"# \"\"\"\n",
"\n",
"prompt = f\"\"\"\n",
"Below, a text delimited by triple quotes is provided to you. The text is a snippet from a blog post.\n",
"\n",
"Walk through the blog post snippet paragraph by paragraph and make a few limited yet impactful changes for a smooth and engaging reading experience targeting a technical hobbyist audience.\n",
"\n",
"Text: ```{response2}```\n",
"\"\"\"\n",
"\n",
"response3 = get_completion(prompt)\n",
"print(response3)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# ------------------------------------------------------------------"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"### Split Markdown at Headings\n",
"\n",
"It helps to have a [syntax](https://daringfireball.net/projects/markdown/syntax) reference for markdown close. As you can read on\n",
"[learn.microsoft.com](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions):\n",
"\n",
"> The token limit for gpt-35-turbo is 4096 tokens. These limits include the token count from both the message array sent and the model response. The\n",
"> number of tokens\\[^numtokens\\] in the messages array combined with the value of the max_tokens parameter must stay under these limits or you'll receive an error.\n",
"\n",
"This means that you can't send your whole blog post in one go to the chatgpt API, but you have to process it in pieces. In addition the approach to\n",
"process the blog post in pieces will also make it easier for you later to integrate the suggestions of the AI into your blog post.\n",
"\n",
"I did some searching and it was not as easy as I hoped for to find a python library that allowed me to easily split the input blog post at\n",
"headings. Finally I ended up using [markdown-it-py](https://github.com/executablebooks/markdown-it-py). But `markdown_it` is meant to be used to\n",
"translate markdown to HTML and out of the box does not work as markdown to markdown converter. After some digging I found at the bottom of its\n",
"[using](https://markdown-it-py.readthedocs.io/en/latest/using.html) documentation page that you can use\n",
"[mdformat](https://github.com/executablebooks/mdformat) in combination with `markdown_it`.\n",
"\n",
"In addition I remove any `fence` token that anyway does not belong to the standard text flow of the blog post. This results in the following helper\n",
"function:\n",
"\n",
"If you wander about the double call to `md.parse()`: I am not sure if this is strictly necessary, but I noticed that some `fence` tokens might be\n",
"missed otherwise.\n",
"\n",
"Now you can try the splitting function on our dummy mardkown text block:\n",
"\n",
"And should see the following result:\n",
"\n"
]
}
],
"source": [
"i = 4\n",
"print(sections[i])"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Split Markdown at Headings\n",
"\n",
"It helps to have a syntax reference for markdown close. As you can read on The token limit for gpt-35-turbo is 4096 tokens. These limits include the token count from both the message array sent and the model response. The number of tokens in the messages array combined with the value of the max_tokens parameter must stay under these limits or you'll receive an error.\n",
"\n",
"This means that you can't send your whole blog post in one go to the chatgpt API, but you have to process it in pieces. In addition the approach to process the blog post in pieces will also make it easier for you later to integrate the suggestions of the AI into your blog post.\n",
"\n",
"I did some searching and it was not as easy as I hoped for to find a python library that allowed me to easily split the input blog post at headings. Finally I ended up using markdown-it-py. But markdown_it is meant to be used to translate markdown to HTML and out of the box does not work as markdown to markdown converter. After some digging I found at the bottom of its using documentation page that you can use mdformat in combination with markdown_it.\n",
"\n",
"In addition I remove any fence token that anyway does not belong to the standard text flow of the blog post. This results in the following helper function:\n",
"\n",
"If you wander about the double call to md.parse(): I am not sure if this is strictly necessary, but I noticed that some fence tokens might be missed otherwise.\n",
"\n",
"Now you can try the splitting function on our dummy mardkown text block:\n",
"\n",
"And should see the following result:\n"
]
}
],
"source": [
"# The result of this first step should not contain any markup and should ignore embedded images, no matter if the images are embedded via Markdown or HTML tags.\n",
"prompt = f\"\"\"\n",
"Below, a text delimited by `'` is provided to you. The text is a snippet from a blog post written in a mix of Markdown and HTML markup.\n",
"\n",
"As a first step, extract the pure text. In this first step, keep the markup for ordered or unordered lists, but pay close attention to remove all other markup, and especially ignore embedded images no matter if the images are embedded via Markdown or HTML tags.\n",
"\n",
"As a second step, use the output of the first step and ensure that newlines are only used to separate sections and at the end of enumeration items of an ordered or unordered list.\n",
"\n",
"Provide as your response the output of the second step.\n",
"\n",
"`'`{sections[i]}`'`\n",
"\"\"\"\n",
"\n",
"response = get_completion(prompt)\n",
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Split Markdown at Headings\n",
"\n",
"It helps to have a syntax reference for Markdown close. As you can read on, the token limit for GPT-35-Turbo is 4096 tokens. These limits include the token count from both the message array sent and the model response. The number of tokens in the messages array combined with the value of the max_tokens parameter must stay under these limits, or you'll receive an error.\n",
"\n",
"This means that you can't send your whole blog post in one go to the ChatGPT API, but you have to process it in pieces. In addition, the approach to process the blog post in pieces will also make it easier for you later to integrate the suggestions of the AI into your blog post.\n",
"\n",
"I did some searching, and it was not as easy as I hoped to find a Python library that allowed me to easily split the input blog post at headings. Finally, I ended up using markdown-it-py. But markdown_it is meant to be used to translate Markdown to HTML and out of the box does not work as a Markdown to Markdown converter. After some digging, I found at the bottom of its using documentation page that you can use mdformat in combination with markdown_it.\n",
"\n",
"In addition, I remove any fence token that does not belong to the standard text flow of the blog post. This results in the following helper function:\n",
"\n",
"If you wonder about the double call to md.parse(): I am not sure if this is strictly necessary, but I noticed that some fence tokens might be missed otherwise.\n",
"\n",
"Now you can try the splitting function on our dummy Markdown text block:\n",
"\n",
"And should see the following result:\n"
]
}
],
"source": [
"prompt = f\"\"\"Proofread and correct the following section of a blog post. Stay as close as possible to the original and only make modifications to correct grammar or spelling mistakes. Text: ```{response}```\"\"\"\n",
"response2 = get_completion(prompt)\n",
"# display(Markdown(response))\n",
"print(response2)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"Split Markdown at Headings\n",
"\n",
"It helps to have a syntax reference for <span style=\"color:red;font-weight:700;text-decoration:line-through;\">markdown </span><span style=\"color:red;font-weight:700;\">Markdown </span>close. As you can read <span style=\"color:red;font-weight:700;text-decoration:line-through;\">on The </span><span style=\"color:red;font-weight:700;\">on, the </span>token limit for <span style=\"color:red;font-weight:700;text-decoration:line-through;\">gpt-35-turbo </span><span style=\"color:red;font-weight:700;\">GPT-35-Turbo </span>is 4096 tokens. These limits include the token count from both the message array sent and the model response. The number of tokens in the messages array combined with the value of the max_tokens parameter must stay under these <span style=\"color:red;font-weight:700;text-decoration:line-through;\">limits </span><span style=\"color:red;font-weight:700;\">limits, </span>or you'll receive an error.\n",
"\n",
"This means that you can't send your whole blog post in one go to the <span style=\"color:red;font-weight:700;text-decoration:line-through;\">chatgpt </span><span style=\"color:red;font-weight:700;\">ChatGPT </span>API, but you have to process it in pieces. In <span style=\"color:red;font-weight:700;text-decoration:line-through;\">addition </span><span style=\"color:red;font-weight:700;\">addition, </span>the approach to process the blog post in pieces will also make it easier for you later to integrate the suggestions of the AI into your blog post.\n",
"\n",
"I did some <span style=\"color:red;font-weight:700;text-decoration:line-through;\">searching </span><span style=\"color:red;font-weight:700;\">searching, </span>and it was not as easy as I hoped <span style=\"color:red;font-weight:700;text-decoration:line-through;\">for </span>to find a <span style=\"color:red;font-weight:700;text-decoration:line-through;\">python </span><span style=\"color:red;font-weight:700;\">Python </span>library that allowed me to easily split the input blog post at headings. <span style=\"color:red;font-weight:700;text-decoration:line-through;\">Finally </span><span style=\"color:red;font-weight:700;\">Finally, </span>I ended up using markdown-it-py. But markdown_it is meant to be used to translate <span style=\"color:red;font-weight:700;text-decoration:line-through;\">markdown </span><span style=\"color:red;font-weight:700;\">Markdown </span>to HTML and out of the box does not work as <span style=\"color:red;font-weight:700;text-decoration:line-through;\">markdown to markdown </span><span style=\"color:red;font-weight:700;\">a Markdown to Markdown </span>converter. After some <span style=\"color:red;font-weight:700;text-decoration:line-through;\">digging </span><span style=\"color:red;font-weight:700;\">digging, </span>I found at the bottom of its using documentation page that you can use mdformat in combination with markdown_it.\n",
"\n",
"In <span style=\"color:red;font-weight:700;text-decoration:line-through;\">addition </span><span style=\"color:red;font-weight:700;\">addition, </span>I remove any fence token that <span style=\"color:red;font-weight:700;text-decoration:line-through;\">anyway </span>does not belong to the standard text flow of the blog post. This results in the following helper function:\n",
"\n",
"If you <span style=\"color:red;font-weight:700;text-decoration:line-through;\">wander </span><span style=\"color:red;font-weight:700;\">wonder </span>about the double call to md.parse(): I am not sure if this is strictly necessary, but I noticed that some fence tokens might be missed otherwise.\n",
"\n",
"Now you can try the splitting function on our dummy <span style=\"color:red;font-weight:700;text-decoration:line-through;\">mardkown </span><span style=\"color:red;font-weight:700;\">Markdown </span>text block:\n",
"\n",
"And should see the following result:"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"diff = Redlines(response,response2)\n",
"display(Markdown(diff.output_markdown))"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Splitting Markdown at Headings\n",
"\n",
"For technical hobbyists, having a syntax reference for Markdown close by is always helpful. As you read on, keep in mind that the token limit for GPT-35-Turbo is 4096 tokens, which includes the token count from both the message array sent and the model response. If the number of tokens in the messages array combined with the value of the max_tokens parameter exceeds these limits, you'll receive an error.\n",
"\n",
"This means that you can't send your entire blog post in one go to the ChatGPT API. Instead, you have to process it in pieces. This approach will also make it easier for you to integrate the AI's suggestions into your blog post later on.\n",
"\n",
"I searched for a Python library that would allow me to easily split the input blog post at headings, but it wasn't as easy as I had hoped. Eventually, I ended up using markdown-it-py. However, markdown_it is meant to be used to translate Markdown to HTML and out of the box does not work as a Markdown to Markdown converter. After some digging, I found out that you can use mdformat in combination with markdown_it.\n",
"\n",
"Additionally, I remove any fence token that does not belong to the standard text flow of the blog post. This results in the following helper function:\n",
"\n",
"If you're wondering about the double call to md.parse(), I'm not entirely sure if it's necessary, but I noticed that some fence tokens might be missed otherwise.\n",
"\n",
"Now, you can try the splitting function on our dummy Markdown text block and see the following result:\n"
]
}
],
"source": [
"# prompt = f\"\"\"\n",
"# Revise the following blog post excerpt, maintaining its original length and style, while enhancing its appeal for a technical hobbyist audience. \n",
"# Improve the reading experience to be smoother and more engaging by making a minimal set of modifications to the original. \n",
"# Text: ```{response2}```\n",
"# \"\"\"\n",
"# prompt = f\"\"\"\n",
"# Refine the following blog post excerpt with minimal alterations, preserving its original length and style, while enhancing its appeal for a discerning reader. Make limited yet impactful changes for a smooth and engaging reading experience. Text: ```{response2}```\n",
"# \"\"\"\n",
"\n",
"prompt = f\"\"\"\n",
"Below, a text delimited by triple quotes is provided to you. The text is a snippet from a blog post.\n",
"\n",
"Walk through the blog post snippet paragraph by paragraph and make a few limited yet impactful changes for a smooth and engaging reading experience targeting a technical hobbyist audience.\n",
"\n",
"Text: ```{response2}```\n",
"\"\"\"\n",
"\n",
"response3 = get_completion(prompt)\n",
"print(response3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
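The heading-based splitting that the notebook performs with markdown-it-py and mdformat can be sketched, for illustration, as a plain-Python fallback. Note the assumptions: the function name `split_at_headings` is hypothetical, and this regex sketch only handles ATX-style headings (`#`, `##`, …); the notebook's actual helper uses a real Markdown parser, drops `fence` tokens, and round-trips the output through mdformat.

```python
import re

def split_at_headings(md_text: str) -> list[str]:
    # Hypothetical stand-in for the notebook's markdown-it-py based
    # splitter: splits a Markdown string into sections at ATX headings,
    # skipping heading-like lines that sit inside fenced code blocks.
    lines = md_text.splitlines(keepends=True)
    heading = re.compile(r"^#{1,6}\s")
    fence = re.compile(r"^(```|~~~)")
    starts, in_fence = [], False
    for i, line in enumerate(lines):
        if fence.match(line):
            in_fence = not in_fence
        elif not in_fence and heading.match(line):
            starts.append(i)
    if not starts or starts[0] != 0:
        starts = [0] + starts  # keep any preamble before the first heading
    bounds = starts + [len(lines)]
    return ["".join(lines[a:b]) for a, b in zip(bounds, bounds[1:])]
```

Unlike the notebook's version, this sketch does not normalize the Markdown or remove fence blocks; it only illustrates how section boundaries are found so that each section can be sent to the ChatGPT API separately, staying under the token limit.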
cs224 commented Jun 16, 2023