Demo of synchronous and token streaming text generation
{
"cells": [
{
"cell_type": "code",
"execution_count": 125,
"id": "d016767e-50a9-4f73-9ed1-45faefce61da",
"metadata": {},
"outputs": [],
"source": [
"# Uncomment to install the Python client\n",
"# %pip install text-generation"
]
},
{
"cell_type": "code",
"execution_count": 130,
"id": "0106a743-51c8-442c-9562-aca41c2c8e7e",
"metadata": {},
"outputs": [],
"source": [
"from typing import Dict, List\n",
"\n",
"from text_generation import Client\n",
"\n",
"API_URL = \"https://api-inference.huggingface.co/models/HuggingFaceH4/starchat-beta\"\n",
"# Add your Hugging Face access token from here: https://huggingface.co/settings/tokens\n",
"headers = {\"Authorization\": \"Bearer <HF_TOKEN>\"}\n",
"client = Client(base_url=API_URL, headers=headers)\n",
"\n",
"generate_kwargs = dict(\n",
" temperature=0.7,\n",
" max_new_tokens=256,\n",
" top_p=0.9,\n",
" repetition_penalty=1.2,\n",
" do_sample=True,\n",
" seed=42, # Change this to sample different responses\n",
" truncate=4096,\n",
" stop_sequences=[\"<|end|>\"],\n",
")"
]
},
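{
"cell_type": "markdown",
"id": "added-token-env-note",
"metadata": {},
"source": [
"As an optional sketch, the access token can be read from an environment variable instead of being hard-coded in `headers`. The cell below assumes the token has been exported as `HF_TOKEN`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-token-env-code",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: read the access token from the environment instead of hard-coding it\n",
"# Assumes the token has been exported, e.g. `export HF_TOKEN=hf_xxx`\n",
"import os\n",
"\n",
"hf_token = os.environ.get(\"HF_TOKEN\")\n",
"if hf_token is not None:\n",
"    headers = {\"Authorization\": f\"Bearer {hf_token}\"}\n",
"    client = Client(base_url=API_URL, headers=headers)"
]
},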
{
"cell_type": "code",
"execution_count": 131,
"id": "bc0292c1-4cfd-45aa-ac34-091b7e470ded",
"metadata": {},
"outputs": [],
"source": [
"def prepare_prompt(messages: List[Dict[str, str]]) -> str:\n",
" \"\"\"Formats a list of messages according to their role\"\"\"\n",
" prompt = \"<|system|>\\n<|end|>\\n\"\n",
" if messages is None:\n",
" raise ValueError(\"Dialogue template must have at least one message.\")\n",
" for message in messages:\n",
" if message[\"role\"] == \"user\":\n",
" prompt += \"<|user|>\" + \"\\n\" + message[\"content\"] + \"<|end|>\" + \"\\n\"\n",
" else:\n",
" prompt += \"<|assistant|>\" + \"\\n\" + message[\"content\"] + \"<|end|>\" + \"\\n\"\n",
" prompt += \"<|assistant|>\" + \"\\n\"\n",
" return prompt"
]
},
{
"cell_type": "markdown",
"id": "95cacc1f-ab8a-412c-96b1-512812557d37",
"metadata": {},
"source": [
"## Synchronous generation (no streaming)"
]
},
{
"cell_type": "code",
"execution_count": 132,
"id": "687fb045-6459-45b1-b9ac-7b0b0ac2d9b7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<|system|>\n",
"<|end|>\n",
"<|user|>\n",
"What is RLHF?<|end|>\n",
"<|assistant|>\n",
"\n"
]
}
],
"source": [
"# This is the initial user query\n",
"messages = [{\"role\": \"user\", \"content\": \"What is RLHF?\"}]\n",
"prompt = prepare_prompt(messages)\n",
"print(prompt)"
]
},
{
"cell_type": "code",
"execution_count": 133,
"id": "7fafff9c-bff6-4a68-919f-83977a238aef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reinforcement learning with human feedback was first introduced in 2018 by researchers at OpenAI. They proposed a method called Prompting as Reward Engineering (PROMPT), where they used crowdsourced human feedback to improve the performance of an RL algorithm. Since then, RLHF has been explored in various fields such as robotics, natural language processing, and computer vision. However, its application remains limited due to challenges related to collecting high-quality feedback, annotating large amounts of data, and integrating RLHF into existing production pipelines.<|end|>\n",
"CPU times: user 41.1 ms, sys: 4.82 ms, total: 45.9 ms\n",
"Wall time: 704 ms\n"
]
}
],
"source": [
"%%time\n",
"# Now generate a response from the endpoint\n",
"output = client.generate(prompt, **generate_kwargs)\n",
"print(outputs.generated_text)"
]
},
{
"cell_type": "code",
"execution_count": 134,
"id": "6aff3600-776f-43f6-aacc-8b39175051dd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<|system|>\n",
"<|end|>\n",
"<|user|>\n",
"What is RLHF?<|end|>\n",
"<|assistant|>\n",
"RLHF stands for Reinforcement Learning with Human Feedback. It's a type of reinforcement learning in which the agent learns to make decisions based on feedback from an external source, usually human experts or users. In this case, the goal is to train an AI model that can learn and adapt to new situations by using real-world data collected through user interactions. The ultimate aim is to create systems that can perform better than humans in complex tasks while also being able to explain their actions and reasoning process.<|end|>\n",
"<|user|>\n",
"Thanks! And who invented it?<|end|>\n",
"<|assistant|>\n",
"\n"
]
}
],
"source": [
"# Append response to history\n",
"messages.append(\n",
" {\"role\": \"assistant\", \"content\": output.generated_text.strip(\"<|end|>\")}\n",
")\n",
"# This is the second turn of the dialogue\n",
"messages.append({\"role\": \"user\", \"content\": \"Thanks! And who invented it?\"})\n",
"prompt = prepare_prompt(messages)\n",
"print(prompt)"
]
},
{
"cell_type": "code",
"execution_count": 135,
"id": "367bda81-a9d0-4080-9c64-b23e9479576d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reinforcement learning with human feedback was first introduced in 2018 by researchers at OpenAI. They proposed a method called Prompting as Reward Engineering (PROMPT), where they used crowdsourced human feedback to improve the performance of an RL algorithm. Since then, RLHF has been explored in various fields such as robotics, natural language processing, and computer vision. However, its application remains limited due to challenges related to collecting high-quality feedback, annotating large amounts of data, and integrating RLHF into existing production pipelines.<|end|>\n",
"CPU times: user 49.3 ms, sys: 6.69 ms, total: 56 ms\n",
"Wall time: 675 ms\n"
]
}
],
"source": [
"%%time\n",
"# Generate the assistant response and repeat\n",
"output = client.generate(prompt, **generate_kwargs)\n",
"print(outputs.generated_text)"
]
},
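{
"cell_type": "markdown",
"id": "added-chat-turn-note",
"metadata": {},
"source": [
"The two turns above repeat the same pattern: append the user message, rebuild the prompt, generate, and append the reply. A minimal sketch that packages this into a single helper (the name `chat_turn` is hypothetical; it only reuses the pieces defined above) is shown in the next cell, e.g. `chat_turn(messages, \"Thanks! And who invented it?\")`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-chat-turn-code",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: wrap one dialogue turn (append user message, build prompt, generate, append reply)\n",
"# `chat_turn` is a hypothetical helper built only from the functions defined earlier in this notebook\n",
"def chat_turn(messages: List[Dict[str, str]], user_content: str) -> str:\n",
"    messages.append({\"role\": \"user\", \"content\": user_content})\n",
"    prompt = prepare_prompt(messages)\n",
"    output = client.generate(prompt, **generate_kwargs)\n",
"    reply = output.generated_text.replace(\"<|end|>\", \"\").strip()\n",
"    messages.append({\"role\": \"assistant\", \"content\": reply})\n",
"    return reply"
]
},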
{
"cell_type": "markdown",
"id": "0cc3b011-ffe1-4251-aded-119ebeac8c27",
"metadata": {},
"source": [
"## Streaming"
]
},
{
"cell_type": "code",
"execution_count": 136,
"id": "1023e698-8078-4370-9a25-855467185db5",
"metadata": {},
"outputs": [],
"source": [
"def generate(prompt):\n",
" stream = client.generate_stream(\n",
" prompt,\n",
" **generate_kwargs,\n",
" )\n",
"\n",
" output = \"\"\n",
" for idx, response in enumerate(stream):\n",
" # Skip special tokens from output\n",
" if response.token.special:\n",
" continue\n",
" output += response.token.text\n",
" print(response.token.text)\n",
" return output"
]
},
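{
"cell_type": "markdown",
"id": "added-inline-stream-note",
"metadata": {},
"source": [
"The `generate` helper above prints each token on its own line, which makes the stream easy to inspect. For a more typical typewriter-style display, the tokens can be printed inline instead; the sketch below (with the hypothetical name `generate_inline`) only changes the `print` call."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-inline-stream-code",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: same streaming loop as generate(), but print tokens inline for a typewriter-style display\n",
"# `generate_inline` is a hypothetical name; the client call is identical to the one in generate()\n",
"def generate_inline(prompt):\n",
"    stream = client.generate_stream(prompt, **generate_kwargs)\n",
"\n",
"    output = \"\"\n",
"    for response in stream:\n",
"        # Skip special tokens (e.g. <|end|>) from the output\n",
"        if response.token.special:\n",
"            continue\n",
"        output += response.token.text\n",
"        print(response.token.text, end=\"\", flush=True)\n",
"    print()\n",
"    return output"
]
},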
{
"cell_type": "code",
"execution_count": 137,
"id": "debaa1e2-03aa-450b-a7de-2bf99433d144",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<|system|>\n",
"<|end|>\n",
"<|user|>\n",
"What is RLHF?<|end|>\n",
"<|assistant|>\n",
"\n"
]
}
],
"source": [
"# This is the initial user query\n",
"messages = [{\"role\": \"user\", \"content\": \"What is RLHF?\"}]\n",
"prompt = prepare_prompt(messages)\n",
"print(prompt)"
]
},
{
"cell_type": "code",
"execution_count": 138,
"id": "fe04bada-9c47-429f-8e0b-94ea3f6ff5c7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RL\n",
"HF\n",
" stands\n",
" for\n",
" Re\n",
"in\n",
"forcement\n",
" Learning\n",
" with\n",
" Human\n",
" Feedback\n",
".\n",
" It\n",
"'s\n",
" a\n",
" type\n",
" of\n",
" rein\n",
"forcement\n",
" learning\n",
" where\n",
" the\n",
" agent\n",
" le\n",
"ar\n",
"ns\n",
" to\n",
" perform\n",
" an\n",
" action\n",
" based\n",
" on\n",
" feedback\n",
" from\n",
" humans\n",
" or\n",
" other\n",
" agents\n",
".\n",
" The\n",
" goal\n",
" of\n",
" RL\n",
"HF\n",
" is\n",
" to\n",
" create\n",
" an\n",
" int\n",
"elligent\n",
" system\n",
" that\n",
" can\n",
" learn\n",
" through\n",
" interaction\n",
" and\n",
" collaboration\n",
" with\n",
" human\n",
" ex\n",
"perts\n",
",\n",
" making\n",
" it\n",
" more\n",
" effective\n",
" and\n",
" efficient\n",
" in\n",
" solving\n",
" complex\n",
" problems\n",
".\n",
" In\n",
" this\n",
" way\n",
",\n",
" RL\n",
"HF\n",
" combin\n",
"es\n",
" the\n",
" power\n",
" of\n",
" machine\n",
" learning\n",
" algorithms\n",
" with\n",
" the\n",
" knowledge\n",
" and\n",
" skills\n",
" of\n",
" human\n",
" ex\n",
"perts\n",
" to\n",
" achieve\n",
" better\n",
" results\n",
" than\n",
" either\n",
" approach\n",
" alone\n",
".\n",
"CPU times: user 52.9 ms, sys: 7.7 ms, total: 60.6 ms\n",
"Wall time: 676 ms\n"
]
}
],
"source": [
"%%time\n",
"output = generate(prompt)"
]
},
{
"cell_type": "code",
"execution_count": 139,
"id": "4f5dde5d-812b-4b81-b161-864ffa34f88c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<|system|>\n",
"<|end|>\n",
"<|user|>\n",
"What is RLHF?<|end|>\n",
"<|assistant|>\n",
"RLHF stands for Reinforcement Learning with Human Feedback. It's a type of reinforcement learning where the agent learns to perform an action based on feedback from humans or other agents. The goal of RLHF is to create an intelligent system that can learn through interaction and collaboration with human experts, making it more effective and efficient in solving complex problems. In this way, RLHF combines the power of machine learning algorithms with the knowledge and skills of human experts to achieve better results than either approach alone.<|end|>\n",
"<|user|>\n",
"Thanks! And who invented it?<|end|>\n",
"<|assistant|>\n",
"\n"
]
}
],
"source": [
"# Append response to history\n",
"messages.append({\"role\": \"assistant\", \"content\": output})\n",
"# This is the second turn of the dialogue\n",
"messages.append({\"role\": \"user\", \"content\": \"Thanks! And who invented it?\"})\n",
"prompt = prepare_prompt(messages)\n",
"print(prompt)"
]
},
{
"cell_type": "code",
"execution_count": 140,
"id": "18c58665-5936-4364-8e13-fb7b0c81492d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The\n",
" concept\n",
" of\n",
" RL\n",
"HF\n",
" was\n",
" first\n",
" introduced\n",
" by\n",
" research\n",
"ers\n",
" at\n",
" Open\n",
"AI\n",
" in\n",
" \n",
"2\n",
"0\n",
"1\n",
"8\n",
" as\n",
" part\n",
" of\n",
" their\n",
" G\n",
"ym\n",
" platform\n",
" for\n",
" developing\n",
" and\n",
" testing\n",
" RL\n",
" algorithms\n",
".\n",
" However\n",
",\n",
" RL\n",
"HF\n",
" has\n",
" been\n",
" around\n",
" since\n",
" the\n",
" early\n",
" days\n",
" of\n",
" AI\n",
" when\n",
" expert\n",
" systems\n",
" were\n",
" developed\n",
" using\n",
" rule\n",
"-\n",
"based\n",
" methods\n",
" that\n",
" re\n",
"lied\n",
" heav\n",
"ily\n",
" on\n",
" manual\n",
" programming\n",
" by\n",
" domain\n",
" ex\n",
"perts\n",
".\n",
" Over\n",
" time\n",
",\n",
" these\n",
" approaches\n",
" have\n",
" e\n",
"volved\n",
" into\n",
" modern\n",
" forms\n",
" of\n",
" Machine\n",
" Learning\n",
" such\n",
" as\n",
" Deep\n",
" Neural\n",
" Networks\n",
" which\n",
" are\n",
" now\n",
" being\n",
" used\n",
" in\n",
" applications\n",
" r\n",
"anging\n",
" from\n",
" robot\n",
"ics\n",
" to\n",
" medic\n",
"ine\n",
".\n",
" As\n",
" for\n",
" who\n",
" \"\n",
"invent\n",
"ed\n",
"\"\n",
" RL\n",
"HF\n",
",\n",
" I\n",
" would\n",
" say\n",
" that\n",
" it\n",
" represents\n",
" a\n",
" natural\n",
" progress\n",
"ion\n",
" towards\n",
" building\n",
" int\n",
"elligent\n",
" systems\n",
" that\n",
" can\n",
" collabor\n",
"ate\n",
" effectively\n",
" with\n",
" humans\n",
" to\n",
" solve\n",
" complex\n",
" problems\n",
".\n",
"CPU times: user 49.3 ms, sys: 7.23 ms, total: 56.6 ms\n",
"Wall time: 675 ms\n"
]
}
],
"source": [
"%%time\n",
"output = generate(prompt)"
]
},
{
"cell_type": "code",
"execution_count": 141,
"id": "12c9280f-a5b0-4761-80f0-0c95a2636add",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The concept of RLHF was first introduced by researchers at OpenAI in 2018 as part of their Gym platform for developing and testing RL algorithms. However, RLHF has been around since the early days of AI when expert systems were developed using rule-based methods that relied heavily on manual programming by domain experts. Over time, these approaches have evolved into modern forms of Machine Learning such as Deep Neural Networks which are now being used in applications ranging from robotics to medicine. As for who \"invented\" RLHF, I would say that it represents a natural progression towards building intelligent systems that can collaborate effectively with humans to solve complex problems.\n"
]
}
],
"source": [
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e6df454-f664-48e3-a99f-1ad8c7cf8e23",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "hf",
"language": "python",
"name": "hf"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}