{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fine-tuning the latest Google Gemma model locally using MLX\n",
"\n",
"In this notebook, we will be running and fine-tuning the latest [Google Gemma model](https://blog.google/technology/developers/gemma-open-models/) locally using the `MLX` library, which is optimized for Apple Silicon. I hope that sharing the process and the challenges I encountered will be helpful.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparation\n",
"\n",
"Install the necessary libraries.\n",
"Also, a Mac with Apple Silicon is required. In this case, I used a MacBook Pro equipped with M3 Max 128GB.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"!pip install -Uqq mlx mlx_lm transformers datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using MLX to Run Inference with Gemma Model using MLX\n",
"\n",
"There are about 4 versions of the released Gemma, but this time we will use the instruction-tuned `gemma-7b-it`.\n",
"\n",
"We will use the `mlx_lm` library that utilizes an mlx backend.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f308de384e104d1f850bc5e90215245b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Fetching 11 files: 0%| | 0/11 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from mlx_lm import generate, load\n",
"\n",
"model, tokenizer = load(\"google/gemma-7b-it\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reading through some of the code in the `mlx-examples` repository, it looks like if the `transformers` tokenizer has an `apply_chat_template` method, it will use that template to generate the prompt.\n",
"\n",
"Therefore, when generating, we will input a prompt that only includes the question itself.\n",
"\n",
"https://github.com/ml-explore/mlx-examples/blob/47dd6bd17f3cc7ef95672ea16e443e58ce5eb1bf/llms/mlx_lm/generate.py#L98\n",
"\n",
"This is what the tokenizer will do internally in the `generate()` method:\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'<bos><start_of_turn>user\\nWhy is the sky blue?<end_of_turn>\\n<start_of_turn>model\\n'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"messages = [{\"role\": \"user\", \"content\": \"Why is the sky blue?\"}]\n",
"tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==========\n",
"Prompt: Why is the sky blue?\n",
"\n",
"\n",
"The sky is blue due to a phenomenon called **Rayleigh Scattering**.\n",
"\n",
"Here's a breakdown of what happens:\n",
"\n",
"1. **Sunlight:** Sunrays are made up of all the colors of the rainbow, with each color having a different wavelength.\n",
"2. **Scattering:** When sunlight enters Earth's atmosphere, it interacts with the tiny particles of air (dust, water vapor, etc.). These particles scatter the sunlight in all directions.\n",
"3. **Blue Scatter:** The particles scatter the shorter wavelengths of blue and violet light more effectively than the longer wavelengths of red and orange light.\n",
"4. **Scattered Light:** The scattered light, which is predominantly blue, is scattered in all directions.\n",
"5. **Our View:** We see the scattered light from all directions, including the direction opposite the sun. This is why the sky appears blue.\n",
"\n",
"**Additional factors:**\n",
"\n",
"* **Time of Day:** The intensity of the blue color is strongest at midday and decreases as the sun gets closer to the horizon.\n",
"* **Clouds:** Clouds can reduce the amount of scattered light, making the sky appear white or gray.\n",
"* **Dust:** Dust particles can also scatter different colors of light, which can affect the appearance of the sky.\n",
"\n",
"\n",
"==========\n",
"Prompt: 17.003 tokens-per-sec\n",
"Generation: 18.495 tokens-per-sec\n"
]
}
],
"source": [
"# Generating without adding a prompt template manually\n",
"prompt = \"\"\"\n",
"Why is the sky blue?\n",
"\"\"\".strip()\n",
"response = generate(\n",
" model,\n",
" tokenizer,\n",
" prompt=prompt,\n",
" verbose=True, # Set to True to see the prompt and response\n",
" temp=0.0,\n",
" max_tokens=256,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Success! We were able to generate a response from the Gemma model using MLX.\n",
"\n",
"Now that we've seen how to generate a response, let's try fine-tuning the model to some custom data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fine-tuning the Gemma model with LoRA using MLX\n",
"\n",
"We'll be fine-tuning on a cool dataset from teknium to see what we can produce. Since this is just an example, we'll only fine-tune it for 600 iterations.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparing the dataset\n",
"\n",
"We'll format the dataset to follow the format shown in the `mlx-examples` repository. Basically, we need to prepare a train.jsonl and valid.jsonl. Each line should have a `text` key with the string to train as the value. Here is an example:\n",
"\n",
"`{\"text\": \"Q: What is the capital of France?\\nA: Paris is the capital of France.\"}`\n",
"\n",
"However, the value should follow the format of the prompt Gemma was trained on, which means we need to transform it to something like this:\n",
"\n",
"`{\"text\": \"<bos><start_of_turn>user\\nWhat is the capital of France?<end_of_turn>\\n<start_of_turn>model\\nParis is the capital of France.<end_of_turn><eos>\"}`\n",
"\n",
"Let's first load the dataset and see what it looks like. We will be using legendary [@teknium](https://twitter.com/Teknium1)'s awesome [teknium/trismegistus-project](https://huggingface.co/datasets/teknium/trismegistus-project) dataset with spiritual questions and answers.\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatasetDict({\n",
" train: Dataset({\n",
" features: ['source', 'domain_task_type', 'topic', 'system_prompt_used', 'conversations', 'id'],\n",
" num_rows: 13528\n",
" })\n",
"})"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datasets import load_dataset\n",
"\n",
"dataset = load_dataset(\"teknium/trismegistus-project\")\n",
"dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the dataset is small enough, I will just use pandas to format it.\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>source</th>\n",
" <th>domain_task_type</th>\n",
" <th>topic</th>\n",
" <th>system_prompt_used</th>\n",
" <th>conversations</th>\n",
" <th>id</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>DomainExpert_Occult</td>\n",
" <td>Task</td>\n",
" <td>'Big Man' society</td>\n",
" <td>You are a master of the esoteric, occult, 'Big...</td>\n",
" <td>[{'from': 'human', 'value': 'Compose a compreh...</td>\n",
" <td>570a8404-3270-4aba-a47c-660359440835</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>DomainExpert_Occult</td>\n",
" <td>Task</td>\n",
" <td>'Big Man' society</td>\n",
" <td>You are a master of the esoteric, occult, 'Big...</td>\n",
" <td>[{'from': 'human', 'value': 'Develop an intric...</td>\n",
" <td>ddf44765-8756-46db-a945-672050905fc0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>DomainExpert_Occult</td>\n",
" <td>Task</td>\n",
" <td>'Big Man' society</td>\n",
" <td>You are a master of the esoteric, occult, 'Big...</td>\n",
" <td>[{'from': 'human', 'value': 'Write an extensiv...</td>\n",
" <td>9ef38c3a-31ed-48d7-94d2-75fc588bcb2e</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>DomainExpert_Occult</td>\n",
" <td>Task</td>\n",
" <td>'Big Man' society</td>\n",
" <td>You are a master of the esoteric, occult, 'Big...</td>\n",
" <td>[{'from': 'human', 'value': 'Develop an intric...</td>\n",
" <td>6dea7781-0f74-4692-8d1d-762c6585c280</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>DomainExpert_Occult</td>\n",
" <td>Task</td>\n",
" <td>'Black Books' of European necromancy</td>\n",
" <td>You are a master of the esoteric, occult, 'Bla...</td>\n",
" <td>[{'from': 'human', 'value': 'Devise an intrica...</td>\n",
" <td>188c6c15-d2b7-448b-b93c-505aeca2a458</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" source domain_task_type topic \\\n",
"0 DomainExpert_Occult Task 'Big Man' society \n",
"1 DomainExpert_Occult Task 'Big Man' society \n",
"2 DomainExpert_Occult Task 'Big Man' society \n",
"3 DomainExpert_Occult Task 'Big Man' society \n",
"4 DomainExpert_Occult Task 'Black Books' of European necromancy \n",
"\n",
" system_prompt_used \\\n",
"0 You are a master of the esoteric, occult, 'Big... \n",
"1 You are a master of the esoteric, occult, 'Big... \n",
"2 You are a master of the esoteric, occult, 'Big... \n",
"3 You are a master of the esoteric, occult, 'Big... \n",
"4 You are a master of the esoteric, occult, 'Bla... \n",
"\n",
" conversations \\\n",
"0 [{'from': 'human', 'value': 'Compose a compreh... \n",
"1 [{'from': 'human', 'value': 'Develop an intric... \n",
"2 [{'from': 'human', 'value': 'Write an extensiv... \n",
"3 [{'from': 'human', 'value': 'Develop an intric... \n",
"4 [{'from': 'human', 'value': 'Devise an intrica... \n",
"\n",
" id \n",
"0 570a8404-3270-4aba-a47c-660359440835 \n",
"1 ddf44765-8756-46db-a945-672050905fc0 \n",
"2 9ef38c3a-31ed-48d7-94d2-75fc588bcb2e \n",
"3 6dea7781-0f74-4692-8d1d-762c6585c280 \n",
"4 188c6c15-d2b7-448b-b93c-505aeca2a458 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# convert the dataset to pandas dataframe\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame(dataset[\"train\"])\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You are a master of the esoteric, occult, 'Big Man' society, you complete tasks to the absolute best of your ability, no matter if you think you were not trained to do the task specifically, you will attempt to do it anyways, since you have performed the tasks you are given with great mastery, accuracy, and deep understanding of what is requested. You do the tasks faithfully, and stay true to the mode and domain's mastery role. If the task is not specific enough, note that and create specifics that enable completing the task. \n",
" [{'from': 'human', 'value': \"Compose a comprehensive biography of a renowned figure in the 'Big Man' society, delving into their connections to the occult and esoteric realms. Emphasize their influence on the contemporary 'Big Man' society, their mystical practices, and the transmission of arcane knowledge. Ensure to explore the convergence of their occult work with the politics and power dynamics of the 'Big Man' establishment, dissecting how it fueled their rise within the secretive hierarchy. Include the initiation rituals, secret ceremonies, and exclusive alliances they fostered throughout their lifetime, further complicating the web of their enigmatic existence.\"}, {'from': 'gpt', 'value': \"Title: The Mystifying Odyssey of Eliphas Blackwood: Master of the 'Big Man' Society and Occult Connoisseur\\n\\nIntroduction:\\nIn the annals of the secret society known as the 'Big Man,' few names evoke as much intrigue and applaud as Eliphas Blackwood. A linchpin of the clandestine fraternity, Eliphas' story is one of relentless pursuit of ancient wisdom, an unquenchable thirst for understanding the unseen, and the mastery of manipulating the delicate balance of power.\\n\\nChapter 1: The Charmed Beginnings\\nBorn into a family of significant influence and wealth, Eliphas' early life was shrouded in a veil of secrecy and enigma. It is believed that even at a young age, he found himself fascinated by hidden knowledge and mystical practices. His journey would subsequently lead him across continents, steeping himself in the lore of various cultures and perfecting the art of arcane rituals.\\n\\nChapter 2: Initiation into the 'Big Man' Society\\nWhile the specifics remain elusive, it is said that Eliphas underwent a series of grueling initiation rituals to gain entry into the inner circle of the 'Big Man' society. In this brotherhood, he later displayed an uncanny aptitude for navigating its complex power structure, excelling in combining the esoteric wisdom and political dexterity at play.\\n\\nChapter 3: Conduits of Occult Power\\nEliphas was known for mastering rituals and mystical practices that resonated across the myriad dimensions of the metaphysical world. Melding principles of alchemy, astrology, and conjuration with more obscure and arcane rites, his influence in the 'Big Man' society began to grow exponentially while his deeds remained obscured from the public eye.\\n\\nChapter 4: The Arcane Assembly\\nThroughout his ascendancy, Eliphas developed valuable connections and exclusive alliances with like-minded individuals and experts in the esoteric. In time, this culminated in the establishment of the Arcane Assembly – an elite coterie within the 'Big Man' society that wielded considerable influence on the direction and evolution of the entire affiliation.\\n\\nChapter 5: The Rituals of Dominance\\nA leader in his own right, Eliphas orchestrated secret ceremonies and devised new initiation rites that only added to the mystique of the 'Big Man' society. While these rites were designed to further consolidate the members' loyalty and their belief in its superior directive, they also bestowed upon Eliphas extensive leverage and command within the society.\\n\\nChapter 6: The Political Edge\\nAs Eliphas' stature within the 'Big Man' society intensified, his teachings and esoteric methods began to interweave seamlessly with the group's political ambitions. 
The establishments ruling the echelons of power took keen interest in consulting Eliphas and the Arcane Assembly, seeking their counsel on a wide array of matters. Consequently, it augmented the power of the 'Big Man' society and positioned them as wielders of influence in intricate political dynamics.\\n\\nChapter 7: The Enigmatic End\\nAs with many figures of his ilk, the passing of Eliphas Blackwood remains shrouded in mystery. Rumors abound of a struggle for supremacy or even a culmination of a dark ritual. However, one thing remains certain – the legacy he left behind indelibly altered the course of the 'Big Man' society and continues to reverberate through its hallowed halls even to this day.\\n\\nConclusion:\\nThe life of Eliphas Blackwood stands testimony to the far-reaching influence and unyielding perseverance of an individual seeking to unravel and master the mysteries of the occult. As a lodestone of the 'Big Man' society, his story is emblematic of the convergence of esoteric wisdom and political prowess that forms the bedrock of this enigmatic conclave.\"}]\n"
]
}
],
"source": [
"print(df.iloc[0][\"system_prompt_used\"], \"\\n\", df.iloc[0][\"conversations\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that the `conversations` holds the text for Q and A. We will format this in Gemma's prompt format and save it to a jsonl file.\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>system_prompt_used</th>\n",
" <th>question</th>\n",
" <th>answer</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>You are a master of the esoteric, occult, 'Big...</td>\n",
" <td>Compose a comprehensive biography of a renowne...</td>\n",
" <td>Title: The Mystifying Odyssey of Eliphas Black...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>You are a master of the esoteric, occult, 'Big...</td>\n",
" <td>Develop an intricate numerology system that de...</td>\n",
" <td>I. Foundational Numerology\\n\\nThe 'Big Man' so...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>You are a master of the esoteric, occult, 'Big...</td>\n",
" <td>Write an extensive biography of a prominent oc...</td>\n",
" <td>Title: Nathaniel Ziester: A Life in Shadows - ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>You are a master of the esoteric, occult, 'Big...</td>\n",
" <td>Develop an intricate system of numerology, inc...</td>\n",
" <td>Title: The Numerological Riddles of the Big Ma...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>You are a master of the esoteric, occult, 'Bla...</td>\n",
" <td>Devise an intricate multi-step process for the...</td>\n",
" <td>Step 1: Assess the condition of the grimoire\\n...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13523</th>\n",
" <td>You are a master of the esoteric, occult, Reap...</td>\n",
" <td>In the context of the Reappropriated Goddess m...</td>\n",
" <td>Answer: To regain women's empowerment and infl...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13524</th>\n",
" <td>You are a master of the esoteric, occult, Reap...</td>\n",
" <td>Write a section of a grimoire explaining the c...</td>\n",
" <td>Title: The Reappropriated Goddess in the Occul...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13525</th>\n",
" <td>You are a master of the esoteric, occult, Reap...</td>\n",
" <td>Write a section of a grimoire specifically foc...</td>\n",
" <td>Title: The Reappropriated Goddess: A Journey i...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13526</th>\n",
" <td>You are a master of the esoteric, occult, Reap...</td>\n",
" <td>Create a detailed introductory section for a g...</td>\n",
" <td>Title: The Reappropriated Goddess: A Grimoire ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13527</th>\n",
" <td>You are a master of the esoteric, occult, Reap...</td>\n",
" <td>Write an introductory section of a grimoire, f...</td>\n",
" <td>Title: The Reappropriated Goddess: Foundations...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>13528 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" system_prompt_used \\\n",
"0 You are a master of the esoteric, occult, 'Big... \n",
"1 You are a master of the esoteric, occult, 'Big... \n",
"2 You are a master of the esoteric, occult, 'Big... \n",
"3 You are a master of the esoteric, occult, 'Big... \n",
"4 You are a master of the esoteric, occult, 'Bla... \n",
"... ... \n",
"13523 You are a master of the esoteric, occult, Reap... \n",
"13524 You are a master of the esoteric, occult, Reap... \n",
"13525 You are a master of the esoteric, occult, Reap... \n",
"13526 You are a master of the esoteric, occult, Reap... \n",
"13527 You are a master of the esoteric, occult, Reap... \n",
"\n",
" question \\\n",
"0 Compose a comprehensive biography of a renowne... \n",
"1 Develop an intricate numerology system that de... \n",
"2 Write an extensive biography of a prominent oc... \n",
"3 Develop an intricate system of numerology, inc... \n",
"4 Devise an intricate multi-step process for the... \n",
"... ... \n",
"13523 In the context of the Reappropriated Goddess m... \n",
"13524 Write a section of a grimoire explaining the c... \n",
"13525 Write a section of a grimoire specifically foc... \n",
"13526 Create a detailed introductory section for a g... \n",
"13527 Write an introductory section of a grimoire, f... \n",
"\n",
" answer \n",
"0 Title: The Mystifying Odyssey of Eliphas Black... \n",
"1 I. Foundational Numerology\\n\\nThe 'Big Man' so... \n",
"2 Title: Nathaniel Ziester: A Life in Shadows - ... \n",
"3 Title: The Numerological Riddles of the Big Ma... \n",
"4 Step 1: Assess the condition of the grimoire\\n... \n",
"... ... \n",
"13523 Answer: To regain women's empowerment and infl... \n",
"13524 Title: The Reappropriated Goddess in the Occul... \n",
"13525 Title: The Reappropriated Goddess: A Journey i... \n",
"13526 Title: The Reappropriated Goddess: A Grimoire ... \n",
"13527 Title: The Reappropriated Goddess: Foundations... \n",
"\n",
"[13528 rows x 3 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Split the quetion and answer into separate columns\n",
"df[[\"question\", \"answer\"]] = pd.DataFrame(df[\"conversations\"].tolist(), index=df.index)\n",
"\n",
"# Only keep the 'value' portion of the JSON\n",
"df[\"question\"] = df[\"question\"].apply(lambda x: x[\"value\"])\n",
"df[\"answer\"] = df[\"answer\"].apply(lambda x: x[\"value\"])\n",
"\n",
"df[[\"system_prompt_used\", \"question\", \"answer\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since Gemma doesn't seem to have been trained with a separate system prompt, let's create a separate format for separating the system prompt and the user prompt, as below.\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<bos><start_of_turn>user\n",
"## Instructions\n",
"You are a master of the esoteric, occult, 'Big Man' society, you complete tasks to the absolute best of your ability, no matter if you think you were not trained to do the task specifically, you will attempt to do it anyways, since you have performed the tasks you are given with great mastery, accuracy, and deep understanding of what is requested. You do the tasks faithfully, and stay true to the mode and domain's mastery role. If the task is not specific enough, note that and create specifics that enable completing the task.\n",
"## User\n",
"Compose a comprehensive biography of a renowned figure in the 'Big Man' society, delving into their connections to the occult and esoteric realms. Emphasize their influence on the contemporary 'Big Man' society, their mystical practices, and the transmission of arcane knowledge. Ensure to explore the convergence of their occult work with the politics and power dynamics of the 'Big Man' establishment, dissecting how it fueled their rise within the secretive hierarchy. Include the initiation rituals, secret ceremonies, and exclusive alliances they fostered throughout their lifetime, further complicating the web of their enigmatic existence.<end_of_turn>\n",
"<start_of_turn>model\n",
"Title: The Mystifying Odyssey of Eliphas Blackwood: Master of the 'Big Man' Society and Occult Connoisseur\n",
"\n",
"Introduction:\n",
"In the annals of the secret society known as the 'Big Man,' few names evoke as much intrigue and applaud as Eliphas Blackwood. A linchpin of the clandestine fraternity, Eliphas' story is one of relentless pursuit of ancient wisdom, an unquenchable thirst for understanding the unseen, and the mastery of manipulating the delicate balance of power.\n",
"\n",
"Chapter 1: The Charmed Beginnings\n",
"Born into a family of significant influence and wealth, Eliphas' early life was shrouded in a veil of secrecy and enigma. It is believed that even at a young age, he found himself fascinated by hidden knowledge and mystical practices. His journey would subsequently lead him across continents, steeping himself in the lore of various cultures and perfecting the art of arcane rituals.\n",
"\n",
"Chapter 2: Initiation into the 'Big Man' Society\n",
"While the specifics remain elusive, it is said that Eliphas underwent a series of grueling initiation rituals to gain entry into the inner circle of the 'Big Man' society. In this brotherhood, he later displayed an uncanny aptitude for navigating its complex power structure, excelling in combining the esoteric wisdom and political dexterity at play.\n",
"\n",
"Chapter 3: Conduits of Occult Power\n",
"Eliphas was known for mastering rituals and mystical practices that resonated across the myriad dimensions of the metaphysical world. Melding principles of alchemy, astrology, and conjuration with more obscure and arcane rites, his influence in the 'Big Man' society began to grow exponentially while his deeds remained obscured from the public eye.\n",
"\n",
"Chapter 4: The Arcane Assembly\n",
"Throughout his ascendancy, Eliphas developed valuable connections and exclusive alliances with like-minded individuals and experts in the esoteric. In time, this culminated in the establishment of the Arcane Assembly – an elite coterie within the 'Big Man' society that wielded considerable influence on the direction and evolution of the entire affiliation.\n",
"\n",
"Chapter 5: The Rituals of Dominance\n",
"A leader in his own right, Eliphas orchestrated secret ceremonies and devised new initiation rites that only added to the mystique of the 'Big Man' society. While these rites were designed to further consolidate the members' loyalty and their belief in its superior directive, they also bestowed upon Eliphas extensive leverage and command within the society.\n",
"\n",
"Chapter 6: The Political Edge\n",
"As Eliphas' stature within the 'Big Man' society intensified, his teachings and esoteric methods began to interweave seamlessly with the group's political ambitions. The establishments ruling the echelons of power took keen interest in consulting Eliphas and the Arcane Assembly, seeking their counsel on a wide array of matters. Consequently, it augmented the power of the 'Big Man' society and positioned them as wielders of influence in intricate political dynamics.\n",
"\n",
"Chapter 7: The Enigmatic End\n",
"As with many figures of his ilk, the passing of Eliphas Blackwood remains shrouded in mystery. Rumors abound of a struggle for supremacy or even a culmination of a dark ritual. However, one thing remains certain – the legacy he left behind indelibly altered the course of the 'Big Man' society and continues to reverberate through its hallowed halls even to this day.\n",
"\n",
"Conclusion:\n",
"The life of Eliphas Blackwood stands testimony to the far-reaching influence and unyielding perseverance of an individual seeking to unravel and master the mysteries of the occult. As a lodestone of the 'Big Man' society, his story is emblematic of the convergence of esoteric wisdom and political prowess that forms the bedrock of this enigmatic conclave.<end_of_turn><eos>\n"
]
}
],
"source": [
"def generate_prompt(row: pd.Series) -> str:\n",
" \"Format to Gemma's chat template\"\n",
" return \"\"\"<bos><start_of_turn>user\n",
"## Instructions\n",
"{}\n",
"## User\n",
"{}<end_of_turn>\n",
"<start_of_turn>model\n",
"{}<end_of_turn><eos>\"\"\".format(row[\"system_prompt_used\"], row[\"question\"], row[\"answer\"])\n",
"\n",
"\n",
"df[\"text\"] = df.apply(generate_prompt, axis=1)\n",
"\n",
"# Let's see what the model will be trained on\n",
"print(df[\"text\"].iloc[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's save the data in two separate jsonl formatted files.\n",
"\n",
"- Train set: `data/train.jsonl`\n",
"- Valid set: `data/valid.jsonl`\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\"text\":\"<bos><start_of_turn>user\\n## Instructions\\nYou are a master of the esoteric, occult, Chronotopic inversion and education, you have written many textbooks on the subject in ways that provide students with rich and deep understanding of the subject. You are being asked to write textbook-like sections on a topic and you do it with full context, explainability, and reliability in accuracy to the true facts of the topic at hand, in a textbook style that a student would easily be able to learn from, in a rich, engaging, and contextual way. Always include relevant context (such as formulas and history), related concepts, and in a way that someone can gain deep insights from.\\n## User\\nWrite a detailed explanation of Chronotopic inversion within the context of the occult, focusing on its history, methodology, practical applications, and key concepts. Elaborate on how an adept in the esoteric arts can harness this mysterious power to manipulate time and space for personal growth, spiritual development, or even spellcasting. Include instructions, precautions, and potential consequences related to its practice, as well as guidance on how initiates can further explore and master this domain.<end_of_turn>\\n<start_of_turn>model\\nChronotopic Inversion: An Esoteric Exploration of Time and Space\\n\\nI. Introduction\\n\\nChronotopic inversion, a concept in the realm of esoteric occultism, deals with the manipulation of time and space. Although the origins of this practice can be traced back to ancient civilizations, it remains an enigmatic force even to modern adepts. This chapter delves into the history, methodology, practical applications, and key concepts that surround its mysterious mechanism. By examining how occult practitioners have wielded this power throughout history, we provide guidance on how one can master this domain for personal growth, spiritual development, and strategic spellcasting.\\n\\nII. History of Chronotopic Inversion\\n\\nThe history of chronotopic inversion spans across various cultures and traditions, from the Egyptian art of chronokinesis to the mystical practices of Kabbalah in Jewish mysticism. Early records suggest that adepts believed their ability to manipulate time and space was directly linked to communication with divine forces.\\n\\nThe concept was also prevalent in the metaphysical teachings of Hermeticism, which sought to harmonize the cosmos and recognize patterns that govern temporal and spatial phenomena. Further references to chronotopic inversion can be found in the work of Paracelsus, who delved into the intricacies of alchemy and natural philosophy. In the 19th and 20th centuries, authors who immersed themselves in theosophy and occultism – such as Helena Blavatsky and Aleister Crowley – expanded upon these ideas and integrated them into their respective systems of spiritual practice.\\n\\nIII. Methodology and Key Concepts\\n\\nAt its core, chronotopic inversion is a mental and spiritual discipline that aims to alter one's perception of time and space. There are numerous techniques and methods associated with this practice, and they often vary between different occult traditions. However, there are certain key concepts that can be found throughout:\\n\\n1. Synchronization: The first step in mastering chronotopic inversion is to achieve a heightened sense of synchronization with the rhythms of the cosmos. 
This can be done through meditation, breathing exercises, or visualization techniques that foster a deep connection with celestial energies.\\n\\n2. Astral Projection: Many adepts claim that astral projection is essential for manipulating the fabric of time and space. By projecting their consciousness onto the astral plane, practitioners can explore alternate dimensions and realities, access the Akashic records, or even revisit past and future events.\\n\\n3. Rituals and Sigils: Some traditions employ various symbols and rituals to facilitate the control of energy flows and align with specific temporal or spatial frequencies. Examples include magical correspondences, ceremonial magic, and the use of talismans or amulets imbued with the power to alter time and space.\\n\\nIV. Practical Applications\\n\\nThe practice of chronotopic inversion yields numerous applications for personal growth, spiritual development, and magical workings. Some of these applications include:\\n\\n1. Accelerated Learning: By bending the perception of time, one can create a temporal dilation, enabling an individual to absorb information and insights more rapidly.\\n\\n2. Manifestation: Adept practitioners can use their mastery of time and space to bring desired outcomes or attract specific circumstances.\\n\\n3. Remote Viewing: The ability to perceive events beyond temporal and spatial limitations allows for reconnaissance of hidden information or insights into future occurrences.\\n\\n4. Time and Space Manipulation: Some practitioners believe that advanced chronotopic inversion allows for limited control over temporal flow and the manipulation of physical space. This may include effects such as precognition, retrocognition, or even teleportation.\\n\\nV. Precautions and Potential Consequences\\n\\nAs with any powerful occult practice, there are potential consequences and precautions that one must heed:\\n\\n1. Disorientation: The manipulation of time and space may result in disorientation or cognitive dysfunctions. Ridig boundaries should be maintained and practices should be carried out under the guidance of a knowledgeable mentor.\\n\\n2. Imbalance: Manipulation of energy flows can lead to detrimental effects on the practitioner's health and well-being. Adepts must strive to maintain balance and harmony within themselves and the environment.\\n\\n3. Paradoxes: Irresponsible manipulation of time and space could lead to the creation of paradoxes or give rise to unintended consequences that may affect the practitioner or their surroundings.\\n\\nVI. Further Exploration and Mastery\\n\\nTo acquire mastery in chronotopic inversion, patience and dedication are essential. Seek guidance from experienced mentors and immerse yourself in the study of related esoteric texts. Additional practices that complement chronotopic inversion include dream work and specific forms of meditation that promote expanded states of awareness. Through consistent practice and introspection, one can harness the transformative power of time and space, shaping reality for the betterment of the practitioner and all who share in their journey.<end_of_turn><eos>\"}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
}
],
"source": [
"from pathlib import Path\n",
"\n",
"Path(\"data\").mkdir(exist_ok=True)\n",
"\n",
"split_ix = int(len(df) * 0.9)\n",
"# shuffle data\n",
"data = df.sample(frac=1, random_state=42)\n",
"train, valid = data[:split_ix], data[split_ix:]\n",
"\n",
"# Save train and valid dataset as jsonl files\n",
"train[[\"text\"]].to_json(\"data/train.jsonl\", orient=\"records\", lines=True, force_ascii=False)\n",
"valid[[\"text\"]].to_json(\"data/valid.jsonl\", orient=\"records\", lines=True, force_ascii=False)\n",
"\n",
"!head -n 1 data/train.jsonl"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running LoRA Fine-tuning with MLX\n",
"\n",
"Now that are data is ready, let's start fine-tuning!\n",
"\n",
"When running LoRA with `mlx_lm`, you can use the following command to see various options.\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"usage: lora.py [-h] [--model MODEL] [--max-tokens MAX_TOKENS] [--temp TEMP]\n",
" [--prompt PROMPT] [--train] [--data DATA]\n",
" [--lora-layers LORA_LAYERS] [--batch-size BATCH_SIZE]\n",
" [--iters ITERS] [--val-batches VAL_BATCHES]\n",
" [--learning-rate LEARNING_RATE]\n",
" [--steps-per-report STEPS_PER_REPORT]\n",
" [--steps-per-eval STEPS_PER_EVAL]\n",
" [--resume-adapter-file RESUME_ADAPTER_FILE]\n",
" [--adapter-file ADAPTER_FILE] [--save-every SAVE_EVERY]\n",
" [--test] [--test-batches TEST_BATCHES]\n",
" [--max-seq-length MAX_SEQ_LENGTH] [--seed SEED]\n",
"\n",
"LoRA or QLoRA finetuning.\n",
"\n",
"options:\n",
" -h, --help show this help message and exit\n",
" --model MODEL The path to the local model directory or Hugging Face\n",
" repo.\n",
" --max-tokens MAX_TOKENS, -m MAX_TOKENS\n",
" The maximum number of tokens to generate\n",
" --temp TEMP The sampling temperature\n",
" --prompt PROMPT, -p PROMPT\n",
" The prompt for generation\n",
" --train Do training\n",
" --data DATA Directory with {train, valid, test}.jsonl files\n",
" --lora-layers LORA_LAYERS\n",
" Number of layers to fine-tune\n",
" --batch-size BATCH_SIZE\n",
" Minibatch size.\n",
" --iters ITERS Iterations to train for.\n",
" --val-batches VAL_BATCHES\n",
" Number of validation batches, -1 uses the entire\n",
" validation set.\n",
" --learning-rate LEARNING_RATE\n",
" Adam learning rate.\n",
" --steps-per-report STEPS_PER_REPORT\n",
" Number of training steps between loss reporting.\n",
" --steps-per-eval STEPS_PER_EVAL\n",
" Number of training steps between validations.\n",
" --resume-adapter-file RESUME_ADAPTER_FILE\n",
" Load path to resume training with the given adapter\n",
" weights.\n",
" --adapter-file ADAPTER_FILE\n",
" Save/load path for the trained adapter weights.\n",
" --save-every SAVE_EVERY\n",
" Save the model every N iterations.\n",
" --test Evaluate on the test set after training\n",
" --test-batches TEST_BATCHES\n",
" Number of test set batches, -1 uses the entire test\n",
" set.\n",
" --max-seq-length MAX_SEQ_LENGTH\n",
" Maximum sequence length.\n",
" --seed SEED The PRNG seed\n"
]
}
],
"source": [
"!python -m mlx_lm.lora --help"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's how we run the training. This will take a while to finish. Let's run the training.\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading pretrained model\n",
"Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 118300.88it/s]\n",
"Total parameters 8539.516M\n",
"Trainable parameters 1.835M\n",
"Loading datasets\n",
"Loading pretrained adapters from checkpoints/600_adapters.npz\n",
"Training\n",
"Starting training..., iters: 600\n",
"Iter 1: Val loss 1.490, Val took 237.693s\n",
"Iter 10: Train loss 1.509, Learning Rate 1.000e-05, It/sec 0.057, Tokens/sec 218.615, Trained Tokens 38474\n",
"Iter 20: Train loss 1.556, Learning Rate 1.000e-05, It/sec 0.058, Tokens/sec 219.489, Trained Tokens 76053\n",
"Iter 30: Train loss 1.475, Learning Rate 1.000e-05, It/sec 0.049, Tokens/sec 187.660, Trained Tokens 114700\n",
"Iter 40: Train loss 1.463, Learning Rate 1.000e-05, It/sec 0.055, Tokens/sec 216.056, Trained Tokens 153649\n",
"Iter 50: Train loss 1.495, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 200.114, Trained Tokens 192208\n",
"Iter 60: Train loss 1.498, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 204.816, Trained Tokens 231521\n",
"Iter 70: Train loss 1.514, Learning Rate 1.000e-05, It/sec 0.053, Tokens/sec 209.314, Trained Tokens 270933\n",
"Iter 80: Train loss 1.412, Learning Rate 1.000e-05, It/sec 0.051, Tokens/sec 205.040, Trained Tokens 310996\n",
"Iter 90: Train loss 1.479, Learning Rate 1.000e-05, It/sec 0.049, Tokens/sec 200.812, Trained Tokens 351989\n",
"Iter 100: Train loss 1.546, Learning Rate 1.000e-05, It/sec 0.051, Tokens/sec 203.379, Trained Tokens 391649\n",
"Iter 100: Saved adapter weights to checkpoints/100_adapters.npz.\n",
"Iter 110: Train loss 1.511, Learning Rate 1.000e-05, It/sec 0.053, Tokens/sec 200.373, Trained Tokens 429201\n",
"Iter 120: Train loss 1.471, Learning Rate 1.000e-05, It/sec 0.046, Tokens/sec 185.560, Trained Tokens 469703\n",
"Iter 130: Train loss 1.554, Learning Rate 1.000e-05, It/sec 0.053, Tokens/sec 197.676, Trained Tokens 506853\n",
"Iter 140: Train loss 1.452, Learning Rate 1.000e-05, It/sec 0.050, Tokens/sec 204.121, Trained Tokens 547613\n",
"Iter 150: Train loss 1.471, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 202.014, Trained Tokens 586332\n",
"Iter 160: Train loss 1.401, Learning Rate 1.000e-05, It/sec 0.051, Tokens/sec 207.620, Trained Tokens 626702\n",
"Iter 170: Train loss 1.495, Learning Rate 1.000e-05, It/sec 0.051, Tokens/sec 192.727, Trained Tokens 664474\n",
"Iter 180: Train loss 1.505, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 197.290, Trained Tokens 702170\n",
"Iter 190: Train loss 1.411, Learning Rate 1.000e-05, It/sec 0.055, Tokens/sec 205.011, Trained Tokens 739694\n",
"Iter 200: Train loss 1.522, Learning Rate 1.000e-05, It/sec 0.053, Tokens/sec 207.685, Trained Tokens 778701\n",
"Iter 200: Val loss 1.493, Val took 243.579s\n",
"Iter 200: Saved adapter weights to checkpoints/200_adapters.npz.\n",
"Iter 210: Train loss 1.548, Learning Rate 1.000e-05, It/sec 0.051, Tokens/sec 196.962, Trained Tokens 817663\n",
"Iter 220: Train loss 1.486, Learning Rate 1.000e-05, It/sec 0.040, Tokens/sec 153.887, Trained Tokens 856443\n",
"Iter 230: Train loss 1.428, Learning Rate 1.000e-05, It/sec 0.054, Tokens/sec 210.986, Trained Tokens 895391\n",
"Iter 240: Train loss 1.473, Learning Rate 1.000e-05, It/sec 0.050, Tokens/sec 202.229, Trained Tokens 935525\n",
"Iter 250: Train loss 1.546, Learning Rate 1.000e-05, It/sec 0.047, Tokens/sec 182.910, Trained Tokens 974124\n",
"Iter 260: Train loss 1.412, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 198.291, Trained Tokens 1012611\n",
"Iter 270: Train loss 1.450, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 206.741, Trained Tokens 1052373\n",
"Iter 280: Train loss 1.467, Learning Rate 1.000e-05, It/sec 0.038, Tokens/sec 147.338, Trained Tokens 1091387\n",
"Iter 290: Train loss 1.457, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 200.032, Trained Tokens 1129648\n",
"Iter 300: Train loss 1.466, Learning Rate 1.000e-05, It/sec 0.054, Tokens/sec 205.822, Trained Tokens 1167820\n",
"Iter 300: Saved adapter weights to checkpoints/300_adapters.npz.\n",
"Iter 310: Train loss 1.448, Learning Rate 1.000e-05, It/sec 0.045, Tokens/sec 165.983, Trained Tokens 1205051\n",
"Iter 320: Train loss 1.492, Learning Rate 1.000e-05, It/sec 0.053, Tokens/sec 208.144, Trained Tokens 1244076\n",
"Iter 330: Train loss 1.429, Learning Rate 1.000e-05, It/sec 0.038, Tokens/sec 157.175, Trained Tokens 1284943\n",
"Iter 340: Train loss 1.414, Learning Rate 1.000e-05, It/sec 0.044, Tokens/sec 175.701, Trained Tokens 1324561\n",
"Iter 350: Train loss 1.437, Learning Rate 1.000e-05, It/sec 0.059, Tokens/sec 217.435, Trained Tokens 1361601\n",
"Iter 360: Train loss 1.475, Learning Rate 1.000e-05, It/sec 0.048, Tokens/sec 190.538, Trained Tokens 1400893\n",
"Iter 370: Train loss 1.431, Learning Rate 1.000e-05, It/sec 0.040, Tokens/sec 158.917, Trained Tokens 1440974\n",
"Iter 380: Train loss 1.378, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 204.395, Trained Tokens 1480648\n",
"Iter 390: Train loss 1.442, Learning Rate 1.000e-05, It/sec 0.044, Tokens/sec 173.392, Trained Tokens 1520131\n",
"Iter 400: Train loss 1.495, Learning Rate 1.000e-05, It/sec 0.054, Tokens/sec 211.582, Trained Tokens 1558980\n",
"Iter 400: Val loss 1.439, Val took 245.764s\n",
"Iter 400: Saved adapter weights to checkpoints/400_adapters.npz.\n",
"Iter 410: Train loss 1.416, Learning Rate 1.000e-05, It/sec 0.047, Tokens/sec 188.693, Trained Tokens 1598954\n",
"Iter 420: Train loss 1.452, Learning Rate 1.000e-05, It/sec 0.050, Tokens/sec 196.356, Trained Tokens 1637976\n",
"Iter 430: Train loss 1.478, Learning Rate 1.000e-05, It/sec 0.050, Tokens/sec 196.561, Trained Tokens 1677115\n",
"Iter 440: Train loss 1.425, Learning Rate 1.000e-05, It/sec 0.055, Tokens/sec 205.555, Trained Tokens 1714559\n",
"Iter 450: Train loss 1.389, Learning Rate 1.000e-05, It/sec 0.054, Tokens/sec 200.308, Trained Tokens 1751942\n",
"Iter 460: Train loss 1.457, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 198.417, Trained Tokens 1790403\n",
"Iter 470: Train loss 1.490, Learning Rate 1.000e-05, It/sec 0.054, Tokens/sec 206.251, Trained Tokens 1828520\n",
"Iter 480: Train loss 1.487, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 200.169, Trained Tokens 1866945\n",
"Iter 490: Train loss 1.444, Learning Rate 1.000e-05, It/sec 0.047, Tokens/sec 182.166, Trained Tokens 1906111\n",
"Iter 500: Train loss 1.521, Learning Rate 1.000e-05, It/sec 0.052, Tokens/sec 200.816, Trained Tokens 1944801\n",
"Iter 500: Saved adapter weights to checkpoints/500_adapters.npz.\n",
"Iter 510: Train loss 1.469, Learning Rate 1.000e-05, It/sec 0.051, Tokens/sec 206.672, Trained Tokens 1985117\n",
"Iter 520: Train loss 1.479, Learning Rate 1.000e-05, It/sec 0.056, Tokens/sec 214.426, Trained Tokens 2023482\n",
"Iter 530: Train loss 1.377, Learning Rate 1.000e-05, It/sec 0.053, Tokens/sec 205.346, Trained Tokens 2062122\n",
"Iter 540: Train loss 1.482, Learning Rate 1.000e-05, It/sec 0.034, Tokens/sec 136.698, Trained Tokens 2101788\n",
"Iter 550: Train loss 1.411, Learning Rate 1.000e-05, It/sec 0.057, Tokens/sec 216.360, Trained Tokens 2139564\n",
"Iter 560: Train loss 1.416, Learning Rate 1.000e-05, It/sec 0.050, Tokens/sec 197.085, Trained Tokens 2178631\n",
"Iter 570: Train loss 1.413, Learning Rate 1.000e-05, It/sec 0.053, Tokens/sec 214.704, Trained Tokens 2218833\n",
"Iter 580: Train loss 1.401, Learning Rate 1.000e-05, It/sec 0.053, Tokens/sec 204.041, Trained Tokens 2257406\n",
"[WARNING] Some sequences are longer than 2048 tokens. The longest sentence 2841 will be truncated to 2048. Consider pre-splitting your data to save memory.\n",
"Iter 590: Train loss 1.479, Learning Rate 1.000e-05, It/sec 0.032, Tokens/sec 127.848, Trained Tokens 2297124\n",
"Iter 600: Train loss 1.500, Learning Rate 1.000e-05, It/sec 0.056, Tokens/sec 219.326, Trained Tokens 2336282\n",
"Iter 600: Val loss 1.470, Val took 241.225s\n",
"Iter 600: Saved adapter weights to checkpoints/600_adapters.npz.\n",
"Saved final adapter weights to adapters.npz.\n"
]
}
],
"source": [
"!python -m mlx_lm.lora \\\n",
" --model google/gemma-7b-it \\\n",
" --train \\\n",
" --iters 600 \\\n",
" --data data \\\n",
" # --resume-adapter-file checkpoints/600_adapters.npz"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running Inference with the Fine-tuned Gemma model using MLX\n",
"\n",
"The following script can be used to perform inference with LoRA weights from the command line.\n",
"\n",
"```\n",
"!python -m mlx_lm.generate --model \"google/gemma-7b-it\" \\\n",
" --adapter-file checkpoints/600_adapters.npz \\\n",
" --max-tokens 256 \\\n",
" --prompt \"Why is the sky blue?\" \\\n",
" --seed 69\n",
"```\n",
"\n",
"However, since we fine-tuned using a specific prompt format, we should probably use this everytime we prompt the model.\n",
"\n",
"Since we still use the same tokenizer with `apply_chat_template`, we should prepare the prompt without what will be supplied by `apply_chat_template`.\n",
"\n",
"Let's create a simple function to format our prompts.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.\n"
]
}
],
"source": [
"# I thought this system prompt was cool, so let's use this one\n",
"system_prompt = df[\"system_prompt_used\"].unique()[-2]\n",
"# system_prompt = \"You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.\"\n",
"print(system_prompt)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<bos><start_of_turn>user\n",
"## Instructions\n",
"You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.\n",
"## User\n",
"Why is the sky blue?<end_of_turn>\n",
"<start_of_turn>model\n",
"\n"
]
}
],
"source": [
"question = \"Why is the sky blue?\"\n",
"\n",
"\n",
"def format_prompt(system_prompt: str, question: str) -> str:\n",
" \"Format the question to the format of the dataset we fine-tuned to.\"\n",
" return \"\"\"<bos><start_of_turn>user\n",
"## Instructions\n",
"{}\n",
"## User\n",
"{}<end_of_turn>\n",
"<start_of_turn>model\n",
"\"\"\".format(\n",
" system_prompt, question\n",
" )\n",
"\n",
"\n",
"print(format_prompt(system_prompt, question))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a3c1477265cc41ceb77c682ba6bc25c9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Fetching 11 files: 0%| | 0/11 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Load the fine-tuned model with LoRA weights\n",
"model_lora, _ = load(\n",
" \"google/gemma-7b-it\",\n",
" adapter_file=\"./adapters.npz\", # adapters.npz is the final checkpoint saved at the end of training\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==========\n",
"Prompt: <bos><start_of_turn>user\n",
"## Instructions\n",
"You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.\n",
"## User\n",
"Why is the sky blue?<end_of_turn>\n",
"<start_of_turn>model\n",
"\n",
"The question of \"why is the sky blue?\" is not related to the topic of Reappropriated Goddess, therefore I will not provide an answer to this question. Instead, I will provide an explanation of the scientific and metaphysical reasons behind the phenomenon of the sky being blue.\n",
"\n",
"The sky appears blue due to a phenomenon called scattering of light. This phenomenon occurs when sunlight interacts with particles in the Earth's atmosphere. The particles in the atmosphere are primarily composed of nitrogen and oxygen molecules, which are much smaller than the wavelengths of visible light.\n",
"\n",
"When sunlight hits these particles, it is scattered in all directions. However, the particles scatter different wavelengths of light differently. Shorter wavelengths of light, such as violet and blue, are scattered more effectively than longer wavelengths of light, such as red and yellow. This is because the particles are more effectively able to interact with the shorter wavelengths of light.\n",
"\n",
"The scattered light is scattered in all directions, but the majority of the scattered light is scattered in the direction of the sun. This is because the particles are more likely to scatter the light in the direction of the sun than in other directions. As a result, the scattered light is seen as a blue color in the sky.\n",
"\n",
"In addition to the scientific explanation, there is\n",
"==========\n",
"Prompt: 181.988 tokens-per-sec\n",
"Generation: 6.002 tokens-per-sec\n"
]
}
],
"source": [
"response = generate(\n",
" model_lora,\n",
" tokenizer,\n",
" prompt=format_prompt(system_prompt, question),\n",
" verbose=True,\n",
" temp=0.0,\n",
" max_tokens=256,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fusing LoRA Weights\n",
"\n",
"Finally, let's try merging the trained LoRA weights into the model itself.\n",
"\n",
"The command below shows available options:\n"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading pretrained model\n",
"usage: fuse.py [-h] [--model MODEL] [--save-path SAVE_PATH]\n",
" [--adapter-file ADAPTER_FILE] [--hf-path HF_PATH]\n",
" [--upload-repo UPLOAD_REPO] [--de-quantize]\n",
"\n",
"LoRA or QLoRA finetuning.\n",
"\n",
"options:\n",
" -h, --help show this help message and exit\n",
" --model MODEL The path to the local model directory or Hugging Face\n",
" repo.\n",
" --save-path SAVE_PATH\n",
" The path to save the fused model.\n",
" --adapter-file ADAPTER_FILE\n",
" Path to the trained adapter weights (npz or\n",
" safetensors).\n",
" --hf-path HF_PATH Path to the original Hugging Face model. Required for\n",
" upload if --model is a local directory.\n",
" --upload-repo UPLOAD_REPO\n",
" The Hugging Face repo to upload the model to.\n",
" --de-quantize Generate a de-quantized model.\n"
]
}
],
"source": [
"!python -m mlx_lm.fuse --help"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading pretrained model\n",
"Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 119526.80it/s]\n"
]
}
],
"source": [
"!python -m mlx_lm.fuse \\\n",
" --model google/gemma-7b-it \\\n",
" --adapter-file adapters.npz \\\n",
" # --upload-repo alexweberk/gemma-7b-it-trismegistus \\\n",
" # --hf-path google/gemma-7b-it"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The merge succeeded, and a directory called `lora_fused_model` was created, which contains various files for the model.\n"
]
},
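{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, you can list the directory contents (the exact file list may vary with the `mlx_lm` version):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the fused model directory\n",
"!ls lora_fused_model"
]
},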
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For uploading the fused model to Huggingface, you can first create a new model repo on Huggingface, get the model_id, and then run the script below. In my case, I created a model id called `alexweberk/gemma-7b-it-trismegistus`.\n",
"\n",
"- `--upload-repo` is the name of the repo to upload to.\n",
"- `--hf-path` is the name of the original model to give credit to.\n",
"\n",
"Unfortunately, at the time of writing, the `.safetensors` files that get generated through the fusing process were missing the `metadata` attribute, which caused loading the models in `transformers` to fail. I have opened an issue on the `mlx` repository to address this.\n",
"\n",
"If you want, you can tweak the library code like below in <path_to_your_site-packages>/mlx_lm/utils.py (Mine was /Users/alexishida/miniforge3/envs/py311/lib/python3.11/site-packages/mlx_lm/utils.py) by replacing `mx.save_safetensors(str(shard_path), shard)` with `mx.save_safetensors(str(shard_path), shard, metadata={\"format\":\"pt\"})` to output fused weights with the metadata attribute.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another way is to rewrite the `.safetensors` files with the metadata attribute using the script below. (Given the way `safetensors` is implemented, a for loop did not work when saving the files (need to take care of removing all references to the tensors, etc...), so let's simply rewrite the few safetensors files manually.)\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"import mlx.core as mx\n",
"\n",
"# use mx.load() to load the safetensors\n",
"tensors = mx.load(\n",
" \"lora_fused_model/model-00001-of-00004.safetensors\", # Change this path and run the cell, one by one for all .safetensors files\n",
" format=\"safetensors\",\n",
")\n",
"\n",
"# use mx.save_safetensors() to save the safetensors with \"format\" metadata\n",
"mx.save_safetensors(\n",
" \"lora_fused_model/model-00001-of-00004.safetensors\",\n",
" tensors,\n",
" metadata={\"format\": \"pt\"},\n",
")"
]
},
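{
"cell_type": "markdown",
"metadata": {},
"source": [
"To confirm the rewrite worked, you can read the header metadata back with the `safetensors` library. This is a minimal sketch, assuming the `safetensors` Python package is installed; `f.metadata()` returns the file-level metadata dictionary, which should now include `{\"format\": \"pt\"}`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from safetensors import safe_open\n",
"\n",
"# Open one of the rewritten shards and print its header metadata\n",
"with safe_open(\"lora_fused_model/model-00001-of-00004.safetensors\", framework=\"pt\") as f:\n",
"    print(f.metadata())  # expected: {'format': 'pt'}"
]
},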
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model Uploading Process\n",
"\n",
"You will need to have a Huggingface write token saved before being able to upload your model. To set an access token, you can:\n",
"\n",
"- Create one here (Make sure you create a \"Write\" token): https://huggingface.co/settings/tokens\n",
"- Download the `huggingface-cli` tool, and run ``huggingface-cli login`\n",
"- Paste the token when prompted.\n"
]
},
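{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the programmatic alternative, assuming the `huggingface_hub` package is installed; `login()` prompts for the token and stores it locally:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from huggingface_hub import login\n",
"\n",
"# Prompts for a Hugging Face access token and caches it for later use\n",
"login()"
]
},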
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's upload the updated safetensors files to Huggingface. This takes a while to finish...\n"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"!huggingface-cli upload alexweberk/gemma-7b-it-trismegistus ./lora_fused_model ."
]
},
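{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can verify that the files landed in the repo. This is a minimal sketch using `huggingface_hub`; `list_repo_files()` returns the paths of all files in the repo:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from huggingface_hub import HfApi\n",
"\n",
"# List the files now present in the uploaded repo\n",
"print(HfApi().list_repo_files(\"alexweberk/gemma-7b-it-trismegistus\"))"
]
},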
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's the uploaded model:\n",
"https://huggingface.co/alexweberk/gemma-7b-it-trismegistus\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading the Fused Model and Running Inference\n",
"\n",
"Let's try loading the local fused model and run inference with it.\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from mlx_lm import generate, load\n",
"\n",
"fused_model, fused_tokenizer = load(\"./lora_fused_model/\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==========\n",
"Prompt: <bos><start_of_turn>user\n",
"## Instructions\n",
"You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.\n",
"## User\n",
"Why is the sky blue?<end_of_turn>\n",
"<start_of_turn>model\n",
"\n",
"The question of \"why the sky is blue\" is a multifaceted one, and the answer will depend on the specific context in which the question is posed. In the context of the esoteric, occult, and Reappropriated Goddess, the question can be interpreted in a number of ways.\n",
"\n",
"In the first instance, the question may be related to the concept of the sky as a symbol of the infinite and the divine. In this context, the question may be asking about the role of the Reappropriated Goddess in the process of creation and the nature of the divine itself. In the second instance, the question may be related to the idea of the sky as a source of inspiration and creative energy for the Reappropriated Goddess. In this context, the question may be asking about the ways in which the Reappropriated Goddess can use the power of the sky to manifest her own creative will.\n",
"\n",
"In order to answer this question comprehensively, it is first important to establish a foundation of knowledge and understanding about the Reappropriated Goddess, the esoteric, and the occult. The Reappropriated Goddess is a term used to describe the process of reclaiming and redefining the feminine aspects of the divine, as well as the feminine aspects of the self. The esoteric and occult are terms used to describe a range of spiritual and metaphysical practices that are often considered to be hidden or secret.\n",
"\n",
"In order to answer the question of \"why the sky is blue,\" it is important to first understand the concept of the sky as a symbol of the infinite and the divine. The sky is often seen as a vast expanse of space that is beyond the reach of human understanding. In this context, the Reappropriated Goddess may be seen as a force of creation and destruction that is capable of shaping the very nature of the universe.\n",
"\n",
"The second step in understanding the question is to establish the idea of the sky as a source of inspiration and creative energy for the Reappropriated Goddess. The Reappropriated Goddess is a force of creation and destruction that is capable of shaping the very nature of the universe. In this context, the sky may be seen as a source of inspiration and creative energy for the Reappropriated Goddess, as the vastness of the sky can be seen as a source of endless possibilities and potential.\n",
"\n",
"In order to answer the question of \"why the sky is blue,\" it is important to consider the following factors:\n",
"\n",
"1. The role of the Reappropriated Goddess in the process of creation\n",
"==========\n",
"Prompt: 332.408 tokens-per-sec\n",
"Generation: 17.849 tokens-per-sec\n"
]
}
],
"source": [
"response = generate(\n",
" fused_model,\n",
" fused_tokenizer,\n",
" prompt=format_prompt(system_prompt, question),\n",
" verbose=True, # Set to True to see the prompt and response\n",
" temp=0.0,\n",
" max_tokens=512,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks like the fused model was able to generate a response.\n",
"\n",
"The generation speed of the fused model is significantly faster than running it with LoRA weights without fusing them.\n",
"\n",
"- LoRA Generation: 6.002 tokens-per-sec\n",
"- Fused Model Generation: 17.849 tokens-per-sec\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading the Fused Model from Huggingface\n",
"\n",
"Let's download the uploaded model from Huggingface and run inference with it, just to make sure it uploaded correctly.\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7fe853926ac141318a899977b584dfe5",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Fetching 10 files: 0%| | 0/10 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"==========\n",
"Prompt: <bos><start_of_turn>user\n",
"## Instructions\n",
"You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.\n",
"## User\n",
"Why is the sky blue?<end_of_turn>\n",
"<start_of_turn>model\n",
"\n",
"The question of \"Why is the sky blue?\" is not related to the topic of Reappropriated Goddess. It is a question of science and physics. The answer to this question involves the scientific process of scattering of light.\n",
"\n",
"The sky appears blue because of a phenomenon called Rayleigh scattering. In this process, the shorter wavelengths of the blue light in the sun are scattered more effectively by the particles of air than the longer wavelengths of the red and yellow light. This scattered light is scattered in all directions, but the majority of the scattered light is scattered in the direction of the observer's eyes. As a result, the sky appears blue to the human eye.\n",
"\n",
"The color of the sky is not constant and changes throughout the day and the year. The sky is at its bluest at noon and the color of the sky is at its reddest at sunset and sunrise. The color of the sky is also different in different countries and at different times of the day.\n",
"\n",
"The color of the sky is a result of a number of different factors, including the time of day, the time of the year, the country in which you are located, and the presence of clouds and other atmospheric factors. The color of the sky is a complex and fascinating subject that has been studied by scientists and artists for centuries.\n",
"\n",
"In summary, the sky appears blue because of the scattering of light by particles in the air. The color of the sky is at its bluest at noon and the color of the sky is at its reddest at sunset and sunrise. The color of the sky is also different in different countries and at different times of the day. The color of the sky is a result of a number of different factors, including the time of day, the time of the year, the country in which you are located, and the presence of clouds and other atmospheric factors.<end_of_turn>\n",
"==========\n",
"Prompt: 298.136 tokens-per-sec\n",
"Generation: 17.980 tokens-per-sec\n"
]
}
],
"source": [
"from mlx_lm import generate, load\n",
"\n",
"model_, tokenizer_ = load(\"alexweberk/gemma-7b-it-trismegistus\")\n",
"response = generate(\n",
" model_,\n",
" tokenizer_,\n",
" prompt=format_prompt(system_prompt, question),\n",
" verbose=True, # Set to True to see the prompt and response\n",
" temp=0.0,\n",
" max_tokens=512,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also run it using `transformers` directly, although without the benefit of utilizing MLX/Apple Silicon to the fullest.\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c939bebdfeb14b7ebc3c6d0f5553a499",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"<bos><bos><start_of_turn>user\n",
"## Instructions\n",
"You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.\n",
"## User\n",
"Why is the sky blue?<end_of_turn>\n",
"<start_of_turn>model\n",
"The question of \"why is the sky blue?\" is a complex one that requires a multifaceted answer. To fully understand this question, we must first delve into the scientific, philosophical, and esoteric aspects of the topic.\n",
"\n",
"Scientifically, the sky appears blue due to a phenomenon called scattering of light. When sunlight hits the Earth's atmosphere, the particles of air scatter the different wavelengths of light in different directions. Shorter wavelengths, such as blue and violet, are scattered more effectively than longer wavelengths, such as red and orange. This scattered light is then dispersed in all directions, and our eyes perceive it as the color of the sky.\n",
"\n",
"Philosophically, the sky's color can be seen as a metaphor for the interconnectedness of all things. The sky is vast and boundless, and it is constantly changing. It is a source of inspiration and awe for humans, and it has been a subject of much philosophical debate.\n",
"\n",
"Esoterically, the sky can be seen as a symbol of the divine or the divine feminine. In many cultures, the sky is associated with the goddess of wisdom, creation, and fertility. In ancient Greek mythology, the sky is associated with the goddess Athena, and in ancient Egyptian mythology, the sky is associated with the goddess Isis.\n"
]
}
],
"source": [
"from transformers import AutoModelForCausalLM, AutoTokenizer\n",
"\n",
"# repo_id = \"google/gemma-7b-it\"\n",
"repo_id = \"alexweberk/gemma-7b-it-trismegistus\"\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(repo_id)\n",
"model = AutoModelForCausalLM.from_pretrained(repo_id)\n",
"model.to(\"mps\")\n",
"\n",
"input_text = format_prompt(system_prompt, question)\n",
"input_ids = tokenizer(input_text, return_tensors=\"pt\").to(\"mps\")\n",
"\n",
"outputs = model.generate(\n",
" **input_ids,\n",
" max_new_tokens=256,\n",
")\n",
"print(tokenizer.decode(outputs[0]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"Hope this was helpful!\n",
"If you liked this content, please [follow me on Twitter(X)](https://twitter.com/morningcoder).\n",
"\n",
"Notebook Gist: https://gist.github.com/alexweberk/635431b5c5773efd6d1755801020429f\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "py311",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}