OpenLlama 3b bnb-4bit-training.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/MichelNivard/eb742eb566075cbab6bd47ca69823bc2/openllama-3b-bnb-4bit-training.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XIyP_0r6zuVc"
},
"source": [
"# QLoRA 4-bit fine-tuning of OpenLLaMA 3B as an R tutor (tidyllama)\n",
"\n",
"This notebook is an adaptation of the standard 4-bit QLoRA example notebook.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FuXIFTFapAMI",
"outputId": "aeea76ec-e5c6-4481-b7fe-b9f876eabedc"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
" Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
" Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
" Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n"
]
}
],
"source": [
"!pip install -q -U bitsandbytes\n",
"!pip install -q -U git+https://github.com/huggingface/transformers.git \n",
"!pip install -q -U git+https://github.com/huggingface/peft.git\n",
"!pip install -q -U git+https://github.com/huggingface/accelerate.git\n",
"!pip install -q datasets\n",
"!pip install -q sentencepiece\n",
"!pip install -q huggingface_hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "izhhaZheH4QK",
"outputId": "ac5c3bc7-00d8-40c8-efdb-760546c26915"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.\n",
"Token is valid (permission: write).\n",
"Your token has been saved to /root/.cache/huggingface/token\n",
"Login successful\n"
]
}
],
"source": [
"from huggingface_hub import login\n",
"login(token = \"####\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MJ-5idQwzvg-"
},
"source": [
"First, let's load the model. I swapped in OpenLLaMA 3B (`openlm-research/open_llama_3b`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "E0Nl5mWL0k2T"
},
"outputs": [],
"source": [
"import torch\n",
"from transformers import AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer\n",
"\n",
"model_id = \"openlm-research/open_llama_3b\"\n",
"\n",
"# 4-bit NF4 quantization with double quantization; matmuls are computed in bfloat16\n",
"bnb_config = BitsAndBytesConfig(\n",
"    load_in_4bit=True,\n",
"    bnb_4bit_use_double_quant=True,\n",
"    bnb_4bit_quant_type=\"nf4\",\n",
"    bnb_4bit_compute_dtype=torch.bfloat16\n",
")\n",
"\n",
"# Load the tokenizer and place the whole quantized model on GPU 0\n",
"tokenizer = LlamaTokenizer.from_pretrained(model_id)\n",
"model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={\"\":0})"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Mp2gMi1ZzGET"
},
"source": [
"Then we have to apply some preprocessing to the model to prepare it for training. For that, we use the `prepare_model_for_kbit_training` function from PEFT."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "a9EUEDAl0ss3"
},
"outputs": [],
"source": [
"from peft import prepare_model_for_kbit_training\n",
"\n",
"model.gradient_checkpointing_enable()\n",
"model = prepare_model_for_kbit_training(model)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gkIcwsSU01EB"
},
"outputs": [],
"source": [
"def print_trainable_parameters(model):\n",
" \"\"\"\n",
" Prints the number of trainable parameters in the model.\n",
" \"\"\"\n",
" trainable_params = 0\n",
" all_param = 0\n",
" for _, param in model.named_parameters():\n",
" all_param += param.numel()\n",
" if param.requires_grad:\n",
" trainable_params += param.numel()\n",
" print(\n",
" f\"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}\"\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ybeyl20n3dYH",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "f2ce6dcf-2e22-4577-bdaa-25292d07b318"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"trainable params: 2662400 || all params: 1818384000 || trainable%: 0.1464157185720948\n"
]
}
],
"source": [
"from peft import LoraConfig, get_peft_model\n",
"\n",
"config = LoraConfig(\n",
"    r=8,\n",
"    lora_alpha=32,\n",
"    target_modules=[\"q_proj\", \"v_proj\"],\n",
"    lora_dropout=0.05,\n",
"    bias=\"none\",\n",
"    task_type=\"CAUSAL_LM\"\n",
")\n",
"\n",
"model = get_peft_model(model, config)\n",
"print_trainable_parameters(model)"
]
},
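{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sanity check on the counts printed above, the numbers can be reproduced by hand. This is a minimal sketch that assumes the OpenLLaMA 3B shapes (hidden size 3200, MLP size 8640, 26 layers, vocabulary 32000) and assumes that only the attention and MLP projections are stored in packed 4-bit form; check `model.config` rather than trusting the hard-coded values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumed architecture values for open_llama_3b; verify against model.config.\n",
"hidden, intermediate, n_layers, vocab = 3200, 8640, 26, 32000\n",
"r, n_targets = 8, 2  # LoRA rank; two adapted modules per layer (q_proj, v_proj)\n",
"\n",
"# Each adapted projection gets two low-rank matrices: A (r x hidden) and B (hidden x r).\n",
"lora_params = n_layers * n_targets * 2 * r * hidden\n",
"print(lora_params)  # 2662400\n",
"\n",
"# 4-bit weights are packed two per byte, so numel() reports half the logical\n",
"# element count for the quantized attention (4) and MLP (3) projections.\n",
"linear_per_layer = 4 * hidden * hidden + 3 * hidden * intermediate\n",
"packed_linear = n_layers * linear_per_layer // 2\n",
"\n",
"# Assumed to stay unquantized: input embeddings, lm_head, and the norm weights.\n",
"unquantized = 2 * vocab * hidden + (2 * n_layers + 1) * hidden\n",
"print(packed_linear + unquantized + lora_params)  # 1818384000"
]
},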
{
"cell_type": "markdown",
"metadata": {
"id": "FCc64bfnmd3j"
},
"source": [
"Let's load the dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "s6f4z8EYmcJ6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 120,
"referenced_widgets": [
"a436d872e8f3451c9c762f4e6a4b08b2",
"a5a742c272484bf8909b6d685398f9bd",
"fcb4f7e3371d4112876415b94e0d665d",
"612005326bfe4f39801daa8cc6a8c2ab",
"c770ba2a71f5475c97d65b22c2e912da",
"c19614c2b89d42408b65be0fadecf9fe",
"5e15a7f671a341dabf96978460e3626a",
"c2bf24971c8c44ecb2f24a7c37e3ef3d",
"2474949366634c998e6069192780611a",
"dc1dc5709c174b5583eaf6c9fa2b69e5",
"ddc99c20c36440fa97caecfba7e2179a"
]
},
"outputId": "e9d7d53e-a2ee-461e-dba7-31b28b5c0f5c"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:datasets.builder:Found cached dataset json (/root/.cache/huggingface/datasets/json/default-626c46e6070edf0e/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "a436d872e8f3451c9c762f4e6a4b08b2"
}
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:datasets.arrow_dataset:Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/json/default-626c46e6070edf0e/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-48ff702d803d163a.arrow\n",
"WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-626c46e6070edf0e/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-6fe01e90ab42d98a.arrow\n"
]
}
],
"source": [
"from datasets import load_dataset\n",
"\n",
"# Load every JSON file found in the training_data directory\n",
"dataset = load_dataset(\"json\", data_dir=\"training_data\")\n",
"# ds2 = load_dataset(\"json\", data_files=\"training_data/training_data.json\")\n",
"\n",
"dataset = dataset.shuffle(seed=42)\n",
"\n",
"# Tokenize the 'text' field, truncating to 2048 tokens\n",
"data = dataset.map(lambda samples: tokenizer(samples['text'], max_length=2048, truncation=True), batched=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_0MOtwf3zdZp"
},
"source": [
"Run the cell below to start training. For the sake of the demo, we ran it for only a few steps to showcase how this integration works with existing tools in the HF ecosystem."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jq0nX33BmfaC",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 368
},
"outputId": "21b24942-100e-489b-e2cb-3cedad576e1f"
},
"outputs": [
],
"source": [
"import transformers\n",
"\n",
"# needed for gpt-neo-x tokenizer\n",
"# tokenizer.pad_token = tokenizer.eos_token\n",
"\n",
"# needed for llama tokenizer, which ships without a pad token.\n",
"# Note: [PAD] gets a new id that the model has no embedding for. With a batch size of 1\n",
"# no padding is ever inserted, so the id never reaches the model.\n",
"tokenizer.add_special_tokens({'pad_token': '[PAD]'})\n",
"\n",
"trainer = transformers.Trainer(\n",
"    model=model,\n",
"    train_dataset=data['train'],\n",
"    args=transformers.TrainingArguments(\n",
"        # Trainer does not read resume_from_checkpoint from TrainingArguments;\n",
"        # to resume, call trainer.train(resume_from_checkpoint=True) instead.\n",
"        per_device_train_batch_size=1,  # replaces the deprecated per_gpu_train_batch_size\n",
"        gradient_accumulation_steps=12,\n",
"        warmup_steps=2,\n",
"        num_train_epochs=2,\n",
"        learning_rate=1e-5,\n",
"        fp16=True,\n",
"        logging_steps=40,\n",
"        save_total_limit=3,\n",
"        output_dir=\"./tidyllama_3b_v2\",\n",
"        optim=\"paged_adamw_8bit\",\n",
"        push_to_hub=True\n",
"    ),\n",
"    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),\n",
")\n",
"model.config.use_cache = False # silence the warnings. Please re-enable for inference!\n",
"\n",
"trainer.train()\n",
"\n",
"model.config.use_cache = True\n",
"model.push_to_hub(\"tidyllama_3b_v1\")\n"
]
}
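,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The batch settings in the training cell combine into an effective batch size. Below is a small sketch of that arithmetic. `n_examples` is a hypothetical dataset size (the real size is not shown in this notebook), a single GPU is assumed, and the step count is only approximate because the exact rounding depends on the `transformers` version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"# Values taken from the training cell; one GPU is an assumption.\n",
"per_device_batch, grad_accum, n_gpus = 1, 12, 1\n",
"effective_batch = per_device_batch * grad_accum * n_gpus\n",
"print(effective_batch)  # 12\n",
"\n",
"# n_examples is hypothetical; substitute len(data['train']).\n",
"n_examples, epochs = 6000, 2\n",
"print(math.ceil(n_examples / effective_batch) * epochs)  # 1000 (approximate optimizer steps)"
]
}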
],
"metadata": {
"accelerator": "GPU",
"colab": {
"machine_shape": "hm",
"provenance": [],
"gpuType": "V100",
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"a436d872e8f3451c9c762f4e6a4b08b2": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_a5a742c272484bf8909b6d685398f9bd",
"IPY_MODEL_fcb4f7e3371d4112876415b94e0d665d",
"IPY_MODEL_612005326bfe4f39801daa8cc6a8c2ab"
],
"layout": "IPY_MODEL_c770ba2a71f5475c97d65b22c2e912da"
}
},
"a5a742c272484bf8909b6d685398f9bd": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_c19614c2b89d42408b65be0fadecf9fe",
"placeholder": "​",
"style": "IPY_MODEL_5e15a7f671a341dabf96978460e3626a",
"value": "100%"
}
},
"fcb4f7e3371d4112876415b94e0d665d": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_c2bf24971c8c44ecb2f24a7c37e3ef3d",
"max": 1,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_2474949366634c998e6069192780611a",
"value": 1
}
},
"612005326bfe4f39801daa8cc6a8c2ab": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_dc1dc5709c174b5583eaf6c9fa2b69e5",
"placeholder": "​",
"style": "IPY_MODEL_ddc99c20c36440fa97caecfba7e2179a",
"value": " 1/1 [00:00&lt;00:00, 66.10it/s]"
}
},
"c770ba2a71f5475c97d65b22c2e912da": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"c19614c2b89d42408b65be0fadecf9fe": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"5e15a7f671a341dabf96978460e3626a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"c2bf24971c8c44ecb2f24a7c37e3ef3d": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"2474949366634c998e6069192780611a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"dc1dc5709c174b5583eaf6c9fa2b69e5": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"ddc99c20c36440fa97caecfba7e2179a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
}
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}