GPT-J 6B - Testing
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "GPT-J 6B - Testing",
"provenance": [],
"collapsed_sections": [
"30padl1SlLEb"
],
"machine_shape": "hm",
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/pszemraj/c914ec118494ff6b21e8f2779d655e21/gpt-j-6b-testing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "p2oDvCMLJ47F"
},
"source": [
"# GPT-J 6B - Testing\n",
"\n",
"Created: Sept 27, 21\n",
"\n",
"---\n",
"\n",
"## Goal\n",
"\n",
"- see if GPT-J 6B can work on Colab on GPU with _transformers_ pipeline object to respond to text prompts\n",
"\n",
"Results: it can! but with massive resources... at least thus far\n",
"\n",
"## Model Desc + Details \n",
"\n",
"- you can find details on the huggingface page [here](https://huggingface.co/EleutherAI/gpt-j-6B), some are directly pasted below\n",
"\n",
"---\n",
"\n",
"GPT-J 6B is a transformer model trained using Ben Wang's Mesh Transformer JAX. \"GPT-J\" refers to the class of model, while \"6B\" represents the number of trainable parameters.\n",
"\n",
"> The model consists of 28 layers with a model dimension of 4096, and a feedforward dimension of 16384. The model dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3.\n",
"\n",
"Training data\n",
"\n",
"> GPT-J 6B was trained on the Pile, a large-scale curated dataset created by EleutherAI.\n",
"\n",
"Training procedure\n",
"\n",
"> This model was trained for 402 billion tokens over 383,500 steps on TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.\n",
"\n",
"Intended Use and Limitations\n",
"\n",
"> GPT-J learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating text from a prompt.\n",
"\n",
"---\n",
"\n",
"Quick notebook made by [Peter Szemraj](https://github.com/pszemraj)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wvN1aZEnlHs4"
},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "30padl1SlLEb"
},
"source": [
"## make colab outputs nice"
]
},
{
"cell_type": "code",
"metadata": {
"id": "xZR9qDLClJ5N"
},
"source": [
"from IPython.display import HTML, display\n",
"def set_css():\n",
" display(HTML('''\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" '''))\n",
"get_ipython().events.register('pre_run_cell', set_css)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "RbCZIEqhlORX"
},
"source": [
"## install and load model"
]
},
{
"cell_type": "code",
"metadata": {
"id": "lIYdn1woOS1n",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17
},
"outputId": "7ce0c878-16a0-4e02-e69c-5fd1bfde3aec"
},
"source": [
"%%capture\n",
"!pip install -U transformers"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JAgHJPt1PxaU"
},
"source": [
"update pytorch \n",
"- commands to [update torch to work with a100 GPU](https://pytorch.org/get-started/locally/)\n",
"- found after researching errors and used [this thread](https://forums.fast.ai/t/notes-on-using-nvidia-a100-40gb/89894)\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 138
},
"id": "mGQxfymsN1Uf",
"outputId": "95cab9b1-0ebd-409f-8c85-cddf0a3dae12"
},
"source": [
"!pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in links: https://download.pytorch.org/whl/torch_stable.html\n",
"Requirement already satisfied: torch==1.9.1+cu111 in /usr/local/lib/python3.7/dist-packages (1.9.1+cu111)\n",
"Requirement already satisfied: torchvision==0.10.1+cu111 in /usr/local/lib/python3.7/dist-packages (0.10.1+cu111)\n",
"Requirement already satisfied: torchaudio==0.9.1 in /usr/local/lib/python3.7/dist-packages (0.9.1)\n",
"Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch==1.9.1+cu111) (3.7.4.3)\n",
"Requirement already satisfied: pillow>=5.3.0 in /usr/local/lib/python3.7/dist-packages (from torchvision==0.10.1+cu111) (7.1.2)\n",
"Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torchvision==0.10.1+cu111) (1.19.5)\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IVjTEZPHScbb"
},
"source": [
"<font color=\"orange\"> Colab should prompt you to restart the runtime here, to use the newly installed version of torch </font>"
]
},
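{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the restart, a quick sanity check confirms the CUDA 11.1 build of torch is actually in use. This is a minimal sketch added for illustration; it only uses standard `torch` attributes."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sanity check (sketch): print the torch version and the CUDA toolkit it was built against\n",
"import torch\n",
"\n",
"print('torch version:', torch.__version__)\n",
"print('built for CUDA:', torch.version.cuda)"
],
"execution_count": null,
"outputs": []
},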
{
"cell_type": "markdown",
"metadata": {
"id": "pGddMTc2pH8B"
},
"source": [
"### check gpu power level"
]
},
{
"cell_type": "code",
"metadata": {
"id": "W5PtcaLWpLCR",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 364
},
"outputId": "039bd763-d04b-4163-eb8d-c03185d3febe"
},
"source": [
"!nvidia-smi\n",
"# printed device ID is relevant for running on GPU"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mon Sep 27 21:53:57 2021 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 470.63.01 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 A100-SXM4-40GB Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 32C P0 44W / 400W | 0MiB / 40536MiB | 0% Default |\n",
"| | | Disabled |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "3jNDjXhdQY5v",
"outputId": "54b08bcc-c431-4610-ad44-22b70b67a140"
},
"source": [
"import torch\n",
"\n",
"torch.cuda.is_available()"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"True"
]
},
"metadata": {},
"execution_count": 11
}
]
},
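{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, the same GPU info can be pulled from Python instead of parsing `nvidia-smi`. A small sketch using standard `torch.cuda` helpers; device index 0 is assumed."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sketch: report the GPU name and total memory via torch (assumes device 0)\n",
"import torch\n",
"\n",
"if torch.cuda.is_available():\n",
"    props = torch.cuda.get_device_properties(0)\n",
"    total_gb = props.total_memory / 1024**3\n",
"    print('GPU 0: {} with {:.1f} GB memory'.format(props.name, total_gb))\n",
"else:\n",
"    print('no CUDA device visible')"
],
"execution_count": null,
"outputs": []
},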
{
"cell_type": "markdown",
"metadata": {
"id": "aobRECAVQmiK"
},
"source": [
"### check CPU power level"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "EofOesfDRH2m",
"outputId": "8fec0dcb-7a55-4268-cf4c-b7e0e7189d34"
},
"source": [
"from psutil import virtual_memory\n",
"import os\n",
"ram_gb = round(virtual_memory().total / (1024**3), 1)\n",
"print('Runtime has {} gigs of memory and {} processors'.format(ram_gb,\n",
" os.cpu_count()))\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Runtime has 83.5 gigs of memory and 12 processors\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Jxgq3K9NRK7H"
},
"source": [
"### DL model, load into Transformers Pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bD3-aCFdaYaW"
},
"source": [
"details on how to configure a pipeline are [here](https://huggingface.co/transformers/main_classes/pipelines.html)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Oh_TRtuwJtId",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17
},
"outputId": "126c19e7-ba88-48ed-ba7b-e029e9fc172b"
},
"source": [
"from transformers import pipeline\n",
"import pprint as pp"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 86
},
"id": "XsdRquFrLf4G",
"outputId": "b1638a7a-0fe5-4c7e-b1b1-321cb77e3472"
},
"source": [
"from transformers import AutoTokenizer, AutoModelForCausalLM\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"EleutherAI/gpt-j-6B\")\n",
"model = AutoModelForCausalLM.from_pretrained(\"EleutherAI/gpt-j-6B\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py:337: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.\n",
" \"Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 \"\n"
]
}
]
},
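{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional: with the model in memory, its config can be printed to confirm the architecture numbers quoted in the intro (28 layers, model dim 4096, 16 heads, rotary dim 64). This is a sketch; the attribute names (`n_layer`, `n_embd`, `n_head`, `rotary_dim`) follow the GPT-J config class in _transformers_ and could differ in other versions."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sketch: print key architecture numbers from the loaded model's config\n",
"cfg = model.config\n",
"print('layers: {}, model dim: {}, heads: {}'.format(cfg.n_layer, cfg.n_embd, cfg.n_head))\n",
"print('rotary dims per head: {}, vocab size: {}'.format(cfg.rotary_dim, cfg.vocab_size))"
],
"execution_count": null,
"outputs": []
},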
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17
},
"id": "VWK8EuibLmao",
"outputId": "a8fb70eb-7085-4b28-96c2-d8efba59f057"
},
"source": [
"generator = pipeline('text-generation', \n",
" model=model, tokenizer=tokenizer,\n",
" device=0)"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
}
]
},
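{
"cell_type": "markdown",
"metadata": {},
"source": [
"The full-precision weights are a big part of why this needs so many resources. As an untested sketch (not part of the original run), the loaded model can be cast to float16 and the pipeline rebuilt on top of it, which roughly halves the GPU memory footprint; fp16 can change generated text slightly and assumes the GPU handles half precision well (the A100 used here does)."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sketch: cast the already-loaded model to fp16 and rebuild the pipeline\n",
"# (nn.Module.half() converts the parameters in place and returns the module)\n",
"model = model.half()\n",
"generator = pipeline('text-generation',\n",
"                     model=model, tokenizer=tokenizer,\n",
"                     device=0)"
],
"execution_count": null,
"outputs": []
},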
{
"cell_type": "markdown",
"metadata": {
"id": "hBNWkpwqlQim"
},
"source": [
"# Test Model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GG-J6gc6cGkC"
},
"source": [
"normal test"
]
},
{
"cell_type": "code",
"metadata": {
"id": "RVAL4lgzKHts",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 69
},
"outputId": "bc9c09a5-ef84-444c-b2ad-3fc17463685b"
},
"source": [
"generator(\"Nikola Tesla was born on\", do_sample=True, min_length=50)"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[{'generated_text': 'Nikola Tesla was born on July 10, 1856, in Smiljan, a small village in the then Austro-Hungarian Empire (now Croatia). [1] He was the youngest of four children, of whom two died in infancy'}]"
]
},
"metadata": {},
"execution_count": 8
}
]
},
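{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generation keyword arguments are passed through the pipeline to the underlying generate call, so sampling can be tuned directly. A sketch with illustrative (not tuned) values for `temperature`, `top_k`, `top_p`, and multiple returned sequences."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sketch: tune sampling behavior; values here are illustrative, not tuned\n",
"sampled = generator(\"Nikola Tesla was born on\",\n",
"                    do_sample=True,\n",
"                    temperature=0.8,\n",
"                    top_k=50,\n",
"                    top_p=0.95,\n",
"                    max_length=64,\n",
"                    num_return_sequences=2)\n",
"for out in sampled:\n",
"    pp.pprint(out['generated_text'])"
],
"execution_count": null,
"outputs": []
},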
{
"cell_type": "markdown",
"metadata": {
"id": "spk8MdPF3Z0m"
},
"source": [
"specific question request "
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "3NWHFu5J2u2j",
"outputId": "5fd31f81-abd4-4d6f-f1ac-866212e421de"
},
"source": [
"import pprint as pp\n",
"\n",
"prompt1 = \"most important qualities for entering the data science research field:\"\n",
"\n",
"response1 = generator(prompt1, do_sample=True, min_length=100, \n",
" max_length=1000, clean_up_tokenization_spaces=True,\n",
" return_full_text=True)\n",
"print(\"Prompt: \\n\")\n",
"pp.pprint(prompt1)\n",
"print(\"\\nResponse: \\n\")\n",
"pp.pprint(response1)"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Prompt: \n",
"\n",
"'most important qualities for entering the data science research field:'\n",
"\n",
"Response: \n",
"\n",
"[{'generated_text': 'most important qualities for entering the data science '\n",
" 'research field:** Data scientists must have good command '\n",
" 'of the statistical and machine learning toolbox. **Data '\n",
" 'scientists must have high tolerance for uncertainty '\n",
" '**(high tolerance for error)**, because even though they '\n",
" 'may have advanced degrees and a strong record of '\n",
" 'accomplishment, they will be operating with less '\n",
" 'certainty than the classical methods they are replacing. '\n",
" 'In data science, the **\"answer\"** is usually the last '\n",
" 'thing we calculate because the answer is subject to so '\n",
" 'many random factors. The **\"output\"** of data science can '\n",
" 'be a predictive probability, a regression model, a '\n",
" 'pattern, or it can be a \"story\" about the relationship '\n",
" 'between one or more independent variables and a dependent '\n",
" 'or \"response\" variable.\\n'\n",
" '\\n'\n",
" '### The Value of Knowledge in Data Science\\n'\n",
" '\\n'\n",
" 'The value of data science is derived from knowledge of '\n",
" 'its methodology and implementation. However, knowledge '\n",
" 'also has its drawbacks. For example, if I do not have '\n",
" 'data, I am not able to do data analysis. If in an '\n",
" 'organization I do not have access to the data, I cannot '\n",
" 'do data analysis. It is important to establish the data '\n",
" 'science methodology and how to access the data. In '\n",
" 'contrast, the drawbacks of not having the methodology in '\n",
" 'place can be fatal. The methodology to build and execute '\n",
" 'a model must be in place. In that sense, the methodology '\n",
" 'is essential.\\n'\n",
" '\\n'\n",
" '### Summary\\n'\n",
" '\\n'\n",
" 'In this chapter we have tried to provide a broad overview '\n",
" 'of data science based on the characteristics of the data '\n",
" 'we collect, store, and analyze. Data science encompasses '\n",
" 'different types of methodologies to process and analyze '\n",
" 'data. Data science is the combination of different tools '\n",
" 'and methodologies (i.e., statistical analysis, machine '\n",
" 'learning, predictive analytics, data mining, exploratory '\n",
" 'analysis, visualization, and programming). Data science '\n",
" 'is a method for discovering patterns in data, such that '\n",
" 'it becomes possible to predict future behavior of humans '\n",
" 'or other systems **(data-driven design)** and to explain '\n",
" 'what has happened in the past. If it is done well, data '\n",
" 'science produces a meaningful statistical model, which '\n",
" 'captures the nature of an outcome. Data science has '\n",
" 'become increasingly popular in the past few years, as the '\n",
" 'amount of information we generate about customers via a '\n",
" 'variety of technologies increases at an exponential '\n",
" 'rate.\\n'\n",
" '\\n'\n",
" '### Coda\\n'\n",
" '\\n'\n",
" 'We hope that you have enjoyed spending a day with data '\n",
" 'science. When we began, we expected people would enjoy '\n",
" 'the topic, but we did not necessarily expect for them to '\n",
" 'become obsessed with it. Indeed, in the past several '\n",
" 'years, it seems like the field is in a bit of a frenzy. '\n",
" 'Some call this a \"bubble,\" but what matters is that data '\n",
" 'sciences is gaining momentum. There are more job openings '\n",
" 'in the market than there are qualified applicants. Data '\n",
" 'science jobs are going viral (i.e., Facebook, Yelp, and '\n",
" 'Netflix are offering data science projects). Even though '\n",
" 'the field is in a bit of a bubble, we have faith about '\n",
" 'its future. We hope that in your everyday activities you '\n",
" 'may use data science ideas to make your work easier as '\n",
" 'well as better and faster. It is important to remember '\n",
" 'that data is everywhere, including the data science job '\n",
" 'postings.\\n'\n",
" '\\n'\n",
" '## Chapter 9. The Business of Data Products\\n'\n",
" '\\n'\n",
" '> \" **There comes a time when every science must move '\n",
" 'from the lab into the marketplace, and that time has come '\n",
" 'for data science.** \"\\n'\n",
" '\\n'\n",
" 'This point of the book is about productizing the data '\n",
" 'science methodology and tooling (e.g., R, R Shiny and '\n",
" 'dplyr, etc.). Although we have tried to provide a more '\n",
" 'extensive version of the methodology in chapters 1, 2, '\n",
" 'and 4, when we began writing code in chapters 5, and, we '\n",
" 'did not have the methodology worked out. In addition, we '\n",
" 'still lacked data science examples to show the '\n",
" 'implementation of these tools (especially dplyr), which '\n",
" 'makes it unclear how these tools fit into the '\n",
" 'methodology. As a result, we had to write chapters 7, '\n",
" 'and 8 using the data science examples provided in chapter '\n",
" '6, but the methodology was not fully established and '\n",
" 'developed. In other words, we were trying to build the '\n",
" 'data science house before the foundation was set, while '\n",
" 'also implementing the tools. In this chapter, we will go '\n",
" 'through one of the steps of the process of developing a '\n",
" 'data science package that can be used by anyone in a '\n",
" 'typical enterprise, including business, academic, and '\n",
" 'government. The focus of this chapter will be how to '\n",
" 'develop an effective product or package, and the lessons '\n",
" 'we learned along the way are more likely to be useful.\\n'\n",
" '\\n'\n",
" 'This step can be broken down into two components: the '\n",
" 'process for developing software products, as well as the '\n",
" 'development of a data science platform. The goal here is '\n",
" 'to provide a step-by-step process to make it easy for the '\n",
" 'reader to achieve high success rates in data science.\\n'\n",
" '\\n'\n",
" '### Note\\n'\n",
" '\\n'\n",
" 'This chapter may be the most'}]\n"
]
}
]
},
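{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put a number on the \"massive resources\" note from the intro, torch can report the peak GPU memory it has allocated so far. A minimal sketch; this only covers memory tracked by torch's caching allocator (device 0 assumed), not total GPU usage."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# sketch: peak GPU memory allocated by torch so far (device 0 assumed)\n",
"import torch\n",
"\n",
"peak_gb = torch.cuda.max_memory_allocated(0) / 1024**3\n",
"print('peak GPU memory allocated by torch: {:.1f} GB'.format(peak_gb))"
],
"execution_count": null,
"outputs": []
},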
{
"cell_type": "markdown",
"metadata": {
"id": "RNx-dWkhah-2"
},
"source": [
"idea gen"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "2aj6vE4KZB2F",
"outputId": "0dfc1acb-991b-436e-bb8d-ea3da6a52c6e"
},
"source": [
"prompt2 = \"ideas to increase user engagement and participation in a university analytics club:\"\n",
"response_2 = generator(prompt2, do_sample=True, min_length=100, max_length=1000,\n",
" clean_up_tokenization_spaces=True,\n",
" return_full_text=True)\n",
"print(\"Prompt: \\n\")\n",
"pp.pprint(prompt2)\n",
"print(\"\\nResponse: \\n\")\n",
"pp.pprint(response_2)"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <style>\n",
" pre {\n",
" white-space: pre-wrap;\n",
" }\n",
" </style>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Prompt: \n",
"\n",
"('ideas to increase user engagement and participation in a university '\n",
" 'analytics club:')\n",
"\n",
"Response: \n",
"\n",
"[{'generated_text': 'ideas to increase user engagement and participation in a '\n",
" 'university analytics club:\\n'\n",
" '\\n'\n",
" 'Evaluating and improving how club members engage with '\n",
" 'club website is an area for improvement, which I’ll '\n",
" 'discuss in another post.\\n'\n",
" '\\n'\n",
" 'Another idea would be to engage “expertise volunteers” to '\n",
" 'coach club members. This would require club members to be '\n",
" 'self-directed learners. I feel that this also requires a '\n",
" 'mindset change.\\n'\n",
" '\\n'\n",
" 'Lastly, some club members may have not been active users '\n",
" 'of the website, and thus not interacted with its '\n",
" 'analytics. This idea would involve getting people who are '\n",
" 'not users to participate. This is a more complicated '\n",
" 'idea. However, it could be made easier if a group of club '\n",
" 'members were tasked with this.\\n'\n",
" '\\n'\n",
" 'Idea 3: Build a more engaged community\\n'\n",
" '\\n'\n",
" 'I felt that there was a need for our members to express '\n",
" 'their ideas and experiences as “they are”.\\n'\n",
" '\\n'\n",
" 'This idea is one that will become more important. I '\n",
" 'predict in a few years, users who are not academically '\n",
" 'inclined or who are not looking for an academic job will '\n",
" 'use online platforms to share what they’re doing and '\n",
" 'their research findings, just as users do today.\\n'\n",
" '\\n'\n",
" 'I believe it’s important to have “more than one way” to '\n",
" 'get people to talk about their “research”. This is '\n",
" 'especially true today because today’s PhD students do not '\n",
" 'work in an ivory tower.\\n'\n",
" '\\n'\n",
" 'I believe an online format like our Analytics Club should '\n",
" 'be a part of how students collaborate, share, and build '\n",
" 'communities. I felt that we are missing out on valuable '\n",
" 'ideas by using other social media platforms if we want to '\n",
" 'engage in this more.\\n'\n",
" '\\n'\n",
" 'Idea 4: Collaboration with a company\\n'\n",
" '\\n'\n",
" 'When I first suggested to the Analytics Club this idea, I '\n",
" 'wasn’t sure how well it would go, with the general '\n",
" 'thinking that “this is the university”. To my surprise, '\n",
" 'it was not rejected. The Analytics Club has some ideas '\n",
" 'for collaboration with companies, and members were '\n",
" 'excited about our idea.\\n'\n",
" '\\n'\n",
" 'Idea 5: Invite professionals\\n'\n",
" '\\n'\n",
" 'I feel this idea is in its infancy stage. However, it’s '\n",
" 'one that I believe should be encouraged.\\n'\n",
" '\\n'\n",
" 'The Analytics Club has done a great job at connecting '\n",
" 'with people at other universities, such as the University '\n",
" 'of Pennsylvania and MIT. To start, we’ve connected with '\n",
" 'companies to connect us with professionals from a '\n",
" 'different mindset than that of our current users.\\n'\n",
" '\\n'\n",
" 'Our club is still growing, and we have some things to '\n",
" 'work on before our members truly get what they want and '\n",
" 'need from the Analytic Club.\\n'\n",
" '\\n'\n",
" 'Idea 6: Evaluate and report\\n'\n",
" '\\n'\n",
" 'Ideally, ideas can be tested against each other to '\n",
" 'determine which ones work best. The Analytics Club is in '\n",
" 'the process of doing this. However, the club has been '\n",
" 'slow in being able to use the findings we’ve received in '\n",
" 'a real and practical way.\\n'\n",
" '\\n'\n",
" 'Our club is only beginning to make its mark. To be fair, '\n",
" 'it’s a learning process for members. However, the club '\n",
" 'has only been around for six months.\\n'\n",
" '\\n'\n",
" 'In Summary\\n'\n",
" '\\n'\n",
" 'So there you go!\\n'\n",
" '\\n'\n",
" 'Some ideas were more complicated to implement simply '\n",
" 'because they were new to our club. Some are pretty '\n",
" 'straightforward and we’ve been implementing them for a '\n",
" 'while. This is normal for an innovative group of people.\\n'\n",
" '\\n'\n",
" 'I appreciate your feedback. Let me know what ideas you '\n",
" 'have for the club and I’ll write a future post responding '\n",
" 'to your ideas. I hope you enjoy and benefit from the '\n",
" 'Analytics Club!'}]\n"
]
}
]
}
]
}