@linkerlin
Created September 10, 2022 04:23
封神榜大模型.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"collapsed_sections": [],
"machine_shape": "hm",
"authorship_tag": "ABX9TyMK+iTIWvrDxWHWBIXKg3Xw",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU",
"gpuClass": "standard"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/linkerlin/4b1f0f221265b02e51611666792d31e4/.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "BAwwxIHmb6f1"
},
"outputs": [],
"source": [
"# 封神榜大模型\n",
"# https://github.com/IDEA-CCNL/Fengshenbang-LM\n"
]
},
{
"cell_type": "code",
"source": [
"!pip install transformers\n",
"!pip install datasets\n",
"!pip3 install rouge\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "w4wBihz5cKVn",
"outputId": "784bae00-29f9-4f96-eb77-386a5bd878f9"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting transformers\n",
" Downloading transformers-4.21.3-py3-none-any.whl (4.7 MB)\n",
"\u001b[K |████████████████████████████████| 4.7 MB 15.0 MB/s \n",
"\u001b[?25hRequirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.7/dist-packages (from transformers) (6.0)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (2022.6.2)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers) (3.8.0)\n",
"Collecting tokenizers!=0.11.3,<0.13,>=0.11.1\n",
" Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)\n",
"\u001b[K |████████████████████████████████| 6.6 MB 71.4 MB/s \n",
"\u001b[?25hRequirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers) (4.64.0)\n",
"Collecting huggingface-hub<1.0,>=0.1.0\n",
" Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB)\n",
"\u001b[K |████████████████████████████████| 120 kB 88.8 MB/s \n",
"\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (1.21.6)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers) (2.23.0)\n",
"Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from transformers) (4.12.0)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.7/dist-packages (from transformers) (21.3)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.7/dist-packages (from huggingface-hub<1.0,>=0.1.0->transformers) (4.1.1)\n",
"Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=20.0->transformers) (3.0.9)\n",
"Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers) (3.8.1)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (1.24.3)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2.10)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2022.6.15)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (3.0.4)\n",
"Installing collected packages: tokenizers, huggingface-hub, transformers\n",
"Successfully installed huggingface-hub-0.9.1 tokenizers-0.12.1 transformers-4.21.3\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting datasets\n",
" Downloading datasets-2.4.0-py3-none-any.whl (365 kB)\n",
"\u001b[K |████████████████████████████████| 365 kB 15.0 MB/s \n",
"\u001b[?25hCollecting xxhash\n",
" Downloading xxhash-3.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)\n",
"\u001b[K |████████████████████████████████| 212 kB 101.9 MB/s \n",
"\u001b[?25hCollecting responses<0.19\n",
" Downloading responses-0.18.0-py3-none-any.whl (38 kB)\n",
"Requirement already satisfied: dill<0.3.6 in /usr/local/lib/python3.7/dist-packages (from datasets) (0.3.5.1)\n",
"Requirement already satisfied: pyarrow>=6.0.0 in /usr/local/lib/python3.7/dist-packages (from datasets) (6.0.1)\n",
"Requirement already satisfied: fsspec[http]>=2021.11.1 in /usr/local/lib/python3.7/dist-packages (from datasets) (2022.8.1)\n",
"Requirement already satisfied: huggingface-hub<1.0.0,>=0.1.0 in /usr/local/lib/python3.7/dist-packages (from datasets) (0.9.1)\n",
"Collecting multiprocess\n",
" Downloading multiprocess-0.70.13-py37-none-any.whl (115 kB)\n",
"\u001b[K |████████████████████████████████| 115 kB 99.4 MB/s \n",
"\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from datasets) (1.21.6)\n",
"Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from datasets) (2.23.0)\n",
"Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.7/dist-packages (from datasets) (4.64.0)\n",
"Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from datasets) (4.12.0)\n",
"Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from datasets) (1.3.5)\n",
"Requirement already satisfied: aiohttp in /usr/local/lib/python3.7/dist-packages (from datasets) (3.8.1)\n",
"Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from datasets) (21.3)\n",
"Requirement already satisfied: asynctest==0.13.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (0.13.0)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (4.0.2)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (1.3.1)\n",
"Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (2.1.1)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (1.2.0)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (1.8.1)\n",
"Requirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (4.1.1)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (6.0.2)\n",
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (22.1.0)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (3.8.0)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.7/dist-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (6.0)\n",
"Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->datasets) (3.0.9)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (3.0.4)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (1.24.3)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (2022.6.15)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (2.10)\n",
"Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1\n",
" Downloading urllib3-1.25.11-py2.py3-none-any.whl (127 kB)\n",
"\u001b[K |████████████████████████████████| 127 kB 95.5 MB/s \n",
"\u001b[?25hRequirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->datasets) (3.8.1)\n",
"Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets) (2022.2.1)\n",
"Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets) (2.8.2)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.15.0)\n",
"Installing collected packages: urllib3, xxhash, responses, multiprocess, datasets\n",
" Attempting uninstall: urllib3\n",
" Found existing installation: urllib3 1.24.3\n",
" Uninstalling urllib3-1.24.3:\n",
" Successfully uninstalled urllib3-1.24.3\n",
"Successfully installed datasets-2.4.0 multiprocess-0.70.13 responses-0.18.0 urllib3-1.25.11 xxhash-3.0.0\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting rouge\n",
" Downloading rouge-1.0.1-py3-none-any.whl (13 kB)\n",
"Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from rouge) (1.15.0)\n",
"Installing collected packages: rouge\n",
"Successfully installed rouge-1.0.1\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "hKIiTeDpcYTX",
"outputId": "3aaa1d55-b636-410f-9528-4456337037f0"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /content/drive\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"%mv /root/.cache /root/.cache.backup\n",
"%cd /content/drive/MyDrive\n",
"%mkdir /content/drive/MyDrive/.cache\n",
"!ln -s /content/drive/MyDrive/.cache /root/.cache\n",
"%cd /content/drive/MyDrive/.cache\n",
"%ls -la\n",
"%cd /content/"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6TQbG6MAccMH",
"outputId": "629ce0f4-c369-4fc9-be1d-a5e6e2014ef8"
},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"/content/drive/MyDrive\n",
"mkdir: cannot create directory ‘/content/drive/MyDrive/.cache’: File exists\n",
"/content/drive/MyDrive/.cache\n",
"total 9173\n",
"-rw------- 1 root root 29 Sep 9 01:47 2dc6085404c55008ba7fc09ab7483ef3f0a4ca2496ccee0cdbf51c2b5d529dff.ec5c189f89475aac7d8cbd243960a0655cfadc3d0474da8ff2ed0bf1699c2a5f\n",
"-rw------- 1 root root 142 Sep 9 01:47 2dc6085404c55008ba7fc09ab7483ef3f0a4ca2496ccee0cdbf51c2b5d529dff.ec5c189f89475aac7d8cbd243960a0655cfadc3d0474da8ff2ed0bf1699c2a5f.json\n",
"-rw------- 1 root root 0 Sep 9 01:47 2dc6085404c55008ba7fc09ab7483ef3f0a4ca2496ccee0cdbf51c2b5d529dff.ec5c189f89475aac7d8cbd243960a0655cfadc3d0474da8ff2ed0bf1699c2a5f.lock\n",
"-rw------- 1 root root 109540 Sep 9 01:47 36acdf4f3edf0a14ffb2b2c68ba47e93abd9448825202377ddb16dae8114fe07.accd894ff58c6ff7bd4f3072890776c14f4ea34fcc08e79cd88c2d157756dceb\n",
"-rw------- 1 root root 130 Sep 9 01:47 36acdf4f3edf0a14ffb2b2c68ba47e93abd9448825202377ddb16dae8114fe07.accd894ff58c6ff7bd4f3072890776c14f4ea34fcc08e79cd88c2d157756dceb.json\n",
"-rw------- 1 root root 0 Sep 9 01:47 36acdf4f3edf0a14ffb2b2c68ba47e93abd9448825202377ddb16dae8114fe07.accd894ff58c6ff7bd4f3072890776c14f4ea34fcc08e79cd88c2d157756dceb.lock\n",
"-rw------- 1 root root 624 Sep 9 01:48 6cc404ca8136bc87bae0fb24f2259904943d776a6c5ddc26598bbdc319476f42.0f9bcd8314d841c06633e7b92b04509f1802c16796ee67b0f1177065739e24ae\n",
"-rw------- 1 root root 132 Sep 9 01:48 6cc404ca8136bc87bae0fb24f2259904943d776a6c5ddc26598bbdc319476f42.0f9bcd8314d841c06633e7b92b04509f1802c16796ee67b0f1177065739e24ae.json\n",
"-rw------- 1 root root 0 Sep 9 01:47 6cc404ca8136bc87bae0fb24f2259904943d776a6c5ddc26598bbdc319476f42.0f9bcd8314d841c06633e7b92b04509f1802c16796ee67b0f1177065739e24ae.lock\n",
"drwx------ 2 root root 4096 Sep 9 02:07 \u001b[0m\u001b[01;34mdata\u001b[0m/\n",
"drwx------ 2 root root 4096 Sep 3 17:14 \u001b[01;34mhuggingface\u001b[0m/\n",
"-rw------- 1 root root 9254935 Sep 10 03:27 jieba.cache\n",
"drwx------ 2 root root 4096 Aug 31 14:01 \u001b[01;34mmatplotlib\u001b[0m/\n",
"drwx------ 2 root root 4096 Aug 31 14:00 \u001b[01;34mnode-gyp\u001b[0m/\n",
"drwx------ 2 root root 4096 Sep 3 17:14 \u001b[01;34mpip\u001b[0m/\n",
"drwx------ 2 root root 4096 Sep 5 13:52 \u001b[01;34mscikit-image\u001b[0m/\n",
"/content\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"%cd /content/drive/MyDrive/AI\n",
"!git clone https://github.com/IDEA-CCNL/Fengshenbang-LM\n",
"%cd Fengshenbang-LM\n",
"\n",
"!pip install --editable ."
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "qTsvM16xcYWt",
"outputId": "726e88ca-b96b-4112-e2ce-277b041de280"
},
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"/content/drive/MyDrive/AI\n",
"fatal: destination path 'Fengshenbang-LM' already exists and is not an empty directory.\n",
"/content/drive/MyDrive/AI/Fengshenbang-LM\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Obtaining file:///content/drive/MyDrive/AI/Fengshenbang-LM\n",
"Requirement already satisfied: transformers>=4.17.0 in /usr/local/lib/python3.7/dist-packages (from fengshen==0.0.1) (4.21.3)\n",
"Requirement already satisfied: datasets>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from fengshen==0.0.1) (2.4.0)\n",
"Collecting pytorch_lightning>=1.5.10\n",
" Using cached pytorch_lightning-1.7.5-py3-none-any.whl (706 kB)\n",
"Collecting deepspeed==0.5.10\n",
" Using cached deepspeed-0.5.10-py3-none-any.whl\n",
"Collecting jieba-fast>=0.53\n",
" Using cached jieba_fast-0.53-cp37-cp37m-linux_x86_64.whl\n",
"Requirement already satisfied: jieba>=0.40.0 in /usr/local/lib/python3.7/dist-packages (from fengshen==0.0.1) (0.42.1)\n",
"Collecting triton==1.0.0\n",
" Using cached triton-1.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.2 MB)\n",
"Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from deepspeed==0.5.10->fengshen==0.0.1) (21.3)\n",
"Requirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from deepspeed==0.5.10->fengshen==0.0.1) (1.12.1+cu113)\n",
"Requirement already satisfied: psutil in /usr/local/lib/python3.7/dist-packages (from deepspeed==0.5.10->fengshen==0.0.1) (5.4.8)\n",
"Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from deepspeed==0.5.10->fengshen==0.0.1) (4.64.0)\n",
"Collecting hjson\n",
" Using cached hjson-3.1.0-py3-none-any.whl (54 kB)\n",
"Collecting ninja\n",
" Using cached ninja-1.10.2.3-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)\n",
"Collecting py-cpuinfo\n",
" Using cached py_cpuinfo-8.0.0-py3-none-any.whl\n",
"Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from deepspeed==0.5.10->fengshen==0.0.1) (1.21.6)\n",
"Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (1.3.5)\n",
"Requirement already satisfied: aiohttp in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (3.8.1)\n",
"Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (4.12.0)\n",
"Requirement already satisfied: xxhash in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (3.0.0)\n",
"Requirement already satisfied: multiprocess in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (0.70.13)\n",
"Requirement already satisfied: huggingface-hub<1.0.0,>=0.1.0 in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (0.9.1)\n",
"Requirement already satisfied: dill<0.3.6 in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (0.3.5.1)\n",
"Requirement already satisfied: fsspec[http]>=2021.11.1 in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (2022.8.1)\n",
"Requirement already satisfied: responses<0.19 in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (0.18.0)\n",
"Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (2.23.0)\n",
"Requirement already satisfied: pyarrow>=6.0.0 in /usr/local/lib/python3.7/dist-packages (from datasets>=2.0.0->fengshen==0.0.1) (6.0.1)\n",
"Requirement already satisfied: asynctest==0.13.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets>=2.0.0->fengshen==0.0.1) (0.13.0)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets>=2.0.0->fengshen==0.0.1) (1.3.1)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets>=2.0.0->fengshen==0.0.1) (4.0.2)\n",
"Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets>=2.0.0->fengshen==0.0.1) (2.1.1)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets>=2.0.0->fengshen==0.0.1) (1.2.0)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets>=2.0.0->fengshen==0.0.1) (1.8.1)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets>=2.0.0->fengshen==0.0.1) (6.0.2)\n",
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets>=2.0.0->fengshen==0.0.1) (22.1.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets>=2.0.0->fengshen==0.0.1) (4.1.1)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.7/dist-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets>=2.0.0->fengshen==0.0.1) (6.0)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets>=2.0.0->fengshen==0.0.1) (3.8.0)\n",
"Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->deepspeed==0.5.10->fengshen==0.0.1) (3.0.9)\n",
"Collecting torchmetrics>=0.7.0\n",
" Using cached torchmetrics-0.9.3-py3-none-any.whl (419 kB)\n",
"Collecting tensorboard>=2.9.1\n",
" Using cached tensorboard-2.10.0-py3-none-any.whl (5.9 MB)\n",
"Collecting pyDeprecate>=0.3.1\n",
" Using cached pyDeprecate-0.3.2-py3-none-any.whl (10 kB)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=2.0.0->fengshen==0.0.1) (2.10)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=2.0.0->fengshen==0.0.1) (2022.6.15)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=2.0.0->fengshen==0.0.1) (3.0.4)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=2.0.0->fengshen==0.0.1) (1.25.11)\n",
"Requirement already satisfied: grpcio>=1.24.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (1.47.0)\n",
"Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (57.4.0)\n",
"Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (1.2.0)\n",
"Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (3.4.1)\n",
"Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (0.37.1)\n",
"Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (1.8.1)\n",
"Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (1.0.1)\n",
"Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (0.4.6)\n",
"Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (0.6.1)\n",
"Requirement already satisfied: protobuf<3.20,>=3.9.2 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (3.17.3)\n",
"Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (1.35.0)\n",
"Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (1.15.0)\n",
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (0.2.8)\n",
"Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (4.9)\n",
"Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (4.2.4)\n",
"Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (1.3.1)\n",
"Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->datasets>=2.0.0->fengshen==0.0.1) (3.8.1)\n",
"Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.7/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (0.4.8)\n",
"Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.9.1->pytorch_lightning>=1.5.10->fengshen==0.0.1) (3.2.0)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.17.0->fengshen==0.0.1) (2022.6.2)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.13,>=0.11.1 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.17.0->fengshen==0.0.1) (0.12.1)\n",
"Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets>=2.0.0->fengshen==0.0.1) (2022.2.1)\n",
"Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets>=2.0.0->fengshen==0.0.1) (2.8.2)\n",
"Installing collected packages: triton, torchmetrics, tensorboard, pyDeprecate, py-cpuinfo, ninja, hjson, pytorch-lightning, jieba-fast, deepspeed, fengshen\n",
" Attempting uninstall: tensorboard\n",
" Found existing installation: tensorboard 2.8.0\n",
" Uninstalling tensorboard-2.8.0:\n",
" Successfully uninstalled tensorboard-2.8.0\n",
" Running setup.py develop for fengshen\n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"tensorflow 2.8.2+zzzcolab20220719082949 requires tensorboard<2.9,>=2.8, but you have tensorboard 2.10.0 which is incompatible.\u001b[0m\n",
"Successfully installed deepspeed-0.5.10 fengshen-0.0.1 hjson-3.1.0 jieba-fast-0.53 ninja-1.10.2.3 py-cpuinfo-8.0.0 pyDeprecate-0.3.2 pytorch-lightning-1.7.5 tensorboard-2.10.0 torchmetrics-0.9.3 triton-1.0.0\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import fengshen\n",
"import sys\n",
"sys.path.append(\"./fengshen/examples/pegasus/\")"
],
"metadata": {
"id": "B5A19Sjicvhd"
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# 中文摘要\n",
"from transformers import PegasusForConditionalGeneration\n",
"# Need to download tokenizers_pegasus.py and other Python script from Fengshenbang-LM github repo in advance,\n",
"# or you can download tokenizers_pegasus.py and data_utils.py in https://huggingface.co/IDEA-CCNL/Randeng_Pegasus_523M/tree/main\n",
"# Strongly recommend you git clone the Fengshenbang-LM repo:\n",
"\n",
"#%cd fengshen/examples/pegasus/\n",
"import fengshen\n",
"import sys\n",
"sys.path.append(\"./fengshen/examples/pegasus/\")\n",
"# and then you will see the tokenizers_pegasus.py and data_utils.py which are needed by pegasus model\n",
"from tokenizers_pegasus import PegasusTokenizer\n",
"\n",
"model = PegasusForConditionalGeneration.from_pretrained(\"IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese\")\n",
"tokenizer = PegasusTokenizer.from_pretrained(\"IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese\")\n",
"\n",
"text = \"据微信公众号“界面”报道,4日上午10点左右,中国发改委反垄断调查小组突击查访奔驰上海办事处,调取数据材料,并对多名奔驰高管进行了约谈。截止昨日晚9点,包括北京梅赛德斯-奔驰销售服务有限公司东区总经理在内的多名管理人员仍留在上海办公室内\"\n",
"inputs = tokenizer(text, max_length=1024, return_tensors=\"pt\")\n",
"\n",
"# Generate Summary\n",
"summary_ids = model.generate(inputs[\"input_ids\"])\n",
"tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]\n",
"\n",
"# model Output: 反垄断调查小组突击查访奔驰上海办事处,对多名奔驰高管进行约谈\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 231
},
"id": "PIC-Jr-Uce02",
"outputId": "8d9df98f-215e-4f5b-a6d5-d9ad79667821"
},
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"Building prefix dict from the default dictionary ...\n",
"DEBUG:jieba:Building prefix dict from the default dictionary ...\n",
"Loading model from cache /root/.cache/jieba.cache\n",
"DEBUG:jieba:Loading model from cache /root/.cache/jieba.cache\n",
"Loading model cost 1.879 seconds.\n",
"DEBUG:jieba:Loading model cost 1.879 seconds.\n",
"Prefix dict has been built successfully.\n",
"DEBUG:jieba:Prefix dict has been built successfully.\n",
"/usr/local/lib/python3.7/dist-packages/transformers/generation_utils.py:1207: UserWarning: Neither `max_length` nor `max_new_tokens` have been set, `max_length` will default to 256 (`self.config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.\n",
" UserWarning,\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"'反垄断调查小组突击查访奔驰上海办事处,对多名奔驰高管进行约谈'"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
}
},
"metadata": {},
"execution_count": 8
}
]
},
{
"cell_type": "code",
"source": [
"text = \"\"\"\n",
"\n",
"最近两年,预训练逐渐成为整个认知智能的基础,自然语言和计算机视觉的算法全方面的依赖于预训练模型来构建。\n",
"\n",
"预训练模型的规模从最初的1亿参数BERT到一千多亿参数的GTP-3,正在以每年10倍的速度增加。针对不同的下游任务,需要不同的结构,不同的尺寸和不同的专业领域的预训练模型。 这个世界需要更多更大的模型。但是,有限的算力资源是限制整个领域进一步发展的瓶颈。尤其是高校、小公司和一些传统公司,根本不具备足够的算力来训练和使用大规模预训练模型。这些都阻碍了整个人工智能技术更进一步的落地。\n",
"\n",
"这个世界需要一个答案。\n",
"\n",
"IDEA研究院正式宣布,开启 “封神榜”大模型开源计划。“封神榜”将全方面的开源一系列NLP相关的预训练大模型,它们将覆盖文本分类、文本续写、文本摘要、语义纠错等NLP相关任务,不同的专业领域。而且我们承诺,将对这些模型做持续的升级,不断融合最新的数据和最新的训练算法。通过IDEA研究院的努力,打造中文认知智能的通用基础设施,避免重复建设,为全社会节省算力。\n",
"\n",
"\"\"\"\n"
],
"metadata": {
"id": "lOUxP0noi_88"
},
"execution_count": 12,
"outputs": []
},
{
"cell_type": "code",
"source": [
"inputs = tokenizer(text, max_length=5120, return_tensors=\"pt\")\n",
"\n",
"# Generate Summary\n",
"summary_ids = model.generate(inputs[\"input_ids\"])\n",
"tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "qOSF68Kxj_ql",
"outputId": "4111f70f-27ee-48e3-8bdf-17bb5a173baf"
},
"execution_count": 13,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"'idea 研究院开启“封神榜”大模型开源计划'"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
}
},
"metadata": {},
"execution_count": 13
}
]
}
]
}
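
The two summarization cells above repeat the same tokenize, generate, decode steps. A minimal sketch of how they could be wrapped in one helper is shown below. The function name `summarize` and the defaults `max_input_tokens=1024` and `max_new_tokens=64` are assumptions made for illustration and are not part of the notebook or of Fengshenbang-LM. It passes `max_new_tokens` because the warning in the notebook output recommends that argument over the config default.

```python
def summarize(text, model, tokenizer, max_input_tokens=1024, max_new_tokens=64):
    """Return a one-string summary of `text`.

    `model` and `tokenizer` are expected to be the objects loaded in the
    notebook (PegasusForConditionalGeneration and PegasusTokenizer).
    The default limits are illustrative assumptions, not tuned values.
    """
    # Request truncation explicitly instead of passing max_length on its own.
    inputs = tokenizer(
        text.strip(),
        max_length=max_input_tokens,
        truncation=True,
        return_tensors="pt",
    )
    # Bound the generated length explicitly rather than relying on the
    # model config default mentioned in the notebook's UserWarning.
    summary_ids = model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence in the batch.
    return tokenizer.batch_decode(
        summary_ids,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )[0]
```

With the notebook's `model`, `tokenizer`, and `text` already defined, the helper would be called as `summarize(text, model, tokenizer)`. The generated summary may differ from the recorded outputs because the length limit is set differently.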