@itsuncheng
Created January 19, 2021 11:28
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Text Generation in English"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from transformers import pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# No model is specified, so the pipeline loads its default text-generation model (GPT-2).\n",
"text_generation = pipeline(\"text-generation\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"prefix_text = \"The world is\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"The world is a better place if you're a good person.\n",
"\n",
"I'm not saying that you should be a bad person. I'm saying that you should be a good person.\n",
"\n",
"I'm not saying that you should be a bad\n"
]
}
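],
"source": [
"# do_sample=False selects greedy decoding; max_length counts the prompt tokens too.\n",
"# The pipeline returns a list of dicts, one per generated sequence.\n",
"generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]\n",
"print(generated_text['generated_text'])"
]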
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Text Generation in Chinese"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead for GPT-2 style models.\n",
"from transformers import pipeline, BertTokenizerFast, AutoModelForCausalLM"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The tokenizer comes from bert-base-chinese; the GPT-2 weights come from ckiplab/gpt2-base-chinese.\n",
"tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')\n",
"model = AutoModelForCausalLM.from_pretrained('ckiplab/gpt2-base-chinese')"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"text_generation = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Setting `pad_token_id` to `eos_token_id`:102 for open-end generation.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"我 想 要 去 看 看 。 」 他 說 : 「 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們\n"
]
}
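],
"source": [
"# Prompt in Chinese; roughly 'I want to go'.\n",
"prefix_text = \"我 想 要 去\"\n",
"\n",
"# Greedy decoding again (do_sample=False), capped at 50 tokens including the prompt.\n",
"generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]\n",
"print(generated_text['generated_text'])"
]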
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}