@ryderwishart
Created August 30, 2023 16:55
posalign_two-step_spanish.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/ryderwishart/2cebcace16c0d6e0503851f5da720491/posalign_two-step_spanish.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ct3L_5rEXmAP"
},
"outputs": [],
"source": [
"import getpass, os\n",
"secret_key = getpass.getpass('Enter OpenAI secret key: ')\n",
"os.environ['OPENAI_API_KEY'] = secret_key"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GzyNIXwJXmAR"
},
"outputs": [],
"source": [
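"import openai  # legacy SDK: the cells below call openai.ChatCompletion, which was removed in openai 1.0 (e.g. pip install openai==0.28)"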
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-S79xiPwXmAR"
},
"source": [
"## Handling an edge case: the Spanish sentence does not match the English and Greek"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "bZZTof0rXmAT"
},
"outputs": [],
"source": [
"prompt = '''\n",
"Here are some general facts to note about Bantu languages:\n",
"Bantu languages: are agglutinating, ensure correct affix attachment; employ a complex noun class system, ensure noun agreement across sentences; follow SVO order; mark verbs for tense, aspect, and mood; adhere to rules of conjunctive and disjunctive orthography; apply a tone system to distinguish meaning.\n",
"For translating from Greek: handle Koine Greek's inflection as agglutination, paying attention to affixes; replace Greek's three-gender system with the Bantu noun class system, ensuring agreement; shift to SVO order; adapt Greek voice/aspect/mood markings to the Bantu system; implement rules of conjunctive and disjunctive orthography; recognize that tone distinguishes some grammatical categories.\n",
"Most words in a Bantu sentence are marked by a prefix indicating the category to which the noun used as the subject of the sentence belongs. If there is an object, the words in that noun phrase and the verb are also marked by a prefix determined by the noun class of the object.\n",
"\n",
"Here is a sentence:\n",
"English: And He said to them What things; - And they said to Him The things concerning Jesus of Nazareth, who was a man a prophet mighty in deed and word before - God and all the people,\n",
"Greek: καὶ εἶπεν αὐτοῖς Ποῖα;οἱ δὲ εἶπαν αὐτῷ Τὰ περὶ Ἰησοῦ τοῦ Ναζαρηνοῦ,ὃς ἐγένετο ἀνὴρ προφήτης δυνατὸς ἐν ἔργῳ καὶ λόγῳ ἐναντίον τοῦ Θεοῦ καὶ παντὸς τοῦ λαοῦ,\n",
"Abanyom: Wɛ abib arɛ, “Ba nsɔl yi?” Abɔ afanga arɛ, “Nsɔl yi ɛlemɔ Jisɔs yɔ Nasarɛt. Ajɔl nyɛna amir abɛl ɛkɔ na alom na nema na libri Ɔsɔwɔ na anɛ kpakpa.\n",
"\n",
"Here is a phonological, semantic, orthographic alignment of that sentence:\n",
"```\n",
"\t[\n",
"\t {\n",
"\t \"Source phrase\": \"Wɛ abib arɛ,\",\n",
"\t \"Target phrase\": \"And He said to them\",\n",
"\t \"Greek phrase\": \"καὶ εἶπεν αὐτοῖς\",\n",
"\t \"Rationale\": \"Orthographic alignment (comma)\",\n",
"\t \"Relevant grammatical patterns\": \"sentential connection, projective matrix\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"“Ba nsɔl yi?”\",\n",
"\t \"Target phrase\": \"What things;\",\n",
"\t \"Greek phrase\": \"Ποῖα;\",\n",
"\t \"Rationale\": \"Semantic alignment; orthographic (quotation marks, capitalized Greek word)\",\n",
"\t \"Relevant grammatical patterns\": \"interrogative\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"Abɔ afanga arɛ\",\n",
"\t \"Target phrase\": \"And they said to Him\",\n",
"\t \"Greek phrase\": \"οἱ δὲ εἶπαν αὐτῷ\",\n",
"\t \"Rationale\": \"Semantic alignment; orthographic (comma, capitalized Greek word)\",\n",
"\t \"Relevant grammatical patterns\": \"sentential connection, projective matrix\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"“Nsɔl yi ɛlemɔ Jisɔs yɔ Nasarɛt.\",\n",
"\t \"Target phrase\": \"The things concerning Jesus of Nazareth,\",\n",
"\t \"Greek phrase\": \"Τὰ περὶ Ἰησοῦ τοῦ Ναζαρηνοῦ,\",\n",
"\t \"Rationale\": \"Semantic and phonetic similarity (Nasarɛt/Nazareth/Ναζαρηνοῦ, Jisɔs/Jesus/Ἰησοῦ, nsɔl/things [identified in a previous phrase]); orthographic (quotation marks, terminal punctuation)\",\n",
"\t \"Relevant grammatical patterns\": \"complex nominal construction with adjunct\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"Ajɔl nyɛna amir abɛl ɛkɔ na alom na nema na libri\",\n",
"\t \"Target phrase\": \"who was a man a prophet mighty in deed and word\",\n",
"\t \"Greek phrase\": \"ὃς ἐγένετο ἀνὴρ προφήτης δυνατὸς ἐν ἔργῳ καὶ λόγῳ\",\n",
"\t \"Rationale\": \"Semantic alignment and matching content words ('libri' aligns with 'word', 'λόγῳ'); orthographic (period)\",\n",
"\t \"Relevant grammatical patterns\": \"subordination (relative construction), coordination\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"Ɔsɔwɔ na anɛ kpakpa.\",\n",
"\t \"Target phrase\": \"before - God and all the people\",\n",
"\t \"Greek phrase\": \"ἐναντίον τοῦ Θεοῦ καὶ παντὸς τοῦ λαοῦ,\",\n",
"\t \"Rationale\": \"Semantic alignment and matching content words ('Ɔsɔwɔ' aligns with 'God', 'Θεοῦ'); orthographic (period)\",\n",
"\t \"Relevant grammatical patterns\": \"prepositional construction, coordination\"\n",
"\t }\n",
"\t]\n",
"```\n",
"\n",
"Please also align the following sentence. Avoid including multiple phrases in a single alignment unit (break phrases at the very least on commas or other major punctuation, including enclosing quotation marks):\n",
"\n",
"English: After now these things appointed the Lord others seventy, and sent them in two [by] before [the] face of Himself into every city and place where was about He Himself to go.\n",
"Greek: Μετὰ δὲ ταῦτα ἀνέδειξεν ὁ Κύριος ἑτέρους ἑβδομήκοντα,καὶ ἀπέστειλεν αὐτοὺς ἀνὰ δύο πρὸ προσώπου αὐτοῦ εἰς πᾶσαν πόλιν καὶ τόπον οὗ ἤμελλεν αὐτὸς ἔρχεσθαι.\n",
"Spanish: Entonces Él les dijo: ¿Qué cosas? Y ellos le dijeron: Las referentes a Jesús el Nazareno, que fue un profeta poderoso en obra y en palabra delante de Dios y de todo el pueblo;\n",
"'''\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OG1w4sbbXmAV"
},
"outputs": [],
"source": [
"messages = [\n",
" {\"role\": \"system\", \"content\": \"You are LangAlignerGPT. Analyze the user-supplied alignment examples below and follow any instructions the user gives.\"},\n",
" {\"role\": \"user\", \"content\": prompt},\n",
"]\n",
"\n",
"response = openai.ChatCompletion.create(\n",
" model=\"gpt-4\",\n",
" messages=messages,\n",
" temperature=0.3,\n",
" n=1,\n",
" presence_penalty=0.5,\n",
" frequency_penalty=0.5,\n",
")\n",
"\n",
"generated_texts = [\n",
" choice.message[\"content\"].strip() for choice in response[\"choices\"]\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gsHiiNCBXmAV",
"outputId": "35a82681-85e9-425d-e6ea-4e638d5e521c"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The Spanish sentence provided does not match the English and Greek sentences. It seems to be a continuation of the previous example. Could you please provide the correct Spanish translation for this sentence? \n",
"\n",
"However, I can still align the English and Greek sentences as follows:\n",
"\n",
"```\n",
"[\n",
" {\n",
" \"Source phrase\": \"Μετὰ δὲ ταῦτα\",\n",
" \"Target phrase\": \"After now these things\",\n",
" \"Rationale\": \"Semantic alignment; 'Μετὰ' aligns with 'After', 'δὲ' with 'now', and 'ταῦτα' with 'these things'\",\n",
" \"Relevant grammatical patterns\": \"Temporal clause\"\n",
" },\n",
" {\n",
" \"Source phrase\": \"ἀνέδειξεν ὁ Κύριος ἑτέρους ἑβδομήκοντα,\",\n",
" \"Target phrase\": \"appointed the Lord others seventy,\",\n",
" \"Rationale\": \"'ἀνέδειξεν' aligns semantically with 'appointed', 'ὁ Κύριος' with 'the Lord', and 'ἑτέρους ἑβδομήκοντα' with 'others seventy'\",\n",
" \"Relevant grammatical patterns\": \"Subject-verb-object structure\"\n",
" },\n",
" {\n",
" \"Source phrase\": \"καὶ ἀπέστειλεν αὐτοὺς\",\n",
" \"Target phrase\": \"and sent them\",\n",
" \"'Rationale': Semantic alignment; καί corresponds to 'and', while ἀπέστειλεν αὐτούς corresponds to 'sent them'\",\n",
" \"'Relevant grammatical patterns': Conjunction and verb-object structure\"\n",
" },\n",
" {\n",
" \"'Source phrase': ἀνά δύο πρό προσώπου αὖ αὖ είς πάσαν πόλιν καί τόπον οὗ ήμελλεν αὖ έρχθαι.\",\n",
" \"'Target phrase': in two [by] before [the] face of Himself into every city and place where was about He Himself to go.\",\n",
" \"'Rationale': Semantic alignment; each word or group of words in the Greek sentence has a corresponding part in the English sentence that conveys a similar meaning.\",\n",
" \"'Relevant grammatical patterns': Prepositional phrases, relative clause.\"\n",
" }\n",
"]\n",
"```\n"
]
}
],
"source": [
"print(generated_texts[0])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OsECKdtzXmAV"
},
"source": [
"## Spanish alignment with the corrected Spanish sentence"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "MBs-SZB4XmAW",
"outputId": "6802a1ef-c212-48da-cafb-548fba54313d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Here is a phonological, semantic, orthographic alignment of that sentence:\t\n",
"```\n",
"\t[\n",
"\t {\n",
"\t \"Source phrase\": \"Después de esto,\",\n",
"\t \"Target phrase\": \"After now these things\",\n",
"\t \"Greek phrase\": \"Μετὰ δὲ ταῦτα\",\n",
"\t \"Rationale\": \"Semantic alignment; Orthographic (comma)\",\n",
"\t \"Relevant grammatical patterns\": \"Temporal clause\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"el Señor designó a otros setenta,\",\n",
"\t \"Target phrase\": \"appointed the Lord others seventy,\",\n",
"\t \"Greek phrase\": \"ἀνέδειξεν ὁ Κύριος ἑτέρους ἑβδομήκοντα,\",\n",
"\t \"Rationale\": \"'Señor' aligns with 'Lord', 'setenta' aligns with 'seventy'; Semantic alignment; Orthographic (comma)\",\n",
"\t \"Relevant grammatical patterns\": \"[Subject] + [verb] + [direct object]\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \t\"y los envió de dos en dos delante de Él,\",\n",
"\t \"Target phrase\": \t\"and sent them in two [by] before [the] face of Himself \",\n",
"\t \"Greek phrase\": \t\"καὶ ἀπέστειλεν αὐτοὺς ἀνὰ δύο πρὸ προσώπου αὐτοῦ \",\n",
" \t \t\"Rationale\":\"Semantic alignment; matching content words ('envió' aligns with 'sent', 'dos' aligns with 'two'); Orthographic (comma)\",\n",
" \t \t\"Relevant grammatical patterns\":\"[Subject] + [verb] + [direct object], prepositional construction\"\n",
" \t },\n",
" \t {\n",
" \t \t\"Source phrase\":\"a toda ciudad y lugar adonde Él había de ir.\",\n",
" \t \t\"Target phrase\":\"into every city and place where was about He Himself to go.\",\n",
" \t \t\"Greek phrase\":\"εἰς πᾶσαν πόλιν καὶ τόπον οὗ ἤμελλεν αὐτὸς ἔρχεσθαι.\",\n",
" \t \t\"Rationale\":\"'ciudad' aligns with 'city', 'lugar' aligns with 'place'; Semantic alignment; Orthographic (period)\",\n",
" \t \t\"Relevant grammatical patterns\":\"prepositional construction, relative clause\"\n",
" \t }\n",
" ]\n",
"```\n"
]
}
],
"source": [
"prompt = '''\n",
"Here are some general facts to note about Bantu languages:\n",
"Bantu languages: are agglutinating, ensure correct affix attachment; employ a complex noun class system, ensure noun agreement across sentences; follow SVO order; mark verbs for tense, aspect, and mood; adhere to rules of conjunctive and disjunctive orthography; apply a tone system to distinguish meaning.\n",
"For translating from Greek: handle Koine Greek's inflection as agglutination, paying attention to affixes; replace Greek's three-gender system with the Bantu noun class system, ensuring agreement; shift to SVO order; adapt Greek voice/aspect/mood markings to the Bantu system; implement rules of conjunctive and disjunctive orthography; recognize that tone distinguishes some grammatical categories.\n",
"Most words in a Bantu sentence are marked by a prefix indicating the category to which the noun used as the subject of the sentence belongs. If there is an object, the words in that noun phrase and the verb are also marked by a prefix determined by the noun class of the object.\n",
"\n",
"Here is a sentence:\n",
"English: And He said to them What things; - And they said to Him The things concerning Jesus of Nazareth, who was a man a prophet mighty in deed and word before - God and all the people,\n",
"Greek: καὶ εἶπεν αὐτοῖς Ποῖα;οἱ δὲ εἶπαν αὐτῷ Τὰ περὶ Ἰησοῦ τοῦ Ναζαρηνοῦ,ὃς ἐγένετο ἀνὴρ προφήτης δυνατὸς ἐν ἔργῳ καὶ λόγῳ ἐναντίον τοῦ Θεοῦ καὶ παντὸς τοῦ λαοῦ,\n",
"Abanyom: Wɛ abib arɛ, “Ba nsɔl yi?” Abɔ afanga arɛ, “Nsɔl yi ɛlemɔ Jisɔs yɔ Nasarɛt. Ajɔl nyɛna amir abɛl ɛkɔ na alom na nema na libri Ɔsɔwɔ na anɛ kpakpa.\n",
"\n",
"Here is a phonological, semantic, orthographic alignment of that sentence:\n",
"```\n",
"\t[\n",
"\t {\n",
"\t \"Source phrase\": \"Wɛ abib arɛ,\",\n",
"\t \"Target phrase\": \"And He said to them\",\n",
"\t \"Greek phrase\": \"καὶ εἶπεν αὐτοῖς\",\n",
"\t \"Rationale\": \"Orthographic alignment (comma)\",\n",
"\t \"Relevant grammatical patterns\": \"sentential connection, projective matrix\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"“Ba nsɔl yi?”\",\n",
"\t \"Target phrase\": \"What things;\",\n",
"\t \"Greek phrase\": \"Ποῖα;\",\n",
"\t \"Rationale\": \"Semantic alignment; orthographic (quotation marks, capitalized Greek word)\",\n",
"\t \"Relevant grammatical patterns\": \"interrogative\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"Abɔ afanga arɛ\",\n",
"\t \"Target phrase\": \"And they said to Him\",\n",
"\t \"Greek phrase\": \"οἱ δὲ εἶπαν αὐτῷ\",\n",
"\t \"Rationale\": \"Semantic alignment; orthographic (comma, capitalized Greek word)\",\n",
"\t \"Relevant grammatical patterns\": \"sentential connection, projective matrix\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"“Nsɔl yi ɛlemɔ Jisɔs yɔ Nasarɛt.\",\n",
"\t \"Target phrase\": \"The things concerning Jesus of Nazareth,\",\n",
"\t \"Greek phrase\": \"Τὰ περὶ Ἰησοῦ τοῦ Ναζαρηνοῦ,\",\n",
"\t \"Rationale\": \"Semantic and phonetic similarity (Nasarɛt/Nazareth/Ναζαρηνοῦ, Jisɔs/Jesus/Ἰησοῦ, nsɔl/things [identified in a previous phrase]); orthographic (quotation marks, terminal punctuation)\",\n",
"\t \"Relevant grammatical patterns\": \"complex nominal construction with adjunct\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"Ajɔl nyɛna amir abɛl ɛkɔ na alom na nema na libri\",\n",
"\t \"Target phrase\": \"who was a man a prophet mighty in deed and word\",\n",
"\t \"Greek phrase\": \"ὃς ἐγένετο ἀνὴρ προφήτης δυνατὸς ἐν ἔργῳ καὶ λόγῳ\",\n",
"\t \"Rationale\": \"Semantic alignment and matching content words ('libri' aligns with 'word', 'λόγῳ'); orthographic (period)\",\n",
"\t \"Relevant grammatical patterns\": \"subordination (relative construction), coordination\"\n",
"\t },\n",
"\t {\n",
"\t \"Source phrase\": \"Ɔsɔwɔ na anɛ kpakpa.\",\n",
"\t \"Target phrase\": \"before - God and all the people\",\n",
"\t \"Greek phrase\": \"ἐναντίον τοῦ Θεοῦ καὶ παντὸς τοῦ λαοῦ,\",\n",
"\t \"Rationale\": \"Semantic alignment and matching content words ('Ɔsɔwɔ' aligns with 'God', 'Θεοῦ'); orthographic (period)\",\n",
"\t \"Relevant grammatical patterns\": \"prepositional construction, coordination\"\n",
"\t }\n",
"\t]\n",
"```\n",
"\n",
"Please also align the following sentence. Avoid including multiple phrases in a single alignment unit (break phrases at the very least on commas or other major punctuation, including enclosing quotation marks):\n",
"\n",
"English: After now these things appointed the Lord others seventy, and sent them in two [by] before [the] face of Himself into every city and place where was about He Himself to go.\n",
"Greek: Μετὰ δὲ ταῦτα ἀνέδειξεν ὁ Κύριος ἑτέρους ἑβδομήκοντα,καὶ ἀπέστειλεν αὐτοὺς ἀνὰ δύο πρὸ προσώπου αὐτοῦ εἰς πᾶσαν πόλιν καὶ τόπον οὗ ἤμελλεν αὐτὸς ἔρχεσθαι.\n",
"Spanish: Después de esto, el Señor designó a otros setenta, y los envió de dos en dos delante de Él, a toda ciudad y lugar adonde Él había de ir.\n",
"'''\n",
"\n",
"messages = [\n",
" {\"role\": \"system\", \"content\": \"You are LangAlignerGPT. Analyze the user-supplied alignment examples below and follow any instructions the user gives.\"},\n",
" {\"role\": \"user\", \"content\": prompt},\n",
"]\n",
"\n",
"response = openai.ChatCompletion.create(\n",
" model=\"gpt-4\",\n",
" messages=messages,\n",
" temperature=0.3,\n",
" n=1,\n",
" presence_penalty=0.5,\n",
" frequency_penalty=0.5,\n",
")\n",
"\n",
"generated_texts = [\n",
" choice.message[\"content\"].strip() for choice in response[\"choices\"]\n",
"]\n",
"print(generated_texts[0])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "U_G9SPqYXmAW"
},
"source": [
"## Align individual chunks from output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VtkVFsR8XmAW",
"outputId": "8258e8e2-d345-4d72-fd0d-4ab8036b9e0e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['\"Source phrase\": \"Después de esto,\",\\n\\t \"Target phrase\": \"After now these things\",\\n\\t \"Greek phrase\": \"Μετὰ δὲ ταῦτα\",\\n\\t \"Rationale\": \"Semantic alignment; Orthographic (comma)\",\\n\\t \"Relevant grammatical patterns\": \"Temporal clause\"\\n\\t },\\n\\t {\\n\\t \"Source phrase\": \"el Señor designó a otros setenta,\",\\n\\t \"Target phrase\": \"appointed the Lord others seventy,\",\\n\\t \"Greek phrase\": \"ἀνέδειξεν ὁ Κύριος ἑτέρους ἑβδομήκοντα,\",\\n\\t \"Rationale\": \"\\'Señor\\' aligns with \\'Lord\\', \\'setenta\\' aligns with \\'seventy\\'; Semantic alignment; Orthographic (comma)\",\\n\\t \"Relevant grammatical patterns\": \"[Subject] + [verb] + [direct object]\"\\n\\t },\\n\\t {\\n\\t \"Source phrase\": \\t\"y los envió de dos en dos delante de Él,\",\\n\\t \"Target phrase\": \\t\"and sent them in two [by] before [the] face of Himself \",\\n\\t \"Greek phrase\": \\t\"καὶ ἀπέστειλεν αὐτοὺς ἀνὰ δύο πρὸ προσώπου αὐτοῦ \",\\n \\t \\t\"Rationale\":\"Semantic alignment; matching content words (\\'envió\\' aligns with \\'sent\\', \\'dos\\' aligns with \\'two\\'); Orthographic (comma)\",\\n \\t \\t\"Relevant grammatical patterns\":\"[Subject] + [verb] + [direct object], prepositional construction\"\\n \\t },\\n \\t {\\n \\t \\t\"Source phrase\":\"a toda ciudad y lugar adonde Él había de ir.\",\\n \\t \\t\"Target phrase\":\"into every city and place where was about He Himself to go.\",\\n \\t \\t\"Greek phrase\":\"εἰς πᾶσαν πόλιν καὶ τόπον οὗ ἤμελλεν αὐτὸς ἔρχεσθαι.\",\\n \\t \\t\"Rationale\":\"\\'ciudad\\' aligns with \\'city\\', \\'lugar\\' aligns with \\'place\\'; Semantic alignment; Orthographic (period)\",\\n \\t \\t\"Relevant grammatical patterns\":\"prepositional construction, relative clause\"\\n \\t }\\n ]\\n```']\n"
]
}
],
"source": [
"import json\n",
"import re\n",
"\n",
"final_alignments = []\n",
"\n",
"# Pull the JSON array out of the fenced code block in the model's reply.\n",
"# A regex is more tolerant than splitting on exact strings, because the\n",
"# whitespace around the brackets varies between replies (tabs vs. spaces).\n",
"match = re.search(r\"```(?:json)?\\s*(\\[.*?\\])\\s*```\", generated_texts[0], re.DOTALL)\n",
"if match is None:\n",
"    raise ValueError(\"No fenced JSON array found in the model output\")\n",
"alignment_units = json.loads(match.group(1))\n",
"\n",
"# One string per alignment unit, so the next cell can join and format them.\n",
"output = [json.dumps(unit, ensure_ascii=False) for unit in alignment_units]\n",
"\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rkOBxZqlXmAX",
"outputId": "8a10b2c4-488a-46f6-abbb-0868c0bf20b6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sure, here is the breakdown:\n",
"\n",
"1. \"Después de esto,\" --> \"After now these things\" --> \"Μετὰ δὲ ταῦτα\"\n",
" - Source token(s)\t\t-->\tTarget token(s)\t\t-->\tGreek token(s)\n",
" - Después de esto, --> After now these things, --> Μετὰ δὲ ταῦτα,\n",
"\n",
"2. \"el Señor designó a otros setenta,\" --> \"appointed the Lord others seventy,\" --> \"ἀνέδειξεν ὁ Κύριος ἑτέρους ἑβδομήκοντα,\"\n",
" - Source token(s)\t\t-->\tTarget token(s)\t\t-->\tGreek token(s)\n",
" - el Señor --> the Lord --> ὁ Κύριος\n",
" - designó a otros setenta, --> appointed others seventy, --> ἀνέδειξεν ἑτέρους ἑβδομήκοντα,\n",
"\n",
"3. \"y los envió de dos en dos delante de Él,\" --> \"and sent them in two [by] before [the] face of Himself\" --> \"καὶ ἀπέστειλεν αὐτοὺς ἀνὰ δύο πρὸ προσώπου αὐτοῦ\"\n",
" - Source token(s)\t\t-->\tTarget token(s)\t\t-->\tGreek token(s)\n",
" - y los envió --> and sent them --> καὶ ἀπέστειλεν αὐτοὺς\n",
" - de dos en dos delante de Él, --> in two [by] before [the] face of Himself, --> ἀνὰ δύο πρὸ προσώπου αὐτοῦ,\n",
"\n",
"4. \"a toda ciudad y lugar adonde Él había de ir.\" -> \"into every city and place where was about He Himself to go.\" -> \"εἰς πάσης της πόλης και τόπου ου εμέλλησε ν' αυξήσω.\"\n",
" - Source token(s) -> Target token(s) -> Greek token(s)\n",
" - a toda ciudad y lugar -> into every city and place -> είς πάσης της πόλης και τόπου\n",
" - adonde Él había de ir. -> where was about He Himself to go. -> ου εμέλλησε ν' αυξήσω.\n"
]
}
],
"source": [
"for chunk in output:\n",
"\n",
" prompt = '''Here is a sentence:\n",
" {POS_aligned_chunks}\n",
"\n",
" Please further align and break down this chunk into a mapping of the fewest possible tokens (sometimes multiple tokens will align to one token; that's expected):\n",
"\n",
" E.g.: \"Source phrase\": \"Después de esto,\",\\n\\t \"Target phrase\": \"After now these things\",\\n\\t \"Greek phrase\": \"Μετὰ δὲ ταῦτα\",\\n\\t \"Rationale\": \"Semantic alignment; Orthographic (comma)\",\\n\\t \"Relevant grammatical patterns\": \"Temporal clause\"\n",
"\n",
" - Source token(s)\\t\\t-->\\tTarget token(s)\\t\\t-->\\tGreek token(s)\n",
" ...etc.\n",
"\n",
" {current_chunk}\n",
" '''.format(POS_aligned_chunks='\\n'.join(output), current_chunk=chunk)\n",
"\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": \"You are LangAlignerGPT. Analyze the user-supplied alignment examples below and follow any instructions the user gives.\"},\n",
" {\"role\": \"user\", \"content\": prompt},\n",
" ]\n",
"\n",
" response = openai.ChatCompletion.create(\n",
" model=\"gpt-4\",\n",
" messages=messages,\n",
" temperature=0.3,\n",
" n=1,\n",
" presence_penalty=0.5,\n",
" frequency_penalty=0.5,\n",
" )\n",
"\n",
" generated_texts_for_chunk = [\n",
" choice.message[\"content\"].strip() for choice in response[\"choices\"]\n",
" ]\n",
" print(generated_texts_for_chunk[0])\n",
" final_alignments.append(generated_texts_for_chunk[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qdveDvS9XmAX"
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
},
"orig_nbformat": 4,
"colab": {
"provenance": [],
"include_colab_link": true
}
},
"nbformat": 4,
"nbformat_minor": 0
}
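A cheap sanity check for the first step: read in order, the `Greek phrase` values of the returned alignment units should spell out the input Greek sentence, so a reply that drops, repeats, or alters words can be flagged before the second step runs. The sketch below assumes the units are already parsed into dicts; `covers_sentence` is a hypothetical helper, not something the notebook defines. It ignores all whitespace because the interlinear Greek in the prompt omits the space after some commas (`ἑβδομήκοντα,καὶ`), and it applies NFC normalization so that precomposed and decomposed accents compare equal.

```python
import unicodedata
from typing import Iterable


def covers_sentence(phrases: Iterable[str], sentence: str) -> bool:
    """Hypothetical helper: do the phrases, concatenated in order, spell out the sentence?

    All whitespace is ignored, and both sides are NFC-normalized so that
    precomposed and decomposed accented letters compare equal.
    """

    def squash(text: str) -> str:
        # Normalize accents, then drop every space, tab, and newline.
        return "".join(unicodedata.normalize("NFC", text).split())

    return squash("".join(phrases)) == squash(sentence)


if __name__ == "__main__":
    greek = "Μετὰ δὲ ταῦτα ἀνέδειξεν ὁ Κύριος ἑτέρους ἑβδομήκοντα,καὶ ἀπέστειλεν αὐτοὺς"
    greek_phrases = ["Μετὰ δὲ ταῦτα", "ἀνέδειξεν ὁ Κύριος ἑτέρους ἑβδομήκοντα,", "καὶ ἀπέστειλεν αὐτοὺς "]
    print(covers_sentence(greek_phrases, greek))      # True
    print(covers_sentence(greek_phrases[:2], greek))  # False: the last unit is missing
```

Given parsed units (hypothetically `units`), the call would be `covers_sentence([u["Greek phrase"] for u in units], greek_sentence)`. The check only confirms coverage; it says nothing about whether the unit boundaries are well placed.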
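The second-step replies stored in `final_alignments` are free text. In the recorded run, each mapping is a bullet line of the form `- source --> target --> Greek` under a `Source token(s) --> Target token(s) --> Greek token(s)` header, although the model switched to `->` in its fourth item. A minimal parser for that format, assuming it holds across replies, could turn each reply into `(Spanish, English, Greek)` triples; `parse_token_alignments` is a hypothetical name. Lines that do not split into exactly three columns are skipped rather than repaired, and the parser does not judge content: the garbled Greek in the recorded fourth item would pass through unchanged.

```python
import re
from typing import List, Tuple

# Column separator: the recorded reply mostly uses "-->" but also "->".
SEPARATOR = re.compile(r"\s*-{1,2}>\s*")


def parse_token_alignments(reply: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: parse '- source --> target --> greek' bullet lines.

    Numbered summary lines, the header row, and lines without exactly
    three columns are skipped.
    """
    triples: List[Tuple[str, str, str]] = []
    for raw_line in reply.splitlines():
        line = raw_line.strip()
        if not line.startswith("- "):
            continue  # not a bullet line (e.g. the "1. ..." summaries)
        columns = [column.strip() for column in SEPARATOR.split(line[2:])]
        if len(columns) != 3 or columns[0].startswith("Source token"):
            continue  # header row or malformed line
        triples.append((columns[0], columns[1], columns[2]))
    return triples


if __name__ == "__main__":
    reply = (
        " - Source token(s)\t\t-->\tTarget token(s)\t\t-->\tGreek token(s)\n"
        " - el Señor --> the Lord --> ὁ Κύριος\n"
        " - designó a otros setenta, --> appointed others seventy, --> ἀνέδειξεν ἑτέρους ἑβδομήκοντα,\n"
    )
    for triple in parse_token_alignments(reply):
        print(triple)
```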