Created
December 10, 2017 18:41
Searching for a pasuk with all the vowels
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lawrence Szenes-Strauss posted the following question on [Facebook](https://www.facebook.com/groups/1071696109619922/permalink/1558249010964627/):\n",
    "\n",
    "> Who can come up with a short passage of Tanakh that:\n",
    "> 1. Contains all 12 distinct Masoretic vowel marks (qamats, patah, hataf patah, tsere, segol, hataf segol, hiriq, holam, shuruq, qubuts, hataf qamats, sheva) and\n",
    "> 2. Does not contain a shem kodesh.\n",
    "> \n",
    "> Looking to use it in an initial reading assessment for some students. Thanks!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import xml.etree.ElementTree as ET\n",
    "import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "datadir = '../'\n",
    "vowelnames = {'qamats': u'\\u05B8', 'patah': u'\\u05B7', 'hataf patah': u'\\u05B2', 'tsere': u'\\u05B5', 'segol': u'\\u05B6', 'hataf segol': u'\\u05B1', 'hiriq': u'\\u05B4', 'holam': u'\\u05B9', 'shuruq': 'וּ', 'qubuts': u'\\u05BB', 'hataf qamats': u'\\u05B3', 'sheva': u'\\u05B0'}\n",
    "letters = 'אבגדהוזחטיכךלמםנןסעפףצץקרשת '"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "sfarim = ['bereshit', 'shmot', 'vayikra', 'bmidbar', 'dvarim']\n",
    "\n",
    "data = {} # so you can inspect manually if you want\n",
    "results = []\n",
    "\n",
    "for sefer in sfarim:\n",
    "    data[sefer] = {}\n",
    "    tree = ET.parse(datadir + sefer + '.xml')\n",
    "    root = tree.getroot()\n",
    "    prakim = root.findall('.//c') # chapters\n",
    "    for perek in prakim:\n",
    "        pereknum = int(perek.attrib['n'])\n",
    "        if pereknum not in data[sefer]: data[sefer][pereknum] = {}\n",
    "        psukim = perek.findall('v') # verses\n",
    "        for pasuk in psukim:\n",
    "            pasuknum = int(pasuk.attrib['n'])\n",
    "            if pasuknum not in data[sefer][pereknum]:\n",
    "                data[sefer][pereknum][pasuknum] = {}\n",
    "            # pointed text of each word element in the pasuk\n",
    "            text = [w.text for w in pasuk if w.tag=='w' or w.tag=='q']\n",
    "            # consonants only: drop every character that is not in `letters`\n",
    "            words = [''.join(list(filter(lambda c: c in letters, w))) for w in text]\n",
    "            vowels = re.findall(r'|'.join(vowelnames.values()), ' '.join(text)) # because shuruq is actually 2 chars\n",
    "            data[sefer][pereknum][pasuknum]['text'] = text\n",
    "            data[sefer][pereknum][pasuknum]['words'] = words\n",
    "            data[sefer][pereknum][pasuknum]['vowels'] = vowels\n",
    "\n",
    "            # keep the pasuk only if every vowel mark appears at least once\n",
    "            if all([v in vowels for v in vowelnames.values()]): results.append((sefer, pereknum, pasuknum))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('shmot', 32, 6), ('vayikra', 22, 3)]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Turns out there are only two results in the Torah, so I'm not bothering to filter for *shem kodesh*.\n",
    "\n",
    "Manual inspection shows that **[Shmot 32:6](https://www.sefaria.org/Exodus.32.6?lang=he)** is the answer.\n",
    "\n",
    "If I were to add the rest of Tanakh (just a matter of downloading data from tanach.us), it might make sense to filter for *shem kodesh* automatically."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
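The notebook's closing note suggests filtering for *shem kodesh* automatically if the search were extended to the rest of Tanakh. A minimal sketch of what that might look like follows. It is not part of the original notebook: the helper `has_shem_kodesh`, the set `SHEMOT`, and the string `PREFIXES` are all hypothetical names, and the name list is deliberately short and incomplete (it misses suffixed forms, for example). It works on consonant-only words like the ones stored under `'words'` in the notebook's `data` dict.

```python
# Hypothetical helper, not from the original notebook.
# Assumption: an illustrative, incomplete set of consonantal spellings.
SHEMOT = {'יהוה', 'אלהים'}

# Assumption: one-letter prefixes that may be attached to a name.
PREFIXES = 'ובכלמהש'


def has_shem_kodesh(words, names=SHEMOT, prefixes=PREFIXES):
    """Return True if any consonant-only word is one of `names`,
    either as written or after removing leading one-letter prefixes."""
    for word in words:
        candidate = word.strip()
        # Check the word as-is, then peel prefixes off one at a time.
        while candidate:
            if candidate in names:
                return True
            if candidate[0] not in prefixes:
                break
            candidate = candidate[1:]
    return False
```

With the notebook's variables in scope, the results could then be narrowed with something like `[r for r in results if not has_shem_kodesh(data[r[0]][r[1]][r[2]]['words'])]`. Exact matching against a fixed set keeps false positives low, at the cost of missing any form that is not listed, so a manual check of the surviving verses is still a good idea.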