@Noleli
Created December 10, 2017 18:41
Searching for a pasuk with all the vowels
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lawrence Szenes-Strauss posted the following question on [Facebook](https://www.facebook.com/groups/1071696109619922/permalink/1558249010964627/):\n",
"\n",
"> Who can come up with a short passage of Tanakh that:\n",
"> 1. Contains all 12 distinct Masoretic vowel marks (qamats, patah, hataf patah, tsere, segol, hataf segol, hiriq, holam, shuruq, qubuts, hataf qamats, sheva) and\n",
"> 2. Does not contain a shem kodesh.\n",
"> \n",
"> Looking to use it in an initial reading assessment for some students. Thanks!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import xml.etree.ElementTree as ET\n",
"import re"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"datadir = '../'\n",
"vowelnames = {'qamats': u'\\u05B8', 'patah': u'\\u05B7', 'hataf patah': u'\\u05B2', 'tsere': u'\\u05B5', 'segol': u'\\u05B6', 'hataf segol': u'\\u05B1', 'hiriq': u'\\u05B4', 'holam': u'\\u05B9', 'shuruq': 'וּ', 'qubuts': u'\\u05BB', 'hataf qamats': u'\\u05B3', 'sheva': u'\\u05B0'}\n",
"letters = 'אבגדהוזחטיכךלמםנןסעפףצץקרשת '"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"sfarim = ['bereshit', 'shmot', 'vayikra', 'bmidbar', 'dvarim']\n",
"\n",
"data = {} # so you can inspect manually if you want\n",
"results = []\n",
"\n",
"for sefer in sfarim:\n",
"    data[sefer] = {}\n",
"    tree = ET.parse(datadir + sefer + '.xml')\n",
"    root = tree.getroot()\n",
"    prakim = root.findall('.//c')\n",
"    for perek in prakim:\n",
"        pereknum = int(perek.attrib['n'])\n",
"        if pereknum not in data[sefer]: data[sefer][pereknum] = {}\n",
"        psukim = perek.findall('v')\n",
"        for pasuk in psukim:\n",
"            pasuknum = int(pasuk.attrib['n'])\n",
"            if pasuknum not in data[sefer][pereknum]:\n",
"                data[sefer][pereknum][pasuknum] = {}\n",
"            text = [w.text for w in pasuk if w.tag=='w' or w.tag=='q']\n",
"            words = [''.join(list(filter(lambda c: c in letters, w))) for w in text]\n",
"            vowels = re.findall(r'|'.join(vowelnames.values()), ' '.join(text)) # because shuruq is actually 2 chars\n",
"            data[sefer][pereknum][pasuknum]['text'] = text\n",
"            data[sefer][pereknum][pasuknum]['words'] = words\n",
"            data[sefer][pereknum][pasuknum]['vowels'] = vowels\n",
"\n",
"            if all([v in vowels for v in vowelnames.values()]): results.append((sefer, pereknum, pasuknum))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('shmot', 32, 6), ('vayikra', 22, 3)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Turns out there are only two results in the Torah, so I'm not bothering to filter for *shem kodesh*.\n",
"\n",
"Manual inspection shows that **[Shmot 32:6](https://www.sefaria.org/Exodus.32.6?lang=he)** is the answer: Vayikra 22:3 contains a *shem kodesh*, which rules it out.\n",
"\n",
"If I were to add the rest of Tanakh (just a matter of downloading data from tanach.us), it might make sense to filter for *shem kodesh* automatically."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
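
The closing note mentions filtering for *shem kodesh* automatically once more books are added. Below is a minimal sketch of one way that might look, assuming the `data` and `results` structures built in the notebook. The names `SHEMOT_KODESH`, `has_shem_kodesh`, and `filter_results` are hypothetical, and the name list is illustrative and incomplete: it matches whole words only, so prefixed forms and other names would need to be added by hand.

```python
# Assumption: an illustrative, incomplete set of consonant-only spellings.
# Extend it as needed; prefixed forms are NOT matched by this sketch.
SHEMOT_KODESH = frozenset({'יהוה', 'אלהים'})


def has_shem_kodesh(words, names=SHEMOT_KODESH):
    """Return True if any consonant-only word exactly matches one of `names`."""
    return any(w.strip() in names for w in words)


def filter_results(results, data, names=SHEMOT_KODESH):
    """Keep the (sefer, perek, pasuk) tuples whose 'words' contain none of `names`.

    `data` is expected to have the same nesting as in the notebook:
    data[sefer][perek][pasuk]['words'] is a list of vowel-stripped words.
    """
    return [
        (sefer, perek, pasuk)
        for (sefer, perek, pasuk) in results
        if not has_shem_kodesh(data[sefer][perek][pasuk]['words'], names)
    ]
```

With the notebook's variables in scope, `filter_results(results, data)` would return only the candidate verses that contain none of the listed names. Exact matching keeps false positives low at the cost of missing prefixed forms, so a manual check of the surviving verses is still worthwhile.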