@shantanuo
Created November 27, 2023 08:55
Gita repeated phrases
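# Copy and paste the plain text from the site into the file gg.txt:
# https://sanskritdocuments.org/doc_giitaa/bhagvadnew.html
# Scanning the whole Bhagavad Gita takes around 4 hours.
import re

with open('gg.txt', encoding='utf-8') as f:
    mytext = f.read()

# Join lines, replace dandas, hyphens and Devanagari digits with spaces,
# then collapse runs of whitespace.
mytext2 = re.sub(r'\n', ' ', mytext)
y = re.sub(r'[॥।\-०१२३४५६७८९]', ' ', mytext2)
z = re.sub(r'\s+', ' ', y)

# Capture every run of 7 to 1200 characters that occurs again later in the text.
response = re.findall(r'(.{7,1200})(?=.*\1)', z, re.S)

with open('result.txt', 'w', encoding='utf-8') as out:
    for p in response:
        if len(p) > 15:
            # Escape the phrase so any regex metacharacters in it match literally.
            hidden_letters = re.findall(r".*" + re.escape(p) + r".*(?:\n\S.*)*", mytext)
            print(p, file=out)
            print(list(filter(None, hidden_letters)), file=out)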
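
The core of the script is the pattern `(.{7,1200})(?=.*\1)`: a captured run of characters followed by a lookahead that requires the same run to appear again later. The sketch below tries that idea on a short English sentence so the result can be checked by eye. The function name `repeated_phrases`, its parameters, and the sample text are hypothetical and not part of the gist.

```python
import re


def repeated_phrases(text, min_len=7, max_len=1200):
    """Return runs of min_len..max_len characters that occur again later in text."""
    # Collapse whitespace so a line break cannot hide a repeat.
    normalized = re.sub(r'\s+', ' ', text)
    # Group 1 captures a candidate run; the lookahead requires the same run
    # to appear again somewhere after it (re.S lets '.' cross newlines).
    pattern = r'(.{%d,%d})(?=.*\1)' % (min_len, max_len)
    return re.findall(pattern, normalized, re.S)


# Hypothetical sample: one phrase appears twice.
sample = "the quick brown fox ran home; the quick brown fox slept"
phrases = repeated_phrases(sample)
print(phrases)
```

On this sample the call prints `['the quick brown fox ']`, trailing space included, because the greedy quantifier keeps the longest run that still has a later copy. The pattern retries shorter runs at every starting position, which is probably why a full-length text takes hours to scan.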