anityāni śarīrāṇi vibhavo naiva śāśvataḥ | nityaṁ saṁnihitō mr̥tyuḥ kartavyō dharmasaṁgrahaḥ ||
jāte jagati vālmīkau śabdaḥ kavir iti sthitaḥ | vyāse jāte kavī ceti kavayaś ceti daṇḍini || http://prakrit.info/sanskrit/readings/sumu-4-75.html
import cv2
import glob
import os
import math
import numpy as np
# Parameters:
# xgive is the number of pixels of "give" when determining whether a line of text
# belongs to the main text or commentary, when they are discriminated based on
# width. |
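One way to read the `xgive` tolerance described above, as a minimal sketch: a detected line counts as main text when its pixel width is within `xgive` of a reference main-text width, and as commentary otherwise. The function name `classify_line`, the `main_width` parameter, the default of 20 pixels, and the symmetric comparison are all assumptions for illustration; the excerpt does not show the script's actual logic.

```python
def classify_line(width, main_width, xgive=20):
    """Label a line by width (hypothetical helper, not from the original script).

    A line is "main" if its width differs from main_width by at most
    xgive pixels; anything else is treated as "commentary".
    """
    return "main" if abs(width - main_width) <= xgive else "commentary"


# A line 10 px narrower than the reference still counts as main text.
print(classify_line(990, 1000))
# A much narrower line is treated as commentary.
print(classify_line(700, 1000))
```

A larger `xgive` tolerates more ragged line ends at the cost of misclassifying commentary lines that happen to be nearly as wide as the main text.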
import argparse
import sanscript
from os.path import abspath
from os.path import splitext
from os.path import basename
import sys
parser = argparse.ArgumentParser(formatter_class=argparse.RawDescriptionHelpFormatter,description="Convert a file from one script to another using Sanscript and print the output to stdout.\n\nAvailable scripts are: bengali, devanagari, gujarati, gurmukhi, kannada, malayalam, oriya, tamil, telugu, hk, iast, iso, itrans, kolkata, slp1, velthuis, wx.")
parser.add_argument("filename",nargs="+",type=str,help="The input file")
parser.add_argument("fr",nargs="+",type=str,help="The input script") |
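With `nargs="+"`, argparse stores each positional as a list, even when only one value is given. The sketch below shows a simpler variant that takes a single value per positional, so `args.filename` and `args.fr` are plain strings. The third argument, `to`, is hypothetical: the excerpt is cut off before any output-script argument appears. The Sanscript call itself is left out, because its signature is not shown in the excerpt.

```python
import argparse


def build_parser():
    """Build a parser with one value per positional (illustrative variant)."""
    parser = argparse.ArgumentParser(
        description="Convert a file from one script to another."
    )
    parser.add_argument("filename", type=str, help="The input file")
    parser.add_argument("fr", type=str, help="The input script")
    # Hypothetical: not present in the excerpt above.
    parser.add_argument("to", type=str, help="The output script")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["input.txt", "iast", "devanagari"])
print(args.filename, args.fr, args.to)
```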
# Usage:
#
# python3 check.py INPUT_FILE
#
# where INPUT_FILE is just the name of the file to check.
#
# This script expects that INPUT_FILE will contain one of
# the following three strings:
# kittel = Kittel's Kannada-English dictionary
# ghatage = Ghatage's Prakrit-English dictionary |
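A minimal sketch of how the dictionary might be picked from the file name, assuming the key string is matched case-insensitively against the base name of INPUT_FILE. The helper name `detect_dictionary` is hypothetical. Only the two keys visible above are included, since the third is cut off in the excerpt.

```python
import os

# Keys taken from the usage comment above; the third key is not shown
# in the excerpt and is deliberately left out.
KNOWN_DICTIONARIES = {
    "kittel": "Kittel's Kannada-English dictionary",
    "ghatage": "Ghatage's Prakrit-English dictionary",
}


def detect_dictionary(path):
    """Return the first known key found in the file name, or None."""
    name = os.path.basename(path).lower()
    for key in KNOWN_DICTIONARIES:
        if key in name:
            return key
    return None


print(detect_dictionary("data/kittel_vol1.txt"))
```

Returning `None` for an unrecognized name lets the caller decide whether to exit with an error or fall back to a default.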
# Uses Tesseract (tesserocr) to recognize files
# in a directory. I use this as follows:
#
# 1. split the PDF into JPG images in a directory
# called "IMAGES" (e.g., pdftoppm -jpeg input.pdf IMAGES/output)
# 2. run this script (python OSD.py), which will
# produce a text file for each image in "IMAGES".
# 3. concatenate the text files with tail
# 4. if desired, |
"""OCR with PDF/TIFF as source files on GCS""" | |
# USAGE: python text_detect.py SOURCE_FILE OUTPUT_FILE
# Note that both SOURCE_FILE and OUTPUT_FILE must be
# in the Google Cloud bucket. For example:
#
# python text_detect.py gs://project-name/file.pdf gs://project-name/read
#
# The API will gather the responses for each page into
# a JSON file on the Google Cloud bucket, e.g.
# OUTPUT_FILE-output-1-to-1.json. |
# -*- coding: utf-8 -*-
""" Usage: python3 versify.py FILENAME """
""" Results in FILENAME.log (errors and statistics)
and FILENAME.json (a json file of metrically parsed text) """
""" This program expects the text to be in the format
represented by the GRETIL Kuṟuntokai
(http://gretil.sub.uni-goettingen.de/gretil/4_drav/tamil/pm/pm110__u.htm)
namely: the |
# -*- coding: utf-8 -*-
""" Usage: python3 verify_gatha.py FILENAME """
""" Results in FILENAME.err (a list of errors)
and FILENAME.log (metrical data) """
""" Take a file in UTF-8 encoding, in the ISO-15919 transliteration
scheme, and try to scan its verses and match them against the
canonical pattern of the Prakrit gāthā. If there are any errors,
output them to an error file. """ |
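The scanning step described above can be sketched roughly as follows. This is not the script's own method: it only counts mātrās (light syllable = 1, heavy = 2) and compares the totals with the 30 and 27 commonly given for the two halves of a gāthā, without checking the internal gaṇa structure. The function name `count_matras` and the sample lines are illustrative. The sketch assumes that ē and ō are written long, that every vowel letter is its own syllable, that an `h` directly after a stop marks an aspirate, and that the final syllable of a half-verse counts as heavy.

```python
import unicodedata


def _nfc_set(chars):
    """Build a set of precomposed (NFC) characters."""
    return set(unicodedata.normalize("NFC", chars))


LONG_VOWELS = _nfc_set("āīūēō")
VOWELS = LONG_VOWELS | set("aiueo")
STOPS = _nfc_set("kgcjṭḍtdpb")  # an 'h' right after one of these is an aspirate
ANUSVARA = "\u1e41"  # ṁ
VISARGA = "\u1e25"   # ḥ


def count_matras(line, final_heavy=True):
    """Count mātrās in one half-verse of ISO-15919 text (simplified sketch)."""
    text = unicodedata.normalize("NFC", line.lower())
    letters = [ch for ch in text if ch.isalpha()]  # drop spaces and punctuation
    positions = [i for i, ch in enumerate(letters) if ch in VOWELS]
    total = 0
    for n, pos in enumerate(positions):
        is_last = n == len(positions) - 1
        end = len(letters) if is_last else positions[n + 1]
        run = letters[pos + 1:end]  # consonants up to the next vowel
        # Count consonants, treating stop + h as a single aspirate.
        consonants = sum(
            1 for j, ch in enumerate(run)
            if not (ch == "h" and j > 0 and run[j - 1] in STOPS)
        )
        heavy = (
            letters[pos] in LONG_VOWELS
            or ANUSVARA in run
            or VISARGA in run
            or consonants >= 2
            or (is_last and final_heavy)
        )
        total += 2 if heavy else 1
    return total


# Sample input for illustration only.
first_half = "amiaṁ pāuakavvaṁ paḍhiuṁ sōuṁ ca jē ṇa āṇaṁti"
second_half = "kāmassa tattataṁtiṁ kuṇaṁti tē kaha ṇa lajjaṁti"
print(count_matras(first_half), count_matras(second_half))  # 30 27
```

A half-verse whose total differs from the expected figure would be a candidate for the error file. A fuller check would also split each half into four-mātrā gaṇas and test each one.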
{
"data": {
"General": {
"intro": "<p>These are general terms which apply to various aspects of Sanskrit grammar.</p>",
"glossary": [
{
"id" : "prakrtih",
"skt" : "prakr̥tiḥ",
"eng" : "base",
"comm" : "That to which an <a href='#pratyayah'>affix</a> is added. A base can be nominal (see <a href='#pratipadikam'>nominal stem</a>) or verbal. This is a synonym of <a href='#angam'>aṅgam</a>." |