@aso2101
aso2101 / transliterator.py
Created August 24, 2023 17:54
transliterator.py
import argparse
import sys
from os.path import abspath, basename, splitext

import sanscript

parser = argparse.ArgumentParser(
    formatter_class=argparse.RawDescriptionHelpFormatter,
    description=(
        "Convert a file from one script to another using Sanscript "
        "and print the output to stdout.\n\n"
        "Available scripts are: bengali, devanagari, gujarati, gurmukhi, "
        "kannada, malayalam, oriya, tamil, telugu, hk, iast, iso, itrans, "
        "kolkata, slp1, velthuis, wx."
    ),
)
parser.add_argument("filename", nargs="+", type=str, help="The input file")
parser.add_argument("fr", nargs="+", type=str, help="The input script")
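The preview stops after the `fr` argument. As a hedged sketch (not the gist's code), the scheme names listed in the description could be enforced with argparse's `choices`, so that a mistyped name fails with a usage error before any transliteration happens. The sketch assumes a hypothetical third positional `to` for the output script and uses single-valued positionals instead of the original `nargs="+"`.

```python
import argparse

# The scheme names listed in the original parser's description.
SCHEMES = [
    "bengali", "devanagari", "gujarati", "gurmukhi", "kannada",
    "malayalam", "oriya", "tamil", "telugu", "hk", "iast", "iso",
    "itrans", "kolkata", "slp1", "velthuis", "wx",
]


def build_parser():
    """Build the CLI parser; the `to` argument is a hypothetical addition."""
    parser = argparse.ArgumentParser(
        description="Convert a file from one script to another."
    )
    parser.add_argument("filename", help="The input file")
    # `choices` makes argparse reject unknown scheme names with a usage error.
    parser.add_argument("fr", choices=SCHEMES, help="The input script")
    parser.add_argument("to", choices=SCHEMES, help="The output script (hypothetical)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.filename, args.fr, args.to)
```

With `choices`, an invalid name such as `klingon` makes argparse print the list of valid schemes and exit with status 2.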
aso2101 / check-by-sorting.py
Last active October 29, 2021 01:48
Script for checking headwords by alphabetical order
# Usage:
#
# python3 check-by-sorting.py INPUT_FILE
#
# where INPUT_FILE is the name of the file to check.
#
# This script expects that INPUT_FILE will contain one of
# the following three strings:
# kittel = Kittel's Kannada-English dictionary
# ghatage = Ghatage's Prakrit-English dictionary
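The idea of checking headwords by sorting can be sketched as follows: convert each headword into a list of alphabet ranks and report any headword that sorts before its predecessor. This is a hypothetical illustration, not the gist's code. It assumes the headwords are already extracted into a list and uses an illustrative Sanskrit order in ISO 15919 transliteration (`ORDER`); the Kittel and Ghatage dictionaries would each need their own order.

```python
import re
import unicodedata

# Illustrative (hypothetical) collation order for Sanskrit in ISO 15919
# transliteration; each dictionary would need its own list.
ORDER = [unicodedata.normalize("NFC", letter) for letter in [
    "a", "ā", "i", "ī", "u", "ū", "r̥", "e", "ai", "o", "au", "ṁ", "ḥ",
    "k", "kh", "g", "gh", "ṅ", "c", "ch", "j", "jh", "ñ",
    "ṭ", "ṭh", "ḍ", "ḍh", "ṇ", "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m", "y", "r", "l", "v", "ś", "ṣ", "s", "h",
]]
RANK = {letter: rank for rank, letter in enumerate(ORDER)}
# Try longer letters first so that "kh" is read as one letter, not "k" + "h".
LETTER = re.compile("|".join(
    re.escape(letter) for letter in sorted(ORDER, key=len, reverse=True)))


def sort_key(word):
    """Map a headword to the list of ranks of its letters."""
    word = unicodedata.normalize("NFC", word.lower())
    key, pos = [], 0
    while pos < len(word):
        match = LETTER.match(word, pos)
        if match is None:
            raise ValueError(f"unexpected character {word[pos]!r} in {word!r}")
        key.append(RANK[match.group()])
        pos = match.end()
    return key


def out_of_order(headwords, key=sort_key):
    """Return (position, previous, current) for each headword that sorts
    before the headword preceding it; positions count from 1."""
    problems = []
    for i in range(1, len(headwords)):
        previous, current = headwords[i - 1], headwords[i]
        if key(current) < key(previous):
            problems.append((i + 1, previous, current))
    return problems
```

A custom key is needed because plain `sorted(["khaga", "kāla"])` puts `khaga` first: `ā` (U+0101) has a higher code point than `h`.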
aso2101 / OSD.py
Created June 12, 2020 13:43
Python script for OCR using Tesseract
# Uses Tesseract (via tesserocr) to recognize the text
# of the image files in a directory. I use this as follows:
#
# 1. split the PDF into JPG images in a directory
# called "IMAGES" (e.g., pdftoppm -jpeg input.pdf IMAGES/output)
# 2. run this script (python OSD.py), which will
# produce a text file for each image in "IMAGES".
# 3. concatenate the text files with tail
# 4. if desired,
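Steps 2 and 3 can be sketched in Python as follows. This is an illustration under assumptions, not the gist's code: the OCR function is passed in so the loop can run without Tesseract, `tesserocr_recognize` shows one way to use tesserocr's `PyTessBaseAPI` (tesserocr and the requested language data must be installed), and the pages are concatenated in Python instead of with `tail`. Sorting on the number at the end of each file name keeps page 10 after page 9 even when the numbers are not zero-padded.

```python
import re
from pathlib import Path


def page_number(path):
    """Extract the trailing page number from names like output-12.jpg."""
    match = re.search(r"(\d+)$", path.stem)
    return int(match.group(1)) if match else -1


def tesserocr_recognize(image_path, lang="eng"):
    """OCR one image with tesserocr (imported lazily; must be installed)."""
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI(lang=lang) as api:
        api.SetImageFile(str(image_path))
        return api.GetUTF8Text()


def ocr_directory(image_dir, recognize=tesserocr_recognize):
    """Write IMAGE.txt next to each IMAGE.jpg; return the text paths in page order."""
    images = sorted(Path(image_dir).glob("*.jpg"), key=page_number)
    outputs = []
    for image in images:
        text_path = image.with_suffix(".txt")
        text_path.write_text(recognize(image), encoding="utf-8")
        outputs.append(text_path)
    return outputs


def concatenate(text_paths, out_path):
    """Join the per-page text files into one file, in the given order."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in text_paths:
            out.write(Path(path).read_text(encoding="utf-8"))
```

Creating a new `PyTessBaseAPI` for every image keeps the sketch short; for a long book it would be cheaper to open one API object and call `SetImageFile` repeatedly.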
aso2101 / text_detect.py
Last active August 24, 2023 17:43
Python script for OCR (Google Cloud Vision API)
"""OCR with PDF/TIFF as source files on GCS"""
# USAGE: python text_detect.py SOURCE_FILE OUTPUT_FILE
# Note that both SOURCE_FILE and OUTPUT_FILE must be
# in a Google Cloud Storage bucket. For example:
#
# python text_detect.py gs://project-name/file.pdf gs://project-name/read
#
# The API writes its responses for the pages to
# JSON files in the bucket, named after OUTPUT_FILE, e.g.
# OUTPUT_FILE-output-1-to-1.json.
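Once the JSON files have been copied out of the bucket (for example with `gsutil cp`), the recognized text can be pulled out of them locally. The sketch below is not part of the gist; it assumes the layout used in Google's PDF/TIFF OCR samples, where each output file holds a `responses` list whose items carry `fullTextAnnotation.text`, and it orders files by the page range in names like the `output-1-to-1.json` example above.

```python
import json
import re
from pathlib import Path

# Matches the page range in names such as "read-output-3-to-4.json".
RANGE = re.compile(r"output-(\d+)-to-(\d+)\.json$")


def first_page(path):
    """Return the first page number encoded in an output file name."""
    match = RANGE.search(Path(path).name)
    if match is None:
        raise ValueError(f"not a Vision output file: {path}")
    return int(match.group(1))


def pages_text(json_paths):
    """Return the text of every page, ordered by the page ranges in the file names."""
    texts = []
    for path in sorted(json_paths, key=first_page):
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        for response in data.get("responses", []):
            # A page with no detected text may lack fullTextAnnotation.
            annotation = response.get("fullTextAnnotation", {})
            texts.append(annotation.get("text", ""))
    return texts
```

Sorting numerically on the first page avoids the string-order trap where `output-10-to-10.json` would come before `output-2-to-2.json`.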
aso2101 / kannadaMap.json
Created February 27, 2020 16:01
GeoJSON data for the place names mentioned in the Way of the Poet King (Kavirājamārgaṁ)
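GeoJSON data of this kind is usually a `FeatureCollection` of `Feature`s, each pairing a geometry with free-form `properties`. The sketch below builds one point feature; the name and coordinates are placeholders, not data from `kannadaMap.json`, whose contents are not shown here. GeoJSON (RFC 7946) orders positions as longitude first, then latitude, which is easy to get backwards.

```python
import json


def point_feature(name, longitude, latitude, **properties):
    """Build a GeoJSON Point feature; positions are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {"name": name, **properties},
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    collection = feature_collection([point_feature("Example place", 76.0, 15.0)])
    print(json.dumps(collection, ensure_ascii=False, indent=2))
```

`ensure_ascii=False` keeps diacritics such as those in Kavirājamārgaṁ readable in the output file instead of escaping them.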
aso2101 / versify.py
Last active April 6, 2019 21:36
A python script for parsing Tamil verse into metrical units
# -*- coding: utf-8 -*-
""" Usage: python3 versify.py FILENAME """
""" Results in FILENAME.log (errors and statistics)
and FILENAME.json (a json file of metrically parsed text) """
""" This program expects the text to be in the format
represented by the GRETIL Kuṟuntokai
(http://gretil.sub.uni-goettingen.de/gretil/4_drav/tamil/pm/pm110__u.htm)
namely: the
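In Tamil prosody the smallest metrical unit is the asai: a nēr asai is a single syllable, short (kuṟil) or long (neṭil), optionally closed by a consonant (oṟṟu); a nirai asai is an open short syllable followed by one more syllable of either length. The following is a minimal sketch of that segmentation rule, not the gist's code. It assumes each word has already been split into syllables classified as short or long and open or closed, and it ignores special cases such as shortened vowels.

```python
from typing import List, NamedTuple


class Syllable(NamedTuple):
    long: bool    # True for neṭil (long vowel), False for kuṟil (short vowel)
    closed: bool  # True if the syllable ends in a consonant (oṟṟu)


def asai(word: List[Syllable]) -> List[str]:
    """Segment one word's syllables into nēr/nirai units, left to right."""
    units = []
    i = 0
    while i < len(word):
        current = word[i]
        # An open short syllable with a syllable after it starts a nirai.
        if not current.long and not current.closed and i + 1 < len(word):
            units.append("nirai")
            i += 2
        else:
            units.append("nēr")
            i += 1
    return units
```

For example, kuṟuntokai (ku-ṟun-to-kai) segments as nirai + nirai, while kalvi (kal-vi) is nēr + nēr because its first syllable is closed.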
aso2101 / verify_gatha.py
Last active June 14, 2019 20:01
A Python script for verifying that a Prakrit gāthā is metrically correct.
# -*- coding: utf-8 -*-
"""Usage: python3 verify_gatha.py FILENAME

Results in FILENAME.err (a list of errors)
and FILENAME.log (metrical data).

Takes a file in UTF-8 encoding, in the ISO 15919 transliteration
scheme, tries to scan its verses, and matches them against the
canonical pattern of the Prakrit gāthā. Any errors are written
to the error file.
"""
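The canonical gāthā pattern counts mātrās: a light syllable is worth one, a heavy syllable two. The first half has 30 mātrās, grouped as seven four-mātrā gaṇas plus a final heavy; the second half has 27, because its sixth gaṇa is a single light syllable; and no odd-numbered gaṇa may have the shape light-heavy-light. The sketch below checks only those rules and is not the gist's code. It assumes the scansion into light (`L`) and heavy (`G`) syllables has already been done and treats the final slot strictly as two mātrās.

```python
from typing import List, Tuple

# Mātrā sizes of the gaṇa slots in each half of a gāthā.
FIRST_HALF = [4, 4, 4, 4, 4, 4, 4, 2]
SECOND_HALF = [4, 4, 4, 4, 4, 1, 4, 2]
MATRAS = {"L": 1, "G": 2}


def split_ganas(weights: str, slots: List[int]) -> Tuple[List[str], List[str]]:
    """Group a string of L/G weights into gaṇas of the given mātrā sizes."""
    ganas, errors = [], []
    i = 0
    for number, size in enumerate(slots, start=1):
        gana, total = "", 0
        while total < size and i < len(weights):
            gana += weights[i]
            total += MATRAS[weights[i]]
            i += 1
        if total != size:
            errors.append(f"gaṇa {number}: expected {size} mātrās, got {total} ({gana or 'nothing'})")
        ganas.append(gana)
    if i < len(weights):
        errors.append(f"{len(weights) - i} syllable(s) left over")
    return ganas, errors


def check_half(weights: str, slots: List[int]) -> List[str]:
    """Return the errors found in one half-verse."""
    ganas, errors = split_ganas(weights, slots)
    for number, gana in enumerate(ganas, start=1):
        # Odd-numbered gaṇas may not be light-heavy-light (ja-gaṇa).
        if number % 2 == 1 and gana == "LGL":
            errors.append(f"gaṇa {number}: LGL not allowed in an odd position")
    return errors


def check_gatha(first: str, second: str) -> List[str]:
    """Check both halves of a gāthā given as L/G strings."""
    return ([f"first half, {e}" for e in check_half(first, FIRST_HALF)]
            + [f"second half, {e}" for e in check_half(second, SECOND_HALF)])
```

A heavy syllable that would straddle a gaṇa boundary shows up as a gaṇa with the wrong mātrā count, which is usually the first symptom of a misread long vowel or conjunct.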
aso2101 / sgt.json
Last active November 17, 2017 05:33
Sanskrit Grammatical Terminology
{
"data": {
"General": {
"intro": "<p>These are general terms which apply to various aspects of Sanskrit grammar.</p>",
"glossary": [
{
"id" : "prakrtih",
"skt" : "prakr̥tiḥ",
"eng" : "base",
"comm" : "That to which an <a href='#pratyayah'>affix</a> is added. A base can be nominal (see <a href='#pratipadikam'>nominal stem</a>) or verbal. This is a synonym of <a href='#angam'>aṅgam</a>."
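Glossary entries refer to one another through `href='#id'` links inside their `comm` fields, so a renamed or missing `id` silently breaks a link. A small sketch (not part of the gist) that lists links whose target `id` is not defined, assuming the `data` → section → `glossary` nesting visible in the preview:

```python
import json
import re

# Matches href='#target' or href="#target" inside the HTML snippets.
LINK = re.compile(r"""href=['"]#([^'"]+)['"]""")


def broken_links(glossary_json: dict):
    """Return (source_id, target_id) pairs whose target id is not defined."""
    entries = [
        entry
        for section in glossary_json["data"].values()
        for entry in section.get("glossary", [])
    ]
    known = {entry["id"] for entry in entries}
    broken = []
    for entry in entries:
        for target in LINK.findall(entry.get("comm", "")):
            if target not in known:
                broken.append((entry["id"], target))
    return broken


if __name__ == "__main__":
    with open("sgt.json", encoding="utf-8") as f:
        for source, target in broken_links(json.load(f)):
            print(f"{source} -> #{target} (missing)")
```

Only the `comm` fields are scanned; section `intro` text could be checked the same way if it also links to entries.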
aso2101 / verses_notes.org
Last active October 30, 2017 21:02
A collection of Sanskrit verses for beginning students

[X] Ind.Sp. 101 = PaTa.3.96 :simple:nominal:kr̥tya:

anityāni śarīrāṇi vibhavo naiva śāśvataḥ | nityaṁ saṁnihito mr̥tyuḥ kartavyo dharmasaṁgrahaḥ ||

[X] Sūktimuktāvalī 4.75 :singular:dual:plural:satisaptamī:

jāte jagati vālmīkau śabdaḥ kavir iti sthitaḥ | vyāse jāte kavī ceti kavayaś ceti daṇḍini || http://prakrit.info/sanskrit/readings/sumu-4-75.html

[ ] Unknown 1
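Each entry above pairs a checkbox (`[X]` done, `[ ]` still open) with a source reference and a trailing tag list in Org's `:tag1:tag2:` syntax. A minimal sketch for pulling these out, for example to list every verse tagged `nominal`; it assumes one entry per line in exactly this form and is not part of the gist.

```python
import re
from typing import List, NamedTuple

# "[X] Ind.Sp. 101 = PaTa.3.96 :simple:nominal:kr̥tya:" -> checkbox, title, tags
ENTRY = re.compile(r"^\[(?P<box>[ X])\]\s+(?P<title>.*?)(?:\s+:(?P<tags>\S+):)?\s*$")


class Entry(NamedTuple):
    done: bool
    title: str
    tags: List[str]


def parse_entries(lines):
    """Return an Entry for each checkbox line; other lines are skipped."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            tags = match.group("tags")
            entries.append(Entry(match.group("box") == "X", match.group("title"),
                                 tags.split(":") if tags else []))
    return entries
```

Verse lines and URLs do not start with a checkbox, so they are ignored; `[e.title for e in parse_entries(lines) if "nominal" in e.tags]` then gives the verses with a given tag.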

aso2101 / in
Created September 26, 2017 15:38
Andrew's keyboard layout for typing Sanskrit (and other languages) in transliteration
// --- BEGIN Sanskrit (Ollett) ---
partial alphanumeric_keys
xkb_symbols "san-trans" {
name[Group1] = "Sanskrit (Transliteration)";
key.type="FOUR_LEVEL";
// Roman digits
key <TLDE> { [ apostrophe, asciitilde, dead_grave, dead_tilde ] };
key <AE01> { [ 1, exclam, U0323 ] };
key <AE02> { [ 2, at, U0324 ] };
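Third-level symbols such as `U0323` and `U0324` are combining characters (COMBINING DOT BELOW and COMBINING DIAERESIS BELOW), typed after the base letter. The Python snippet below is an illustration, not part of the layout; it shows how such sequences behave under Unicode NFC normalization. `t` followed by U+0323 folds into the precomposed `ṭ`, while `r` followed by U+0325 (COMBINING RING BELOW, used for `r̥` in this transliteration but not shown in the preview above) has no precomposed form and stays two code points.

```python
import unicodedata


def describe(text: str):
    """Return the Unicode names of the code points in text."""
    return [unicodedata.name(ch) for ch in text]


def compose(base: str, mark: str) -> str:
    """Append a combining mark to a base letter and apply NFC normalization."""
    return unicodedata.normalize("NFC", base + mark)


if __name__ == "__main__":
    for base, mark in [("t", "\u0323"), ("u", "\u0324"), ("r", "\u0325")]:
        result = compose(base, mark)
        print(repr(result), len(result), describe(result))
```

This is why text typed with combining marks should be normalized before searching or sorting: `ṭ` typed as two keystrokes and a precomposed `ṭ` compare equal only after normalization.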