🏠 Working from home

Santhosh Thottingal (santhoshtr)

santhoshtr / corpus-cleanup-malayalam.sed
Created Feb 28, 2020
Malayalam corpus cleanup script
# Miscellaneous cleanup for a Malayalam text corpus
# Usage: sed -i -f corpus-cleanup-malayalam.sed corpus/*.txt
# Chillu normalization: consonant + virama + ZWJ to the atomic chillu
s/ന്‍/ൻ/g
s/ള്‍/ൾ/g
s/ല്‍/ൽ/g
s/ര്‍/ർ/g
s/ന്‍/ൻ/g
s/ണ്‍/ൺ/g
# Remove ZWNJ at end of words
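A minimal Python sketch of the same chillu normalization, for processing text outside sed. It assumes the replacements target the atomic chillu code points U+0D7A to U+0D7E (one per rule in the script) and that "end of word" means before whitespace or at the end of the text. The ZWNJ rule itself is cut off in the preview, so that part is a guess, and `normalize` is a hypothetical name.

```python
import re

ZWJ = "\u200D"
ZWNJ = "\u200C"
VIRAMA = "\u0D4D"

# Consonant -> atomic chillu, one entry per rule in the sed script
CHILLU_MAP = {
    "\u0D28": "\u0D7B",  # NA  -> CHILLU N
    "\u0D33": "\u0D7E",  # LLA -> CHILLU LL
    "\u0D32": "\u0D7D",  # LA  -> CHILLU L
    "\u0D30": "\u0D7C",  # RA  -> CHILLU RR
    "\u0D23": "\u0D7A",  # NNA -> CHILLU NN
}

# Consonant + virama + ZWJ: the older way of writing a chillu
OLD_CHILLU = re.compile("([" + "".join(CHILLU_MAP) + "])" + VIRAMA + ZWJ)

# Assumption: a word ends before whitespace or at the end of the text
TRAILING_ZWNJ = re.compile(ZWNJ + r"(?=\s|$)")


def normalize(text: str) -> str:
    """Map old-style chillus to atomic ones, then drop word-final ZWNJ."""
    text = OLD_CHILLU.sub(lambda m: CHILLU_MAP[m.group(1)], text)
    return TRAILING_ZWNJ.sub("", text)
```

ZWNJ inside a word is left alone, since it can legitimately block a conjunct there.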
santhoshtr / process.js
Created Feb 21, 2020
process ligatures and glyphs for Manjari
const glyphs = require('./glyphs.json').glyphs
const ligatures = require('./ligatures.json').ligatures
const getGlyphValue = (glyphname) => {
const glyph = glyphs.find(g => g.glyph === glyphname);
return glyph && glyph.value;
}
const process = () => {
const ligaturesLength = ligatures.length;
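`getGlyphValue` scans the whole glyph list with `Array.prototype.find` on every call. The sketch below performs the same lookup in Python, assuming `glyphs.json` has the `{"glyphs": [{"glyph": ..., "value": ...}]}` shape implied by the code. It builds a dictionary once so repeated lookups are constant time; `build_glyph_index`, `load_glyph_index` and `get_glyph_value` are hypothetical names.

```python
import json


def build_glyph_index(data: dict) -> dict:
    """Map glyph name -> value for data shaped like {"glyphs": [{"glyph": ..., "value": ...}]}."""
    index = {}
    for entry in data["glyphs"]:
        # setdefault keeps the first entry for a repeated name, as find() would
        index.setdefault(entry["glyph"], entry.get("value"))
    return index


def load_glyph_index(path: str) -> dict:
    """Read a glyphs.json file (assumed shape, see above) and index it."""
    with open(path, encoding="utf-8") as f:
        return build_glyph_index(json.load(f))


def get_glyph_value(index: dict, glyph_name: str):
    """Return the glyph's value, or None if the name is unknown (JS yields undefined)."""
    return index.get(glyph_name)
```

Usage would look like `get_glyph_value(load_glyph_index("glyphs.json"), "ka")`, where `"ka"` stands in for whatever glyph names the real file uses.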
santhoshtr / KeralaPRDHeadlinesCrawler.py
Created Oct 26, 2019
Crawl the Kerala PRD website and download all content to JSON
import scrapy
from scrapy.http import Request
class HeadlineCatcher(scrapy.Spider):
name = "headlinecatcher"
start_urls = ["http://www.prd.kerala.gov.in/pressrelease"]
custom_settings = {
'FEED_EXPORT_ENCODING': 'utf-8',
}
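The `FEED_EXPORT_ENCODING` setting matters because, when it is unset, Scrapy writes JSON feeds with `\uXXXX` escapes for every non-ASCII character, which makes Malayalam headlines unreadable in the output file. This stdlib-only sketch reproduces the difference with `json.dumps` and a hypothetical item; it illustrates the encoding choice and is not Scrapy's exporter.

```python
import json

# Hypothetical scraped item; the real spider's fields are not shown above
item = {"title": "കേരളം"}

# Similar to a JSON feed without FEED_EXPORT_ENCODING:
# every non-ASCII character becomes a \uXXXX escape
escaped = json.dumps(item)

# Similar to FEED_EXPORT_ENCODING = 'utf-8': the Malayalam text stays readable
readable = json.dumps(item, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same data; only the on-disk readability differs.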
santhoshtr / keybase.md

Keybase proof

I hereby claim:

  • I am santhoshtr on github.
  • I am sthottingal (https://keybase.io/sthottingal) on keybase.
  • I have a public key ASAP_nrhFC103eL1sF9vFA9M4mrxkfvudZ2I-Bd9kiukOgo

To claim this, I am signing this object:

santhoshtr / hd-playlist-audio-downloader.sh
Created Jul 9, 2018
HD audio download from a YouTube playlist
youtube-dl -f bestaudio --extract-audio --audio-format mp3 --audio-quality 0 -o "%(title)s.%(ext)s" "https://www.youtube.com/playlist?list=abdlshfjskdhfuwhrklk"
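The same download can be driven from Python through youtube-dl's embedding interface. This sketch maps each CLI flag to the corresponding option, assuming the `youtube_dl` package and ffmpeg are installed; `YDL_OPTS` and `download` are illustrative names, and the playlist ID is the placeholder from the command above.

```python
# Placeholder playlist ID copied from the command above
PLAYLIST_URL = "https://www.youtube.com/playlist?list=abdlshfjskdhfuwhrklk"

# Options mirroring the CLI flags:
#   -f bestaudio               -> "format"
#   -o "%(title)s.%(ext)s"     -> "outtmpl"
#   --extract-audio --audio-format mp3 --audio-quality 0
#                              -> FFmpegExtractAudio postprocessor
YDL_OPTS = {
    "format": "bestaudio",
    "outtmpl": "%(title)s.%(ext)s",
    "postprocessors": [
        {
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "0",
        }
    ],
}


def download(urls):
    """Download and convert every entry; needs youtube-dl and ffmpeg installed."""
    import youtube_dl  # imported lazily so the options can be inspected without it

    with youtube_dl.YoutubeDL(YDL_OPTS) as ydl:
        ydl.download(urls)
```

Call `download([PLAYLIST_URL])` to run it. Keeping the options in one dictionary makes them easy to reuse across several playlists.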
santhoshtr / ICUStingComparison.py
Created Apr 6, 2018
ICU based string comparison using various collation strengths
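from icu import Locale, Collator as ICUCollator
import locale
collator = ICUCollator.createInstance(Locale("ml_IN"))
word1 = "അവൻ"  # atomic chillu NA (U+0D7B)
word2 = "അവ‍ന്\u200d"  # spelled with NA + virama + ZWJ, the pre-Unicode-5.1 chillu encoding
# compare() returns 0 when the strings are equal at the current strength,
# so test for 0 rather than printing the raw -1/0/1 result
collator.setStrength(ICUCollator.PRIMARY)
print("[ICU] Are they primary equal?", collator.compare(word1, word2) == 0)
collator.setStrength(ICUCollator.SECONDARY)
print("[ICU] Are they secondary equal?", collator.compare(word1, word2) == 0)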
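Collation is useful here because Unicode normalization alone does not unify the two spellings: the atomic chillus have no decomposition mappings, so NFC, NFD, NFKC and NFKD all leave them different. A stdlib check of that claim, writing the two spellings of അവൻ as escapes (assumed to correspond to `word1` and `word2`; `equal_after` is a hypothetical helper):

```python
import unicodedata

atomic = "\u0D05\u0D35\u0D7B"              # അവൻ with atomic CHILLU N (U+0D7B)
legacy = "\u0D05\u0D35\u0D28\u0D4D\u200D"  # NA + virama + ZWJ spelling


def equal_after(form: str) -> bool:
    """True if both spellings become identical under normalization `form`."""
    return unicodedata.normalize(form, atomic) == unicodedata.normalize(form, legacy)


for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, equal_after(form))
```

Every form reports `False`, so plain string equality treats the two as different words even after normalization.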
santhoshtr / index.html
Created Dec 10, 2017
Malayalam number parser
<div class="container">
<input id="num" type="number" placeholder="Enter a number" />
<div id="result"></div>
<div id="analysis"></div>
</div>
santhoshtr / Collator.py
Last active Sep 18, 2017
Malayalam Collator
#!/usr/bin/python3
import sys
import gi
gi.require_version('Gtk', '3.0')
from gi.repository import Gtk, Gio, GLib, Pango
import locale
from pyuca import Collator
from icu import UnicodeString, Locale, Collator as ICUCollator
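Code-point order is the baseline a Malayalam collator has to improve on. In the sketch below (plain Python, hypothetical word list), the atomic chillu U+0D7B sorts after every consonant, so the two spellings of അവൻ land on either side of an unrelated word. It shows only that baseline, not what pyuca or ICU produce for the same list.

```python
words = [
    "\u0D05\u0D35\u0D7B",              # അവൻ, atomic chillu
    "\u0D05\u0D35\u0D38\u0D30\u0D02",  # അവസരം
    "\u0D05\u0D35\u0D28\u0D4D\u200D",  # അവൻ again, NA + virama + ZWJ
]

# Python compares str by code point: NA (U+0D28) < SA (U+0D38) < CHILLU N (U+0D7B)
by_code_point = sorted(words)
print(by_code_point)
```

A collator replaces this comparison with language-aware sort keys, which is what the pyuca and ICU imports above provide.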
santhoshtr / Malayalam-Syllable.peg
Last active May 27, 2017
Malayalam Syllable Model using PEG
Word = Syllable+
Syllable = s:( Vowel
/ Chillu
/ ( Conjunct / Consonant ) Signs
/ ZWNJ
) {
if ( Array.isArray( s ) ) {
return s.join( '' )
}
return s
santhoshtr / Malayalam-Conjunct.peg
Last active May 21, 2017
Malayalam conjunct defined in a parsing expression grammar (PEG)
Conjunct = Consonant Virama (Conjunct / Consonant )
Consonant = [കഖഗഘങചഛജഝഞടഠഡഢണതഥദധനപഫബഭമയരലവശഷസഹളഴറ]
Virama = [്]
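Because the `Conjunct` rule is right-recursive, it describes a regular language: one or more consonant + virama pairs followed by a consonant. A Python `re` sketch of that rule follows, together with a syllable splitter in the spirit of `Malayalam-Syllable.peg`. That grammar's `Vowel`, `Chillu`, `Signs` and `ZWNJ` definitions are not shown above, so the character classes used for them here are assumptions, and `is_conjunct` and `syllables` are hypothetical names.

```python
import re

# Character class copied from Malayalam-Conjunct.peg
CONSONANT = "[കഖഗഘങചഛജഝഞടഠഡഢണതഥദധനപഫബഭമയരലവശഷസഹളഴറ]"
VIRAMA = "\u0D4D"

# Conjunct = Consonant Virama (Conjunct / Consonant), flattened into a regex
CONJUNCT = f"(?:{CONSONANT}{VIRAMA})+{CONSONANT}"

# Assumed classes: the Syllable grammar's own definitions are not shown
VOWEL = "[\u0D05-\u0D14]"                     # independent vowels
CHILLU = "[\u0D7A-\u0D7F]"                    # atomic chillus
SIGNS = "[\u0D02\u0D03\u0D3E-\u0D4D\u0D57]*"  # anusvara, visarga, vowel signs, virama
ZWNJ = "\u200C"

# Same ordered alternatives as the Syllable rule; the first match wins
SYLLABLE = re.compile(f"{VOWEL}|{CHILLU}|(?:{CONJUNCT}|{CONSONANT}){SIGNS}|{ZWNJ}")


def is_conjunct(text: str) -> bool:
    """True if the whole string matches the Conjunct rule."""
    return re.fullmatch(CONJUNCT, text) is not None


def syllables(word: str) -> list:
    """Split a word into syllables; characters no rule matches are skipped."""
    return SYLLABLE.findall(word)
```

For example, `syllables("മലയാളം")` splits the word into മ, ല, യാ and ളം.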