Nezih Aşula anezih
@anezih
anezih / unmunch.py
Created April 6, 2024 00:20 — forked from zverok/unmunch.py
"Unmunch" (linearize) word list from Hunspell dictionary with the help of Spylls library
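As a toy illustration of what "unmunching" a stem-plus-flags entry means, the sketch below expands one dictionary entry with a made-up suffix table. It does not use Spylls, and both `TOY_SUFFIXES` and `toy_unmunch` are hypothetical names. Real Hunspell `.aff` rules also carry strip strings and conditions, which are ignored here.

```python
# Hypothetical, heavily simplified suffix table keyed by flag letter.
# This is NOT parsed from a real .aff file; it only mimics the idea.
TOY_SUFFIXES = {
    "J": "ings",
    "S": "s",
    "M": "'s",
    "D": "ed",
    "R": "er",
    "Z": "ers",
    "G": "ing",
}


def toy_unmunch(entry):
    """Expand a 'stem/FLAGS' entry into the stem plus one form per known flag."""
    stem, _, flags = entry.partition("/")
    forms = [stem]
    for flag in flags:
        suffix = TOY_SUFFIXES.get(flag)
        if suffix is not None:  # unknown flags are silently skipped
            forms.append(stem + suffix)
    return forms


if __name__ == "__main__":
    for form in toy_unmunch("spell/JSMDRZG"):
        print(form)
```

A real unmunch has to apply each affix rule's conditions and also combine prefixes with suffixes, which is why the gist delegates the work to a full Hunspell implementation.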
# This is "unmunching" script for Hunspell dictionaries, based on Spylls (full Python port of Hunspell):
# https://github.com/zverok/spylls
#
# "Unmunching" (Hunspell's term) is a process of turning affix-compressed dictionary into plain list
# of all language's words. E.g. for English, in the dictionary we have "spell/JSMDRZG" (stem + flags
# declaring what suffixes and prefixes it might have), and we can run this script:
#
# python unmunch.py path/to/en_US spell
#
# Which will produce this list:
@anezih
anezih / TrCase.md
Last active January 4, 2024 15:36
Python class for changing letter case in Turkish text

A Python class that correctly converts Turkish text to all lowercase, all uppercase, and title case (the first letter of each word capitalized). The class accounts for Python's mishandling of the dotted capital İ and the dotless lowercase ı. When converting to title case, it also keeps "ve" ("and") in lowercase and capitalizes the letter that follows a hyphen, which matters when converting organization names.
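A minimal sketch of the approach described above, not the gist's actual class: the function names `tr_lower`, `tr_upper`, and `tr_title` are hypothetical. The idea is to remap the four problematic letters (I, İ, ı, i) before calling Python's built-in `str.lower()` and `str.upper()`, which otherwise follow the locale-independent Unicode rules.

```python
# Turkish-specific pairs that must be remapped before the generic conversion.
_TO_LOWER = str.maketrans({"I": "ı", "İ": "i"})
_TO_UPPER = str.maketrans({"i": "İ", "ı": "I"})


def tr_lower(text):
    """Lowercase with I -> ı and İ -> i."""
    return text.translate(_TO_LOWER).lower()


def tr_upper(text):
    """Uppercase with i -> İ and ı -> I."""
    return text.translate(_TO_UPPER).upper()


def tr_title(text):
    """Title case: 've' stays lowercase, letters after a hyphen are capitalized."""
    words = []
    for word in text.split(" "):
        lowered = tr_lower(word)
        if lowered == "ve":
            words.append(lowered)
            continue
        # Capitalize the first letter of every hyphen-separated part.
        parts = [tr_upper(part[:1]) + part[1:] for part in lowered.split("-")]
        words.append("-".join(parts))
    return " ".join(words)


if __name__ == "__main__":
    print(tr_lower("IŞIK İSTANBUL"))
    print(tr_upper("ışık istanbul"))
    print(tr_title("ISPARTA ve İZMİR-IĞDIR"))
```

Remapping first matters because plain `"İ".lower()` yields an `i` followed by a combining dot, and plain `"I".lower()` yields `i` rather than `ı`.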

@anezih
anezih / words.cs
Created April 24, 2023 17:34 — forked from aarondandy/words.cs
Make words
using System.Collections.Generic;
using System.Linq;
using Hunspell;
namespace ConsoleApplication5
{
class Program
{
static void Main(string[] args)
{
function unxor {
param (
[string]$text
)
$enc = [System.Text.Encoding]::BigEndianUnicode
$key = $enc.GetBytes("ş")
$bytes = $enc.GetBytes($text)
$cycle = 0
for ($i = 0; $i -lt $bytes.Length; $i++) {
import os
import re
import sys
import zlib
from struct import unpack
OFFSET = 0x6B4
"""
@anezih
anezih / dictzip.py
Last active April 23, 2023 17:31
Read dictzip files without having to decompress them first. Useful for StarDict .dict.dz files.
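A rough sketch of how random access into a dictzip file can work, under stated assumptions and not taken from the gist itself: the gzip extra field is assumed to hold an `RA` subfield containing a version, the uncompressed chunk length, the chunk count, and one 2-byte compressed size per chunk, with each chunk flushed so that it can be inflated on its own. The names `read_ra_table` and `read_range` are hypothetical.

```python
import struct
import zlib


def read_ra_table(f):
    """Parse the gzip header; return (chunk_len, compressed_sizes, data_offset)."""
    magic, _method, flags = struct.unpack("<HBB", f.read(4))
    if magic != 0x8B1F or not flags & 0x04:  # 0x04 = FEXTRA
        raise ValueError("not a gzip file with an extra field")
    f.read(6)  # skip MTIME (4 bytes), XFL (1), OS (1)
    (xlen,) = struct.unpack("<H", f.read(2))
    extra = f.read(xlen)
    table = None
    pos = 0
    while pos + 4 <= len(extra):  # walk the subfields: id (2), length (2), body
        sub_id = extra[pos:pos + 2]
        (sub_len,) = struct.unpack_from("<H", extra, pos + 2)
        body = extra[pos + 4:pos + 4 + sub_len]
        if sub_id == b"RA":
            _version, chunk_len, count = struct.unpack_from("<HHH", body, 0)
            table = (chunk_len, struct.unpack_from(f"<{count}H", body, 6))
        pos += 4 + sub_len
    if table is None:
        raise ValueError("no RA subfield found")
    for bit in (0x08, 0x10):  # optional null-terminated FNAME, FCOMMENT
        if flags & bit:
            while f.read(1) not in (b"\x00", b""):
                pass
    if flags & 0x02:  # optional header CRC16
        f.read(2)
    return table[0], table[1], f.tell()


def read_range(path, start, size):
    """Return `size` uncompressed bytes from offset `start`, inflating only the needed chunks."""
    with open(path, "rb") as f:
        chunk_len, sizes, data_offset = read_ra_table(f)
        first = start // chunk_len
        last = min((start + size - 1) // chunk_len, len(sizes) - 1)
        f.seek(data_offset + sum(sizes[:first]))
        out = bytearray()
        for idx in range(first, last + 1):
            raw = f.read(sizes[idx])
            # Each chunk is a raw deflate segment, so use negative wbits.
            out += zlib.decompressobj(-zlib.MAX_WBITS).decompress(raw)
        skip = start - first * chunk_len
        return bytes(out[skip:skip + size])
```

For a StarDict dictionary, `start` and `size` would come from the matching `.idx` entry, so a lookup only inflates the one or two chunks that cover the article.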
# https://github.com/anezih/StarDictNet/blob/main/StarDictNet/DictZip.cs
# https://framagit.org/tuxor1337/dictzip.js/-/blob/main/dictzip_sync.js
import os
import zlib
from dataclasses import dataclass
from struct import unpack
class SUBFIELD:
@anezih
anezih / zemberek_python_word_gen.py
Created February 20, 2023 18:48
Word generation with zemberek-python
import pprint
from zemberek.morphology import TurkishMorphology
from zemberek.morphology.lexicon import RootLexicon
from zemberek.morphology.morphotactics.turkish_morphotactics import get_morpheme_map
MORPHOLOGY = TurkishMorphology.create_with_defaults()
def generate_noun(word, morphology):
morpheme_map = get_morpheme_map()
@anezih
anezih / HunspellWordForms.csproj
Last active March 27, 2023 01:10
[OBSOLETE: Use https://github.com/anezih/HunspellWordForms] Hunspell's wordforms and unmunch, implemented in a single class library. Put the WeCantSpell.Hunspell and compiled WordForms.cs DLLs in the same directory as Unmunch.ps1.
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net7.0</TargetFramework>
<Version>1.4.0</Version>
</PropertyGroup>
<ItemGroup Condition=" '$(TargetFramework)' == 'net7.0' ">
<PackageReference Include="WeCantSpell.Hunspell" Version="4.0.0" />
</ItemGroup>