Jonas Haag jonashaag

@jonashaag
jonashaag / enum_with_label.py
Created July 1, 2024 07:29
Python Enum with label / verbose name / description
import enum


class EnumWithDisplayName(enum.Enum):
    def __new__(cls, value, name=None):
        if not hasattr(cls, "_display_names"):
            cls._display_names = {}
        if name is not None:
            if value in cls._display_names:
                raise NotImplementedError(f"'{cls.__name__}' values must be unique")
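
The preview above is cut off, so the rest of the gist's approach is not visible here. As a rough, self-contained sketch of the general idea (an enum member that carries a human-readable label next to its value), something like the following could work. The names `LabeledEnum`, `label`, and `Color` are hypothetical and do not come from the gist, and this variant stores the label on each member rather than in a class-level dictionary.

```python
import enum


class LabeledEnum(enum.Enum):
    """Enum base class whose members carry an optional human-readable label."""

    def __new__(cls, value, label=None):
        # Enum unpacks a tuple assignment such as ("r", "Bright red")
        # into the positional arguments of __new__.
        obj = object.__new__(cls)
        obj._value_ = value
        # Fall back to the raw value when no label is given.
        obj.label = label if label is not None else str(value)
        return obj


# Hypothetical example enum, for illustration only.
class Color(LabeledEnum):
    RED = ("r", "Bright red")
    GREEN = ("g", "Leafy green")
    BLUE = "b"
```

With this sketch, `Color.RED.value` stays the plain value `"r"`, so lookups like `Color("g")` keep working, while `Color.RED.label` holds the display text.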
@jonashaag
jonashaag / tesseract-finetune.md
Last active June 9, 2024 18:19
Tesseract LSTM fine-tuning how-to
  1. Download lots of fonts (e.g., .ttf files)
  2. git clone https://github.com/tesseract-ocr/tesstrain/
  3. git clone https://github.com/tesseract-ocr/langdata_lstm
  4. Install Tesseract
  5. Generate training data:
    cd src
    python -m tesstrain \
      --langdata_dir /path/to/langdata_lstm \
      --linedata_only \
    
@jonashaag
jonashaag / prompt.txt
Last active June 3, 2024 11:12
OVH AI Endpoint failure
I'm going to present you with a piece of text. Please classify it according to the classes outlined after the text.
The text:
"""
When should you choose MongoDB over a relational database management system (RDBMS) like MySQL?
By Dimitri Fague / 2024-05-23 / Databases, DBaaS, MongoDB, OVHcloud, Public Cloud
Of all the non-relational database engines (NoSQL) that have emerged in the last decade, MongoDB is without a doubt the most widely used. Source-available, powerful, flexible and scalable, MongoDB covers a wide range of use cases. Many, including startups, choose it to ensure they are not limited in their technological choices, so they can scale and adapt to different use cases. The possibility of switching from MySQL to MongoDB might come up when updating or revamping an existing app. So, let’s see when this switch might be relevant, and why using the MongoDB service managed by OVHcloud could be the ideal option.
The flexibility of the NoSQL data model
@jonashaag
jonashaag / xkcdpass.sh
Created February 21, 2024 10:17
XKCD password
curl -L https://raw.githubusercontent.com/redacted/XKCD-password-generator/master/xkcdpass/static/eff-long \
| sort -R \
| head -n 5 \
| tr '\n' -
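
The pipeline above shuffles a word list, keeps five words, and joins them with dashes. A minimal Python sketch of the same selection step, assuming the word list is already loaded into memory, might look like this. The function name `xkcd_password` and the sample words are made up for illustration; unlike the shell version, this sketch does not download anything and does not leave a trailing separator.

```python
import random


def xkcd_password(words, count=5, sep="-"):
    """Pick `count` distinct words at random and join them with `sep`."""
    if len(words) < count:
        raise ValueError("word list is shorter than the requested word count")
    # SystemRandom draws from the operating system's entropy source.
    rng = random.SystemRandom()
    return sep.join(rng.sample(words, count))


# Hypothetical stand-in for the downloaded word list.
sample_words = ["correct", "horse", "battery", "staple", "orbit", "velvet"]
password = xkcd_password(sample_words)
```

Sampling without replacement mirrors `sort -R | head -n 5`, which can never return the same line twice.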
@jonashaag
jonashaag / sp_count.sql
Last active January 8, 2024 16:35
SQL Server quickly count number of rows in table
-- Count number of rows in a table quickly (without a full table/index scan).
-- Usage:
--   sp_count 'mydb.dbo.mytable'   Get the row count of the given table.
--   sp_count 'dbo.mytable'        Get the row count of the given table from the current database.
--   sp_count                      Get a list of tables and row counts in the current database.
USE [master]
GO
DROP PROCEDURE IF EXISTS [dbo].[sp_count]
@jonashaag
jonashaag / snowflake_unload_parquet.py
Created October 8, 2023 18:22
Snowflake Connector Python download table or query as Parquet
from pathlib import Path


def unload_to_parquet(query: str, target_dir: Path, conn, stage_name: str = "unload_stage"):
    # Unload the query result into a temporary stage as Parquet, then download the files.
    conn.execute(f"CREATE TEMP STAGE {stage_name}")
    conn.execute(f"COPY INTO @{stage_name} FROM ({query}) file_format=(type='parquet') header=true")
    target_dir.mkdir(parents=True)
    conn.execute(f"GET @{stage_name} file://{str(target_dir)}")
@jonashaag
jonashaag / pd_shrink_dtypes.py
Created September 11, 2023 12:09
Pandas shrink dtypes
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype
from pandas.core.dtypes.base import ExtensionDtype


def shrink_dtype(series: pd.Series) -> pd.Series:
    smallest_dtype = get_smallest_dtype(series)
    if smallest_dtype == series.dtype:
        return series
import json
import sqlite3
repodata = json.load(open("497deca9.json"))
COLS = 'filename, build, build_number, depends, license, license_family, md5, name, sha256, size, subdir, timestamp, version'.split(', ')
db = sqlite3.connect("497deca9.sqlite")
db.execute("create table repodata ({}, primary key (filename))".format(','.join(COLS)))
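
The snippet above stops after creating the table. One possible way to fill it is sketched below, under the assumption that the JSON file has a top-level mapping from filename to a record of package fields; that layout is not confirmed by the preview. The helper `load_packages` and the sample record are hypothetical, missing fields are stored as NULL, and list values are stored as JSON text. An in-memory database is used so the sketch runs without the original files.

```python
import json
import sqlite3

COLS = ("filename, build, build_number, depends, license, license_family, "
        "md5, name, sha256, size, subdir, timestamp, version").split(", ")


def load_packages(db, packages):
    """Insert one row per package record and return the number of rows."""
    placeholders = ",".join("?" for _ in COLS)
    rows = []
    for filename, record in packages.items():
        row = []
        for col in COLS:
            # Assumption: the filename is the mapping key, not a record field.
            value = filename if col == "filename" else record.get(col)
            if isinstance(value, (list, dict)):
                # SQLite has no list type, so store such values as JSON text.
                value = json.dumps(value)
            row.append(value)
        rows.append(row)
    db.executemany(f"insert into repodata values ({placeholders})", rows)
    db.commit()
    return len(rows)


# Made-up sample record, for illustration only.
sample = {
    "example-1.0-0.tar.bz2": {
        "name": "example",
        "version": "1.0",
        "build": "0",
        "build_number": 0,
        "depends": ["python"],
    }
}

db = sqlite3.connect(":memory:")
db.execute("create table repodata ({}, primary key (filename))".format(",".join(COLS)))
inserted = load_packages(db, sample)
```

Using `executemany` with `?` placeholders keeps the values out of the SQL string, which avoids quoting problems in fields such as license names.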
@jonashaag
jonashaag / compiler.py
Last active June 30, 2023 12:31
Cython prematcher compiler
import textwrap
from dataclasses import dataclass


@dataclass
class Pattern:
    pattern: str
    prematchers: list[str]
@jonashaag
jonashaag / restarter.sh
Created June 12, 2023 13:53
Bash regularly restart a program
#!/bin/bash
set -euo pipefail

if [ $# -lt 2 ]; then
  echo "Usage: $0 SCHEDULE PROG [ARGS]..." >&2
  echo "SCHEDULE is used in 'date -d <SCHEDULE>'." >&2
  echo "Example: $0 '1 hour' myprog --arg" >&2
  exit 1
fi