Jonas Haag jonashaag

## enum_with_label.py
import enum


class EnumWithDisplayName(enum.Enum):
    def __new__(cls, value, name=None):
        if not hasattr(cls, "_display_names"):
            cls._display_names = {}
        if name is not None:
            if value in cls._display_names:
                raise NotImplementedError(f"'{cls.__name__}' values must be unique")
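
The snippet above is cut off before the member is created, so the rest of the author's implementation is not shown here. As a minimal sketch of the general technique only (not the author's code), an `Enum` subclass can accept an optional second tuple element in `__new__` and store it on the member. The class name `Color` and the attribute name `display_name` are assumptions made for illustration.

```python
import enum


class Color(enum.Enum):
    """Hypothetical enum whose members carry an optional display name."""

    def __new__(cls, value, display_name=None):
        # Enum members are plain objects; _value_ is what Color(value) looks up.
        obj = object.__new__(cls)
        obj._value_ = value
        # Fall back to the value itself when no display name is given.
        obj.display_name = display_name if display_name is not None else str(value)
        return obj

    RED = ("r", "Bright red")  # value plus display name
    GREEN = "g"                # value only


print(Color.RED.value, Color.RED.display_name)
print(Color("r") is Color.RED)
```

Unlike the class-level `_display_names` dictionary in the original, this sketch keeps the label on each member and does not check labels for uniqueness.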

## tesseract-finetune.md

      
Tesseract LSTM fine-tuning how-to (last active June 9, 2024 18:19)
Download lots of fonts (e.g., .ttf files)
git clone https://github.com/tesseract-ocr/tesstrain/
git clone https://github.com/tesseract-ocr/langdata_lstm
Install Tesseract
Generate training data:
cd src
python -m tesstrain \
  --langdata_dir /path/to/langdata_lstm \
  --linedata_only \


## prompt.txt
I'm going to present you with a piece of text. Please classify it according to the classes outlined after the text.

The text:

"""
When should you choose MongoDB over a relational database management system (RDBMS) like MySQL?
By Dimitri Fague / 2024-05-23 / Databases, DBaaS, MongoDB, OVHcloud, Public Cloud
Of all the non-relational database engines (NoSQL) that have emerged in the last decade, MongoDB is without a doubt the most widely used. Source-available, powerful, flexible and scalable, MongoDB covers a wide range of use cases. Many, including startups, choose it to ensure they are not limited in their technological choices, so they can scale and adapt to different use cases. The possibility of switching from MySQL to MongoDB might come up when updating or revamping an existing app. So, let’s see when this switch might be relevant, and why using the MongoDB service managed by OVHcloud could be the ideal option.

The flexibility of the NoSQL data model

## xkcdpass.sh
curl -L https://raw.githubusercontent.com/redacted/XKCD-password-generator/master/xkcdpass/static/eff-long \
  | sort -R \
  | head -n 5 \
  | tr '\n' -
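
For comparison, the same idea can be sketched in Python. This is an illustrative variant, not a translation of the pipeline: it assumes the word list has already been read into a list of strings, it draws words with `secrets.choice` (so a word can repeat, whereas the shuffle-then-`head` pipeline never repeats one), and it joins with `-` so there is no trailing hyphen like the one `tr '\n' -` leaves. The function name `make_passphrase` and the sample words are hypothetical.

```python
import secrets


def make_passphrase(words, n_words=5, sep="-"):
    """Join n_words randomly chosen entries of words with sep."""
    # Drop blank lines and surrounding whitespace from the word list.
    candidates = [w.strip() for w in words if w.strip()]
    if not candidates:
        raise ValueError("word list is empty")
    # secrets.choice draws from the OS random source; repeats are possible.
    return sep.join(secrets.choice(candidates) for _ in range(n_words))


# Tiny stand-in list; in practice, pass the lines of the downloaded word list.
sample_words = ["correct", "horse", "battery", "staple", "cloud", "river"]
print(make_passphrase(sample_words))
```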

## sp_count.sql
-- Count number of rows in a table quickly (without a full table/index scan).
-- Usage:
-- sp_count 'mydb.dbo.mytable'    Get the row count of the given table.
-- sp_count 'dbo.mytable'         Get the row count of the given table from the current database.
-- sp_count                       Get a list of tables and row counts in the current database.

USE [master]
GO

DROP PROCEDURE IF EXISTS [dbo].[sp_count]

## snowflake_unload_parquet.py
from pathlib import Path


def unload_to_parquet(query: str, target_dir: Path, conn, stage_name: str = "unload_stage"):
    # Stage the query result as Parquet files, then download them into target_dir.
    conn.execute(f"CREATE TEMP STAGE {stage_name}")
    conn.execute(f"COPY INTO @{stage_name} FROM ({query}) file_format=(type='parquet') header=true")
    target_dir.mkdir(parents=True)
    conn.execute(f"GET @{stage_name} file://{str(target_dir)}")
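
One way to see which statements the helper issues, without a Snowflake account, is to pass it a stub connection. The sketch below redefines the helper so that it is self-contained; `RecordingConn` is a hypothetical stand-in that only records the SQL it receives, and the query is a placeholder.

```python
import tempfile
from pathlib import Path


class RecordingConn:
    """Hypothetical stand-in for a connection: records SQL instead of running it."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def unload_to_parquet(query, target_dir, conn, stage_name="unload_stage"):
    # Same three statements as the helper above, with the stage name spelled consistently.
    conn.execute(f"CREATE TEMP STAGE {stage_name}")
    conn.execute(f"COPY INTO @{stage_name} FROM ({query}) file_format=(type='parquet') header=true")
    target_dir.mkdir(parents=True)
    conn.execute(f"GET @{stage_name} file://{target_dir}")


conn = RecordingConn()
with tempfile.TemporaryDirectory() as tmp:
    # The target directory must not exist yet, because mkdir is called without exist_ok.
    unload_to_parquet("SELECT 1 AS x", Path(tmp) / "out", conn)

for sql in conn.statements:
    print(sql)
```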

## pd_shrink_dtypes.py
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype
from pandas.core.dtypes.base import ExtensionDtype


def shrink_dtype(series: pd.Series) -> pd.Series:
    smallest_dtype = get_smallest_dtype(series)
    if smallest_dtype == series.dtype:
        return series

## makedb.py
import json
import sqlite3

repodata = json.load(open("497deca9.json"))

COLS = 'filename, build, build_number, depends, license, license_family, md5, name, sha256, size, subdir, timestamp, version'.split(', ')

db = sqlite3.connect("497deca9.sqlite")
db.execute("create table repodata ({}, primary key (filename))".format(','.join(COLS)))
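
The script is cut off after the table is created. A minimal sketch of how the rows might then be loaded follows. It rests on assumptions: that the JSON has a top-level `packages` mapping from filename to a metadata dict (the usual conda `repodata.json` layout), and that list-valued fields such as `depends` are stored as JSON text. The function name `load_repodata` and the sample record are hypothetical.

```python
import json
import sqlite3

COLS = ("filename, build, build_number, depends, license, license_family, "
        "md5, name, sha256, size, subdir, timestamp, version").split(", ")


def load_repodata(db, repodata):
    """Create the repodata table and insert one row per package record."""
    db.execute("create table repodata ({}, primary key (filename))".format(",".join(COLS)))
    placeholders = ",".join("?" for _ in COLS)
    rows = []
    # Assumption: packages live under a top-level "packages" key, keyed by filename.
    for filename, meta in repodata.get("packages", {}).items():
        record = dict(meta, filename=filename)
        row = []
        for col in COLS:
            value = record.get(col)  # missing fields become NULL
            if isinstance(value, (list, dict)):
                value = json.dumps(value)  # SQLite cannot bind lists or dicts directly
            row.append(value)
        rows.append(row)
    db.executemany("insert into repodata values ({})".format(placeholders), rows)
    db.commit()


# Made-up record for illustration; an in-memory database keeps the example side-effect free.
sample = {"packages": {"foo-1.0-0.tar.bz2": {
    "name": "foo", "version": "1.0", "build": "0", "build_number": 0,
    "depends": ["python >=3.8"], "subdir": "linux-64"}}}
db = sqlite3.connect(":memory:")
load_repodata(db, sample)
print(db.execute("select filename, name, version, depends from repodata").fetchall())
```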

## compiler.py
import textwrap
from dataclasses import dataclass


@dataclass
class Pattern:
    pattern: str
    prematchers: list[str]


## restarter.sh
#!/bin/bash

set -euo pipefail

if [ $# -lt 2 ]; then
  echo "Usage: $0 SCHEDULE PROG [ARGS]..." >&2
  echo "SCHEDULE is used in 'date -d <SCHEDULE>'." >&2
  echo "Example: $0 '1 hour' myprog --arg" >&2
  exit 1
fi