- Download lots of fonts (e.g., .ttf files)
git clone https://github.com/tesseract-ocr/tesstrain/
git clone https://github.com/tesseract-ocr/langdata_lstm
- Install Tesseract
- Generate training data:
cd src
python -m tesstrain \
  --langdata_dir /path/to/langdata_lstm \
  --linedata_only \
import enum


class EnumWithDisplayName(enum.Enum):
    def __new__(cls, value, name=None):
        if not hasattr(cls, "_display_names"):
            cls._display_names = {}
        if name is not None:
            if value in cls._display_names:
                raise NotImplementedError(f"'{cls.__name__}' values must be unique")
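
The preview above is cut off, so the rest of the author's approach is unknown. As a minimal sketch of the general idea (an enum whose members carry a human-readable label), one option is to set an extra attribute inside `__new__`. The names `DisplayNameEnum`, `display_name`, and `Color` are hypothetical and do not come from the original, which stores labels in a class-level dictionary instead.

```python
import enum


class DisplayNameEnum(enum.Enum):
    """Hypothetical base class: members are declared as (value, display_name)."""

    def __new__(cls, value, display_name=None):
        obj = object.__new__(cls)
        # _value_ is what Enum uses for lookups such as Color(1).
        obj._value_ = value
        # Fall back to the value's string form when no label is given.
        obj.display_name = display_name if display_name is not None else str(value)
        return obj


class Color(DisplayNameEnum):
    RED = (1, "Bright red")
    GREEN = (2, "Forest green")
    BLUE = 3  # no label supplied


print(Color.RED.value, Color.RED.display_name)
print(Color(2), Color.BLUE.display_name)
```

The attribute is called `display_name` rather than `name` because `name` is already used by `enum.Enum` for the member's identifier.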
I'm going to present you with a piece of text. Please classify it according to the classes outlined after the text.
The text:
"""
When should you choose MongoDB over a relational database management system (RDBMS) like MySQL?
By Dimitri Fague / 2024-05-23 / Databases, DBaaS, MongoDB, OVHcloud, Public Cloud
Of all the non-relational database engines (NoSQL) that have emerged in the last decade, MongoDB is without a doubt the most widely used. Source-available, powerful, flexible and scalable, MongoDB covers a wide range of use cases. Many, including startups, choose it to ensure they are not limited in their technological choices, so they can scale and adapt to different use cases. The possibility of switching from MySQL to MongoDB might come up when updating or revamping an existing app. So, let’s see when this switch might be relevant, and why using the MongoDB service managed by OVHcloud could be the ideal option.
The flexibility of the NoSQL data model
curl -L https://raw.githubusercontent.com/redacted/XKCD-password-generator/master/xkcdpass/static/eff-long \
  | sort -R \
  | head -n 5 \
  | tr '\n' -
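
The pipeline above shuffles a word list, keeps five lines, and joins them with hyphens. A rough Python sketch of the same idea follows, assuming the words are already in memory. The function name `make_passphrase` and the `demo_words` list are hypothetical stand-ins, not part of the original.

```python
import secrets


def make_passphrase(words, count=5, sep="-"):
    """Join `count` distinct words drawn at random from `words` with `sep`."""
    words = list(words)
    if count > len(words):
        raise ValueError("not enough words to choose from")
    # SystemRandom draws from the OS entropy source; sample() never repeats
    # an element, much like taking the first lines of a shuffled file.
    return sep.join(secrets.SystemRandom().sample(words, count))


# Hypothetical stand-in word list; a real one would be read from a file.
demo_words = ["apple", "brick", "cloud", "delta", "ember", "frost", "grape"]
print(make_passphrase(demo_words))
```

Unlike `tr '\n' -`, which also turns the final newline into a hyphen, this sketch leaves no trailing separator.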
-- Count number of rows in a table quickly (without a full table/index scan).
-- Usage:
--   sp_count 'mydb.dbo.mytable'   Get the row count of the given table.
--   sp_count 'dbo.mytable'        Get the row count of the given table from the current database.
--   sp_count                      Get a list of tables and row counts in the current database.
USE [master]
GO
DROP PROCEDURE IF EXISTS [dbo].[sp_count]
from pathlib import Path

def unload_to_parquet(query: str, target_dir: Path, conn, stage_name: str = "unload_stage"):
    conn.execute(f"CREATE TEMP STAGE {stage_name}")
    # Unload the query result into the stage as Parquet files.
    conn.execute(f"COPY INTO @{stage_name} FROM ({query}) file_format=(type='parquet') header=true")
    target_dir.mkdir(parents=True)
    # Download the staged files into the local target directory.
    conn.execute(f"GET @{stage_name} file://{str(target_dir)}")
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype
from pandas.core.dtypes.base import ExtensionDtype


def shrink_dtype(series: pd.Series) -> pd.Series:
    smallest_dtype = get_smallest_dtype(series)
    if smallest_dtype == series.dtype:
        return series
import json
import sqlite3

repodata = json.load(open("497deca9.json"))
COLS = 'filename, build, build_number, depends, license, license_family, md5, name, sha256, size, subdir, timestamp, version'.split(', ')
db = sqlite3.connect("497deca9.sqlite")
db.execute("create table repodata ({}, primary key (filename))".format(','.join(COLS)))
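
The preview stops after the table is created. A possible next step, sketched here under assumptions, is to insert one row per package. This assumes the JSON has a top-level `"packages"` mapping from filename to a record whose keys match `COLS`; the original does not show the layout. The `sample` data and the `load_packages` helper are hypothetical, and an in-memory database stands in for the real file.

```python
import json
import sqlite3

COLS = ("filename, build, build_number, depends, license, license_family, "
        "md5, name, sha256, size, subdir, timestamp, version").split(", ")

# Hypothetical sample mimicking the assumed layout: {"packages": {filename: record}}.
sample = {
    "packages": {
        "demo-1.0-0.tar.bz2": {
            "build": "0", "build_number": 0, "depends": ["python >=3.8"],
            "name": "demo", "version": "1.0", "size": 1234,
        }
    }
}


def load_packages(db, repodata):
    """Create the table and insert one row per package; return the row count."""
    db.execute("create table repodata ({}, primary key (filename))".format(",".join(COLS)))
    rows = []
    for filename, record in repodata["packages"].items():
        # Missing keys become NULL.
        row = {col: record.get(col) for col in COLS}
        row["filename"] = filename
        # SQLite cannot bind a list, so store the dependencies as JSON text.
        row["depends"] = json.dumps(record.get("depends", []))
        rows.append(row)
    placeholders = ",".join(":" + col for col in COLS)
    db.executemany("insert into repodata values ({})".format(placeholders), rows)
    return len(rows)


db = sqlite3.connect(":memory:")
load_packages(db, sample)
print(db.execute("select name, version, depends from repodata").fetchall())
```

Named placeholders keep each value tied to its column name, so the dictionary order does not matter.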
import textwrap
from dataclasses import dataclass


@dataclass
class Pattern:
    pattern: str
    prematchers: list[str]
#!/bin/bash
set -euo pipefail

if [ $# -lt 2 ]; then
    echo "Usage: $0 SCHEDULE PROG [ARGS]..." >&2
    echo "SCHEDULE is used in 'date -d <SCHEDULE>'." >&2
    echo "Example: $0 '1 hour' myprog --arg" >&2
    exit 1
fi