macOS Live Text has a very good quality/speed tradeoff.
Compared to Tesseract, it has much higher quality and is up to 3x as fast.
import concurrent.futures.thread as _thread_impl
import threading
import time
import weakref
from concurrent.futures import Future

class WorkStealThreadPoolExecutor(_thread_impl.ThreadPoolExecutor):
    """A ThreadPoolExecutor that supports work stealing.
import enum

class EnumWithDisplayName(enum.Enum):
    def __new__(cls, value, name=None):
        if not hasattr(cls, "_value_to_display_name"):
            cls._value_to_display_name = {}
            cls._display_name_to_value = {}
        if name is not None:
            if value in cls._value_to_display_name:
.ttf files)
git clone https://github.com/tesseract-ocr/tesstrain/
git clone https://github.com/tesseract-ocr/langdata_lstm
cd src
python -m tesstrain \
--langdata_dir /path/to/langdata_lstm \
--linedata_only \
I'm going to present you with a piece of text. Please classify it according to the classes outlined after the text.
The text:
"""
When should you choose MongoDB over a relational database management system (RDBMS) like MySQL?
By Dimitri Fague / 2024-05-23 / Databases, DBaaS, MongoDB, OVHcloud, Public Cloud
Of all the non-relational database engines (NoSQL) that have emerged in the last decade, MongoDB is without a doubt the most widely used. Source-available, powerful, flexible and scalable, MongoDB covers a wide range of use cases. Many, including startups, choose it to ensure they are not limited in their technological choices, so they can scale and adapt to different use cases. The possibility of switching from MySQL to MongoDB might come up when updating or revamping an existing app. So, let’s see when this switch might be relevant, and why using the MongoDB service managed by OVHcloud could be the ideal option.
The flexibility of the NoSQL data model
curl -L https://raw.githubusercontent.com/redacted/XKCD-password-generator/master/xkcdpass/static/eff-long \
  | sort -R \
  | head -n 5 \
  | tr '\n' -
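The shell pipeline above shuffles a word list, keeps five lines, and joins them with hyphens. A rough Python equivalent is sketched below, assuming the word list has already been loaded into a sequence of strings. The function name `make_passphrase` and the sample words are hypothetical, and unlike the `tr` version this sketch does not leave a trailing hyphen.

```python
import secrets


def make_passphrase(words, count=5, sep="-"):
    """Pick `count` distinct entries at random and join them with `sep`."""
    rng = secrets.SystemRandom()  # randomness backed by the operating system
    return sep.join(rng.sample(list(words), count))


if __name__ == "__main__":
    # Hypothetical stand-in for the downloaded word list.
    sample_words = ["apple", "brick", "cloud", "delta", "ember", "flint", "grove"]
    print(make_passphrase(sample_words))
```

Using `secrets.SystemRandom` rather than the default `random` module is a deliberate choice here, since the output is meant to serve as a password.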
-- Count number of rows in a table quickly (without a full table/index scan).
-- Usage:
--   sp_count 'mydb.dbo.mytable'   Get the row count of the given table.
--   sp_count 'dbo.mytable'        Get the row count of the given table from the current database.
--   sp_count                      Get a list of tables and row counts in the current database.
USE [master]
GO
DROP PROCEDURE IF EXISTS [dbo].[sp_count]
from pathlib import Path

def unload_to_parquet(query: str, target_dir: Path, conn, stage_name: str = "unload_stage"):
    conn.execute(f"CREATE TEMP STAGE {stage_name}")
    conn.execute(f"COPY INTO @{stage_name} FROM ({query}) file_format=(type='parquet') header=true")
    target_dir.mkdir(parents=True)
    conn.execute(f"GET @{stage_name} file://{str(target_dir)}")
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype
from pandas.core.dtypes.base import ExtensionDtype

def shrink_dtype(series: pd.Series) -> pd.Series:
    smallest_dtype = get_smallest_dtype(series)
    if smallest_dtype == series.dtype:
        return series
import json
import sqlite3

repodata = json.load(open("497deca9.json"))
COLS = 'filename, build, build_number, depends, license, license_family, md5, name, sha256, size, subdir, timestamp, version'.split(', ')
db = sqlite3.connect("497deca9.sqlite")
db.execute("create table repodata ({}, primary key (filename))".format(','.join(COLS)))
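The snippet above stops after creating the table. One way the rows might then be loaded is sketched below, assuming the JSON follows the usual conda `repodata.json` layout, where a `packages` mapping goes from filename to a metadata record. The helper `insert_packages`, the in-memory database, and the demo record are all hypothetical, made up for illustration and not taken from the original file.

```python
import json
import sqlite3

COLS = ("filename, build, build_number, depends, license, license_family, "
        "md5, name, sha256, size, subdir, timestamp, version").split(", ")


def insert_packages(db, packages):
    """Insert one row per package record; return the number of rows written."""
    placeholders = ",".join("?" for _ in COLS)
    rows = []
    for filename, record in packages.items():
        values = {**record, "filename": filename}
        # Lists such as `depends` are stored as JSON text; missing keys become NULL.
        rows.append(tuple(
            json.dumps(values[c]) if isinstance(values.get(c), (list, dict)) else values.get(c)
            for c in COLS
        ))
    db.executemany(
        "insert or replace into repodata values ({})".format(placeholders), rows
    )
    db.commit()
    return len(rows)


# In-memory database and a made-up record, so the sketch runs on its own.
db = sqlite3.connect(":memory:")
db.execute("create table repodata ({}, primary key (filename))".format(",".join(COLS)))
demo_packages = {
    "demo-1.0-0.tar.bz2": {
        "name": "demo", "version": "1.0", "build": "0",
        "build_number": 0, "depends": ["python"], "size": 123,
    }
}
inserted = insert_packages(db, demo_packages)
```

With the real file, the call would presumably be `insert_packages(db, repodata["packages"])`, provided that key is present.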