Miguel Albrecht zapalote

# Data source: https://storage.googleapis.com/books/ngrams/books/datasetsv2.html
# extraction pattern: ngram TAB year TAB match_count TAB volume_count NEWLINE
# out: unique_ngram TAB sum(match_count) NEWLINE
import os, sys
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import freeze_support
import polars as pl
// decode credentials upon receiving them from store
function decodeCredentials(crd){
// decrypt it first
const dec = CryptoJS.AES.decrypt(crd, saltCredentials).toString(CryptoJS.enc.Utf8);
// extract the creds length and pepper step
const len = dec.charCodeAt(0) - 96;
const step = dec.charCodeAt(1) - 96;
let i = 0, j = 2, d = [];
// extract the pepper from the salt
zapalote / encodeCredentials.js
Created May 1, 2021 14:33
Obfuscate and encrypt API credentials before storing
// used to obfuscate and encrypt the credentials
const saltCredentials = "jf02heg9u64a{%m<83#@;Pxrjg17uyr#@&*%^Y";
// encode credentials before storing
function encodeCredentials(crds){
// json object expected e.g. {'api-id':'K0xf56g', 'pwd':'Some.Pa$$w0rd'}
const crd = JSON.stringify(crds);
const len = crd.length;
// this constraint is due to storing the length in one byte
if (len > 159) return null;
zapalote / encodeCredentials.html
Last active October 4, 2021 11:56
Credential obfuscation and encryption for storing them in a database
<script src="https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.0.0/crypto-js.min.js"></script>
<script>
// used to obfuscate and encrypt the credentials
const saltCredentials = "jf02heg9u64a{%m<83#@;Pxrjg17uyr#@&*%^Y";
// encode credentials before storing
function encodeCredentials(crds){
// json object expected e.g. {'api-id':'K0xf56g', 'pwd':'Some.Pa$$w0rd'}
const crd = JSON.stringify(crds);
zapalote / extract-gbooks-terms.py
Last active April 2, 2024 11:31
Example of multithreading and memory-mapped file processing.
# extraction pattern: ngram TAB year TAB match_count TAB volume_count NEWLINE
# out: unique_ngram TAB sum(match_count) NEWLINE
import re
import os, sys, mmap
from pathlib import Path
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor
abv = re.compile(r'^(([A-Z]\.){1,})(_|[^\w])') # A.B.C.
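The preview above is cut off, so the following is only a minimal sketch of the idea named in the gist description, not the author's implementation. It assumes the input follows the stated pattern (ngram TAB year TAB match_count TAB volume_count NEWLINE), memory-maps the file, splits it into byte ranges that end on a newline, and sums match_count per ngram in a thread pool. The function names `chunk_bounds`, `count_chunk`, and `sum_match_counts`, and the `n_chunks` parameter, are hypothetical.

```python
import mmap
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(mm, n_chunks):
    """Split the mapped file into byte ranges that each end on a newline."""
    size = len(mm)
    step = max(1, size // n_chunks)
    bounds, start = [], 0
    while start < size:
        end = min(size, start + step)
        if end < size:
            nl = mm.find(b"\n", end)      # extend to the end of the current line
            end = size if nl == -1 else nl + 1
        bounds.append((start, end))
        start = end
    return bounds


def count_chunk(mm, start, end):
    """Sum match_count per ngram for the lines in mm[start:end]."""
    totals = Counter()
    for line in mm[start:end].split(b"\n"):
        fields = line.split(b"\t")
        if len(fields) != 4:              # skip empty or malformed lines
            continue
        ngram, _year, match_count, _volumes = fields
        totals[ngram.decode("utf-8")] += int(match_count)
    return totals


def sum_match_counts(path, n_chunks=4):
    """Return {ngram: total match_count} for one tab-separated ngram file."""
    if os.path.getsize(path) == 0:        # mmap cannot map an empty file
        return Counter()
    totals = Counter()
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        bounds = chunk_bounds(mm, n_chunks)
        with ThreadPoolExecutor(max_workers=n_chunks) as pool:
            # Each worker reads its own slice of the shared read-only mapping.
            for partial in pool.map(lambda b: count_chunk(mm, *b), bounds):
                totals.update(partial)
    return totals
```

Writing the result in the stated output format (unique_ngram TAB sum(match_count) NEWLINE) would then be a loop over `totals.items()`. Because the parsing is CPU-bound, threads in CPython mainly help by overlapping I/O; the polars gist above imports ProcessPoolExecutor instead.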