Loreto Parisi loretoparisi

With wildcard

{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "text": "*antonio*banderas*"
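A minimal Python sketch of how a query body shaped like the wildcard example could be assembled and serialized. The field name `text` and the pattern are taken from the snippet; the helper name `build_wildcard_query` is hypothetical, and nothing here contacts a cluster.

```python
import json


def build_wildcard_query(field, pattern):
    """Return a bool/must query body holding a single wildcard clause.

    Hypothetical helper for illustration; it only builds a dictionary.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"wildcard": {field: pattern}}
                ]
            }
        }
    }


body = build_wildcard_query("text", "*antonio*banderas*")
# Serialize to the JSON that would be sent as the search request body.
print(json.dumps(body, indent=2))
```

Building the body as a dictionary keeps the nesting balanced by construction, which is easy to get wrong when writing the JSON by hand.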
@loretoparisi
loretoparisi / trace_mem.py
Created May 6, 2021 12:49
Python Memory Alloc Trace
def trace_mem(nframe=6, top=8):
    '''
    naive memory trace
    '''
    import tracemalloc
    is_tracing = tracemalloc.is_tracing()
    if not is_tracing:
        # start tracing
        tracemalloc.start(nframe)
        return {}
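The preview above stops after the first call path. As a rough sketch only, here is one way a start-then-report pattern could continue using the standard `tracemalloc` module. The function name `trace_mem_sketch`, the returned dictionary layout, and the `size_kib` key are assumptions for illustration, not the gist's actual code.

```python
import tracemalloc


def trace_mem_sketch(nframe=6, top=8):
    """Start tracing on the first call; report the top allocations afterwards.

    Hypothetical continuation of the truncated snippet.
    """
    if not tracemalloc.is_tracing():
        # Keep up to `nframe` frames per allocation traceback.
        tracemalloc.start(nframe)
        return {}
    snapshot = tracemalloc.take_snapshot()
    # Group allocations by full traceback and keep the `top` largest groups.
    stats = snapshot.statistics("traceback")[:top]
    return {
        "top": [
            {
                "size_kib": stat.size / 1024,
                "count": stat.count,
                "traceback": stat.traceback.format(),
            }
            for stat in stats
        ]
    }


first = trace_mem_sketch()  # starts tracing, nothing to report yet
buffers = [bytearray(1024) for _ in range(1000)]  # allocate something to observe
report = trace_mem_sketch()
for entry in report["top"]:
    print(f"{entry['size_kib']:.1f} KiB in {entry['count']} blocks")
tracemalloc.stop()
```

Calling the function once early and again after the code under inspection gives a before/after view without any external dependency.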
@loretoparisi
loretoparisi / aws_s3_get_object_put_object_async.js
Last active July 6, 2022 18:12
NodeJS JavaScript AWS S3 getObject, putObject, deleteObject with async await and recursive approach for nested S3 folders
async function getObjectAsync(bucket, key) {
  try {
    const data = await s3
      .getObject({ Bucket: bucket, Key: key })
      .promise();
    var contents = data.Body.toString('utf-8');
    return contents;
  } catch (err) {
    console.log(err);
  }
@loretoparisi
loretoparisi / tokenizer_unicode.js
Last active October 15, 2020 11:31
Unicode aware Regex Tokenizer in JavaScript with token char offset begin and end
function aggressive_tokenizer(text) {
  // most punctuation
  text = text.replace(/[^\w\.\-\/\+\<\>,&]/g, " $& ");
  // commas if followed by space
  text = text.replace(/(,\s)/g, " $1");
  // single quotes if followed by a space
  text = text.replace(/('\s)/g, " $1");
  // single quotes if last char
  text = text.replace(/('$)/, " $1");
  text = text.replace(/(\s+[`'"‘])(\w+)\b(?!\2)/g, " $2");
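The gist title mentions begin and end character offsets, which the truncated preview does not reach. The following Python sketch shows one simple way to get Unicode-aware tokens with offsets. It is a swapped-in technique based on `re.finditer`, not the gist's replacement rules, and the names `TOKEN_RE` and `tokenize_with_offsets` are hypothetical.

```python
import re

# For str patterns in Python 3, \w matches Unicode word characters.
# Any other non-space character becomes its own single-character token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_with_offsets(text):
    """Return a list of (token, begin, end) tuples; `end` is exclusive."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


# Sample input with accented letters, written with escapes to keep it unambiguous.
sample = "Citt\u00e0, l'\u00e9t\u00e9!"
tokens = tokenize_with_offsets(sample)
for token, begin, end in tokens:
    print(token, begin, end)
```

Because the offsets come straight from the match objects, `sample[begin:end]` always reproduces the token, which is not guaranteed once the text has been rewritten by successive replacements.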
@loretoparisi
loretoparisi / dataset_stats.py
Created June 18, 2020 14:04
Dataset Statistics with Python Pandas
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# @author loretoparisi at gmail dot com
# Copyright (c) 2020 Loreto Parisi
#
### built-in
import argparse
import json
@thomwolf
thomwolf / loading_wikipedia.py
Last active January 18, 2024 14:04
Load full English Wikipedia dataset in HuggingFace nlp library
import os; import psutil; import timeit
from datasets import load_dataset
mem_before = psutil.Process(os.getpid()).memory_info().rss >> 20
wiki = load_dataset("wikipedia", "20200501.en", split='train')
mem_after = psutil.Process(os.getpid()).memory_info().rss >> 20
print(f"RAM memory used: {(mem_after - mem_before)} MB")
s = """batch_size = 1000
for i in range(0, len(wiki), batch_size):
@malharb
malharb / pulsar01.py
Created June 5, 2020 08:10
Pulsar - evaluation
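import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

# Note: Sequential.predict_classes was removed in TensorFlow 2.6; this call needs an older Keras.
prediction = model.predict_classes(X_test)
# Flatten to 1-D; 5370 is presumably the number of test samples.
prediction = prediction.reshape(5370,)
data = {'True': y_test, 'Predicted': prediction}
df2 = pd.DataFrame(data)
print(classification_report(df2['True'], df2['Predicted']))
print(confusion_matrix(df2['True'], df2['Predicted']))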
@loretoparisi
loretoparisi / github_quick_setup.md
Last active April 10, 2024 18:25
Github Quick setup — if you’ve done this kind of thing before

Get started by creating a new file or uploading an existing file. We recommend every repository include a README, LICENSE, and .gitignore.

…or create a new repository on the command line

echo "# myrepo" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin https://github.com/loretoparisi/myrepo.git
git push -u origin master
@julien-c
julien-c / ec2instances.md
Created April 3, 2020 20:12
simple markdown table of AWS instance types with vCPU/RAM/price (us-east-1)
| API Name | Memory | vCPUs | Physical Processor | Network Performance | Linux On Demand cost | Linux Reserved cost |
|---|---|---|---|---|---|---|
| a1.2xlarge | 16.0 GiB | 8 vCPUs | AWS Graviton Processor | Up to 10 Gigabit | $148.92 monthly | $93.80 monthly |
| a1.4xlarge | 32.0 GiB | 16 vCPUs | AWS Graviton Processor | Up to 10 Gigabit | $297.84 monthly | $187.61 monthly |
| a1.large | 4.0 GiB | 2 vCPUs | AWS Graviton Processor | Up to 10 Gigabit | $37.23 monthly | $23.43 monthly |
| a1.medium | 2.0 GiB | 1 vCPUs | AWS Graviton Processor | Up to 10 Gigabit | $18.61 monthly | $11.75 monthly |
| a1.metal | 32.0 GiB | 16 vCPUs | AWS Graviton Processor | Up to 10 Gigabit | $297.84 monthly | $187.61 monthly |
| a1.xlarge | 8.0 GiB | 4 vCPUs | AWS Graviton Processor | Up to 10 Gigabit | $74.46 monthly | $46.93 monthly |
| c1.medium | 1.7 GiB | 2 vCPUs | Intel Xeon Family | Moderate | $94.90 monthly | $66.43 monthly |
[
  {
    "title": "'Corine, Corine'",
    "artist": "The Abletones Big Band",
    "count": "19",
    "size": "393 MB",
    "link": "http://mtkdata.cambridgemusictechnology.co.uk/Telefunken/AbletonesBigBand_CorineCorine_Full.zip"
  },
  {
    "title": "'Song Of India'",