Martina Pugliese martinapugliese
# Imports
import pandas as pd
import numpy as np
from scipy.stats import entropy
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from matplotlib import pyplot as plt
martinapugliese / ref_es_queries.md
Last active August 14, 2023 08:08
Sample Elasticsearch queries in Python, as reference.

Collection of sample Elasticsearch queries

Use the Python client elasticsearch.

Connect to cluster (the client)

from elasticsearch import Elasticsearch

es_client = Elasticsearch() # local
martinapugliese / boto_dynamodb_methods.py
Last active June 3, 2021 21:59
Some wrapper methods to deal with DynamoDB databases in Python, using boto3.
# Copyright (C) 2016 Martina Pugliese
from boto3 import resource
from boto3.dynamodb.conditions import Key
# The boto3 dynamoDB resource
dynamodb_resource = resource('dynamodb')
def get_table_metadata(table_name):
martinapugliese / printingclass.py
Created August 7, 2016 20:56
A class for styled printing (coloured/styled text, time of execution available), attributes combinable.
# Copyright (C) 2016 Martina Pugliese
# Imports
from datetime import datetime
# #################### ANSI Escape codes for terminal #########################
codes_dict = {
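
The preview above stops at the opening of codes_dict, so the actual class is not visible here. As a rough illustration of the idea in the description only (not the author's implementation), the sketch below combines standard ANSI escape codes into one styled string and can optionally prefix the current time. The names ANSI_CODES, RESET, style and with_time are hypothetical.

```python
from datetime import datetime

# Standard ANSI SGR codes; the dictionary name and its keys are
# illustrative assumptions, not taken from the original gist.
ANSI_CODES = {
    "bold": "1",
    "underline": "4",
    "red": "31",
    "green": "32",
    "blue": "34",
}
RESET = "\033[0m"  # switches all styling off again


def style(text, *attributes, with_time=False):
    """Return text wrapped in the ANSI codes for the given attributes."""
    # Several attributes are combined by joining their codes with ';'.
    codes = ";".join(ANSI_CODES[name] for name in attributes)
    if with_time:
        # Hypothetical take on "time of execution": prefix the current time.
        text = "[%s] %s" % (datetime.now().strftime("%H:%M:%S"), text)
    if not codes:
        return text
    return "\033[%sm%s%s" % (codes, text, RESET)


print(style("done", "bold", "green"))
print(style("still running", "red", with_time=True))
```

Attributes are passed as plain strings so they can be mixed freely; an unknown name raises a KeyError rather than being silently ignored.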
martinapugliese / string_builtins.py
Created August 12, 2016 11:59
Collection of examples of Python built-in methods for manipulating strings
# Copyright (C) 2016 Martina Pugliese
def run_methods():
    print('\n')
    print('* Count occurrences of substring in string')
    print('Martina'.count('art'))
    print('Martina'.count('a'))

A collection of useful command line hacks (Unix)

Memory usage

macOS

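The command is vm_stat, which reports memory in pages. The Perl one-liner below reads the page size from the header and converts each page count to mebibytes, which makes the output readable (credit to the original source of this one-liner).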

vm_stat | perl -ne '/page size of (\d+)/ and $size=$1; /Pages\s+([^:]+)[^\d]+(\d+)/ and printf("%-16s % 16.2f Mi\n", "$1:", $2 * $size / 1048576);'
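
For reference, the same conversion can be sketched in Python. This is a minimal illustration that reuses the two regular expressions from the one-liner above; the function name vm_stat_to_mib and the SAMPLE text are assumptions made for the example, and the sample is a shortened, made-up imitation of vm_stat output rather than real measurements.

```python
import re


def vm_stat_to_mib(vm_stat_output):
    """Convert the page counts in vm_stat text output to MiB."""
    # The header line carries the page size, e.g. "(page size of 4096 bytes)".
    size_match = re.search(r"page size of (\d+)", vm_stat_output)
    if size_match is None:
        raise ValueError("page size not found in vm_stat output")
    page_size = int(size_match.group(1))

    usage = {}
    for line in vm_stat_output.splitlines():
        # Matches lines such as "Pages free:      1024."
        match = re.search(r"Pages\s+([^:]+)[^\d]+(\d+)", line)
        if match:
            pages = int(match.group(2))
            usage[match.group(1)] = pages * page_size / 1048576
    return usage


# Hypothetical, shortened sample in the shape of vm_stat output.
SAMPLE = """Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                               1024.
Pages active:                              512.
Pages wired down:                          256.
"""

for name, mib in vm_stat_to_mib(SAMPLE).items():
    print("%-16s % 16.2f Mi" % (name + ":", mib))
```

On a Mac, the real text could be obtained with subprocess.run(["vm_stat"], capture_output=True, text=True).stdout and passed to the same function.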

A collection of little libraries that help workflow

TQDM

Shows a progress bar in a notebook cell.

from time import sleep
from tqdm import tqdm
for i in tqdm(range(10), 'wasting time', unit='iterations wasted'):
    sleep(0.5)

Pyplot reference stuff

Those things that I always forget how to do.

from matplotlib import pyplot as plt

Matplotlib styles

Pandas reference things

df is a DataFrame.

Grouping df on multiple functions and dropping hierarchical level

grouped_df = df.groupby(['colA', 'colB']) \
    .agg(
 {