Martina Pugliese martinapugliese

martinapugliese / boto_dynamodb_methods.py
Last active June 3, 2021 21:59
Some wrapper methods to deal with DynamoDB databases in Python, using boto3.
# Copyright (C) 2016 Martina Pugliese
from boto3 import resource
from boto3.dynamodb.conditions import Key
# The boto3 dynamoDB resource
dynamodb_resource = resource('dynamodb')
def get_table_metadata(table_name):
martinapugliese / ref_es_queries.md
Last active August 14, 2023 08:08
Sample Elasticsearch queries in Python, as reference.

Collection of sample Elasticsearch queries

Use the Python client elasticsearch.

Connect to cluster (the client)

from elasticsearch import Elasticsearch

es_client = Elasticsearch() # local
martinapugliese / printingclass.py
Created August 7, 2016 20:56
A class for styled printing (coloured/styled text, time of execution available), attributes combinable.
# Copyright (C) 2016 Martina Pugliese
# Imports
from datetime import datetime
# #################### ANSI Escape codes for terminal #########################
codes_dict = {
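The preview above is cut off before the dictionary contents, so the following is only a minimal sketch of how combinable ANSI styles can work, not the gist's actual class. The names `ANSI_CODES`, `RESET` and `style`, and the `timestamp` option, are assumptions made for illustration; the numeric values are the standard ANSI SGR codes.

```python
from datetime import datetime

# Standard ANSI SGR codes. This dictionary is an illustrative assumption,
# not the contents of the gist's own codes_dict.
ANSI_CODES = {
    'bold': '1',
    'underline': '4',
    'red': '31',
    'green': '32',
    'blue': '34',
}
RESET = '\033[0m'  # switches all attributes off again


def style(text, *attributes, timestamp=False):
    """Return text wrapped in the escape sequence for the given attributes.

    Attributes combine by joining their codes with ';' inside one sequence.
    """
    if timestamp:
        # Hypothetical option: prefix the text with the current time.
        text = '[{}] {}'.format(datetime.now().strftime('%H:%M:%S'), text)
    if not attributes:
        return text
    codes = ';'.join(ANSI_CODES[name] for name in attributes)
    return '\033[{}m{}{}'.format(codes, text, RESET)


if __name__ == '__main__':
    print(style('Done', 'bold', 'green'))
    print(style('Something went wrong', 'red', timestamp=True))
```

Joining the codes into a single escape sequence is what makes the attributes combinable: `style('x', 'bold', 'red')` emits one `\033[1;31m` prefix rather than two nested ones.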
martinapugliese / nltk_plotfreqs.py
Last active August 17, 2016 20:53
Plotting the frequencies in a FreqDist in NLTK instead of the counts.
# Copyright (C) 2016 Martina Pugliese
def plot_freqdist_freq(fd,
                       max_num=None,
                       cumulative=False,
                       title='Frequency plot',
                       linewidth=2):
    """
    As of NLTK version 3.2.1, FreqDist.plot() plots the counts and has no kwarg for normalising to frequency. Work around this here.
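The function body is not shown in this preview, so here is a rough sketch of the normalisation step alone, under stated assumptions. It uses `collections.Counter` as a stand-in for `FreqDist` (which subclasses `Counter`), and the helper name `relative_frequencies` is hypothetical. It returns values that could then be handed to a plotting call.

```python
from collections import Counter


def relative_frequencies(counts, max_num=None, cumulative=False):
    """Turn raw counts into frequencies that sum to 1 over all samples.

    counts: a Counter (an NLTK FreqDist should also work, as a Counter subclass).
    max_num: keep only the max_num most common samples (None keeps all).
    cumulative: if True, return running totals of the frequencies.
    """
    total = sum(counts.values())
    if total == 0:
        return [], []
    most_common = counts.most_common(max_num)
    samples = [sample for sample, _ in most_common]
    # Normalise by the total over ALL samples, not only the ones kept.
    freqs = [count / total for _, count in most_common]
    if cumulative:
        running = 0.0
        cumulated = []
        for freq in freqs:
            running += freq
            cumulated.append(running)
        freqs = cumulated
    return samples, freqs


if __name__ == '__main__':
    samples, freqs = relative_frequencies(Counter('abracadabra'), max_num=3)
    for sample, freq in zip(samples, freqs):
        print('{}: {:.3f}'.format(sample, freq))
```

Dividing by the total over every sample keeps the plotted values comparable when `max_num` truncates the list; normalising over only the retained samples would be a different (and arguably misleading) choice.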
martinapugliese / string_builtins.py
Created August 12, 2016 11:59
Collection of examples of Python built-in methods for manipulating strings
# Copyright (C) 2016 Martina Pugliese
def run_methods():
    print('\n')
    print('* Count occurrences of substring in string')
    print('Martina'.count('art'))
    print('Martina'.count('a'))

Pandas reference things

df is a DataFrame.

Grouping df on multiple functions and dropping hierarchical level

grouped_df = df.groupby(['colA', 'colB']) \
    .agg(
 {

A collection of useful command line hacks (Unix)

Memory usage

MACOS

vm_stat is the command; the Perl one-liner below makes its output user-friendly by converting page counts to MiB, thanks to this.

vm_stat | perl -ne '/page size of (\d+)/ and $size=$1; /Pages\s+([^:]+)[^\d]+(\d+)/ and printf("%-16s % 16.2f Mi\n", "$1:", $2 * $size / 1048576);'
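For readers who would rather do the same conversion in Python, the sketch below reuses the two regular expressions from the Perl one-liner. The function name `vm_stat_to_mib` and the sample text are assumptions for illustration; the sample only imitates the general shape of `vm_stat` output.

```python
import re

# Same patterns as in the Perl one-liner above.
PAGE_SIZE_RE = re.compile(r'page size of (\d+)')
PAGES_RE = re.compile(r'Pages\s+([^:]+)[^\d]+(\d+)')


def vm_stat_to_mib(output):
    """Map each 'Pages <label>' line of vm_stat output to a size in MiB."""
    page_size = None
    usage = {}
    for line in output.splitlines():
        size_match = PAGE_SIZE_RE.search(line)
        if size_match:
            page_size = int(size_match.group(1))
        pages_match = PAGES_RE.search(line)
        if pages_match and page_size is not None:
            label, pages = pages_match.group(1), int(pages_match.group(2))
            usage[label] = pages * page_size / 1048576
    return usage


if __name__ == '__main__':
    # Made-up sample in the shape of vm_stat output, for illustration only.
    sample = (
        'Mach Virtual Memory Statistics: (page size of 4096 bytes)\n'
        'Pages free:                               256.\n'
        'Pages active:                             512.\n'
    )
    for label, mib in vm_stat_to_mib(sample).items():
        print('{:<16} {:16.2f} Mi'.format(label + ':', mib))
```

On macOS the real output could be fed in with `subprocess.run(['vm_stat'], capture_output=True, text=True).stdout`.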

Pyplot reference stuff

Those things that I always forget how to do.

import matplotlib.pyplot as plt

Matplotlib styles