Mehmet Ali "Mali" Akmanalp makmanalp

@makmanalp
makmanalp / README.md
Created September 21, 2023 15:16
Why "let's do force index on every query we have" might not be helpful

TLDR: well-intentioned but ultimately unhelpful, IMHO. Here's why:

  1. It's easy to make a judgement about bad query plans based on an extremely biased sample. To give you a sense of the variety of queries we have: as of today there are over 180k unique query fingerprints at HubSpot. Let's ignore the trivial ones: about 18k unique query fingerprints do > 1000 queries/sec. To be sure, query planner bugs are real, and I'm currently fairly sure we've hit one here (details later), but only a minuscule fraction of the total is /truly/ (more on this later) query planner silliness.
  2. By contrast, humans can be quite bad at figuring out what index a query needs and will compare dismally to the above success rate if they start doing FORCE INDEX on everything manually. I mess it up often. I see smart, competent, experienced engineers mess it up quite literally every day. People have attempted to codify rules for this exhaustively - every time I scroll through that page I
makmanalp / gist:ddffd79bdbd75fbff5126c69eb07c1bb
Created March 11, 2019 19:27
ads-0-backup-1552296000-l2zs8 backu
ads-0-backup-1552296000-l2zs8 backup INFO 2019/03/11 19:22:01 tablet prod_iad-1360915300 still has decreasing replication lag of 208.710618394 seconds, will continue waiting
ads-0-backup-1552296000-l2zs8 backup INFO 2019/03/11 19:24:01 tablet prod_iad-1360915300 has caught up on replication
ads-0-backup-1552296000-l2zs8 backup INFO 2019/03/11 19:24:01 (prod_iad-1360915300) checking health
ads-0-backup-1552296000-l2zs8 backup INFO 2019/03/11 19:24:06 (prod_iad-1360915300) succeeded 1 of 3 healthchecks
ads-0-backup-1552296000-l2zs8 backup INFO 2019/03/11 19:24:11 (prod_iad-1360915300) succeeded 2 of 3 healthchecks
ads-0-backup-1552296000-l2zs8 backup INFO 2019/03/11 19:24:16 (prod_iad-1360915300) succeeded 3 of 3 healthchecks
ads-0-backup-1552296000-l2zs8 backup INFO 2019/03/11 19:24:16 getting replication status for replicas
ads-0-backup-1552296000-l2zs8 backup INFO 2019/03/11 19:24:16 getting replication status for master
ads-0-backup-1552296000-l2zs8 backup INFO 2019/03/11 19:24:16 comparing GTIDSets for err
makmanalp / README.md
Last active February 6, 2019 00:12
Orchestrator OOM investigation

Summary:

When we run a rolling restart on our orchestrator statefulset, the node that is the previous master will get stuck in a crash loop.

Findings so far:

makmanalp / validate.py
Last active November 16, 2018 21:18
SQLAlchemy validator event example
from sqlalchemy import Column, Integer, String, DateTime, Boolean
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import event
import datetime
Base = declarative_base()
def validate_int(instance, value, oldvalue, initiator):
    # Assigning a string to an Integer column will try to coerce it to the
makmanalp / network.py
Created September 12, 2018 19:39
Read / write d3 style network JSON files with pandas, preserving order and types
import pandas as pd
import json
def read_network(file_name, nodes_field="nodes", edges_field="edges"):
    network = None
    with open(file_name, "r") as f:
        network = json.loads(f.read())
    nodes = network[nodes_field]
    edges = network[edges_field]
    other_fields = {x:network[x] for x in network.keys()
makmanalp / get_circleci_artifact.py
Last active January 1, 2019 15:50
Quick and dirty ansible module for fetching CircleCI build artifacts (latest on a branch, or by build num & git SHA)
#!/usr/bin/env python
import requests
from ansible.module_utils.basic import AnsibleModule
import traceback
try:
    from urllib.parse import quote
except ImportError:
makmanalp / data_store.py
Created July 30, 2018 22:18
Simple filesystem organization wrapper
"""
Simple filesystem organization scheme. You have:
- Objects: A logical "thing", e.g. a document or a page, with unique IDs
- Keys: A type of data that we're storing about the object, like the
  location of margins on a page, or the locations of each text box.
- Files: For a specific object under a specific key, you can have multiple
  files, e.g. image files for each column in the page
Generally you might want to store data in a specific object's key:
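
A minimal sketch of the objects / keys / files layout described above, assuming one directory per object and one subdirectory per key. The `DataStore` class and its method names are hypothetical, chosen only for illustration, and are not taken from the gist itself.

```python
from pathlib import Path
import tempfile


class DataStore:
    """Hypothetical wrapper that lays files out as <root>/<object_id>/<key>/<file_name>."""

    def __init__(self, root):
        self.root = Path(root)

    def key_dir(self, object_id, key):
        # Create the directory for one key of one object on first use.
        path = self.root / str(object_id) / key
        path.mkdir(parents=True, exist_ok=True)
        return path

    def write(self, object_id, key, file_name, data):
        # Store one file (as bytes) under the object's key and return its path.
        path = self.key_dir(object_id, key) / file_name
        path.write_bytes(data)
        return path

    def files(self, object_id, key):
        # List the file names stored under a key, sorted by name.
        return sorted(p.name for p in self.key_dir(object_id, key).iterdir())


def demo():
    # One object ("page-1"), one key ("columns"), two files under that key.
    with tempfile.TemporaryDirectory() as tmp:
        store = DataStore(tmp)
        store.write("page-1", "columns", "col_1.png", b"b")
        store.write("page-1", "columns", "col_0.png", b"a")
        return store.files("page-1", "columns")
```

Keeping the key as a directory, rather than encoding it in the file name, is one way to let a single key hold several files, such as one image per column on a page.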
makmanalp / stata_dask.py
Last active August 14, 2021 16:10
Read STATA .dta files chunk by chunk (streaming) into dask with pandas's read_stata / StataReader and some hackery
import dask.dataframe as dd
from dask.dataframe.utils import make_meta
from dask.delayed import delayed
import pandas as pd
from itertools import chain
def get_stata_dask_meta(file_name, meta_chunksize=10000, *args, **kwargs):
    """Load up first bit of the file for type metadata info. We have to resort
makmanalp / .block
Last active February 4, 2021 13:41
Multivariate radar charts with different axes
license: mit
scrolling: yes
makmanalp / selfjoin.py
Created September 11, 2017 18:46
Readable double self join / tree traversal in SQLAlchemy
FourDigit = aliased(HSProduct)
TwoDigit = aliased(HSProduct)
Section = aliased(HSProduct)
product_data = db.session\
    .query(
        FourDigit.id.label("product_id"),
        FourDigit.code.label("product_code"),
        FourDigit.name_en.label("product_name"),
        Section.id.label("section_id"),