$ EXPLAIN SELECT base.* FROM base LEFT JOIN relation ON relation.base_id = base.id WHERE relation.id IS NULL ORDER BY id LIMIT 1000;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Limit  (cost=755508.94..755508.95 rows=1 width=54)
   ->  Sort  (cost=755508.94..755508.95 rows=1 width=54)
         Sort Key: base.id
         ->  Hash Left Join  (cost=44.65..755508.93 rows=1 width=54)
               Hash Cond: (base.id = relation.base_id)
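The query above is the LEFT JOIN ... WHERE relation.id IS NULL anti-join idiom: it keeps only the base rows that have no matching relation row. As a minimal sketch of issuing the same query from application code (assuming psycopg2 and a hypothetical database named test):

import psycopg2

# Hypothetical connection settings; adjust for your environment.
conn = psycopg2.connect(dbname="test")
with conn, conn.cursor() as cur:
    # The IS NULL filter on the outer side of the LEFT JOIN keeps only
    # base rows with no matching relation row (an anti-join).
    cur.execute("""
        SELECT base.*
        FROM base
        LEFT JOIN relation ON relation.base_id = base.id
        WHERE relation.id IS NULL
        ORDER BY id
        LIMIT 1000
    """)
    rows = cur.fetchall()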
#!/usr/bin/python
#
# Copyright 2017 Otto Seiskari
# Licensed under the Apache License, Version 2.0.
# See http://www.apache.org/licenses/LICENSE-2.0 for the full text.
#
# This file is based on
# https://github.com/swagger-api/swagger-ui/blob/4f1772f6544699bc748299bd65f7ae2112777abc/dist/index.html
# (Copyright 2017 SmartBear Software, Licensed under Apache 2.0)
#
class Specification:
    """Base specification; the operators below combine specifications
    into composite ones (And/Or/Xor)."""

    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)

    def __xor__(self, other):
        return Xor(self, other)
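The And, Or, and Xor combinators referenced above are not shown. A minimal sketch of how they might be defined, assuming each concrete specification implements an is_satisfied_by(candidate) method (both the class layout and the method name are assumptions, not part of the original snippet):

class CompositeSpecification(Specification):
    def __init__(self, left, right):
        self.left = left
        self.right = right

class And(CompositeSpecification):
    def is_satisfied_by(self, candidate):
        return (self.left.is_satisfied_by(candidate)
                and self.right.is_satisfied_by(candidate))

class Or(CompositeSpecification):
    def is_satisfied_by(self, candidate):
        return (self.left.is_satisfied_by(candidate)
                or self.right.is_satisfied_by(candidate))

class Xor(CompositeSpecification):
    def is_satisfied_by(self, candidate):
        return (self.left.is_satisfied_by(candidate)
                != self.right.is_satisfied_by(candidate))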
The paper presents Twitter's logging infrastructure: how it evolved from application-specific logging to a unified logging infrastructure, and how session sequences are used as a common-case optimization for a large class of queries.
Twitter uses Scribe as its messaging infrastructure. A Scribe daemon runs on every production server and sends log data to a cluster of dedicated aggregators in the same data center. Scribe itself uses Zookeeper to discover the hostnames of the aggregators: each aggregator registers itself with Zookeeper, and the Scribe daemon consults Zookeeper to find a live aggregator to which it can send data. Colocated with the aggregators is the staging Hadoop cluster, which merges the per-category streams from all the server daemons and writes the compressed results to HDFS. These logs are then moved into the main Hadoop data warehouse and deposited in per-category, per-hour directories (e.g., /logs/cate
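To make the Zookeeper-based discovery concrete, here is a minimal sketch of how a daemon might pick a live aggregator. The kazoo client library, the /aggregators znode path, and the host:port payload format are assumptions for illustration; the paper does not describe Scribe's actual znode layout.

import random
from kazoo.client import KazooClient

zk = KazooClient(hosts='zk1:2181,zk2:2181')
zk.start()

# Each aggregator registers an ephemeral znode under a well-known path;
# a daemon lists the currently live aggregators and picks one.
aggregators = zk.get_children('/aggregators')
data, _stat = zk.get('/aggregators/' + random.choice(aggregators))
aggregator_host = data.decode('utf-8')  # assumed host:port payload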
/**
 * Fancy ID generator that creates 20-character string identifiers with the following properties:
 *
 * 1. They're based on timestamp so that they sort *after* any existing ids.
 * 2. They contain 72-bits of random data after the timestamp so that IDs won't collide with other clients' IDs.
 * 3. They sort *lexicographically* (so the timestamp is converted to characters that will sort properly).
 * 4. They're monotonically increasing. Even if you generate more than one in the same timestamp, the
 *    latter ones will sort after the former ones. We do this by using the previous random bits
 *    but "incrementing" them by 1 (only in the case of a timestamp collision).
 */
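The comment above specifies the algorithm, but the implementation is not shown. Here is a sketch in Python of the scheme as described: 8 characters of base-64-encoded millisecond timestamp followed by 12 random characters carrying 72 bits of entropy. The exact alphabet is an assumption, chosen so that ASCII order matches lexicographic order.

import random
import time

# Assumed alphabet: 64 URL-safe characters whose ASCII order matches the
# intended lexicographic sort order.
PUSH_CHARS = '-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz'

_last_push_time = 0
_last_rand = []

def generate_id():
    """Return a 20-character ID: 8 timestamp chars + 12 random chars."""
    global _last_push_time, _last_rand
    now = int(time.time() * 1000)
    duplicate_time = (now == _last_push_time)
    _last_push_time = now

    # Encode the millisecond timestamp as 8 base-64 digits, most
    # significant first, so newer IDs sort after older ones.
    ts_chars = []
    for _ in range(8):
        ts_chars.append(PUSH_CHARS[now % 64])
        now //= 64
    ts_chars.reverse()

    if not duplicate_time:
        # 12 characters * 6 bits = 72 bits of randomness after the timestamp.
        _last_rand = [random.randrange(64) for _ in range(12)]
    else:
        # Same millisecond as the previous ID: "increment" the previous
        # random suffix so this ID still sorts after it.
        for i in range(11, -1, -1):
            if _last_rand[i] != 63:
                _last_rand[i] += 1
                break
            _last_rand[i] = 0

    return ''.join(ts_chars) + ''.join(PUSH_CHARS[r] for r in _last_rand)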
Asynchronous programming can be tricky for beginners, so I think it's useful to iron out some basic concepts in order to avoid common pitfalls.
For an explanation of asynchronous programming in general, I recommend one of the [many][2] [resources][3] [online][4].
I will focus solely on asynchronous programming in [Tornado][1]. From Tornado's homepage:

> Tornado is a Python web framework and asynchronous networking library. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.
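Before the full example, here is a minimal sketch of the core idea (assuming a reasonably recent Tornado with tornado.gen): a coroutine yields control back to the IOLoop while it waits, instead of blocking the whole process.

from tornado import gen, ioloop

@gen.coroutine
def fetch_greeting():
    # Yielding a Future suspends this coroutine; the IOLoop is free to
    # run other handlers until the future resolves.
    yield gen.sleep(1)
    raise gen.Return('hello')

if __name__ == '__main__':
    print(ioloop.IOLoop.current().run_sync(fetch_greeting))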
import logging
import tornado.escape
import tornado.ioloop
import tornado.options
import tornado.web
import tornado.websocket
import os.path
import asyncmongo
import time
import functools
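As a usage sketch for the imports above, this is one way a handler might query MongoDB without blocking the IOLoop; the handler name, the messages collection, and the db attribute on the application are hypothetical:

class MessagesHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        # Hypothetical client, created elsewhere with something like:
        # asyncmongo.Client(pool_id='demo', host='127.0.0.1', port=27017,
        #                   dbname='chat')
        self.application.db.messages.find({}, limit=10,
                                          callback=self.on_messages)

    def on_messages(self, response, error):
        # asyncmongo invokes the callback with (response, error) once the
        # query finishes; the request stays open until finish() is called.
        if error:
            raise tornado.web.HTTPError(500)
        # Assumes the documents are JSON-serializable.
        self.write(tornado.escape.json_encode(response))
        self.finish()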
import os
import os.path
import sys
import unittest

import tornado.database
import tornado.options
from tornado.options import options

APP_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
sys.path.append(os.path.join(APP_ROOT, '..'))

# import your model module
import your.model as model
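A sketch of a test case built on this bootstrap; the MySQL option names and the model API are hypothetical:

class UserModelTest(unittest.TestCase):
    def setUp(self):
        # Hypothetical option names, assumed to be defined with
        # tornado.options.define in the application's settings module.
        self.db = tornado.database.Connection(
            host=options.mysql_host,
            database=options.mysql_database,
            user=options.mysql_user,
            password=options.mysql_password)

    def test_get_user(self):
        user = model.get_user(self.db, user_id=1)  # hypothetical model API
        self.assertIsNotNone(user)

if __name__ == '__main__':
    tornado.options.parse_command_line()
    unittest.main()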