@hell0again
hell0again / main.sh
Last active July 29, 2020 10:17
Parallel processing in bash
#!/bin/sh
# - ( ... ) runs the enclosed commands in a subshell
# - appending & starts a command as a background process
# - wait blocks until all background processes have finished
# - wait <pid> waits until the given process exits, then returns that process's exit status
# - with no pid, wait waits for all children, but its own exit status is always 0,
#   so the parent cannot see a child's non-zero exit; noticing a dead child and
#   killing the parent in response is therefore hard
# - wait accepts multiple pids but only reports the status of the last one,
#   so collect statuses by calling wait once per pid
(
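  # --- the gist is truncated at the opening "(" above; the rest is a ---
  # --- minimal reconstruction of the pattern the comments describe  ---
  sleep 1
  echo "task 1 done"
) &
pid1=$!

(sleep 2; exit 3) &
pid2=$!

wait "$pid1"; echo "task 1 status: $?"   # prints 0
wait "$pid2"; echo "task 2 status: $?"   # prints 3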

Problem Statement

Currently, Hadoop exposes downstream clients to a variety of third-party libraries. As our code base grows and matures, we increase the set of libraries we rely on. At the same time, as our user base grows, we increase the likelihood that some downstream project will run into a conflict while attempting to use a different version of some library we depend on. While there are hot-button third-party libraries that drive most of the development and support issues (e.g. Guava, Apache Commons, and Jackson), a coherent general practice will ensure that we avoid future complications. Simply attempting to coordinate library versions among Hadoop and the various downstream projects is untenable, because each project has its own release schedule and often attempts to support multiple versions of other ecosystem projects. Furthermore, our current conservative approach to dependency updates leads to reliance on stale versions of everything. Those stale versions include

Presto connector development 1

One of the best design decisions Presto's designers made is that it is loosely coupled from storage.

Presto is a distributed SQL execution engine: it does not manage table schemas or metadata itself, and it does not read data from storage itself. Those jobs are delegated to plugins called Connectors. Presto ships with the Hive connector built in, which connects Hive's metastore and HDFS to Presto.

We can connect any storage to Presto by writing a connector plugin.

Plugin Architecture

@btompkins
btompkins / teamcity
Last active December 23, 2015 05:09
from fabric.api import *
from fabric.contrib.files import *

env.user = 'your_user'
env.host_string = 'your_host'

def add_teamcity_user():
    # adduser needs root; the gist's runcmd helper is not shown in this
    # snippet, so fabric's sudo() is assumed here
    sudo("adduser --system --shell /bin/bash --gecos 'TeamCity Build Control' "
         "--group --disabled-password --home /opt/teamcity teamcity")

def download_teamcity():
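    # the gist is truncated here; a hedged sketch of a body, with a
    # hypothetical TeamCity version and download URL
    version = '8.0'
    sudo('wget -O /tmp/TeamCity-%s.tar.gz '
         'http://download.jetbrains.com/teamcity/TeamCity-%s.tar.gz' % (version, version))
    sudo('tar xzf /tmp/TeamCity-%s.tar.gz -C /opt/teamcity --strip-components=1' % version)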
@repeatedly
repeatedly / idea.md
Created May 14, 2013 03:25
Fluentd Hackathon idea

Task

  • Windows support
  • Add json support to in_exec
  • Improve out_rewrite performance
  • Clean up code
  • Add documentation
  • Add Japanese documentation
  • Develop v11
  • etc...
@repeatedly
repeatedly / consistent_name_for_rest_api.diff
Created May 9, 2013 01:09
Patch for /metrics REST API
diff --git a/src/java/org/apache/hadoop/mapred/JobTrackerMetricsInst.java b/src/java/org/apache/hadoop/mapred/JobTrackerMetricsInst.java
index 74885a1..a041f28 100644
--- a/src/java/org/apache/hadoop/mapred/JobTrackerMetricsInst.java
+++ b/src/java/org/apache/hadoop/mapred/JobTrackerMetricsInst.java
@@ -121,8 +121,8 @@ class JobTrackerMetricsInst extends JobTrackerInstrumentation implements Updater
metricsRecord.incrMetric("jobs_preparing", numJobsPreparing);
metricsRecord.incrMetric("jobs_running", numJobsRunning);
- metricsRecord.incrMetric("running_maps", numRunningMaps);
- metricsRecord.incrMetric("running_reduces", numRunningReduces);
@jmoiron
jmoiron / eventlet_poc.py
Created October 25, 2011 13:36
eventlet + zmq + multiprocessing poc (similar to gevent_poc)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Simple eventlet POC to check how to get ZMQ sockets working with
subprocesses spawned by a simple process."""
import os
import eventlet
import multiprocessing
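from eventlet.green import zmq  # cooperative (non-blocking) pyzmq bindings

# --- the gist is truncated after the imports; below is a minimal ---
# --- reconstruction of the described POC; the endpoint is made up ---
ADDR = 'ipc:///tmp/eventlet_poc'

def child(n):
    # each child creates its own zmq context after the fork
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUSH)
    sock.connect(ADDR)
    sock.send(('hello from child %d' % n).encode('utf-8'))
    sock.close()

def main():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    sock.bind(ADDR)
    procs = [multiprocessing.Process(target=child, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()

    def reader():
        for _ in procs:
            print(sock.recv())  # yields to the eventlet hub instead of blocking

    eventlet.spawn(reader).wait()
    for p in procs:
        p.join()

if __name__ == '__main__':
    main()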