Satoshi Akama sakama

  • Treasure Data
  • Tokyo, Japan
@sakama
sakama / config.yml
Last active October 26, 2015 04:53
config.yml for a TSV file whose records contain '"'
```yaml
in:
  type: file
  path_prefix: sample.tsv
  parser:
    type: csv
    delimiter: "\t"
    quote: null
    escape: null
    columns:
```
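The `quote: null` and `escape: null` settings above tell the parser not to give `"` any special meaning, which is what a TSV file with literal quotes in its fields needs. The idea can be sketched in Java as follows. This is only an illustration, not Embulk's actual parser; the class name `TsvQuoteDemo`, the method `splitRecord`, and the sample record are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; this is not how Embulk's csv parser is implemented.
final class TsvQuoteDemo {
    // Split one TSV record on tabs only, treating '"' as an ordinary character.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static List<String> splitRecord(String line) {
        return Arrays.asList(line.split("\t", -1));
    }

    public static void main(String[] args) {
        // Made-up record: the second field contains a literal double quote.
        String record = "1\t5\" floppy disk\tembulk";
        for (String field : splitRecord(record)) {
            System.out.println(field);
        }
    }
}
```

Because the quote character is never interpreted, an unbalanced `"` inside a field cannot swallow the following tabs or lines.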
@sakama
sakama / slack-to-bigquery.yml
Created August 20, 2015 13:42
Trying out embulk-slack-history
in:
  type: slack_history
  token: xxxxxxx
out:
  type: bigquery
  service_account_email: xxxxx@developer.gserviceaccount.com
  p12_keyfile_path: ./key.p12
  project: mysamplebqproject
  dataset: slack
  auto_create_table: true
@sakama
sakama / cdh5_setup.log.md
Last active April 19, 2018 15:02
Install CDH5 on Amazon Linux AMI 2015.03 and run embulk-mapreduce-executor

CDH5 (installing Cloudera Hadoop 5)

Preparation

# cat /etc/system-release
Amazon Linux AMI release 2015.03

Changing the time zone
# date
@sakama
sakama / mapreduce_executor.log
Last active August 29, 2015 14:27
Execution log of mapreduce-executor
$ whoami
hdfs
$ pwd
/var/lib/hadoop-hdfs/test
$ hadoop version
Hadoop 2.6.0-cdh5.4.4
Subversion http://github.com/cloudera/hadoop -r b739cd891f6269da5dd22766d7e75bd2c9db73b6
Compiled by jenkins on 2015-07-07T00:02Z
Compiled with protoc 2.5.0
@sakama
sakama / BigQueryJobs.java
Created March 18, 2015 11:11
Sample of running a BigQuery job and checking its status (Java)
// Fetch the current state of the job
Job job = bigQueryClient.jobs().get(<project ID>, jobRef.getJobId()).execute();
// A non-null error result means the job failed
if (job.getStatus().getErrorResult() != null) {
    log.warn(String.format("Job failed. job id:[%s] reason:[%s] status:[FAILED]", jobRef.getJobId(), job.getStatus().getErrorResult().getMessage()));
}
String jobStatus = job.getStatus().getState();
// Log the load statistics once the job has finished
if (jobStatus.equals("DONE")) {
    JobStatistics statistics = job.getStatistics();
    log.info(String.format("Job statistics [%s]", statistics.getLoad()));
}
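The snippet above checks the job state once. Since a BigQuery job moves through the states `PENDING`, `RUNNING` and `DONE`, a caller would typically repeat that check until `DONE` is reported. A minimal sketch of such a loop follows; `JobPoller`, `waitUntilDone` and its parameters are hypothetical names, and the `Supplier<String>` stands in for the `bigQueryClient.jobs().get(...).execute()` call so the example runs without the BigQuery client library.

```java
import java.util.function.Supplier;

// Hypothetical helper; not part of the BigQuery client library.
final class JobPoller {
    // Calls fetchState until it returns "DONE" or maxAttempts checks have been made.
    // Returns true if "DONE" was observed, false on give-up or interruption.
    static boolean waitUntilDone(Supplier<String> fetchState, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if ("DONE".equals(fetchState.get())) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

With the real client, the supplier could wrap the status lookup shown above, for example a lambda that re-fetches the job and returns `job.getStatus().getState()`. A job that reaches `DONE` may still have failed, so the error result still needs to be checked afterwards.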
@sakama
sakama / UploadGcs.java
Created March 18, 2015 11:10
Sample of uploading to GCS (Java)
File file = new File(<path to local file>);
stream = new FileInputStream(file);
StorageObject objectMetadata = new StorageObject().setName(<path after upload to GCS>);
InputStreamContent content = new InputStreamContent(getContentType(), stream);
Storage.Objects.Insert insertObject = storageClient.objects().insert(bucket, objectMetadata, content);
insertObject.setDisableGZipContent(true);
StorageObject response = insertObject.execute();
@sakama
sakama / GoogleCredential.java
Last active August 29, 2015 14:17
GCS authentication using the GoogleCredential class
private final HttpTransport httpTransport = GoogleNetHttpTransport.newTrustedTransport();
private final JsonFactory jsonFactory = new JacksonFactory();
private hoge() throws IOException, GeneralSecurityException
{
    GoogleCredential credentials = new GoogleCredential.Builder().setTransport(httpTransport) // (1)
            .setJsonFactory(jsonFactory)
            .setServiceAccountId(<service account email address>)
            .setServiceAccountScopes( // (2)
                    ImmutableList.of(
@sakama
sakama / build.gradle
Last active August 29, 2015 14:17
build.gradle for embulk-input-gcs
dependencies {
    compile "com.google.http-client:google-http-client-jackson2:1.19.0"
    // With the following declaration, ConfigDefault("null") cannot be specified
    compile "com.google.apis:google-api-services-storage:v1-rev27-1.19.1"
    // Changed to the following
    compile ("com.google.apis:google-api-services-storage:v1-rev27-1.19.1") {exclude module: "guava-jdk5"}
}
@sakama
sakama / GcsFileInputPlugin.java
Last active August 29, 2015 14:17
GcsFileInputPlugin.java
@Config("last_path")
@ConfigDefault("null") // <- the build fails when this is specified
public Optional<String> getLastPath();
@sakama
sakama / run_embulk-input-gcs.yml
Created March 18, 2015 02:16
Sample output when running embulk-input-gcs
$ embulk run /path/to/config.yml
2015-03-17 22:18:22,180 +0900: Embulk v0.5.12
2015-03-17 22:18:26.587 +0900 [INFO] (transaction): {done: 0 / 1, running: 0}
1,32864,2015-01-27 19:23:49,20150127,embulk
2,14824,2015-01-27 19:01:23,20150127,embulk jruby
3,27559,2015-01-28 02:20:02,20150128,Embulk "csv" parser plugin
4,11270,2015-01-29 11:54:36,20150129,NULL
2015-03-17 22:18:27.760 +0900 [INFO] (transaction): {done: 1 / 1, running: 0}
2015-03-17 22:18:27.791 +0900 [INFO] (main): Committed.
2015-03-17 22:18:27.793 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"sample_01.csv.gz"},"out":{}}
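The `last_path` value in the "Next config diff" line records the last file that was read, so that a later run can skip files already loaded. A minimal sketch of that filtering idea follows. It assumes, without confirmation from this log, that paths are compared lexicographically and that only paths sorting after `last_path` are picked up. `LastPathFilter` and `remaining` are hypothetical names, not the plugin's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of incremental loading with last_path.
final class LastPathFilter {
    // Returns the paths that sort strictly after lastPath (assumed lexicographic order).
    // A null lastPath means nothing has been loaded yet, so every path is kept.
    static List<String> remaining(List<String> paths, String lastPath) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (lastPath == null || path.compareTo(lastPath) > 0) {
                result.add(path);
            }
        }
        return result;
    }
}
```

Under these assumptions, a second run configured with `last_path: sample_01.csv.gz` would skip `sample_01.csv.gz` and pick up a later file such as `sample_02.csv.gz`.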