Makoto YUI myui

Vertex failed, vertexName=Map 20, vertexId=vertex_1424704867400_0022_3_49, diagnostics=[Task failed, taskId=task_1424704867400_0022_3_49_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"gid":1,"userid":4422,"movieid":1213,"rating":5}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.S
/*
* Hivemall: Hive scalable Machine Learning Library
*
* Copyright (C) 2013-2014
* National Institute of Advanced Industrial Science and Technology (AIST)
* Registration Number: H25PRO-1520
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation.
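// Runs `block`, prints how long it took in nanoseconds, and returns its result.
def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
  result
}
// Explicit method call instead of postfix `sum`, which needs scala.language.postfixOps.
val result = time { (1 to 1000).sum }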
myui / hadoop-nodemanager.sh
Last active May 17, 2017 09:22
Init.d script for hadoop-nodemanager
#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
myui / hadoop-datanode.sh
Last active August 29, 2015 14:19
Init.d script for DataNode
#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
myui / cloudinit
Last active August 29, 2015 14:19
cloud-config
#cloud-config
hostname: dcXX
fqdn: dcXX.ec2.internal
mounts:
- [ xvdb, /mnt/disk1, "auto", "defaults,nobootwait,comment=cloudconfig", 0, 2]
- [ xvdc, /mnt/disk2, "auto", "defaults,nobootwait,comment=cloudconfig", 0, 2]
runcmd:
myui / sqoop.md
Created May 13, 2015 08:19
patch to sqoop 1
$ diff build.xml build.xml.orig
41,42c41,42
<       <echo message="Use Hadoop 2.6.0 by default" />
<       <property name="hadoopversion" value="260" />
---
>       <echo message="Use Hadoop 2.x by default" />
>       <property name="hadoopversion" value="200" />
188,201d187
< 
myui / online_offline_matrix.md
Last active August 29, 2015 14:21
Real time prediction on MySQL and batch model construction on Hivemall

Hivemall provides a batch learning scheme that builds prediction models on Apache Hadoop. The learning process itself is a batch process; however, online/real-time prediction can be achieved by carrying out the prediction on a transactional relational DBMS.

In this article, we explain how to run real-time prediction using a relational DBMS. We assume that you have already run the a9a binary classification task.
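
As a rough illustration of what real-time prediction amounts to, the sketch below scores one sparse input row against a linear model held as feature/weight pairs, which is the shape a model table would have once exported to a relational DBMS. It assumes a logistic regression model, so the score is the sigmoid of the weighted sum. The function names and toy weights are hypothetical and are not taken from Hivemall.

```python
import math
from typing import Dict


def sigmoid(x: float) -> float:
    """Logistic function, written so that large |x| cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(model: Dict[int, float], row: Dict[int, float]) -> float:
    """Score one sparse row against a {feature: weight} model.

    This mirrors joining the row's features with a model table and taking
    sigmoid(SUM(weight * value)). Features absent from the model add nothing.
    """
    total = sum(model.get(feature, 0.0) * value for feature, value in row.items())
    return sigmoid(total)


# Hypothetical toy model and input row (not real a9a weights).
model = {1: 0.8, 2: -0.5, 3: 0.1}
row = {1: 1.0, 3: 1.0, 7: 1.0}
prob = predict(model, row)
```

Because scoring is only a lookup and a sum per input row, it is cheap enough to run inside a transactional database while the model itself is rebuilt in batch.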

Online/Offline Matrix of Machine Learning

The following table shows the type matrix of machine learning schemes and applications.

myui / mf_params
Last active August 29, 2015 14:21
Parameters of train_mf_sgd
This describes the parameters for Matrix Factorization training in Hivemall.
http://qiita.com/myui/items/dccb4f58799f080e24ab#%E3%83%90%E3%82%A4%E3%82%A2%E3%82%B9%E3%82%92%E8%80%83%E6%85%AE%E3%81%97%E3%81%9F-matrix-factorization
Apart from factor, mu, and iterations, the parameters usually do not need to be specified. The order in which they are given does not matter.
In some cases it is better to specify eta as well.
1) "-factor 10"
The number of latent factors (latent variables) [default: 10]
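
To make the role of `-factor` concrete, the sketch below shows how a biased matrix factorization model would predict a rating from two latent vectors of length `factor`. The formula (mean rating plus user and item biases plus a dot product) is the textbook form of biased MF and is an assumption here, not quoted from Hivemall's source. Treating `mu` as the mean rating is likewise an assumption, and all names and numbers are illustrative.

```python
from typing import Sequence


def predict_rating(
    mu: float,
    user_bias: float,
    item_bias: float,
    user_factors: Sequence[float],
    item_factors: Sequence[float],
) -> float:
    """Biased MF prediction: mu + b_u + b_i + dot(P_u, Q_i)."""
    if len(user_factors) != len(item_factors):
        raise ValueError("user and item vectors must have the same length")
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    return mu + user_bias + item_bias + dot


factor = 10                      # corresponds to "-factor 10"
user_vec = [0.1] * factor        # hypothetical learned user vector
item_vec = [0.2] * factor        # hypothetical learned item vector
rating = predict_rating(3.5, 0.2, -0.1, user_vec, item_vec)
```

A larger `factor` gives each user and item more latent dimensions to fit, at the cost of a larger model and a higher risk of overfitting.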

First of all, make sure that your Treasure Data cluster is HDP2, not CDH4. Matrix Factorization is supported only on the up-to-date HDP2 cluster. HDP2 is allocated to users who signed up for Treasure Data after Feb 2015; CDH4 is allocated to the others.

NOTE: please ask our customer support to use HDP2 if you get an error.

Data preparation

Download ml-20m.zip and unzip it.
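
The unzip step can be scripted. The minimal sketch below assumes `ml-20m.zip` has already been downloaded to a local path; the helper name `unzip_dataset` is hypothetical.

```python
import zipfile
from typing import List


def unzip_dataset(archive: str, dest: str = ".") -> List[str]:
    """Extract every member of `archive` into `dest` and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

For example, `unzip_dataset("ml-20m.zip")` would extract the archive into the current directory and return the list of extracted paths.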