SteveYagn staticor

@staticor
staticor / HiveInputSplit.java
Created January 22, 2019 20:05
Code related to Hive input splits
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    // Scan each partition
    for (Path dir : dirs) {
        PartitionDesc part = getPartitionDescFromPath(pathToPartitionInfo, dir);
        // Get the partition's input format
        Class inputFormatClass = part.getInputFileFormatClass();
        InputFormat inputFormat = getInputFormatFromCache(inputFormatClass, job);
        // Compute the splits using that format's own split algorithm.
        // Note: the InputFormat here is the old API (org.apache.hadoop.mapred),
        // not org.apache.hadoop.mapreduce, so a new-API format cannot be used;
        // otherwise the query fails with "Input format must implement InputFormat".
        // The two APIs also compute the input split size differently:
        //   new API: Math.max(minSize, Math.min(maxSize, blockSize))
        //   old API: Math.max(minSize, Math.min(goalSize, blockSize))
        InputSplit[] iss = inputFormat.getSplits(newjob, numSplits / dirs.length);
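The comment in the snippet above contrasts the split-size formulas of the two Hadoop APIs. A minimal Python sketch of just that arithmetic follows. The function names and the sample byte sizes are illustrative assumptions, not Hadoop identifiers, and `goal_size` is assumed to be the total input size divided by the requested number of splits.

```python
# Sketch of the two split-size formulas quoted in the comment above.
# Function names and sample sizes are hypothetical, chosen for illustration.

MB = 1024 * 1024


def old_api_split_size(min_size, goal_size, block_size):
    """Old API (org.apache.hadoop.mapred): max(minSize, min(goalSize, blockSize))."""
    return max(min_size, min(goal_size, block_size))


def new_api_split_size(min_size, max_size, block_size):
    """New API (org.apache.hadoop.mapreduce): max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


block_size = 128 * MB
total_size = 1024 * MB
num_splits = 4

# Assumption: goalSize is total input size / requested number of splits.
goal_size = total_size // max(num_splits, 1)

old_size = old_api_split_size(1, goal_size, block_size)
new_size = new_api_split_size(1, 64 * MB, block_size)

print(old_size // MB, new_size // MB)  # prints: 128 64
```

With these sample numbers the old formula is capped by the block size, while the new formula is capped by the configured maximum, so the same input can yield different split counts under the two APIs.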
staticor / hive-ddl-example.hql
Created January 20, 2019 06:11
Hive example
-- Hive db.
use rds;
alter table customer add columns
( shipping_address varchar(50) comment 'shipping_address'
, shipping_zip_code int comment 'shipping_zip_code'
, shipping_city varchar(30) comment 'shipping_city'
, shipping_state varchar(2) comment 'shipping_state'
);
alter table sales_order add columns(order_quantity int comment 'order_quantity');
staticor / decision-example.xml
Created January 20, 2019 03:24
workflow decision example
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
<start to = "external_table_exists" />
<decision name = "external_table_exists">
<switch>
<case to = "Create_External_Table">${fs:exists('/test/abc') eq 'false'}
</case>
<default to = "orc_table_exists" />
</switch>
</decision>
staticor / workflow3.xml
Created January 20, 2019 03:14
oozie workflow test
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
<start to = "fork_node" />
<fork name = "fork_node">
<path start = "Create_External_Table"/>
<path start = "Create_orc_Table"/>
</fork>
<action name = "Create_External_Table">
<hive xmlns = "uri:oozie:hive-action:0.4">
staticor / workflow2.xml
Created January 20, 2019 02:59
Oozie workflow example
<!-- This is a comment -->
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
<start to = "Create_External_Table" />
<!-- Step 1 -->
<action name = "Create_External_Table">
<hive xmlns = "uri:oozie:hive-action:0.4">
<job-tracker>xyz.com:8088</job-tracker>
<name-node>hdfs://rootname</name-node>
staticor / hive-meta.sql
Last active January 17, 2019 10:49
hive metadata usage example.
-- Hive metadata is usually stored in MySQL, in roughly 20 tables.
-- * TBLS                table names
-- * TABLE_PARAMS        table properties: whether the table is external, its comment, etc.
-- * COLUMNS_V2          column information (COLUMNS in older metastore schemas)
-- * SDS                 storage descriptors (location, input/output format, SerDe reference)
-- * SERDE_PARAMS        SerDe parameters
-- * PARTITIONS          partitions
-- * PARTITION_KEYS      partition key columns
-- * PARTITION_KEY_VALS  partition key values
staticor / dedao-zhuoke-mima-problem.py
Created January 15, 2019 09:11
Dedao (得到): Zhuo Laoban's (卓老板) cryptography problem
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
url = 'https://pic1cdn.luojilab.com/html/poster/picppXzJNO3x4ckjZ0JG82W.html?ts=1547541323357'
data = '''/vzxmnqj/ifsqnzcnfqjgftlznijxmzczjhmjsllzt/dtzcnjwjsdtslonsgnxmjslonslqn/dfsonzlzfnoncnfslcnslbjsen/eznemtslutdnqjyf/wfslwjsrjshmtslcnsizitslqjkfqftijmzf/jwbtcnfslvnslsnynfthmzdnljsneznlfscnslvzijrnrfczjwjsbz/gnslljnhmzsnijqndtz/dnxmfslonzxmnvzfsgzwjsbz/emzsnmftdzs'''
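The preview stops after the ciphertext. One plausible first step, assuming (the source does not say so) that the text is a simple alphabetic shift cipher with `/` as a separator, is to try all 26 shifts on a short prefix and look for readable output. The helper name `shift_back` and the choice of prefix are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_back(text, k):
    """Shift every lowercase letter back by k positions; keep other characters."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - k) % 26])
        else:
            out.append(ch)
    return "".join(out)


# First two '/'-separated tokens of the ciphertext from the gist.
sample = "/vzxmnqj/ifsqnzcnfqjgftlznijxmzczjhmjsllzt"

# Assumption: a plain shift cipher, so 26 candidates cover every possible key.
candidates = {k: shift_back(sample, k) for k in range(26)}

for k, text in candidates.items():
    print(k, text)
```

Scanning the 26 candidate lines by eye for pinyin-like syllables is one way to pick a shift. In this sketch a shift of 5 turns the first token into `qushile`, which at least looks like pinyin, though that alone does not confirm the assumption for the whole text.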
staticor / my-marks.sh
Created January 6, 2019 15:31
quick jump and mark - shell tool
export MARKPATH=$HOME/.marks
function jump {
cd -P "$MARKPATH/$1" 2>/dev/null || echo "No such mark: $1"
}
function mark {
mkdir -p "$MARKPATH"; ln -s "$(pwd)" "$MARKPATH/$1"
echo "staticor is best "
}
function unmark {
staticor / keras_quickstart.py
Created January 5, 2019 12:12
keras_quickstart.py: MNIST dataset
import tensorflow as tf
# Load and prepare the MNIST dataset.
# Convert the samples from integers to floating-point numbers
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
staticor / keras_mnist_test.py
Created January 5, 2019 01:43
Keras MNIST test
#! /usr/bin/python
# -*-coding: utf-8-*-
from keras.datasets import mnist
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD