Install pandoc on Mac OS X 10.8
$ brew install haskell-platform
server {
    listen 80 default; ## listen for ipv4; this line is default and implied
    listen [::]:80 default ipv6only=on; ## listen for ipv6
    # Make site accessible from http://localhost/
    server_name localhost;
    server_name_in_redirect off;
    charset utf-8;
user www-data;
# As a rule of thumb: one per CPU. If you are serving a large number
# of static files, which requires blocking disk reads, you may want
# to increase this from the number of cpu_cores available on your
# system.
#
# The maximum number of connections for Nginx is calculated by:
# max_clients = worker_processes * worker_connections
worker_processes 1;
This gist covers a simple Hive eval UDF in Java that mimics NVL2 functionality in Oracle.
NVL2 is used to handle nulls and conditionally substitute values.
Included:
1. Input data
2. Expected results
3. UDF code in Java
4. Hive query to demo the UDF
5. Output
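The NVL2 behavior described above can be sketched in plain Java. This is a minimal illustration, not the gist's UDF code: the class name `Nvl2Sketch` and the method `nvl2` are hypothetical, and the sketch deliberately avoids Hive dependencies so it runs on its own. In an actual Hive UDF the same logic would presumably sit inside an `evaluate` method.

```java
// Plain-Java sketch of Oracle-style NVL2 semantics.
// Hypothetical names; this is not the UDF shipped in the gist.
final class Nvl2Sketch {

    // Returns ifNotNull when expr is non-null, otherwise ifNull.
    static <T> T nvl2(Object expr, T ifNotNull, T ifNull) {
        return expr != null ? ifNotNull : ifNull;
    }

    public static void main(String[] args) {
        System.out.println(nvl2("abc", "has value", "missing")); // prints "has value"
        System.out.println(nvl2(null, "has value", "missing"));  // prints "missing"
    }
}
```

Note that the first argument is only tested for null; its value is never returned, which is what distinguishes NVL2 from a plain null-coalescing function.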
This gist covers a simple Pig eval UDF in Java that mimics NVL2 functionality in Oracle.
Included:
1. Input data
2. UDF code in Java
3. Pig script to demo the UDF
4. Expected result
5. Command to execute script
6. Output
This gist covers the Oozie SSH action.
It includes the components of a sample Oozie workflow application: scripts/code,
sample data, and commands. Oozie actions covered: secure shell action, email
action.
My blog has documentation and highlights of a very basic sample program:
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
This gist includes:
My blog has an introduction to reduce-side joins in Java MapReduce:
http://hadooped.blogspot.com/2013/09/reduce-side-join-options-in-java-map.html
**********************
Gist
**********************
This gist details how to inner join two large datasets on the map side, leveraging the join capability
in MapReduce. Such a join makes sense if both input datasets are too large to qualify for distribution
through DistributedCache, and it can be implemented if both input datasets can be joined by the join key
and both are sorted in the same order by the join key.
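To illustrate why both inputs must be sorted by the join key, here is a minimal plain-Java sketch of a merge-style inner join. It is an assumption-laden illustration, not Hadoop code and not the gist's implementation: the class `SortedMergeJoinSketch`, the `{joinKey, value}` record layout, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of an inner join over two inputs sorted by join key.
// Hypothetical names and data; not the Hadoop join API.
final class SortedMergeJoinSketch {

    // Each record is {joinKey, value}; both lists must be sorted ascending by joinKey.
    static List<String> innerJoin(List<String[]> left, List<String[]> right) {
        List<String> joined = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++; // left key is smaller: no match on the right, skip it
            } else if (cmp > 0) {
                j++; // right key is smaller: no match on the left, skip it
            } else {
                String key = left.get(i)[0];
                // Find the end of the run of right records sharing this key.
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd)[0].equals(key)) {
                    jEnd++;
                }
                // Pair every left record for this key with every right record.
                while (i < left.size() && left.get(i)[0].equals(key)) {
                    for (int k = j; k < jEnd; k++) {
                        joined.add(key + "\t" + left.get(i)[1] + "\t" + right.get(k)[1]);
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> employees = Arrays.asList(
                new String[] {"d001", "Ann"},
                new String[] {"d002", "Bob"},
                new String[] {"d004", "Cid"});
        List<String[]> departments = Arrays.asList(
                new String[] {"d001", "Sales"},
                new String[] {"d003", "Legal"},
                new String[] {"d004", "Ops"});
        innerJoin(employees, departments).forEach(System.out::println);
    }
}
```

Because each input is read once, front to back, neither dataset has to fit in memory. That only works when both sides arrive in the same key order, which is the precondition stated above.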
There are two critical pieces to engaging the join behavior:
*************************
Gist
*************************
One more gist related to controlling the number of mappers in a MapReduce job.
Background on InputSplits
--------------------------
An InputSplit is a chunk of the input data allocated to a map task for processing. FileInputFormat
generates InputSplits (and divides them into records): one InputSplit for each file, unless the
**********************
Gist
**********************
A common interview question for a Hadoop developer position is whether we can control the number of
mappers for a job. We can: there are a few ways of controlling the number of mappers, as needed.
Using NLineInputFormat is one way.
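As a rough sketch of how line-based splitting controls the mapper count: if each split holds at most N lines, a single input file of L lines yields ceil(L / N) splits, and so that many map tasks. The class `LineSplitSketch` and its method are hypothetical and only model the arithmetic. They do not call Hadoop, and they assume one input file.

```java
// Sketch of the arithmetic behind N-lines-per-split; not Hadoop code.
// Hypothetical names; assumes a single input file.
final class LineSplitSketch {

    // Number of splits (one map task each) when every split holds at most linesPerSplit lines.
    static long splitCount(long totalLines, int linesPerSplit) {
        if (totalLines < 0 || linesPerSplit <= 0) {
            throw new IllegalArgumentException("totalLines must be >= 0 and linesPerSplit > 0");
        }
        // Ceiling division: a final, partially filled split still needs its own mapper.
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    public static void main(String[] args) {
        System.out.println(splitCount(10, 3)); // prints 4
        System.out.println(splitCount(9, 3));  // prints 3
    }
}
```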
About NLineInputFormat
----------------------