Charalambos Kanella charalak

charalak / Pig_cheatsheet.md
Last active January 29, 2018 08:47
Apache Pig cheatsheet

PIG

Pig runs on top of MapReduce. Its scripting language, Pig Latin, is a step-by-step language: you build relations over your data one statement at a time, using SQL-like syntax to define the map and reduce steps.

It is much faster to develop with than raw MapReduce because it automates much of the work. MR jobs take quite a lot of time to write.

Running Pig

  • Grunt: Pig's interactive shell (not to be confused with the JavaScript task runner of the same name)
  • Script
  • Ambari / Hue
charalak / mapreduce_cheatsheet.md
Last active January 29, 2018 08:48
mapreduce cheatsheet

MAPREDUCE (MR)

What is it?

  • MR distributes the data processing across the cluster.
  • It divides data into partitions that are transformed (MAPPED) and aggregated (REDUCED) by the mapper and reducer functions respectively.
  • It monitors the mappers and reducers running on each partition.
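
The map and reduce steps above can be sketched in plain Python. This is only a single-process illustration, not Hadoop's actual API: the names `mapper`, `shuffle`, `reducer`, and `run_job` are hypothetical, and word counting is just an assumed example task.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: aggregate all values collected for one key."""
    return key, sum(values)


def run_job(partitions):
    # Each partition is mapped independently; on a real cluster this
    # would happen in parallel on different nodes.
    mapped = chain.from_iterable(
        mapper(line) for part in partitions for line in part
    )
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


# Two hypothetical partitions of a small text dataset.
partitions = [["pig hive pig"], ["hive spark", "pig"]]
counts = run_job(partitions)
print(counts)
```

The sketch keeps the mapper and reducer free of shared state, which is what lets a real framework run them on separate partitions and only bring results together at the shuffle.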

How does it work?

HDFS

  • Handles very large files distributed across a cluster.
  • Large files are broken into blocks that are distributed across the nodes. The processing of those blocks is distributed as well.
  • Each block also has several copies stored on different nodes, so that no data is lost if a node fails.
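
A toy sketch of the block splitting and replication described above. The function names, the tiny block size, and the round-robin placement are assumptions made for illustration; real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is a simplification, not HDFS's real policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


data = b"x" * 10                      # stand-in for a very large file
blocks = split_into_blocks(data, block_size=4)
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical Data Nodes
placement = place_replicas(len(blocks), nodes, replication=3)

# Simulate losing one node: every block should still have a live copy.
survivors = {
    block_id: [n for n in replicas if n != "node-a"]
    for block_id, replicas in placement.items()
}
print(placement)
print(survivors)
```

The `placement` mapping is the kind of bookkeeping (which block lives on which nodes) that a cluster has to track somewhere in order to find the data again.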

HDFS ARCHITECTURE

Name Node - maintains a record of what every block is and where it is stored on the Data Nodes.

charalak / hadoop_cheetsheet.md
Last active January 23, 2018 09:30
Hadoop cheatsheet

Hadoop cheatsheet

Setup on MacOS

  • Download and install a Virtual Machine (VM) from here.
  • Then download and install the Hortonworks Sandbox for HDP and HDF, a quick and easy personal desktop environment for learning, developing, testing, and trying out new features. The link is here; download the Hortonworks Sandbox for a VM (~11 GB).

Important note: If installation of the VM fails, you will still see the VM in your Applications folder, but it will be corrupted. The installation problem is resolved once the user approves the kernel extension. Follow the steps here.

Sample dataset

A sample dataset can be found here.

charalak / Postgres_cheetsheet.md
Last active March 6, 2018 09:48
PostgreSQL cheatsheet

Most information is borrowed from here

SETUP

MacOS Installation

Step 1: Install Homebrew

Install Homebrew.

Step 2: Update Homebrew and check if it is healthy

$ brew update
$ brew doctor