from operator import add
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    lines = sc.textFile("sample.txt", 1)
    # split each line into words, pair each word with 1, then sum per word
    counts = lines.flatMap(lambda x: x.split(' ')) \
                  .map(lambda x: (x, 1)) \
                  .reduceByKey(add)
    for word, count in counts.collect():
        print("%s: %i" % (word, count))
    sc.stop()

KEYS *        list all keys
TYPE <key>    show a key's type

FLUSHDB and FLUSHALL
FLUSHDB deletes all keys in the current database; FLUSHALL deletes all keys in all databases.

Clear Redis keys matching a pattern:
redis-cli KEYS soulmate-* | xargs redis-cli DEL
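
Note that KEYS blocks the server while it walks the whole keyspace, so on a large or production instance the non-blocking SCAN-based form of the same cleanup is safer (a sketch using redis-cli's --scan option, same key pattern assumed):

redis-cli --scan --pattern 'soulmate-*' | xargs redis-cli DEL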

@kranthi1128
kranthi1128 / REDIS.txt
Created August 29, 2016 20:30
REDIS commands
DECR
DECRBY
DEL
EXISTS
EXPIRE
GET
GETSET
HDEL
HEXISTS
HGET
#!/bin/bash
if [ $# -lt 3 ]; then
    echo "Illegal number of parameters"
    echo "USAGE: PATH/oozie_workflow_run.sh <COMMON_PROP_FILE.properties> <WORKFLOW_FILE.xml> <WORKFLOW_PROP_FILE.properties>"
    exit 1
fi
# Environment setup
. /prd/er/common/caqf/pre_model/scripts/setenv_pre_model_caqf.sh
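
A minimal sketch of how such a wrapper typically continues, assuming the workflow XML is staged in HDFS and the job is submitted with the stock oozie CLI; the merged-properties file, the HDFS path, and the OOZIE_URL variable (presumably exported by the sourced setenv script) are all hypothetical, not from the original script:

COMMON_PROP_FILE=$1
WORKFLOW_FILE=$2
WORKFLOW_PROP_FILE=$3
# merge the common and workflow-specific properties into one file (hypothetical)
cat "$COMMON_PROP_FILE" "$WORKFLOW_PROP_FILE" > /tmp/merged.properties
# stage the workflow definition where Oozie can read it (hypothetical HDFS path)
hdfs dfs -put -f "$WORKFLOW_FILE" /user/$(whoami)/workflows/
# submit and start the job with the stock oozie CLI
oozie job -oozie "$OOZIE_URL" -config /tmp/merged.properties -run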
public class Flip {
    public static void main(String[] args) {
        // Math.random() returns a double in [0.0, 1.0),
        // so heads and tails each come up 50% of the time
        if (Math.random() < 0.5) System.out.println("Heads");
        else System.out.println("Tails");
    }
}
@kranthi1128
kranthi1128 / pullvspush.txt
Created November 16, 2015 19:00
PULL vs PUSH
Pushing: The advantage of pushing is that you know your data and you know what you are pushing.
No component knows (or should know) the data better than the component that owns it, which theoretically implies a better design and a more robust system.
Pulling: The one advantage of the pulling approach is that the component that pulls knows exactly when it should pull.
It can initiate the conversation when it needs the data, and no component knows (or should know) better when the data is needed than the component that needs it.
Conclusion: if your component requires the data only at specific times, possibly triggered by its own state, pull; otherwise, push. A rough sketch of the two styles follows.
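
In shell terms (all script and file names here are hypothetical illustrations):

# Push: the producer initiates the transfer the moment data is ready.
./produce_records.sh | ./load_records.sh

# Pull: the consumer initiates, fetching only when its own state calls for it.
while true; do
    ./load_records.sh < /shared/records.txt   # consumer decides when to read
    sleep 300                                 # polling schedule owned by the consumer
done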
@kranthi1128
kranthi1128 / NAGIOS
Last active August 29, 2015 14:19 — forked from davinashreddy/NAGIOS
# Installing Nagios in RHEL/CentOS
# Dependencies to install: a web server (httpd) and PHP
1. RPM Packages to install
$ rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
$ rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
$ yum -y install nagios nagios-plugins-all nagios-plugins-nrpe nrpe php httpd
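
After the packages are installed, the usual follow-up on RHEL/CentOS 6 is to create the web UI login and start the services (a sketch; nagiosadmin and /etc/nagios/passwd are the package defaults):

$ htpasswd -c /etc/nagios/passwd nagiosadmin
$ service httpd start && chkconfig httpd on
$ service nagios start && chkconfig nagios on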
Puppet is a tool to automate infrastructure management. It follows a client/server architecture: the server is called the Puppet Master and the clients are termed Puppet agents.
# Puppet Installation in CentOS 6 (Community Edition)
---------------------
*** Add the puppet repository using rpm (on both client and server)
$ rpm -ivh http://yum.puppetlabs.com/puppetlabs-release-el-6.noarch.rpm
-- Replace 6 with the CentOS version to install (5 and 7 are supported)
Run "yum repolist | grep puppet" to check whether puppet is available
PRE REQs:
1) Set up passwordless SSH on all instances
2) Add all hosts with their FQDNs to /etc/hosts on all instances
3) Disable SELinux on all instances
sudo /usr/sbin/setenforce 0
sudo sed -i.old s/SELINUX=enforcing/SELINUX=disabled/ /etc/sysconfig/selinux
4) Disable iptables on all instances so that daemons on other machines can interact (commands below)
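
On RHEL/CentOS 6 the standard pair of commands for step 4 is:

sudo service iptables stop
sudo chkconfig iptables off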
----------------Installing Spark on an existing YARN Cluster---------------------
Prerequisites :
* Check that all Hadoop daemons are running -- [namenode, datanode, resource manager, node manager, history-server]; see the jps check below
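
A quick way to verify, assuming the JDK's jps tool is on the PATH (daemon names as jps reports them):

$ jps
# expect NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer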
Installation :
* Download the spark in /opt
- $ cd /opt
- $ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.2.1-bin-hadoop2.4.tgz
* Untar the folder
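The matching command for the untar step, using standard tar flags (extracts into /opt):
- $ tar -xzf spark-1.2.1-bin-hadoop2.4.tgz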