
Creating subclusters and node groups within YARN queues using node labels.

  1. Create directories in HDFS for node labels
hadoop fs -mkdir -p /yarn/node-labels
hadoop fs -chown -R yarn:yarn /yarn
hadoop fs -chmod -R 700 /yarn
hadoop fs -mkdir -p /user/yarn
hadoop fs -chown -R yarn:yarn /user/yarn
hadoop fs -chmod -R 700 /user/yarn
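The HDFS directory created above is then referenced from `yarn-site.xml` when enabling node labels. A minimal sketch (property names from the Hadoop node-labels documentation; the path matches the commands above):

```xml
<!-- yarn-site.xml: enable node labels and point the label store
     at the HDFS directory created above -->
<property>
  <name>yarn.node-labels.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.node-labels.fs-store.root-dir</name>
  <value>hdfs:///yarn/node-labels</value>
</property>
```

Labels can then be registered with `yarn rmadmin -addToClusterNodeLabels "gpu"` and mapped to hosts with `yarn rmadmin -replaceLabelsOnNode "host1=gpu"` (label and host names here are placeholders).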
# YouTube (english) : https://www.youtube.com/watch?v=FtU2_bBfSgM
# YouTube (french) : https://www.youtube.com/watch?v=VjnaVBnERDU
#
# On your laptop, connect to the Mac instance with SSH (similar to Linux instances)
#
ssh -i <your private key.pem> ec2-user@<your public ip address>
#
# On the Mac
@libin
libin / Gemfile
Created July 3, 2020 18:00 — forked from dhh/Gemfile
HEY's Gemfile
ruby '2.7.1'
gem 'rails', github: 'rails/rails'
gem 'tzinfo-data', '>= 1.2016.7' # Don't rely on OSX/Linux timezone data
# Action Text
gem 'actiontext', github: 'basecamp/actiontext', ref: 'okra'
gem 'okra', github: 'basecamp/okra'
# Drivers
@libin
libin / web-servers.md
Created March 14, 2020 04:11 — forked from willurd/web-servers.md
Big list of http static server one-liners

Each of these commands will run an ad hoc http static server in your current (or specified) directory, available at http://localhost:8000. Use this power wisely.

Discussion on reddit.

Python 2.x

$ python -m SimpleHTTPServer 8000
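In Python 3 the module was renamed to `http.server`, so the equivalent one-liner (same port, serving the current directory) is:

```shell
python3 -m http.server 8000
```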
In a Hadoop cluster, if you would like to get a count of lines in some files, one easy way is to do the following:
hadoop fs -cat inputdir/* | wc -l
However, this streams the content from all machines to the single machine that performs the counting.
It would be nice if "hadoop fs" had a subcommand for this, say "hadoop fs -wc -l", but it does not.
An alternative is to use Hadoop streaming to parallelize the line counting and a single reducer to sum the per-node results. Something like the following:
hadoop jar ${HADOOP_HOME}/hadoop-streaming.jar \
-Dmapred.reduce.tasks=1 \
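The snippet above is cut off; a complete invocation might look like the following sketch (`inputdir`/`outputdir` are placeholders; each mapper emits its split's line count and the single reducer sums them):

```shell
# Sketch: count lines in parallel with Hadoop streaming.
# Mappers run `wc -l` over their split; the lone reducer sums the counts.
hadoop jar ${HADOOP_HOME}/hadoop-streaming.jar \
  -Dmapred.reduce.tasks=1 \
  -input inputdir \
  -output outputdir \
  -mapper 'wc -l' \
  -reducer "awk '{s += \$1} END {print s}'"
```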
@libin
libin / gist:23312de56c3bdf1b0b94a0c122673537
Created August 22, 2016 22:33 — forked from kwk/gist:1167959
How to install NodeJS and NPM on a host without internet access and without compile tools
# On build host (has internet access): Download and install NodeJS and NPM
wget http://nodejs.org/dist/node-v0.4.10.tar.gz
tar xvzf node-v0.4.10.tar.gz
cd node-v0.4.10
./configure
make
sudo make install
wget http://npmjs.org/install.sh
sudo sh ./install.sh
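The gist stops at the build host; to get the artifacts onto the machine without internet access, one approach (hypothetical paths, and `target-host` is a placeholder) is to bundle the installed files and copy them over:

```shell
# On the build host: bundle the node/npm files installed under /usr/local
tar czf node-dist.tar.gz -C /usr/local bin/node bin/npm lib/node lib/node_modules

# Copy to the offline host (scp here, but any transfer channel works)
scp node-dist.tar.gz user@target-host:

# On the offline host: unpack into /usr/local
sudo tar xzf node-dist.tar.gz -C /usr/local
```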
@libin
libin / benchmark-commands.txt
Created July 7, 2016 04:03 — forked from jkreps/benchmark-commands.txt
Kafka Benchmark Commands
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
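The `ProducerPerformance` class path above is from an old Kafka release; on current distributions the same benchmark is usually run through the packaged wrapper script (flag names per recent Kafka releases; the broker address is a placeholder):

```shell
bin/kafka-producer-perf-test.sh --topic test \
  --num-records 50000000 --record-size 100 --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 acks=1 batch.size=8192
```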
@libin
libin / libevent-2.0.22-stable.sh
Last active May 17, 2016 20:34 — forked from solar/libevent-2.0.20-stable.sh
Install libevent and tmux on CentOS/RH 6.3
#!/bin/sh
curl -sL 'https://github.com/downloads/libevent/libevent/libevent-2.0.22-stable.tar.gz' | tar zx
cd libevent-2.0.22-stable/
./configure --prefix=/usr/local/libevent/2.0.22-stable
make
sudo make install
sudo alternatives --install /usr/local/lib64/libevent libevent /usr/local/libevent/2.0.22-stable/lib 20018 \
--slave /usr/local/include/libevent libevent-include /usr/local/libevent/2.0.22-stable/include \
--slave /usr/local/bin/event_rpcgen.py event_rpcgen /usr/local/libevent/2.0.22-stable/bin/event_rpcgen.py
@libin
libin / gist:1c480ba06b5ae03c23baed22bac28854
Created May 5, 2016 22:15 — forked from unnitallman/gist:944011
sqlite with activerecord outside rails
require 'active_record'
ActiveRecord::Base.logger = Logger.new(STDERR)
# colorize_logging moved to ActiveSupport::LogSubscriber in Rails 3+
ActiveSupport::LogSubscriber.colorize_logging = false
ActiveRecord::Base.establish_connection(
  :adapter => "sqlite3",
  :database => ":memory:"  # :dbfile was renamed to :database
)