@eellpp
eellpp / Kafka Installation
Created January 30, 2015 11:05
Kafka Installation 0.8.1.1
Install Java:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
Install Scala:
curl -O http://www.scala-lang.org/files/archive/scala-2.9.1-1.tgz
tar -xzf scala-2.9.1-1.tgz
sudo mv scala-2.9.1-1 /usr/share
eellpp / VirtualBox : Map the command key
Last active November 9, 2015 02:16
Map the Mac keyboard Command key for easy copy and paste in a Windows VM
Installed SharpKeys and followed the instructions from here:
https://forums.virtualbox.org/viewtopic.php?f=8&t=63567#p298750
eellpp / gist:d6d06c258e8710d18495
Last active November 17, 2015 01:17
Mac VirtualBox Settings for Debian
## Installing Guest Additions on Debian
http://virtualboxes.org/doc/installing-guest-additions-on-debian/
Follow these steps to install the Guest Additions on your Debian virtual machine:
Log in as root;
Update your APT database with apt-get update;
Install the latest security updates with apt-get upgrade;
Install required packages with apt-get install build-essential module-assistant;
Configure your system for building kernel modules by running m-a prepare;
Click on Install Guest Additions… from the Devices menu, then run mount /media/cdrom.
hadoop fs -ls <>
hadoop fs -rmr
..
Use Hadoop Streaming to run Python jobs.
The mapper reads from stdin and prints key and value pairs to stdout.
The sorted and shuffled output from the mappers is given to the reducers.
Because the input to the reducers is sorted by key, a reducer can assume that all records for a given key arrive consecutively.
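The notes above can be sketched as a word count. This is a minimal illustration, not the author's job: the function names `mapper`, `reducer`, and `run_locally` are hypothetical, and `run_locally` only imitates Hadoop's shuffle and sort with Python's `sorted`. In a real streaming job each function would live in its own script that reads `sys.stdin` and prints its records.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per key.

    Relies on the input being sorted by key, so all records
    for one key are adjacent and groupby sees them as one group.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_locally(lines):
    """Local stand-in for shuffle and sort: sort mapper output, then reduce."""
    return list(reducer(sorted(mapper(lines))))


print(run_locally(["a b a", "b c"]))  # ['a\t2', 'b\t2', 'c\t1']
```

Using `groupby` works only because of the sorted-input guarantee; on unsorted input the same key would show up in several groups and be counted separately.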
eellpp / R ml notes
Last active December 11, 2015 02:21
# Blank cells should be read as NA, so tell R to treat empty strings as NA
train.data = read.csv("train.csv", na.strings=c("NA", ""))
# Filtering: select data from a data frame based on a condition
train.data[which(train.data$Survived == 1),"Survived"]
length(train.data[which(train.data$Survived == 1 & train.data$Age > 50),"Survived"])
OR
length(train.data$Survived[train.data$Survived == 1 & train.data$Age > 50])
Reference Cards
http://www.astro.up.pt/~sousasag/Python_For_Astronomers/Python_qr.pdf
Regular expression
https://developers.google.com/edu/python/regular-expressions?hl=en
Python Basics
eellpp / screen_conf
Last active February 19, 2016 01:56
GNU Screen configuration
Problem:
When working in a specific context, I tend to open a number of screen windows and navigate between them. The open windows, with their names, make it a lot easier to get back into the context when I return to it.
Solution:
Create multiple screen config files, specific to each context and open them later when required
Create a base .screenrc with common config. Let the other screen config files source this
Add a bash function to open corresponding screen terminal
Procedure:
# create a folder to save the different screen configs
Corpus
>>> from nltk.corpus import brown
>>> brown.words()[0:10]
sentence & word tokenization
part-of-speech tagging
chunking & named entity recognition
text classification
many included corpora
eellpp / gist:fcdcb03ca02fbd495b67ce7e488422f5
Last active August 17, 2018 12:02
Running Hadoop in pseudo-distributed mode on mac
> brew install hadoop
This installs hadoop at /usr/local/Cellar/hadoop/2.7.3
Find java home
> cd /usr/local/Cellar/hadoop/2.7.3
> vim etc/hadoop/hadoop-env.sh
JAVA_HOME should be set in that file as follows:
export JAVA_HOME="$(/usr/libexec/java_home)"
eellpp / multiplot.R
Created May 21, 2016 11:25
Plot multiple ggplot charts together
# From http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
#
#Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols: Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and