Install Java:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
Install Scala:
curl -O http://www.scala-lang.org/files/archive/scala-2.9.1-1.tgz
tar -xzf scala-2.9.1-1.tgz
sudo mv scala-2.9.1-1 /usr/share
Installed SharpKeys and followed the instructions here:
https://forums.virtualbox.org/viewtopic.php?f=8&t=63567#p298750
## Installing Guest Additions on Debian
http://virtualboxes.org/doc/installing-guest-additions-on-debian/
Follow these steps to install the Guest Additions on your Debian virtual machine:
Log in as root;
Update your APT database with apt-get update;
Install the latest security updates with apt-get upgrade;
Install required packages with apt-get install build-essential module-assistant;
Configure your system for building kernel modules by running m-a prepare;
Click on Install Guest Additions… from the Devices menu, then run mount /media/cdrom.
hadoop fs -ls <>
hadoop fs -rmr
..
Use Hadoop Streaming to run Python jobs.
The mapper reads from stdin and prints each key and value.
The sorted and shuffled output from the mappers is given to the reducers.
Since the input to each reducer is sorted by key, all records for a given key arrive consecutively, so the reducer can tell that a key is finished as soon as the key changes.
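The mapper and reducer roles described above can be sketched in Python as a word count. This is a minimal illustration, not a definitive job: the function names (`map_lines`, `reduce_lines`, `run_locally`) are hypothetical, and the whole pipeline runs in one process so it can be checked without a cluster. In a real streaming job the mapper and reducer would presumably each live in their own script, reading `sys.stdin` and printing tab-separated lines.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one "word<TAB>1" line per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the values of each run of identical keys.

    Relies on the input being sorted by key, so all lines that share
    a key are adjacent and a key change marks the end of a group.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def run_locally(lines):
    """Simulate map -> sort/shuffle -> reduce in a single process."""
    mapped = map_lines(lines)
    shuffled = sorted(mapped)  # stands in for Hadoop's sort and shuffle
    return list(reduce_lines(shuffled))


if __name__ == "__main__":
    for out in run_locally(["a b a", "b a"]):
        print(out)
```

Because the reducer only ever holds one key's running total, it does not need to keep a dictionary of all keys in memory; that is the practical payoff of the sorted-input guarantee.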
# Blank cells should be treated as NA, so tell R via na.strings
train.data = read.csv("train.csv", na.strings=c("NA", ""))
# Filtering: take data from a data frame based on a condition
train.data[which(train.data$Survived == 1),"Survived"]
length(train.data[which(train.data$Survived == 1 & train.data$Age > 50),"Survived"])
# OR, without which(); na.rm = TRUE skips rows whose Age is NA,
# which plain logical indexing would otherwise count as NA entries
sum(train.data$Survived == 1 & train.data$Age > 50, na.rm = TRUE)
Reference Cards
http://www.astro.up.pt/~sousasag/Python_For_Astronomers/Python_qr.pdf
Regular expressions
https://developers.google.com/edu/python/regular-expressions?hl=en
Python Basics
Problem:
When working in a specific context, I tend to open a number of screen windows and navigate between them. The open windows, with their names, make it a lot easier to get back into the context when I return to it.
Solution:
Create multiple screen config files, one per context, and open them later when required.
Create a base .screenrc with the common config, and let the other screen config files source it.
Add a bash function to open the corresponding screen session.
Procedure:
# create a folder to save the different screen configs
Corpus
>>> from nltk.corpus import brown
>>> brown.words()[0:10]
sentence & word tokenization
part-of-speech tagging
chunking & named entity recognition
text classification
many included corpora
> brew install hadoop
This installs Hadoop at /usr/local/Cellar/hadoop/2.7.3
Find the Java home:
> cd /usr/local/Cellar/hadoop/2.7.3
> vim etc/hadoop/hadoop-env.sh
JAVA_HOME should be set as follows in that file:
export JAVA_HOME="$(/usr/libexec/java_home)"
# From http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
#
# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols: Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and