Dhwaj Raj dhwajraj
@dhwajraj
dhwajraj / gist:cf4e796ab67925ad0d89d6a015fc05fc
Last active October 24, 2018 07:02
Iteratively extract the noisy samples from training data that hamper classifier learning.
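from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X, y and indices (the row ids of X) are assumed to be defined earlier.
fout = open('classifier_votes.txt', 'w')
counter = {}
for ll in range(100):
    print(ll)
    # A different random_state per iteration yields a different 80/20 split.
    X_train, X_test, y_train, y_test, ix_train, ix_test = train_test_split(
        X, y, indices, test_size=0.2, random_state=ll)
    classifiers = []
    classifiers.append(LogisticRegression(class_weight='balanced'))
    classifiers.append(RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0,
                                              max_features=None, criterion="entropy", class_weight='balanced'))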
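The preview above is cut off before the voting step, so the following is only a sketch of how such a loop might be completed, not the author's code. It assumes a sample is "suspect" when every classifier mislabels it while it is held out, and it uses synthetic data from `make_classification` in place of the gist's `X` and `y`. The function name `misclassification_rates` and the unanimous-miss rule are hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def misclassification_rates(X, y, n_rounds=20):
    """Return {sample index: share of held-out appearances in which
    every classifier predicted the wrong label}."""
    indices = np.arange(len(y))
    seen = Counter()    # times a sample landed in the test split
    missed = Counter()  # times all classifiers got it wrong
    for seed in range(n_rounds):
        X_tr, X_te, y_tr, y_te, _, ix_te = train_test_split(
            X, y, indices, test_size=0.2, random_state=seed)
        classifiers = [
            LogisticRegression(class_weight='balanced', max_iter=1000),
            RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0,
                                   class_weight='balanced'),
        ]
        # wrong[i] stays True only if every classifier misses test sample i
        wrong = np.ones(len(y_te), dtype=bool)
        for clf in classifiers:
            clf.fit(X_tr, y_tr)
            wrong &= clf.predict(X_te) != y_te
        for ix, w in zip(ix_te, wrong):
            seen[int(ix)] += 1
            missed[int(ix)] += int(w)
    return {ix: missed[ix] / seen[ix] for ix in seen}


# Synthetic stand-in for the gist's X and y (an assumption for illustration)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rates = misclassification_rates(X, y)
# The ten samples most often mislabelled by all classifiers
suspects = sorted(rates, key=rates.get, reverse=True)[:10]
print(suspects)
```

Samples with a high rate across many random splits are candidates for manual review; whether to drop or relabel them is a separate decision.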
dhwajraj / pdfmerger.py
Created January 14, 2017 13:55
PDF merging with filenames as bookmarks; taken from GitHub.
#! /usr/bin/env python
# Original author Nicholas Kim, modified by Yan Pashkovsky
# New license - GPL v3
import sys
import time
from PyPDF2 import utils, PdfFileReader, PdfFileWriter
def get_cmdline_arguments():
    """Retrieve command line arguments."""
dhwajraj / gist:dd33c82341b083ad1fcaea3ca60dc541
Created September 23, 2016 07:13
Launching a PySpark application that uses TensorFlow Serving-based prediction on Kinesis streaming data.
spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl-assembly_2.10:1.6.1 --master yarn-cluster \
--deploy-mode cluster --executor-memory 16g --num-executors 2 --driver-memory 6g --executor-cores 4 \
--conf spark.yarn.executor.memoryOverhead=1000 --py-files /mnt/app.egg /mnt/KinesisReceiver.py
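As a rough check on the memory flags above: on YARN, each executor container requests the executor heap plus `spark.yarn.executor.memoryOverhead` (given in MiB). The sketch below only does that arithmetic. The helper names are hypothetical, and it ignores the driver container and any rounding YARN applies to its allocation increment.

```python
def parse_mem_mb(value):
    """Convert a spark-submit memory string such as '16g' or '512m' to MiB."""
    units = {'m': 1, 'g': 1024}
    return int(value[:-1]) * units[value[-1].lower()]


def executor_container_mb(executor_memory, overhead_mb):
    """Heap plus off-heap overhead requested for one executor container."""
    return parse_mem_mb(executor_memory) + overhead_mb


per_executor = executor_container_mb('16g', 1000)   # --executor-memory 16g, overhead 1000
total = per_executor * 2                            # --num-executors 2
print(per_executor, total)  # 17384 34768
```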
dhwajraj / tensorflow_emr_boot_script.sh
Last active September 23, 2016 09:18
Boot script for installing Java 8 and TensorFlow when launching a Spark EMR cluster.
# Check java version
JAVA_VER=$(java -version 2>&1 | sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
if [ "$JAVA_VER" -lt 18 ]
then
    # Figure out how many versions of Java and javac we currently have
    NR_OF_JRE_OPTIONS=$(echo 0 | alternatives --config java 2>/dev/null | grep 'There ' | awk '{print $3}' | tail -1)
    NR_OF_SDK_OPTIONS=$(echo 0 | alternatives --config javac 2>/dev/null | grep 'There ' | awk '{print $3}' | tail -1)
    # Silent install javac (includes jre)
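The version check at the top of the script collapses the first line of `java -version` into a two-digit number, for example 17 for Java 1.7. Below is a small Python sketch of that same transformation, using the same regular expression as the `sed` call. The function name `java_version_code` is hypothetical. Lines that do not start with `java version "..."` (OpenJDK prints `openjdk version`) are left unmatched, so `int()` raises `ValueError` here, much as the shell `-lt` test would fail with a non-integer operand.

```python
import re


def java_version_code(first_line):
    """Mirror the sed expression: 'java version "1.7.0_79"' -> 17."""
    collapsed = re.sub(r'java version "(.*)\.(.*)\..*"', r'\1\2', first_line)
    return int(collapsed)


# Same comparison as the script's: install Java 8 only if the code is below 18
needs_upgrade = java_version_code('java version "1.7.0_79"') < 18
print(needs_upgrade)  # True
```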
dhwajraj / emr_create_spark_cluster.sh
Last active September 23, 2016 05:01
Creating an AWS EMR Spark cluster using the AWS CLI.
aws emr create-cluster --termination-protected --applications Name=Hadoop Name=Hive Name=Pig Name=Hue Name=Ganglia Name=Spark \
--bootstrap-actions '[{"Path":"s3://config-test/utils/boot_script.sh","Name":"Java and Tensorflow Install boot script"}]' \
--ec2-attributes '{"KeyName":"abcdefgh","InstanceProfile":"EMR_EC2_DefaultRole","ServiceAccessSecurityGroup":"sg-a70038df","SubnetId":"subnet-f993ccd1","EmrManagedSlaveSecurityGroup":"sg-000000","EmrManagedMasterSecurityGroup":"sg-000000"}' \
--service-role EMR_DefaultRole --enable-debugging --release-label emr-4.4.0 \
--log-uri 's3n://aws-logs-00000000-us-east-1/elasticmapreduce/' --name 'Agent' \
--instance-groups '[{"InstanceCount":2,"InstanceGroupType":"CORE","InstanceType":"m4.2xlarge","Name":"Core instance group - 2"},
{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"m4.2xlarge","Name":"Master instance group - 1"}]' \
--region us-east-1
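Hand-written JSON inside shell quotes is easy to get subtly wrong, so one option is to generate the `--instance-groups` argument instead of typing it. The sketch below builds the same two instance groups with `json.dumps` and quotes the result for the shell with `shlex.quote`. The variable names are illustrative, and nothing here calls AWS.

```python
import json
import shlex

# Same instance groups as in the command above
instance_groups = [
    {"InstanceCount": 2, "InstanceGroupType": "CORE",
     "InstanceType": "m4.2xlarge", "Name": "Core instance group - 2"},
    {"InstanceCount": 1, "InstanceGroupType": "MASTER",
     "InstanceType": "m4.2xlarge", "Name": "Master instance group - 1"},
]

# Compact JSON on a single line, then quoted so the shell passes it as one argument
arg = json.dumps(instance_groups, separators=(",", ":"))
print("--instance-groups " + shlex.quote(arg))
```

The printed fragment can be pasted into the `aws emr create-cluster` command in place of the hand-written value.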