Devi Prasad Khatua (wolframalpha)

wolframalpha / evaluator_utils.py
Last active December 14, 2018 05:57
INCSQL -> SEQ2SEQ | Natural Language to SQL | WikiSQL
# from sklearn.metrics import f1_score, accuracy_score
import numpy as np
from collections import defaultdict


def get_metrics_cm_for_one_index(cm, index):
    """Per-class counts from a confusion matrix `cm` for the class at `index`."""
    n_samples = cm.ravel().sum()                 # total number of observations
    total_obs = cm[index].sum()                  # row sum: all true members of the class
    tp = cm[index, index]                        # diagonal: correctly predicted
    fp = cm[:, index].sum() - cm[index, index]   # column sum minus the diagonal
    # assumed continuation -- the gist preview cuts off here:
    fn = total_obs - tp
    tn = n_samples - tp - fp - fn
    return tp, fp, fn, tn
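A quick usage sketch, assuming the completion above (the 3x3 confusion matrix and the derived precision/recall are illustrative, not from the gist):

import numpy as np

# rows = true class, columns = predicted class
cm = np.array([[5, 1, 0],
               [2, 7, 1],
               [0, 1, 4]])

tp, fp, fn, tn = get_metrics_cm_for_one_index(cm, index=1)
precision = tp / (tp + fp)   # 7 / 9 ~= 0.78
recall = tp / (tp + fn)      # 7 / 10 = 0.70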
wolframalpha / autofit_outliertreatment.py
Last active March 31, 2022 02:52
Function to remove or cap outliers in columns of a `pandas.DataFrame`
def treatoutliers(self, df=None, columns=None, factor=1.5, method='IQR', treatment='cap'):
    """
    Cap or remove outlier values in the given columns using the IQR rule.

    :param df: DataFrame to treat (defaults to self.df)
    :param columns: columns to treat (defaults to every column)
    :param factor: multiplier on the IQR when computing the outlier fences
    :param method: outlier detection method (only 'IQR' here)
    :param treatment: 'cap' to clip values to the fences, 'remove' to drop rows
    :return: the treated DataFrame
    """
    if df is None:
        df = self.df
    # if not columns:
    #     columns = self.mandatory_cols_ + self.optional_cols_ + [self.target_col]
    if not columns:
        columns = df.columns
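The preview ends at the `columns` guard; a minimal sketch of the cap/remove step that would follow, assuming the standard Tukey fences `Q1 - factor*IQR` and `Q3 + factor*IQR` (the loop body is an assumption, not the gist's actual code):

    for column in columns:
        q1, q3 = df[column].quantile(0.25), df[column].quantile(0.75)
        iqr = q3 - q1
        floor, ceil = q1 - factor * iqr, q3 + factor * iqr
        if treatment == 'cap':
            df[column] = df[column].clip(floor, ceil)    # pull extremes back to the fences
        elif treatment == 'remove':
            df = df[df[column].between(floor, ceil)]     # drop the outlier rows
    return df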
wolframalpha / tunnel.sh
Created September 24, 2018 06:20
Jupyter interface on the `meow-2024-m` cluster via an SSH SOCKS tunnel
# open a SOCKS5 proxy on local port 10001, tunnelled to the cluster master
# (-D: dynamic port forwarding, -N: no remote command, -n: detach from stdin)
gcloud compute ssh --zone=us-east1-b --ssh-flag="-D" --ssh-flag="10001" --ssh-flag="-N" --ssh-flag="-n" "meow-2024-m"

# run this cmd in another terminal session: launch Chrome with all traffic,
# DNS included (the host-resolver rule blocks local lookups), routed through the tunnel
google-chrome "http://meow-2024-m:8123" \
    --proxy-server="socks5://localhost:10001" \
    --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
    --user-data-dir=/tmp/ &
wolframalpha / read_excel.py
Created August 9, 2018 12:44
xlrd example for converting an Excel sheet into a list of lists
import xlrd

# note: xlrd 2.0+ dropped .xlsx support; use xlrd<2.0 or openpyxl for .xlsx files
wb = xlrd.open_workbook("/home/wolfram/Downloads/17750_01-05-2018_10619 COC VSWR Fault.xlsx")
ws = wb.sheet_by_index(0)

# one inner list per row, one value per cell
data = [[col.value for col in ws.row(n_row)] for n_row in range(ws.nrows)]
print(data)
wolframalpha / splitrowvalue.py
Last active March 7, 2018 06:50
This will work with files of any size, since it reads and writes the data lazily; it may fail only if the number of columns explodes (which is unlikely). You can even check it on your local machine by downloading the file.
import pandas as pd

chunksize = 1000
ip_filename = '/home/wolfram/Downloads/DataParserData.csv'
op_filename = 'datafile.csv'
prefix = 'cleansedquery_'
all_keys = set()      # every key seen so far, so the output columns stay consistent
first_write = True    # write the CSV header only for the first chunk
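The preview stops at the setup; a minimal sketch of the lazy read-process-write loop the description refers to. The `query` column name, the `key=value;...` format, and the splitting logic are assumptions -- only the chunked `read_csv` / append-mode `to_csv` pattern is the point:

for chunk in pd.read_csv(ip_filename, chunksize=chunksize):
    # assumed: split a 'key1=v1;key2=v2' style column into one column per key
    kv_dicts = chunk['query'].str.split(';').apply(
        lambda kvs: dict(kv.split('=', 1) for kv in kvs))
    expanded = pd.DataFrame(list(kv_dicts)).add_prefix(prefix)
    all_keys.update(expanded.columns)
    out = pd.concat([chunk.reset_index(drop=True), expanded], axis=1)
    out.to_csv(op_filename, mode='w' if first_write else 'a',
               header=first_write, index=False)
    first_write = False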
wolframalpha / Finalize.cs
Created February 20, 2018 12:22
Override the `Finalize` method to compute summary statistics over the row values.
public override string Finalize()
{
    // compute summary statistics over the accumulated row values
    sum = rowvalue.Sum();
    min = rowvalue.Min();
    max = rowvalue.Max();
    count = rowvalue.Count();
    mean = rowvalue.Average();
    range = max - min;
    // assumed continuation -- the preview cuts off here; report the stats
    return $"sum={sum}, min={min}, max={max}, count={count}, mean={mean}, range={range}";
}
columns = ['store_id',
'sale_type',
'is_online_sale',
'is_pick_up',
'pick_up_store_id',
'sku_id',
'tot_unit_sold',
'tot_promotion_price',
'tot_reg_price',
'tot_otd_price',
from functools import reduce
rename = {
'is_online_order': 'is_online_order',
'pos_disc_code': 'sale_type',
'brand': 'cat_attribute_desc_1',
'style_code': 'style_code',
'category_code_lvl_1': 'cat_lvl_code_1',
'category_code_lvl_2': 'cat_lvl_code_2',
'category_code_lvl_3': 'cat_lvl_code_3',
'category_code_lvl_4': 'cat_lvl_code_4',
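The preview cuts off mid-dict; a sketch of how a raw-to-canonical column mapping like this is typically applied, plus one plausible use for the `reduce` import (the file names and the concat step are assumptions):

import pandas as pd

# canonicalise column names on each raw extract, then union them into one frame
frames = [pd.read_csv(f).rename(columns=rename) for f in ('jan.csv', 'feb.csv')]
combined = reduce(lambda a, b: pd.concat([a, b], ignore_index=True), frames)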
wolframalpha / vmops.sh
Last active January 3, 2018 05:03
Bash script to STOP/START VMs in a cluster
#!/bin/bash
# usage: ./vmops.sh <cluster-name> <start|stop> <zone> <num-workers>
VM=$1
OPS=$2
ZONE=$3
NODES=$4
vms=($VM-m)                      # start with the master node
for (( i=0; i<$NODES-1; i++ ))
do
    # assumed continuation -- the preview cuts off here: collect the worker nodes
    vms+=("$VM-w-$i")
done
for vm in "${vms[@]}"; do
    gcloud compute instances "$OPS" "$vm" --zone="$ZONE"
done
from pyspark import SparkConf, SparkContext
from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark.sql.types import *

configs = [('spark.eventLog.enabled', 'true'),
           ('spark.dynamicAllocation.minExecutors', '8'),
           ('spark.executor.instances', '1000'),
           ('spark.driver.host', '10.142.0.3'),
           ('spark.yarn.am.memory', '640m'),
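The config list is cut off above; a minimal sketch of how such a list of (key, value) pairs is typically fed to Spark, assuming the truncated list is closed (`SparkConf.setAll` and `SparkSession.builder.config` are standard PySpark APIs):

from pyspark.sql import SparkSession

conf = SparkConf().setAll(configs)   # apply every (key, value) pair at once
spark = SparkSession.builder.config(conf=conf).getOrCreate()
sc = spark.sparkContext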