Samarth Agrawal samarth-agrawal-86

samarth-agrawal-86 / features with duplicate index.py
Created July 14, 2021 13:16
Detect features that have duplicate index
# load packages
import pandas as pd
from fast_ml.utilities import display_all
from fast_ml.feature_selection import get_duplicate_features
# load dataset
df = pd.read_csv('/kaggle/input/dataset-1/dataset_1.csv')
# function to detect duplicate features
duplicate_features = get_duplicate_features(df)
samarth-agrawal-86 / features with duplicate values.py
Created July 14, 2021 13:12
Detect features with duplicate values
# load packages
import pandas as pd
from fast_ml.utilities import display_all
from fast_ml.feature_selection import get_duplicate_features
# load dataset
df = pd.read_csv('/kaggle/input/dataset-1/dataset_1.csv')
# function to detect duplicate features
duplicate_features = get_duplicate_features(df)
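The preview above stops before showing what the result looks like. As a rough illustration of the idea only, here is a pandas-only sketch that flags columns whose values are identical. It does not call or reproduce fast_ml's `get_duplicate_features`; the helper name `find_duplicate_value_columns` and the toy frame are hypothetical.

```python
import pandas as pd


def find_duplicate_value_columns(df):
    """Return (kept_column, duplicate_column) pairs whose values match exactly."""
    pairs = []
    columns = list(df.columns)
    already_flagged = set()
    for i, first in enumerate(columns):
        if first in already_flagged:
            continue
        for second in columns[i + 1:]:
            if second in already_flagged:
                continue
            # Series.equals treats NaNs in the same positions as equal
            if df[first].equals(df[second]):
                pairs.append((first, second))
                already_flagged.add(second)
    return pairs


# Toy data (hypothetical): 'b' and 'd' repeat the values of 'a'
toy = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1, 2, 3],
    "c": [3, 2, 1],
    "d": [1, 2, 3],
})
duplicate_pairs = find_duplicate_value_columns(toy)
print(duplicate_pairs)
```

The pairwise comparison is quadratic in the number of columns, which is acceptable for a sketch but may be slow on very wide datasets.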
samarth-agrawal-86 / custom_code_train_valid_test_split_sorted.py
Created May 15, 2021 23:53
Sorted Split - To create train valid test dataset using custom code
import pandas as pd
df = pd.read_csv('/kaggle/input/bluebook-for-bulldozers/TrainAndValid.csv', parse_dates=['saledate'], low_memory=False)
# Suppose we want an 80:10:10 split into train:valid:test datasets
train_size = 0.8
valid_size = 0.1
train_index = int(len(df)*train_size)
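The preview is cut off after `train_index`. One way the sorted split could continue is sketched below on a small toy frame instead of the Kaggle file. The toy data and the names `valid_index`, `train`, `valid`, and `test` are assumptions, not the gist's actual code.

```python
import pandas as pd

# Toy frame standing in for the bulldozers data (hypothetical values),
# deliberately stored in reverse chronological order
toy = pd.DataFrame({
    "saledate": pd.date_range("2020-01-01", periods=10, freq="D")[::-1],
    "SalePrice": range(10),
})

train_size = 0.8
valid_size = 0.1

# Sort chronologically so the validation and test rows come after the training rows
toy_sorted = toy.sort_values("saledate").reset_index(drop=True)

train_index = int(len(toy_sorted) * train_size)
# Add the validation count to train_index to avoid float rounding surprises
valid_index = train_index + int(len(toy_sorted) * valid_size)

train = toy_sorted.iloc[:train_index]
valid = toy_sorted.iloc[train_index:valid_index]
test = toy_sorted.iloc[valid_index:]

print(len(train), len(valid), len(test))
```

Whatever rows remain after the train and valid slices go to test, so the three parts always cover the full frame even when the sizes do not divide evenly.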
samarth-agrawal-86 / fast_ml_train_valid_test_split_sorted.py
Created May 15, 2021 23:50
Sorted Split - To create train valid test dataset using fast_ml train_valid_test_split
import pandas as pd
df = pd.read_csv('/kaggle/input/bluebook-for-bulldozers/TrainAndValid.csv', parse_dates=['saledate'], low_memory=False)
from fast_ml.model_development import train_valid_test_split
X_train, y_train, X_valid, y_valid, X_test, y_test = train_valid_test_split(
    df, target='SalePrice',
    method='sorted', sort_by_col='saledate',
    train_size=0.8, valid_size=0.1, test_size=0.1)
samarth-agrawal-86 / fast_ml_train_valid_test_split_random.py
Created May 15, 2021 23:47
Random Split - To create train valid test dataset using fast_ml train_valid_test_split
import pandas as pd
df = pd.read_csv('/kaggle/input/bluebook-for-bulldozers/TrainAndValid.csv', parse_dates=['saledate'], low_memory=False)
from fast_ml.model_development import train_valid_test_split
X_train, y_train, X_valid, y_valid, X_test, y_test = train_valid_test_split(
    df, target='SalePrice',
    train_size=0.8, valid_size=0.1, test_size=0.1)
samarth-agrawal-86 / sklearn_train_test_split_random.py
Last active July 20, 2023 06:39
Random Split - To create train valid test dataset using sklearn train test split
import pandas as pd
df = pd.read_csv('/kaggle/input/bluebook-for-bulldozers/TrainAndValid.csv', parse_dates=['saledate'], low_memory=False)
from sklearn.model_selection import train_test_split
# Suppose we want an 80:10:10 split into train:valid:test datasets
train_size = 0.8
X = df.drop(columns = ['SalePrice']).copy()
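The preview ends before the split itself. Because `train_test_split` produces only two parts per call, a common pattern is to call it twice; the sketch below shows that pattern on toy data. The toy frame, the `X_rem`/`y_rem` names, and the `random_state` value are assumptions and may differ from the full gist.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the bulldozers data (hypothetical values)
toy = pd.DataFrame({"feature": range(100), "SalePrice": range(100, 200)})

train_size = 0.8
X = toy.drop(columns=["SalePrice"]).copy()
y = toy["SalePrice"]

# First cut: 80% train, 20% held out
X_train, X_rem, y_train, y_rem = train_test_split(
    X, y, train_size=train_size, random_state=42)

# Second cut: halve the held-out 20% into 10% valid and 10% test
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rem, y_rem, test_size=0.5, random_state=42)

print(X_train.shape, X_valid.shape, X_test.shape)
```

Note that the second call uses `test_size=0.5` because it is a fraction of the remaining rows, not of the original dataset.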
samarth-agrawal-86 / sentiment_model_test.py
Last active March 26, 2020 21:50
Testing on test dataset
# Get test data loss and accuracy
test_losses = [] # track loss
num_correct = 0
# init hidden state
h = net.init_hidden(batch_size)
net.eval()
# iterate over test data
from string import punctuation
def tokenize_review(test_review):
    test_review = test_review.lower() # lowercase
    # get rid of punctuation
    test_text = ''.join([c for c in test_review if c not in punctuation])
    # splitting by spaces
    test_words = test_text.split()
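The function above is cut off before it returns anything. A minimal self-contained sketch of how such a tokenizer might finish is shown below, assuming a word-to-id dictionary is available. The extra `vocab_to_int` parameter, the toy vocabulary, and the choice to drop unknown words are assumptions, not the gist's actual code.

```python
from string import punctuation


def tokenize_review(test_review, vocab_to_int):
    """Lowercase, strip punctuation, split on whitespace, map words to ids."""
    test_review = test_review.lower()
    test_text = "".join(c for c in test_review if c not in punctuation)
    test_words = test_text.split()
    # Words missing from the vocabulary are dropped; that policy is an assumption
    return [vocab_to_int[word] for word in test_words if word in vocab_to_int]


# Hypothetical toy vocabulary
vocab_to_int = {"this": 1, "movie": 2, "was": 3, "great": 4}
token_ids = tokenize_review("This movie was GREAT!!!", vocab_to_int)
print(token_ids)
```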
# loss and optimization functions
lr=0.001
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
# training params
epochs = 4 # 3-4 is approx where I noticed the validation loss stop decreasing
samarth-agrawal-86 / sentiment_model_define_class.py
Last active August 4, 2019 16:55
Model Class defined in pytorch framework
import torch.nn as nn
class SentimentLSTM(nn.Module):
    """
    The RNN model that will be used to perform Sentiment analysis.
    """
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        """
        Initialize the model by setting up the layers.