Bastin Robin BastinRobin

@BastinRobin
BastinRobin / ensemble.md
Created May 12, 2021 08:21
Ensemble Technique

Implement Ensemble Techniques on the Given Dataset

Dataset: the given dataset consists of the following fields:

  • id - unique ID for excerpt
  • url_legal - URL of source - this is blank in the test set.
  • license - license of source material - this is blank in the test set.
  • excerpt - text to predict reading ease of
  • target - reading ease
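As a rough illustration of one ensemble technique (bagging), the sketch below averages several simple regressors, each fitted on a bootstrap resample. The toy rows, the single `mean_word_length` feature, and all function names are assumptions made for illustration; they are not taken from the dataset or from the exercise itself.

```python
import random
from statistics import mean

def mean_word_length(excerpt):
    """Hypothetical hand-made feature: average word length of the excerpt."""
    words = excerpt.split()
    return sum(len(w) for w in words) / len(words)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to predicting the mean
        return y_bar, 0.0
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b

def fit_bagging(xs, ys, n_models=25, seed=0):
    """Fit n_models lines, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict(models, x):
    """Ensemble prediction: average of the base models' predictions."""
    return mean(a + b * x for a, b in models)

# Invented toy rows in the (excerpt, target) shape described above.
rows = [
    ("The cat sat on the mat.", 1.0),
    ("Dogs like to run and play.", 0.8),
    ("Photosynthesis converts luminous energy efficiently.", -2.0),
    ("Constitutional jurisprudence necessitates interpretation.", -2.5),
]
xs = [mean_word_length(e) for e, _ in rows]
ys = [t for _, t in rows]
models = fit_bagging(xs, ys)
print(predict(models, mean_word_length("A short easy line.")))
```

A real solution would likely use richer text features and stronger base models; the point here is only the resample, fit, and average pattern.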
BastinRobin / kmean.md
Created May 5, 2021 08:36
Implementation Of K-Mean To Analyse Crime in US

# K-Means Clustering - U.S. Crime Data

We'll use k-means, an unsupervised learning method, to discover clusters in a data set. The original data can be found here: https://ufile.io/39bbkph1

Crime statistics are available for public review through the Uniform Crime Reporting Statistics, a collaboration between the U.S. Department of Justice and the Federal Bureau of Investigation. The following data set has information on the crime rates and totals for states across the United States for a wide range of years. The crime reports are divided into two main categories: property crime and violent crime. Property crime refers to burglary, larceny, and motor vehicle theft, while violent crime refers to assault, murder, rape, and robbery. The reports cover 1960 to 2012.

The analysis consists of the following steps.

  • I. Importing necessary libraries and downloading the data
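To make the clustering step concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in NumPy. The toy rows and the `kmeans` helper are assumptions for illustration only; they are not the crime data set or the code from the original analysis.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Invented toy rows: (property crime rate, violent crime rate).
X = np.array([
    [1800.0, 150.0], [1900.0, 170.0], [1750.0, 160.0],
    [4200.0, 600.0], [4400.0, 650.0], [4100.0, 580.0],
])
centroids, labels = kmeans(X, k=2)
print(labels)
```

On real data the features usually need scaling first, since k-means relies on Euclidean distance and a large-valued column would otherwise dominate.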
BastinRobin / naive.md
Last active March 24, 2021 04:48
Implement Naive Bayes

Implement Naive Bayes Classifier

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.

[Two images omitted from this listing]

Use the above dataframe as a reference and build a Naive Bayes classifier using Python. Follow the guidelines.

  1. Build a production-ready classifier following the API interfaces.
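The sketch below shows one way such a classifier could look for categorical features. The `fit`/`predict` interface is an assumption (modelled on the common scikit-learn convention, since the required API is not shown here), and the toy rows are invented because the reference dataframe is not available.

```python
from collections import Counter, defaultdict
import math

class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.class_counts_ = Counter(y)
        n_features = len(X[0])
        # counts_[j][c][v] = number of rows of class c whose feature j equals v
        self.counts_ = [defaultdict(Counter) for _ in range(n_features)]
        self.values_ = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                self.counts_[j][label][value] += 1
                self.values_[j].add(value)
        return self

    def _log_posterior(self, row, c):
        # log P(c) + sum_j log P(x_j | c), using the independence assumption
        total = sum(self.class_counts_.values())
        score = math.log(self.class_counts_[c] / total)
        for j, value in enumerate(row):
            num = self.counts_[j][c][value] + self.alpha
            den = self.class_counts_[c] + self.alpha * len(self.values_[j])
            score += math.log(num / den)
        return score

    def predict(self, X):
        return [max(self.classes_, key=lambda c: self._log_posterior(row, c))
                for row in X]

# Invented toy data: (outlook, windy) -> play
X = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
     ("rainy", "no"), ("overcast", "no"), ("overcast", "yes")]
y = ["yes", "no", "no", "yes", "yes", "yes"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict([("sunny", "no"), ("rainy", "yes")]))  # ['yes', 'no']
```

Working in log space avoids underflow when many per-feature probabilities are multiplied, and the smoothing term `alpha` keeps unseen feature values from zeroing out a class.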
BastinRobin / mds.md
Last active March 17, 2021 08:36
Multi-Dimensional Scaling

Singular Value Decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix that generalizes the eigendecomposition of a square normal matrix to any m × n matrix via an extension of the polar decomposition.
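A short NumPy sketch of the factorization follows. The 3 × 2 matrix is an arbitrary illustrative choice, not data from the original notes.

```python
import numpy as np

# Arbitrary 3 x 2 example matrix (values chosen only for illustration).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Thin SVD: U is 3 x 2, s holds the singular values, Vt is 2 x 2.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the factors back together recovers A (up to rounding).
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))

# The squared singular values are the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
print(np.allclose(s ** 2, eigvals))
```

The second check is the link to eigendecomposition: the singular values of A are the square roots of the eigenvalues of the symmetric matrix AᵀA.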

Multidimensional Scaling

Multidimensional scaling (MDS) is a technique for visualizing the (dis)similarity among objects in a low-dimensional space, typically two dimensions. The idea is that the geometric distance between two objects in the 2D plane reflects how dissimilar they are: the closer two points lie, the more similar the objects.
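As an illustration, the sketch below implements classical (metric) MDS, one of several MDS variants, using an eigendecomposition of the double-centered squared distance matrix. The `classical_mds` helper and the toy distance matrix are assumptions for illustration.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points so that Euclidean distances approximate the entries of D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Invented dissimilarities: distances between the corners of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(points[:, None] - points[None, :], axis=2)

Y = classical_mds(D)
D_embedded = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)
print(np.allclose(D, D_embedded))
```

Because the toy distances come from points that already lie in a plane, the 2D embedding reproduces them exactly, although the recovered coordinates may be rotated or reflected relative to the originals.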

PCA

There is no pca() function in NumPy, but we can easily calculate the Principal Component Analysis step-by-step using NumPy functions. The example below defines a small 3×2 matrix, centers the data in the matrix, calculates the covariance matrix of the centered data, and then computes the eigendecomposition of the covariance matrix. The eigenvectors are taken as the principal components and the eigenvalues as the variance along each component; the eigenvectors are then used to project the data.

from numpy import array
from numpy import mean
from numpy import cov
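The listing above is cut off, so here is a self-contained sketch of the steps the paragraph describes. The matrix values and variable names are assumptions, and `numpy.linalg.eigh` is used because the covariance matrix is symmetric; the original code may differ.

```python
import numpy as np

# Small 3 x 2 matrix: 3 observations, 2 features (values are illustrative).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Center each column on its mean.
M = A.mean(axis=0)
C = A - M

# Covariance matrix of the centered data (columns are variables).
V = np.cov(C, rowvar=False)

# eigh suits symmetric matrices; eigenvalues come back in ascending order.
values, vectors = np.linalg.eigh(V)
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

# Project the centered data onto the principal components.
P = C @ vectors
print(values)
print(P)
```

In this toy matrix the two columns are perfectly correlated, so all of the variance falls on the first component and the second eigenvalue is zero up to rounding.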

BastinRobin / PCA.py
Last active March 10, 2021 08:36
Manually Calculate Principal Component Analysis
## PCA
There is no pca() function in NumPy, but we can easily calculate the Principal Component Analysis step-by-step using NumPy functions.
The example below defines a small 3×2 matrix, centers the data in the matrix, calculates the covariance matrix of the centered data, and then computes the eigendecomposition of the covariance matrix. The eigenvectors are taken as the principal components and the eigenvalues as the variance along each component; the eigenvectors are then used to project the data.
from numpy import array
from numpy import mean
from numpy import cov
BastinRobin / Regression.md
Last active March 3, 2021 05:28
Regression Analysis Lab

Regression Analysis

This exercise is designed to build your confidence in constructing multiple regression models using spreadsheets.

This dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Mass. It was obtained from the StatLib archive and has been used extensively throughout the literature to benchmark algorithms. However, these comparisons were primarily done outside of Delve and are thus somewhat suspect. The dataset is small, with only 506 cases.

  • Download the dataset using this link https://gofile.io/d/YZUprY
  • Use MS Excel to build a multiple regression model.
  • Audit the regression summary and eliminate features that are not significant.
  • Predict the house price (MEDV).
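The exercise itself is meant for a spreadsheet, but the same regression summary can be cross-checked in code. The sketch below is a minimal example on synthetic stand-in data (not the Boston file); the feature names `x1` to `x3` and the true coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the Boston file): y depends on x1 and x2 but not x3.
n = 200
X = rng.normal(size=(n, 3))
y = 5.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Add an intercept column and solve the least-squares problem.
X1 = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Standard errors and t-statistics, as in a spreadsheet regression summary.
residuals = y - X1 @ beta
dof = n - X1.shape[1]
sigma2 = residuals @ residuals / dof
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
t_stats = beta / std_err

for name, b, t in zip(["intercept", "x1", "x2", "x3"], beta, t_stats):
    print(f"{name:9s} coef={b:7.3f} t={t:7.2f}")
```

A feature whose t-statistic is small in absolute value (roughly below 2 as a rule of thumb) is a candidate for elimination, which mirrors the audit step in the list above.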
BastinRobin / sample.js
Created January 22, 2021 06:18
String Matching Algo
let all_features = ["sex_male", "passengerid", "fare", "age", "sex_female", "pclass", "parch", "sibsp", "cabin_missing", "embarked_C", "embarked_S", "cabin_E25", "cabin_C22 C26", "cabin_E24", "embarked_Q", "cabin_B96 B98", "cabin_E12", "cabin_C106", "cabin_C104", "cabin_D45", "cabin_E17", "cabin_E10", "cabin_C52", "cabin_A23", "cabin_B20", "cabin_E8", "cabin_C126", "cabin_E77", "cabin_E121", "cabin_D35", "cabin_D19", "cabin_C148", "cabin_C92", "cabin_C86", "cabin_B49", "cabin_B50", "cabin_C70", "cabin_A26", "cabin_F2", "cabin_C47", "cabin_B82 B84", "cabin_B41", "cabin_E50", "cabin_D49", "cabin_A10", "cabin_A34", "cabin_C111", "cabin_G6", "cabin_A20", "cabin_D33", "cabin_E101", "cabin_B37", "cabin_C124", "cabin_C65", "cabin_D48", "cabin_B71", "cabin_B51 B53 B55", "cabin_C68", "cabin_B101", "cabin_C82", "cabin_C118", "cabin_F33", "cabin_D", "cabin_B77", "cabin_C2", "cabin_B5", "cabin_C23 C25 C27", "cabin_C46", "cabin_B18", "cabin_B57 B59 B63 B66", "cabin_D30", "cabin_B22", "cabin_E67", "cabin_C91", "cabin_E44",
# Using a Python dictionary to act as an adjacency list
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [],
    'E': ['F'],
    'F': []
}
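One common use of an adjacency list like this is graph traversal. The sketch below runs a breadth-first search over the same graph; the `bfs` function is an assumption about what might follow, since the rest of the file is not shown.

```python
from collections import deque

# Same adjacency list as above: each key maps to its list of neighbours.
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [],
    'E': ['F'],
    'F': [],
}

def bfs(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order

print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

The `seen` set keeps membership checks fast, while the separate `order` list preserves the sequence in which nodes were discovered.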
BastinRobin / Imputer.py
Last active June 23, 2020 05:52
Class Categorical Imputer
import numpy
import pandas
from sklearn.base import TransformerMixin
class SeriesImputer(TransformerMixin):
    def __init__(self):
        """Impute missing values.