Bastin Robin BastinRobin

## ensemble.md

      
BastinRobin / ensemble.md (created May 12, 2021 08:21): Ensemble Technique

Implement Ensemble Techniques on the Given Dataset

Dataset

The given dataset consists of the following fields:

- id: unique ID for the excerpt
- url_legal: URL of the source; blank in the test set
- license: license of the source material; blank in the test set
- excerpt: text whose reading ease is to be predicted
- target: reading ease
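
The exercise does not say which ensemble technique to use. As a minimal sketch, assuming a simple averaging ensemble, the code below fits one small regressor per hand-made text feature and averages their predictions of `target` from `excerpt`. The helper names (`features`, `fit_line`, `fit_ensemble`) and the two features are illustrative assumptions, not part of the assignment.

```python
from statistics import mean


def features(excerpt):
    """Two hand-made readability features (an assumption for illustration):
    mean word length and mean number of words per sentence."""
    words = excerpt.split() or [""]
    normalized = excerpt.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    return (mean(len(w) for w in words), len(words) / max(len(sentences), 1))


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda x: a * x + (my - a * mx)


def fit_ensemble(excerpts, targets):
    """Fit one base model per feature and average their predictions."""
    feats = [features(e) for e in excerpts]
    models = [fit_line([f[i] for f in feats], targets) for i in range(2)]

    def predict(excerpt):
        f = features(excerpt)
        return mean(m(f[i]) for i, m in enumerate(models))

    return predict
```

Averaging is only one option; bagging, boosting, or stacking follow the same pattern of combining several base models into one prediction.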


## kmean.md

      
BastinRobin / kmean.md (created May 5, 2021 08:36): Implementation Of K-Mean To Analyse Crime in US

# K-Means Clustering - U.S. Crime Data

We'll use k-means, an unsupervised learning method, to discover clusters in a data set. The original data can be found here:

https://ufile.io/39bbkph1

The Uniform Crime Reporting Statistics, produced in collaboration between the U.S. Department of Justice and the Federal Bureau of Investigation, make crime statistics available for public review. The data set records crime rates and totals for each U.S. state from 1960 to 2012. Reports fall into two main categories: property crime (burglary, larceny, and motor vehicle theft) and violent crime (assault, murder, rape, and robbery).

The analysis consists of the following steps.

I. Importing necessary libraries and downloading the data
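
The remaining steps are not shown in this excerpt. As a rough sketch of the clustering step itself, assuming plain Lloyd's algorithm on a numeric feature matrix, k-means can be written with NumPy as below. The function name `kmeans` and its parameters are illustrative, and the toy points are made up and are not crime data.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows of X chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_samples, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Made-up toy points forming two loose groups (not real crime rates).
X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 1.2],
              [9.0, 9.5], [10.0, 11.0], [11.5, 9.0]])
labels, centroids = kmeans(X, k=2)
```

On real data the features would normally be standardized first, since k-means relies on Euclidean distance and is sensitive to scale.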


## naive.md

      
BastinRobin / naive.md (last active March 24, 2021 04:48): Implement Naive Bayes

Implement Naive Bayes Classifier

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.


Use the above dataframe as a reference and build a Naive Bayes classifier in Python. Follow the guidelines.

Build a production-ready classifier that follows the API interfaces.
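
The dataframe and the API interfaces are not included in this excerpt. As a minimal sketch, assuming numeric features and a scikit-learn-style `fit`/`predict` interface, a Gaussian Naive Bayes classifier could look like the following. The class name `GaussianNaiveBayes` and its attributes are hypothetical. Under the conditional independence assumption, each class is scored by its log prior plus the sum of per-feature Gaussian log-likelihoods.

```python
import numpy as np


class GaussianNaiveBayes:
    """Gaussian Naive Bayes with an assumed fit/predict interface."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors P(c), stored as logs.
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        # Per-class, per-feature mean and variance, shape (n_classes, n_features).
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A small constant avoids division by zero for constant features.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.mean_[None, :, :]
        # Sum of per-feature Gaussian log-densities, shape (n_samples, n_classes).
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None]).sum(axis=2)
        return self.classes_[(log_lik + self.log_prior_).argmax(axis=1)]
```

Working in log space avoids underflow when many small per-feature probabilities are multiplied together.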


## mds.md

      
BastinRobin / mds.md (last active March 17, 2021 08:36): Multi-Dimensional Scaling

Singular Value Decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix that generalizes the eigendecomposition of a square normal matrix to any m × n matrix via an extension of the polar decomposition.
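
As a small illustration, using `numpy.linalg.svd` on an arbitrary 3 × 2 matrix chosen only for this example, the factors reproduce the original matrix, and the squared singular values match the eigenvalues of AᵀA:

```python
import numpy as np

# Arbitrary example matrix (not from the original text).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Reduced SVD: U is 3x2, s holds the 2 singular values, Vt is 2x2.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the factors back together recovers A.
A_rebuilt = U @ np.diag(s) @ Vt

# Link to the eigendecomposition: s**2 are the eigenvalues of A.T @ A.
eigvals = np.linalg.eigvalsh(A.T @ A)
```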

Multidimensional Scaling

Multidimensional scaling (MDS) is a technique for visualizing the (dis)similarity among objects, typically in 2-dimensional space. Objects are placed so that the geometric distance between two points in the 2D plane reflects how dissimilar the corresponding objects are: the closer the points, the more similar the objects.
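
The text does not say which MDS variant is intended. As a sketch, assuming classical (Torgerson) MDS on a matrix of pairwise Euclidean distances, the embedding can be computed from an eigendecomposition of the double-centered squared-distance matrix. The function name `classical_mds` is illustrative.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Classical MDS: embed points so Euclidean distances approximate D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before taking the root.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

The resulting coordinates are unique only up to rotation and reflection, so only the distances between points are meaningful, not the axes themselves.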

  
## pca.md

      
BastinRobin / pca.md (created March 10, 2021 08:36): PCA

PCA

There is no pca() function in NumPy, but we can calculate a Principal Component Analysis step by step using NumPy functions.

The example below defines a small 3×2 matrix, centers the data in the matrix, calculates the covariance matrix of the centered data, and then computes the eigendecomposition of the covariance matrix. The eigenvectors are taken as the principal components, the eigenvalues give the variance explained along each component, and the eigenvectors are used to project the original data.

    from numpy import array
    from numpy import mean
    from numpy import cov
  
## PCA.py

## PCA

There is no pca() function in NumPy, but we can calculate a Principal Component Analysis step by step using NumPy functions.

The example below defines a small 3×2 matrix, centers the data in the matrix, calculates the covariance matrix of the centered data, and then computes the eigendecomposition of the covariance matrix. The eigenvectors are taken as the principal components, the eigenvalues give the variance explained along each component, and the eigenvectors are used to project the original data.


    from numpy import array
    from numpy import mean
    from numpy import cov
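
The listing stops after the imports. A minimal sketch of the steps described above might look like this; the matrix values are an assumption chosen for illustration, and `eigh` is used because the covariance matrix is symmetric.

```python
from numpy import array, mean, cov, argsort
from numpy.linalg import eigh

# Small 3x2 matrix: 3 samples, 2 features (illustrative values).
A = array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Center each column on its mean.
M = mean(A, axis=0)
C = A - M

# Covariance matrix of the centered data (features are columns).
V = cov(C, rowvar=False)

# Eigendecomposition; sort components by decreasing eigenvalue.
values, vectors = eigh(V)
order = argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

# Project the centered data onto the principal components.
P = C @ vectors
```

For this matrix both columns vary together, so the first component carries all of the variance and the second eigenvalue is zero up to rounding.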

## Regression.md

      
BastinRobin / Regression.md (last active March 3, 2021 05:28): Regression Analysis Lab

Regression Analysis

This exercise is designed to build your confidence in creating multiple regression models with spreadsheets.

The dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts. It was obtained from the StatLib archive and has been used extensively throughout the literature to benchmark algorithms. However, these comparisons were primarily done outside of Delve and are thus somewhat suspect. The dataset is small, with only 506 cases.

1. Download the dataset using this link: https://gofile.io/d/YZUprY
2. Use MS Excel to build a multiple regression model.
3. Audit the regression summary and eliminate features that are not significant.
4. Predict the house price, MEDV.
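
The exercise itself is meant to be done in Excel. Purely as a way to cross-check a spreadsheet's regression summary, the sketch below computes ordinary least squares coefficients and their t-statistics with NumPy. The function name `ols_summary` is hypothetical, and the data is synthetic, not the Boston housing data.

```python
import numpy as np


def ols_summary(X, y):
    """Return OLS coefficients (intercept first) and their t-statistics."""
    X = np.column_stack([np.ones(len(X)), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof                       # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se


# Synthetic data: y depends on the first feature only; the second is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
beta, t_stats = ols_summary(X, y)
```

A feature whose t-statistic is small in absolute value (roughly below 2 at the 5% level) is a candidate for removal, which mirrors the significance check in the audit step.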


## sample.js
let all_features = ["sex_male", "passengerid", "fare", "age", "sex_female", "pclass", "parch", "sibsp", "cabin_missing", "embarked_C", "embarked_S", "cabin_E25", "cabin_C22 C26", "cabin_E24", "embarked_Q", "cabin_B96 B98", "cabin_E12", "cabin_C106", "cabin_C104", "cabin_D45", "cabin_E17", "cabin_E10", "cabin_C52", "cabin_A23", "cabin_B20", "cabin_E8", "cabin_C126", "cabin_E77", "cabin_E121", "cabin_D35", "cabin_D19", "cabin_C148", "cabin_C92", "cabin_C86", "cabin_B49", "cabin_B50", "cabin_C70", "cabin_A26", "cabin_F2", "cabin_C47", "cabin_B82 B84", "cabin_B41", "cabin_E50", "cabin_D49", "cabin_A10", "cabin_A34", "cabin_C111", "cabin_G6", "cabin_A20", "cabin_D33", "cabin_E101", "cabin_B37", "cabin_C124", "cabin_C65", "cabin_D48", "cabin_B71", "cabin_B51 B53 B55", "cabin_C68", "cabin_B101", "cabin_C82", "cabin_C118", "cabin_F33", "cabin_D", "cabin_B77", "cabin_C2", "cabin_B5", "cabin_C23 C25 C27", "cabin_C46", "cabin_B18", "cabin_B57 B59 B63 B66", "cabin_D30", "cabin_B22", "cabin_E67", "cabin_C91", "cabin_E44",

## sample.py
# Using a Python dictionary to act as an adjacency list
graph = {
    'A' : ['B','C'],
    'B' : ['D', 'E'],
    'C' : ['F'],
    'D' : [],
    'E' : ['F'],
    'F' : []
}
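
The file ends after the adjacency list. As a sketch of how such a dictionary is typically used, here is a breadth-first traversal; choosing BFS is an assumption, since the original does not show which traversal follows.

```python
from collections import deque

# Same adjacency list as above, repeated so the example is self-contained.
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [],
    'E': ['F'],
    'F': []
}


def bfs(graph, start):
    """Return nodes in breadth-first order starting from `start`."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D', 'E', 'F']
```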

## Imputer.py
import numpy
import pandas

from sklearn.base import TransformerMixin

class SeriesImputer(TransformerMixin):

    def __init__(self):
        """Impute missing values."""

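
The class body is cut off in this listing, so the imputation rule is not known. As a sketch under the assumption that numeric Series are filled with their mean and all other Series with their most frequent value, `fit` and `transform` could be written as follows. The attribute name `fill_` is hypothetical.

```python
import pandas as pd
from sklearn.base import TransformerMixin


class SeriesImputer(TransformerMixin):
    """Impute missing values in a single pandas Series (assumed rule, see above)."""

    def fit(self, X, y=None):
        if pd.api.types.is_numeric_dtype(X):
            # Numeric data: fill with the mean of the observed values.
            self.fill_ = X.mean()
        else:
            # Anything else: fill with the most frequent observed value.
            self.fill_ = X.mode().iloc[0]
        return self

    def transform(self, X, y=None):
        return X.fillna(self.fill_)


ages = pd.Series([1.0, None, 3.0])
filled = SeriesImputer().fit(ages).transform(ages)
```

Learning the fill value in `fit` and applying it in `transform` lets the value computed on training data be reused on test data.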