Focusing on A.I., ML and Deep Learning

Vytautas Bielinskas vb100

@vb100
vb100 / automation_selenium_data_gathering_writing.py
Last active July 2, 2018 13:57
This Python code gathers numerical data from a statistical data website with Selenium, parses the tables, structures the data, and writes all values directly to an Excel file, including cell formatting (OpenXlsX library).
# Import libraries
import os
import re
import pandas as pd
import requests
from bs4 import BeautifulSoup
print("Starting sheet: Inflation rates")
""" Prepare Home directory : start """
os.chdir("C:\\Users\\Vytautas.Bielinskas\\Desktop\\PythonWorking\\Python\\")
@vb100
vb100 / automation_data_structuring_writing.py
Last active July 2, 2018 13:53
This Python code reads a CSV file, automatically recognizes real estate properties, finds the specific cells in an Excel file where values should be written, applies all Excel formatting, and builds all necessary formulas with a single click. Half a day's work done in half a minute.
# -*- coding: utf-8 -*-
""" Project Jupyter - extract data from PDF by Vytautas"""
""" Importing libraries """
import pandas as pd
import os
""" Reading Dataset file """
os.chdir("C:\\Users\\Vytautas.Bielinskas\\Desktop\\Python\\")
DF = pd.read_csv("tabula-Statement (BULK) 02 July.csv", header=None)
@vb100
vb100 / Comps_identifiactor_MachineLearning.py
Created June 27, 2018 13:22
This is one of the biggest machine learning projects made 100% by me. This module reads a periodically updated training set, analyzes it by performing hyperparameter tuning for Decision Tree/Random Forest, and sets the best hyperparameters on the classifier. It then calculates the probability that a property is a comp, constructs a pandas DataFrame …
# -*- coding: utf-8 -*-
"""
Created on Thu Jun 21 14:26:09 2018
@author: Vytautas.Bielinskas
Definitions:
JN - Jupyter Notebook
ML - Machine learning
BOW - Bag Of Words
@vb100
vb100 / XGBoost_example.py
Last active June 20, 2018 12:27
Simple readable XGBoost Example
# -*- coding: utf-8 -*-
# Full instructions at: https://cambridgespark.com/content/tutorials/getting-started-with-xgboost/index.html
# Date: 20180620
#------------------------------------------------------------------------------
# Use Pandas to load the data in a dataframe
import pandas as pd
df = pd.read_excel('default of credit card clients.xls', header = 1, index_col = 0)
print('The shape of dataframe is {}.'.format(df.shape))
@vb100
vb100 / SelectingDataFromTable:_RawSQL.py
Created June 18, 2018 21:47
Selecting data from a Table: raw SQL
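# Wrap raw SQL in text(); SQLAlchemy 2.0 no longer accepts plain strings in execute()
from sqlalchemy import text
# Build select statement for census table: stmt
stmt = text('SELECT * FROM census')
# Execute the statement and fetch the results: results
results = connection.execute(stmt).fetchall()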
# Print Results
print(results)
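The snippet above assumes an already open `connection` and an existing `census` table. A minimal self-contained sketch of the same raw SQL select follows. It swaps SQLAlchemy for the standard library `sqlite3` module and uses an in-memory table, so the column names and rows are invented for illustration and are not the real census data.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
connection = sqlite3.connect(":memory:")

# Hypothetical census table; columns and rows are made up for illustration.
connection.execute(
    "CREATE TABLE census (state TEXT, sex TEXT, age INTEGER, pop2000 INTEGER)"
)
connection.executemany(
    "INSERT INTO census VALUES (?, ?, ?, ?)",
    [("Illinois", "M", 0, 89600), ("Illinois", "F", 0, 85910)],
)

# Build select statement for census table: stmt
stmt = "SELECT * FROM census"

# Execute the statement and fetch the results: results
results = connection.execute(stmt).fetchall()

# Each row comes back as a tuple in column order.
print(results)
connection.close()
```

With SQLAlchemy the shape is the same: build the statement, execute it on a connection, and call `fetchall()` to materialize the rows.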
@vb100
vb100 / ViewingTableDetails.py
Last active June 18, 2018 21:29
Viewing Table Details
# Reflect the census table from the engine: census
# (autoload_with alone triggers reflection; autoload=True was removed in SQLAlchemy 2.0)
census = Table('census', metadata, autoload_with=engine)
# Print the column names
print(census.columns.keys())
# Print full table metadata
print(repr(metadata.tables['census']))
@vb100
vb100 / AutoloadingTablesfromDatabase.py
Created June 18, 2018 21:11
Autoloading Tables from a Database
# Import Table
from sqlalchemy import Table
# Reflect census table from the engine: census
# (autoload_with alone triggers reflection; autoload=True was removed in SQLAlchemy 2.0)
census = Table('census', metadata, autoload_with=engine)
# Print census table metadata
print(repr(census))
@vb100
vb100 / BinomialPoisson.py
Created June 14, 2018 22:10
Relationship between Binomial and Poisson distributions. You just heard that the Poisson distribution is a limit of the Binomial distribution for rare events.
import numpy as np
# Draw 10,000 samples out of Poisson distribution: samples_poisson
samples_poisson = np.random.poisson(10, size=10000)
# Print the mean and standard deviation
print('Poisson: ', np.mean(samples_poisson),
      np.std(samples_poisson))
# Specify values of n and p to consider for Binomial: n, p
n = [20, 100, 1000]
p = [0.5, 0.1, 0.01]
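Each (n, p) pair above has n * p = 10, the same mean as the Poisson samples. The sketch below illustrates the limit using the closed-form standard deviations instead of random samples, so the output is deterministic. The helper names `binomial_std` and `poisson_std` are illustrative and do not come from the original gist.

```python
import math

def binomial_std(n, p):
    """Standard deviation of a Binomial(n, p) distribution."""
    return math.sqrt(n * p * (1 - p))

def poisson_std(lam):
    """Standard deviation of a Poisson(lam) distribution."""
    return math.sqrt(lam)

n = [20, 100, 1000]
p = [0.5, 0.1, 0.01]

# Every pair has the same mean n * p = 10, so only the spread differs.
for n_i, p_i in zip(n, p):
    print('n =', n_i, 'Binom mean:', n_i * p_i,
          'std:', round(binomial_std(n_i, p_i), 3))

print('Poisson std:', round(poisson_std(10), 3))
```

As n grows and p shrinks with the mean held fixed, the factor (1 - p) approaches 1, so the Binomial standard deviation approaches the Poisson value sqrt(10).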
@vb100
vb100 / pearson_corr.py
Created June 13, 2018 21:05
Pearson Correlation Coefficient R
import numpy as np

def pearson_r(x, y):
    """Compute Pearson correlation coefficient between two arrays."""
    # Compute correlation matrix: corr_mat
    corr_mat = np.corrcoef(x, y)
    # Return entry [0,1]
    return corr_mat[0, 1]
# Compute Pearson correlation coefficient for I. versicolor: r
r = pearson_r(versicolor_petal_length, versicolor_petal_width)
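`np.corrcoef(x, y)` returns a 2x2 matrix whose off-diagonal entry `[0, 1]` is the Pearson coefficient: the covariance of x and y divided by the product of their standard deviations. The sketch below writes that definition out in plain Python, which can serve as a cross-check. The name `pearson_r_manual` and the sample values are illustrative assumptions, not the iris measurements.

```python
import math

def pearson_r_manual(x, y):
    """Pearson correlation from its definition: cov(x, y) / (std(x) * std(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up petal lengths and widths in cm, for illustration only.
lengths = [4.7, 4.5, 4.9, 4.0, 4.6]
widths = [1.4, 1.5, 1.5, 1.3, 1.5]
print(pearson_r_manual(lengths, widths))
```

This version does not handle constant input (zero variance), where the ratio is undefined.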
@vb100
vb100 / compareECDFtoPercentiles.py
Created June 12, 2018 10:48
Comparing percentiles to ECDF. To see how the percentiles relate to the ECDF, you will plot the percentiles of Iris versicolor petal lengths you calculated in the last exercise on the ECDF plot you generated in chapter 1. The percentile variables from the previous exercise are available in the workspace as ptiles_vers and percentiles. Note that t…
import matplotlib.pyplot as plt
# Plot the ECDF
_ = plt.plot(x_vers, y_vers, '.')
plt.margins(0.02)
_ = plt.xlabel('petal length (cm)')
_ = plt.ylabel('ECDF')
# Overlay percentiles as red diamonds.
_ = plt.plot(ptiles_vers, percentiles/100, marker='D', color='red',
             linestyle='none')
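The plot above relies on `x_vers` and `y_vers`, the sorted data and their ECDF heights, which come from an earlier exercise and are not shown here. The sketch below shows how such values are typically produced, assuming the usual ECDF definition in which the i-th smallest of n points gets height i/n. The function name `ecdf` and the sample data are assumptions, not taken from the gist.

```python
def ecdf(data):
    """Return sorted values x and ECDF heights y, where y[i] = (i + 1) / n."""
    n = len(data)
    x = sorted(data)
    y = [(i + 1) / n for i in range(n)]
    return x, y

# Made-up petal lengths in cm, for illustration only.
petal_lengths = [4.7, 4.5, 4.9, 4.0, 4.6]
x_vers, y_vers = ecdf(petal_lengths)

for value, height in zip(x_vers, y_vers):
    print(value, height)
```

Since the p-th percentile is the value below which roughly p% of the data fall, plotting percentile values against `percentiles/100` should place the markers on or near this curve.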