Rediscovering the joy of solving problems and building applications

Suriyadeepan Ramamoorthy suriyadeepan

suriyadeepan / wiki_images.py
Created August 26, 2016 06:09
Scrape images from a wiki page using Beautiful Soup
from bs4 import BeautifulSoup
import requests

url = 'https://en.wikipedia.org/wiki/Transhumanism'
# fetch the raw HTML of the page
content = requests.get(url).content
# parse it with the lxml parser (requires the lxml package)
soup = BeautifulSoup(content, 'lxml')
# collect every <img ...> tag on the page
image_tags = soup.find_all('img')
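The `src` values on a page's `<img>` tags are often protocol-relative or site-relative rather than full URLs, so they usually need to be resolved against the page URL before they can be downloaded. A minimal sketch of that step using only the standard library follows; the helper name `absolute_image_urls` and the example `src` values are hypothetical and not part of the gist.

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve raw <img src> values against the page they came from."""
    urls = []
    for src in srcs:
        if not src:  # skip tags that have no usable src attribute
            continue
        urls.append(urljoin(page_url, src))
    return urls


# Hypothetical src values, for illustration only
example_srcs = ['//upload.wikimedia.org/example.png', '/static/images/logo.png', None]
print(absolute_image_urls('https://en.wikipedia.org/wiki/Transhumanism', example_srcs))
```

With the variables from the snippet above, the input list would be `[tag.get('src') for tag in image_tags]`.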
[
  {
    "date": "2014-01-01",
    "value": 190000000
  },
  {
    "date": "2014-01-02",
    "value": 190379978
  },
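def create_feat_soil_type(df):
    # one-hot soil columns are named like 'Soil_Type<N>'
    soil_type_cols = [col for col in df.columns if 'Soil_Type' in col]
    # name of the column holding each row's maximum, i.e. the active soil type
    soil_type = df[soil_type_cols].idxmax(axis=1)
    # drop the 9-character 'Soil_Type' prefix and keep the numeric suffix
    return soil_type.apply(lambda x: int(x[9:]))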
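To see what this function does, it may help to run it on a tiny frame. The sketch below repeats the function so it is self-contained and applies it to a toy DataFrame; the column names follow the `Soil_Type<N>` pattern the code implies, and all values are made up for illustration.

```python
import pandas as pd


def create_feat_soil_type(df):
    soil_type_cols = [col for col in df.columns if 'Soil_Type' in col]
    soil_type = df[soil_type_cols].idxmax(axis=1)
    return soil_type.apply(lambda x: int(x[9:]))


# Toy data with made-up values: each row has exactly one active soil column
toy = pd.DataFrame({
    'Elevation': [2596, 2590, 2804],
    'Soil_Type1': [1, 0, 0],
    'Soil_Type2': [0, 0, 1],
    'Soil_Type3': [0, 1, 0],
})

soil_type = create_feat_soil_type(toy)
print(soil_type.tolist())  # [1, 3, 2]
```

The result collapses the one-hot columns into a single integer feature, one value per row.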
suriyadeepan / amazon_src.json
Created November 17, 2020 14:08
Amazon source
{
src : `
<!doctype html><html lang="en-in" class="a-no-js" data-19ax5a9jf="dingo"><!-- sp:feature:head-start -->
<head><script>var aPageStart = (new Date()).getTime();</script><meta charset="utf-8"/>
<script type='text/javascript'>var ue_t0=ue_t0||+new Date();</script>
<!-- sp:feature:cs-optimization -->
<meta http-equiv='x-dns-prefetch-control' content='on'>
<link rel="dns-prefetch" href="https://images-eu.ssl-images-amazon.com">
<link rel="dns-prefetch" href="https://m.media-amazon.com">
suriyadeepan / hyp_gen_data_science_pipeline.md
Last active November 16, 2020 10:06
Blog: Hypothesis Generation
| # | Stage | Description |
|---|-------|-------------|
| 1 | Hypothesis Generation | Study the business problem. Build a conceptual model by developing a deeper understanding of the problem and domain. Generate Hypotheses. |
| 2 | Data Collection | Go out in the wild and collect data based on the generated hypotheses. |
| 3 | Study the variables | Identify potential predictors using data visualization |
| 4 | Data Preparation | Clean the data. Fill in missing data points. Scale, normalize and transform data as necessary. |
| 5 | Bivariate/Multivariate Analysis | Test the hypotheses you've generated earlier. Choose predictors based on correlation with target. |
| 6 | Data Transformation | Perform non-linear transformations (log) on variables to fish out non-l |
suriyadeepan / model_factors.md
Created November 16, 2020 09:39
Blog: Hypothesis Generation (3)
| Factor | Type | Hypothesis |
|--------|------|------------|
| Age | Personal | A person above the age of 35 is more likely to stay than a younger person |
| Occupation | Professional | A person with a steady income is more likely to stay than a person who is in between jobs |
| Married | Relationship | A married person is more likely to stay than an unmarried person |
| Children | Relationship | People with children are more likely to stay than people without |
| Type of Service | Company | Location-based services could lead to higher churn rate if your customer base is young and unmarried |
| Monthly Charges | Company | High monthly charges may lead to termination of subscription |
| Period of Subscription | Company | Longer subscription period indicates loyal customers |
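Each hypothesis in this table can be turned into a simple comparison once data is available. As a rough sketch, the helper below computes the share of customers who stayed for each value of a factor, which is one way to check the "Married" hypothesis. The function name `stay_rate_by_group`, the field names, and the records are all hypothetical and made up for illustration; they are not from the original post.

```python
from collections import defaultdict


def stay_rate_by_group(records, factor):
    """Return {factor value: share of customers who stayed}."""
    stayed = defaultdict(int)
    total = defaultdict(int)
    for row in records:
        total[row[factor]] += 1
        stayed[row[factor]] += int(row['stayed'])
    return {value: stayed[value] / total[value] for value in total}


# Made-up records for illustration only; not real customer data
customers = [
    {'married': True, 'stayed': True},
    {'married': True, 'stayed': True},
    {'married': True, 'stayed': False},
    {'married': False, 'stayed': True},
    {'married': False, 'stayed': False},
    {'married': False, 'stayed': False},
]

rates = stay_rate_by_group(customers, 'married')
print(rates)
```

A higher stay rate in the married group would be consistent with the hypothesis, though a real analysis would also need enough data and a significance test.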
suriyadeepan / model.md
Created November 16, 2020 09:37
Blog: Hypothesis Generation (2)
| Factor | Type | Comments |
|--------|------|----------|
| Age | Personal | |
| Occupation | Professional | |
| Married | Relationship | |
| Children | Relationship | |
| Type of Service | Company | |
| Monthly Charges | Company | |
| Period of Subscription | Company | |
suriyadeepan / identify_target.md
Created November 16, 2020 09:28
Blog: Hypothesis Generation (1)
| Question | Target Variable | Type |
|----------|-----------------|------|
| What sorts of people were more likely to survive? | Survival | Binary (Yes or No) |
| Predict (What is) the final price of each home. | Housing Price | Numerical (in USD) |
| Predict (What is) the probability that a driver will initiate an auto insurance claim in the next year. | Insurance Claim | Probability (Ranged [0-1] numerical) |
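The three target types in this table can be told apart, very roughly, from a sample of target values. The sketch below is a simple heuristic written for illustration; the function name `target_type` and its rules are assumptions, not something taken from the original post, and real data would need more careful checks.

```python
def target_type(values):
    """Guess a target's type from sample values (rough heuristic only)."""
    distinct = set(values)
    # Two or fewer labels such as Yes/No, True/False or 0/1 suggest a binary target
    if len(distinct) <= 2 and all(
        isinstance(v, (bool, str)) or v in (0, 1) for v in distinct
    ):
        return 'binary'
    # Floats confined to [0, 1] suggest a probability
    if all(isinstance(v, float) and 0.0 <= v <= 1.0 for v in distinct):
        return 'probability'
    # Anything else is treated as a plain numerical target
    return 'numerical'


# Made-up sample values for illustration
print(target_type(['Yes', 'No', 'Yes']))
print(target_type([208500, 181500, 223500]))
print(target_type([0.12, 0.8, 0.33]))
```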