Kenneth Leung kennethleungty

@kennethleungty
kennethleungty / condo_rental_random_forest.py
Last active January 15, 2021 05:48
Random Forest Regressor Code for Condo Rental Prediction
from sklearn.ensemble import RandomForestRegressor

# Create the parameter grid for GridSearchCV
rf_param_grid = {
    'max_depth': [80, 90, 100],        # Maximum number of levels in each decision tree
    'max_features': [2, 3],            # Maximum number of features considered when splitting a node
    'min_samples_leaf': [1, 3, 4, 5],  # Minimum number of data points allowed in a leaf node
    'n_estimators': [100, 300, 600]    # Number of trees in the forest
}
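A grid like this is typically passed to scikit-learn's GridSearchCV together with the estimator. The sketch below shows that pattern on synthetic data with a deliberately reduced grid so it runs quickly; the data, grid values, and scoring choice are illustrative, not from the original gist:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the condo rental features and target
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=42)

# Reduced grid for illustration; the full grid above is used the same way
param_grid = {
    'max_depth': [3, 5],
    'min_samples_leaf': [1, 3],
    'n_estimators': [50, 100],
}

grid_search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    cv=3,                                     # 3-fold cross-validation
    scoring='neg_root_mean_squared_error',    # Higher (closer to 0) is better
    n_jobs=-1,
)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

After fitting, `grid_search.best_estimator_` is the refitted model with the best parameter combination.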
@kennethleungty
kennethleungty / condo_rental_xgboost_regressor.py
Last active January 15, 2021 05:47
XGBoost Regressor Code for Condo Rental Prediction
import xgboost as xgb

# Set up the XGBoost hyperparameter grid
xgb_param_grid = {
    'learning_rate': [0.05, 0.1, 0.2],        # Step size shrinkage used in updates to prevent overfitting
    'max_depth': [6, 8, 9, 10],               # Maximum depth of a tree
    'min_child_weight': [1, 3, 5, 7],         # Minimum sum of instance weight needed in a child node
    'gamma': [0.0, 0.1, 0.2, 0.3],            # Minimum loss reduction required to make a further partition on a leaf node
    'colsample_bytree': [0.3, 0.4, 0.6, 0.8]  # Fraction of features (columns) sampled for each tree
}
@kennethleungty
kennethleungty / condo_rental_lightgbm_regressor.py
Created January 15, 2021 05:47
LightGBM Regressor Code for Condo Rental Prediction
import lightgbm as lgb

# Set up the LightGBM hyperparameter grid
gbm_param_grid = {
    'metric': ['rmse'],                       # Evaluation metric
    'max_depth': [9, 10, 11, 12, 13],         # Maximum depth of a tree
    'bagging_fraction': [0.8, 0.9, 1],        # Fraction of data sampled (bagged) per iteration
    'feature_fraction': [0.8, 0.9, 1],        # Fraction of features sampled per tree
    'min_data_in_leaf': [20, 50, 80],         # Minimum number of data points in a leaf
    'learning_rate': [0.01, 0.05, 0.1, 0.2]   # Step size shrinkage
}
@kennethleungty
kennethleungty / twitter_api_auth_tweepy.py
Created January 22, 2021 02:40
Twitter API Authentication with Tweepy
import tweepy

# Twitter API credentials (replace with your own)
api_key = 'your_api_key_here'
api_key_secret = 'your_api_key_secret_here'
access_token = 'your_access_token_here'
access_token_secret = 'your_access_token_secret_here'

# Authenticate with OAuth 1.0a and create the API client
auth = tweepy.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
@kennethleungty
kennethleungty / tweet_preprocessor.py
Created January 22, 2021 14:59
Twitter Tweet Pre-Processor
# Import tweet-preprocessor package
import preprocessor as p
# Clean tweet text with tweet-preprocessor
tweets_df['text_cleaned'] = tweets_df['text'].apply(lambda x: p.clean(x))
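If the tweet-preprocessor package is not available, the core of what `p.clean` does for this use case (stripping URLs and @mentions) can be approximated with a small regex-based helper. The function name and patterns below are illustrative and not part of the library, which additionally handles hashtags, emojis, smileys, and reserved words:

```python
import re

# Minimal stand-in for tweet-preprocessor's p.clean:
# strips URLs and @mentions, then collapses leftover whitespace
def clean_tweet(text):
    text = re.sub(r'https?://\S+', '', text)  # Remove URLs
    text = re.sub(r'@\w+', '', text)          # Remove @mentions
    return ' '.join(text.split())             # Normalize whitespace

print(clean_tweet('Great view from the condo! @agent https://example.com/listing'))
# → Great view from the condo!
```

Applied the same way as above: `tweets_df['text'].apply(clean_tweet)`.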
@kennethleungty
kennethleungty / twitter_nltk_vader.py
Last active January 24, 2021 05:14
NLTK VADER for Tweets
import nltk
nltk.download('vader_lexicon') # Download the VADER lexicon
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Initialize sentiment intensity analyzer
sia = SentimentIntensityAnalyzer()
# Obtaining NLTK scores
tweets_df['nltk_scores'] = tweets_df['text_cleaned'].apply(lambda x: sia.polarity_scores(x))
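`polarity_scores` returns a dict with `neg`, `neu`, `pos`, and a normalized `compound` score in [-1, 1]. A small helper can turn that dict into a sentiment label using the ±0.05 cut-offs commonly used with VADER; the function name, thresholds, and sample dict below are illustrative:

```python
# Map a VADER scores dict to a sentiment label via the compound score.
# The ±0.05 thresholds follow a common VADER convention, not a hard rule.
def vader_label(scores, neutral_thresh=0.05):
    compound = scores['compound']
    if compound >= neutral_thresh:
        return 'Positive'
    if compound <= -neutral_thresh:
        return 'Negative'
    return 'Neutral'

sample_scores = {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.6369}
print(vader_label(sample_scores))  # → Positive
```

With the dataframe above, labels could then be derived via `tweets_df['nltk_scores'].apply(vader_label)`.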
@kennethleungty
kennethleungty / twitter_textblob.py
Last active January 24, 2021 05:13
Textblob for Tweets
from textblob import TextBlob
# Obtain polarity scores generated by TextBlob
tweets_df['textblob_score'] = tweets_df['text_cleaned'].apply(lambda x: TextBlob(x).sentiment.polarity)
# Set threshold to define neutral sentiment
neutral_thresh = 0.05

# Convert polarity score into sentiment categories
tweets_df['textblob_sentiment'] = tweets_df['textblob_score'].apply(
    lambda c: 'Positive' if c >= neutral_thresh
    else ('Negative' if c <= -neutral_thresh else 'Neutral')
)
@kennethleungty
kennethleungty / twitter_corenlp.py
Last active January 23, 2021 05:00
Stanford CoreNLP for Tweets
from pycorenlp import StanfordCoreNLP

# Requires a CoreNLP server running locally on port 9000
nlp = StanfordCoreNLP('http://localhost:9000')

# Function to obtain the sentiment score (value) of the first sentence of a text
def get_sentiment_score(text):
    output = nlp.annotate(text, properties={
        'annotators': 'sentiment',
        'outputFormat': 'json',
        'timeout': 100000
    })
    # sentimentValue is a string from '0' (very negative) to '4' (very positive)
    return output['sentences'][0]['sentimentValue']

corenlp_senti_scores = []
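CoreNLP's `sentimentValue` is a five-point scale returned as a string. A small lookup can convert it into a readable label before appending to `corenlp_senti_scores`; the mapping table and helper name below are illustrative, not part of pycorenlp:

```python
# CoreNLP sentimentValue is a string on a five-point scale.
# This mapping and helper are illustrative, not part of the library.
SENTIMENT_LABELS = {
    '0': 'Very Negative',
    '1': 'Negative',
    '2': 'Neutral',
    '3': 'Positive',
    '4': 'Very Positive',
}

def to_sentiment_label(sentiment_value):
    return SENTIMENT_LABELS[sentiment_value]

print(to_sentiment_label('3'))  # → Positive
```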
@kennethleungty
kennethleungty / twitter_stanza.py
Last active April 17, 2022 14:28
Stanza for Tweets
import stanza

stanza.download('en')
nlp = stanza.Pipeline(lang='en', processors='tokenize,sentiment')

# Obtain the (average) sentiment score generated by Stanza for each tweet;
# each sentence is scored 0 (negative), 1 (neutral) or 2 (positive)
def stanza_analyze(text):
    document = nlp(text)
    sentence_scores = [sentence.sentiment for sentence in document.sentences]
    return sum(sentence_scores) / len(sentence_scores)
import bar_chart_race as bcr

bcr.bar_chart_race(df=bcr_df,           # Input formatted (wide) dataframe
                   n_bars=10,           # Show 10 bars
                   sort='desc',         # Sort in descending order (highest revenue at top)
                   title='Top 10 Fortune 500 (Global) Companies (1995-2020)',
                   filename='Top 10 Fortune 500 (Global) Companies (1995-2020).mp4',
                   period_length=1600,  # Duration (ms) of the animation for each time period
                   bar_label_size=6, tick_label_size=6,
                   steps_per_period=70) # Adjust animation smoothness