Graciela Carrillo gracecarrillo

import geopandas as gpd  # libspatialindex needs to be installed first
import json  # library to handle JSON files
# Import the Edinburgh neighbourhood boundary GeoJSON file as a GeoDataFrame
map_df = gpd.read_file(r'/resources/Data_Science_Capstone/neighbourhoods.geojson')
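`geopandas.read_file` turns each GeoJSON Feature into one row: the Feature's `properties` become columns and its geometry goes into the `geometry` column. To show what that input looks like without needing geopandas, here is a minimal sketch that reads a made-up one-feature FeatureCollection with only the `json` module. The `"name"` property, the "Leith" feature and its coordinates are hypothetical; the real `neighbourhoods.geojson` may use different property keys.

```python
import json

# Hypothetical one-feature FeatureCollection shaped like a neighbourhood boundary file.
# GeoJSON positions are [longitude, latitude]; a Polygon ring must end where it starts.
SAMPLE = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Leith"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-3.18, 55.97], [-3.16, 55.97],
                                   [-3.16, 55.98], [-3.18, 55.97]]]}}
  ]
}"""


def feature_names(geojson_text, key="name"):
    """Return the chosen property of every Feature (one per future GeoDataFrame row)."""
    data = json.loads(geojson_text)
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return [feature["properties"].get(key) for feature in data["features"]]


def polygon_bbox(feature):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of a Polygon's exterior ring."""
    ring = feature["geometry"]["coordinates"][0]
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return min(lons), min(lats), max(lons), max(lats)


print(feature_names(SAMPLE))
print(polygon_bbox(json.loads(SAMPLE)["features"][0]))
```

With geopandas installed, the same bounds are available per row as `map_df.bounds`.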
gracecarrillo / model_evaluation-sentiment_analysis_scotref2.ipynb
Created January 26, 2020 17:53
Model_Evaluation-Sentiment_Analysis_Scotref2.ipynb
gracecarrillo / etl_sentiment_analysis_scotref2.ipynb
Created January 26, 2020 18:01
ETL_Sentiment_Analysis_Scotref2.ipynb
gracecarrillo / feature_engineering-sentiment_analysis_scotref2.ipynb
Created January 26, 2020 18:04
Feature_Engineering - Sentiment_Analysis_Scotref2.ipynb
gracecarrillo / models_definition_-_training-sentiment_analysis_scotref2.ipynb
Created January 26, 2020 18:05
Models_Definition_&_Training - Sentiment_Analysis_Scotref2.ipynb
gracecarrillo / model_deployment-sentiment_analysis_scotref2.ipynb
Created January 26, 2020 18:07
Model_Deployment-Sentiment_Analysis_Scotref2.ipynb
gracecarrillo / Preprocessing Pipeline
Last active January 27, 2020 13:51
The objective of this step is to remove noise that contributes little to determining the sentiment of a tweet, such as punctuation, special characters, numbers, and terms that carry little weight in the context of the text.
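The cleaning described above can be sketched with regular expressions. This is a minimal illustration, not the author's `processTweet`: `clean_tweet` and the short `STOP_WORDS` set are hypothetical stand-ins (in practice you would pass NLTK's `stopwords.words('english')`), and dropping one-letter tokens is an added assumption.

```python
import re

# Hypothetical stand-in for NLTK's stopwords.words('english'); swap in the real list in practice
STOP_WORDS = {"a", "an", "and", "are", "for", "i", "in", "is", "it", "of",
              "on", "or", "that", "the", "this", "to", "we", "you"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # links
MENTION_RE = re.compile(r"@\w+")               # @user handles
NON_ALPHA_RE = re.compile(r"[^a-z\s]")         # punctuation, symbols, emoji, digits


def clean_tweet(tweet, stop_words=STOP_WORDS):
    """Return the lower-cased content words of a tweet with noise stripped out."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)  # also drops '#' but keeps the hashtag word
    tokens = text.split()
    # drop stop words and one-letter leftovers, which carry little sentiment on their own
    return [tok for tok in tokens if tok not in stop_words and len(tok) > 1]


print(clean_tweet("Loving the #ScotRef debate!!! @someone https://t.co/xyz 100% agree"))
```

URLs and mentions are removed before the character filter so their fragments do not survive as stray words. Note that `[^a-z\s]` also strips accented letters, a simplification that suits English-language tweets.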
#----------------------------- ETL ---------------------------------------#
# Pipeline-like preprocessing with helper functions
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
stop_words = stopwords.words('english')
# cleaning helper function -----------------------------#
def processTweet(tweet):
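#----------------- FEATURE ENGINEERING ------------------------#
#---------- Sentiment Score with Vader -----------------#
# Assumption: NLTK's VADER port (the standalone vaderSentiment package exposes the same class)
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
# Instantiate Vader
analyser = SentimentIntensityAnalyzer()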
def polarity_scores_all(tweet):
'''
Takes string of text to:
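VADER's `polarity_scores()` returns a dict with `neg`, `neu`, `pos` and `compound` keys. The compound score is the sum of the word valences (after VADER's rule-based adjustments) squashed into (-1, 1) by `x / sqrt(x**2 + alpha)` with `alpha = 15`, and the VADER documentation suggests ±0.05 as the cut-offs for positive and negative. The sketch below reproduces only that final normalisation and labelling step; `label_from_compound` and `label_scores` are hypothetical helpers, not part of the author's `polarity_scores_all`.

```python
import math


def normalize(score, alpha=15):
    """VADER-style squashing of a summed valence score into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a sentiment label using the conventional +/-0.05 cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_scores(scores, threshold=0.05):
    """Label a dict shaped like SentimentIntensityAnalyzer.polarity_scores() output."""
    return label_from_compound(scores["compound"], threshold)


print(normalize(2.0), label_from_compound(normalize(2.0)))
# made-up scores dict, only to show the expected shape
print(label_scores({"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0}))
```

Because the squashing is monotonic, a larger summed valence always yields a larger compound score, but the score saturates: doubling a strongly positive sum barely moves it closer to 1.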
gracecarrillo / Cross-Validation.txt
Created January 27, 2020 15:17
K-fold cross-validation with an exhaustive grid search, using scikit-learn's GridSearchCV.
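An exhaustive grid search tries every combination of hyperparameter values and scores each one by its mean accuracy over k train/test splits. To make the mechanics visible without scikit-learn, the plain-Python sketch below cross-validates a from-scratch multinomial Naive Bayes whose only hyperparameter is the smoothing constant `alpha` (the same name `MultinomialNB` uses). This is not the author's pipeline: raw token counts replace the `features_union(tfidf)` features, and the toy `docs`/`labels` data and all function names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha):
    """Fit multinomial Naive Bayes on token lists with additive smoothing alpha (> 0)."""
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    model = {}
    for c, n_c in class_counts.items():
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_lik = {tok: math.log((token_counts[c][tok] + alpha) / denom) for tok in vocab}
        model[c] = (math.log(n_c / len(labels)), log_lik)
    return model


def predict_mnb(model, doc):
    """Return the class with the highest log posterior; unseen tokens are ignored."""
    def log_posterior(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik[tok] for tok in doc if tok in log_lik)
    return max(model, key=log_posterior)


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(docs, labels, param_grid, k=3):
    """Score every parameter combination by mean k-fold accuracy and return the best."""
    names = sorted(param_grid)
    folds = k_fold_indices(len(docs), k)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(docs)) if i not in held_out]
            model = train_mnb([docs[i] for i in train_idx],
                              [labels[i] for i in train_idx], **params)
            hits = sum(predict_mnb(model, docs[i]) == labels[i] for i in test_idx)
            fold_scores.append(hits / len(test_idx))
        results.append((sum(fold_scores) / k, params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


# Hypothetical toy data: pre-tokenised "tweets" with sentiment labels
docs = [["love", "it"], ["great", "day"], ["happy", "great"],
        ["hate", "it"], ["awful", "day"], ["sad", "awful"],
        ["love", "great"], ["hate", "sad"], ["happy", "day"]]
labels = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"]
best_params, best_score, results = grid_search_cv(docs, labels, {"alpha": [0.1, 0.5, 1.0]})
print(best_params, round(best_score, 3))
```

In scikit-learn the equivalent search would pass a grid keyed by pipeline step, such as `{'clf__alpha': [...]}`, to `GridSearchCV` together with the pipeline and `cv=k`.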
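#----------- CROSS-VALIDATION WITH GRID SEARCH ------------------------#
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Naive Bayes Classifier
# combine features (features_union and tfidf are presumably defined elsewhere in the project; not shown)
features_tfidf = features_union(tfidf)
# instantiate pipeline object
nb_pipeline = Pipeline([('feats', features_tfidf), ('clf', MultinomialNB())])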
#------------- DEEP LEARNING -----------------#
#-----LSTM Network with Keras------#
#--- Parameters----#
# size of the dense vector each token of the input sequence is encoded into
embed_dim = 128
# transforms the vector sequence into a single vector
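In Keras, an `Embedding` layer maps each token id to a dense vector of length `embed_dim`, and an `LSTM` layer with the default `return_sequences=False` returns only its final hidden state, which is how a sequence of vectors becomes a single vector. The dependency-free sketch below walks through that forward pass for one sequence. `TinyLSTM`, `vocab_size` and `units` are hypothetical; the weights are untrained random placeholders and the initialisation does not match Keras.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTM:
    """Embedding lookup followed by a single LSTM layer that returns its final hidden state."""

    def __init__(self, vocab_size, embed_dim, units, seed=0):
        rng = random.Random(seed)
        self.units = units
        self.embedding = random_matrix(vocab_size, embed_dim, rng)  # one row per token id
        # input kernel, recurrent kernel and bias for the input, forget, cell and output gates
        self.W = {g: random_matrix(units, embed_dim, rng) for g in "ifco"}
        self.U = {g: random_matrix(units, units, rng) for g in "ifco"}
        self.b = {g: [0.0] * units for g in "ifco"}

    def __call__(self, token_ids):
        h = [0.0] * self.units  # hidden state
        c = [0.0] * self.units  # cell state
        for token in token_ids:
            x = self.embedding[token]  # dense vector of length embed_dim
            pre = {g: [wx + uh + b for wx, uh, b in
                       zip(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]
                   for g in "ifco"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            candidate = [math.tanh(v) for v in pre["c"]]
            o = [sigmoid(v) for v in pre["o"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, candidate)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h  # a single vector of length `units` summarising the whole sequence


model = TinyLSTM(vocab_size=50, embed_dim=128, units=8)
print(model([3, 17, 42, 5]))
```

Whatever the sequence length, the output always has `units` elements, and each lies in (-1, 1) because it is a sigmoid gate times a `tanh`.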