ivopbernardo

ivopbernardo / nltk_intro.py
Last active November 2, 2022 09:29
Introduction to NLTK Library
# Getting started with NLTK scripts - used in blog post:
# https://towardsdatascience.com/getting-started-with-nltk-eb4ed6eb7a37
from nltk import tokenize
python_wiki = '''
Python is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0.[33] Python 2.0 was released in 2000 and introduced new features such as list comprehensions, cycle-detecting garbage collection, reference counting, and Unicode support. Python 3.0, released in 2008, was a major revision that is not completely
ivopbernardo / decisiontree.R
Last active November 2, 2022 09:30
Data Science Tutorials Blog Post Series: Training a Decision Tree using R
# Training a decision tree in R - used in blog post:
# https://medium.com/codex/data-science-tutorials-training-a-decision-tree-using-r-d6266936d86
library(dplyr)
library(rpart)
library(rpart.plot)
library(caret)
library(Metrics)
library(ggplot2)
ivopbernardo / geoprocess_dd_post.py
Last active March 11, 2022 14:00
Locate your Data and Boost it with Geo-Processing Post
# Getting Latitude and Longitude from Nominatim
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
geocoder = Nominatim(user_agent="FindAddress")
geocode = RateLimiter(
    geocoder.geocode,
    min_delay_seconds=1,
    return_value_on_exception=None
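The call above wraps the geocoder so that successive lookups are spaced at least one second apart and a failed lookup yields a fallback value instead of raising. The sketch below illustrates that idea with the standard library only. It is not geopy's implementation: `rate_limited`, `fake_geocode` and `geocode_demo` are hypothetical names introduced for illustration, and the behaviour is an assumption based on the two parameters shown.

```python
import time
from typing import Any, Callable


def rate_limited(func: Callable[..., Any],
                 min_delay_seconds: float = 1.0,
                 return_value_on_exception: Any = None) -> Callable[..., Any]:
    """Wrap func so calls are spaced out and exceptions are swallowed.

    Hypothetical helper: it only mimics the two parameters used above.
    """
    last_call = [None]  # monotonic timestamp of the previous call

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Sleep for whatever is left of the minimum delay since the last call
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        try:
            return func(*args, **kwargs)
        except Exception:
            # Return the fallback value rather than propagating the error
            return return_value_on_exception

    return wrapper


def fake_geocode(address: str) -> tuple:
    """Stand-in for a real geocoder; fails on an empty address."""
    if not address:
        raise ValueError("empty address")
    return (0.0, 0.0)


# Short delay so the demonstration runs quickly
geocode_demo = rate_limited(fake_geocode, min_delay_seconds=0.1)
```

Calling `geocode_demo("Lisbon")` returns the stand-in coordinates, while `geocode_demo("")` returns `None` after waiting out the delay. Returning a fallback keeps a long batch of lookups running when a single address fails.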
ivopbernardo / xgboostr.r
Last active November 2, 2022 09:31
xgboostr.r
# Training an XGBoost in R - used in blog post:
# https://towardsdatascience.com/data-science-tutorials-training-an-xgboost-using-r-cf3c00b1425
library(dplyr)
library(xgboost)
library(Metrics)
library(ggplot2)
# Load london bike csv
london_bike <- read.csv('./london_merged.csv')
# Training a Random Forest in R - used in blog post:
# https://towardsdatascience.com/data-science-tutorials-training-a-random-forest-in-r-a883cc1bacd1
library(dplyr)
library(randomForest)
library(ranger)
library(Metrics)
# Load london bike csv
london_bike <- read.csv('./london_merged.csv')
ivopbernardo / rf_demo.R
Created February 4, 2022 18:18
Random Forests vs. Decision Trees
# Don't forget to download the train.csv file
# to make this gist work.
# Download it at: https://www.kaggle.com/c/titanic/data?select=train.csv
# You also need to install ROCR and rpart libraries
# Reading the titanic train dataset
titanic <- read.csv('./train.csv')
ivopbernardo / cooccurrence_example.py
Created August 16, 2021 12:49
word_vectors_cooccurrence
import wikipedia
import pandas as pd
import numpy as np
import string
from nltk.tokenize import word_tokenize
from sklearn.metrics.pairwise import cosine_similarity
def retrieve_page(page_name: str) -> list:
    '''
    Retrieves page data from wikipedia
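Judging by the gist name and its imports, the script builds word vectors from co-occurrence counts and compares them with cosine similarity. The sketch below shows that general technique with the standard library only. The toy corpus, the window size of 2, and the function names `cooccurrence_vectors` and `cosine` are assumptions for illustration, not the gist's own code.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up corpus; a real run would use the retrieved page text instead
toy_corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]
vectors = cooccurrence_vectors(toy_corpus)
print(cosine(vectors["cat"], vectors["dog"]))
```

Words that appear in similar contexts, such as "cat" and "dog" here, share many co-occurrence counts and therefore score closer to 1.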
ivopbernardo / stemming_example.py
Last active May 18, 2021 16:51
Examples around NLTK stemming
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer
porter = PorterStemmer()
snowball = SnowballStemmer(language='english')
lanc = LancasterStemmer()
sentence_example = (
    'This is definitely a controversy as the attorney labeled the case "extremely controversial"'
)
ivopbernardo / text_representation.py
Created April 23, 2021 16:10
Python Text Representation
# Import sklearn vectorizers and pandas
import pandas as pd
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfVectorizer
)
# Defining our sentence examples
sentence_list = [
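The two vectorizers imported above turn sentences into term-count vectors and TF-IDF weighted vectors. The sketch below shows the underlying idea with the standard library only. It uses the plain textbook weighting, raw count times log(N / df), whereas scikit-learn's default applies smoothing and L2 normalisation, so its numbers will differ. The sentences in `example_sentences` are made up and are not the gist's own list.

```python
import math
from collections import Counter

example_sentences = [  # made-up sentences, not the gist's own list
    "i like data science",
    "i like text mining",
    "text mining is fun",
]

tokenized = [s.split() for s in example_sentences]
vocabulary = sorted({tok for doc in tokenized for tok in doc})

# Bag-of-words: one row per sentence, one column per vocabulary term
count_matrix = [[Counter(doc)[term] for term in vocabulary] for doc in tokenized]

# Document frequency: number of sentences containing each term
doc_freq = {term: sum(term in doc for doc in tokenized) for term in vocabulary}
n_docs = len(tokenized)

# Textbook TF-IDF: raw count times log(N / df)
tfidf_matrix = [
    [count * math.log(n_docs / doc_freq[term]) for count, term in zip(row, vocabulary)]
    for row in count_matrix
]
```

Terms that occur in only one sentence, such as "data", receive the highest weight, while terms shared by several sentences are weighted down.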
ivopbernardo / cleaning_data.R
Last active January 3, 2021 13:41
Cleaning FBI crime data
# Loading readxl library
library(readxl)
clean_crime_data <- function(path) {
  # Load the Data
  crime_data <- read_xls(path)
  # Assigning colnames
  colnames(crime_data) <- crime_data[3,]