
Jose Luis Fernández Nuevo JLFDataScience

  • FGCSIC
@JLFDataScience
JLFDataScience / Extract_list_component_tickers.py
Created February 21, 2020 11:52
Extract the list of component tickers of the Madrid Stock Exchange
# Extract the list of ticker symbols from the Madrid Stock Exchange components page on the Spanish version of Yahoo Finance
ScrapedAux = sourceCode_tickers.split('components":{"components":')[1].split('],')[0].split(',')[0:-1]
print(ScrapedAux[:10])
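The chained `split` calls above leave quotes and a leading bracket on the entries and drop the last element. As a hedged alternative, the sketch below pulls the same array out with a regular expression and parses it as JSON. It assumes the page embeds the components as a flat JSON array of strings right after the `components":{"components":` marker; the helper name `extract_component_tickers` and the sample fragment are hypothetical and only illustrate the idea.

```python
import json
import re

def extract_component_tickers(page_source):
    """Return the tickers found in the embedded components array (assumed flat)."""
    # Capture the first JSON array that follows the components marker.
    match = re.search(r'"components":\{"components":(\[.*?\])', page_source)
    if match is None:
        return []
    return json.loads(match.group(1))

# Hypothetical fragment imitating the embedded JSON; the tickers are illustrative only.
sample = 'xx"components":{"components":["SAN.MC","BBVA.MC","TEF.MC"],"other":1}xx'
tickers = extract_component_tickers(sample)
print(tickers)
```

If the marker is missing, the function returns an empty list instead of raising an `IndexError`, which is what the `split(...)[1]` approach would do.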
JLFDataScience / sourceCode_tickers.py
Created February 21, 2020 11:06
Fetch the HTML source of the Madrid Stock Exchange components page
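# Decode the response bytes into text; str() on bytes would yield a "b'...'" literal with escaped characters
sourceCode_tickers = urlopen('https://es.finance.yahoo.com/quote/IGBM.MA/components?p=IGBM.MA').read().decode('utf-8')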
print(sourceCode_tickers)
JLFDataScience / List_of_fields_srape.py
Created February 21, 2020 10:54
List of fields to scrape from a company's statistics profile
# List of fields we will scrape
list_of_fields = ['Market Cap', 'Enterprise Value', 'Trailing P/E', 'Forward P/E', 'PEG Ratio', 'Price/Sales', 'Price/Book', 'Enterprise Value/Revenue', 'Enterprise Value/EBITDA', 'Fiscal Year Ends', 'Most Recent Quarter', 'Profit Margin', 'Operating Margin', 'Return on Assets', 'Return on Equity', 'Revenue', 'Revenue Per Share', 'Quarterly Revenue Growth', 'Gross Profit', 'EBITDA', 'Net Income Avi to Common', 'Diluted EPS', 'Quarterly Earnings Growth', 'Total Cash', 'Total Cash Per Share', 'Total Debt', 'Total Debt/Equity', 'Current Ratio', 'Book Value Per Share', 'Operating Cash Flow', 'Levered Free Cash Flow', 'Beta', '52-Week Change', 'S&P500 52-Week Change', '52 Week High', '52 Week Low', '50-Day Moving Average', '200-Day Moving Average', 'Avg Vol (3 month)', 'Avg Vol (10 day)', 'Shares Outstanding', 'Float', '% Held by Insiders', '% Held by Institutions', 'Shares Short', 'Short Ratio', 'Short % of Float', 'Shares Short (prior month)', 'Forward Annual Dividend Rate', '
JLFDataScience / import_libraries_simply_scrape.py
Created February 21, 2020 10:50
Import the libraries for a simple Yahoo Finance scraper without BeautifulSoup or Scrapy
# Import libraries
from urllib.request import urlopen
import pandas as pd
import numpy as np
import time
import re
JLFDataScience / Get_movie_recomendations.py
Created February 12, 2020 15:53
Call the get_movie_recomendations function for user 21
# Build the list of movie titles rated by user 21 so it can be passed to the function.
# The titles come from the movie_title column of the ratings DataFrame.
user_film_21_list = ratings[ratings.user_id==user_21].movie_title.tolist()
recomendations_21_user = get_movie_recomendations(user_film_21_list)
recomendations_21_user.Titulo.head(20)
JLFDataScience / user_21.py
Created February 12, 2020 15:43
Select the ratings of user 21, sorted from highest to lowest
user_21 = 21
ratings[ratings.user_id==user_21].sort_values(by=['rating'], ascending=False)
JLFDataScience / Content_recommendation_movie.py
Created February 12, 2020 15:39
A function that returns the correlation vector for a movie
# Return the correlation vector of one movie against all the others
def get_similar_movie(movie):
    # Correlate the movie columns; the transpose makes each movie one row (variable)
    corr_matrix = np.corrcoef(ratings_matriz.T)
    movie_idx = list(movie_index).index(movie)
    return corr_matrix[movie_idx]
# Return the movies that best match a user's tastes.
# To recommend movies to a user, we take the list of movies they have watched and add up the correlations
# of those movies with all the others, then return the movies with the highest total correlation.
def get_movie_recomendations(user):
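The preview cuts off before the body of `get_movie_recomendations`, so the original implementation is not shown. The sketch below is not that implementation; it only illustrates the approach the comments describe (summing the correlation vectors of the watched movies and ranking the rest). The toy ratings matrix, the function name `recommend_from_watched`, and the `Correlacion` column are assumptions made for illustration; `Titulo` mirrors the column used elsewhere in these snippets.

```python
import numpy as np
import pandas as pd

# Toy ratings matrix (rows: users, columns: movies); values are made up.
ratings_matriz = pd.DataFrame(
    [[5, 4, 0, 1],
     [4, 5, 1, 0],
     [0, 1, 5, 4],
     [1, 0, 4, 5]],
    columns=['Movie A', 'Movie B', 'Movie C', 'Movie D'],
)
movie_index = ratings_matriz.columns
# One row per movie after transposing, so corrcoef yields a movie-by-movie matrix.
corr_matrix = np.corrcoef(ratings_matriz.T)

def recommend_from_watched(watched_titles):
    """Sum the correlation vectors of the watched movies and rank the others."""
    total = np.zeros(len(movie_index))
    for title in watched_titles:
        total += corr_matrix[list(movie_index).index(title)]
    scores = pd.DataFrame({'Titulo': movie_index, 'Correlacion': total})
    # Exclude movies the user has already watched, then sort by total correlation.
    scores = scores[~scores.Titulo.isin(watched_titles)]
    return scores.sort_values('Correlacion', ascending=False).reset_index(drop=True)

recommendations = recommend_from_watched(['Movie A'])
print(recommendations)
```

Computing the correlation matrix once, outside the function, avoids recomputing it for every watched movie, which matters when the ratings matrix is large.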
JLFDataScience / Pivot_ranting_df.py
Created February 12, 2020 15:34
Create a matrix with each user's ratings for all movies by pivoting the table
ratings_matriz = ratings.pivot_table(values='rating', index='user_id', columns='movie_title')
# Fill the NaN values (movies a user has not rated) with 0
ratings_matriz.fillna(0, inplace=True)
movie_index = ratings_matriz.columns
ratings_matriz.head()
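To make the pivot step concrete, here is a minimal sketch on a tiny, made-up ratings table. The column names match the snippet above; the data is purely illustrative.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) pair.
ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3],
    'movie_title': ['Movie A', 'Movie B', 'Movie A', 'Movie C', 'Movie B'],
    'rating': [5, 3, 4, 2, 1],
})

# Pivot to a users x movies matrix; unrated combinations become NaN.
ratings_matriz = ratings.pivot_table(values='rating', index='user_id', columns='movie_title')
# Replace NaN with 0 so the matrix can be fed to numeric routines.
ratings_matriz = ratings_matriz.fillna(0)
print(ratings_matriz)
```

Note that filling with 0 treats "not rated" the same as a rating of 0, which influences any correlation computed from this matrix afterwards.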
JLFDataScience / load_ranting_movie.py
Created February 12, 2020 15:15
Load the movie ratings
# Load the ratings file
ratings = pd.read_table('data/ratings.dat', header=None, sep='::', engine='python', names=['user_id', 'movie_id', 'rating', 'timestamp'])
# Drop the timestamp of when each rating was created
del ratings['timestamp']
# Add the title of each film
ratings = pd.merge(ratings, movies_df, on='movie_id')[['user_id', 'movie_title', 'movie_id','rating']]
ratings.head()
JLFDataScience / toy_story_score.py
Last active February 12, 2020 15:01
Toy Story score computed with a dot product
# Get the genre flags of one film, for example the first one, Toy Story
toy_story_features = movies_df.loc[0][movie_categories]
print(toy_story_features)
# Compute the film's predicted score for the user as the dot product of its genre flags and the user's preferences
toy_story_user_predicted_score = dot_product(toy_story_features, user_preferences.values())
toy_story_user_predicted_score
#5
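The `dot_product` helper is not defined in any of the previews above. A minimal sketch of what such a helper might look like follows, assuming the film's features are 0/1 genre flags and the user's preferences are per-genre weights stored in a dict; the genre names and numbers are hypothetical.

```python
def dot_product(vector_1, vector_2):
    """Return the sum of the element-wise products of two equal-length sequences."""
    return sum(a * b for a, b in zip(vector_1, vector_2))

# Hypothetical genre flags for a film and per-genre preference weights for a user.
film_features = [1, 0, 1, 1]
user_preferences = {'Animation': 3, 'Action': 4, 'Comedy': 1, 'Children': 1}

# dict.values() preserves insertion order, so it must match the feature order.
predicted_score = dot_product(film_features, user_preferences.values())
print(predicted_score)
```

The pairing only makes sense if the preference dict was built in the same order as the genre columns; `numpy.dot` on two aligned arrays would be an equivalent alternative.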