assetType = ['EQUITY', 'ETF']
exchange = ['NYSE', 'NASDAQ', 'AMEX', 'Pacific', 'BATS']
# Filter the dataframe: keep only rows where assetType and exchange are in the lists above
filtered_instruments_list = instruments_list[instruments_list['assetType'].isin(assetType) & instruments_list['exchange'].isin(exchange)]
# Convert a Big5-HKSCS encoded file into a UTF-8 encoded file.
inFile = open('./t1.html', 'r', encoding='Big5-HKSCS')
outFile = open('./t2.html', 'w', encoding='utf-8')
content = inFile.read()
outFile.write(content)
inFile.close()
outFile.close()
combine = pd.concat([aapl, aapl_overlap]).drop_duplicates().reset_index(drop=True)
import multiprocessing
import pandas as pd
import numpy as np

def _apply_df(args):
    df, func, kwargs = args
    return df.apply(func, **kwargs)

def apply_by_multiprocessing(df, func, **kwargs):
    workers = kwargs.pop('workers')
    # The original snippet was cut off here; a common completion splits the
    # DataFrame into chunks, maps them across a pool, and concatenates the results.
    pool = multiprocessing.Pool(processes=workers)
    result = pool.map(_apply_df, [(d, func, kwargs) for d in np.array_split(df, workers)])
    pool.close()
    return pd.concat(list(result))
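A short usage sketch, not from the original gist: the row function must be defined at module level so it can be pickled, workers is passed as a keyword, and the remaining kwargs are forwarded to DataFrame.apply. The DataFrame here is a made-up example.

```
def row_sum(row):
    # Module-level function so multiprocessing can pickle it.
    return row['a'] + row['b']

if __name__ == '__main__':
    df = pd.DataFrame({'a': range(10), 'b': range(10)})
    # Sum each row across 4 worker processes; axis=1 is forwarded to df.apply.
    result = apply_by_multiprocessing(df, row_sum, axis=1, workers=4)
    print(result)
```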
""" | |
When dealing with large number of files, In this case around 5000, using for loop will takes a lot of time. | |
The idea is using python's multiprocessing features to utilize mulit-core CPU to speed it up. | |
In simple term, just write the operation willing to perform into a single function instead of using for loop. | |
The iteration of for loop before will be replaced with pool.map() | |
There are still some improvements might want to consider | |
1. If I need some value returns from each process and get a final list fo return value, what to do? | |
2. Is this the best way to use multiprocessing? | |
""" |
```
'datetime' contains Unix epoch timestamps in milliseconds. to_datetime first converts them to timezone-aware, readable dates and times,
then .dt.tz_convert converts them to the specified timezone. Lastly, dt.date and dt.time extract the date and time parts.
Somehow this computes very fast compared to using pytz and datetime functions.
In a test with 20 identical csv files, the pytz and datetime approach took 21.1 seconds, while pandas to_datetime took 3.74 seconds.
```
stock['Date'] = pd.to_datetime(stock['datetime'], unit='ms', utc=True).dt.tz_convert('America/New_York').dt.date
stock['Time'] = pd.to_datetime(stock['datetime'], unit='ms', utc=True).dt.tz_convert('America/New_York').dt.time
'''
In this case, 'Date' and 'Time' are two new columns that need to be added by converting the existing column 'datetime',
which is in Unix milliseconds format.
epoch_converter is a file that contains helper functions for the conversion (a hedged sketch of those helpers follows below).
'''
stock['Date'] = stock.apply(lambda row: epoch_converter.get_date_from_mill_epoch(row['datetime']), axis=1)
stock['Time'] = stock.apply(lambda row: epoch_converter.get_time_from_mill_epoch(row['datetime']), axis=1)
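The epoch_converter module itself is not shown in this collection; below is a hypothetical reconstruction of what the two helpers might look like, assuming the target timezone is America/New_York as in the pandas version above.

```
# Hypothetical reconstruction of epoch_converter; not the original module.
import datetime
import pytz

NY_TZ = pytz.timezone('America/New_York')

def get_date_from_mill_epoch(mill_epoch):
    # ms since epoch -> timezone-aware datetime -> date part.
    dt = datetime.datetime.fromtimestamp(mill_epoch / 1000.0, tz=pytz.utc)
    return dt.astimezone(NY_TZ).date()

def get_time_from_mill_epoch(mill_epoch):
    # Same conversion, keeping only the time-of-day part.
    dt = datetime.datetime.fromtimestamp(mill_epoch / 1000.0, tz=pytz.utc)
    return dt.astimezone(NY_TZ).time()
```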
# Some tests on how to work with epoch dates and times. End date as milliseconds since epoch.
import datetime
import time
import pytz

def convert_datetime_to_mill_epoch(dt):
    epoch = datetime.datetime.utcfromtimestamp(0)
    return (dt - epoch).total_seconds() * 1000.0

def convert_mill_epoch_to_datetime(e_time):
    # The original snippet was cut off here; the inverse conversion is assumed to be:
    return datetime.datetime.utcfromtimestamp(e_time / 1000.0)
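A quick round-trip check of the two helpers above; the example datetime is arbitrary and both functions treat naive datetimes as UTC.

```
# Round-trip an arbitrary naive UTC datetime through the two helpers above.
dt = datetime.datetime(2021, 6, 1, 9, 30, 0)
ms = convert_datetime_to_mill_epoch(dt)       # 1622539800000.0
print(ms)
print(convert_mill_epoch_to_datetime(ms))     # 2021-06-01 09:30:00
```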
import yfinance as yf
yf.pdr_override()
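The snippet ends here; below is a hedged sketch of what typically follows pdr_override(), assuming pandas_datareader is installed. The override routes pandas_datareader's get_data_yahoo through yfinance (newer yfinance releases deprecate pdr_override in favor of calling yf.download directly). The ticker and date range are placeholders.

```
# Hedged usage sketch: fetch daily bars via pandas_datareader backed by yfinance.
from pandas_datareader import data as pdr
import yfinance as yf

yf.pdr_override()
aapl = pdr.get_data_yahoo('AAPL', start='2020-01-01', end='2020-12-31')  # placeholder ticker/dates
print(aapl.head())
```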