@ArthurWangNet
ArthurWangNet / filter_a_dataframe.py
Created November 20, 2021 06:47
How to filter a dataframe
assetType = ['EQUITY', 'ETF']
exchange = ['NYSE', 'NASDAQ', 'AMEX', 'Pacific', 'BATS']
# Keep only the rows whose assetType and exchange both appear in the lists above.
filtered_instruments_list = instruments_list[
    instruments_list['assetType'].isin(assetType)
    & instruments_list['exchange'].isin(exchange)
]
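A minimal runnable sketch of the same filter, for trying it out in isolation. The rows in `instruments_list` and the `symbol` column are made up for illustration; only the `assetType` and `exchange` columns and the two lists come from the gist.

```python
import pandas as pd

# Made-up instruments, for illustration only.
instruments_list = pd.DataFrame({
    'symbol': ['AAA', 'BBB', 'CCC', 'DDD'],
    'assetType': ['EQUITY', 'ETF', 'MUTUAL_FUND', 'EQUITY'],
    'exchange': ['NYSE', 'NASDAQ', 'NYSE', 'OTC'],
})
assetType = ['EQUITY', 'ETF']
exchange = ['NYSE', 'NASDAQ', 'AMEX', 'Pacific', 'BATS']

# Each isin() call returns a boolean Series; & keeps rows where both are True.
mask = instruments_list['assetType'].isin(assetType) & instruments_list['exchange'].isin(exchange)
filtered_instruments_list = instruments_list[mask]

print(filtered_instruments_list['symbol'].tolist())  # ['AAA', 'BBB']
```

Building the mask as a separate variable makes it easy to inspect which rows were rejected (`instruments_list[~mask]`).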
ArthurWangNet / convert_big5-hkscs-to-utf-8.py
Created November 18, 2021 05:27
Convert a Big5-HKSCS encoded file into a UTF-8 encoded file.
# Convert a Big5-HKSCS encoded file into a UTF-8 encoded file.
# The with blocks close both files even if reading or writing fails.
with open('./t1.html', 'r', encoding='Big5-HKSCS') as inFile:
    content = inFile.read()
with open('./t2.html', 'w', encoding='utf-8') as outFile:
    outFile.write(content)
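For larger files, the same conversion can be streamed instead of read in one go. This is only a sketch: `convert_encoding` is a hypothetical helper name, and the round trip below runs on a throwaway file rather than the gist's `t1.html`.

```python
import os
import shutil
import tempfile


def convert_encoding(src_path, dst_path, src_encoding='Big5-HKSCS', dst_encoding='utf-8'):
    # Hypothetical helper: decode src_path with src_encoding, re-encode into dst_path.
    with open(src_path, 'r', encoding=src_encoding) as in_file, \
            open(dst_path, 'w', encoding=dst_encoding) as out_file:
        # copyfileobj copies in chunks, so the whole file is never held in memory.
        shutil.copyfileobj(in_file, out_file)


# Round trip on a temporary Big5-HKSCS file.
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, 't1.html')
    dst = os.path.join(folder, 't2.html')
    with open(src, 'w', encoding='Big5-HKSCS') as handle:
        handle.write('<p>香港</p>\n')
    convert_encoding(src, dst)
    with open(dst, 'r', encoding='utf-8') as handle:
        converted = handle.read()

print(converted)
```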
ArthurWangNet / remove_duplicates.py
Created September 17, 2020 05:38
Remove duplicates while combining two DataFrames
import pandas as pd
combine = pd.concat([aapl, aapl_overlap]).drop_duplicates().reset_index(drop=True)
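A small sketch of what the one-liner does, using two made-up frames that share one row. The `date` and `close` columns and their values are assumptions for illustration, not real AAPL data.

```python
import pandas as pd

# Made-up frames that overlap on the 2020-09-15 row.
aapl = pd.DataFrame({'date': ['2020-09-14', '2020-09-15'], 'close': [1.0, 2.0]})
aapl_overlap = pd.DataFrame({'date': ['2020-09-15', '2020-09-16'], 'close': [2.0, 3.0]})

# concat stacks the rows, drop_duplicates removes rows identical in every column,
# and reset_index(drop=True) renumbers the result 0..n-1.
combine = pd.concat([aapl, aapl_overlap]).drop_duplicates().reset_index(drop=True)

print(combine)
```

Note that `drop_duplicates()` compares all columns by default; if rows should count as duplicates based on the date alone, `drop_duplicates(subset=['date'])` may be closer to what is wanted.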
ArthurWangNet / apply_df_by_multiprocessing.py
Created September 9, 2020 19:51 — forked from yong27/apply_df_by_multiprocessing.py
pandas DataFrame apply multiprocessing
import multiprocessing
import pandas as pd
import numpy as np

def _apply_df(args):
    df, func, kwargs = args
    return df.apply(func, **kwargs)

def apply_by_multiprocessing(df, func, **kwargs):
    workers = kwargs.pop('workers')
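The preview above stops after the `workers` line. One way the function could continue is sketched below; this is an assumption, not the forked gist's actual body. It splits the frame into row chunks, runs `_apply_df` on each chunk in a process pool, and concatenates the results. The chunking strategy and the `row_total` function are hypothetical.

```python
import multiprocessing

import numpy as np
import pandas as pd


def _apply_df(args):
    # Unpack one work item and run DataFrame.apply on that chunk.
    df, func, kwargs = args
    return df.apply(func, **kwargs)


def apply_by_multiprocessing(df, func, **kwargs):
    # 'workers' is consumed here; every other keyword is passed to DataFrame.apply.
    workers = kwargs.pop('workers')
    # Assumed strategy: cut the rows into `workers` roughly equal, non-empty chunks.
    bounds = np.linspace(0, len(df), workers + 1, dtype=int)
    chunks = [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:]) if b > a]
    with multiprocessing.Pool(processes=workers) as pool:
        results = pool.map(_apply_df, [(chunk, func, kwargs) for chunk in chunks])
    return pd.concat(results)


def row_total(row):
    # Hypothetical row function; it must live at module level so it can be pickled.
    return row['a'] + row['b']


if __name__ == '__main__':
    frame = pd.DataFrame({'a': range(6), 'b': range(6, 12)})
    print(apply_by_multiprocessing(frame, row_total, axis=1, workers=2))
```

The `if __name__ == '__main__':` guard matters on platforms where worker processes re-import the main module. Splitting by rows is only safe for row-wise or element-wise functions; a function that aggregates a whole column would give per-chunk results.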
ArthurWangNet / multiprocessing_files_process.py
Last active September 6, 2020 07:30
Using multiprocessing to handle a large number of files
"""
When dealing with a large number of files (around 5,000 in this case), a plain for loop takes a long time.
The idea is to use Python's multiprocessing module to spread the work across multiple CPU cores.
In simple terms, put the per-file operation into a single function instead of the body of a for loop.
The for loop iteration is then replaced with pool.map().
Some open questions remain:
1. If each process needs to return a value and I want a final list of return values, what should I do?
2. Is this the best way to use multiprocessing?
"""
ArthurWangNet / pandas_to_datetime.py
Last active September 6, 2020 06:31
Using pandas' built-in to_datetime function to convert epoch values to readable dates and times
'''
The 'datetime' column contains Unix epoch timestamps in milliseconds. to_datetime first converts them to timezone-aware datetimes,
then .dt.tz_convert converts them to the specified timezone. Lastly, .dt.date and .dt.time extract the date and time parts.
This runs much faster than using pytz and datetime functions, likely because the conversion is vectorized over the whole column.
In a test on 20 identical CSV files, the pytz and datetime approach took 21.1 seconds, while pandas to_datetime took 3.74 seconds.
'''
# Convert once, then reuse the result for both new columns.
local_time = pd.to_datetime(stock['datetime'], unit='ms', utc=True).dt.tz_convert('America/New_York')
stock['Date'] = local_time.dt.date
stock['Time'] = local_time.dt.time
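A self-contained sketch of the conversion on two sample timestamps. The `stock` frame here is made up; the values were picked to fall on 2020-09-05 at 13:30 and 20:00 UTC, which is 09:30 and 16:00 in New York during daylight saving time.

```python
import pandas as pd

# Made-up epoch-millisecond timestamps, for illustration only.
stock = pd.DataFrame({'datetime': [1599312600000, 1599336000000]})

# unit='ms' reads the integers as milliseconds since the epoch; utc=True makes
# the result timezone-aware so that tz_convert can shift it to New York time.
local_time = pd.to_datetime(stock['datetime'], unit='ms', utc=True).dt.tz_convert('America/New_York')
stock['Date'] = local_time.dt.date
stock['Time'] = local_time.dt.time

print(stock)
```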
ArthurWangNet / adding_new_columns.py
Created September 5, 2020 05:07
Adding new columns by performing operations on existing columns
'''
In this case, 'Date' and 'Time' are two new columns that need to be added by converting the existing 'datetime' column,
which is in Unix milliseconds format.
epoch_converter is a module containing functions that help with the conversion.
'''
stock['Date'] = stock.apply(lambda row: epoch_converter.get_date_from_mill_epoch(row['datetime']), axis=1)
stock['Time'] = stock.apply(lambda row: epoch_converter.get_time_from_mill_epoch(row['datetime']), axis=1)
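To try the row-wise pattern without the `epoch_converter` module, the sketch below defines stand-in helpers inline. Their bodies are assumptions (they interpret the timestamp in UTC); the real helpers may handle timezones differently. The sample `stock` frame is made up.

```python
import datetime

import pandas as pd


def get_date_from_mill_epoch(ms):
    # Stand-in helper (assumed behavior): calendar date of the timestamp in UTC.
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc).date()


def get_time_from_mill_epoch(ms):
    # Stand-in helper (assumed behavior): wall-clock time of the timestamp in UTC.
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc).time()


# Made-up epoch-millisecond timestamps, for illustration only.
stock = pd.DataFrame({'datetime': [1599312600000, 1599336000000]})

# axis=1 passes each row to the lambda, which reads the row's 'datetime' value.
stock['Date'] = stock.apply(lambda row: get_date_from_mill_epoch(row['datetime']), axis=1)
stock['Time'] = stock.apply(lambda row: get_time_from_mill_epoch(row['datetime']), axis=1)

print(stock)
```

Because only one column is read, `stock['datetime'].apply(get_date_from_mill_epoch)` would give the same values without building a row object for every call.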
ArthurWangNet / epoch_converter.py
Created September 5, 2020 04:30
Converting between millisecond epoch format and readable date and time format
# Some tests on working with epoch dates and times. End date as milliseconds since epoch.
import datetime
import time
import pytz

def convert_datetime_to_mill_epoch(dt):
    # dt must be a naive datetime expressed in UTC, because the epoch below is naive too.
    # datetime.datetime(1970, 1, 1) equals utcfromtimestamp(0), which is deprecated since Python 3.12.
    epoch = datetime.datetime(1970, 1, 1)
    return (dt - epoch).total_seconds() * 1000.0

def convert_mill_epoch_to_datetime(e_time):
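The body of `convert_mill_epoch_to_datetime` is cut off in this preview. As a rough sketch of the round trip, assuming timezone-aware UTC datetimes on both sides (which may differ from what the gist does with `pytz`):

```python
import datetime

UTC = datetime.timezone.utc


def convert_datetime_to_mill_epoch(dt):
    # Assumes dt is timezone-aware; subtracting the aware epoch gives a timedelta.
    epoch = datetime.datetime(1970, 1, 1, tzinfo=UTC)
    return (dt - epoch).total_seconds() * 1000.0


def convert_mill_epoch_to_datetime(e_time):
    # Assumed behavior: return an aware datetime in UTC for e_time milliseconds.
    return datetime.datetime.fromtimestamp(e_time / 1000.0, tz=UTC)


moment = datetime.datetime(2020, 9, 5, 13, 30, tzinfo=UTC)
millis = convert_datetime_to_mill_epoch(moment)
restored = convert_mill_epoch_to_datetime(millis)

print(millis, restored)
```

Using aware datetimes avoids the ambiguity of naive values, where the caller has to remember whether a datetime was meant as UTC or local time.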
ArthurWangNet / yfinance_override.py
Created September 5, 2020 04:14
When using yfinance with pandas-datareader, declare the override
import yfinance as yf
# Make pandas_datareader's get_data_yahoo() fetch its data through yfinance.
yf.pdr_override()