@ArthurWangNet
ArthurWangNet / gist:5216339
Created March 21, 2013 20:18
HTML: Basic Template
<!DOCTYPE html>
<html>
<head>
  <title></title>
  <script src=""></script>
</head>
<body>
</body>
</html>
@ArthurWangNet
ArthurWangNet / get_column_index.py
Created September 5, 2020 04:03
Get the index number of a column in a DataFrame by its label.
# Position of the column labelled "TCP" within df.columns
df.columns.get_loc("TCP")
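For reference, a minimal self-contained example (the column labels here are made up):
import pandas as pd

df = pd.DataFrame(columns=["Open", "High", "Low", "TCP"])
df.columns.get_loc("TCP")  # -> 3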
@ArthurWangNet
ArthurWangNet / get_moving_average_with_rolling.py
Created September 5, 2020 04:06
Try to get n-item averages by using rolling. Useful when calculating an SMA.
# Use rolling to get the moving average of the column at position 10
df['10Day-TCP-Avg'] = df.iloc[:, 10].rolling(window=10).mean()
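Selecting by label avoids hard-coding the position; a sketch assuming the column is labelled "TCP" as in the previous snippet:
# Same 10-period moving average, selecting the column by label
df['10Day-TCP-Avg'] = df['TCP'].rolling(window=10).mean()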
@ArthurWangNet
ArthurWangNet / yfinance_override.py
Created September 5, 2020 04:14
When using yfinance with pandas-datareader, declare the override first.
import yfinance as yf
yf.pdr_override()  # route pandas_datareader's Yahoo downloads through yfinance
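The usage this enables, sketched with a placeholder ticker and date range:
from pandas_datareader import data as pdr
import yfinance as yf

yf.pdr_override()
# After the override, get_data_yahoo is served by yfinance behind the scenes
df = pdr.get_data_yahoo("AAPL", start="2020-01-01", end="2020-09-01")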
@ArthurWangNet
ArthurWangNet / epoch_converter.py
Created September 5, 2020 04:30
Converting between milliseconds-since-epoch format and readable date and time format
# Some tests on how to play with epoch dates and times. End date is expressed
# as milliseconds since the epoch.
import datetime
import time
import pytz

def convert_datetime_to_mill_epoch(dt):
    epoch = datetime.datetime.utcfromtimestamp(0)
    return (dt - epoch).total_seconds() * 1000.0

def convert_mill_epoch_to_datetime(e_time):
    # The body was cut off in the gist; a minimal completion returning naive UTC
    return datetime.datetime.utcfromtimestamp(e_time / 1000.0)
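A quick round-trip check of the two helpers (illustrative, not from the gist):
import datetime

now = datetime.datetime.utcnow()
millis = convert_datetime_to_mill_epoch(now)
# Float math may lose sub-millisecond precision, hence the tolerance
assert abs(convert_mill_epoch_to_datetime(millis) - now) <= datetime.timedelta(milliseconds=1)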
@ArthurWangNet
ArthurWangNet / adding_new_columns.py
Created September 5, 2020 05:07
Adding new columns by performing operations on existing columns
'''
In this case, 'Date' and 'Time' are two new columns that need to be added by converting the existing
column 'datetime', which is in unix milliseconds format.
epoch_converter is a file containing functions that help with the conversion.
'''
stock['Date'] = stock.apply(lambda row: epoch_converter.get_date_from_mill_epoch(row['datetime']), axis=1)
stock['Time'] = stock.apply(lambda row: epoch_converter.get_time_from_mill_epoch(row['datetime']), axis=1)
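The two helpers are not shown in the gist; a plausible sketch of what epoch_converter might contain, assuming New York local time via pytz:
import datetime
import pytz

NY_TZ = pytz.timezone('America/New_York')

def get_date_from_mill_epoch(e_time):
    # Hypothetical implementation: milliseconds since epoch -> local date
    return datetime.datetime.fromtimestamp(e_time / 1000.0, tz=NY_TZ).date()

def get_time_from_mill_epoch(e_time):
    # Hypothetical implementation: milliseconds since epoch -> local time
    return datetime.datetime.fromtimestamp(e_time / 1000.0, tz=NY_TZ).time()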
@ArthurWangNet
ArthurWangNet / pandas_to_datetime.py
Last active September 6, 2020 06:31
Using pandas' built-in to_datetime function to convert epoch timestamps to readable dates and times
"""
'datetime' contains unix epoch timestamps in milliseconds. to_datetime first converts them to
timezone-aware, readable dates and times, then .dt.tz_convert shifts them to the specified timezone.
Lastly, .dt.date and .dt.time pull the parts out of the data.
Somehow this computes much faster than the pytz and datetime functions: in a test over 20 identical
csv files, pytz and datetime took 21.1 seconds, while pandas to_datetime took 3.74 seconds.
"""
stock['Date'] = pd.to_datetime(stock['datetime'],unit='ms',utc=True).dt.tz_convert('America/New_York').dt.date
stock['Time'] = pd.to_datetime(stock['datetime'],unit='ms',utc=True).dt.tz_convert('America/New_York').dt.time
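Both lines repeat the same conversion; computing it once and reusing the result halves the work (a refinement, not from the original gist):
converted = pd.to_datetime(stock['datetime'], unit='ms', utc=True).dt.tz_convert('America/New_York')
stock['Date'] = converted.dt.date
stock['Time'] = converted.dt.time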
@ArthurWangNet
ArthurWangNet / multiprocessing_files_process.py
Last active September 6, 2020 07:30
Using multiprocessing to handle a large number of files
"""
When dealing with large number of files, In this case around 5000, using for loop will takes a lot of time.
The idea is using python's multiprocessing features to utilize mulit-core CPU to speed it up.
In simple term, just write the operation willing to perform into a single function instead of using for loop.
The iteration of for loop before will be replaced with pool.map()
There are still some improvements might want to consider
1. If I need some value returns from each process and get a final list fo return value, what to do?
2. Is this the best way to use multiprocessing?
"""
@ArthurWangNet
ArthurWangNet / apply_df_by_multiprocessing.py
Created September 9, 2020 19:51 — forked from yong27/apply_df_by_multiprocessing.py
pandas DataFrame apply multiprocessing
import multiprocessing
import pandas as pd
import numpy as np

def _apply_df(args):
    df, func, kwargs = args
    return df.apply(func, **kwargs)

def apply_by_multiprocessing(df, func, **kwargs):
    workers = kwargs.pop('workers')
    # Truncated in the gist; the usual completion splits the frame into one
    # chunk per worker, applies in parallel, and concatenates the results
    pool = multiprocessing.Pool(processes=workers)
    result = pool.map(_apply_df, [(chunk, func, kwargs) for chunk in np.array_split(df, workers)])
    pool.close()
    return pd.concat(list(result))
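A usage sketch for the function above (square and the frame are invented for illustration; func must be a named top-level function so it can be pickled for the worker processes):
def square(x):
    return x ** 2

if __name__ == '__main__':
    df = pd.DataFrame({'a': range(10), 'b': range(10)})
    result = apply_by_multiprocessing(df, square, workers=4)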
@ArthurWangNet
ArthurWangNet / remove_duplicates.py
Created September 17, 2020 05:38
Remove duplicates while combining two DataFrames
# Stack the two frames, drop rows present in both, and rebuild a clean index
combine = pd.concat([aapl, aapl_overlap]).drop_duplicates().reset_index(drop=True)