A collection of useful Python snippets
Last active
January 10, 2024 04:16
Python
-
To import a module in one directory into another directory
- Directory structure
└─ heroku_apps
   ├─ README.md
   └─ src
      └─ boyorgirl
         ├─ train
         │  ├─ model.py
         │  └─ train.py
         └─ utils
            └─ preprocess.py
- Imports in
src/boyorgirl/train/train.py
from utils import preprocess
- To run train.py (as a module from the package root, so that utils can be resolved)
cd heroku_apps/src/boyorgirl
python -m train.train
-
To reload an already imported module (mainly in Jupyter)
import custom_module
from importlib import reload
reload(custom_module)
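To see the reload behaviour end to end, here is a minimal, self-contained sketch. It writes a throwaway module to a temporary directory, imports it, edits the file on disk, and reloads it. The module name `custom_module_demo` and its `VALUE` attribute are hypothetical and exist only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, used only for this demonstration.
MODULE_NAME = "custom_module_demo"

# Skip writing .pyc files so the edited source is always recompiled.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / f"{MODULE_NAME}.py"
    module_path.write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make the freshly created file visible
    module = importlib.import_module(MODULE_NAME)
    first_value = module.VALUE

    # Simulate editing the module on disk after it was imported.
    module_path.write_text("VALUE = 2  # edited\n")
    module = importlib.reload(module)
    second_value = module.VALUE

    # Clean up so the demo leaves no trace in the interpreter state.
    sys.path.remove(tmp)
    del sys.modules[MODULE_NAME]

print(first_value, second_value)  # 1 2
```

Note that names imported with `from custom_module import something` keep pointing at the old objects after a reload; only attribute access through the module object picks up the new code.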
A collection of common operations on Python lists
-
de-duplicate
# Remove duplicates while preserving order
items = [1, 2, 0, 1, 3, 2]
items = list(dict.fromkeys(items))
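`dict.fromkeys` only works when the items are hashable. For lists of dicts or other unhashable records, one possible approach is to de-duplicate on a derived key. The helper name `dedupe_by_key` and the sample records below are illustrative assumptions, not part of the original snippet.

```python
def dedupe_by_key(items, key):
    """Keep the first item seen for each key value, preserving input order."""
    seen = set()
    result = []
    for item in items:
        marker = key(item)  # must be hashable, e.g. an id or a tuple of fields
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result


records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},  # duplicate id, dropped
]
unique = dedupe_by_key(records, key=lambda record: record["id"])
print(unique)  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```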
-
Merge overlapping lists
# Merge overlapping sublists in a list of lists
# https://stackoverflow.com/questions/4842613/merge-lists-that-share-common-elements
import networkx
from networkx.algorithms.components.connected import connected_components

def merge_overlapping_list_of_lists(l):
    def to_graph(l):
        G = networkx.Graph()
        for part in l:
            # each sublist is a bunch of nodes
            G.add_nodes_from(part)
            # it also implies a number of edges:
            G.add_edges_from(to_edges(part))
        return G

    def to_edges(l):
        """
        Treat `l` as a graph and return its edges:
        to_edges(['a','b','c','d']) -> [(a,b), (b,c), (c,d)]
        """
        it = iter(l)
        last = next(it)
        for current in it:
            yield last, current
            last = current

    G = to_graph(l)
    # connected_components yields sets; convert each one to a list
    return [list(component) for component in connected_components(G)]

l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
print(merge_overlapping_list_of_lists(l))
# prints e.g. [['a', 'c', 'b', 'e', 'd', 'g', 'f', 'o', 'p'], ['k']]
# (element order within each group may vary)
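If adding networkx as a dependency is not desirable, the same grouping can be sketched with a small union-find (disjoint-set) structure from the standard library alone. This is a swapped-in technique, not the approach from the linked answer, and the function name `merge_overlapping` is an assumption. Members are sorted for deterministic output, which assumes the elements are mutually comparable.

```python
def merge_overlapping(groups):
    """Merge sublists that share at least one element, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in groups:
        if not group:
            continue  # nothing to register for an empty sublist
        first = group[0]
        find(first)  # register singletons too
        for item in group[1:]:
            union(first, item)

    merged = {}
    for item in list(parent):
        merged.setdefault(find(item), []).append(item)
    # Sorting assumes the elements can be compared with each other.
    return [sorted(members) for members in merged.values()]


l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
result = merge_overlapping(l)
print(result)
```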
-
Merge overlapping tuples
# Merge tuples with overlapping elements
# https://stackoverflow.com/questions/22680116/merging-overlapping-items-in-a-list
# If an item falls within the range of the next, the two tuples will have to be merged.
# The resulting tuple is one that covers the range of the two items (minimum to maximum values).
mylist = [(1, 1), (1, 6), (2, 5), (4, 4), (9, 10)]
desired_output = [(1, 6), (9, 10)]

result = []
for item in sorted(mylist):
    result = result or [item]
    if item[0] > result[-1][1]:
        result.append(item)
    else:
        old = result[-1]
        result[-1] = (old[0], max(old[1], item[1]))

print(result)  # [(1, 6), (9, 10)]
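The same loop can be wrapped in a reusable function. This is a minimal sketch with the same merging rule as the snippet above (ranges that touch or overlap are merged); the name `merge_intervals` is an assumption.

```python
def merge_intervals(intervals):
    """Merge (start, end) tuples whose ranges overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it to cover both.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


print(merge_intervals([(1, 1), (1, 6), (2, 5), (4, 4), (9, 10)]))  # [(1, 6), (9, 10)]
```

Sorting first is what makes a single pass sufficient: once the tuples are ordered by start value, each one can only overlap the most recently merged range.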
-
Joins
- When merging two tables, always use `how='inner'` first to check that your keys match.
- An empty result could mean that your keys have different datatypes. Align the datatypes and try again.
- Once you get results from `how='inner'`, you can then change to `how='left'` or `'right'`.
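The workflow above can be sketched as follows: compare the key datatypes, align them, confirm that the inner join returns rows, and only then switch to a left join. The tables and column names are made up for illustration. The `indicator=True` option is an extra check, not part of the original advice; it adds a `_merge` column showing which rows found a match.

```python
import pandas as pd

# Hypothetical tables: the key is text on the left and integer on the right.
left = pd.DataFrame({"id": ["1", "2", "3"], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [1, 2, 4], "score": [10, 20, 40]})

# Step 1: compare key datatypes before merging.
dtypes_match = left["id"].dtype == right["id"].dtype
print(dtypes_match)  # False

# Step 2: align the datatypes (here: cast the integer key to text).
right["id"] = right["id"].astype(str)

# Step 3: an inner join confirms that the keys now match.
inner = left.merge(right, on="id", how="inner")
print(len(inner))  # 2

# Step 4: switch to a left join; the indicator column flags unmatched rows.
left_join = left.merge(right, on="id", how="left", indicator=True)
unmatched = left_join[left_join["_merge"] == "left_only"]
print(unmatched["id"].tolist())  # ['3']
```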
-
Renaming
- To rename all columns, you can directly use
df.columns = ['col1', 'col2']
- To rename a single column in a table with many columns, you can use
df.rename(columns={'col100': 'column100'}, inplace=True)
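One detail worth checking when renaming a single column: `DataFrame.rename` applies a bare mapping to the index labels by default, so the `columns=` keyword is needed to target column names. A small sketch with a made-up table:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col100": [3, 4]})

# Without columns=, the mapping targets index labels, so the columns are unchanged.
no_keyword = df.rename({"col100": "column100"})
# With columns=, the column is renamed.
with_keyword = df.rename(columns={"col100": "column100"})

print(list(no_keyword.columns))    # ['col1', 'col100']
print(list(with_keyword.columns))  # ['col1', 'column100']
```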
-
Explode
- Converting a column of lists to multiple rows
import pandas as pd
df = pd.DataFrame({'brands': [['Apple', 'Xiaomi'], ['Huawei'], ['Samsung', 'Apple']],
                   'tweet_id': [1234, 1235, 1236]})
print(df.head())
df.explode('brands')
-
Time Series
- To create a time series, you can use
pd.date_range('2019-01-01', periods=10, freq='D')
- To shift a time series, you can use
df['value'].shift(1)
- To calculate the difference between two time series, you can use
df['value'].diff()
- To calculate the cumulative sum of a time series, you can use
df['value'].cumsum()
- To calculate the 3-day moving average of a time series, you can use
df['value'].rolling(window=3).mean()
- To fill missing values in a time series, you can use
df['value'].ffill()
- To resample a time series (requires a DatetimeIndex; pandas 2.2+ prefers 'ME' over 'M' for month-end), you can use
df['value'].resample('M').mean()
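The operations above can be tried together on a small series. This is a minimal sketch with made-up values. It resamples weekly (`'W'`) rather than monthly, only because ten days of data span two weeks and the weekly alias behaves the same across pandas versions.

```python
import pandas as pd

# Ten daily observations with one missing value (made-up data).
index = pd.date_range("2019-01-01", periods=10, freq="D")
values = [1.0, 2.0, float("nan"), 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
df = pd.DataFrame({"value": values}, index=index)

df["value"] = df["value"].ffill()                       # fill the gap with the previous value
df["previous"] = df["value"].shift(1)                   # yesterday's value
df["change"] = df["value"].diff()                       # day-over-day difference
df["running_total"] = df["value"].cumsum()              # cumulative sum
df["ma3"] = df["value"].rolling(window=3).mean()        # 3-day moving average

weekly = df["value"].resample("W").mean()               # weekly mean (weeks end on Sunday)

print(df)
print(weekly)
```

The first rows of `previous`, `change`, and `ma3` are NaN because there is not enough history to compute them yet.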
- If plots are not showing in Jupyter
import plotly.io as pio
pio.renderers.default = "iframe"
- Select anywhere on the row
- Server Side Caching
- If data extraction is too expensive, we can extract the data once during app startup and then cache a copy on the server itself: Link
General code snippets about Python tqdm progress bar usage
-
Imports
from tqdm import tqdm
import pandas
-
Normal usage
for i in tqdm(iterable):
    print(i)
-
Pandas itertuples usage
for row in tqdm(df.itertuples(), total=df.shape[0]):
    print(getattr(row, 'col_name'))
-
Pandas apply usage
tqdm.pandas()  # registers progress_apply on pandas objects
df['new_col'] = df['col'].progress_apply(fn)
Most instructions from: https://github.com/bast/pypi-howto
- pip install twine
- python setup.py sdist
- python -m twine upload --repository testpypi dist/*
- name: `__token__`
- pwd: TestPyPI API token
- pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple stripnet
- python setup.py sdist bdist_wheel
- twine upload dist/* -r pypi
- name: `__token__`
- pwd: PyPI API token
- git tag -a v0.0.7 -m "Update support for Py3.7"
- git push origin --tags