@stephenleo
Last active January 10, 2024 04:16
Python

A collection of useful Python snippets

  • To import a module in one directory into another directory

    • Directory structure
      └─ heroku_apps
      ├─ README.md
      └─ src
          └─ boyorgirl
            ├─ train
            │  ├─ model.py
            │  └─ train.py
            └─ utils
                └─ preprocess.py
      
    • Imports in src/boyorgirl/train/train.py
      from utils import preprocess
    • To run train.py
      cd heroku_apps/src/boyorgirl
      python -m train.train
  • To reload an already imported module (mainly in Jupyter)

    import custom_module
    from importlib import reload
    reload(custom_module)
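
The `python -m train.train` recipe above works because running with `-m` from `boyorgirl` puts that directory on `sys.path`, so `utils` resolves as a top-level package. A minimal sketch that rebuilds a stripped-down copy of the layout in a temporary directory and runs it; the `clean` helper and the file contents are hypothetical, invented only for this demo:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Recreate boyorgirl/train/train.py and boyorgirl/utils/preprocess.py
    root = Path(tmp) / "boyorgirl"
    (root / "train").mkdir(parents=True)
    (root / "utils").mkdir()
    (root / "utils" / "preprocess.py").write_text(
        "def clean(s):\n    return s.strip().lower()\n"  # hypothetical helper
    )
    (root / "train" / "train.py").write_text(
        "from utils import preprocess\nprint(preprocess.clean('  Hello '))\n"
    )

    # Equivalent to: cd boyorgirl && python -m train.train
    result = subprocess.run(
        [sys.executable, "-m", "train.train"],
        cwd=root,
        capture_output=True,
        text=True,
    )
    output = result.stdout.strip()

print(output)
```

Running `python train/train.py` instead would put `train/` on `sys.path`, and the `from utils import preprocess` line would fail.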

Python Lists

A collection of common operations on Python lists

  • de-duplicate

    # Remove duplicates while preserving order
    items = [1, 2, 0, 1, 3, 2]
    items = list(dict.fromkeys(items))
  • Merge overlapping lists

    # Merge overlapping sublists in a list of lists
    # https://stackoverflow.com/questions/4842613/merge-lists-that-share-common-elements
    import networkx 
    from networkx.algorithms.components.connected import connected_components
    
    def merge_overlapping_list_of_lists(l):
        def to_graph(l):
            G = networkx.Graph()
            for part in l:
                # each sublist is a bunch of nodes
                G.add_nodes_from(part)
                # it also implies a number of edges:
                G.add_edges_from(to_edges(part))
            return G
    
        def to_edges(l):
            """
            Treat `l` as a graph and yield its edges:
            to_edges(['a','b','c','d']) -> [(a,b), (b,c), (c,d)]
            """
            it = iter(l)
            last = next(it)
    
            for current in it:
                yield last, current
                last = current
    
        G = to_graph(l)
        # connected_components yields one set of nodes per merged group
        return list(connected_components(G))
    
    l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    print(merge_overlapping_list_of_lists(l))
    # prints two sets: {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'o', 'p'} and {'k'}
    # (element order inside each set may vary)
  • Merge overlapping tuples

    # Merge tuples with overlapping elements
    # https://stackoverflow.com/questions/22680116/merging-overlapping-items-in-a-list
    # If an item falls within the range of the next, the two tuples will have to be merged. 
    # The resulting tuple is one that covers the range of the two items (minimum to maximum values).
    
    mylist = [(1, 1), (1, 6), (2, 5), (4, 4), (9, 10)]
    desired_output = [(1, 6), (9, 10)]
    
    result = []
    for item in sorted(mylist):
        result = result or [item]
        if item[0] > result[-1][1]:
            result.append(item)
        else:
            old = result[-1]
            result[-1] = (old[0], max(old[1], item[1]))
    
    print(result) #[(1, 6), (9, 10)]
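
For the list-of-lists merge above, a dependency-free alternative is possible when installing networkx is not an option. The sketch below swaps the graph approach for a small union-find structure; the function name `merge_overlapping` is an assumption made for this example, and it returns sorted lists rather than sets:

```python
def merge_overlapping(lists):
    """Group sublists that share any element, using union-find (no networkx)."""
    parent = {}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sub in lists:
        for item in sub:
            parent.setdefault(item, item)
        # Link every element of the sublist to the sublist's first element.
        for item in sub[1:]:
            root_a, root_b = find(sub[0]), find(item)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect elements by their final root.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [sorted(group) for group in groups.values()]


lists = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
merged = merge_overlapping(lists)
print(merged)
```

Elements must be hashable, as with the networkx version. Empty sublists are simply ignored.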

Useful Pandas tips

  • Joins

    • When merging two tables, always use how='inner' first to confirm that your keys match.
    • An empty result could mean your keys have different datatypes. Align the datatypes and try again.
    • Once how='inner' returns results, you can switch to how='left' or how='right'.
  • Renaming

    • To rename all columns, you can directly use
      df.columns = ['col1', 'col2']
    • To rename a single column in a table with many columns, you can use
      df.rename(columns={'col100': 'column100'}, inplace=True)
  • Explode

    • Converting Column of lists to multiple rows
      import pandas as pd
      df = pd.DataFrame({'brands': [['Apple', 'Xiaomi'], ['Huawei'], ['Samsung', 'Apple']],
                         'tweet_id': [1234, 1235, 1236]})
      print(df.head())
      df.explode('brands')
  • Time Series

    • To create a time series, you can use
      pd.date_range('2019-01-01', periods=10, freq='D')
    • To shift a time series, you can use
      df['value'].shift(1)
    • To calculate the difference between consecutive values of a time series, you can use
      df['value'].diff()
    • To calculate the cumulative sum of a time series, you can use
      df['value'].cumsum()
    • To calculate the 3 day moving average of a time series, you can use
      df['value'].rolling(window=3).mean()
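    • To forward-fill missing values in a time series, you can use
      df['value'].ffill()  # fillna(method='ffill') is deprecated in recent pandas
    • To resample a time series to monthly means (requires a DatetimeIndex), you can use
      df['value'].resample('M').mean()  # month-end; newer pandas spells this alias 'ME'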
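
To make the time series operations above concrete, here is a small sketch on a five-day series. The values and the `value` column are made up for illustration:

```python
import pandas as pd

# Five daily observations with a DatetimeIndex
idx = pd.date_range("2019-01-01", periods=5, freq="D")
df = pd.DataFrame({"value": [1.0, 2.0, 4.0, 7.0, 11.0]}, index=idx)

shifted = df["value"].shift(1)                    # previous day's value; first row is NaN
diffed = df["value"].diff()                       # value minus previous value; first row is NaN
cumulative = df["value"].cumsum()                 # running total
rolling3 = df["value"].rolling(window=3).mean()   # first two rows are NaN (window not full)

print(pd.DataFrame({
    "value": df["value"],
    "shift": shifted,
    "diff": diffed,
    "cumsum": cumulative,
    "rolling3": rolling3,
}))
```

Note that `diff()` is the same as `value - value.shift(1)`, and the rolling mean only produces a number once three observations are available.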

Snippets to work with PDF files

Merge multiple pdfs

from PyPDF2 import PdfMerger
merger = PdfMerger()

# File 1: all pages
merger.append("file_1.pdf")

# File 2: first page only (pages is a (start, stop) range, stop excluded)
merger.append("file_2.pdf", pages=(0,1))

# File 3: second page only
merger.append("file_3.pdf", pages=(1,2))

merger.write("Combined_file.pdf")
merger.close()

Plotly and Dash tips and tricks

  • If plots are not showing in Jupyter
    import plotly.io as pio
    pio.renderers.default = "iframe"
  • Select anywhere on the row
  • Server Side Caching
    • If data extraction is too expensive, we can extract the data once during app startup and then cache a copy on the server itself: Link
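
One simple way to sketch the "extract once, cache on the server" idea is an in-process memoized loader. This uses `functools.lru_cache` from the standard library and is not necessarily the approach the linked resource describes; `load_data` and `callback_body` are hypothetical names, and the sleep stands in for a slow query:

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def load_data():
    """Hypothetical expensive extraction; runs once per server process."""
    time.sleep(0.1)  # stand-in for a slow database query or file read
    # Return an immutable object so callers cannot alter the cached copy.
    return tuple(range(5))


# Warm the cache at app startup so the first user request is fast.
load_data()


def callback_body():
    # Inside a Dash callback, reuse the cached copy instead of re-extracting.
    return sum(load_data())


totals = [callback_body() for _ in range(3)]
print(totals, load_data.cache_info())
```

This cache lives in one Python process, so each worker of a multi-process deployment would hold its own copy.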

Python Progress Bars

General code snippets about Python tqdm progress bar usage

  • Imports

    from tqdm import tqdm
    import pandas
  • Normal usage

    for i in tqdm(iterable):
      print(i)
  • Pandas itertuples usage

    for row in tqdm(df.itertuples(), total=df.shape[0]):
      print(getattr(row, 'col_name'))
  • Pandas apply usage

    tqdm.pandas()
    df['new_col'] = df['col'].progress_apply(fn)

Most instructions from: https://github.com/bast/pypi-howto

Test PyPi

  1. pip install twine
  2. python setup.py sdist
  3. python -m twine upload --repository testpypi dist/*
    • username: __token__
    • password: your TestPyPI API token
  4. pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple stripnet

PyPi

  1. python setup.py sdist bdist_wheel
  2. twine upload dist/* -r pypi
    • username: __token__
    • password: your PyPI API token
  3. git tag -a v0.0.7 -m "Update support for Py3.7"
  4. git push origin --tags