@knu2xs
knu2xs / get_pyspark_schema.py
Last active April 17, 2024 22:07
Get PySpark Schema
import re
from pyspark.sql import DataFrame

def get_pyspark_dataframe_schema(df: DataFrame) -> str:
    """Output the DataFrame schema to easily be included when constructing a new PySpark DataFrame."""
    # characters to include for tab, since python, using four spaces
    tab = '    '
@knu2xs
knu2xs / copy_to_clipboard.py
Created January 29, 2024 14:08
Copy to Clipboard Python
import importlib.util

def copy_to_clipboard(string: str) -> None:
    """
    Copy a string to the system clipboard.

    .. note::
        This leans on Pandas, so it will not work unless Pandas is installed.
    """
    # ensure Pandas is installed and available
    if not importlib.util.find_spec('pandas'):
        raise EnvironmentError('copy_to_clipboard requires Pandas. Please ensure Pandas is installed in the environment to use.')
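The dependency guard in this preview can be exercised on its own. What follows is a minimal sketch, not the gist's code: `require_module` is a hypothetical helper name, and it only generalizes the `importlib.util.find_spec` check shown above to any top-level module name.

```python
import importlib.util


def require_module(module_name: str) -> None:
    """Raise EnvironmentError if a top-level module cannot be found (hypothetical helper)."""
    # find_spec returns None when a top-level module is not importable in this environment
    if importlib.util.find_spec(module_name) is None:
        raise EnvironmentError(
            f'{module_name} is required. Please ensure it is installed in the environment.'
        )


# a standard library module is always available, so this call returns without raising
require_module('json')
```

Checking with `find_spec` avoids importing the package just to learn whether it is installed, which keeps the failure message specific to the missing dependency.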
@knu2xs
knu2xs / hashing.py
Last active November 3, 2023 13:30
Add an MD5 hash column to a Pandas data frame for change analysis.
from hashlib import md5
import pandas as pd
from typing import Optional, Iterable

def get_md5_from_series(input_iterable: Iterable) -> str:
    """
    Create an MD5 hash from an Iterable, typically a row from a Pandas ``DataFrame``, but can be any
    Iterable object instance such as a list, tuple or Pandas ``Series``.

    Args:
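Because the preview cuts off before the function body, here is one way such a row hash might be computed. This is a sketch under assumptions, not the gist's implementation: the name `md5_from_iterable`, the `sep` parameter, and the choice to join string forms of the values are all illustrative.

```python
from hashlib import md5
from typing import Iterable


def md5_from_iterable(values: Iterable, sep: str = '|') -> str:
    """Return the MD5 hex digest of the values' string forms joined by ``sep`` (illustrative)."""
    # the separator keeps ['ab', 'c'] and ['a', 'bc'] from producing the same input string
    joined = sep.join(str(value) for value in values)
    return md5(joined.encode('utf-8')).hexdigest()


row_before = ['Olympia', 'WA', 98501]
row_after = ['Olympia', 'WA', 98502]

# comparing digests flags a row whose values changed between two snapshots
print(md5_from_iterable(row_before) == md5_from_iterable(row_after))
```

With Pandas available, something like `df.apply(md5_from_iterable, axis=1)` should yield one digest per row, since each row arrives as an iterable ``Series``.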
@knu2xs
knu2xs / python-random-date.py
Last active July 17, 2023 14:15
Use a Python function to generate a random date in a given calendar year.
# import modules
import random
import datetime

# create function accepting a single parameter, the year as a four digit number
def get_random_date(year):
    # try to get a date; day 366 exists only in leap years, so strptime can raise ValueError
    try:
        return datetime.datetime.strptime('{} {}'.format(random.randint(1, 366), year), '%j %Y')
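The preview ends inside the `try`, so the gist's handling of day 366 in non-leap years is not visible. As an alternative sketch (not the author's approach), the day count can be chosen up front so no retry is needed. The name `random_date_in_year` is hypothetical.

```python
import calendar
import datetime
import random


def random_date_in_year(year: int) -> datetime.date:
    """Return a uniformly random date in ``year`` (illustrative alternative to retrying)."""
    # leap years have 366 days and all others 365, so the drawn day-of-year is always valid
    days_in_year = 366 if calendar.isleap(year) else 365
    day_of_year = random.randint(1, days_in_year)

    # January 1 plus (day_of_year - 1) days lands on the chosen day of the year
    return datetime.date(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)


print(random_date_in_year(2023))
```

Picking the upper bound from `calendar.isleap` keeps every date equally likely, whereas retrying after a failed parse would also work but costs an extra draw roughly once in 366 calls for non-leap years.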
@knu2xs
knu2xs / get_dataframe.py
Last active June 28, 2023 11:46
Automatically cast a variety of inputs by introspectively detecting the data type and converting to a Spatially Enabled Dataframe
from arcgis.features import GeoAccessor, FeatureLayer
from arcgis.geometry import Geometry
from arcgis.gis import GIS
import pandas as pd
import os
import re

def get_dataframe(in_features, gis=None):
    """
@knu2xs
knu2xs / enrich-direct-all.ipynb
Created April 27, 2021 22:43
Enrich block groups using all variables directly.
@knu2xs
knu2xs / preprocessing.py
Created July 11, 2022 18:27
Integration of the ArcGIS Python API in scikit-learn transformers
from functools import lru_cache
from typing import Union, List, Optional
from arcgis.geoenrichment import Country
from arcgis.geometry import Polygon
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
__all__ = ['EnrichBase', 'EnrichPolygon', 'EnrichStandardGeography', 'KeepOnlyEnrichColumns', 'ArrayToDataFrame']
@knu2xs
knu2xs / init.d.md
Last active April 27, 2022 13:43
ArcGIS on Ubuntu Install Notes

No worries. You do need to copy it to /etc/init.d, as you mentioned. That only tells the system how to start, stop, and restart the arcgisserver service; you still need to tell it to start the service at boot.

All the major distros now use systemd to handle system startup, so there are two other small things to do, listed below, to get the service to start on boot.

To interact with systemd, you mainly use the systemctl command to tell it what you want to do.

  1. The following will tell systemd to start the arcgisserver service.

systemctl start arcgisserver

  2. This next command will tell systemd to start it EACH time the system boots up.
@knu2xs
knu2xs / install_vmware_tools_ubuntu.md
Last active April 7, 2022 06:00
Install VMware Tools on Ubuntu Server

VM > Install VMware Tools...

Verify that the menu item has changed to VM > Cancel VMware Tools Installation

$ sudo mkdir /mnt/cdrom
$ sudo mount /dev/cdrom /mnt/cdrom
$ tar -xzvf /mnt/cdrom/VMwareTools* -C /tmp
$ cd /tmp/vmware-tools-distrib
$ sudo ./vmware-install.pl -d

$ sudo init 6

@knu2xs
knu2xs / get_logger.py
Created September 29, 2021 18:26
Concisely get an easy-to-use logger
import logging
from pathlib import Path
from typing import Union

def get_logger(log_path: Union[str, Path] = 'logfile.log', log_name: str = 'logger', log_level: int = logging.ERROR):
    """
    Standardized way to create a console and file logger in one.

    Args:
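The body of `get_logger` is not shown in the preview. For orientation, a console-plus-file logger can be assembled from the standard library roughly as follows. This is a sketch under assumptions: the function name `make_console_file_logger`, the format string, and the handler-reuse check are illustrative choices, not the gist's code.

```python
import logging
from pathlib import Path
from typing import Union


def make_console_file_logger(
    log_path: Union[str, Path] = 'logfile.log',
    log_name: str = 'logger',
    log_level: int = logging.ERROR,
) -> logging.Logger:
    """Return a logger that writes to both the console and a file (illustrative sketch)."""
    logger = logging.getLogger(log_name)
    logger.setLevel(log_level)

    # getLogger returns the same object for the same name, so attach handlers only once
    if not logger.handlers:
        formatter = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s')

        # StreamHandler with no arguments writes to standard error
        console_handler = logging.StreamHandler()
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)

        # FileHandler creates the log file if it does not exist and appends to it
        file_handler = logging.FileHandler(str(log_path))
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger
```

Calling `make_console_file_logger('run.log', 'etl').error('load failed')` would then emit the message to the console and append it to `run.log`. Guarding on `logger.handlers` avoids duplicated output when the function is called more than once with the same name.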