A Ho avisionh

🧽
View GitHub Profile
@avisionh
avisionh / compare_csv_files.py
Created June 13, 2025 10:28
Identifies differences in records between two .csv files that are the same format but may have different rows/records.
import os
import json
def compare_files(file1_path, file2_path):
    """
    Compares two files line by line and identifies unique records in each.
    Args:
        file1_path (str): The path to the first file.
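The preview above stops inside the docstring, so the body of `compare_files` is not shown. As a minimal sketch of the idea in the description, assuming each line is one record and that duplicates and row order do not matter, the comparison can be done with set differences. The names `unique_records` and `demo` are hypothetical and not taken from the gist.

```python
import os
import tempfile


def unique_records(file1_path, file2_path):
    """Return (only_in_file1, only_in_file2) as sorted lists of lines.

    Hypothetical helper: treats every line as one record and ignores
    duplicates and ordering, which may differ from the original gist.
    """
    with open(file1_path, newline="") as f1, open(file2_path, newline="") as f2:
        rows1 = {line.rstrip("\r\n") for line in f1}
        rows2 = {line.rstrip("\r\n") for line in f2}
    # Set difference keeps the records that appear in one file only
    return sorted(rows1 - rows2), sorted(rows2 - rows1)


def demo():
    """Write two small .csv files to a temporary folder and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.csv")
        path_b = os.path.join(tmp, "b.csv")
        with open(path_a, "w") as f:
            f.write("id,name\n1,Ann\n2,Bob\n")
        with open(path_b, "w") as f:
            f.write("id,name\n2,Bob\n3,Cy\n")
        return unique_records(path_a, path_b)
```

Calling `demo()` returns the records unique to each file; the shared header row and the shared record drop out because they appear in both sets.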
avisionh / .secrets
Created March 7, 2021 11:11
Example file for managing secrets
# Secrets and credentials should be stored here as environment variables. For example:
#
# # Google Cloud authentication credentials
# export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
#
# These environment variables can then be read in by Python using `os.getenv`:
#
# --------------------------------------------------------
# import os
#
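The preview is cut off before its own Python example. A minimal sketch of reading such a variable with `os.getenv` might look as follows. The helper name `get_credentials_path` is hypothetical, and the assignment to `os.environ` only simulates what sourcing the `.secrets` file would do in a shell.

```python
import os


def get_credentials_path(default=None):
    """Read the credentials path from the environment.

    Hypothetical helper; returns `default` when the variable is not set.
    """
    return os.getenv("GOOGLE_APPLICATION_CREDENTIALS", default)


# Simulate `source .secrets` for this sketch only; in practice the shell
# exports the variable before Python starts.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/credentials.json"

credentials_path = get_credentials_path()
```

Because `os.getenv` returns `None` (or the supplied default) for an unset variable, it is worth checking the result before passing it on.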
avisionh / if_else.py
Created March 2, 2021 21:18
Easy if-else on column in Pandas dataframe
# https://stackoverflow.com/a/42260631/13416265
import pandas as pd
# create dataframe
df = pd.DataFrame(data={'INDICATOR': ['A', 'B', 'C', 'D'],
                        'VALUE': [10, 9, 8, 7]})
# create translation dictionary
values_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}
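The preview ends before the translation dictionary is used. One common way to apply it, shown here as an assumption about how the gist might continue, is `Series.map`, which looks each value up in the dictionary. The column name `CODE` is hypothetical.

```python
import pandas as pd

# Same data as in the preview above
df = pd.DataFrame(data={'INDICATOR': ['A', 'B', 'C', 'D'],
                        'VALUE': [10, 9, 8, 7]})
values_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

# Look up each INDICATOR in the dictionary; values that are missing
# from the dictionary would become NaN.
df['CODE'] = df['INDICATOR'].map(values_dict)
```

This replaces a chain of if-else conditions with a single lookup, at the cost of silently producing NaN for any indicator that is not a key of the dictionary.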

Cypher style guide as advocated by Neo4j:

  • Node labels are CamelCase and begin with an upper-case letter (examples: Person, NetworkAddress). Note that node labels are case-sensitive.
  • Property keys, variables, parameters, aliases, and functions are camelCase and begin with a lower-case letter (examples: businessAddress, title). Note that these elements are case-sensitive.
  • Relationship types are upper-case and can use the underscore (examples: ACTED_IN, FOLLOWS). Note that relationship types are case-sensitive and that you cannot use the "-" character in a relationship type.
  • Cypher keywords are upper-case (examples: MATCH, RETURN). Note that Cypher keywords are case-insensitive, but a best practice is to use upper-case.
  • String constants are in single quotes, unless the string contains a quote or apostrophe (examples: 'The Matrix', "Something's Gotta Give"). Note that you can also use a backslash to escape a single or double quote inside a string delimited by the same kind of quote.
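The naming rules above can be approximated with a few regular expressions. The sketch below is a rough illustration only: the patterns and the `follows_style` helper are hypothetical, they are not part of Neo4j or any Cypher tooling, and they cannot tell a single-word upper-case label from a relationship type.

```python
import re

# Approximate patterns for the naming conventions in the list above
STYLE_PATTERNS = {
    "label": re.compile(r"^[A-Z][A-Za-z0-9]*$"),         # Person, NetworkAddress
    "property": re.compile(r"^[a-z][A-Za-z0-9]*$"),      # businessAddress, title
    "relationship": re.compile(r"^[A-Z][A-Z0-9_]*$"),    # ACTED_IN, FOLLOWS
}


def follows_style(name, kind):
    """Return True if `name` matches the approximate pattern for `kind`."""
    return bool(STYLE_PATTERNS[kind].match(name))
```

For instance, `follows_style("ACTED_IN", "relationship")` passes, while `follows_style("acted-in", "relationship")` fails on both the lower-case letters and the hyphen.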
// examine data model
CALL db.schema.visualization();
// return 6 movie nodes released after 2005
MATCH (film:Movie)
WHERE film.released > 2005
RETURN film
LIMIT 6;
// count the number of movie nodes released after 2005
avisionh / parallelise_dataframe.py
Created September 7, 2020 10:49
Function to parallelise a function with kwargs on a dataframe
from multiprocessing import Pool
from functools import partial
from tqdm import tqdm
import pandas as pd
def parallelise_dataframe(df, func, n_cores=1, n_splits=1, **kwargs):
    """ Apply a function on a dataframe in parallel
    reference: https://towardsdatascience.com/make-your-own-super-pandas-using-multiproc-1c04f41944a1#6028
    reference: https://stackoverflow.com/questions/34031681/passing-kwargs-with-multiprocessing-pool-map
avisionh / #extract_filename_url
Last active August 19, 2020 12:52
Extracts attachment file names in web URLs
avisionh / encode_label.py
Created July 28, 2020 08:44
Short code example that gets integer representations of categorical variables. Useful for encoding y in multiclass text classification with Keras. Also performs one-hot encoding.
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
import numpy as np
# create data
cities = ['London', 'Shanghai', 'Seoul', 'Seoul', 'London']
n_rows = len(cities)
# represent/encode these categories as integers
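The preview stops at the encoding step. A minimal sketch of how the two imported encoders are typically combined is given below; it is an assumption about the rest of the gist, and the variable names `city_codes` and `one_hot` are hypothetical.

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
import numpy as np

# create data
cities = ['London', 'Shanghai', 'Seoul', 'Seoul', 'London']

# represent/encode these categories as integers;
# classes are sorted, so London=0, Seoul=1, Shanghai=2
label_encoder = LabelEncoder()
city_codes = label_encoder.fit_transform(cities)

# one-hot encode the integer codes; OneHotEncoder expects a 2D array,
# hence the reshape to a single column
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(city_codes.reshape(-1, 1)).toarray()
```

Here `one_hot` has one row per city and one column per distinct class, and `label_encoder.inverse_transform` maps the integer codes back to the city names.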
avisionh / getdatagithubapi.py
Created March 27, 2020 08:00
Attempts to pull data from a repo on GitHub via the PyGithub API. Does not work because the account has 2FA enabled.
# doesn't work because the account has 2FA enabled
# https://pygithub.readthedocs.io/en/latest/introduction.html
# https://pygithub.readthedocs.io/en/latest/examples/Repository.html#get-a-specific-content-file
import os
from github import Github
# create Github instance, reading the password from an environment variable
password = os.getenv("GITHUB_PASSWORD")
g = Github("avisionh", password)
repo = g.get_repo("lukes/ISO-3166-Countries-with-Regional-Codes")
avisionh / gethtmltable.py
Created March 27, 2020 07:54
Example of extracting tables from HTML webpages
import pandas as pd
import requests
# get country lookups
url = "https://unstats.un.org/unsd/methodology/m49/"
html = requests.get(url).content
data_countries = pd.read_html(html)
# take the first table in the list, which is the English one
data_countries = data_countries[0]