Ryan McAndrews ramcandrews

ramcandrews / iterate and pop dict.py
Last active September 27, 2019 06:14
Normally you can't remove items from a dictionary while you are iterating over it. But if you convert it to a list first, there are no problems.
origDict = {
    'First name': 'Ryan',
    'Last name': 'M',
    'Subject': 'AI',
    'task': 'Cleaning Data'
}
removedItem = origDict.pop('Last name') # this is normal usage of pop()
print(origDict)
print('value = ' + removedItem)
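The preview above only shows plain pop(). A minimal sketch of the case the description talks about, assuming the goal is to drop entries that meet some condition (the condition and the `removed` name are made up for illustration): iterating over `list(orig_dict)` walks a copy of the keys, so the dictionary itself can shrink inside the loop.

```python
orig_dict = {
    'First name': 'Ryan',
    'Last name': 'M',
    'Subject': 'AI',
    'task': 'Cleaning Data',
}

# Hypothetical condition: remove every entry whose key contains "name".
removed = {}
for key in list(orig_dict):  # list() snapshots the keys before the loop starts
    if 'name' in key:
        removed[key] = orig_dict.pop(key)

print(orig_dict)
print(removed)
```

Looping over `orig_dict` directly and popping inside the loop raises `RuntimeError: dictionary changed size during iteration`.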
ramcandrews / pytorch_image_folder_with_file_paths.py
Created November 17, 2019 09:22 — forked from andrewjong/pytorch_image_folder_with_file_paths.py
PyTorch Image File Paths With Dataset Dataloader
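import torch
from torchvision import datasets
class ImageFolderWithPaths(datasets.ImageFolder):
    """Custom dataset that includes image file paths. Extends
    torchvision.datasets.ImageFolder
    """
    # Override the __getitem__ method. This is the method that the DataLoader calls.
    def __getitem__(self, index):
        # The preview cuts off here; the body below is an assumed completion.
        image, label = super().__getitem__(index)  # the usual (image, label) pair
        path = self.imgs[index][0]  # self.imgs holds (path, class_index) tuples
        return image, label, path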
ramcandrews / Google colab load data from google drive
Created February 17, 2020 06:30
Use this code in the top cell of your Colab notebook to read data from Google Drive. Loading thousands of image files or other small files this way is slower than loading a few larger files.
from google.colab import drive
drive.mount('/content/drive/')
data_dir = '/content/drive/My Drive/Colab Notebooks/<your assets>/'
ramcandrews / pytorch chunk for RNN.py
Last active February 26, 2020 08:55
Batch data into chunks using the PyTorch TensorDataset and DataLoader classes
from torch.utils.data import TensorDataset, DataLoader
import torch
# Check for a GPU
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('No GPU found. Please use a GPU to train your neural network.')
def batch_data(words, sequence_length, batch_size):
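The body of batch_data is not shown in this preview. As a rough sketch of just the chunking step, assuming each chunk of `sequence_length` items is paired with the item that follows it (a common setup for RNN training, not necessarily what this gist does), here is a plain-Python version. The `make_windows` name is hypothetical and lists stand in for tensors.

```python
def make_windows(words, sequence_length):
    """Split a sequence into (window, next item) pairs."""
    features, targets = [], []
    # Stop early enough that every window still has a following item.
    for start in range(len(words) - sequence_length):
        features.append(words[start:start + sequence_length])
        targets.append(words[start + sequence_length])
    return features, targets


features, targets = make_windows([1, 2, 3, 4, 5, 6], sequence_length=3)
print(features)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(targets)   # [4, 5, 6]
```

With PyTorch, the two lists would then be converted with `torch.tensor`, wrapped in a `TensorDataset`, and passed to a `DataLoader` with the chosen `batch_size`.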
# This is more than 100 years of global weather data (110 GB)
wget https://www.ncei.noaa.gov/data/global-hourly/archive/csv/{1901..2020}.tar.gz
First, create a SpatiaLite database file. The SQLite file will be more than twice as large as the GDB directory.
ogr2ogr -f SQLite -dsco SPATIALITE=YES -overwrite db.sqlite tlgdb_2019_a_us_areawater.gdb
Regex for matching ALL Japanese common & uncommon Kanji (U+4E00 – U+9FAF) ~ The Big Kahuna!
([一-龯])
Regex for matching Hiragana or Katakana
([ぁ-んァ-ン])
Regex for matching any character that is neither Hiragana nor Katakana
([^ぁ-んァ-ン])
Regex for matching Hiragana or Katakana or basic punctuation (、。’)
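The Kanji and kana patterns listed above can be tried out with Python's re module. This is a small sketch; the sample string and the variable names are made up for illustration.

```python
import re

KANJI = re.compile(r"[一-龯]")        # Kanji range from the list above
KANA = re.compile(r"[ぁ-んァ-ン]")     # Hiragana or Katakana
NON_KANA = re.compile(r"[^ぁ-んァ-ン]")  # anything that is not kana

sample = "日本語のテキストabc"  # hypothetical test string

kanji = KANJI.findall(sample)
kana = KANA.findall(sample)
non_kana = NON_KANA.findall(sample)

print(kanji)     # ['日', '本', '語']
print(kana)      # ['の', 'テ', 'キ', 'ス', 'ト']
print(non_kana)  # ['日', '本', '語', 'a', 'b', 'c']
```

Note that the long vowel mark ー (U+30FC) falls outside the ァ-ン range, so the kana pattern does not match it unless it is added to the character class.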
ramcandrews / email regex
Last active March 17, 2022 06:55
General Email Regex (RFC 5322 Official Standard) https://www.regular-expressions.info/email.html
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
[^@ \t\r\n]+@[^@ \t\r\n]+\.[^@ \t\r\n]+
[^@ \t\r\n] matches any single character other than @, space, tab, carriage return, or newline; the + after it repeats that match one or more times.
https://ihateregex.io/expr/email/ (04/17/2022)
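A quick sketch of the short pattern in Python, assuming the whole string should be an address (hence `fullmatch`). The `looks_like_email` helper and the sample addresses are hypothetical.

```python
import re

# The short pattern from above: something@something.something,
# with no @ or whitespace inside any of the three parts.
SIMPLE_EMAIL = re.compile(r"[^@ \t\r\n]+@[^@ \t\r\n]+\.[^@ \t\r\n]+")


def looks_like_email(text):
    """Return True if the whole string has the shape local@domain.tld."""
    return SIMPLE_EMAIL.fullmatch(text) is not None


print(looks_like_email("user@example.com"))       # True
print(looks_like_email("user@example"))           # False, no dot after the @
print(looks_like_email("user name@example.com"))  # False, contains a space
```

This only checks the overall shape of the string, not whether the address exists. The longer RFC 5322 pattern lists only lowercase letters, so it would need `re.IGNORECASE` to accept mixed-case addresses.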
ramcandrews / regexJP.py
Last active July 26, 2022 04:39
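A Python regex to grab every Japanese word from an HTML file
import re
rootdir = ""  # assumption: directory that holds the HTML file; set this to your own path
with open(rootdir + "something in japanese.html", encoding='utf-8', errors='ignore') as reader:
    for line in reader:
        # Runs of Kanji, kana, and a few punctuation marks; * also yields empty matches
        words = re.findall(r"[一-龯ぁ-んァ-ン!:/・()ー]*", line)
        for word in words:
            if word:  # skip the empty matches
                print(word)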
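The same idea can be checked without a file. This sketch runs the character class on an in-memory string (a made-up HTML line) and uses `+` instead of `*`, so no empty matches need filtering.

```python
import re

# Same character class as above, with + so only non-empty runs are returned.
JP_RUN = re.compile(r"[一-龯ぁ-んァ-ン!:/・()ー]+")

html_line = "<li>東京タワー</li>"  # hypothetical sample line

words = JP_RUN.findall(html_line)
print(words)  # ['東京タワー', '/']
```

Because the class includes `/`, the slash of the closing tag shows up as its own match. Dropping `/` from the class, or stripping tags first, avoids that.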