Mark Stacy mbstacy

  • Harvard University
  • Boston, MA
@mbstacy
mbstacy / anchorLinks.js
Last active March 5, 2024 16:34
Apigee Portal Anchor Links and Overview Navigation

Data Lake API

The CU Boulder Library uses the Cybercommons framework to access the data lake API. The backend uses MongoDB as its document store. Queries and aggregation pipelines are written in the MongoDB query language and passed as JSON through URL parameters.

Basic Parameters

- default format is a web presentation with clickable links
- ?format=json
- ?format=xml
- ?format=yaml
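
As a rough illustration of passing a MongoDB-style query as JSON in the URL, the sketch below builds a request URL without sending it. The `query` and `page_size` parameter names are borrowed from the commented-out lines in the cuscholar snippet further down this page and are assumed to apply here. `BASE_URL` and the helper name `build_data_lake_url` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration; substitute the real API URL.
BASE_URL = "https://example.org/api/catalog/data/catalog/collection/"


def build_data_lake_url(base_url, filter_doc=None, projection=None,
                        page_size=None, fmt=None):
    """Return a URL that carries a MongoDB-style query as a JSON parameter."""
    # Assumption: the API expects a single "query" parameter holding JSON
    # with "filter" and optional "projection" keys.
    query = {"filter": filter_doc or {}}
    if projection:
        query["projection"] = projection
    params = {"query": json.dumps(query, separators=(",", ":"))}
    if page_size is not None:
        params["page_size"] = page_size
    if fmt is not None:
        # One of the formats listed above: json, xml, or yaml.
        params["format"] = fmt
    # urlencode percent-encodes the JSON so it is safe inside a URL.
    return "{0}?{1}".format(base_url, urlencode(params))


url = build_data_lake_url(BASE_URL, projection={"title": 1, "_id": 0},
                          page_size=100, fmt="json")
print(url)
```

Omitting `fmt` leaves the format parameter out, which should fall back to the default web presentation.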

import PyPDF2
from sys import argv

def getTextPdf(filename):
    '''
    Reads entire file and returns text
    '''
    #open allows you to read the file
    with open(filename,'rb') as pdfFileObj:
        #The pdfReader variable is a readable object that will be parsed
@mbstacy
mbstacy / EKSvsEC2.md
Last active September 20, 2023 04:53
EKS vs EC2 Kubernetes Costs

Kubernetes Cost Comparison EC2 vs EKS

EKS Kubernetes

The AWS EKS service provides the Kubernetes control plane. The price of the EKS service is $0.20 per hour per cluster.

EKS Use Case Monthly Cost

Three clusters with four worker nodes.
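
The control-plane part of this use case can be worked out directly from the hourly price above. The sketch below assumes a 730-hour month (365 days × 24 hours / 12), which is not stated in the original. The worker-node price is left as a required argument because the instance type is not given here; the function names are hypothetical.

```python
HOURS_PER_MONTH = 730               # assumption: average hours in a month
EKS_PRICE_PER_CLUSTER_HOUR = 0.20   # price quoted in the text above


def eks_control_plane_monthly(clusters,
                              price_per_hour=EKS_PRICE_PER_CLUSTER_HOUR,
                              hours=HOURS_PER_MONTH):
    """Monthly cost of the managed control planes only."""
    return clusters * price_per_hour * hours


def worker_nodes_monthly(clusters, nodes_per_cluster, node_price_per_hour,
                         hours=HOURS_PER_MONTH):
    """Monthly cost of the worker nodes; the node price must be supplied."""
    return clusters * nodes_per_cluster * node_price_per_hour * hours


control_plane = eks_control_plane_monthly(3)
print("EKS control plane: ${0:.2f} per month".format(control_plane))
```

With three clusters this gives 3 × $0.20 × 730 = $438.00 per month for the control planes, before any worker-node cost is added.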

import pandas as pd

#error file generated in bash: $cat dm-ir.log | awk '{print $5,$8}' > errors2.txt
#sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas)
err=pd.read_csv('errors2.txt',sep=r'\s+',header=None )
err.columns=['error_type','context_key']
err=err.drop_duplicates()
#Main inventory
df=pd.read_csv('data/inventory-2019-05-30.csv',converters={i: str for i in range(0, 83)})
df= df.drop_duplicates()

import requests, json, os
from sys import argv

catalog_url = "https://libapps.colorado.edu/api/catalog/data/catalog/cuscholar.json"
headers={"Content-Type":"application/json","Authorization":"Token {0}".format(os.getenv('LIBAPPS_APITOKEN'))}

def get_cuscholar_data():
    #query='query={"filter":{},"projection":{"data_files.s3.key":1,"title":1,"_id":0}}'
    #url = "{0}?page_size=100&{1}".format(catalog_url,query)

import os

def makeCamelCase(name,filename=False,removeSpecial=False):
    """
    This function produces camelCase variable names or filenames. You can provide an option to remove special characters.
    ARGS:
        name (string)
    KWARGS:
        filename (Boolean) - default False
        removeSpecial (Boolean) - default False
import xmltodict
import sys

def read_xml(filename):
    with open(filename,'r') as f1:
        return xmltodict.parse("<root>{0}</root>".format(f1.read()),cdata_key='text',attr_prefix='',dict_constructor=dict)

def search_subjects(term,doc):
    total_hits=0
    total_records=0

gitPushAdminEnforcement

gitPushAdminEnforcement uses the GitHub API to turn off admin enforcement, run git push, and then re-enable admin enforcement.

Installation:

  1. pip install PyGithub
  2. Copy gitPushAdminEnforcement into a bin directory (e.g. ~/.local/bin)
  3. Add the bin directory to the PATH variable
  4. chmod +x ~/.local/bin/gitPushAdminEnforcement
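
The script itself is not shown here, so the following is only a minimal sketch of the disable, push, re-enable sequence described above, not the author's implementation. It relies on PyGithub's `Branch.remove_admin_enforcement()` and `Branch.set_admin_enforcement()`. The function names, the token handling, and the repository and branch arguments are assumptions.

```python
import subprocess


def push_with_admin_enforcement_disabled(branch, push):
    """Disable admin enforcement, run push(), and always re-enable it."""
    branch.remove_admin_enforcement()
    try:
        return push()
    finally:
        # Runs even if the push fails, so the branch is never left unprotected.
        branch.set_admin_enforcement()


def git_push(args=()):
    """Run `git push` in the current directory; raises if git exits non-zero."""
    return subprocess.run(["git", "push", *args], check=True).returncode


def main(token, repo_name, branch_name):
    """Hypothetical entry point: token, repo_name, branch_name are assumptions."""
    from github import Github  # provided by `pip install PyGithub`

    repo = Github(token).get_repo(repo_name)   # e.g. "owner/repository"
    branch = repo.get_branch(branch_name)      # e.g. "main"
    return push_with_admin_enforcement_disabled(branch, git_push)
```

Calling `main(...)` with a personal access token, an `owner/repository` name, and the protected branch name would run the whole sequence. The `try`/`finally` is the key design choice: enforcement is restored whether or not the push succeeds.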