Skip to content

Instantly share code, notes, and snippets.

View sfc-gh-mharris's full-sized avatar

Matt Harris sfc-gh-mharris

View GitHub Profile
@sfc-gh-mharris
sfc-gh-mharris / ex_tableA_20240410.json
Created April 11, 2024 18:04
example of JSON with nested values.
{
"id" : "1111",
"updateTimestamp" : "2024-04-10 12:12:12",
"updateUser" : "testuser",
"shareClassId" : "1234",
"shareClassCode" : "9999",
"accountNumber" : "xxxx",
"accountType" : {
"id" : "8888",
"valueType" : "test_type",
@sfc-gh-mharris
sfc-gh-mharris / read_JSON_asVariant.sql
Last active April 11, 2024 18:01
Example of ingesting JSON from text file on external S3 stage.
-- set up db context
CREATE SCHEMA TRP_TESTING_MHARRIS.PRDMSTR;
USE SCHEMA TRP_TESTING_MHARRIS.PRDMSTR;
-- check my S3 Storage Integration
-- docs: https://docs.snowflake.com/en/sql-reference/sql/create-storage-integration
DESC INTEGRATION mharris_trp_S3_Int;
-- Create JSON file format
-- there are more options for this, but here is a basic one.
@sfc-gh-mharris
sfc-gh-mharris / Cloned_Schema_approach.sql
Created March 15, 2024 19:55
A quick example of how you can clone a DB and then create new objects within, as well as replace cloned object within
-- CREATE main DB
CREATE DATABASE TEST;
-- main DB table
CREATE TABLE TEST.PUBLIC.TEST_TABLE AS
SELECT 'A' as COL1,
2 as COL2;
-- clone main DB
CREATE DATABASE TEST_CLONE CLONE TEST;
@sfc-gh-mharris
sfc-gh-mharris / zero shot snowsight UDF.py
Last active March 1, 2024 13:56
Zero-Shot example with optional argument deployed from Snowsight SQL worksheet
create OR REPLACE function get_text_classification(x text, y text)
returns variant
language python
runtime_version = 3.9
imports=('@TIAA_TESTING.NLP_MODEL.model_data_test/bart-large-mnli.joblib')
packages = ('cachetools==4.2.2', 'transformers==4.32.1', 'joblib', 'pytorch')
handler = 'get_text_classification'
as $$
import pandas as pd
import sys
@sfc-gh-mharris
sfc-gh-mharris / Zero Shot with Argument.py
Created February 28, 2024 19:56
Example of Zero Shot Classification UDF that includes an argument to specify the types of class labels to use.
## Changes from previous version
## max_batch_size=10 to control the batch size. with the big pretrained file, a lower number here helps performance
## PandasSeries[list] input so that we can concatenate the class_labels argument
## sentences.iloc[0][1] to extract the class lable from the input series
## sentence[0] to extract the text to be classified from the input series
session.clear_imports()
session.clear_packages()
from snowflake.snowpark.functions import pandas_udf, object_construct, array_construct
@sfc-gh-mharris
sfc-gh-mharris / connection.json
Created February 20, 2024 17:32
example of a Snowpark connection JSON
{
"user": "YOUR USER NAME",
"password": "YOUR PASSWORD",
"role": "ACCOUNTADMIN",
"account": "ORG-ACCOUNT",
"warehouse": "VWH NAME",
"database": "DB NAME",
"schema": "SCHEMA NAME"
}
@sfc-gh-mharris
sfc-gh-mharris / zero short workflow.py
Created February 20, 2024 17:31
Workflow for authenticating and pushing UDF to Snowflake
import traceback
import pandas as pd
# from sklearn.metrics import accuracy_score, classification_report
# from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import LabelEncoder
from torch import Tensor, device
from torch.utils.data import DataLoader
from transformers import BartTokenizerFast
from transformers import pipeline, BartForSequenceClassification, Trainer, TrainingArguments, \
EvalPrediction, AdamW, get_linear_schedule_with_warmup
@sfc-gh-mharris
sfc-gh-mharris / zero shot example.py
Created February 20, 2024 17:21
Example of zero shot classification using Transformers as a Python UDF
## function specific imports need to be in the same notebook cell to be compiled with UDF
## cachetools is a helpeer library to cache the model file for faster loading
import cachetools
import sys
import joblib
## This @ decrator tells Snowpark to cache this function
## Note: this is an arbitrary python function, not a UDF or SPROC
## Note: sys._xoptions.get("snowflake_import_directory") is a system function to get the location of the import dir
## Note: serializing the model with joblib is optimal over a zip/gzip
@sfc-gh-mharris
sfc-gh-mharris / snowpark_torch.yml
Created February 15, 2024 21:37
Conda environment build for TIAA Zero-shot example
name: snowpark_torch_v2_1_12_1
channels:
- https://repo.anaconda.com/pkgs/snowflake
dependencies:
- python=3.9
- pip
- traceback2
- pandas
- scikit-learn
- pytorch
@sfc-gh-mharris
sfc-gh-mharris / boto3_copy_SPROC.sql
Created January 19, 2024 14:25
A simple use of Snowflake External Network integration to use Boto3 to copy files between S3 buckets
SET AWS_ACCESS_KEY_ID='YOUR AWS_Access_Key_ID';
SET AWS_SECRET_ACCESS_KEY='YOUR AWS_Secret_Access_Key';
CREATE OR REPLACE SECRET my_aws_access_key
TYPE = GENERIC_STRING
SECRET_STRING = $AWS_ACCESS_KEY_ID;
CREATE OR REPLACE SECRET my_aws_secret_key
TYPE = GENERIC_STRING