@zeryx
zeryx / algorithm.py
Last active Dec 13, 2021
scikit-learn Algorithmia demo using the Model Manifest system to tie model data and code together immutably
View algorithm.py
from Algorithmia import ADK
import joblib
## This function uses the model manifest `state` or `modelData` class to get model files defined in the model manifest automatically.
## No client work required, just make sure the name in `get_model` matches the name in your model manifest.
def load(state):
    state['model'] = joblib.load(state.get_model("model"))
    state['vectorizer'] = joblib.load(state.get_model("vectorizer"))
    return state
zeryx / model_reloading.py
Last active Sep 24, 2021
An algorithm that attempts to reload its model file (if it has been updated) every 5 minutes
View model_reloading.py
import Algorithmia
from time import time
import pickle
from src.data import data
client = Algorithmia.client()
DATA_MODEL_DIR = "data://.my/example"
MODEL_NAME = "example.pkl"
TIME_0 = 0
LAST_MODIFIED = ""
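The preview above stops before the reload logic itself. As a rough illustration of the idea in the description (re-check the model file at most every 5 minutes and reload only when it has changed), here is a minimal sketch. It does not reproduce the gist's code: `ReloadingModel`, `fetch_last_modified`, `load_model`, and `clock` are hypothetical names, and the two callables stand in for whatever storage calls actually fetch the timestamp and the model.

```python
from time import time

RELOAD_INTERVAL_SECONDS = 5 * 60  # check for a new model at most every 5 minutes


class ReloadingModel:
    """Caches a model and refreshes it when the stored file's timestamp changes."""

    def __init__(self, fetch_last_modified, load_model, clock=time):
        # fetch_last_modified() and load_model() are hypothetical callables that
        # stand in for the storage calls; they are not Algorithmia API functions.
        self._fetch_last_modified = fetch_last_modified
        self._load_model = load_model
        self._clock = clock
        self._last_checked = None
        self._last_modified = None
        self._model = None

    def get(self):
        now = self._clock()
        due = (
            self._last_checked is None
            or now - self._last_checked >= RELOAD_INTERVAL_SECONDS
        )
        if due:
            self._last_checked = now
            modified = self._fetch_last_modified()
            # Reload on the first call, or when the timestamp has changed.
            if self._model is None or modified != self._last_modified:
                self._model = self._load_model()
                self._last_modified = modified
        return self._model
```

Passing the clock in as a parameter keeps the interval logic testable without waiting five minutes; in an algorithm, `get()` would be called at the start of each request.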
zeryx / algorithm_with_lock.py
Created Sep 24, 2021
This algorithm synchronously checks a resource file by ensuring a lock file doesn't already exist.
View algorithm_with_lock.py
from Algorithmia import ADK
import Algorithmia
from time import sleep, time
state_file_path = "data://.my/locking/resource.json"
lock_file_path = "data://.my/locking/lock"
client = Algorithmia.client()
class AlgorithmiaLock(object):
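The preview cuts off at the class declaration, so the body of `AlgorithmiaLock` is not shown here. To illustrate the general lock-file technique the description refers to, the following is a minimal sketch that uses the local filesystem rather than the Algorithmia data API. `FileLock`, `timeout`, and `poll_interval` are hypothetical names chosen for this example, not part of the gist.

```python
import os
import tempfile
from time import sleep, time


class FileLock:
    """Context manager that holds a lock by creating a file exclusively."""

    def __init__(self, path, timeout=10.0, poll_interval=0.1):
        self.path = path
        self.timeout = timeout
        self.poll_interval = poll_interval

    def __enter__(self):
        deadline = time() + self.timeout
        while True:
            try:
                # O_CREAT | O_EXCL fails if the file already exists, so the
                # existence check and the creation happen as a single step.
                fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return self
            except FileExistsError:
                if time() >= deadline:
                    raise TimeoutError(f"could not acquire {self.path}")
                sleep(self.poll_interval)

    def __exit__(self, exc_type, exc, tb):
        # Release the lock so the next waiter can create the file.
        os.remove(self.path)
        return False


# Usage: the lock file exists only while the protected block runs.
lock_path = os.path.join(tempfile.mkdtemp(), "lock")
with FileLock(lock_path):
    held = os.path.exists(lock_path)
released = not os.path.exists(lock_path)
```

On a local filesystem the exclusive create is atomic. If a remote store offers only separate "exists" and "put" calls, two clients can both pass the check before either writes, so a lock built that way should be treated as best-effort.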
zeryx / algorithm_process_pandas_df.py
Created Mar 19, 2021
Algorithm API calls with large pandas DataFrame objects: an example of using the Data API
View algorithm_process_pandas_df.py
import Algorithmia
import pandas as pd
client = Algorithmia.client()
def apply(input):
    input_dataframe = pd.DataFrame.from_dict(client.file(input).getJson())
    ...
    ...
zeryx / generative_model_finetuning.py
Created Dec 9, 2020
Fine-tuning a GPT-2 model to handle a character list
View generative_model_finetuning.py
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import AdamW
from random import choice
from torch.nn import functional as F
import torch
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to('cuda')
View subseq.py
import sys
from time import perf_counter
class SequenceDiscoveryNode:
    def __init__(self, parent, value):
        self.parent = parent
        self.children = []
        self.value = value
    def construct_tree(self, remaining_sequence: list):
View mleap_spark.md

MLeap + Algorithmia: When to leave your Spark pipeline behind for scalable deployment

Intro

Spark is a very powerful big data processing system that's capable of handling enormous workloads. Sometimes, though, there are critical paths that don't scale as effectively as you might want. In this blog post, we'll discuss Spark and Spark Pipelines, and how you might export a critical component from your Spark project to Algorithmia using the MLeap model interchange format and runtime.

What makes Spark great?

Apache Spark is, at its core, a distributed data transformation engine for very large datasets and workloads. It links directly with powerful, battle-tested distributed data systems like Hadoop and Cassandra, which are industry standards in fields such as finance.

zeryx / Algorithm.scala
Created Sep 28, 2020
MLeap runtime project, for running a Spark model on Algorithmia
View Algorithm.scala
package com.algorithmia
import com.algorithmia.handler.AbstractAlgorithm
import ml.combust.bundle.BundleFile
import ml.combust.bundle.dsl.Bundle
import ml.combust.mleap.core.types._
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row, Transformer}
import scala.collection.mutable
View procedure_and_agenda.md

What are we going to do today?

  • We'll understand what this GitLab -> Algorithmia integration does
  • GitLab -> Algorithmia Procedure:
    • Create a new Algorithm on Algorithmia
    • Create a new project in GitLab
    • Add our secret variables to the GitLab project from Algorithmia
    • Clone both git repositories to our local system
  • Copy over template code from the Algorithmia repo to the GitLab repo
zeryx / databricks_mleap_example.scala
Last active Oct 7, 2020
spark-shell code used to create an example Spark pipeline and serialize it to MLeap
View databricks_mleap_example.scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.ml.feature.{Binarizer, StringIndexer}
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import resource._
val datasetName = "example-data.csv"
val dataframe: DataFrame = spark.sqlContext.read.format("csv").option("header", true).load(datasetName).withColumn("test_double", col("test_double").cast("double"))