
zeryx zeryx

zeryx /
Last active December 13, 2021 16:53
Scikit learn Algorithmia demo using the Model Manifest system to tie model data and code together immutably
from Algorithmia import ADK
import joblib
## This function uses the model manifest `state` or `modelData` class to get model files defined in the model manifest automatically.
## No client work required, just make sure the name in `get_model` matches the name in your model manifest.
def load(state):
    state['model'] = joblib.load(state.get_model("model"))
    state['vectorizer'] = joblib.load(state.get_model("vectorizer"))
    return state
zeryx /
Last active September 24, 2021 17:31
An algorithm that attempts to reload its model file (if it has been updated) every 5 minutes
import Algorithmia
from time import time
import pickle
from import data
client = Algorithmia.client()
DATA_MODEL_DIR = "data://.my/example"
MODEL_NAME = "example.pkl"
TIME_0 = 0
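The preview above is cut off before the reload logic itself. As a rough sketch of the idea in the description (check at most once per interval, and only re-read the model when the file has changed), the following uses a local file and its modification time as a stand-in for the Algorithmia data API. The class name `ReloadingModel`, the `clock` parameter, and the use of `os.path.getmtime` are assumptions made for illustration and are not taken from the gist.

```python
import os
import pickle
from time import time

RELOAD_INTERVAL_SECONDS = 5 * 60  # check for a new model at most every 5 minutes


class ReloadingModel:
    """Hypothetical helper: caches an unpickled model and re-reads it
    when the file on disk has changed since the last check."""

    def __init__(self, path, interval=RELOAD_INTERVAL_SECONDS, clock=time):
        self.path = path
        self.interval = interval
        self.clock = clock          # injectable so the timing can be tested
        self.model = None
        self.last_check = None      # time of the most recent staleness check
        self.last_mtime = None      # modification time of the loaded file

    def get(self):
        now = self.clock()
        due = self.last_check is None or now - self.last_check >= self.interval
        if due:
            self.last_check = now
            mtime = os.path.getmtime(self.path)
            # Only pay the cost of unpickling when the file actually changed.
            if mtime != self.last_mtime:
                with open(self.path, "rb") as handle:
                    self.model = pickle.load(handle)
                self.last_mtime = mtime
        return self.model
```

Calling `get()` on every request keeps the common path cheap: between checks it returns the cached object without touching storage.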
zeryx /
Created September 24, 2021 16:30
This algorithm synchronously checks a resource file by ensuring a lock file doesn't already exist.
from Algorithmia import ADK
import Algorithmia
from time import sleep, time
state_file_path = "data://.my/locking/resource.json"
lock_file_path = "data://.my/locking/lock"
client = Algorithmia.client()
class AlgorithmiaLock(object):
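The body of `AlgorithmiaLock` is not shown in the preview. The general lock-file pattern the description refers to can be sketched against the local filesystem, where `os.open` with `O_CREAT | O_EXCL` creates the lock file atomically and fails if it already exists. The class name `FileLock`, its parameters, and the timeout behaviour are assumptions for illustration; this sketch does not use the Algorithmia data API and makes no claim about how the gist implements its lock.

```python
import os
from time import sleep, time


class FileLock:
    """Hypothetical local-filesystem lock: holding the lock means having
    created the lock file; releasing it means deleting that file."""

    def __init__(self, lock_path, timeout=10.0, poll_interval=0.05):
        self.lock_path = lock_path
        self.timeout = timeout
        self.poll_interval = poll_interval

    def acquire(self):
        deadline = time() + self.timeout
        while True:
            try:
                # O_EXCL makes creation fail if the file already exists,
                # so only one caller can succeed at a time.
                fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return
            except FileExistsError:
                if time() >= deadline:
                    raise TimeoutError(f"could not acquire {self.lock_path}")
                sleep(self.poll_interval)

    def release(self):
        os.remove(self.lock_path)

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()
        return False  # do not swallow exceptions raised inside the block
```

Used as `with FileLock(path): ...`, the resource file is only read or written while the lock file exists, and the lock is removed even if the block raises.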
zeryx /
Created March 19, 2021 16:45
Algorithm API calls with large pandas DataFrame objects: an example of using the Data API
import Algorithmia
import pandas as pd
client = Algorithmia.client()
def apply(input):
    input_dataframe = pd.DataFrame.from_dict(client.file(input).getJson())
zeryx /
Created December 9, 2020 16:54
finetuning a GPT-2 model to handle a character list
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import AdamW
from random import choice
from torch.nn import functional as F
import torch
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to('cuda')
import sys
from time import perf_counter
class SequenceDiscoveryNode:
    def __init__(self, parent, value):
        self.parent = parent
        self.children = []
        self.value = value

    def construct_tree(self, remaining_sequence: list):
zeryx /
Last active September 28, 2020 21:17

MLeap + Algorithmia: When to leave your Spark pipeline behind for scalable deployment


Spark is a very powerful big data processing system that's capable of handling very large workloads. Sometimes, though, there are critical paths that don't scale as effectively as you might want. In this blog post, we'll discuss Spark and Spark Pipelines, and how you might be able to export a critical component from your Spark project to Algorithmia by using the MLeap model interchange format and runtime.

What makes Spark great?

Apache Spark is, at its core, a distributed data transformation engine for very large datasets and workloads. It links directly with powerful, battle-tested distributed data systems like Hadoop and Cassandra, which are industry standards in spaces such as the financial industry.

zeryx / Algorithm.scala
Created September 28, 2020 21:17
MLeap runtime project, for running a Spark model on Algorithmia
package com.algorithmia
import com.algorithmia.handler.AbstractAlgorithm
import ml.combust.bundle.BundleFile
import ml.combust.bundle.dsl.Bundle
import ml.combust.mleap.core.types._
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row, Transformer}
import scala.collection.mutable

What are we going to do today?

  • We'll understand what this GitLab -> Algorithmia integration does
  • GitLab -> Algorithmia Procedure:
    • Create a new Algorithm on Algorithmia
    • Create a new project in GitLab
    • Add our secret variables to the GitLab project from Algorithmia
    • Clone both git repositories to our local system
  • Copy over template code from the Algorithmia repo to the GitLab repo
zeryx / databricks_mleap_example.scala
Last active October 7, 2020 20:02
spark-shell code used to create an example Spark pipeline and serialize it with MLeap
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.feature.{Binarizer, StringIndexer}
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import resource._
val datasetName = "example-data.csv"
val dataframe: DataFrame = spark.read.format("csv").option("header", true).load(datasetName).withColumn("test_double", col("test_double").cast("double"))