
zeryx /
Last active Dec 13, 2021
Scikit learn Algorithmia demo using the Model Manifest system to tie model data and code together immutably
from Algorithmia import ADK
import joblib
## This function uses the model manifest `state` or `modelData` class to get model files defined in the model manifest automatically.
## No client work required, just make sure the name in `get_model` matches the name in your model manifest.
def load(state):
    state['model'] = joblib.load(state.get_model("model"))
    state['vectorizer'] = joblib.load(state.get_model("vectorizer"))
    return state
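A quick way to see the `load` contract in action without the platform is to stub the manifest state. Everything below the gist's pattern is a stand-in: `FakeState` mimics the real state object's `get_model` (which returns a local path for a named manifest entry), and `pickle` replaces `joblib` so the sketch has no extra dependencies.

```python
import os
import pickle
import tempfile

class FakeState(dict):
    """Stand-in for the ADK manifest state: get_model maps a manifest
    entry name to a local file path (assumption for illustration)."""
    def __init__(self, files):
        super().__init__()
        self._files = files

    def get_model(self, name):
        return self._files[name]

def load(state):
    # Mirrors the gist: the name passed to get_model must match the
    # name declared in the model manifest.
    with open(state.get_model("model"), "rb") as f:
        state['model'] = pickle.load(f)
    return state

# Build a throwaway "model" file to exercise load().
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump({"weights": [1, 2, 3]}, f)

state = load(FakeState({"model": path}))
```

On the platform itself, the ADK constructs the real state object and calls `load` once at startup, so your `apply` function only ever sees warm, already-deserialized models.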
zeryx /
Last active Sep 24, 2021
An algorithm that attempts to reload its model file (if it has been updated) every 5 minutes
import Algorithmia
from time import time
import pickle
client = Algorithmia.client()
DATA_MODEL_DIR = "data://.my/example"
MODEL_NAME = "example.pkl"
TIME_0 = 0
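The `TIME_0` constant above hints at the gist's core trick: compare the current time against the last load and only re-download when the interval has elapsed. A minimal sketch of that TTL logic, with the gist's actual download-and-unpickle step replaced by a generic `load_fn` callable:

```python
import time

RELOAD_INTERVAL_S = 300  # the gist's five-minute window
_last_load = 0.0
_model = None

def get_model(load_fn, now=None):
    """Return the cached model, reloading via load_fn once the
    interval has elapsed (sketch; load_fn stands in for fetching
    MODEL_NAME from DATA_MODEL_DIR and unpickling it)."""
    global _last_load, _model
    now = time.time() if now is None else now
    if _model is None or now - _last_load >= RELOAD_INTERVAL_S:
        _model = load_fn()
        _last_load = now
    return _model
```

Because the check runs inline on each request, a cold model costs one request the download latency; every other call within the window is a cheap cache hit.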
zeryx /
Created Sep 24, 2021
This algorithm synchronously checks a resource file by ensuring a lock file doesn't already exist.
from Algorithmia import ADK
import Algorithmia
from time import sleep, time
state_file_path = "data://.my/locking/resource.json"
lock_file_path = "data://.my/locking/lock"
client = Algorithmia.client()
class AlgorithmiaLock(object):
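The gist's `AlgorithmiaLock` body is cut off in the preview, but the lock-file idea is simple: spin until the lock object is absent, create it, do the work, delete it. A hedged sketch of that pattern, with an in-memory store standing in for the Data API client (the real code would call exists/put/delete against `lock_file_path`):

```python
import time

class MemStore:
    """In-memory stand-in for the Data API (exists/put/delete only)."""
    def __init__(self):
        self.objs = {}

    def exists(self, path):
        return path in self.objs

    def put(self, path, payload):
        self.objs[path] = payload

    def delete(self, path):
        self.objs.pop(path, None)

class DataLock:
    """Sketch of a lock built on an object store. Note this
    check-then-create scheme is not atomic: two callers can race
    between exists() and put(), so it suits coarse coordination,
    not strict mutual exclusion."""
    def __init__(self, store, lock_path, poll_s=0.01, timeout_s=1.0):
        self.store, self.lock_path = store, lock_path
        self.poll_s, self.timeout_s = poll_s, timeout_s

    def __enter__(self):
        deadline = time.time() + self.timeout_s
        while self.store.exists(self.lock_path):
            if time.time() > deadline:
                raise TimeoutError("lock file never released")
            time.sleep(self.poll_s)
        self.store.put(self.lock_path, b"locked")
        return self

    def __exit__(self, *exc):
        self.store.delete(self.lock_path)
```

Using it as a context manager guarantees the lock file is deleted even if the guarded block raises, which is what keeps a crashed worker from wedging everyone else (until the timeout, at least).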
zeryx /
Created Mar 19, 2021
algorithm api calls with large pandas dataframe objects, example of using the data API
import Algorithmia
import pandas as pd
client = Algorithmia.client()
def apply(input):
    input_dataframe = pd.DataFrame.from_dict(client.file(input).getJson())
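The preview stops right after rebuilding the DataFrame, but the pattern is a round trip: serialize the frame to a JSON-friendly dict on the caller's side, upload it via the Data API, and invert with `from_dict` inside the algorithm. A self-contained sketch of just the serialization pair (the `orient` choice is an assumption; pick whichever your caller uses, as long as both sides agree):

```python
import pandas as pd

# What the caller would upload as JSON before invoking the algorithm.
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
payload = df.to_dict(orient="list")

# What the algorithm does with client.file(input).getJson() -- here the
# Data API fetch is elided and we invert the dict directly.
restored = pd.DataFrame.from_dict(payload)
```

Passing a Data API path instead of inlining the JSON in the request body is the point of the gist: it sidesteps request-size limits for large frames.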
zeryx /
Created Dec 9, 2020
finetuning a GPT-2 model to handle a character list
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import AdamW
from random import choice
from torch.nn import functional as F
import torch
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to('cuda')
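The gist imports `random.choice` alongside the model setup, which suggests training examples are sampled from the character list. The actual prompt format isn't shown in the preview, so the sketch below is a guess at that data-construction step only (a simple `Name: line` format, with placeholder characters and lines); the GPT-2 tokenization and `AdamW` training loop are omitted.

```python
from random import choice, seed

characters = ["Alice", "Bob", "Eve"]          # placeholder character list
lines = ["Hello there.", "Where were you?"]   # placeholder dialogue lines

def make_example(characters, lines):
    """One fine-tuning prompt in a 'Name: line' format (the format
    itself is an assumption, not taken from the gist)."""
    return f"{choice(characters)}: {choice(lines)}"
```

Each sampled string would then go through `tokenizer(...)` and a standard causal-LM loss step; randomly pairing speakers with lines is a cheap way to get many distinct prompts from a small character list.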
import sys
from time import perf_counter
class SequenceDiscoveryNode:
    def __init__(self, parent, value):
        self.parent = parent
        self.children = []
        self.value = value

    def construct_tree(self, remaining_sequence: list):
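The `construct_tree` body is cut off in the preview. One plausible completion, given the `remaining_sequence` parameter and the parent/children links, is a node that spawns one child per remaining element and recurses on the rest, so a full expansion enumerates every ordering of the input. This is an assumption about the gist's intent, not its actual code:

```python
class Node:
    """Sketch (assumption): each node expands one child per remaining
    element, so the leaves of a full tree correspond to the n!
    orderings of the input sequence."""
    def __init__(self, parent, value):
        self.parent = parent
        self.children = []
        self.value = value

    def construct_tree(self, remaining):
        for i, v in enumerate(remaining):
            child = Node(self, v)
            self.children.append(child)
            # Recurse on the sequence minus the element just consumed.
            child.construct_tree(remaining[:i] + remaining[i + 1:])

def count_leaves(node):
    return 1 if not node.children else sum(count_leaves(c) for c in node.children)

root = Node(None, None)
root.construct_tree([1, 2, 3])
```

The `perf_counter` import in the original hints the gist was timing this expansion; it blows up factorially, which is exactly the kind of thing worth measuring.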

MLeap + Algorithmia: When to leave your Spark pipeline behind for scalable deployment


Spark is a very powerful big data processing system that's capable of enormous workloads. Sometimes, though, there are critical paths that don't scale as effectively as you might want. In this blog post, we'll discuss Spark and Spark Pipelines, and how you might export a critical component from your Spark project to Algorithmia using the MLeap model interchange format & runtime.

What makes Spark great?

Apache Spark is, at its core, a distributed data transformation engine for very large datasets and workloads. It links directly with powerful, battle-tested distributed data systems like Hadoop and Cassandra, which are industry standards in domains such as finance.

zeryx / Algorithm.scala
Created Sep 28, 2020
MLeap runtime project, for running a Spark model on Algorithmia
package com.algorithmia
import com.algorithmia.handler.AbstractAlgorithm
import ml.combust.bundle.BundleFile
import ml.combust.bundle.dsl.Bundle
import ml.combust.mleap.core.types._
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row, Transformer}
import scala.collection.mutable

What are we going to do today?

  • We'll understand what this GitLab -> Algorithmia integration does
  • GitLab -> Algorithmia procedure:
    • Create a new Algorithm on Algorithmia
    • Create a new project in GitLab
    • Add our secret variables to the GitLab project from Algorithmia
    • Clone both git repositories to our local system
    • Copy over template code from the Algorithmia repo to the GitLab repo
zeryx / databricks_mleap_example.scala
Last active Oct 7, 2020
spark-shell code used to create an example Spark pipeline and serialize it with MLeap
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import{Binarizer, StringIndexer}
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import resource._
val datasetName = "example-data.csv"
val dataframe: DataFrame ="csv").option("header", true).load(datasetName).withColumn("test_double", col("test_double").cast("double"))