
@jaor
jaor / histogram.clj
Created June 15, 2011 18:34
Single-pass histogram
(ns histogram
  (:use
    [clojure.contrib.math :only [abs]]
    [clojure.contrib.priority-map]))

(defn- bin-weight [bin]
  (* (double (first bin)) (double (second bin))))

(defn- combine-bins [prev-bin next-bin]
  (let [first-weight (bin-weight prev-bin)
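The preview stops inside `combine-bins`, but `bin-weight` (value times count) suggests the classic streaming-histogram scheme, in which each bin is a (centroid, count) pair and two neighbouring bins are merged into their count-weighted mean. The Python sketch below illustrates that scheme under those assumptions; it is not the gist's code. The `StreamingHistogram` class and its `max_bins` limit are hypothetical, and it finds the closest pair of bins with a linear scan, whereas the original presumably uses `clojure.contrib.priority-map` to do that efficiently.

```python
import bisect
from typing import List, Tuple

# A bin is a (centroid value, count) pair, as the Clojure bin-weight implies
Bin = Tuple[float, int]


def bin_weight(b: Bin) -> float:
    # Value times count, like the gist's bin-weight
    return float(b[0]) * float(b[1])


def combine_bins(prev_bin: Bin, next_bin: Bin) -> Bin:
    # Merged bin: counts add up, centroid is the count-weighted mean
    count = prev_bin[1] + next_bin[1]
    return ((bin_weight(prev_bin) + bin_weight(next_bin)) / count, count)


class StreamingHistogram:
    """Hypothetical single-pass histogram holding at most max_bins sorted bins."""

    def __init__(self, max_bins: int) -> None:
        if max_bins < 1:
            raise ValueError("max_bins must be positive")
        self.max_bins = max_bins
        self.bins: List[Bin] = []

    def insert(self, x: float) -> None:
        centroids = [b[0] for b in self.bins]
        i = bisect.bisect_left(centroids, x)
        if i < len(self.bins) and self.bins[i][0] == x:
            # Exact hit: just bump the count of the existing bin
            self.bins[i] = (self.bins[i][0], self.bins[i][1] + 1)
            return
        self.bins.insert(i, (float(x), 1))
        if len(self.bins) > self.max_bins:
            # Merge the adjacent pair whose centroids are closest (linear scan)
            j = min(range(len(self.bins) - 1),
                    key=lambda k: self.bins[k + 1][0] - self.bins[k][0])
            self.bins[j:j + 2] = [combine_bins(self.bins[j], self.bins[j + 1])]
```

Usage: inserting 1, 2, 10, 11 and 20 into `StreamingHistogram(3)` leaves `[(1.5, 2), (10.5, 2), (20.0, 1)]`. Because a merged centroid always lies between its two parents, the bins stay sorted without re-sorting.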
jaor / JSON Model Schemas.md
Last active December 11, 2015 07:19 — forked from aficionado/JSON PML schemas.md
JSON schemas for models

  • model-schema.json: A generic ML model, containing the fields shared by most models whatever their concrete type. It uses:
    • sample-schema.json: The schema for dataset sampling specifications.
    • field-collection-schema.json: Auxiliary schema describing a collection of field (or "properties") descriptors.
    • generic-field-schema.json: Properties shared by all fields, regardless of their type.
    • field-schema.json: The union schema of all field descriptor types, with their specific properties.
  • tree-model-schema.json: A specialization of the model schema to decision tree models. It uses:
    • node-schema: The schema for the nodes in a decision tree.
jaor / parallel-script.whizzml
Last active November 8, 2016 17:58
model and evaluate all sources in a project
(define (create-evaluations project-id)
  (let (src-ids (resource-ids (list-sources {"project" project-id})))
    (for (src-id src-ids)
      (let (ds-id (create-dataset src-id)
            [train-id test-id] (create-random-dataset-split ds-id 0.8)
            ens-id (create-ensemble train-id {"number_of_models" 100}))
        (create-evaluation test-id ens-id)))))

(define result (wait* (create-evaluations "project/xxxxxxxxxx")))
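The script builds one dataset, 80/20 split, 100-model ensemble and evaluation per source, and only blocks once, in `wait*`, on the whole list of evaluation ids. Presumably the parallelism comes from WhizzML's `create-*` calls returning a resource id without waiting for the resource to finish. The Python sketch below mimics that fan-out-then-wait shape with a thread pool; every `create_*` function in it is a hypothetical local stub that only builds an id string, not a BigML API call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


# Hypothetical stand-ins for remote create-* calls; each just returns an id string
def create_dataset(src_id: str) -> str:
    return f"dataset({src_id})"


def split_dataset(ds_id: str, rate: float) -> Tuple[str, str]:
    return f"train({ds_id},{rate})", f"test({ds_id},{rate})"


def create_ensemble(train_id: str, n_models: int) -> str:
    return f"ensemble({train_id},{n_models})"


def create_evaluation(test_id: str, ens_id: str) -> str:
    return f"evaluation({test_id},{ens_id})"


def pipeline(src_id: str) -> str:
    # Same chain as the WhizzML let-bindings for a single source
    ds_id = create_dataset(src_id)
    train_id, test_id = split_dataset(ds_id, 0.8)
    ens_id = create_ensemble(train_id, 100)
    return create_evaluation(test_id, ens_id)


def create_evaluations(src_ids: List[str]) -> List[str]:
    # Submit every source's pipeline first, then block on all results (like wait*)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(pipeline, s) for s in src_ids]
        return [f.result() for f in futures]
```

The results come back in the same order as the input sources, which matches how `for` in the WhizzML version returns a list aligned with `src-ids`.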
jaor / metadata.json
Last active November 9, 2017 04:31
Create batch prediction dataset with predictions from multiple models
{
  "name": "Multi-model predictions",
  "kind": "script",
  "description": "Generate predictions for a fixed number of models",
  "source_code": "multibatch.whizzml",
  "inputs": [
    {
      "name": "model1",
      "type": "model-id",
      "description": "first model"
jaor / metadata.json
Last active December 22, 2017 02:39
Remove text field terms
{
  "name": "Prune terms",
  "kind": "script",
  "description": "Removes all low-coverage terms from a text field",
  "source_code": "script.whizzml",
  "inputs": [
    {
      "name": "dataset-id",
      "type": "dataset-id",
      "description": "The original dataset"
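Only the script's metadata is visible here, so the pruning rule itself is unknown. As a hypothetical illustration of "low coverage", the Python sketch below assumes a term's coverage is the fraction of rows containing it (its row count divided by the dataset's row count) and flags every term under a `min_coverage` threshold; both the definition and the parameter name are assumptions, not the script's actual logic.

```python
from typing import Dict, List


def low_coverage_terms(term_counts: Dict[str, int], rows: int,
                       min_coverage: float) -> List[str]:
    """Return, sorted, the terms whose coverage (count / rows) is below min_coverage."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return sorted(t for t, c in term_counts.items() if c / rows < min_coverage)
```

For example, with 100 rows and a 5% threshold, a term seen in 2 rows is flagged while terms seen in 10 or 50 rows are kept.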
jaor / metadata.json
Last active July 11, 2018 16:54
Filtered timeseries
{
  "name": "Filtered timeseries",
  "kind": "script",
  "description": "Takes a filter field and an objective field and creates a timeseries for each category in the filter field, collecting all resulting forecasts in a new dataset",
  "source_code": "script.whizzml",
  "inputs": [
    {
      "name": "dataset",
      "type": "dataset-id",
      "description": "Input dataset"
jaor / metadata.json
Last active August 23, 2018 22:08
Name topics
{
  "name": "Name topics",
  "kind": "script",
  "description": "Give a name to all topics in a topic model",
  "source_code": "script.whizzml",
  "imports": [
  ],
  "inputs": [
    {
      "name": "topic-model",
jaor / script.whizzml
Last active November 22, 2018 21:10
Model and evaluate over different ranges of a dataset's rows
(define (model-range dataset from to)
  (create-model dataset {"range" [from to]}))

(define (eval-range dataset model from to)
  (let (ev-id (create-evaluation dataset model {"range" [from to]}))
    [ev-id ((fetch (wait ev-id)) ["result" "model" "average_phi"])]))

(define (size-evaluations dataset-id steps)
  (let (ds (fetch dataset-id)
        rows (ds "rows")
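The preview of `size-evaluations` stops right after it reads the row count, so how it turns `rows` and `steps` into ranges is not shown. One plausible reading, sketched below in Python purely as an assumption, is a learning curve: at step k, train with `model-range` on the first k/steps of the rows and evaluate with `eval-range` on the rest. The helper name `growing_ranges` is hypothetical, and it assumes 1-based, inclusive `[from to]` ranges.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # assumed 1-based, inclusive (from, to) row range


def growing_ranges(rows: int, steps: int) -> List[Tuple[Range, Range]]:
    """For k in 1..steps-1, pair a training range over the first k/steps of
    the rows with an evaluation range over the remaining rows."""
    if steps < 2 or rows < steps:
        raise ValueError("need steps >= 2 and rows >= steps")
    pairs = []
    for k in range(1, steps):
        cut = rows * k // steps  # last training row for this step
        pairs.append(((1, cut), (cut + 1, rows)))
    return pairs
```

With 100 rows and 4 steps this yields training ranges ending at rows 25, 50 and 75, each paired with the remaining rows as the evaluation range, so every step has a non-empty test set.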
jaor / metadata.json
Last active December 20, 2018 04:57
Create a model using only the most important features in another one
{
  "name": "select-important",
  "kind": "script",
  "description": "Select the important features from an existing model to create a new one",
  "source_code": "script.whizzml",
  "imports": [
  ],
  "inputs": [
    {
      "name": "model-id",
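The selection rule the script applies is not visible in this preview. As one hypothetical rule, the Python sketch below assumes the model's feature importances are available as `(field_id, importance)` pairs and keeps the highest-ranked fields until their cumulative importance reaches a `threshold`; the function name, the pair format and the cumulative criterion are all assumptions.

```python
from typing import List, Tuple


def important_fields(importance: List[Tuple[str, float]],
                     threshold: float) -> List[str]:
    """Keep the most important fields until their cumulative importance reaches threshold."""
    ranked = sorted(importance, key=lambda p: p[1], reverse=True)
    selected: List[str] = []
    total = 0.0
    for field_id, imp in ranked:
        if total >= threshold:
            break
        selected.append(field_id)
        total += imp
    return selected
```

The resulting ids could then be passed as the input fields of the new model; a cumulative cut-off adapts to how concentrated the importances are, whereas a fixed top-k would not.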
jaor / metadata.json
Last active December 20, 2018 05:03
Mark fields with a prefix as non-preferred
{
  "name": "Mark non-preferred",
  "kind": "script",
  "description": "Given a dataset, mark as non-preferred all fields whose names start with a given prefix",
  "source_code": "script.whizzml",
  "imports": [
  ],
  "inputs": [
    {
      "name": "dataset-id",
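The core of this script is a name-prefix filter over the dataset's fields. The Python sketch below shows that step under stated assumptions: the fields arrive as a map from field id to a descriptor with a `"name"` key, and the update is expressed as a `{"fields": {id: {"preferred": False}}}` payload. The helper name and the exact payload shape are assumptions for illustration, not taken from the script.

```python
from typing import Dict


def non_preferred_update(fields: Dict[str, dict], prefix: str) -> dict:
    """Build an update marking as non-preferred every field whose name starts with prefix."""
    return {"fields": {fid: {"preferred": False}
                       for fid, f in fields.items()
                       if f.get("name", "").startswith(prefix)}}
```

Fields without a name are treated as not matching, so a malformed descriptor is left alone rather than being marked.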