Poul Petersen petersen-poul

View latlong-pair-dist.json
{
"name": "Lat/Long Distance between a pair of points",
"description": "Extends a dataset with the distance in meters between pairs of lat/long fields.",
"inputs": [
{
"name": "dataset-in",
"type": "dataset-id",
"description": "Dataset for extending with distance calculation."
},
{
View custom-date-formater.json
{
"name": "Apply Date Format by Field Name",
"description": "Allows applying a custom date format to a source by matching field names",
"inputs": [
{
"name": "source",
"description": "Source to update",
"type": "source-id"
}
],
View one-click-cluster-labels.json
{
"name": "One Click Cluster Labels",
"description": "Given a cluster as input, assigns the cluster label to every instance in the dataset used to train the cluster.",
"inputs": [
{
"name": "cluster",
"description": "Cluster to use for labeling",
"type": "cluster-id"
}
],
View fieldtype-by-fieldname.json
{
"name": "Assign Field Types by Field Name",
"description": "Sometimes the automatic field detection does not assign field types correctly. This is especially a problem with fields that have many missing values, since the detection process only peeks at the data to determine whether a field should be numeric, categorical, etc. This script lets you alter the field types for a source based on the name of each field. Put a partial match for the name in the list for the type you want to assign, and it will change all the fields whose names contain that string.",
"inputs": [
{
"name": "source",
"description": "Source to update.",
"type": "source-id"
},
{
View one-click-dataset-prefer-it-all.json
{
"name": "1-Click Dataset Prefer-it-All",
"description": "Given a source, creates a 1-click dataset and then marks all non-preferred fields as preferred.",
"inputs": [
{
"name": "source",
"type": "source-id",
"description": "Source to process."
}
],
View min-scale-class-purity.json
{
"name": "Minimum Scale for Cluster Class Purity",
"description": "Given a dataset and a categorical field, finds the minimum scale required to create class purity in the cluster with k = number of classes.",
"inputs": [
{
"name": "dataset",
"type": "dataset-id",
"description": "Dataset to analyze."
},
{
View json-extract-simple.json
{
"name": "Simple JSON key/val extraction",
"description": "Given a dataset field containing JSON documents and a key, this WhizzML script creates a new feature with the JSON values. This is a hack and *NOT* a valid JSON parser",
"inputs": [
{
"name": "dataset-in",
"type": "dataset-id",
"description": "Dataset to transform by extracting JSON values."
},
{
petersen-poul / json-extract.json
Last active May 12, 2016
JSON key/val extraction
View json-extract.json
{
"name": "JSON key/val extraction",
"description": "Given a dataset field containing JSON documents and a key, this WhizzML script creates a new feature with the JSON values. This is a hack and *NOT* a valid JSON parser",
"inputs": [
{
"name": "dataset-in",
"type": "dataset-id",
"description": "Dataset to transform by extracting JSON values."
},
{
View redfin-deals.json
{
"name": "Redfin Deals",
"description": "Given a source of sold homes and listed homes, builds a model to predict the price and then shows possible deals.",
"inputs": [
{
"name": "redfin-sold-source",
"type": "source-id",
"description": "Source of sold homes from Redfin."
},
{
View latlong-ref-dist.json
{
"name": "Lat/Long Distance from a reference point",
"description": "Extends a dataset with the distance in meters between lat/long fields and a reference point.",
"inputs": [
{
"name": "dataset-in",
"type": "dataset-id",
"description": "Dataset for extending with distance calculation."
},
{
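The two lat/long gists describe extending a dataset with a distance in meters between coordinates, but the previews are truncated before any formula appears. As a rough illustration only, here is a minimal Python sketch of one common way to compute such a distance, the haversine formula on a spherical Earth. The function name `haversine_m`, the constant `EARTH_RADIUS_M`, and the choice of haversine itself are assumptions for illustration; the gists' WhizzML scripts may use a different method.

```python
import math

# Assumed mean Earth radius in meters (spherical approximation).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees.

    Hypothetical helper: not taken from the gists above.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp against floating-point drift before taking the arcsine.
    a = min(1.0, max(0.0, a))
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

For the "pair of points" case, both coordinate pairs would come from fields of the same row; for the "reference point" case, the second pair would be a fixed constant supplied as an input.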