@mmerce
Last active October 6, 2017 12:55
Local batch centroid using Python bindings

Python example: local batch centroid

This example uses the BigML Python bindings and their local cluster object to find the centroid for each row of a CSV file.

Usage:

cat input.csv | python local_batch_centroid.py > centroids.csv

Notes:

input.csv is expected to start with a header row, followed by the rows of values to be used in the batch centroid. The cluster ID is hardcoded in the local_batch_centroid.py file. BigML credentials are expected to be available through environment variables, but can also be provided in the code, as shown in the commented-out lines.

Requirements

The bigml module is needed. To install it you can use pip:

pip install bigml

The code has been tested with Python 2.7.
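
Since the script relies on credentials being present in the environment, a small pre-flight check can make a missing variable easier to spot. The sketch below assumes the variable names BIGML_USERNAME and BIGML_API_KEY, which are the ones the BigML bindings document as far as I know. Verify them against your installed version. The helper name missing_credentials is hypothetical and not part of the bigml module.

```python
import os

# Assumed variable names; check them against the BigML bindings documentation.
REQUIRED_VARS = ("BIGML_USERNAME", "BIGML_API_KEY")


def missing_credentials(environ=None):
    """Return the names of the required variables that are unset or empty.

    Hypothetical helper for illustration, not part of the bigml module.
    """
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
```

If the list comes back non-empty, either export the variables before running the script or pass the credentials explicitly in the code.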

import csv
import sys
import StringIO
from bigml.api import BigML
from bigml.cluster import Cluster
# If credentials are properly set in environment variables, there's no need
# to explicitly create the api object. Otherwise, uncomment the two lines
# below, fill in your credentials, and remove the Cluster line that follows them:
# api = BigML("username", "api-key")
# local_cluster = Cluster('cluster/59d6a1a3364527289b000218', api=api)
local_cluster = Cluster('cluster/59d6a1a3364527289b000218')
# reading the input from stdin
input_stream = StringIO.StringIO(sys.stdin.read())
# reading the CSV into a Python dictionary
reader = csv.DictReader(input_stream)
# we will write to stdout, but can write to any file-like object
output_stream = sys.stdout
# Writing the output
writer = csv.DictWriter(output_stream,
                        fieldnames=["centroid_name",
                                    "centroid_id",
                                    "distance"])
# predicting
for data_row in reader:
    output = local_cluster.centroid(data_row)
    writer.writerow(output)
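
The script above targets Python 2.7 (it imports the StringIO module). As a rough sketch of how the same loop might look on Python 3, the version below wraps it in a function that accepts any object with a centroid(row) method. The function name write_batch_centroids and the FakeCluster stand-in are hypothetical and exist only so the example runs without credentials or network access. The extrasaction="ignore" setting is an added assumption in case the returned dictionary carries extra keys.

```python
import csv
import io

FIELDNAMES = ["centroid_name", "centroid_id", "distance"]


def write_batch_centroids(local_cluster, input_stream, output_stream):
    """Write one centroid row per input row and return the row count.

    local_cluster is any object exposing centroid(row) -> dict.
    No header row is written, mirroring the original script.
    """
    reader = csv.DictReader(input_stream)
    writer = csv.DictWriter(output_stream,
                            fieldnames=FIELDNAMES,
                            extrasaction="ignore",  # assumption: drop extra keys
                            lineterminator="\n")
    count = 0
    for data_row in reader:
        writer.writerow(local_cluster.centroid(data_row))
        count += 1
    return count


class FakeCluster:
    """Hypothetical stand-in for bigml.cluster.Cluster, for illustration only."""

    def centroid(self, input_data):
        # Always returns the same made-up centroid, regardless of input.
        return {"centroid_name": "Cluster 0",
                "centroid_id": "000000",
                "distance": 0.0}


demo_output = io.StringIO()
rows_written = write_batch_centroids(FakeCluster(),
                                     io.StringIO("a,b\n1,2\n3,4\n"),
                                     demo_output)
```

With the real bindings, the call would presumably become write_batch_centroids(Cluster('cluster/59d6a1a3364527289b000218'), sys.stdin, sys.stdout), keeping the command-line usage shown earlier unchanged.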