@sjlongland
Created January 23, 2016 11:02
pyhaystack API ideas

Project Haystack API ideas

The Project Haystack interface is largely based on plain GET/POST requests made over either HTTP or HTTPS. Authentication differs between implementations, but common methods include custom headers (e.g. SkySpark, WideSky) and conventional HTTP authentication (nHaystack).

Having authenticated, a couple of API endpoints allow us to discover what sort of server we are talking to. It's a shame that the authentication method isn't discoverable in the same manner, but that's life.

Probing the server

The about, formats and ops endpoints are valuable for figuring out how best to interact with the system:

  • about gives us detail such as the product name and version used to implement Haystack.
  • formats tells us what encoding formats are supported.
  • ops tells us what operations are supported.

A de-facto standard has arisen with Project Haystack in the ZINC format. This is basically a "souped up CSV" variant, with a syntax that is likely "different enough" to trip up a conventional CSV parser. The hszinc module was created to be able to work with it.

It seems fair to assume that ZINC is supported universally. JSON is problematic because some implementations use an encoding that does not comply with Haystack JSON (e.g. nHaystack represents markers using a Unicode tick mark instead of the prescribed 'm:' format), so seeing 'application/json' and assuming it is Haystack JSON is probably not wise. Thus, once the server implementation has been identified, JSON should only be used on servers known to obey the current standards or known to have issues with ZINC.
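
The rule above could be captured in a small selection function. This is only a sketch: the name choose_format and the json_safe / zinc_broken arguments are hypothetical, and which products belong in those sets is left to the caller rather than asserted here.

```python
ZINC = 'text/zinc'
JSON = 'application/json'


def choose_format(product, advertised,
                  json_safe=frozenset(), zinc_broken=frozenset()):
    """
    Pick the MIME type to use with a server.

    product:     product name reported by the 'about' op
    advertised:  MIME types reported by the 'formats' op
    json_safe:   products known to emit standard Haystack JSON (assumed)
    zinc_broken: products known to have issues with ZINC (assumed)
    """
    # Servers with known ZINC problems get JSON if they offer it.
    if product in zinc_broken and JSON in advertised:
        return JSON

    # Otherwise ZINC is the preferred choice whenever it is advertised.
    if ZINC in advertised:
        return ZINC

    # JSON is only trusted on servers known to follow the standard.
    if product in json_safe and JSON in advertised:
        return JSON

    # ZINC is assumed to be universally supported, so fall back to it.
    return ZINC
```

A server that advertises 'application/json' but is not on either list still gets ZINC, which matches the cautious stance taken above.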

The output of the ops call could be used to enable or disable features of the client. Thus if a particular operation is not supported by the server, the methods implementing it should raise NotImplementedError.
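
One possible way to wire this up is a decorator that checks a set of supported operations before running a method. The names requires_op, Client, supported_ops and his_read below are all hypothetical, and the method body is a placeholder rather than a real request.

```python
import functools


def requires_op(op_name):
    """Decorator: refuse to run a method if the server lacks op_name."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            if op_name not in self.supported_ops:
                raise NotImplementedError(
                    'Server does not support the %r operation' % op_name)
            return method(self, *args, **kwargs)
        return wrapper
    return decorator


class Client(object):
    def __init__(self, supported_ops):
        # In a real client this set would be filled from the rows of
        # the grid returned by the 'ops' call.
        self.supported_ops = frozenset(supported_ops)

    @requires_op('hisRead')
    def his_read(self, point_id, rng):
        # Placeholder: a real method would build and submit a request.
        return ('hisRead', point_id, rng)
```

With this arrangement, Client(['about', 'ops', 'read']).his_read(...) raises NotImplementedError before any network traffic is attempted.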

Data representation

Where possible, data should be passed around in native Python types. Encoding of data in requests to wire format should be done at the last possible moment and decoding of the response should be done at the earliest possible moment.

The hszinc module attempts to cater for this, providing some specialised types to plug holes not filled by the standard Python types.

Error handling

Since we're doing procedure calls over a wide area network, there is plenty of opportunity for a request to fail. It is thus imperative that we alert the caller to problems as they arise by raising an appropriate exception class.

In the case of the requests module in Python, there is a convenience for doing exactly this: calling raise_for_status() on the Response object provided.

Core methods

Interaction with a given Haystack server is usually fairly consistent once logged in. It is thus probably wise for the core client interface to provide some conveniences that reduce the boilerplate code needed to interact with a server.

This would include basic wrappers around GET and POST calls which supply the necessary HTTP header information for content negotiation and handle content encoding and decoding.
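
As a rough illustration of such wrappers, the sketch below keeps the header handling in one place and delegates the actual I/O to a transport callable. The class name SessionSketch, the transport signature (method, uri, headers, body) and the default 'text/zinc' values are assumptions made for this example, not part of pyhaystack.

```python
class SessionSketch(object):
    """Hypothetical session that adds content negotiation headers."""

    def __init__(self, base_uri, transport, accept='text/zinc'):
        self._base_uri = base_uri
        # transport is any callable taking (method, uri, headers, body)
        # and returning a response object.
        self._transport = transport
        self._accept = accept

    def _headers(self, extra=None):
        # Every request states which format we want back.
        headers = {'Accept': self._accept}
        headers.update(extra or {})
        return headers

    def _get(self, uri):
        return self._transport('GET', uri, self._headers(), None)

    def _post(self, uri, body, content_type='text/zinc'):
        # POST requests also declare the format of the body we send.
        headers = self._headers({'Content-Type': content_type})
        return self._transport('POST', uri, headers, body)
```

Keeping the transport pluggable means the same wrappers could sit on top of either a blocking HTTP library or an asynchronous one.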

Low-level methods

There are almost certainly going to be features in one implementation of Haystack that another does not implement. For example, the WideSky implementation of Haystack includes 'first' and 'last' values for HisRead's range parameter that retrieve the first or last measurement for a given data point. SkySpark provides an interface for evaluating arbitrary expressions with its exec endpoint.

With this in mind, it should be viable to provide direct convenient access to those methods. These low-level methods should do little more than provide basic data translation, taking in arguments using standard Python types, and decoding the grid returned. If the grid has an 'err' flag, an exception should be raised.
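
The error check could look something like the sketch below. The HaystackError class shown here is an assumed definition, and check_error_grid is a hypothetical helper; the 'err', 'dis' and 'traceback' keys simply mirror the ones used in the listing later in this document. Grid metadata is modelled as a plain dict.

```python
class HaystackError(Exception):
    """Raised when the server replies with an error grid (assumed class)."""

    def __init__(self, message, traceback=None):
        super(HaystackError, self).__init__(message)
        # Server-side trace, if the server supplied one.
        self.traceback = traceback


def check_error_grid(metadata):
    """Raise HaystackError if the grid metadata carries an 'err' marker."""
    if 'err' in metadata:
        raise HaystackError(
            metadata.get('dis', 'Unknown error'),
            traceback=metadata.get('traceback'))
```

A low-level method would call this immediately after decoding, so the caller never sees an error grid as if it were data.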

Asynchronous vs Synchronous

pyhaystack grew up more or less as an extension to the Jupyter notebook, so its interface is necessarily a synchronous one to facilitate interactive use.

In non-interactive use, it may be useful for pyhaystack to call on an asynchronous IO framework such as TornadoWeb to allow other tasks to take place while we wait for a response from the server. This would be particularly useful with the watchPoll method.

A lot of the methods can be characterised as:

  1. Collate and pack data.
  2. Encode arguments / data into a URI string / POST body
  3. Submit post body
  4. Receive reply
  5. Check HTTP error status of reply
  6. Decode reply
  7. Check for "error grid" (err marker)
  8. Extract data.

Most of these steps are synchronous; however, steps 3 and 4 may not be. Steps 2, 5 and 6 seem to be mostly the same across methods, and would benefit from being handled in the core. In order to facilitate both asynchronous and synchronous use cases, these should be made into separate functions...

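from sys import exc_info

import hszinc
import six

# NOTE: encode_query and HaystackError are used below but are not defined
# in this listing; they are assumed to be provided elsewhere.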

class AsyncException(object):
    '''
    A convenience for passing around exc_info responses
    and re-raising them.
    '''
    def __init__(self):
        self._exc_info = exc_info()

    def reraise(self):
        six.reraise(*self._exc_info)


class GetGridRequest(object):
    '''
    A base class for GET requests returning grids.
    '''

    def __init__(self, session, uri, query_args, callback=None):

        self._session = session
        self._uri = uri
        self._query_args = query_args
        self._callback = callback

    def submit(self):

        uri = self._session._base_uri + self._uri
        if bool(self._query_args):
            uri += '?' + encode_query(self._query_args)

        if self._callback is None:
            response = self._session._get(uri)
            return self._receive(response)
        else:
            self._session._get(uri, callback=self._receive)

    def _receive(self, response):
        try:
            # Check for bad error status
            response.raise_for_status()

            # Decode the grid
            grid = self._decode(response)
            result = self._extract(grid)
            if self._callback is None:
                return result
            else:
                self._callback(result)
        except:
            if self._callback is None:
                raise

            try:
                self._callback(AsyncException())
            except:
                # Shouldn't happen!
                pass

    def _decode(self, response):
        # The Content-Type we get with nHaystack looks like:
        #   application/json; charset=UTF-8
        content_type = response.headers['Content-Type']
        if ';' in content_type:
            # Separate encoding from content type
            (content_type, encoding) = content_type.split(';',1)
            content_type = content_type.strip()
            # TODO: do we need to convert to Unicode, if so, how?

        if content_type in ('text/zinc', 'text/plain'):
            decoded = hszinc.parse(response.text, mode=hszinc.MODE_ZINC)[0]
        elif 'application/json' in content_type:
            decoded = hszinc.parse(response.text, mode=hszinc.MODE_JSON)
        else:
            raise NotImplementedError("Don't know how to parse type %s" \
                    % content_type)
        if 'err' in decoded.metadata:
            raise HaystackError(decoded.metadata.get('dis', 'Unknown error'),
                    traceback=decoded.metadata.get('traceback',None))
        return decoded


class ReadFilterMethod(GetGridRequest):
    def __init__(self, session, filter_expr, callback=None):
        super(ReadFilterMethod, self).__init__(
                session, 'read', {'filter': str(filter_expr)},
                callback=callback)

    def _extract(self, grid):
        # Return its rows
        return grid[:]
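
The listing calls encode_query without defining it. A minimal sketch of what it might do, assuming it only needs to turn a dict of arguments into a URL query string, is shown below; the sorting is an extra added here purely to make the output deterministic.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2


def encode_query(query_args):
    """Encode a dict of query arguments as a URL query string."""
    # Sort the keys so the same arguments always give the same URI.
    return urlencode(sorted(query_args.items()))
```

For instance, encode_query({'filter': 'site and dis=="A"'}) gives 'filter=site+and+dis%3D%3D%22A%22'.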

In the asynchronous case, we provide the callback function (callbacks are pretty much universal between frameworks such as TornadoWeb, Twisted, asyncio and others).

In the synchronous case, we simply return our data instead of passing it to a callback. A synchronous client simply creates the object, calls submit() on it, then returns whatever comes back.

In the asynchronous case, we call the callback with whatever happened. It is the responsibility of the callback function to handle all exceptions. If an AsyncException object is passed in, it should call its reraise method to retrieve the exception. Calling submit() will return None, and future control is up to the callback function.
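
A callback following that contract might look like the sketch below. To keep the example self-contained, it includes a Python 3 only stand-in for AsyncException that avoids the six dependency; the callback name on_read_done and its log parameter are hypothetical.

```python
import sys


class AsyncException(object):
    """Python 3 stand-in for the six-based class in the listing."""

    def __init__(self):
        # Must be constructed inside an 'except' block.
        self._exc_info = sys.exc_info()

    def reraise(self):
        _, exc, tb = self._exc_info
        raise exc.with_traceback(tb)


def on_read_done(result, log=print):
    """Callback: receives either the extracted data or an AsyncException."""
    if isinstance(result, AsyncException):
        try:
            # Re-raise so normal exception handling can be used.
            result.reraise()
        except Exception as e:
            log('read failed: %s' % e)
            return None

    log('read returned %d rows' % len(result))
    return result
```

Re-raising inside a try block lets the callback use ordinary except clauses to distinguish, say, HTTP errors from decoding errors, rather than inspecting the exception tuple by hand.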
