sjlongland/pyhaystack-api-ideas.md

## pyhaystack-api-ideas.md

      
## Project Haystack API ideas

The Project Haystack interface is largely based on plain GET/POST requests made
over either HTTP or HTTPS.  Authentication differs between implementations, but
common methods include custom headers (e.g. SkySpark, WideSky) and conventional
HTTP authentication (nHaystack).

Having authenticated, a few API endpoints allow us to discover what sort of
server we are talking to.  It's a shame that the authentication method isn't
discoverable in the same manner, but that's life.

### Probing the server

The about, formats and ops endpoints are valuable for figuring out how to
best interact with the system:

- about gives us detail such as the product name and version used to
  implement Haystack.
- formats tells us what encoding formats are supported.
- ops tells us what operations are supported.

ZINC has arisen as the de-facto standard format for Project Haystack.  It is
basically a "souped up CSV" variant, likely with a syntax "different enough"
to trip up a conventional CSV parser, which is why the hszinc module was
created to work with it.
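
As a rough illustration of how a stock CSV parser can be tripped up, the sketch below feeds a small hand-written ZINC grid to Python's `csv` module. The grid text is made up for this example and is not output from a real server. ZINC escapes a quote inside a string with a backslash, whereas `csv` expects a doubled quote, so a string cell containing a comma ends up split across several fields.

```python
import csv
import io

# A tiny hand-written ZINC grid (illustrative only): a version line,
# a column-name line, then one row holding two string cells.
ZINC_TEXT = (
    'ver:"3.0"\n'
    'dis,note\n'
    '"Boiler 1","He said \\"hi, there\\""\n'
)

rows = list(csv.reader(io.StringIO(ZINC_TEXT)))

# The grid metadata line is not CSV at all; csv just sees one odd cell.
meta_row = rows[0]

# The data row has two ZINC cells, but csv treats the backslash-escaped
# quote as the end of the string and then splits on the comma inside it.
data_row = rows[2]
print('csv found %d fields where ZINC has 2' % len(data_row))
```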

It seems fair to assume that ZINC is supported universally.  JSON is
problematic, as some implementations use a non-Haystack-compliant encoding for
it (e.g. nHaystack represents markers using a Unicode tick mark instead of the
prescribed 'm:' format), so seeing 'application/json' and assuming it is
Haystack JSON is probably not wise.  Thus, once the server implementation has
been identified, JSON should only be used on servers known to obey the current
standards or known to have issues with ZINC.

The output of the ops call could be used to enable or disable features of
the client: if a particular operation is not supported by the server, the
methods implementing it should raise NotImplementedError.
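
A minimal sketch of how the probe results might drive the client follows. Everything here is hypothetical naming (`choose_format`, `OpsGate`, `Client`, and the two deliberately empty product sets); it only illustrates defaulting to ZINC and raising `NotImplementedError` for operations the server did not list.

```python
# Hypothetical placeholder sets, to be filled in from real-world testing.
TRUSTED_JSON_SERVERS = frozenset()   # products known to emit Haystack JSON
ZINC_TROUBLE_SERVERS = frozenset()   # products known to mishandle ZINC


def choose_format(product_name, json_ok=TRUSTED_JSON_SERVERS,
                  zinc_bad=ZINC_TROUBLE_SERVERS):
    """Pick a wire format from the product name reported by 'about'."""
    if product_name in json_ok or product_name in zinc_bad:
        return 'application/json'
    # Default: assume ZINC is universally supported.
    return 'text/zinc'


class OpsGate(object):
    """Remembers which operations the server listed in its 'ops' grid."""

    def __init__(self, supported_ops):
        self._supported = frozenset(supported_ops)

    def require(self, op_name):
        if op_name not in self._supported:
            raise NotImplementedError(
                'Server does not support the %r operation' % op_name)


class Client(object):
    def __init__(self, product_name, supported_ops):
        self.wire_format = choose_format(product_name)
        self._ops = OpsGate(supported_ops)

    def his_read(self, point_id, rng):
        # Refuse early if the server never advertised hisRead.
        self._ops.require('hisRead')
        # A real client would submit the request; this sketch only
        # returns a description of what it would send.
        return ('hisRead', point_id, rng)
```

The gate is checked inside each method rather than by removing methods from the class, so the client's interface stays the same whichever server it talks to.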

### Data representation

Where possible, data should be passed around in native Python types.  Encoding
of data in requests to wire format should be done at the last possible moment
and decoding of the response should be done at the earliest possible moment.

The hszinc module attempts to cater for this, providing some specialised types
to plug holes not filled by the standard Python types.
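
To make "encode at the last possible moment" concrete, here is a small hand-rolled sketch (Python 3). It is a stand-in written for this note, not hszinc's API, and `encode_scalar` and `encode_row` are hypothetical names. It covers only a handful of ZINC scalars (null, booleans, finite numbers, dates and strings); a real client would hand this job to hszinc.

```python
import datetime
import math


def encode_scalar(value):
    """Encode one native Python value as a ZINC scalar (small subset)."""
    if value is None:
        return 'N'
    if isinstance(value, bool):          # must come before the number check
        return 'T' if value else 'F'
    if isinstance(value, (int, float)):
        if isinstance(value, float) and not math.isfinite(value):
            raise ValueError('non-finite numbers are outside this sketch')
        return repr(value)
    if isinstance(value, datetime.datetime):
        raise TypeError('date-times are outside this sketch')
    if isinstance(value, datetime.date):
        return value.isoformat()
    if isinstance(value, str):
        escaped = (value.replace('\\', '\\\\')
                        .replace('"', '\\"')
                        .replace('\n', '\\n'))
        return '"%s"' % escaped
    raise TypeError('no ZINC encoding known for %r' % type(value))


def encode_row(values):
    """Join already-native values into one ZINC row, at send time only."""
    return ','.join(encode_scalar(v) for v in values)


# Callers work with native types throughout...
row = [None, True, 72.5, datetime.date(2016, 1, 31), 'Site "A"']
# ...and the wire text only exists once the request is being built.
wire = encode_row(row)
```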

### Error handling

Since we're doing procedure calls over a wide area network, there is plenty
of opportunity for a request to fail.  It is thus imperative that we alert
the caller to problems as they arise by raising an appropriate exception.

In the case of the requests module in Python, there is a convenience for
doing exactly this: calling raise_for_status() on the Response object
provided, which raises an exception if the server replied with a 4xx or 5xx
status.
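
A minimal sketch of that pattern, assuming `session` is a `requests.Session` (or anything with the same `get()` interface). The `fetch` helper and its default timeout are hypothetical, chosen only for illustration.

```python
def fetch(session, uri, timeout=30):
    """GET uri, raising rather than returning a failed response."""
    # Connection problems and timeouts raise from get() itself.
    response = session.get(uri, timeout=timeout)
    # HTTP-level failures (4xx/5xx) raise here instead of passing silently.
    response.raise_for_status()
    return response
```

Because nothing is caught, the caller sees every failure as an exception and never has to inspect status codes by hand.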

### Core methods

Interaction with a given Haystack server is usually fairly consistent once
logged in.  It is thus probably wise for the core client interface to provide
some conveniences that reduce the boiler-plate code needed to interact with
a server.

This would include basic wrappers around GET and POST calls which supply the
necessary HTTP header information for content negotiation and handle content
encoding and decoding.
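
One possible shape for those wrappers is sketched below. The class name `HaystackSession` and its constructor arguments are assumptions; `http` is expected to behave like a `requests.Session`. The wrappers only add the negotiation headers and encode the POST body, leaving decoding to the caller.

```python
class HaystackSession(object):
    """Hypothetical core session that adds content-negotiation headers."""

    def __init__(self, http, base_uri, wire_format='text/zinc'):
        self._http = http               # e.g. a requests.Session
        self._base_uri = base_uri
        self._wire_format = wire_format

    def _get(self, uri):
        # Ask the server to reply in the format we know how to decode.
        headers = {'Accept': self._wire_format}
        return self._http.get(uri, headers=headers)

    def _post(self, uri, body):
        # Declare both what we are sending and what we want back.
        headers = {
            'Accept': self._wire_format,
            'Content-Type': self._wire_format,
        }
        return self._http.post(uri, data=body.encode('utf-8'),
                               headers=headers)
```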

### Low-level methods

There are almost certainly going to be features in one implementation of
Haystack that another does not implement.  For example, the WideSky
implementation of Haystack includes 'first' and 'last' values for HisRead's
range parameter that retrieve the first or last measurement for a given data
point.  SkySpark provides an interface for evaluating arbitrary expressions
with its exec endpoint.

With this in mind, it should be viable to provide direct, convenient access to
those methods.  These low-level methods should do little more than provide
basic data translation, taking in arguments using standard Python types, and
decoding the grid returned.  If the grid has an 'err' flag, an exception
should be raised.

### Asynchronous vs Synchronous

pyhaystack's history as more or less an extension to the Jupyter notebook
means that its interface is necessarily a synchronous one, to facilitate
interactive use.

In non-interactive use, it may be useful for pyhaystack to call on an
asynchronous IO framework such as TornadoWeb to allow other tasks to
take place while we wait for a response from the server.  This would be
particularly useful with the watchPoll method.

A lot of the methods can be characterised as:

1. Collate and pack data.
2. Encode arguments / data into a URI string / POST body.
3. Submit the request.
4. Receive the reply.
5. Check the HTTP error status of the reply.
6. Decode the reply.
7. Check for an "error grid" (err marker).
8. Extract data.

Most of these steps are synchronous; however, steps 3 and 4 may not be.
Steps 2, 5 and 6 seem to be mostly the same, and would benefit from being
handled in the core.  In order to facilitate both asynchronous and synchronous
use cases, these should be made into separate functions:
```python
from sys import exc_info

import hszinc
import six

# NOTE: encode_query and HaystackError are not defined in this sketch; they
# are assumed to be provided elsewhere in the package.


class AsyncException(object):
    '''
    A convenience for passing around exc_info responses
    and re-raising them.
    '''
    def __init__(self):
        # Capture the exception currently being handled.
        self._exc_info = exc_info()

    def reraise(self):
        six.reraise(*self._exc_info)


class GetGridRequest(object):
    '''
    A base class for GET requests returning grids.  Subclasses implement
    _extract(grid) to pull the useful data out of the decoded grid.
    '''

    def __init__(self, session, uri, query_args, callback=None):
        self._session = session
        self._uri = uri
        self._query_args = query_args
        self._callback = callback

    def submit(self):
        uri = self._session._base_uri + self._uri
        if self._query_args:
            uri += '?' + encode_query(self._query_args)

        if self._callback is None:
            # Synchronous: block for the response and return the result.
            response = self._session._get(uri)
            return self._receive(response)
        else:
            # Asynchronous: the session calls _receive when the reply arrives.
            self._session._get(uri, callback=self._receive)

    def _receive(self, response):
        try:
            # Check for bad error status
            response.raise_for_status()

            # Decode the grid
            grid = self._decode(response)
            result = self._extract(grid)
            if self._callback is None:
                return result
            else:
                self._callback(result)
        except Exception:
            if self._callback is None:
                raise

            # Asynchronous: hand the failure to the callback instead.
            try:
                self._callback(AsyncException())
            except Exception:
                # Shouldn't happen!
                pass

    def _decode(self, response):
        # The Content-Type we get with nHaystack is
        # "application/json; charset=UTF-8"
        content_type = response.headers['Content-Type']
        if ';' in content_type:
            # Separate encoding from content type
            (content_type, encoding) = content_type.split(';', 1)
            content_type = content_type.strip()
            # TODO: do we need to convert to Unicode, if so, how?

        if content_type in ('text/zinc', 'text/plain'):
            decoded = hszinc.parse(response.text, mode=hszinc.MODE_ZINC)[0]
        elif 'application/json' in content_type:
            decoded = hszinc.parse(response.text, mode=hszinc.MODE_JSON)
        else:
            raise NotImplementedError("Don't know how to parse type %s"
                                      % content_type)
        # An "error grid" carries an 'err' marker in its metadata.
        if 'err' in decoded.metadata:
            raise HaystackError(decoded.metadata.get('dis', 'Unknown error'),
                    traceback=decoded.metadata.get('traceback', None))
        return decoded


class ReadFilterMethod(GetGridRequest):
    def __init__(self, session, filter_expr, callback=None):
        super(ReadFilterMethod, self).__init__(
                session, 'read', {'filter': str(filter_expr)},
                callback=callback)

    def _extract(self, grid):
        # Return its rows
        return grid[:]
```

In the asynchronous case, we provide a callback function (callbacks are
pretty much universal between frameworks such as TornadoWeb, Twisted,
asyncio and others).

In the synchronous case, we simply return our data instead of passing it to a
callback.  A synchronous client simply creates the object, calls submit() on
it, then returns whatever comes back.

In the asynchronous case, we call the callback with whatever happened.  It is
the responsibility of the callback function to handle all exceptions.  If an
AsyncException object is passed in, the callback should call its reraise()
method to re-raise the original exception.  Calling submit() will return None,
and what happens next is up to the callback function.