The Project Haystack interface is largely based on plain GET/POST requests made over either HTTP or HTTPS. Authentication differs between implementations, but common methods include custom headers (e.g. SkySpark, WideSky) and conventional HTTP authentication (nHaystack).
Once authenticated, we can use a couple of API endpoints to discover what sort of server we are talking to. It's a shame that the authentication method isn't discoverable in the same manner, but that's life.
The about, formats and ops endpoints are valuable for figuring out how to best interact with the system:
- about: gives us detail such as the product name and version used to implement Haystack.
- formats: tells us what encoding formats are supported.
- ops: tells us what operations are supported.
The ZINC format has emerged as a de-facto standard within Project Haystack. It is basically a "souped-up CSV" variant whose syntax is likely different enough to trip up a conventional CSV parser, so the hszinc module was created to work with it.
It seems fair to assume that ZINC is supported universally. JSON is problematic, as some implementations use an encoding that does not comply with the Haystack JSON format (e.g. nHaystack represents markers using a Unicode tick mark instead of the prescribed 'm:' format), so seeing 'application/json' and assuming it is Haystack JSON is probably not wise. Thus, once the server implementation has been identified, JSON should only be used on servers known to obey the current standards or known to have issues with ZINC.
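This policy can be captured in a small helper. The following is a minimal sketch only: the function name and the json_allowed flag are illustrative assumptions, not part of pyhaystack, and the caller is expected to set the flag based on what it knows about the server implementation.

```python
ZINC = 'text/zinc'
JSON = 'application/json'


def choose_format(advertised, json_allowed=False):
    """Pick the MIME type to use for grids.

    advertised:   MIME types reported by the server's formats endpoint.
    json_allowed: set by the caller only for servers known to emit
                  compliant Haystack JSON, or known to have ZINC issues.
    """
    if json_allowed and JSON in set(advertised):
        return JSON
    # ZINC is assumed to be universally supported, so it is the default
    # even when the server also advertises JSON.
    return ZINC
```

Defaulting to ZINC means an unrecognised server is never sent down the JSON path merely because it advertises 'application/json'.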
The output of the ops call could be used to enable or disable features of the client. Thus if a particular operation is not supported by the server, the methods implementing it should raise NotImplementedError.
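As a rough illustration of that idea, the sketch below gates a method on the set of operation names reported by the server. The class name, the _require_op helper and the his_read signature are hypothetical; a real client would populate the set from the grid returned by the ops call and would issue a request instead of returning a tuple.

```python
class OpsAwareClient(object):
    """Hypothetical client that gates methods on the server's advertised ops."""

    def __init__(self, supported_ops):
        # Operation names as they would be listed by the 'ops' call.
        self._supported_ops = frozenset(supported_ops)

    def _require_op(self, name):
        # Fail early, before any request is built or sent.
        if name not in self._supported_ops:
            raise NotImplementedError(
                "Server does not support the %r operation" % name)

    def his_read(self, point_id, rng):
        self._require_op('hisRead')
        # Placeholder for the real request; returns what would be sent.
        return ('hisRead', point_id, rng)
```

A client built with ['about', 'ops', 'read'] would raise NotImplementedError from his_read() without touching the network.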
Where possible, data should be passed around in native Python types. Encoding of data in requests to wire format should be done at the last possible moment and decoding of the response should be done at the earliest possible moment.
The hszinc module attempts to cater for this, providing some specialised types to plug holes not filled by the standard Python types.
Since we're doing procedure calls over a wide area network, there is plenty of opportunity for a request to fail. It is thus imperative that we alert the caller to problems as they arise by raising an appropriate exception.
In the case of the requests module in Python, there is a convenience for doing exactly this: calling raise_for_status() on the Response object provided.
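A minimal sketch of that check follows. The check_response helper is a hypothetical name, and the Response objects are built by hand purely so the example runs without a server; in real use they would come from a requests call.

```python
import requests


def check_response(response):
    """Return the response unchanged, or raise if the server reported an error."""
    # Raises requests.exceptions.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    return response


# Hand-built responses, standing in for replies from a Haystack server.
ok = requests.Response()
ok.status_code = 200
missing = requests.Response()
missing.status_code = 404

check_response(ok)  # No exception for a successful status.

failure = None
try:
    check_response(missing)
except requests.exceptions.HTTPError as exc:
    # The failed response stays available to the caller via exc.response.
    failure = exc
```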
Interaction with a given Haystack server is usually fairly consistent once logged in. It is thus probably wise for the core client interface to provide some conveniences that reduce the boilerplate code needed to interact with a server.
This would include basic wrappers around GET and POST calls which supply the necessary HTTP header information for content negotiation and content encoding/decoding.
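One possible shape for such wrappers is sketched below. Everything here is an assumption made for illustration: the BaseSession name, the pluggable transport callable (which stands in for whatever HTTP library is used) and the choice of Accept and Content-Type as the negotiated headers.

```python
class BaseSession(object):
    """Hypothetical core session that adds common headers to every request.

    transport is any callable taking (method, uri, headers, body).
    """

    def __init__(self, base_uri, transport, grid_format='text/zinc',
                 auth_headers=None):
        self._base_uri = base_uri
        self._transport = transport
        self._grid_format = grid_format
        self._auth_headers = dict(auth_headers or {})

    def _headers(self, with_body=False):
        # Start from the authentication headers, then add negotiation.
        headers = dict(self._auth_headers)
        headers['Accept'] = self._grid_format
        if with_body:
            headers['Content-Type'] = self._grid_format
        return headers

    def _get(self, uri):
        return self._transport('GET', uri, self._headers(), None)

    def _post(self, uri, body):
        return self._transport('POST', uri, self._headers(with_body=True), body)
```

Keeping the transport separate from the header logic lets the same wrappers serve both a blocking HTTP library and an asynchronous one.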
There are almost certainly going to be features in one implementation of Haystack that another does not implement. For example, the WideSky implementation of Haystack includes 'first' and 'last' values for HisRead's range parameter that retrieve the first or last measurement for a given data point. SkySpark provides an interface for evaluating arbitrary expressions with its exec endpoint.
With this in mind, it should be viable to provide direct convenient access to those methods. These low-level methods should do little more than provide basic data translation, taking in arguments using standard Python types, and decoding the grid returned. If the grid has an 'err' flag, an exception should be raised.
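The error check in particular is small enough to sketch. In the example below the grid metadata is treated as a plain mapping of tag names to values, which is an assumption standing in for the metadata of a decoded grid; HaystackError and raise_if_error are hypothetical names used for illustration.

```python
class HaystackError(Exception):
    """Hypothetical exception type for errors reported inside a grid."""


def raise_if_error(metadata):
    """Raise HaystackError if the grid metadata carries the 'err' marker.

    Returns the metadata unchanged when no error is flagged.
    """
    if 'err' in metadata:
        # 'dis' holds the human-readable description, when present.
        raise HaystackError(metadata.get('dis', 'Unknown error'))
    return metadata
```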
pyhaystack's history as more or less an extension to the Jupyter notebook means that its interface is necessarily a synchronous one, to facilitate interactive use.
In non-interactive use, it may be useful for pyhaystack to call on an asynchronous IO framework such as TornadoWeb to allow other tasks to take place while we wait for a response from the server. This would be particularly useful with the watchPoll method.
A lot of the methods can be characterised as:
1. Collate and pack data.
2. Encode arguments / data into a URI string / POST body.
3. Submit the request.
4. Receive the reply.
5. Check the HTTP error status of the reply.
6. Decode the reply.
7. Check for an "error grid" (err marker).
8. Extract data.
Most of these steps are synchronous; however, steps 3 and 4 may not be. Steps 2, 5 and 6 seem to be mostly the same, and would benefit from being handled in the core. In order to facilitate both asynchronous and synchronous use cases, these should be made into separate functions...

    from sys import exc_info

    import hszinc
    import six

    # encode_query() and HaystackError are assumed to be defined elsewhere
    # in the package; they are not shown in this sketch.


    class AsyncException(object):
        '''
        A convenience for passing around exc_info responses
        and re-raising them.
        '''
        def __init__(self):
            self._exc_info = exc_info()

        def reraise(self):
            six.reraise(*self._exc_info)


    class GetGridRequest(object):
        '''
        A base class for GET requests returning grids.
        '''
        def __init__(self, session, uri, query_args, callback=None):
            self._session = session
            self._uri = uri
            self._query_args = query_args
            self._callback = callback

        def submit(self):
            uri = self._session._base_uri + self._uri
            if self._query_args:
                uri += '?' + encode_query(self._query_args)
            if self._callback is None:
                # Synchronous: wait for the response and return the result
                response = self._session._get(uri)
                return self._receive(response)
            else:
                # Asynchronous: the session calls _receive when done
                self._session._get(uri, callback=self._receive)

        def _receive(self, response):
            try:
                # Check for bad error status
                response.raise_for_status()
                # Decode the grid
                grid = self._decode(response)
                result = self._extract(grid)
                if self._callback is None:
                    return result
                else:
                    self._callback(result)
            except:
                if self._callback is None:
                    raise
                # Hand the failure to the callback instead of raising
                try:
                    self._callback(AsyncException())
                except:
                    # Shouldn't happen!
                    pass

        def _decode(self, response):
            # The Content-Type we get from nHaystack looks like:
            #   application/json; charset=UTF-8
            content_type = response.headers['Content-Type']
            if ';' in content_type:
                # Separate the encoding from the content type
                (content_type, encoding) = content_type.split(';', 1)
            content_type = content_type.strip()
            # TODO: do we need to convert to Unicode? If so, how?
            if content_type in ('text/zinc', 'text/plain'):
                decoded = hszinc.parse(response.text, mode=hszinc.MODE_ZINC)[0]
            elif 'application/json' in content_type:
                decoded = hszinc.parse(response.text, mode=hszinc.MODE_JSON)
            else:
                raise NotImplementedError("Don't know how to parse type %s"
                                          % content_type)
            # An "error grid" carries the err marker in its metadata
            if 'err' in decoded.metadata:
                raise HaystackError(
                    decoded.metadata.get('dis', 'Unknown error'),
                    traceback=decoded.metadata.get('traceback', None))
            return decoded


    class ReadFilterMethod(GetGridRequest):
        def __init__(self, session, filter_expr, callback=None):
            super(ReadFilterMethod, self).__init__(
                session, 'read', {'filter': str(filter_expr)},
                callback=callback)

        def _extract(self, grid):
            # Return its rows
            return grid[:]

In the asynchronous case, we provide the callback function (callbacks are pretty much universal between frameworks such as TornadoWeb, Twisted, asyncio and others).
In the synchronous case, we simply return our data instead of passing it to a callback. A synchronous client simply creates the object, calls submit() on it, then returns whatever comes back.
In the asynchronous case, we call the callback with whatever happened. It is the responsibility of the callback function to handle all exceptions. If an AsyncException object is passed in, the callback should call that object's reraise method to raise the original exception. Calling submit() will return None, and future control is up to the callback function.
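The callback side of this contract might look like the sketch below. It is self-contained for illustration only: AsyncException is re-declared here as a Python 3 stand-in that avoids the six dependency, and run_operation is a hypothetical helper that plays the role of the request object calling back with either a result or a captured exception.

```python
import sys


class AsyncException(object):
    """Python 3 stand-in for the exc_info wrapper described above."""

    def __init__(self):
        # Must be constructed inside an except block to capture anything.
        self._exc_info = sys.exc_info()

    def reraise(self):
        raise self._exc_info[1].with_traceback(self._exc_info[2])


def run_operation(work, callback):
    """Hypothetical helper: run work() and report the outcome to callback."""
    try:
        result = work()
    except Exception:
        result = AsyncException()
    callback(result)


outcomes = []


def on_result(result):
    # The callback is responsible for handling every exception itself.
    try:
        if isinstance(result, AsyncException):
            result.reraise()
        outcomes.append(('ok', result))
    except ValueError as exc:
        outcomes.append(('failed', str(exc)))


def failing():
    raise ValueError('bad filter')


run_operation(lambda: ['row1', 'row2'], on_result)
run_operation(failing, on_result)
```

Re-raising inside the callback lets it use ordinary try/except handling, with the original traceback preserved, rather than inspecting the wrapper by hand.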