Skip to content

Instantly share code, notes, and snippets.

@Relequestual
Last active September 4, 2019 14:58
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Relequestual/ed1ac383dedceb74b6799945a266ea1a to your computer and use it in GitHub Desktop.
Save Relequestual/ed1ac383dedceb74b6799945a266ea1a to your computer and use it in GitHub Desktop.
Search API - RC 5 Retrospective

Search API - RC 5 Retrospective

Preface

A retrospective is an agile process for continuous improvement of a team and their processes.

Firstly, thank you all for your contributions to the Search API specification, and those who spent much time leading up to the plenary implementing and raising questions! I know first-hand that open source work is hard, especially with a distributed team, committing time as and when they are able.

Thank you also to those who were able to attend the GA4GH plenary in Basel a while back. We had some great discussions, and I feel I made some great connections across other workstreams and projects.

This document aims to enable us to reflect on the Search API work to its current point, consider how we can be more effective, and outline immediate and future work. International collaboration is not a solved problem, and just as science revealed new truth, reflection can reveal or inform new or improved strategy and process.

What happened to 1.0 release?

As most of you will know, the Search API was not approved by the GA4GH Steering Committee, although it was approved by multiple review committees for approval by the SC.

Let's be honest, the Search API is a hugely ambitious and complex project, especially when coupled with a "components architecture" designed to be useable by all of the Global Alliance.

Harindra emailed the Discovery Search group on 2018/09/12 with some thoughts, and with notice that he will be stepping down from his role as Discovery co-lead.

I managed to speak with a number of individuals that were in the SC meeting session, who provided individual feedback. As I've taken on the role of lead for the Search API, I have be provided access to review the meeting minutes to guide our work moving forward for approval by SC. (I'm actively pursuing options for a co-lead.)

From Harindra's comments, and those of individuals who attended the meeting, I feel the overall concept and approach was positively met, but not fully communicated, with some operational questions remaining.

Feedback from you

If you've been involved at any level with the Search API, please please give your feedback of the process to date.

It can be as brief as you like, but I'd like this to be the chance for you to speak up unreservedly.

Please email myself directly. Emails will be confidential within the Search API and Discovery Workstream leadership. If we want to share your feedback more broadly, we'll ask.

What Now?

The next review committee meeting is in Jan 2019. I feel this probably isn't a tenable deadline, especially if we want a review period (of 1 month).

Additionally, myself and Marc are reviewing a number of possible co-leads to join me on the Search API. If you haven't already thrown your hat into the ring and would like to, please let us know ASAP.

There are a few issues to address between now and then, which require a fair amount of work. I believe the following outlines the work required for a resubmission that stands a better chance of getting past that final approval.

Driver project centric design

The original brief for the Search API came out of the Matchmaker Exchange driver project. Given the huge scope of Search in general, it makes sense to try to engage as many driver projects as are interested, to make sure key aspects are not overlooked, and that any developments will be based on driver project requirements.

The previous timescale for a first draft and review, prohibited collecting use cases from other driver projects.

If you represent a driver project, hopefully you will have already received the Discovery use case collection form recently. Please fill in and return these! It's my hope that responses will catalyse further engagement and development of the Search API.

Technical and non-technical

The existing specification needs to be reorganised to better communicate to technical and non-technical experts, implementers, and potential users.

One feedback item pre the submission was that we should consider using the IETF RFC defined key words to indicate requirement levels, commonly known as rfc2119. While this was a great idea, and something those who write specifications (myself included) are generally familiar with, rectifying this feedback created new issues, mainly the mixing normative and formative language.

Mixing normative and formative language creates specification documents that are hard to understand for non-technical people, and are hard to navigate for technical people. I feel this is one of the main reasons the specification was not approved.

Additionally, the main repo readme file, and the main specification document, didn't make finding all of the specification documents easy.

Splitting normative and formative language into separate files will be tracked using Github issue ga4gh-discovery/ga4gh-case-discovery#68.

Whole picture and detail

The Search API specification is ambitious, in line with the wildly ambitions "Internet of Genomics" long term goal, making it currently hard to grasp some core concepts at various different levels.

The core concepts of the Search API, from the high level view of a user searching using a gene, to the format used to define components, needs to be made findable and accessible.

The concepts need to be communicated much clearer. It needs to be simple for a person with any level of understanding to find what information they need in relation to the Search API. Not everyone will be looking for technical details, not everyone will be looking for super high level detail.

Reorganising the main repo readme will be tracked using Github Issue ga4gh-discovery/ga4gh-case-discovery#69.

Components

Components, blocks, modules, schemas... Given many names, these are bits of data which together form something coherent and meaningful, and can be combined with other elements to provide more detail.

The concept of the components architecture of the API format was not something anyone raised objections towards. There were however a number of concerns regarding governance of components, which wasn't addressed in the specification given how quickly it was requested.

After some discussions at the plenary, I believe I can express a governance framework that will satisfy the questions raised, and make the path forward clear.

This will be tracking using issue Github Issue ga4gh-discovery/ga4gh-case-discovery#70

What can we build?

"How does the Search API compare to Beacon?" - A frequent question, relevant due to how easy the first version of the Beacon API is to understand. The Search API can be used to implement the same functionality as the Beacon API, and this needs to be demonstrated.

We also need to demonstrate how new components could be used to extend the functionality of the Search API.

This will be tracked using issue Github Issue ga4gh-discovery/ga4gh-case-discovery#71

Security

Another common question was in regard to the open extensibility of components. I feel quite a few concerns will be eased once a governance process is documented and in place, however a security concern exists regarding allowing any additional fields.

We should note security considerations:

  • Data from remote sources should be treated as untrusted
  • Must use HTTPS - Verify certificates
  • Use standard JSON deserialisation libraries
  • Limit searching queries so as to prevent accidental Denial of Service attacks
  • Follow Cross Site Scripting (XSS) prevention rules as defined by OWASP XSS Prevention Cheat Sheet with a brief overview of rules 0, 1, 2, and 3. This is for the client side
  • Follow SQL Injection Prevention guide from OWASP. This is for the server and client side
  • Note that the two OWASP guides fall under "good practice"
  • rfc8259 The JavaScript Object Notation (JSON) Data Interchange Format - AKA the JSON specification, highlights a security consideration that is applicable

This will be tracked by issue Github Issue ga4gh-discovery/ga4gh-case-discovery#72

What After that?

Assuming all the above work is complete and the Search API is resubmitted and approved, there's no shortage of further work to do!

Ideally there will be continual component development, which may trigger new major version releases without any change to core functionality. The following items go beyond refining and defining new components.

Compliance Testing

While the JSON Schemas provided allow for testing and confirming compliance in terms of format, they don't have any association with process. How a service uses each given search component in its internal search, should be testable.

Given the dynamic nature of the Search API, a minimal "compliance" test is not terribly difficult to create. However, given the complex nature of the Search API, a fully complete compliance test suite is a challenging set of work.

If we can strategise compliance in a tiered level approach, this work should be easier to manage.

Additional consideration on testing

It's plausible that people may wish to use the Search API format for non-direct searching, like "matching" in the Matchmaker Exchange API (MME API). The Search API specification

Authentication

The Search API defines the use of a proprietary header for authentication.

If the use case presents that the Search API requires more standard authentication processes, we should consider using the Authorization header, following closer to HTTP semantics. This would allow for multiple possible authentication methods, but is more difficult to implement.

Tooling

It would be great to create some tooling which allows the creation of human readable documentation from JSON Schema files. It is expected that someone implementing this Search API would be comfortable reading YAML, but they may not yet be familiar with JSON Schema.

The Workflow Execution Service (WES) project looks like it has a nice setup for this from OpenAPI documentation. We could ask for some help on this sort of setup.

Longer Roadmap Scoping

Defining some longer term goals for the Search API will allow us to scope out a general roadmap.

We should put some considerable effort into considering what we would like the Search API to grow up to be (or facilitate), particularly capturing use cases from driver projects.

This document is also available in markdown format at https://gist.github.com/Relequestual/ed1ac383dedceb74b6799945a266ea1a and will be added to the Search API repository for safe keeping and reflection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment