Each kind of response is slightly different, but this tries to make them more consistent in a few ways:
- No more gene vs normalized gene object. Everything is a GA4GH core `Gene`. This means no more `associated_with` vs `xref`, so one less kind of `MatchType`.
- The outermost level includes the query, additional parameters passed to the API endpoint (I think (...?) this is good practice to include), and service information.
- The outermost level also includes a `match` key that points to what the individual Python `QueryHandler` methods would return. IMO it makes more sense to move this stuff into the REST API response because these are things that you don't otherwise typically include in Python-to-Python methods (e.g. another class doesn't need to know what version of Gene Normalizer is running, it's literally sharing the environment).
- `match` objects include source metadata and warnings. In some of the responses, we have previously included source metadata closer to the actual source matches, but I think it might be simpler to just keep them in the same place no matter what kind of response is returned.
- Warnings are a bit more standardized. We should define (enumerate) legitimate warning types as needed (descriptions can vary based on specifics).
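To make the proposed layout concrete, here is a rough sketch of the envelope using plain dataclasses. Every class name, field name, and warning type below is an assumption for illustration only (the real models would presumably be defined elsewhere and reuse the GA4GH core `Gene`), and the version and parameter values are placeholders.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any, Optional


class WarningType(str, Enum):
    """Hypothetical warning types; the real set still needs to be enumerated."""

    EMPTY_QUERY = "empty_query"
    NON_BREAKING_SPACE = "non_breaking_space"


@dataclass
class QueryWarning:
    type: WarningType  # fixed, enumerated type
    description: str  # free text that can vary with the specifics


@dataclass
class ServiceInfo:
    name: str
    version: str


@dataclass
class Match:
    """Stand-in for what a QueryHandler method would return."""

    gene: Optional[dict]  # a GA4GH core Gene, shown here as a plain dict
    source_meta: dict = field(default_factory=dict)
    warnings: list = field(default_factory=list)


@dataclass
class Response:
    """Outermost REST response: query, extra params, service info, match."""

    query: str
    params: dict
    service: ServiceInfo
    match: Match


response = Response(
    query="BRAF",
    params={"some_param": "value"},  # placeholder for extra endpoint params
    service=ServiceInfo(name="gene-normalizer", version="0.0.0"),  # placeholder
    match=Match(
        gene={"id": "hgnc:1097", "label": "BRAF"},
        source_meta={"HGNC": {"version": "placeholder"}},
        warnings=[
            QueryWarning(
                WarningType.NON_BREAKING_SPACE,
                "Query contains a non-breaking space character",
            )
        ],
    ),
)

# Plain-dict form, roughly what the REST layer would serialize
payload: dict[str, Any] = asdict(response)
```

In this sketch only `Response` knows about the service and the query; a Python caller using `QueryHandler` directly would just get the `Match` part.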
The awkward part is that each type of search has a different relationship between the returned objects and the `match_type`. In `search`, `match_type` is given to a source, and we return every gene with that `match_type` under that source*. In `normalize`, `match_type` is associated with a single normalized gene. In `normalize_unmerged`, we return a bunch of genes under a bunch of sources, but the `match_type` corresponds to the match for the normalized gene that groups those genes together. Anyway, this makes it hard to set a good, consistent place to hold the `match_type`, since its semantics differ slightly across each type of search. I'd also like to avoid putting it directly into the individual `Gene` objects -- would much prefer to stick to the VRS/GA4GH models.
* We have an issue somewhere to return all matches and match types for every source. I think we should do this eventually, and should plan accordingly now even if we don't implement it.
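A rough sketch of where `match_type` ends up for each of the three searches, as described above. All class and field names are hypothetical, the `MatchType` members are an illustrative subset, and `Gene` is a minimal stand-in for the GA4GH core model (it deliberately has no `match_type` field).

```python
from dataclasses import dataclass
from enum import Enum


class MatchType(str, Enum):
    """Illustrative subset only; not the real enum."""

    CONCEPT_ID = "concept_id"
    SYMBOL = "symbol"
    ALIAS = "alias"


@dataclass
class Gene:
    """Minimal stand-in for a GA4GH core Gene; carries no match_type."""

    id: str
    label: str


@dataclass
class SourceMatch:
    """search: one match_type per source, covering every gene under it."""

    match_type: MatchType
    genes: list


@dataclass
class SearchMatch:
    source_matches: dict  # source name -> SourceMatch


@dataclass
class NormalizeMatch:
    """normalize: match_type describes the single normalized gene."""

    match_type: MatchType
    gene: Gene


@dataclass
class NormalizeUnmergedMatch:
    """normalize_unmerged: match_type describes the match to the normalized
    gene that groups the per-source genes, not the genes themselves."""

    match_type: MatchType
    normalized_id: str
    source_genes: dict  # source name -> list of Gene


braf_hgnc = Gene(id="hgnc:1097", label="BRAF")
braf_ncbi = Gene(id="ncbigene:673", label="BRAF")

search = SearchMatch(
    source_matches={"HGNC": SourceMatch(MatchType.SYMBOL, [braf_hgnc])}
)
normalize = NormalizeMatch(MatchType.SYMBOL, braf_hgnc)
unmerged = NormalizeUnmergedMatch(
    MatchType.SYMBOL,
    normalized_id="hgnc:1097",
    source_genes={"HGNC": [braf_hgnc], "NCBI": [braf_ncbi]},
)
```

If we later return every match and match type per source, the `source_matches` values in this sketch could become lists of `SourceMatch` without touching `Gene`.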
`system`. In GK-Pilot data, Alex had used URLs (i.e. `https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=`). Do we want to do this?
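For comparison, a small sketch of what the URL-style `system` buys us versus a short namespace string. The `Coding` class here is a minimal stand-in, the `"ncit"` namespace is just an assumed alternative, and `C00000` is a placeholder rather than a real NCIt code.

```python
from dataclasses import dataclass
from typing import Optional

# URL prefix taken from the GK-Pilot example above
NCIT_URL_SYSTEM = (
    "https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp"
    "?dictionary=NCI_Thesaurus&code="
)


@dataclass(frozen=True)
class Coding:
    """Minimal stand-in for a coding made of a system and a code."""

    system: str
    code: str

    def resolve(self) -> Optional[str]:
        """Return a browsable URL when system is a URL prefix, else None."""
        if self.system.startswith(("http://", "https://")):
            return f"{self.system}{self.code}"
        return None


# Option A: system is a URL prefix, so system + code is directly resolvable
url_style = Coding(system=NCIT_URL_SYSTEM, code="C00000")  # placeholder code

# Option B (assumed alternative): system is a short namespace string
namespace_style = Coding(system="ncit", code="C00000")  # placeholder code

resolved = url_style.resolve()
unresolved = namespace_style.resolve()
```

With the URL form, consumers can build a link with no lookup table; with the namespace form, something downstream has to map `"ncit"` to a resolver.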