James Stevenson jsstevenson

{
"pre_mapped": {
"id": "ga4gh:VA.FLe4-pSUs7vjdVtVD4TmUNL4JhrBbqTd",
"type": "Allele",
"extensions": [
{
"name": "vrs_ref_allele_seq",
"value": "Y"
}
],

{
"schemaVersion": 1,
"label": "QC",
"message": "N/A",
"color": "white",
"logoSvg": "<svg id=\"Layer_1\" data-name=\"Layer 1\" xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"0 0 135.54 133.3\"><defs><style>.cls-1{fill:#fff;}.cls-2{fill:#231f20;}</style></defs><path class=\"cls-1\" d=\"M85.31,20.58l-4-16.64s0,0,0,0c-1.11-.14-2.21-.26-3.33-.35s-2.24-.15-3.36-.18-2.26,0-3.4,0c-.82,0-1.66.06-2.49.11L63.21,19.66c-.22,0-.43.11-.65.15l0,.11c-1.58.3-3.13.68-4.65,1.12s-3,1-4.45,1.53L40.52,11.64c-.58.32-1.17.63-1.74,1-1,.56-1.91,1.14-2.83,1.74s-1.84,1.23-2.73,1.88-1.77,1.31-2.63,2l-.73.6L34.72,35c-.15.16-.28.33-.42.48v0c-1.12,1.18-2.17,2.42-3.17,3.7s-1.95,2.59-2.84,4h-.06v0l-16.83-1c-.31.62-.62,1.23-.91,1.85-.46,1-.88,2-1.29,3.07-.28.69-.53,1.39-.79,2.09-.12.35-.26.7-.38,1.05-.36,1-.69,2.1-1,3.17-.1.33-.17.68-.26,1l13.4,10.13a54.29,54.29,0,0,0-.56,10.73L5.25,84.54c.09.52.16,1,.26,1.55.21,1.08.44,2.16.7,3.23s.54,2.14.85,3.2.65,2.12,1,3.16c.14.42.31.82.46,1.24h0L25.22,97l0,0h.13a51.6,51.6,0,0,0,5.88,9
jsstevenson / tmp-dashboard-results.json
Created May 27, 2024 17:12
{
"oboscore": {
"dashboard_score_max_impact": {
"dashboard": 1,
"impact": 1,
"impact_external": 3,
"no_base": 5,
"overall_error": 20,
"overall_info": 5,
"overall_warning": 10,

notes

Each kind of response is slightly different, but this tries to make them more consistent in a few ways:

  • There is no longer a separate gene vs. normalized gene object: everything is a GA4GH core Gene. This removes the associated_with vs. xref distinction, so there is one fewer kind of MatchType.
  • The outermost level includes the query, any additional parameters passed to the API endpoint (I think this is good practice to include), and service information.
  • The outermost level also includes a match key that points to what the individual Python QueryHandler methods would return. IMO it makes more sense to move the query, parameters, and service information into the REST API response, because these are things you don't typically include in Python-to-Python method calls (e.g. another class doesn't need to know what version of Gene Normalizer is running, since it's literally sharing the environment).
  • match objects include source metadata and warnings. In some of the responses, we have previously included source metadata closer to the actual source matches, b
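
A minimal sketch of the envelope described in these notes, assuming hypothetical field names (`query`, `parameters`, `service_meta`, `match`, `genes`, `source_meta`, `warnings`) and treating GA4GH core Gene objects as plain dicts. It only illustrates the split between what a Python QueryHandler method returns (`Match`) and the REST-only fields wrapped around it; it is not the Gene Normalizer's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Match:
    """What a (hypothetical) QueryHandler method would return on its own."""
    genes: list[dict[str, Any]]  # GA4GH core Gene objects, kept as plain dicts here
    source_meta: dict[str, Any] = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)


@dataclass
class ServiceResponse:
    """REST-only envelope: query, extra endpoint parameters, and service info."""
    query: str
    parameters: dict[str, Any]
    service_meta: dict[str, Any]
    match: Match


def wrap_for_rest(query: str, parameters: dict[str, Any], match: Match,
                  version: str) -> dict[str, Any]:
    """Wrap a Python-level match result in the REST-only outer fields."""
    response = ServiceResponse(
        query=query,
        parameters=parameters,
        service_meta={"name": "gene-normalizer", "version": version},
        match=match,
    )
    # asdict() recurses into the nested Match dataclass as well
    return asdict(response)
```

Python callers would pass `Match` objects around directly; only the REST layer calls `wrap_for_rest`, so the service version never leaks into Python-to-Python calls.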

dgidb-example

2023-12-15

Querying DGIdb GraphQL API

Load packages:

library("ghql")
jsstevenson / fusor_test_failures.txt
Last active October 31, 2023 23:42
fusor test failures
[ issue-132-tests ⚙ venv] ~/code/fusor % ipython
Python 3.11.0 (main, Sep 6 2023, 12:48:52) [Clang 14.0.3 (clang-1403.0.22.14.1)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.16.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from gene.database import create_db
In [2]: db = create_db()
In [3]: db.get_source_metadata("NCBI")
normalized_id normalized_xrefs normalized_strand normalized_hgnc_types normalized_ensembl_types normalized_ncbi_types normalized_hgnc_locations normalized_ensembl_locations normalized_ncbi_locations incoming_concept_id incoming_strand incoming_locations incoming_gene_type
hgnc:30046 ['ensembl:ENSG00000254093', 'ensembl:ENSG00000258724', 'ncbigene:54984'] - ['gene with protein product'] ['protein_coding'] [{'type': 'ChromosomeLocation', 'species_id': 'taxonomy:9606', 'chr': '8', 'start': 'p23.1', 'end': 'p23.1'}] [{'type': 'SequenceLocation', 'start': 10725398, 'end': 10839847, 'sequence_id': 'ga4gh:SQ.209Z7zJ-mFypBEWLk4rNC6S_OxY5p7bs'}] ncbigene:54984 + "[{""type"": ""ChromosomeLocation"", ""species_id"": ""taxonomy:9606"", ""chr"": ""8"", ""start"": ""p23.1"", ""end"": ""p23.1""}, {""type"": ""SequenceLocation"", ""start"": 10764960, ""end"": 10839875, ""sequence_id"": ""ga4gh:SQ.209Z7zJ-mFypBEWLk4rNC6S_OxY5p7bs""}, {""type"": ""SequenceLocation"", ""start"": 2507387, ""end"": 2582240, ""sequence_id"": ""g
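
In the rows above, the normalized_* cells appear to hold Python reprs (single-quoted lists and dicts), while incoming_locations is JSON wrapped in CSV quoting with doubled quotes. A minimal sketch for loading such a file, assuming it is tab-separated: try JSON first, fall back to `ast.literal_eval`, and otherwise keep the raw string (so strand values like `-` and `+` survive as text). The column subset in the usage example is trimmed from the header above.

```python
import ast
import csv
import io
import json
from typing import Any, Optional


def parse_cell(value: Optional[str]) -> Any:
    """Decode one cell: empty -> None, then JSON, then Python literal, else raw text."""
    if not value:
        return None
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        # plain identifiers such as 'hgnc:30046' or a strand like '-'
        return value


def read_rows(text: str) -> list[dict[str, Any]]:
    """Read tab-separated text (assumed delimiter) and decode every cell."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{key: parse_cell(val) for key, val in row.items()} for row in reader]
```

Because `csv` undoes the doubled-quote escaping before `parse_cell` runs, the JSON cell arrives as ordinary JSON text.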
('ncbigene:100131223', {'src_name': 'NCBI', 'concept_id': 'ncbigene:100131223', 'symbol': 'LOC100131223', 'strand': '-', 'locations': [{'type': <VRSTypes.SEQUENCE_LOCATION: 'SequenceLocation'>, 'start': 28120833, 'end': 28121467, 'sequence_id': 'ga4gh:SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO'}], 'label_and_type': 'ncbigene:100131223##identity', 'item_type': 'identity'})
('ncbigene:440585', {'src_name': 'NCBI', 'xrefs': ['hgnc:34347'], 'concept_id': 'ncbigene:440585', 'symbol': 'FAM183A', 'strand': '+', 'locations': [{'type': <VRSTypes.SEQUENCE_LOCATION: 'SequenceLocation'>, 'start': 43142961, 'end': 43156396, 'sequence_id': 'ga4gh:SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO'}], 'label_and_type': 'ncbigene:440585##identity', 'item_type': 'identity'})
('ncbigene:51668', {'src_name': 'NCBI', 'xrefs': ['hgnc:25019'], 'concept_id': 'ncbigene:51668', 'symbol': 'HSPB11', 'strand': '-', 'locations': [{'type': <VRSTypes.SEQUENCE_LOCATION: 'SequenceLocation'>, 'start': 53911575, 'end': 53946305, 'sequence_id': 'ga4gh:SQ.Ya6Rs7DHhDe
(gene-normalization) ~/code/gene-normalization (issue-112) % ipython
Python 3.9.12 (main, Mar 26 2022, 15:51:15)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.2.0 -- An enhanced Interactive Python. Type '?' for help.
In [2]: import boto3
In [3]: g = boto3.resource("dynamodb").Table("gene_concepts")
In [4]: from boto3.dynamodb.conditions import Key
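
The records above carry keys such as `'label_and_type': 'ncbigene:440585##identity'`, and the session imports `Key` from `boto3.dynamodb.conditions`. A minimal sketch of the next step, assuming (not confirmed here) that `label_and_type` is the partition key of the `gene_concepts` table and that keys are lowercased; `identity_key` and `get_identity_records` are hypothetical helper names. Pagination via `LastEvaluatedKey` is not handled.

```python
from typing import Any


def identity_key(concept_id: str) -> str:
    """Build the '<concept_id>##identity' key seen in the records above.

    Lowercasing is an assumption about how keys are normalized.
    """
    return f"{concept_id.lower()}##identity"


def get_identity_records(table: Any, concept_id: str) -> list[dict]:
    """Query a boto3 DynamoDB Table resource (e.g. gene_concepts) by partition key."""
    # imported here so identity_key() is usable without boto3 installed
    from boto3.dynamodb.conditions import Key

    response = table.query(
        KeyConditionExpression=Key("label_and_type").eq(identity_key(concept_id))
    )
    return response["Items"]
```

With the table from the session, `get_identity_records(g, "ncbigene:440585")` would be the call to try.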
//
// top-level questions:
// * do any Array-style components require members?
// * do any scalar-style components require values? (i.e., can they be Optional?)
const example = {
// ** reading frame preserved
// mandatory?
// how to represent 'unknown' (if needed)?
// not required to be included -- fusor // null
// curation tool -- should be yes or no
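
One way to reconcile the two answers in the comments above (optional/null in fusor, yes/no in the curation tool) is a tri-state value where `None` stands for "unknown / not provided". A minimal sketch, assuming a hypothetical `reading_frame_preserved` field name; the library side accepts `None`, while the curation tool insists on an explicit answer before saving.

```python
from typing import Optional


def reading_frame_label(reading_frame_preserved: Optional[bool]) -> str:
    """Display text for the tri-state value; None means unknown / not provided."""
    if reading_frame_preserved is None:
        return "unknown"
    return "yes" if reading_frame_preserved else "no"


def require_answer(reading_frame_preserved: Optional[bool]) -> bool:
    """Curation-tool rule: reject a missing value, since the tool wants yes or no."""
    if reading_frame_preserved is None:
        raise ValueError("reading frame preserved must be answered yes or no")
    return reading_frame_preserved
```

Keeping `None` distinct from `False` avoids silently treating "not curated" as "frame not preserved".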