@dandanxu

dandanxu/test.md Secret

Created February 24, 2016 22:26
test

The SolveBio Variant Explorer is a fast and easy way to explore known reference information about a specific sequence variant. You no longer need to trawl through several websites and manually enter variant information each time to get the same information. The Variant Explorer is automatically and dynamically generated for every possible variant!

Currently, the Variant Explorer supports only GRCh37/hg19. Please email us at support@solvebio.com to request GRCh38/hg38 support.

There are a few ways to load variants into the Variant Explorer: You can use SolveBio Search. You can also go directly to a specific variant by URL. For example, both https://www.solvebio.com/variant/GRCH37-7-140453136-140453136-T and https://www.solvebio.com/variant/GRCH37-7-140453136-A-T will lead you to the same variant.
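As a rough illustration of the URL scheme above, the sketch below builds both URL forms from a variant's components. The function names are hypothetical, and the field order (build, chromosome, start, stop, alternate allele for the first form; build, chromosome, position, reference allele, alternate allele for the second) is inferred only from the two example URLs, so treat it as an assumption rather than a documented specification.

```python
BASE_URL = "https://www.solvebio.com/variant/"  # prefix taken from the example URLs


def variant_url_by_range(chrom, start, stop, alt, build="GRCH37"):
    """Hypothetical helper: build-chrom-start-stop-alt form."""
    return f"{BASE_URL}{build}-{chrom}-{start}-{stop}-{alt}"


def variant_url_by_alleles(chrom, pos, ref, alt, build="GRCH37"):
    """Hypothetical helper: build-chrom-pos-ref-alt form."""
    return f"{BASE_URL}{build}-{chrom}-{pos}-{ref}-{alt}"


# Both calls reproduce the example URLs for the same variant on chromosome 7.
print(variant_url_by_range(7, 140453136, 140453136, "T"))
print(variant_url_by_alleles(7, 140453136, "A", "T"))
```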

Example Variant Summary tab

The Variant Summary visualizes the sequence alteration at the reference genome level. The top part, Variant Identification, lists the following information:

  • Genome build (currently only GRCh37/hg19 is supported)
  • Chromosome
  • Start position of the variant relative to GRCh37
  • Stop/end position of the reference allele relative to GRCh37
  • Reference allele present at this chromosome, start, and stop location
  • Alternate allele (or variant) that is currently being analyzed
  • Type of variant: single nucleotide variant (SNV), insertion, deletion, or multiple nucleotide substitution (substitution).
  • If the variant is a deletion or insertion, the size of that deletion or insertion.
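The variant types in the list above can be told apart from the two alleles alone. The following is a minimal sketch, not SolveBio's implementation: the function name is hypothetical, and it assumes alleles are already trimmed so that a pure insertion has an empty reference allele and a pure deletion has an empty alternate allele. Anything else that is not a single-base change is labeled a substitution here.

```python
def classify_variant(ref, alt):
    """Return (variant_type, indel_size).

    indel_size is None unless the variant is an insertion or deletion.
    Assumption: alleles are trimmed, so insertions have ref == "" and
    deletions have alt == "".
    """
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if not ref:
        return "insertion", len(alt)
    if not alt:
        return "deletion", len(ref)
    if len(ref) == 1 and len(alt) == 1:
        return "SNV", None
    # Remaining cases are treated as multiple nucleotide substitutions.
    return "substitution", None


print(classify_variant("A", "T"))
print(classify_variant("", "GG"))
print(classify_variant("ACT", ""))
```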

Additionally, the variant is visualized in the context of a gene model (if the variant lies within a gene) and also the chromosome.

We also link out to variant-specific external databases and information. Each link-out, whether it's a Google or ClinVar search or the exact location/variant in UCSC or the ExAC browser, is already variant-specific for convenience.

We also generate a number of variant names and find dbSNP rsIDs for each variant. The c. and g. HGVS values are automatically generated through SolveBio's HGVS translation API. We are in the process of indexing historical names of variants to make it easier to find these in the future.

Example Gene tab

The Gene tab combines a number of reference datasets to bring back gene-specific information. If the genetic variant is in several genes, or you are using the multi-gene browser, you can toggle which gene you're looking at with the purple bar on top.

Text summaries for each gene are brought in from RefSeqGene. If the gene does not currently have a RefSeq summary, there will not be one available (please let us know if this happens!). The conditions module has data coming in from the NHGRI CGD database. This module lists all the conditions that have been associated with this gene and details about the inheritance patterns.

Finally, we list many of the known identifiers for this gene symbol, with information indexed from HGNC/HUGO.

Premium Gene-Specific Datasets

Additionally, if you have a BiomarkerBase license from Amplion, you get to see more information about gene-specific targets, biomarkers, diseases, FDA approved tests, LDTs, clinical trials, and drugs! More information available from our blog and from Amplion.

Sample Effects tab

The SolveBio Effect Predictor is a customized effect predictor similar to SnpEff, VEP, and ANNOVAR. The Effects tab displays predictions of the effects of the variant on each transcript/gene that it resides in. Currently the gene models supported on the SolveBio Variant Explorer are based on MapView files for RefSeq 104 & RefSeq 105 (files available via FTP for 104 and 105). If your variant is in multiple genes and/or multiple transcripts of the same gene, you can switch between the different transcripts using the dropdown menu.

The consequences/effects follow Sequence Ontology terms exactly. Impact categories and ranks correspond to the VCF annotation spec (PDF available here).

We also have convenient variant-specific link-outs to content from other sites, such as Ensembl's VEP API JSON output and the UCSC genome browser, so that you can independently verify the predicted effects.

Example Variant classifier tab

The SolveBio Variant Classifier is a flexible automated variant classification system (API documentation for the classifier is also available). The classifier currently applies the ACMG/CAP guidelines for the interpretation of sequence variants (Richards 2015) but can be extended and customized for different variant classification SOPs/rubrics. Contact SolveBio for information on custom classifiers.

The classifier runs through each rule in the guidelines that can be automated (by our professional judgement), and applies each rule to the variant with data from the SolveBio Data Library, as well as the output of the SolveBio Effect Predictor. Each rule is returned as "Met", "Not Met", or "To be Evaluated", with a long-form text message specific for each variant and each rule.

For example, for the variant NM_007294.3:c.528G>A (GRCH37-17-41251811-41251811-T), the classifier states for BP7 that this variant is "Synonymous variant that is not conserved (PhyloP 46-way score is -0.108669) and not in a splice region." This message is automatically generated and specific for this variant.
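To make the shape of a rule result concrete, here is a hedged sketch of how a BP7-style check could return one of the three statuses together with a variant-specific message. It is not the SolveBio classifier: the function name, its inputs, and especially `conservation_cutoff` are assumptions, since the PhyloP threshold the classifier actually uses is not stated here.

```python
def evaluate_bp7(is_synonymous, phylop_score, in_splice_region, conservation_cutoff):
    """Hypothetical BP7 check returning (status, message).

    conservation_cutoff is an assumption supplied by the caller; the real
    classifier's PhyloP threshold is not documented in this text.
    """
    inputs = (is_synonymous, phylop_score, in_splice_region)
    if any(value is None for value in inputs):
        # Missing data: the rule cannot be decided automatically.
        return "To be Evaluated", "Not enough data to evaluate BP7 automatically."
    if is_synonymous and phylop_score < conservation_cutoff and not in_splice_region:
        return "Met", (
            "Synonymous variant that is not conserved "
            f"(PhyloP 46-way score is {phylop_score}) and not in a splice region."
        )
    return "Not Met", "Variant is not synonymous, is conserved, or lies in a splice region."


# The cutoff of 0.0 is illustrative only.
status, message = evaluate_bp7(True, -0.108669, False, conservation_cutoff=0.0)
print(status, "-", message)
```

Passing the cutoff as a parameter keeps the guessed threshold visible at the call site instead of burying it in the rule.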

Example Clinical Evidence tab

If the variant is in ClinVar, we display the relevant information for each ClinVar submission as well as link-outs to the ClinVar record on the NCBI site.

Example In Silico Predictions tab

In-silico prediction algorithms can give insight into a variant when no experimental or clinical evidence exists.

  • SIFT & PolyPhen2 predict the effect of amino acid changes on protein structure and are derived from dbNSFP.
  • ada and rf scores are composite scores from dbscSNV for splice site predictions
  • InterPro protein domains are brought back from dbNSFP
  • RepeatMasker defines where repeat regions exist

We are constantly adding new in-silico predictors. Email us at support@solvebio.com if your favorite is not currently on our list.

Example Population tab

The population tab displays whether or not the variant has been seen in 1000 Genomes and ExAC, and if so, allele frequencies and counts broken down by sub-population.
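For readers unfamiliar with the terms, an allele frequency is the allele count divided by the total number of alleles observed in that sub-population. The sketch below shows the arithmetic only; the function name, the sub-population labels, and the counts are made up for illustration and do not come from 1000 Genomes or ExAC.

```python
def allele_frequencies(counts):
    """Map sub-population -> allele frequency.

    counts maps sub-population -> (allele_count, allele_number).
    A sub-population with no observed alleles gets None instead of a frequency.
    """
    return {
        population: (allele_count / allele_number if allele_number else None)
        for population, (allele_count, allele_number) in counts.items()
    }


# Hypothetical counts, for illustration only.
example_counts = {"POP_A": (5, 1000), "POP_B": (0, 0)}
print(allele_frequencies(example_counts))
```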

Example Somatic Data tab

We've put together many of our commonly used databases and requested link-outs for somatic mutation information into one tab for convenience. The tab includes a link-out to MSKCC's cBioPortal, every description of the variant in COSMIC (pre-commercial licensing) and CIViC, and counts of how many times the variant has been seen in TCGA.

Example literature tab

If a paper (with a valid PubMed ID) has been linked to a variant, details about that paper (abstract, title, authors, link-outs to PubMed) are shown.

Currently, the Literature tab shows papers from ClinVar and OMIM. We are actively expanding this source of information through curation.

Frequently Asked Questions

What is the impact category and impact rank on the effects tab?

We followed the VCF annotation spec available here: http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf to put together impact categories (putative ranks) and impact ranks (from annotation sort order). Where the spec was internally inconsistent, we used our best judgment to assign categories and ranks.

My gene is missing a gene summary, what is going on?

We currently use RefSeqGene to display a gene summary; if the gene does not have a summary in RefSeqGene, we do not provide one. We are working on a better system to provide more comprehensive gene summaries.

The coordinates of my variant seem to be different from UCSC/dbSNP/other.

Please make sure you are checking the GRCh37/hg19 coordinates. If you are checking the NC_ based HGVS g. genomic coordinates on dbSNP, make sure you are looking at the one with the lower version number.

For example, in dbSNP for rs9534262, NC_000013.11:g.32362509T>C actually gives the GRCh38 coordinates. The GRCh37 coordinates are NC_000013.10:g.32936646T>C, which correspond to the SolveBio variant GRCH37-13-32936646-32936646-C.
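The correspondence in this example can be sketched in code. The helper below is hypothetical and handles only simple single-base substitutions on the nuclear chromosomes (NC_000001 through NC_000024). It does not infer the genome build from the accession version, because that mapping differs per chromosome; the caller is assumed to have already picked the GRCh37 accession.

```python
import re

# Matches simple substitutions such as NC_000013.10:g.32936646T>C
HGVS_G_SNV = re.compile(r"^NC_0*(\d+)\.(\d+):g\.(\d+)([ACGT])>([ACGT])$")


def hgvs_g_to_variant_id(hgvs, build="GRCH37"):
    """Hypothetical helper: convert an HGVS g. substitution to a variant ID.

    Assumption: the caller knows the accession version belongs to `build`.
    """
    match = HGVS_G_SNV.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS g. expression: {hgvs}")
    accession_number, _version, position, _ref, alt = match.groups()
    # NC_000023 and NC_000024 are the X and Y chromosomes.
    chrom = {"23": "X", "24": "Y"}.get(accession_number, accession_number)
    # For a single-base substitution, start and stop are the same position.
    return f"{build}-{chrom}-{position}-{position}-{alt}"


print(hgvs_g_to_variant_id("NC_000013.10:g.32936646T>C"))
```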

If there still appears to be a discrepancy, please email us at support@solvebio.com and we will look into it immediately!

Can you please add ________?

We love adding new features! Email us your wishlist at support@solvebio.com.

Can we use your HGVS translator, annotator, effect predictor, or classifier?

Yes! They are all available as part of the SolveBio Genomic Web Services (GWS). Please contact us at contact@solvebio.com for more information regarding support and custom services.

What’s next for the Variant Explorer?

We have a long list of features we’re adding:

  • Better multi-transcript support (Ensembl, summary views of the effects tab)
  • More documentation overall
  • Support for GRCh38
  • A protein tab with visualizations of the amino acid changes
  • More in-silico predictors, particularly for splicing
  • Smarter automatic systems to link new papers to variants
  • Alerts about new information on a variant since your last visit

Is there something else you want in particular? Please email us at support@solvebio.com with your wishlist!

Variant Explorer Sources

Updated on February 8, 2016

| Source Data | Used in Tab | SolveBio Version Used |
| --- | --- | --- |
| 1000G | Population | 1000G/1.1.0-2015-01-08 |
| BiomarkerBase | Gene | BiomarkerBase/1.1.3-2015-10-20 |
| CGD | Gene | CGD/1.1.2-2016-01-11 |
| CIViC | Somatic Data | CIViC/1.0.0-2015-04-07 |
| ClinVar | Clinical Evidence | ClinVar/3.7.0-2016-01-11 |
| COSMIC | Somatic Data | COSMIC/1.1.0-COSMIC71 |
| dbNSFP | In Silico Predictions | dbNSFP/1.0.0-2.8 |
| dbSNP | Summary | dbSNP/1.0.0-b144 |
| dbscSNV | In Silico Predictions | dbscSNV/1.0.0-1.1 |
| ExAC | Population | ExAC/1.2.0-r0.3 |
| GENCODE | Summary | GENCODE/1.1.0-2015-01-09 |
| HGNC | Gene | HGNC/2.1.2-2016-01-11 |
| ISCN | Summary | ISCN/1.1.0-2015-01-05 |
| MEDLINE | Literature | MEDLINE/2.1.2-2016 |
| OMIM | Gene | OMIM/2.1.2-2016-01-11 |
| RefSeqGene | Gene | RefSeqGene/2.0.0-2016-01-28 |
| RepeatMasker | In Silico Predictions | RepeatMasker/1.0.0-4.0.6 |
| TCGA | Somatic Data | TCGA/1.2.0-2015-02-11 |
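The version strings listed above appear to share one shape: a dataset name, a slash, a three-part version number, a hyphen, and a trailing tag (a date or an upstream release label). The sketch below splits a string on that assumption; the function name and the meaning attached to each part are inferred from the listed values and are not a documented SolveBio format.

```python
def parse_version(version_string):
    """Split 'Name/X.Y.Z-tag' into (name, version, tag).

    Assumption: the first '/' separates the dataset name, and the first '-'
    after it separates the version number from the trailing tag.
    """
    name, _, rest = version_string.partition("/")
    version, _, tag = rest.partition("-")
    return name, version, tag


print(parse_version("ClinVar/3.7.0-2016-01-11"))
print(parse_version("COSMIC/1.1.0-COSMIC71"))
```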