Last active May 13, 2020 17:40
ClinVar sample


This directory contains:

  • ClinVar dataset reports and ClinVar development documents
  • documents related to the NCBI collaboration with ClinGen, including:
      - how to apply for expert panel status
  • data common to ClinVar and GTR, including:
      - terminology used by both GTR and ClinVar


You may submit data to ClinVar using Excel spreadsheets or XML files.

  • Excel Submission Templates
      - standard submission template
      - template for submissions with less supporting evidence
      - beta version of the updated standard template (please use the standard template if you are time-constrained)
  • XML Submission Schema Files
      - current XML submission document schema
      - folder of previous schema versions
  • Please direct XML data submission questions to

ClinVar Data Downloads


URL:
Format: tab-separated values
Updated: daily

Reports names and identifiers used in GTR and ClinVar. When a name is used by more than one source, there may be more than one line per condition. Unlike the gene_condition_source_id file, this file is comprehensive and does not require knowledge of any gene-to-disease relationship.


Col  Name         Description
1    DiseaseName  The name preferred by GTR and ClinVar
2    SourceName   Sources that also use this preferred name
3    ConceptID    The identifier assigned to this condition (1)
4    SourceID     Identifier used by the source reported in column 2
5    DiseaseMIM   MIM number for the condition
6    LastUpdated  Last time this record was modified by NCBI staff


(1) If the value starts with a C and is followed by digits, the ConceptID is a value from UMLS; if a value begins with CN, it was created by NCBI-based processing.
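As an illustration of how the ConceptID convention in footnote (1) and the one-line-per-source layout might be handled, here is a minimal Python sketch. It assumes the file has a single header line whose column names match the table above (a leading '#' is tolerated); the function names concept_id_origin and sources_by_concept are hypothetical, not part of any NCBI tool.

```python
import csv
import re
from collections import defaultdict

# Footnote (1): "C" followed by digits is a UMLS ConceptID; a "CN" prefix marks
# an identifier created by NCBI-based processing.
UMLS_PATTERN = re.compile(r"C\d+")


def concept_id_origin(concept_id: str) -> str:
    """Classify a ConceptID as 'UMLS', 'NCBI', or 'unknown' by its prefix."""
    if concept_id.startswith("CN"):
        return "NCBI"
    if UMLS_PATTERN.fullmatch(concept_id):
        return "UMLS"
    return "unknown"


def sources_by_concept(tsv_text: str) -> dict:
    """Collect every SourceName reported for each ConceptID.

    Assumption (hypothetical layout): the first line is a header whose names
    match the column table, optionally prefixed with '#'; rows are tab-separated.
    """
    lines = tsv_text.splitlines()
    header = lines[0].lstrip("#").split("\t")
    reader = csv.DictReader(
        lines[1:], fieldnames=header, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    grouped = defaultdict(set)
    for row in reader:
        # A condition may span several lines, one per source using the name.
        grouped[row["ConceptID"]].add(row["SourceName"])
    return dict(grouped)
```

Two rows that share a ConceptID but list different SourceName values collapse into a single entry holding both sources, which matches the note that a condition may appear on more than one line.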


URL:
Format: tab-separated values
Updated: daily

Reports gene-disease relationships used in ClinVar, Gene, GTR and MedGen. The sources of information for the gene-disease relationships include OMIM, GeneReviews, and a limited amount of curation by NCBI staff. The disorders reported in this file are a subset of those in the disease_names file, because a gene-to-disease relationship is required.


Col  Name         Description
1    GeneID       The NCBI GeneID
2    GeneSymbol   The preferred symbol corresponding to the GeneID
3    ConceptID    The identifier assigned to a disorder associated with this gene (1)
4    SourceName   Sources that use this name
5    SourceID     The identifier used by this source
6    DiseaseMIM   MIM number for the condition
7    LastUpdated  Last time this record was modified by NCBI staff


(1) If the value starts with a C and is followed by digits, the ConceptID is a value from UMLS; if a value begins with CN, it was created by NCBI-based processing.

The file also reports DiseaseName, the full name for the condition.
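The subset relationship described above can be checked with a short sketch like the following. It looks columns up by header name rather than by position, so it does not depend on exactly where each column falls in the file; it assumes a single header line (possibly prefixed with '#'). read_tsv, conditions_by_gene, and concepts_missing_from_disease_names are hypothetical helpers.

```python
import csv
from collections import defaultdict


def read_tsv(tsv_text: str) -> list:
    """Parse tab-separated text with one header line into a list of dicts.

    Assumption: the header row carries the column names (a leading '#' is
    stripped), so fields are looked up by name rather than by position.
    """
    lines = tsv_text.splitlines()
    header = lines[0].lstrip("#").split("\t")
    reader = csv.DictReader(
        lines[1:], fieldnames=header, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def conditions_by_gene(gene_rows: list) -> dict:
    """Map each GeneSymbol to the set of ConceptIDs associated with it."""
    by_gene = defaultdict(set)
    for row in gene_rows:
        by_gene[row["GeneSymbol"]].add(row["ConceptID"])
    return dict(by_gene)


def concepts_missing_from_disease_names(gene_rows: list, disease_rows: list) -> set:
    """Return ConceptIDs in the gene file that are absent from disease_names.

    Per the README this should be empty; a non-empty result may indicate,
    for example, that the two files were downloaded on different days.
    """
    known = {row["ConceptID"] for row in disease_rows}
    return {row["ConceptID"] for row in gene_rows} - known
```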


URL:
Format: tab-separated values
Updated: daily

Tracks changes in identifiers assigned to phenotypes over time. The ConceptID values in the first column are no longer active: each has either been discontinued (the value in column 2 is 'No longer reported') or been replaced by a record with a different identifier. A replacement may result either from a merge (one record becoming secondary to another) or from a change in numbering, usually because an identifier assigned by NCBI (starting with CN) is now thought to be represented by a ConceptID from UMLS (starting with C followed by digits).


Col  Name                Description
1    Previous ConceptID  The outdated identifier
2    Current ConceptID   The current identifier, or 'No longer reported'
3    Date of Action      The date this change occurred
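To map an old ConceptID to its current one, a program can follow this table until it reaches an identifier that is not listed as outdated. The sketch below assumes that a replacement ID may itself be replaced later (so chains are followed), that each outdated ID appears at most once in column 1, and that the columns are tab-separated in the order shown above; parse_history and resolve_concept_id are hypothetical names.

```python
from typing import Optional

NO_LONGER_REPORTED = "No longer reported"


def parse_history(tsv_text: str) -> dict:
    """Build a {previous ConceptID: current ConceptID} map.

    Assumptions: one header line, tab-separated columns in the order
    Previous ConceptID, Current ConceptID, Date of Action, and each outdated
    ID listed at most once (a later line would overwrite an earlier one).
    """
    history = {}
    for line in tsv_text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        previous_id, current_id = line.split("\t")[:2]
        history[previous_id] = current_id
    return history


def resolve_concept_id(concept_id: str, history: dict) -> Optional[str]:
    """Follow replacements to an active ConceptID.

    Returns the ID unchanged if it was never retired, None if the chain ends
    in a discontinued ID, and raises ValueError if the chain loops.
    Following chains assumes a replacement ID can itself be replaced later.
    """
    seen = set()
    current = concept_id
    while current in history:
        if current in seen:
            raise ValueError(f"cycle in ConceptID history at {current}")
        seen.add(current)
        current = history[current]
        if current == NO_LONGER_REPORTED:
            return None
    return current
```

Returning None for discontinued IDs keeps them distinct from IDs that were never retired, which resolve to themselves.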


Subdirectory          Description (notes)
presentations         Slides or other documents about ClinVar
submission_templates  Templates for submission by spreadsheet
tab_delimited         Summary data of several types
vcf_GRCh37            VCF files generated by dbSNP based on GRCh37/hg19 (2)
vcf_GRCh38            VCF files generated by dbSNP based on GRCh38/hg38 (1,2)
xml                   An extract of ClinVar data in XML format (3)
xsd_public            Current and previous versions of the XSD schema files for XML data


(1) For more about the conventions used to process and report the vcf data, see also:

(2) Please note that until the new data from 1000 Genomes are processed, there will be no files in GRCh38 coordinates that report common variants (common_all.vcf.gz) or common variants not known to contribute to phenotype (common_no_known_medical_impact-latest.vcf). These files are available only in the vcf_GRCh37 subdirectory. Note: This notice should be in the README for GRCh38!

(3) The schema for the files in the xml directory is
