@abalter
Last active February 1, 2021 21:42

Expand and Flatten VCF

./expand_and_flatten_vcf.py schema -i kaviar_100.vcf -o schema.json

./expand_and_flatten_vcf.py vcf -i kaviar_100.vcf -o expanded_vcf

Expand the INFO column and flatten multi-variant rows to turn a canonical VCF into a flat table, and extract the table schema. This is useful for storing the data in a database, for instance by uploading it to GCP BigQuery.

VCF Format

The canonical format for a VCF file contains 8 "fixed fields":

#CHROM POS ID REF ALT QUAL FILTER INFO

The INFO column contains key=value pairs separated by semicolons (;).

Example from ClinVar:

ALLELEID=959428;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.11:g.943363G>C;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Uncertain_significance;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=SAMD11:148398;MC=SO:0001583|missense_variant;ORIGIN=1

Example from Kaviar:

AF=0.0000379,0.0000379;AC=1,1;AN=26378;END=10145
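
As a minimal sketch of how such an INFO string can be taken apart, the snippet below splits on `;`, then on the first `=`, then on `,`. The function name `parse_info` is hypothetical and is not part of the script in this gist; it only illustrates the idea.

```python
def parse_info(info, delimiter=";"):
    """Split a VCF INFO string into a dict mapping key -> list of string values."""
    parsed = {}
    for pair in info.split(delimiter):
        if not pair:
            continue
        # partition() splits on the first "=" only and tolerates
        # flag entries that have no "=value" part.
        key, _, value = pair.partition("=")
        parsed[key] = value.split(",") if value else []
    return parsed


kaviar_info = parse_info("AF=0.0000379,0.0000379;AC=1,1;AN=26378;END=10145")
print(kaviar_info["AF"])  # one entry per alternate allele
print(kaviar_info["AN"])  # a single value for the whole row
```

All values are kept as strings here; converting them to the types declared in the header is a separate step.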

Also, when multiple variants are called at a single genomic coordinate, they are listed in a single row for that coordinate, comma-delimited in the ALT column. Associated per-variant data in the INFO column, such as allele frequency (AF), are then also comma-delimited. For example, the following row from Kaviar lists three alternate alleles, with three associated values each for the allele frequency (AF) and allele count (AC):

1	10108	.	C	CA,CCT,CT	.	.	AF=0.0000379,0.0018197,0.0003033;AC=1,48,8;AN=26378

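In this case, the values for additional data such as AF and AC are split so that each variant gets its own row with its own values, while fields with a single value, such as AN, are repeated on every row.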
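
The flattening of such a row can be sketched as follows. This is an illustration under assumptions, not the script's exact code: `flatten_row` is a hypothetical helper, the INFO data is assumed to be already parsed into non-empty lists of strings, and a field with fewer values than alleles reuses its last value.

```python
def flatten_row(chrom, pos, ref, alts, info):
    """Yield one flat dict per ALT allele.

    `info` maps each INFO key to a non-empty list of string values.
    """
    for i, alt in enumerate(alts.split(",")):
        row = {"CHROM": chrom, "POS": pos, "REF": ref, "ALT": alt}
        for key, values in info.items():
            # Per-allele fields (AF, AC) have one value per ALT; shorter
            # lists (AN) reuse their last value on every row.
            row[key] = values[min(i, len(values) - 1)]
        yield row


rows = list(flatten_row(
    "1", "10108", "C", "CA,CCT,CT",
    {
        "AF": ["0.0000379", "0.0018197", "0.0003033"],
        "AC": ["1", "48", "8"],
        "AN": ["26378"],
    },
))
for row in rows:
    print(row)
```

The single Kaviar row above becomes three rows, one per alternate allele, each carrying its own AF and AC and the shared AN.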

VCF Header

The VCF header lines specify the schema for the data contained in the INFO column.

Full Kaviar header:

##fileformat=VCFv4.1
##fileDate=20160209
##source=bin/makeVCF.pl
##reference=file:///proj/famgen/resources/Kaviar-160204-Public/bin/../tabixedRef/hg19.gz
##version=Kaviar-160204 (hg19)
##kaviar_url=http://db.systemsbiology.org/kaviar
##publication=Glusman G, Caballero J, Mauldin DE, Hood L and Roach J (2011) KAVIAR: an accessible system for testing SNV novelty. Bioinformatics, doi: 10.1093/bioinformatics/btr540
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele Count">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in data sources">
##INFO=<ID=END,Number=.,Type=Integer,Description="End position">
##INFO=<ID=DS,Number=A,Type=String,Description="Data Sources containing allele">

Samples from ClinVar header:

##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant">

Generate Full VCF Schema from Header

While there are many standard or customary INFO fields, such as those in the documentation, custom ones are fine, as in the ClinVar example. In order to generate a full schema specification we need to parse the header rows. We combine this parsed schema with the schema for the fixed fields (constructed by hand), which is shown below.
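
A minimal sketch of parsing one `##INFO` header line into a schema entry is shown below. The helper name `info_header_to_field` is hypothetical, and the sketch assumes every line carries `ID`, `Type` and `Description` attributes. Every field is marked NULLABLE, as in the fixed-field schema.

```python
import re

# Split on commas that are outside double quotes, so that commas inside
# a quoted Description do not break the attribute list apart.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def info_header_to_field(line):
    """Convert one '##INFO=<...>' header line into a schema dict."""
    match = re.match(r"##INFO=<(.*)>$", line.strip())
    if match is None:
        raise ValueError("not an ##INFO header line: %r" % line)
    # Split each attribute on the first "=" only; a Description may contain "=".
    attrs = dict(
        item.split("=", 1) for item in COMMA_OUTSIDE_QUOTES.split(match.group(1))
    )
    return {
        "name": attrs["ID"],
        "type": attrs["Type"],
        "mode": "NULLABLE",
        "description": attrs["Description"].strip('"'),
    }


field = info_header_to_field(
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">'
)
print(field)
```

The type is passed through exactly as written in the header (for example `Float`); mapping it to the type names a particular database expects is left out of this sketch.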

Usage

usage: expand_and_flatten_vcf.py [-h] --input_vcf INPUT_VCF [--output_vcf OUTPUT_VCF] [--info_column_index INFO_COLUMN_INDEX]
                                 [--info_delimiter INFO_DELIMITER] [--fixed_schema FIXED_SCHEMA]
                                 [{vcf,schema}]

Expand INFO column in VCF Files and output or write.

VCF Files have a column called INFO with 'key=value' 
pairs separated by ';'. 

For example:

<example of INFO column>

Also, when multiple variants are called for a single 
genomic position, these alternates are comma-separated
in the VCF file. In the expanded output, the genomic position 
is repeated with the alternate variants in successive rows. 
For example:

<example of multiple variants and expanded version> 

optional arguments:
  -h, --help            show this help message and exit
  --input_vcf INPUT_VCF, -i INPUT_VCF
                        Input VCF file with INFO column as string with key-value pairs.
  --output_vcf OUTPUT_VCF, -o OUTPUT_VCF
                        Expanded VCF file
  --info_column_index INFO_COLUMN_INDEX, -x INFO_COLUMN_INDEX
                         0-indexed index of the INFO column. Default value, 
                         according to spec, is 7.
                         
  --info_delimiter INFO_DELIMITER, -d INFO_DELIMITER
                         Custom separator for INFO key-value pairs in case of some 
                         weird file. Default value, according to standard, is ";"
                         
  --fixed_schema FIXED_SCHEMA, -b FIXED_SCHEMA
                        The standard VCF format has 7 columns of data and the INFO column. 
                        The schema for these first 7 "base" columns are not in the header. 
                        This should be a JSON string containing the base schema if different 
                        than the default ones in this package.

Schema for Fixed Fields

[
  {
    "description": "Chromosome",
    "mode": "NULLABLE",
    "name": "CHROM",
    "type": "STRING"
  },
  {
    "description": "Start position (1-based). Corresponds to the first base of the string of reference bases.",
    "mode": "NULLABLE",
    "name": "POS",
    "type": "INTEGER"
  },
  {
    "description": "",
    "mode": "NULLABLE",
    "name": "ID",
    "type": "STRING"
  },
  {
    "description": "Reference bases.",
    "mode": "NULLABLE",
    "name": "REF",
    "type": "STRING"
  },
  {
    "description": "Alternate bases.",
    "mode": "NULLABLE",
    "name": "ALT",
    "type": "STRING"
  },
  {
    "description": "Phred-scaled quality score (-10log10 prob(call is wrong)). Higher values imply better quality.",
    "mode": "NULLABLE",
    "name": "QUAL",
    "type": "FLOAT"
  },
  {
    "description": "List of failed filters (if any) or \"PASS\" indicating the variant has passed all filters.",
    "mode": "NULLABLE",
    "name": "FILTER",
    "type": "STRING"
  }
]
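
The full schema is the fixed-field schema followed by the schema parsed from the header, and the output column order follows from it. The sketch below illustrates this with abbreviated stand-in schemas; `build_full_schema` is a hypothetical name, and only two fixed fields and two INFO fields are shown.

```python
def build_full_schema(fixed_schema, info_schema):
    """Concatenate the two schemas and return (full_schema, column_names)."""
    full_schema = list(fixed_schema) + list(info_schema)
    column_names = [field["name"] for field in full_schema]
    return full_schema, column_names


# Abbreviated stand-ins for illustration only.
fixed = [
    {"name": "CHROM", "type": "STRING", "mode": "NULLABLE", "description": "Chromosome"},
    {"name": "POS", "type": "INTEGER", "mode": "NULLABLE", "description": "Start position"},
]
info = [
    {"name": "AF", "type": "Float", "mode": "NULLABLE", "description": "Allele Frequency"},
    {"name": "AN", "type": "Integer", "mode": "NULLABLE", "description": "Total number of alleles in data sources"},
]

full_schema, columns = build_full_schema(fixed, info)
print("\t".join(columns))  # header line of the flat table
```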
#!/usr/bin/env python
import os
import re
import sys
import json
import textwrap
fixed_schema = [
{
"description": "Chromosome",
"mode": "NULLABLE",
"name": "CHROM",
"type": "STRING"
},
{
"description": "Start position (1-based). Corresponds to the first base of the string of reference bases.",
"mode": "NULLABLE",
"name": "POS",
"type": "INTEGER"
},
{
"description": "dbSNP ID (rs###)",
"mode": "NULLABLE",
"name": "ID",
"type": "STRING"
},
{
"description": "Reference bases.",
"mode": "NULLABLE",
"name": "REF",
"type": "STRING"
},
{
"description": "Alternate bases.",
"mode": "NULLABLE",
"name": "ALT",
"type": "STRING"
},
{
"description": "Phred-scaled quality score (-10log10 prob(call is wrong)). Higher values imply better quality.",
"mode": "NULLABLE",
"name": "QUAL",
"type": "STRING"
},
{
"description": "List of failed filters (if any) or \"PASS\" indicating the variant has passed all filters.",
"mode": "NULLABLE",
"name": "FILTER",
"type": "STRING"
}
]
class VCF_INFO_EXPANDER():

    ### Default schema for the fixed fields. This is the same list as the
    ### module-level `fixed_schema` defined above.
    fixed_schema = fixed_schema

    def __init__(
        self,
        input_vcf_file="",
        output_vcf_file=None,
        info_column_index=7,
        info_delimiter=";",
        fixed_schema=fixed_schema
    ):
        self.input_vcf_file = input_vcf_file
        self.output_vcf_file = output_vcf_file
        self.info_column_index = info_column_index
        ### Accept the fixed schema either as a JSON string (as passed from
        ### the command line) or as an already-parsed list (the default)
        if isinstance(fixed_schema, str):
            fixed_schema = json.loads(fixed_schema)
        self.fixed_schema = fixed_schema
        self.info_delimiter = info_delimiter
        self.info_schema = self.parseVCF_Schema()
        self.full_schema = self.fixed_schema + self.info_schema
        self.base_fields = [var["name"] for var in self.fixed_schema]
        self.info_fields = [var["name"] for var in self.info_schema]
        self.all_fields = self.base_fields + self.info_fields

    def parseVCF_Schema(self):
        vcf_file = self.input_vcf_file
        info_schema = []
        ### regex to split by commas, but only outside of quotes
        regex = r",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"
        with open(vcf_file) as vcf:
            for line in vcf:
                ### Capture lines that have field info
                ### They look like "##INFO=<k=v,k=v, ...>"
                if line.startswith("##INFO"):
                    info_data = re.sub("^##INFO=<(.*)>", r"\1", line).strip()
                    kv_pairs = re.split(regex, info_data)
                    ### Split on the first "=" only, since a Description
                    ### may itself contain "="
                    info_dict = dict(item.split("=", 1) for item in kv_pairs)
                    ### Rename necessary fields to match
                    ### BigQuery schema fields
                    ###   Description --> description (surrounding quotes removed)
                    ###   Type --> type
                    ###   ID --> name
                    ###   Number --> dropped; mode is always NULLABLE
                    info_dict['description'] = info_dict.pop('Description').strip('"')
                    info_dict['type'] = info_dict.pop('Type')
                    info_dict['name'] = info_dict.pop('ID')
                    ### A row may omit any INFO field, and the output holds a
                    ### single value per field per row, so every field is NULLABLE
                    info_dict['mode'] = 'NULLABLE'
                    del info_dict['Number']
                    ### Add parsed field to the INFO schema
                    info_schema.append(info_dict)
                ### ignore lines that are header lines but not field info
                elif line.startswith("##"):
                    pass
                ### done with header, stop reading. Prevents reading through
                ### entire file.
                else:
                    break
        return info_schema

    def getNumHeaderLines(self):
        filename = self.input_vcf_file
        num_header_lines = 0
        with open(filename) as vcf:
            for line in vcf:
                ### count the "##" meta-information lines
                if line.startswith("##"):
                    num_header_lines += 1
                ### done with header, stop reading
                else:
                    break
        return num_header_lines

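    def expandInfoData(self, info):
        fields = self.info_fields
        ### Split on the configured delimiter, then on the first "=" only.
        ### Flag entries without "=" get an empty value.
        kv_pairs = [pair.partition("=") for pair in info.split(self.info_delimiter) if pair]
        info_dict = {key: value for key, _, value in kv_pairs}
        if not fields:
            fields = info_dict.keys()
        ### Fields absent from this row get the VCF missing-value marker "."
        info_dict = {k: info_dict.get(k, ".").split(",") for k in fields}
        return info_dict
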
    def splitRowDict(
        self,
        row_dict,
        alt_sep=",",
        alt_field="ALT"
    ):
        ### One output row per ALT allele. Fields with fewer values than
        ### alleles (e.g. AN) reuse their last value on every row.
        num_alts = len(row_dict[alt_field])
        row_dicts = [
            {k: v[min(i, len(v) - 1)] for k, v in row_dict.items()}
            for i in range(num_alts)
        ]
        return row_dicts

    def convertRowStringToRowDict(
        self,
        row_string
    ):
        info_column_index = self.info_column_index
        base_fields = self.base_fields
        values = row_string.strip().split("\t")
        ### Remove the INFO column and expand it into separate fields
        info_data = values.pop(info_column_index)
        row_dict = self.expandInfoData(info_data)
        ### Every value is stored as a list so rows can be split per ALT
        for i in range(len(base_fields)):
            row_dict[base_fields[i]] = values[i].split(",")
        return row_dict

    def writeRowDict(self, row_dict):
        ### Write "." (the VCF missing-value marker) as an empty cell.
        ### Joining with tabs avoids a trailing tab, so each row has the
        ### same number of columns as the header line.
        values = [
            "" if row_dict[field] == "." else row_dict[field]
            for field in self.all_fields
        ]
        self.outfile.write("\t".join(values) + "\n")

    def expandAndFlatten(self):
        infilename = self.input_vcf_file
        outfilename = self.output_vcf_file
        all_fields = self.all_fields
        if outfilename is None:
            self.outfile = sys.stdout
        else:
            self.outfile = open(outfilename, "w")
        ### Write header
        self.outfile.write("\t".join(all_fields) + "\n")
        num_header_lines = self.getNumHeaderLines()
        with open(infilename) as file:
            ### Skip the "##" header lines plus the "#CHROM ..." column line
            for _ in range(num_header_lines + 1):
                next(file)
            ### Start reading
            for line in file:
                ### Skip blank lines, e.g. at the end of the file
                if not line.strip():
                    continue
                row_dict = self.convertRowStringToRowDict(row_string=line)
                for flat_row in self.splitRowDict(row_dict):
                    self.writeRowDict(flat_row)
        ### Close the output file, but leave sys.stdout open
        if self.outfile is not sys.stdout:
            self.outfile.close()

    def getSchema(self):
        if self.output_vcf_file is None:
            self.outfile = sys.stdout
        else:
            self.outfile = open(self.output_vcf_file, "w")
        self.outfile.write(json.dumps(self.full_schema, indent=2) + "\n")
        ### Close the output file, but leave sys.stdout open
        if self.outfile is not sys.stdout:
            self.outfile.close()


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(
        description=textwrap.dedent("""\
            Expand INFO column in VCF Files and output or write.
            VCF Files have a column called INFO with 'key=value'
            pairs separated by ';'.
            For example:
            <example of INFO column>
            Also, when multiple variants are called for a single
            genomic position, these alternates are comma-separated
            in the VCF file. In the expanded output, the genomic position
            is repeated with the alternate variants in successive rows.
            For example:
            <example of multiple variants and expanded version>
            """),
        formatter_class=argparse.RawTextHelpFormatter
    )
    ### nargs="?" makes the positional optional so the default applies
    parser.add_argument('operation',
        type=str,
        nargs="?",
        help=textwrap.dedent("""\
            Which operation to perform. To export an expanded and flattened
            vcf file, use "vcf". To export the full schema use "schema". The
            default value is "vcf" if not specified.
            """),
        choices=["vcf", "schema"],
        default="vcf"
    )
    parser.add_argument('--input_vcf', '-i',
        required=True,
        type=str,
        help="Input VCF file with INFO column as string with key-value pairs.",
        default=None
    )
    parser.add_argument('--output_vcf', '-o',
        required=False,
        type=str,
        help="Expanded VCF file",
        default=None
    )
    parser.add_argument('--info_column_index', '-x',
        required=False,
        type=int,
        help=textwrap.dedent("""\
            0-indexed index of the INFO column. Default value,
            according to spec, is 7.
            """),
        default=7
    )
    parser.add_argument('--info_delimiter', '-d',
        required=False,
        type=str,
        help=textwrap.dedent("""\
            Custom separator for INFO key-value pairs in case of some
            weird file. Default value, according to standard, is ";"
            """),
        default=";"
    )
    parser.add_argument('--fixed_schema', '-b',
        required=False,
        type=str,
        help=textwrap.dedent("""\
            The standard VCF format has 7 columns of data and the INFO column.
            The schema for these first 7 "base" columns are not in the header.
            This should be a JSON string containing the base schema if different
            than the default ones in this package.
            Check `VCF_INFO_EXPANDER.fixed_schema`
            """),
        default=json.dumps(fixed_schema)
    )
    args = parser.parse_args()
    operation = args.operation
    ### Report on stderr so that data written to stdout stays clean
    print("operation", operation, file=sys.stderr)
    expander = VCF_INFO_EXPANDER(
        input_vcf_file=args.input_vcf,
        output_vcf_file=args.output_vcf,
        info_column_index=args.info_column_index,
        info_delimiter=args.info_delimiter,
        fixed_schema=args.fixed_schema
    )
    if operation == "vcf":
        expander.expandAndFlatten()
    else:
        expander.getSchema()
##fileformat=VCFv4.1
##fileDate=20160209
##source=bin/makeVCF.pl
##reference=file:///proj/famgen/resources/Kaviar-160204-Public/bin/../tabixedRef/hg19.gz
##version=Kaviar-160204 (hg19)
##kaviar_url=http://db.systemsbiology.org/kaviar
##publication=Glusman G, Caballero J, Mauldin DE, Hood L and Roach J (2011) KAVIAR: an accessible system for testing SNV novelty. Bioinformatics, doi: 10.1093/bioinformatics/btr540
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele Count">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in data sources">
##INFO=<ID=END,Number=.,Type=Integer,Description="End position">
##INFO=<ID=DS,Number=A,Type=String,Description="Data Sources containing allele">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10001 . T C . . AF=0.0000379;AC=1;AN=26378
1 10002 . A C,T . . AF=0.0001137,0.0000379;AC=3,1;AN=26378
1 10002 . A AT . . AF=0.0000379;AC=1;AN=26378
1 10003 . A C,T . . AF=0.0000379,0.0000758;AC=1,2;AN=26378
1 10004 . C A . . AF=0.0000379;AC=1;AN=26378
1 10018 . C T . . AF=0.0000379;AC=1;AN=26378
1 10019 rs775809821 TA T . . AF=0.0000379;AC=1;AN=26378;END=10020
1 10055 rs768019142 T TA . . AF=0.0000379;AC=1;AN=26378
1 10108 rs62651026 C T . . AF=0.0000758;AC=2;AN=26378
1 10108 . C CA,CCT,CT . . AF=0.0000379,0.0018197,0.0003033;AC=1,48,8;AN=26378
1 10109 rs376007522 A T . . AF=0.0006445;AC=17;AN=26378
1 10114 . T C . . AF=0.0000379;AC=1;AN=26378
1 10114 . T TA . . AF=0.0007203;AC=19;AN=26378
1 10122 . A C . . AF=0.0000379;AC=1;AN=26378
1 10128 rs796688738 A AC . . AF=0.0000379;AC=1;AN=26378
1 10139 rs368469931 A T . . AF=0.0000379;AC=1;AN=26378
1 10140 . A AC . . AF=0.0003412;AC=9;AN=26378
1 10144 rs144773400 TA T,TT . . AF=0.0000379,0.0000379;AC=1,1;AN=26378;END=10145
1 10146 rs779258992 AC AA,A . . AF=0.0002654,0.0020851;AC=7,55;AN=26378;END=10147
1 10150 rs371194064 C T . . AF=0.0003033;AC=8;AN=26378
1 10153 . A AC . . AF=0.0004928;AC=13;AN=26378
1 10165 rs796884232 A AC . . AF=0.0000379;AC=1;AN=26378
1 10168 . C T . . AF=0.0000379;AC=1;AN=26378
1 10174 . C T . . AF=0.0000758;AC=2;AN=26378
1 10175 . T A . . AF=0.0000758;AC=2;AN=26378
1 10175 . T TTA . . AF=0.0001137;AC=3;AN=26378
1 10177 rs201752861 A C . . AF=0.0010236;AC=27;AN=26378
1 10177 rs367896724 A AC,AT . . AF=0.0835545,0.0000379;AC=2204,1;AN=26378
1 10179 . C CCT . . AF=0.0001516;AC=4;AN=26378
1 10180 rs201694901 T C . . AF=0.0009098;AC=24;AN=26378
1 10200 . A AC . . AF=0.0001516;AC=4;AN=26378
1 10201 . CCCT C . . AF=0.0001137;AC=3;AN=26378;END=10204
1 10204 . TA T . . AF=0.0001516;AC=4;AN=26378;END=10205
1 10228 rs143255646 TA T . . AF=0.0000379;AC=1;AN=26378;END=10229
1 10228 rs200462216 TAACCCCTAACCCTAACCCTAAACCCTA T . . AF=0.0000379;AC=1;AN=26378;END=10255
1 10230 rs200279319 AC AA,A . . AF=0.0002654,0.0048525;AC=7,128;AN=26378;END=10231
1 10234 rs145599635 C T . . AF=0.0009098;AC=24;AN=26378
1 10235 . T A . . AF=0.0006445;AC=17;AN=26378
1 10235 rs540431307 T TA . . AF=0.0002275;AC=6;AN=26378
1 10237 . A C . . AF=0.0000379;AC=1;AN=26378
1 10240 . C CT . . AF=0.0000758;AC=2;AN=26378
1 10241 . T TA . . AF=0.0007203;AC=19;AN=26378
1 10243 . A AC . . AF=0.0000379;AC=1;AN=26378
1 10247 rs796996180 T C . . AF=0.0001137;AC=3;AN=26378
1 10247 rs148908337 TA T,TT . . AF=0.0001516,0.0007961;AC=4,21;AN=26378;END=10248
1 10249 rs774211241 AAC A . . AF=0.0015164;AC=40;AN=26378;END=10251
1 10250 rs199706086 A C . . AF=0.0007582;AC=20;AN=26378
1 10254 . T C . . AF=0.0000758;AC=2;AN=26378
1 10254 rs140194106 TA T,TT . . AF=0.0001137,0.0006066;AC=3,16;AN=26378;END=10255
1 10257 rs111200574 A C . . AF=0.0008719;AC=23;AN=26378
1 10259 rs200940095 C A . . AF=0.0000379;AC=1;AN=26378
1 10261 . TA T . . AF=0.0001137;AC=3;AN=26378;END=10262
1 10268 . A C . . AF=0.0000379;AC=1;AN=26378
1 10274 . A C . . AF=0.0000379;AC=1;AN=26378
1 10279 . T C . . AF=0.0000379;AC=1;AN=26378
1 10280 . A C . . AF=0.0000379;AC=1;AN=26378
1 10285 . T C . . AF=0.0003791;AC=10;AN=26378
1 10286 . A C . . AF=0.0000379;AC=1;AN=26378
1 10291 rs145427775 C T . . AF=0.0008719;AC=23;AN=26378
1 10297 . C T . . AF=0.0003791;AC=10;AN=26378
1 10298 . A T . . AF=0.0000379;AC=1;AN=26378
1 10309 . C G,T . . AF=0.0000379,0.0000379;AC=1,1;AN=26378
1 10315 . C T . . AF=0.0001896;AC=5;AN=26378
1 10321 . C T . . AF=0.0004549;AC=12;AN=26378
1 10327 rs112750067 T C . . AF=0.0005307;AC=14;AN=26378
1 10327 . TA T,TT . . AF=0.0000379,0.0000379;AC=1,1;AN=26378;END=10328
1 10328 rs201106462 AACCCCTAACCCTAACCCTAACCCT A . . AF=0.0000379;AC=1;AN=26378;END=10352
1 10329 rs150969722 AC AA,A . . AF=0.0002654,0.0007582;AC=7,20;AN=26378;END=10330
1 10333 . C T . . AF=0.0000379;AC=1;AN=26378
1 10333 . CT C . . AF=0.0000758;AC=2;AN=26378;END=10334
1 10348 . A C . . AF=0.0000379;AC=1;AN=26378
1 10351 . C T . . AF=0.0000758;AC=2;AN=26378
1 10352 . T A . . AF=0.0009098;AC=24;AN=26378
1 10352 rs145072688 T TA . . AF=0.0871181;AC=2298;AN=26378
1 10353 . A AAC . . AF=0.0001137;AC=3;AN=26378
1 10354 . C A . . AF=0.0001137;AC=3;AN=26378
1 10357 . T C . . AF=0.0000379;AC=1;AN=26378
1 10377 . A AC . . AF=0.0000379;AC=1;AN=26378
1 10383 rs147093981 A AC . . AF=0.0002275;AC=6;AN=26378
1 10389 rs766767872 AC AA,A . . AF=0.0001516,0.0043218;AC=4,114;AN=26378;END=10390
1 10393 . C T . . AF=0.0004170;AC=11;AN=26378
1 10394 . TA T . . AF=0.0001516;AC=4;AN=26378;END=10395
1 10396 . AC AA,A . . AF=0.0000758,0.0007582;AC=2,20;AN=26378;END=10397
1 10400 . C T . . AF=0.0001516;AC=4;AN=26378
1 10401 . TA T . . AF=0.0002275;AC=6;AN=26378;END=10402
1 10409 . A C . . AF=0.0000379;AC=1;AN=26378
1 10421 . A AC . . AF=0.0002275;AC=6;AN=26378