Randy H RandyHarr

RandyHarr / fixFTDNAvcf.sh
Last active February 26, 2022 20:43
Shell script around fixFTDNAvcf.py script to fix FTDNA BigY VCF and then annotate with yBrowse SNP names and haplogroups
#!/bin/bash
#
# Fixes an FTDNA VCF file so it can be processed by standard tools that follow the VCF standard.
# Annotates the FTDNA BigY VCF file with the latest yBrowse DB entries for SNP names, yFull and ISOGG HG.
#
# This is all handled behind the scenes (automagically) by WGS Extract (in the next release).
# This is simply a stand-alone script for a simple-scenario installation, for demonstration purposes here.
#
# Relies on htslib bgzip and bcftools, along with wget, python, rm, zip and unzip.
# Relies on access to the yBrowse DB file and the WGS Extract python utility fixFTDNAvcf.py.
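
The annotation step described above (adding SNP names to the VCF) can be pictured with a small Python sketch. This is not the gist's method: the header says the script relies on bcftools and fixFTDNAvcf.py, and neither is reproduced here. The function name `annotate_id`, the lookup table, and the sample names are all hypothetical. The sketch only assumes the standard VCF layout, where column 1 is CHROM, column 2 is POS, column 3 is ID, and a missing ID is written as `.`.

```python
from typing import Dict, Tuple


def annotate_id(vcf_line: str, names: Dict[Tuple[str, int], str]) -> str:
    """Fill an empty ID column from a (chrom, pos) -> name lookup.

    Hypothetical helper for illustration; not part of fixFTDNAvcf.py.
    """
    # Header and meta lines start with '#' and are passed through untouched.
    if vcf_line.startswith("#"):
        return vcf_line
    fields = vcf_line.rstrip("\n").split("\t")
    # Leave anything that does not look like a data record alone.
    if len(fields) < 3:
        return vcf_line
    name = names.get((fields[0], int(fields[1])))
    # Only fill the ID when it is missing ('.'); never overwrite an existing one.
    if name is not None and fields[2] == ".":
        fields[2] = name
    newline = "\n" if vcf_line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

With a lookup such as `{("chrY", 12345): "EXAMPLE1"}` (made-up values), a record at chrY:12345 with ID `.` would come back with `EXAMPLE1` in the ID column, and every other line would be returned unchanged.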
RandyHarr / fixFTDNAbam.sh
Last active February 26, 2022 21:10
For fixing FTDNA version 1 BAM files that incorrectly include a space in the QNAME field
#!/bin/bash
#
# Fixes FTDNA BAM version 1 files so they can be processed by standard bioinformatic tools.
# Applies only to BigY files (not needed for BigY2 or BigY3).
#
# This is handled behind the scenes (automagically) by WGS Extract (in the next release).
# This is simply a stand-alone script for a simple-scenario installation, for demonstration purposes here.
#
# Relies on htslib bgzip and samtools, along with wget, python, rm, zip and unzip.
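
The problem this script addresses is that the SAM specification does not allow a space in QNAME, the first tab-separated field of each alignment record. A minimal Python sketch of the repair on the text (SAM) form of the data follows. It is an illustration only, not the gist's implementation: the function name `fix_qname` and the choice of an underscore as the replacement character are assumptions.

```python
def fix_qname(sam_line: str, replacement: str = "_") -> str:
    """Replace spaces in the QNAME field of one SAM text line.

    Hypothetical helper; the replacement character is an assumption.
    """
    # Header lines start with '@' and carry no QNAME, so pass them through.
    if sam_line.startswith("@"):
        return sam_line
    # QNAME is everything before the first tab; the rest is left untouched.
    qname, sep, rest = sam_line.partition("\t")
    return qname.replace(" ", replacement) + sep + rest
```

One way to use it would be to apply the function to every line produced by `samtools view -h` and pipe the result back through `samtools view -b` to write a new BAM file.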
RandyHarr / countingNs.py
Last active April 30, 2022 20:59
Python stand-alone program to analyze a FASTA Human Reference Model for runs of N (masked out) entries.
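
The core idea, finding runs of N in each sequence of a FASTA file, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the countingNs.py code: the function name `n_runs` is hypothetical, coordinates are reported 0-based, lowercase `n` is counted as N, and the DICT file is not used.

```python
from typing import Iterable, Iterator, Tuple


def n_runs(fasta_lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (sequence_name, start, length) for each run of N.

    Hypothetical helper; 'start' is a 0-based offset within the sequence.
    """
    name = ""
    pos = 0            # offset of the next base within the current sequence
    run_start = None   # start of the currently open run of N, if any
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new sequence begins: close any run still open in the old one.
            if run_start is not None:
                yield name, run_start, pos - run_start
            parts = line[1:].split()
            name = parts[0] if parts else ""
            pos = 0
            run_start = None
            continue
        for base in line.upper():
            if base == "N":
                if run_start is None:
                    run_start = pos
            elif run_start is not None:
                # The run ended at the previous base.
                yield name, run_start, pos - run_start
                run_start = None
            pos += 1
    # Close a run that reaches the end of the last sequence.
    if run_start is not None:
        yield name, run_start, pos - run_start
```

Because the run state is carried from one line to the next, a run that spans several FASTA lines is reported once. For a gzip-compressed reference, the lines could come from `gzip.open(path, "rt")`, where `path` is whatever FASTA file is being analyzed.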
# coding: utf8
# Copyright (C) 2022 Randy Harr
#
# License: GNU General Public License v3 or later
# A copy of GNU GPL v3 should have been included in this software package in LICENSE.txt.
"""
Standalone script to process a reference model FASTA to determine the inclusion of N's in the base pair sequence.
Takes the compressed FASTA file as its parameter and reads it. Also expects and reads the DICT file whose name is
derived from the FASTA file name; a single pass is easier when the sequence lengths are known. Processes all sequences.