Andrew Yates andrewyatz

  • EMBL-EBI
  • Cambridge, UK
@andrewyatz
andrewyatz / gap_finder.py
Last active March 9, 2022 17:02
Very basic code to find gaps in a single FASTA record. The minimum gap size is 10 bp.
#!/usr/bin/python3
import gzip
import sys
filename = sys.argv[1]
file = gzip.open(filename, "rt")
position = 0
contigous_n = 0
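
The preview above stops once the counters are initialised. As a rough sketch of how such a scan could continue, the following reports runs of `N` that are at least 10 bases long. The names `find_gaps`, `find_gaps_in_file` and `MIN_GAP` are hypothetical and do not come from the gist, and the 1-based inclusive coordinates are an assumption about the desired output.

```python
import gzip

MIN_GAP = 10  # minimum run of N to report, taken from the gist description


def find_gaps(lines, min_gap=MIN_GAP):
    """Yield (start, end) pairs, 1-based and inclusive, for each run of N
    that is at least min_gap bases long. A single FASTA record is assumed."""
    position = 0    # number of bases consumed so far
    run_length = 0  # length of the current run of N
    for line in lines:
        if line.startswith(">"):
            continue  # skip the header line
        for base in line.strip():
            position += 1
            if base in "Nn":
                run_length += 1
            else:
                if run_length >= min_gap:
                    # the run ended on the previous base
                    yield (position - run_length, position - 1)
                run_length = 0
    if run_length >= min_gap:
        # the sequence ended while still inside a gap
        yield (position - run_length + 1, position)


def find_gaps_in_file(path, min_gap=MIN_GAP):
    """Convenience wrapper for a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return list(find_gaps(handle, min_gap))
```

Because the run length is carried across line boundaries, a gap that is wrapped over several FASTA lines is still reported as a single interval.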

Genome provider analysis

This is a selection of data and schemas which can be used to investigate similarities and differences between the genome files provided for GRCh38/hg38.

andrewyatz / countries_to_rough_map.csv
Last active November 3, 2020 09:34
An attempt to map countries to some kind of designation. For use in Google Analytics
country,code
Afghanistan,Asia
Albania,E
Algeria,Africa
Andorra,E
Angola,Africa
Antigua & Barbuda,CA
Argentina,SA
Armenia,Asia
Aruba,EU
andrewyatz / refget-big.png
Last active January 22, 2020 16:32
A possible cave entrance for refget. Want to move into a more central place
refget-big.png
andrewyatz / genome_search_def.yaml
Created July 30, 2019 12:45
An example OpenAPI schema for an API
openapi: 3.0.0
info:
  title: Species Search API
  description: An API for performing species searches to find species of question
  version: 1.0.0
tags:
  - name: search
    description: Search for genomes/species
paths:
  "/api/genome_search":
andrewyatz / refget-metadata.schema.1.0.0.json
Created May 28, 2019 15:13
JSON schema attempt to describe the refget metadata payload
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "http://github.com/samtools/hts-specs/pub/refget-metadata.schema.1.0.0.json",
  "title": "Refget Metadata",
  "description": "Holds metadata pertaining to a record within the refget protocol",
  "type" : "object",
  "properties": {
    "id" : {
      "description": "The checksum identifier for a given record (the default)",
      "type" : "string"
andrewyatz / RungabGPX.py
Created May 16, 2019 12:59 — forked from hutattedonmyarm/RungapGPX.py
Rungap for iOS exports GPS data as a JSON file in the free version. This script converts it to GPX. Tested with Python 3.6. You might need to install the required modules. Simply place either the metadata and data JSON files, or the complete zip file, in the same directory as the script and run it. Warning: performs barely any error checking.
import xml.etree.ElementTree as ElementTree  # cElementTree was removed in Python 3.9
import json, pytz, zipfile, unicodedata, re
from datetime import datetime
from os import listdir
from os.path import isfile, join
import glob
def slugify(value):
    """
    Normalizes string, converts to lowercase, removes non-alpha characters.
andrewyatz / README.md
Last active October 24, 2018 11:11
Creates a file of TRUNC512, MD5, ID and sequence from a Gzip compressed FASTA where line 1 is the ID line and line 2 is the sequence

Files

Files are downloaded from the MGnify resource and then processed using the calc.pl and checksum_checker.pl scripts. We also use the Unix commands awk and sort to format the input into a sanitised format.

Algorithm

  • Process FASTA files
    • Read two lines at a time
    • Extract ID
    • Extract seq
  • Calculate MD5 and TRUNC512
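
The steps listed so far can be sketched in Python as follows. This is not the gist's calc.pl, only an illustration under stated assumptions: TRUNC512 is taken to be the refget v1 definition (the first 24 bytes of the SHA-512 digest, hex encoded), the sequence is hashed exactly as read with no case normalisation, and the function names `trunc512`, `md5_hex` and `checksum_records` are hypothetical.

```python
import hashlib


def trunc512(sequence):
    """Hex of the first 24 bytes of the SHA-512 digest (assumed TRUNC512)."""
    digest = hashlib.sha512(sequence.encode("ascii")).digest()
    return digest[:24].hex()


def md5_hex(sequence):
    """Hex-encoded MD5 digest of the sequence."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def checksum_records(lines):
    """Read two lines at a time (ID line, then sequence line) and yield
    (TRUNC512, MD5, ID, sequence) tuples, matching the column order in
    the gist description."""
    iterator = iter(lines)
    for header in iterator:
        if not header.strip():
            continue  # tolerate stray blank lines
        sequence = next(iterator, "").strip()
        fields = header.strip().lstrip(">").split()
        seq_id = fields[0] if fields else ""
        yield trunc512(sequence), md5_hex(sequence), seq_id, sequence
```

For a gzip-compressed input, the lines could be supplied with `gzip.open(path, "rt")`, and each tuple written out with `"\t".join(record)`. The tab separator is an assumption, not something the gist states.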
andrewyatz / generate_gc.pl
Created February 6, 2018 14:18
Create a GC wig file from a FASTA file for a given window size. The algorithm chunks a sequence into non-overlapping windows of the specified size and calculates the GC content of each. Output is expressed as a percentage with two decimal places.
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;
my ($window_size, $fasta, $out) = @ARGV;
die "No window size given" if ! $window_size;
die "No fasta input" if ! $fasta;
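
The Perl preview ends before the windowing logic. As a hedged illustration of the approach in the description, here is a small Python sketch. It assumes the percentage is computed over the actual length of each window (so a short final window is not diluted) and that the output uses a wig `fixedStep` declaration; both are assumptions, and `gc_windows` and `to_wig` are hypothetical names.

```python
def gc_windows(sequence, window_size):
    """Yield (start, gc_percent) for non-overlapping windows.
    start is 1-based; gc_percent is rounded to two decimal places."""
    for offset in range(0, len(sequence), window_size):
        window = sequence[offset:offset + window_size].upper()
        gc_count = window.count("G") + window.count("C")
        # divide by the real window length; the last window may be shorter
        yield offset + 1, round(100.0 * gc_count / len(window), 2)


def to_wig(name, sequence, window_size):
    """Yield wig lines: one fixedStep declaration, then one value per window."""
    yield "fixedStep chrom={0} start=1 step={1} span={1}".format(name, window_size)
    for _start, percent in gc_windows(sequence, window_size):
        yield "{0:.2f}".format(percent)
```

Calling `"\n".join(to_wig("chr1", sequence, 1000))` would give the text for one record. Reading the FASTA file itself is left out here, since the gist relies on Bio::SeqIO for that step.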
andrewyatz / TabixHeaders.xs
Created August 20, 2015 15:00
Small bit of XS to retrieve the headers from a tabix indexed file such as a VCF file. Method returns an array reference.
SV*
tabix_headers(t)
tabix_t *t
PREINIT:
ti_iter_t iter;
const char *s;
int len;
CODE:
if (ti_lazy_index_load(t) < 0) {
fprintf(stderr,"[tabix] failed to load the index file.\n");