Charles Tapley Hoyt cthoyt

@cthoyt
cthoyt / bioregistry_formatter_trailing_slashes.py
Created September 24, 2021 11:36
Find URL format strings in the bioregistry with problematic trailing slashes
import tabulate
import bioregistry


def main():
    """Generate a curation sheet for URL format strings with problematic trailing slashes."""
    rows = []
    for prefix, resource in bioregistry.read_registry().items():
        format_url = resource.get_format()
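A minimal sketch of the kind of check the description suggests, assuming that "problematic" means the format string still ends with a slash after the `$1` placeholder. The `find_trailing_slashes` helper and the sample prefixes are hypothetical, and the sketch works on a plain dictionary rather than the Bioregistry API.

```python
from typing import Dict, List, Optional, Tuple


def find_trailing_slashes(formats: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return (prefix, format URL) pairs whose format string ends with a slash."""
    rows = []
    for prefix, format_url in sorted(formats.items()):
        # Skip resources that have no format string at all.
        if format_url is None:
            continue
        # Assumption: a slash at the very end of the string is the problem case.
        if format_url.rstrip().endswith("/"):
            rows.append((prefix, format_url))
    return rows


# Hypothetical prefixes and URLs, for illustration only.
example = {
    "aaa": "https://example.org/aaa/$1",
    "bbb": "https://example.org/bbb/$1/",
    "ccc": None,
}
print(find_trailing_slashes(example))
```

The resulting rows could then be passed to a table formatter to build the curation sheet.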
@cthoyt
cthoyt / pykeen_model_tweeterator.py
Created July 19, 2021 20:09
PyKEEN tweet generator
"""Generate PyKEEN model tweets."""
from textwrap import dedent
import click
from docdata import get_docdata
from pykeen.models import model_resolver
@cthoyt
cthoyt / github_has_topic.py
Created July 13, 2021 19:04
Check if the given GitHub repository has the given topic.
from typing import Optional

import requests


def has_topic(owner: str, repo: str, topic: str, token: Optional[str] = None) -> bool:
    """Check if the given GitHub repository has the given topic.

    :param owner: The name of the owner/organization for the repository.
    :param repo: The name of the repository.
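Since the preview above is cut off, here is a hedged sketch of how such a check could work. It assumes GitHub's REST endpoint `GET /repos/{owner}/{repo}/topics`, which returns a JSON object with a `names` list. The `fetch_topics` helper, the `fetch` parameter, and the name `has_topic_sketch` are hypothetical and are not taken from the gist; the sketch also uses `urllib` instead of `requests` to stay within the standard library.

```python
import json
import urllib.request
from typing import Callable, List, Optional


def fetch_topics(owner: str, repo: str, token: Optional[str] = None) -> List[str]:
    """Fetch the topic names of a repository from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/topics"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # A token is optional but raises the rate limit.
        headers["Authorization"] = f"token {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["names"]


def has_topic_sketch(
    owner: str,
    repo: str,
    topic: str,
    token: Optional[str] = None,
    fetch: Callable[[str, str, Optional[str]], List[str]] = fetch_topics,
) -> bool:
    """Check whether the topic appears in the repository's topic list."""
    return topic in fetch(owner, repo, token)


# Offline usage with a stand-in fetcher, so no network call is made here.
fake_fetch = lambda owner, repo, token: ["obofoundry", "ontology"]
print(has_topic_sketch("some-owner", "some-repo", "obofoundry", fetch=fake_fetch))
```

Passing the fetcher as a parameter keeps the membership test separate from the network call, which makes the logic easy to exercise without hitting the API.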
@cthoyt
cthoyt / create_obofoundry_issues.py
Last active June 8, 2021 13:05
Add issues to OBO Foundry repositories to add `obofoundry` topic
"""
This script blasts all of the OBO Foundry Repositories with a given issue.
Use sparingly.

 ______________________________
/ with great power comes great \
\ responsibility               /
 ------------------------------
        \   ^__^
@cthoyt
cthoyt / make_mapping.py
Created June 1, 2021 17:25
Map from OBO Foundry prefixes to Bioregistry prefixes
import json
import sys
import click
import bioregistry
@click.command()
@click.option('--output', default=sys.stdout)
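The preview stops at the command-line options, so the following is only a sketch of how such a mapping could be built. It assumes each registry entry is a dictionary that may carry an `obofoundry` block with a `prefix` key; that layout, the `make_mapping` and `write_mapping` helpers, and the sample entries are assumptions for illustration, not the gist's actual code.

```python
import json
import sys
from typing import Any, Dict, Mapping, TextIO


def make_mapping(registry: Mapping[str, Mapping[str, Any]]) -> Dict[str, str]:
    """Map OBO Foundry prefixes to the registry prefixes that reference them."""
    mapping = {}
    for prefix, entry in registry.items():
        # Assumption: entries without an "obofoundry" block are simply skipped.
        obo_prefix = entry.get("obofoundry", {}).get("prefix")
        if obo_prefix is not None:
            mapping[obo_prefix] = prefix
    return mapping


def write_mapping(mapping: Mapping[str, str], output: TextIO) -> None:
    """Write the mapping as sorted, indented JSON to the given stream."""
    json.dump(mapping, output, indent=2, sort_keys=True)
    output.write("\n")


# Hypothetical registry entries, for illustration only.
example_registry = {
    "aaa": {"obofoundry": {"prefix": "AAA"}},
    "bbb": {},
}
write_mapping(make_mapping(example_registry), sys.stdout)
```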
@cthoyt
cthoyt / npa_xeno_example.csv
Created January 30, 2021 16:44
Example dataset for PyNPA
Experiment1.m5_node_label,Experiment1.m5_fold_change,Experiment1.m5_tstats
COX1,0.0442820351919402,1.27430203477638
ATP6,0.0829436812093022,2.86171297191134
SCGB1A1,-0.0829577087977886,-2.31945514137341
BPIFB1,0.303399473563391,7.82213833931689
RPL41,-0.0316199085257711,-1.3019865836353
ND2,0.0609064988662182,1.5913565091562
HBB,-0.60815873574393,-2.82215934759009
SCGB3A1,-0.114966188117398,-1.12153734714022
EEF1A1,-0.0856593570605189,-2.61018187225134
@cthoyt
cthoyt / generate_literals.py
Last active December 7, 2020 16:28
Generate random literal datasets in PyKEEN
"""
Author: Charles Tapley Hoyt (@cthoyt)
License: MIT
See related blog post at https://cthoyt.com/2020/12/07/generating-literal-datasets.html
"""
from typing import Any, List, TextIO, Tuple, Type, Union
import click
import torch
@cthoyt
cthoyt / constrained-evaluation-on-hetionet.ipynb
Created August 25, 2020 02:38
Constrained Evaluation on Hetionet.ipynb
@cthoyt
cthoyt / README.md
Last active April 17, 2020 13:23
Remapping organism names in BioGRID identifiers dump

BioGRID Identifiers Problem, Solved

The file I included here has a set of organisms in the BioGRID identifiers download (version 3.5.184, the latest as of the time of writing on 2020-04-17) whose ORGANISM_OFFICIAL_NAME is not correct. I mapped all of them using a mixture of synonym search on NCBITaxon and manual intervention. Each has the taxonomy identifier, so it can be used to get the most up-to-date information.

I would highly suggest including a taxonomy ID in this dump as well as the name, so it can be programmatically mapped for anyone trying to integrate this
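To illustrate why a taxonomy ID makes the mapping programmatic, here is a minimal sketch that overwrites organism names by looking up a taxonomy identifier. Only the `ORGANISM_OFFICIAL_NAME` column name comes from the text; the `TAXONOMY_ID` column, the `remap_organism_names` helper, and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping


def remap_organism_names(
    rows: Iterable[Mapping[str, str]],
    names_by_taxonomy_id: Mapping[str, str],
) -> List[Dict[str, str]]:
    """Replace organism names with corrected ones, keyed by taxonomy ID."""
    remapped = []
    for row in rows:
        fixed = dict(row)  # copy so the input rows are left untouched
        corrected = names_by_taxonomy_id.get(fixed["TAXONOMY_ID"])
        if corrected is not None:
            fixed["ORGANISM_OFFICIAL_NAME"] = corrected
        remapped.append(fixed)
    return remapped


# Hypothetical rows; NCBI taxonomy ID 9606 is Homo sapiens.
rows = [
    {"TAXONOMY_ID": "9606", "ORGANISM_OFFICIAL_NAME": "human"},
    {"TAXONOMY_ID": "0", "ORGANISM_OFFICIAL_NAME": "unmapped example"},
]
corrections = {"9606": "Homo sapiens"}
for row in remap_organism_names(rows, corrections):
    print(row["TAXONOMY_ID"], row["ORGANISM_OFFICIAL_NAME"])
```

Rows whose identifier has no correction are passed through unchanged, so the lookup table only needs to list the problematic organisms.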

@cthoyt
cthoyt / cthoyt-thesis-drinking-game.md
Last active October 6, 2021 14:39
Charlie's PhD Thesis: The Drinking Game

Charlie's PhD, The Drinking Game

How to play:

  1. Read Charlie's PhD thesis... or just skim through it looking for fun
  2. Drink based on the following rules:

1 Sip

  • Charlie uses too many references