Sean Davis seandavi

@seandavi
seandavi / setup_wandb.md
Created March 22, 2024 02:14
Set up Weights & Biases using Docker Compose

Docker Compose YAML file with a MySQL server as a separate container and a Docker volume for storage:

version: '3'
services:
  wandb-local:
    image: wandb/local
    container_name: wandb-local
    environment:
      - HOST=https://YOUR_DNS_NAME
@seandavi
seandavi / prompt_example.txt
Created March 7, 2024 23:20
Example prompt for automating the evaluation of job candidate applications against a set of minimum and preferred qualifications, with results written as YAML
You are an HR specialist and are evaluating the qualifications of job applicants
for a high-performance computing (HPC) specialist position.
You have been given a set of criteria to evaluate each candidate.
The candidate materials are in the attached PDF.
For each job applicant, fill in the following YAML-format criteria document. You
may use the "comment" field to provide additional context or justification for
your evaluation.
---
# candidate name
@seandavi
seandavi / cmgd_se_to_csv.R
Created February 25, 2024 23:31
convert all CMGD SummarizedExperiments to CSV files
# convert all CMGD SummarizedExperiments to CSV files
# Should run more-or-less directly as a script
# Requires more than 128GB RAM to complete
# Generates about 200GB of files
# BiocManager::install('curatedMetagenomicData')
# BiocManager::install(c('arrow','data.table','dplyr', 'readr'))
library(curatedMetagenomicData)
@seandavi
seandavi / gist:dd7052951a199e5ea5ce584b01c5e0f2
Created January 31, 2024 20:57
Common Fund Data Ecosystem funding from NIH RePORTER
#!/bin/bash
# results in json format
# Actual data in "results" array
#
# Opportunity numbers taken from https://commonfund.nih.gov/dataecosystem/FundedResearch
curl \
-X POST \
https://api.reporter.nih.gov/v2/projects/search \
-d '{"criteria":{"opportunity_numbers": ["RFA-RM-23-003", "PA20-185", "OTA-23-004", "RFA-RM-22-007", "OTA-23-005", "RFA-RM-17-026", "RFA-RM-21-007", "RFA-RM-19-012"]}}' \
-H 'Content-Type: application/json'
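The same request can be sketched in Python using only the standard library. This is a minimal illustration, not part of the original gist: the function names `build_request`, `extract_results`, and `fetch_projects` are hypothetical, and the only assumption about the response is the one stated in the script's comments, namely that the data sit in a top-level "results" array.

```python
import json
import urllib.request

API_URL = "https://api.reporter.nih.gov/v2/projects/search"

# Opportunity numbers copied verbatim from the curl command above.
OPPORTUNITY_NUMBERS = [
    "RFA-RM-23-003", "PA20-185", "OTA-23-004", "RFA-RM-22-007",
    "OTA-23-005", "RFA-RM-17-026", "RFA-RM-21-007", "RFA-RM-19-012",
]


def build_request(opportunity_numbers):
    """Build a POST request carrying the same JSON body as the curl call."""
    payload = {"criteria": {"opportunity_numbers": list(opportunity_numbers)}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_results(response_body):
    """Return the list stored under "results" (empty list if the key is absent)."""
    return json.loads(response_body).get("results", [])


def fetch_projects(opportunity_numbers):
    """Send the request and return the parsed "results" array (needs network access)."""
    with urllib.request.urlopen(build_request(opportunity_numbers)) as resp:
        return extract_results(resp.read())


if __name__ == "__main__":
    projects = fetch_projects(OPPORTUNITY_NUMBERS)
    print(f"{len(projects)} projects returned")
```

Keeping request construction and response parsing in separate functions makes it possible to check both without calling the live API.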
@seandavi
seandavi / sentence_embeddings_for_metadata_curation.ipynb
Created January 25, 2024 18:11
A quick demonstration of using sentence embeddings for semantic similarity search of metadata terms against "ontology" terms
@seandavi
seandavi / bioconductor_bibliometrix_summary.txt
Created January 5, 2024 20:26
Basic bibliometrix analysis based on DOIs available from CITATION files in Bioconductor, searched through OpenAlex.
MAIN INFORMATION ABOUT DATA
Timespan                              2004 : 2023
Sources (Journals, Books, etc)        100
Documents                             586
Annual Growth Rate %                  11.38
Document Average Age                  6.51
Average citations per doc             626.4
Average citations per year per doc    56.14
References                            10872
@seandavi
seandavi / file_metadata.json
Last active November 29, 2023 17:29
Proposal for available files metadata json for easier and more robust client parsing [note that data are fake]
{
  "accession": "GSE000123",
  "files": [
    {
      "filetype": "Series SOFT file",
      "name": "GSE227465_family.soft.gz",
      "size": 23413,
      "md5sum": "....",
      "created_at": "DATE",
      "updated_at": "DATE"
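A client could consume a document of this shape roughly as follows. This is a sketch under assumptions: the preview above is truncated, so the closing brackets in `SAMPLE` are assumed, only the six fields visible in the preview are read, and `FileRecord` and `parse_file_metadata` are hypothetical names. The values are the fake placeholders from the proposal itself.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class FileRecord:
    """One entry of the "files" array, limited to the fields shown in the preview."""
    filetype: str
    name: str
    size: int
    md5sum: str
    created_at: str
    updated_at: str


def parse_file_metadata(text):
    """Return (accession, [FileRecord, ...]) parsed from the proposed JSON layout."""
    doc = json.loads(text)
    wanted = [f.name for f in fields(FileRecord)]
    # Read only the known keys so that extra keys in an entry do not break parsing.
    records = [FileRecord(**{key: entry[key] for key in wanted})
               for entry in doc["files"]]
    return doc["accession"], records


# Assumed completion of the truncated preview, using its placeholder values.
SAMPLE = """
{
  "accession": "GSE000123",
  "files": [
    {
      "filetype": "Series SOFT file",
      "name": "GSE227465_family.soft.gz",
      "size": 23413,
      "md5sum": "....",
      "created_at": "DATE",
      "updated_at": "DATE"
    }
  ]
}
"""

if __name__ == "__main__":
    accession, records = parse_file_metadata(SAMPLE)
    for record in records:
        print(accession, record.name, record.size)
```

Named fields with a fixed meaning are what make this kind of parsing robust: the client looks up `size` or `md5sum` directly rather than inferring them from file names or listing order.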
@seandavi
seandavi / test.qmd
Created November 15, 2023 20:37
Quarto test mermaid document
---
format:
  html:
    mermaid-format: svg
---
```{mermaid}
%%| fig-width: 100%
autonumber
Participant C as Client[<font size=6>]
@seandavi
seandavi / ena_browser_api.md
Created November 11, 2023 11:02
Example queries from ENA browser API

The ENA browser API: https://www.ebi.ac.uk/ena/portal/api/swagger-ui/

The API supports only a `limit` parameter, with no offset. It is designed to simply stream large result sets.

Search by SRA study ID

Output as TSV

SEARCH_QUERY='secondary_study_accession=SRP082656' && curl "https://www.ebi.ac.uk/ena/portal/api/search?query=${SEARCH_QUERY}&result=read_run&fields=experiment_accession%2Cexperiment_title%2Csecondary_study_accession%2Caligned%2Caltitude%2Cassembly_quality%2Cassembly_software%2Cbam_aspera%2Cbam_bytes%2Cbam_ftp%2Cbam_galaxy%2Cbam_md5%2Cbase_count%2Cbinning_software%2Cbio_material%2Cbisulfite_protocol%2Cbroad_scale_environmental_context%2Cbroker_name%2Ccage_protocol%2Ccell_line%2Ccell_type%2Ccenter_name%2Cchecklist%2Cchip_ab_provider%2Cchip_protocol%2Cchip_target%2Ccollected_by%2Ccollection_date%2Ccollection_date_end%2Ccollection_date_start%2Ccompleteness_score%2Ccontamination_score%2Ccontrol_experiment%2Ccountry%2Ccultivar%2Cculture_collection%2Cdatahub%2Cdepth%2Cdescription%2Cdev_stage%2Cdnase_protocol%2Cecotype%2
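Hand-encoding a long `fields` list is error-prone, so the URL can instead be assembled programmatically. The following is a minimal sketch using only the Python standard library: `build_search_url` and `parse_tsv` are hypothetical helper names, only the first three field names from the query above are used, and `limit` is passed as a plain query parameter on the strength of the note above. Note that `urlencode` also percent-encodes the `=` inside the query value, which the curl command leaves raw.

```python
import csv
import io
import urllib.parse
import urllib.request

BASE_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(query, fields, result="read_run", limit=None):
    """Assemble a search URL; urlencode turns the commas in `fields` into %2C."""
    params = {"query": query, "result": result, "fields": ",".join(fields)}
    if limit is not None:
        params["limit"] = str(limit)
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def parse_tsv(text):
    """Parse a TSV response body into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


if __name__ == "__main__":
    url = build_search_url(
        "secondary_study_accession=SRP082656",
        ["experiment_accession", "experiment_title", "secondary_study_accession"],
        limit=10,
    )
    # Network call: fetch the TSV and print each row as a dict.
    with urllib.request.urlopen(url) as resp:
        for row in parse_tsv(resp.read().decode("utf-8")):
            print(row)
```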
@seandavi
seandavi / datasets.yaml
Last active October 25, 2023 00:33
YAML description of public health data resources
datasets:
  - name: brfss
    title: The brfss dataset
    description: |
      a very long description which can be
      in [markdown](https://markdown.org).
      - list item
      - list item 2
    processor: readr::read_csv
  - name: svi