Sean Davis seandavi

@seandavi
seandavi / README.md
Last active May 31, 2024 15:46
A small tutorial/workflow for using Nextflow on GCP from the CU Anschutz campus

Nextflow on Google Cloud at CU Anschutz

Nextflow is a powerful workflow management system designed for creating scalable and reproducible scientific workflows. It enables you to write workflows in a declarative language, making it easy to define complex pipelines that can be executed on various platforms, including local machines, clusters, and cloud environments like Google Cloud.

This short tutorial is meant for informatics users who are comfortable with a command-line interface. It also assumes that the user is familiar with Nextflow and has run it on a local computer or HPC system.

Roughly, this document will walk through:

seandavi / extract_cancer_incidence.sh
Created May 2, 2024 16:01
Create jsonlines of US CDC cancer incidence data release
#!/bin/bash
curl https://gis.cdc.gov/Cancer/DataVizApi/GetJSON/USCS_County | sed -e 's/<string xmlns="http:\/\/schemas.microsoft.com\/2003\/10\/Serialization\/">//g' -e 's/<\/string>//g'| jq -c '.[] | .USCS_County[]' > output.jsonl
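The same transformation can be sketched in Python for readers who want the pipeline's logic spelled out. This is a minimal sketch, assuming the endpoint returns JSON wrapped in a single `<string>` element, as the `sed` expressions imply. The function name `to_jsonl` and the sample payload are hypothetical and for illustration only; the real field names are not shown in the script.

```python
import json
import re

# Matches the XML wrapper that the sed expressions in the script strip out.
WRAPPER = re.compile(
    r'<string xmlns="http://schemas\.microsoft\.com/2003/10/Serialization/">'
    r"|</string>"
)


def to_jsonl(raw):
    """Return one compact JSON string per record, like jq -c '.[] | .USCS_County[]'."""
    payload = json.loads(WRAPPER.sub("", raw))
    # jq's .[] iterates over array elements or over object values.
    items = payload.values() if isinstance(payload, dict) else payload
    return [
        json.dumps(record, separators=(",", ":"), ensure_ascii=False)
        for item in items
        for record in item["USCS_County"]
    ]


# Hypothetical sample input (assumption): stands in for the real API response.
SAMPLE = (
    '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">'
    '[{"USCS_County": [{"a": 1}, {"a": 2}]}]'
    "</string>"
)
```

Calling `to_jsonl(SAMPLE)` yields one string per record, which can then be written line by line to a file such as `output.jsonl`.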
seandavi / setup_wandb.md
Created March 22, 2024 02:14
Setup weights-and-biases using docker compose

A Docker Compose YAML file with a MySQL server as a separate container and a Docker volume for storage:

version: '3'
services:
  wandb-local:
    image: wandb/local
    container_name: wandb-local
    environment:
      - HOST=https://YOUR_DNS_NAME
seandavi / prompt_example.txt
Created March 7, 2024 23:20
Example prompt to automate evaluation of job candidate applications against a set of minimum and preferred qualifications, with output as YAML
You are an HR specialist and are evaluating the qualifications of job applicants
for a high-performance computing (HPC) specialist position.
You have been given a set of criteria to evaluate each candidate.
The candidate materials are in the attached PDF.
For each job applicant, fill in the following YAML-format criteria document. You
may use the "comment" field to provide additional context or justification for
your evaluation.
---
# candidate name
seandavi / cmgd_se_to_csv.R
Created February 25, 2024 23:31
convert all CMGD SummarizedExperiments to CSV files
# convert all CMGD SummarizedExperiments to CSV files
# Should run more-or-less directly as a script
# Requires more than 128GB RAM to complete
# Generates about 200GB of files
# BiocManager::install('curatedMetagenomicData')
# BiocManager::install(c('arrow','data.table','dplyr', 'readr'))
library(curatedMetagenomicData)
seandavi / gist:dd7052951a199e5ea5ce584b01c5e0f2
Created January 31, 2024 20:57
Common Fund Data Ecosystem funding from reporter
#!/bin/bash
# results in json format
# Actual data in "results" array
#
# Opportunity numbers taken from https://commonfund.nih.gov/dataecosystem/FundedResearch
curl \
-X POST \
https://api.reporter.nih.gov/v2/projects/search \
-d '{"criteria":{"opportunity_numbers": ["RFA-RM-23-003", "PA20-185", "OTA-23-004", "RFA-RM-22-007", "OTA-23-005", "RFA-RM-17-026", "RFA-RM-21-007", "RFA-RM-19-012"]}}' \
-H 'Content-Type: application/json'
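For use from a script rather than the shell, the request above might be built as in the following sketch, which relies only on the Python standard library. The helper names `build_request`, `extract_results`, and `fetch_results` are hypothetical; the URL, payload shape, header, and the `results` key are taken from the script and its comments.

```python
import json
import urllib.request

API_URL = "https://api.reporter.nih.gov/v2/projects/search"


def build_request(opportunity_numbers):
    """Build the same POST request that the curl command sends."""
    body = json.dumps(
        {"criteria": {"opportunity_numbers": list(opportunity_numbers)}}
    )
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_results(response_text):
    """Return the "results" array, where the script says the actual data lives."""
    return json.loads(response_text)["results"]


def fetch_results(opportunity_numbers):
    """Send the request and return the parsed results (performs a network call)."""
    with urllib.request.urlopen(build_request(opportunity_numbers)) as response:
        return extract_results(response.read().decode("utf-8"))
```

Separating request construction from the network call keeps the payload easy to inspect before anything is sent.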
seandavi / sentence_embeddings_for_metadata_curation.ipynb
Created January 25, 2024 18:11
A quick demonstration of using sentence embeddings for semantic similarity search of metadata terms against "ontology" terms
seandavi / bioconductor_bibliometrix_summary.txt
Created January 5, 2024 20:26
Basic bibliometrix analysis based on dois available from CITATION files in Bioconductor, searched through openalex.
MAIN INFORMATION ABOUT DATA
Timespan 2004 : 2023
Sources (Journals, Books, etc) 100
Documents 586
Annual Growth Rate % 11.38
Document Average Age 6.51
Average citations per doc 626.4
Average citations per year per doc 56.14
References 10872
seandavi / file_metadata.json
Last active November 29, 2023 17:29
Proposal for available files metadata json for easier and more robust client parsing [note that data are fake]
{
"accession": "GSE000123",
"files": [
{
"filetype": "Series SOFT file",
"name": "GSE227465_family.soft.gz",
"size": 23413,
"md5sum": "....",
"created_at": "DATE",
"updated_at": "DATE"
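To illustrate the kind of client parsing this proposal is meant to simplify, here is a minimal Python sketch. It assumes that `size` is a byte count and that `md5sum` is a hex digest of the file contents; neither is stated in the proposal. The helper names `index_files` and `matches` and the `EXAMPLE` document are hypothetical, with fake values.

```python
import hashlib
import json

# Hypothetical, complete instance of the proposed document (all values are fake).
EXAMPLE = json.dumps(
    {
        "accession": "GSE000123",
        "files": [
            {
                "filetype": "Series SOFT file",
                "name": "GSE000123_family.soft.gz",
                "size": 5,
                "md5sum": hashlib.md5(b"hello").hexdigest(),
            }
        ],
    }
)


def index_files(document_text):
    """Map each file name to its metadata entry."""
    document = json.loads(document_text)
    return {entry["name"]: entry for entry in document["files"]}


def matches(entry, content):
    """Check downloaded bytes against the recorded size and MD5 checksum."""
    return (
        len(content) == entry["size"]
        and hashlib.md5(content).hexdigest() == entry["md5sum"]
    )
```

With a structured listing like this, a client can look up a file by name and validate a download without scraping a directory page.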
seandavi / test.qmd
Created November 15, 2023 20:37
Quarto test mermaid document
---
format:
html:
mermaid-format: svg
---
```{mermaid}
%%| fig-width: 100%
autonumber
Participant C as Client[<font size=6>]