Skip to content

Instantly share code, notes, and snippets.

View slavailn's full-sized avatar

slava ilnytskyy slavailn

View GitHub Profile
@slavailn
slavailn / GEOquery_download.R
Last active April 22, 2024 17:54
Download GEO datasets with GEOquery
library(GEOquery)
# Download existing RA datasets
# Study 1
# Teixeira VH, Olaso R, Martin-Magniette ML, Lasbleiz S et al. Transcriptome
# analysis describing new immunity and defense genes in peripheral blood
# mononuclear cells of rheumatoid arthritis patients.
# PLoS One 2009 Aug 27;4(8):e6803. PMID: 19710928
# Reference: GSE15573
@slavailn
slavailn / get_snps_by_rsids.sh
Last active March 27, 2024 21:35
Retrive SNP sites from VCF files based on rs ids
# Get SNP records based on rs ids from VCF file
# Kevin Blighe recipe from https://www.biostars.org/p/373852/
# Create a file with snp ids of interest, one id per line
# We can give the newly created file any name, for example snp.txt
bcftools view --include ID==@snp.txt target.bcf
@slavailn
slavailn / fetch_fastq.sh
Last active March 7, 2024 20:29
Download raw fastq files from SRA with sra tools
# Taken from https://www.biostars.org/p/111040/
# Examine and save metadata
esearch -db sra -query PRJNA484081 | efetch -format runinfo > bioproj.csv
# The first column of comma separated runinfo file are run ids
cat bioproj.csv | cut -d ',' -f 1 | head
# Download first 4 files as an example
cat bioproj.csv | cut -d ',' -f 1 | grep 'SRR' # Check of we are selecting right files
@slavailn
slavailn / merge_lanes.sh
Created November 22, 2023 16:55
Merge fastq files that belong to the same sample, but were generated from different lanes
#! /bin/bash
# Taken from https://github.com/stephenturner/mergelanes/issues/1
# Exercise caution, does not work accurately in every case:
# Not working accurately for sample IDs like "A11_Barcodexxx_S11_L001_R1_001".
# It cat together all L001 pertaining to sample ID A11 with different barcodes also
ls *R1* | cut -d _ -f 1 | sort | uniq \
| while read id; do \
cat $id*R1*.fastq.gz > $id.R1.fastq.gz;
@slavailn
slavailn / entrez_to_KEGG.R
Created August 25, 2023 20:16
Download KEGG pathways with related entrez ids
library(KEGGREST)
library(org.Rn.eg.db)
# Download entrez ids and corresponding KEGG pathways followed
# by creation of a table where one column is entrez id and another
# column is a comma separated list of KEGG pathways
# Download pathway to entrez id relationship
rno_path_eg <- keggLink("pathway", "rno")
names(rno_path_eg) <- gsub("rno:", "", names(rno_path_eg))
@slavailn
slavailn / Two_factor_anova_DESeq2.R
Created February 21, 2023 20:51
Two-factor analysis with DESeq2
library(RColorBrewer)
library(ggplot2)
library(ggrepel)
setwd("<path/to/dir>")
# Load DESeq2 object
load("expression_data/DESeqOBJ.RData")
dds
@slavailn
slavailn / build_org_db.R
Created August 22, 2022 07:07
Build annotation package org.Xx.eg.db for a non-model organism (Ovis aries)
library(AnnotationForge)
library(biomaRt)
setwd("path/to/dir")
## Get annotation data frame for Ovis aries from BiomaRt
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", host="https://www.ensembl.org")
ensembl <- useDataset("oaries_gene_ensembl", mart=ensembl)
#126 oaries_gene_ensembl Sheep (texel) genes (Oar_v3.1) Oar_v3.1
@slavailn
slavailn / enrichment_bubble.R
Created May 27, 2022 21:28
bubble plot example to visualize GO or pathway enrichment data
library(ggplot2)
library(stringr)
theme_set(
theme_bw() +
theme(legend.position = "right")
)
ggplot(all_go, aes(x = sample_id, y = reorder(GO.label, Enrichment))) +
geom_point(aes(size = Enrichment, fill = P.value), alpha = 0.75, shape = 21) +
In one of the conda environments I encountered the a bowtie2 installation problem breaking RSEM run.
The solution to the problem is described in the following Biostars post:
https://www.biostars.org/p/494922/
See the post content below:
```
$ conda create -n bttest -c bioconda bowtie2
$ conda activate bttest
@slavailn
slavailn / biomaRt_paralogs,R
Created May 17, 2021 05:00
Get paralogs for the list of genes using biomaRt
# Get paralogs for the list of genes
library(biomaRt)
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
gene_id <- c("TPM1", "BOD1", "ADAP1")
results <- getBM(attributes = c("ensembl_gene_id",
"external_gene_name",
"hsapiens_paralog_ensembl_gene",
"hsapiens_paralog_associated_gene_name"),
filters = "external_gene_name",