@odinokov
odinokov / bb.sh
Created November 15, 2023 04:01
Script to archive a specified directory as a tar.gz
#!/bin/bash
# Script to archive a specified directory
# Ensure the script exits immediately if any command fails
set -e
# Cleanup function
cleanup() {
[ -f "$temp_file" ] && rm "$temp_file"
odinokov / AWS_uploader.py
Last active October 28, 2023 06:13
Upload files to an AWS S3 bucket
# Securely collect user input: the file list, S3 bucket details, and AWS credentials.
# Then upload the listed files to the specified S3 bucket and transition their storage class to DEEP_ARCHIVE.
import boto3
import os
import getpass
import logging
from botocore.exceptions import NoCredentialsError, BotoCoreError, ClientError
from tqdm import tqdm
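The preview stops at the imports, so the upload step itself is not shown. A minimal sketch of one way to do it, assuming a boto3 S3 client is passed in as `s3`: rather than transitioning objects after upload, this variant sets the storage class at upload time through `ExtraArgs`. The function name `upload_to_deep_archive` and the `prefix` parameter are hypothetical, not taken from the gist.

```python
import os
from typing import Any, Iterable, List


def upload_to_deep_archive(s3: Any, paths: Iterable[str], bucket: str,
                           prefix: str = "") -> List[str]:
    """Upload each local file and store it directly as DEEP_ARCHIVE.

    `s3` is expected to behave like boto3.client("s3"); the key layout
    (prefix + file name) is an assumption made for this sketch.
    """
    uploaded = []
    for path in paths:
        key = prefix + os.path.basename(path)
        # ExtraArgs sets the storage class at upload time, so no separate
        # transition step is needed afterwards.
        s3.upload_file(path, bucket, key,
                       ExtraArgs={"StorageClass": "DEEP_ARCHIVE"})
        uploaded.append(key)
    return uploaded
```

Passing the client in, instead of creating it inside the function, keeps credential handling (for example via `getpass`) separate from the upload loop.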
odinokov / list_files_from_aws.py
Created August 22, 2023 09:47
Lists all files in the specified S3 bucket
import boto3
from botocore.exceptions import NoCredentialsError, BotoCoreError, ClientError
from typing import List
import logging
# Setting up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
def list_files_from_aws(
region: str, path: str, aws_access_key_id: str, aws_secret_access_key: str, log_level: str = 'INFO') -> List[str]:
odinokov / ledoit_wolf_portfolio_rebalancing.py
Last active July 21, 2023 06:38
A script to compute optimal portfolio weights using both sklearn's Ledoit-Wolf and custom shrinkage estimators
import yfinance as yf
import numpy as np
from sklearn.covariance import LedoitWolf
from scipy.optimize import minimize
import pandas as pd
from typing import Optional, Tuple
def shrinkage(returns: np.ndarray) -> Tuple[np.ndarray, float, float]:
# from https://github.com/WLM1ke/LedoitWolf/
#!/bin/bash
#############################################################################
# Script: restore_from_glacier.sh
# Description:
# This script restores files and folders recursively from
# Amazon S3 Glacier for a given S3 path. It checks if the restore
# is already in progress and initiates the restore for objects
# that are not currently being restored. The default values for
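The check-then-restore logic described in this header can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's own code: `s3` is assumed to behave like a boto3 S3 client, the helper names and the `days`/`tier` defaults are hypothetical, and the recursive listing of keys under the S3 path is left out.

```python
from typing import Any, Dict, Iterable, List


def restore_in_progress(head: Dict[str, Any]) -> bool:
    """True if a HeadObject response reports a restore that is still running."""
    return 'ongoing-request="true"' in head.get("Restore", "")


def request_restores(s3: Any, bucket: str, keys: Iterable[str],
                     days: int = 7, tier: str = "Bulk") -> List[str]:
    """Initiate a restore for every key that is not already being restored."""
    requested = []
    for key in keys:
        head = s3.head_object(Bucket=bucket, Key=key)
        if restore_in_progress(head):
            continue  # a restore is already under way; leave it alone
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={"Days": days,
                            "GlacierJobParameters": {"Tier": tier}},
        )
        requested.append(key)
    return requested
```

The sketch does not check each object's storage class, so keys that are not archived would need to be filtered out before calling it.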
odinokov / install_ichorCNA.sh
Created April 17, 2023 07:00
How to install ichorCNA
mamba create --name ichorCNA && \
mamba activate ichorCNA && \
mamba install -y -c conda-forge -c bioconda r-essentials r-base r-devtools hmmcopy bioconductor-genomeinfodb bioconductor-genomicranges r-ichorcna && \
cd "$CONDA_PREFIX/lib/R/library/ichorCNA/" && \
mkdir -p tmp && \
git clone https://github.com/broadinstitute/ichorCNA.git tmp && \
cp -r ./tmp/scripts/ .
# dry-run
Rscript "$CONDA_PREFIX/lib/R/library/ichorCNA/scripts/runIchorCNA.R"
odinokov / cfdna_size_count.sh
Last active April 5, 2023 03:32
Counts cfDNA fragment sizes in a specified region
#!/bin/bash
set -euo pipefail
# Define software dependencies
declare -ra deps=("samtools" "bedtools")
# Check that required software dependencies are installed
for dep in "${deps[@]}"
do
odinokov / get_GCB_median_mean.sh
Last active April 21, 2023 08:36
Get GC% vs coverage per bin
#!/bin/bash
# This bash script performs statistical analysis on a specified BAM file to investigate GC bias.
# The analysis is done at a bin size of 100 base pairs.
# The script removes blacklisted regions and regions containing N bases from the reference genome (hg38),
# then generates genomic bins limited to autosomes. For each bin, the GC content is computed.
# The input BAM file is downsampled, and the coverage of each bin is then calculated.
# The mean and median coverage for each bin with known GC content is reported.
set -euo pipefail
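The per-bin statistics described in the header comment can be illustrated with a small Python sketch. It assumes each bin is available as a (sequence, coverage) pair and reads "mean and median coverage for each bin with known GC content" as grouping bins by their GC percentage. The function names are hypothetical, and the real script presumably relies on samtools and bedtools rather than on in-memory sequences.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, Optional, Tuple


def gc_percent(seq: str) -> Optional[int]:
    """GC content of a bin as a rounded percentage; None if the bin has N bases."""
    seq = seq.upper()
    if not seq or "N" in seq:
        return None
    return round(100 * (seq.count("G") + seq.count("C")) / len(seq))


def coverage_by_gc(
    bins: Iterable[Tuple[str, float]]
) -> Dict[int, Tuple[float, float]]:
    """Map each GC percentage to the (mean, median) coverage of its bins."""
    grouped = defaultdict(list)
    for seq, coverage in bins:
        gc = gc_percent(seq)
        if gc is not None:  # skip bins with unknown GC content
            grouped[gc].append(coverage)
    return {gc: (mean(cov), median(cov)) for gc, cov in sorted(grouped.items())}
```

With 100 bp bins the GC percentage is always a whole number, so the rounding only matters for other bin sizes.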
odinokov / download_file.sh
Last active April 4, 2023 04:33
A demo script: resumable file download with wget
# This function downloads a file from a given URL using the wget command.
# If the file already exists, it is checked to see whether the download completed.
# If the file was only partially downloaded, the function resumes the download where it left off.
# If the file does not exist, the function downloads it from scratch.
# If the download fails, an error message is displayed and the script exits with a non-zero exit status.
#
# Args:
# URL: the URL to download the file from.
# OUT: optional output directory for the downloaded file. If not provided, the file will be downloaded to the current directory.
#
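The function body is cut off in this preview. As a rough Python sketch of the behavior described above, assuming wget is on the PATH, the `--continue` flag covers all three cases (missing, partial, and complete file) in one call. The names `wget_command` and `download_file` are hypothetical, and the original is a shell function, not Python.

```python
import subprocess
from typing import List, Optional


def wget_command(url: str, out: Optional[str] = None) -> List[str]:
    """Build the wget call; --continue resumes a partial file if one exists."""
    cmd = ["wget", "--continue"]
    if out:
        # Save into the given directory instead of the current one.
        cmd += ["--directory-prefix", out]
    return cmd + [url]


def download_file(url: str, out: Optional[str] = None) -> None:
    """Download url, exiting with a non-zero status if wget fails."""
    result = subprocess.run(wget_command(url, out))
    if result.returncode != 0:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(f"Error: failed to download {url}")
```

Keeping the command construction in its own function makes the flags easy to inspect without touching the network.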
odinokov / get_GCB.sh
Last active April 4, 2023 01:25
Downsample and computeGCBias
#!/bin/bash
# This snippet downsamples a BAM file, removes blacklisted regions, and calculates GC bias for autosomes.
# mamba install -y -c bioconda -c conda-forge samtools bedtools deeptools pv
# Resulting columns (from https://www.biostars.org/p/447062/):
# N_gc: The number of reads with a given GC content
# F_gc: The number of reads spanning regions with a given GC content
# R_gc: The scaled ratio between the above values
# The number of rows should be equal to 1 plus the estimated median fragment length.
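To make the relationship between the three columns concrete, here is a small Python sketch. It assumes that "scaled" means multiplying the per-row ratio N_gc / F_gc by sum(F_gc) / sum(N_gc), so that a row with no GC bias comes out at 1.0. That scaling is an assumption made for illustration and is not stated in the snippet above. The function name `scaled_ratio` is hypothetical.

```python
from typing import List, Sequence


def scaled_ratio(n_gc: Sequence[float], f_gc: Sequence[float]) -> List[float]:
    """Per-row N_gc / F_gc, scaled so that unbiased rows equal 1.0.

    Rows where either count is zero are reported as 1.0 (an assumption:
    no evidence of bias either way).
    """
    total_n = sum(n_gc)
    if total_n == 0:
        return [1.0] * len(n_gc)
    scaling = sum(f_gc) / total_n
    return [scaling * n / f if n and f else 1.0 for n, f in zip(n_gc, f_gc)]
```

Each row corresponds to one possible GC count in a fragment, from 0 up to the fragment length, which is consistent with the row count noted above being one more than the estimated median fragment length.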