Skip to content

Instantly share code, notes, and snippets.

View jelber2's full-sized avatar

Jean Elbers jelber2

View GitHub Profile
@jelber2
jelber2 / Estimating_effect_of_error-correction_on_read_phase_switches_via_HG002_hapmers.jl
Last active May 24, 2024 11:44
Using Nanopore Reads from HG002, estimate the phase switching with hapmers and readmers
# Get HG002
# wget https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.fasta.gz
# gunzip hg002v1.0.1.fasta.gz
#
# Use seqtk to get maternal and paternal sequences
# https://github.com/lh3/seqtk
# seqtk seq -l0 hg002v1.0.1.fasta|paste - - |fgrep "MATERNAL" |tr '\t' '\n'|seqtk seq -l60 > hg002v1.0.1.maternal.fasta &
# seqtk seq -l0 hg002v1.0.1.fasta|paste - - |fgrep "PATERNAL" |tr '\t' '\n'|seqtk seq -l60 > hg002v1.0.1.paternal.fasta &
#
# use samtools to index haplotypes
@jelber2
jelber2 / Plot_Animated_Nanopore_Quality_x_Length_2D_Contours.jl
Last active May 15, 2024 14:41
Makes an animated plot of read quality and read length density contours over time for Nanopore Dorado basecalled, indexed BAM files
# new in revision 2, multithreaded support - up to twice as fast as single-threaded, revision 1
# invoke with:
# julia --threads 16 Plot_Animated_Nanopore_Quality_x_Length_2D_Contours.jl --input_file test.bam --fps 0.7 --output_file test.gif
# but need a lot of threads (up to 16, above 16 threads, get saturation in cores vs. time plot)
using XAM
using ArgParse
using DataFrames
@jelber2
jelber2 / PlotQualityHistogram.jl
Last active May 8, 2024 12:42
Plots Quality Score Density Distribution per Hour of a Sequencing Run
# note: the Dorado BAM file produced by basecalling POD5 files needs to be indexed with "samtools index input_file"
# note2: something seems really strange about the qualtiy scores, I have checked several times, but they seem correct
# I am not sure why they are reporting that for example reads with Q38 average quality scores.
# EDIT: this is fixed, see https://www.biostars.org/p/295932/#295936 for better formula for average Phred Score
# tested with julia-1.10.3 and XAM.jl-0.40
using XAM
using ArgParse
using DataFrames
using Dates
using Plots
@jelber2
jelber2 / changeBAMquality.jl
Created April 30, 2024 08:56
Converts quality scores in BAM alignments to a single ASCII Phred Score you desire
# note: outputs to STDOUT a SAM file without a header
# note2: remove auxillary tags and information
# note3: ignores RNEXT and PNEXT from BAM file (puts * and 0 respectivelv)
# tested with julialang v1.10.2 and XAM v0.4.0
# $ julia changeBAMquality.jl input.bam ? > test.sam.noheader
#
# to add a header from the original BAM file that had its qualities changed
# and add on a PG line
# $ ASCII_CHARACTER="?"
# $ INPUT=input.bam
@jelber2
jelber2 / PlotSequenceTime.jl
Last active March 26, 2024 10:04
Plot changes in read length distribution of Nanopore Dorado called bases
using XAM
using ArgParse
using DataFrames
using Dates
using Plots
using StatsBase
using Plots.PlotMeasures
function parse_commandline()
@jelber2
jelber2 / stitch-fasta.jl
Last active March 7, 2024 09:45
Stitch FASTA reads shredded with shred.sh from BBTools back together
# tested on Julialang v.1.10.2, DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fasta.jl chr20.herro.fasta.Q30.recal.shred.fasta > chr20.herro.fasta.Q30.recal.fasta
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fasta_file(filename::String)
@jelber2
jelber2 / stitch-fastq.jl
Last active March 7, 2024 09:45
Stitch FASTQ reads shredded with shred.sh from BBTools back together
# tested on Julialang v.1.10.2 , DataStructures v0.18.16, and FASTX v2.1.4
# Usage
# $ julia stitch-fastq.jl chr20.herro.fasta.Q30.recal.shred.fastq > chr20.herro.fasta.Q30.recal.fastq
#
import Pkg; Pkg.add("FASTX")
import Pkg; Pkg.add("DataStructures")
using DataStructures
using FASTX
function process_fastq_file(filename::String)
@jelber2
jelber2 / shredBAM.jl
Last active March 11, 2024 13:09
Shred alignments in a BAM file (output is a SAM file without header) to make long read alignments into short-read-like
# previous versions allowed to generate directly from BAM, but there seemed to have been some troubles
# between versions of XAM, so this version is working so far for its intended purpose on https://github.com/brendanofallon/jovian
# also, this version is rather fast
#
#
# note: ignores quality scores at the moment - fixed in revision#4
# note2: outputs to STDOUT a SAM file without a header
# tested with julialang v1.10.2 and XAM v0.4.0
# $ julia shredBAM.jl input.bam 300 > test.sam.noheader
#
@jelber2
jelber2 / README.txt
Last active February 29, 2024 10:25
Error_rates_in_Herro_and_corrected_Herro_reads_compared_to_NextSeq2000
# output from best commit #fcdfa97 (https://github.com/google/best), .summary_identity_stats.csv files using reads
# aligned to concatenated chr20_MATERNAL and chr20_PATERNAL from hg002v1.0.1.fasta.gz (https://github.com/marbl/HG002) (https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.fasta.gz)
# using mm2-fast commit # 10bde16 using settings: --eqx --secondary=no -Y -c -ax map-ont -k 19 -w 13 -t 48
# or using these settings for Illumina NextSeq2000 reads: -t 48 --eqx --secondary=no -acx sr
#
# brutal_rewrite (br) commit # ad87f92 (https://github.com/natir/br) using settings: -k 19 -m graph
# kmer read filter (kmrf) commit # 36cad24 (https://github.com/natir/kmrf) using setting: -k 17
# peregrine-2021 (pg_asm) commit # 6698eb1 (https://github.com/cschin/peregrine-2021): using default settings
#
# herro (herro) commit # c41dc30 (https://github.com/lbcb-sci/herro) using defaults and model at time of commit
@jelber2
jelber2 / README.md
Last active October 18, 2023 13:41
MethPhaser

MethPhaser installation

Install micromamba or mamba or conda

# Install micromamba
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)

You will then see something like this in a BASH shell (parts with "(type....)" are added for instructions