Wei Shen shenwei356

@shenwei356
shenwei356 / benchmark-encoding.md
Last active August 13, 2022 17:36
k-mer encoding and decoding

Functions:

# encoding: ACTG

def nuc2int_nochecking(b):
    return (ord(b) >> 1) & 3, True
    
def nuc2int_if(b):
    if b == 'a' or b == 'c' or b == 'g' or b == 't' \
            or b == 'A' or b == 'C' or b == 'G' or b == 'T':
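
The preview above is cut off, but the bit trick in `nuc2int_nochecking` is enough to sketch a full k-mer round trip. The following is a minimal sketch, not the gist's own implementation: `encode_kmer`, `decode_kmer`, and `INT2NUC` are hypothetical names introduced here, and the code assumes the input contains only A, C, G, T in either case (no validation, as in the "nochecking" variant).

```python
# Assumption: input holds only A/C/G/T (upper or lower case); nothing is validated.
INT2NUC = "ACTG"  # index = 2-bit code, matching the "# encoding: ACTG" comment


def nuc2int(b):
    """Map one nucleotide to a 2-bit code using bits 1-2 of its ASCII value.

    A/a -> 0, C/c -> 1, T/t -> 2, G/g -> 3.
    """
    return (ord(b) >> 1) & 3


def encode_kmer(kmer):
    """Pack a k-mer into an integer, 2 bits per base, first base most significant."""
    code = 0
    for b in kmer:
        code = (code << 2) | nuc2int(b)
    return code


def decode_kmer(code, k):
    """Unpack an integer produced by encode_kmer back into an upper-case k-mer."""
    bases = []
    for _ in range(k):
        bases.append(INT2NUC[code & 3])  # lowest 2 bits hold the last base
        code >>= 2
    return "".join(reversed(bases))
```

The length `k` has to be passed to `decode_kmer` because leading `A` bases encode as zero bits and would otherwise be lost.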

  8            .          .           TEXT ·__mm_add_epi32(SB),0,$0 
  9        640ms      640ms               VMOVDQU x+0(FP), Y0 
 10        5.62s      5.62s               VMOVDQU y+32(FP), Y1 
 11        4.81s      4.81s               VPADDD  Y1, Y0, Y0 
 12        1.16s      1.16s               VMOVDQU Y0, q+64(FP) 
 13        1.30s      1.30s               VZEROUPPER 
 14            .          .               RET 

shenwei356 / test.go
Created July 4, 2018 16:41
inverse-bloom-filter
package main
import (
"compress/gzip"
"fmt"
"os"
"strconv"
boom "github.com/tylertreat/BoomFilters"
)
shenwei356 / add-timestamp-for-media-file.sh
Last active September 27, 2023 02:59
Adding create time to image/video files
#!/bin/bash
while read -r file; do
if [[ $file =~ .*=.* ]]; then
continue
fi
t=$(exiftool "$file" \
| grep "^Create Date" | head -n 1 \
| sed -r "s/\s+/ /g" | cut -d " " -f 4 \
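
The pipeline above is truncated in this preview. As a rough sketch of what its visible steps do (`grep "^Create Date"`, `head -n 1`, whitespace squeezing, `cut -f 4`), the same extraction could be written in Python. The function name `create_date` is hypothetical, and the `Tag : value` layout of the sample text is an assumption about the tool's default output.

```python
import re


def create_date(exiftool_output):
    """Return the date part of the first 'Create Date' line, or None.

    Mirrors the visible shell steps only; assumes lines look like
    'Create Date                     : 2018:07:04 16:41:00'.
    """
    for line in exiftool_output.splitlines():
        if line.startswith("Create Date"):          # grep "^Create Date" | head -n 1
            fields = re.sub(r"\s+", " ", line).split(" ")  # sed -r "s/\s+/ /g"
            # cut -d " " -f 4 is 1-indexed, so field 4 is index 3
            return fields[3] if len(fields) > 3 else None
    return None
```

Unlike `cut`, which prints an empty string when the field is missing, this sketch returns `None` so the caller can skip files without a usable tag.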
#!/bin/sh
# Test data
#
# Retrieve 1M reads from any Illumina reads
#
# seqkit head -n 1000000 xxxx_1.fq.gz -o test.fq.gz
#
# Or
#
shenwei356 / howto.md
Last active October 31, 2019 14:35 — forked from killercup/pandoc.css
Add this to your Pandoc HTML documents using `--css pandoc.css` to make them look more awesome. (Tested with Markdown and LaTeX.)

pandoc -f markdown -t html -c pandoc.css -s -o report.html

shenwei356 / doc.md
Last active April 3, 2017 08:46
Effect of random seed on results of 'seqkit sample'
shenwei356 / Downloading genome annotation files from NCBI ftp with given FTP URL list.md
Last active February 7, 2019 00:52
Downloading genome annotation files from the NCBI FTP site using a given list of FTP URLs

URL list

$ head choose_ftp.txt
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/015/405/GCA_000015405.1_ASM1540v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/620/625/GCA_000620625.1_ASM62062v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/972/925/GCA_000972925.1_ASM97292v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/021/385/GCA_001021385.1_ASM102138v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/328/565/GCA_000328565.1_ASM32856v1
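
Each URL in the list points to an assembly directory, and files inside such a directory are named after the directory itself. A minimal sketch of deriving a file URL from a directory URL follows. The helper name `annotation_url` and the default suffix `_genomic.gff.gz` are assumptions made for illustration; the files actually targeted by this gist are not shown in the preview.

```python
def annotation_url(ftp_dir, suffix="_genomic.gff.gz"):
    """Build the URL of one file inside an NCBI assembly directory.

    Assumption: the file name is '<directory name><suffix>'; the default
    suffix is only an example, not necessarily the gist's target.
    """
    ftp_dir = ftp_dir.rstrip("/")
    name = ftp_dir.rsplit("/", 1)[-1]  # e.g. GCA_000015405.1_ASM1540v1
    return f"{ftp_dir}/{name}{suffix}"


def annotation_urls(lines, suffix="_genomic.gff.gz"):
    """Map every non-empty line of a URL list to a file URL."""
    return [annotation_url(line.strip(), suffix) for line in lines if line.strip()]
```

Applied to `choose_ftp.txt`, something like `annotation_urls(open("choose_ftp.txt"))` would yield one download URL per line, which can then be handed to any downloader.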

Target files

shenwei356 / Makefile
Created February 10, 2017 12:47 — forked from isaacs/Makefile
# Hello, and welcome to makefile basics.
#
# You will learn why `make` is so great, and why, despite its "weird" syntax,
# it is actually a highly expressive, efficient, and powerful way to build
# programs.
#
# Once you're done here, go to
# http://www.gnu.org/software/make/manual/make.html
# to learn SOOOO much more.
shenwei356 / filter spades assembly result according to coverage.md
Last active December 11, 2023 04:42
Filtering SPAdes assembly results by coverage using SeqKit and csvtk

Filtering SPAdes assembly results according to the coverage information in the sequence headers

Sample sequence

$ cat contigs.fasta 
>NODE_1_length_869844_cov_1135.34
ACTGNacgtn 
>NODE_2_length_576386_cov_975.882
acgtn
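
The headers in the sample carry the contig length and coverage as `NODE_<id>_length_<len>_cov_<cov>`. The gist itself uses SeqKit and csvtk for the filtering; purely to illustrate the header parsing, here is a Python sketch instead. The names `parse_header` and `filter_by_coverage` and the threshold parameter `min_cov` are hypothetical.

```python
import re

# Header layout taken from the sample: NODE_<id>_length_<len>_cov_<cov>
HEADER_RE = re.compile(r"^NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_header(header):
    """Return (node_id, length, coverage) parsed from a FASTA header line."""
    m = HEADER_RE.match(header.lstrip(">").strip())
    if m is None:
        raise ValueError(f"unexpected header: {header!r}")
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def filter_by_coverage(lines, min_cov):
    """Yield FASTA lines of records whose header coverage is >= min_cov."""
    keep = False
    for line in lines:
        line = line.rstrip()  # also drops trailing spaces seen in the sample
        if line.startswith(">"):
            keep = parse_header(line)[2] >= min_cov
        if keep:
            yield line
```

With the sample above and `min_cov=1000`, only the `NODE_1` record would be kept, since `NODE_2` has a coverage of 975.882.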