Skip to content

Instantly share code, notes, and snippets.

@jaudoux
Created April 20, 2018 09:46
Show Gist options
  • Save jaudoux/453e70a83a50a6c8455f976eda098ff9 to your computer and use it in GitHub Desktop.
Save jaudoux/453e70a83a50a6c8455f976eda098ff9 to your computer and use it in GitHub Desktop.
Generate k-mer matrice from FASTQ files
RAW_DIR = "raw"
SAMPLES = ['SRR2966453', 'SRR2966456','SRR2966471','SRR2966474', 'SRR2966454', 'SRR2966457', 'SRR2966472', 'SRR2966475', 'SRR2966455', 'SRR2966458', 'SRR2966473', 'SRR2966476']
rule all:
input: "raw_counts.tsv.gz.gbi"
rule jellyfish_counts:
input:
r1 = RAW_DIR + "/{sample}_1.fastq.gz",
r2 = RAW_DIR + "/{sample}_2.fastq.gz"
output: temp("{sample}.jf")
threads: 10
params:
min_qual = '5', # 1/100 prob (phread 20)
k = 20,
min_count = 3 # As in allsome bloom tree
shell: """
jellyfish count -Q {params.min_qual} \
-m {params.k} \
-C \
-s 10000 \
-t {threads} \
-L {params.min_count} \
-o {output} \
<(zcat {input.r1}) <(zcat {input.r2})
"""
rule jellyfish_dump:
input: "{sample}.jf"
output: "{sample}.txt.gz"
threads: 2
shell: "jellyfish dump {input} -c -t | sort -k 1 -S 5G --parallel {threads} | pigz -p {threads} -c > {output}"
rule joinCounts:
input: expand("{sample}.txt.gz",sample = SAMPLES)
output: "raw_counts.tsv.gz"
params:
sample_names = "\t".join(SAMPLES)
run:
shell("echo 'tag\t{params.sample_names}' | gzip -c > {output}")
shell("joinCounts {input} | bgzip -c >> {output}")
rule kamix_index:
input: "raw_counts.tsv.gz"
output: "raw_counts.tsv.gz.bgi"
shell: "tamix index {input}"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment