@NickRoz1
NickRoz1 / test.py
Created July 13, 2021 11:13
MMAP vs merge from temporary files
MEGABYTE_SIZE = 1048576
FILE_SIZE = 4000 * MEGABYTE_SIZE
BUFFER_SIZE = 1000 * MEGABYTE_SIZE
RECORD_SIZE = 300
import random
import tempfile
import os
import mmap
import math
@NickRoz1
NickRoz1 / final_blog.md
Last active October 3, 2019 11:56
Final Blog

Nick Rozinsky

The code is located at https://github.com/NickRoz1/cBAM

History:

The CBAM file format is intended to eliminate unnecessary disk overhead when processing BAM data requires only a few fields of each BAM record. The initial implementation was a simple tool for parsing full columns and rowgroups; it wasn't suitable for big files and lacked features. The current implementation provides a convenient API.
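
To illustrate why a column-oriented layout reduces disk overhead when only one field is needed, here is a minimal Python sketch. It is not the CBAM implementation: the fixed-width record layout, the field names (`pos`, `mapq`, `flag`), and both helper functions are hypothetical, and real BAM records are variable-length and compressed.

```python
import struct

# Hypothetical fixed-width record: (position, mapping quality, flag).
# This layout is an assumption for illustration, not the BAM or CBAM format.
RECORD = struct.Struct("<iBH")  # 4 + 1 + 2 = 7 bytes, no padding

records = [(100 * i, i % 60, i % 4) for i in range(1000)]

# Row-oriented layout: all fields of a record are stored together.
row_store = b"".join(RECORD.pack(*r) for r in records)

# Column-oriented layout: each field lives in its own contiguous block.
n = len(records)
columns = {
    "pos": struct.pack(f"<{n}i", *(r[0] for r in records)),
    "mapq": struct.pack(f"<{n}B", *(r[1] for r in records)),
    "flag": struct.pack(f"<{n}H", *(r[2] for r in records)),
}


def mapq_from_rows(data):
    """Decode every full record just to extract a single field."""
    values = [mapq for _pos, mapq, _flag in RECORD.iter_unpack(data)]
    return values, len(data)  # values plus number of bytes touched


def mapq_from_columns(cols):
    """Touch only the block that holds the requested field."""
    block = cols["mapq"]
    return list(block), len(block)


row_values, row_bytes = mapq_from_rows(row_store)
col_values, col_bytes = mapq_from_columns(columns)
print(row_bytes, col_bytes)
```

Both paths return the same values, but under these assumptions the row-oriented path touches every byte of every record, while the column-oriented path touches only the one-byte-per-record `mapq` block.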

Since the July blog post, the CBAM reader implementation has been completely refactored. The previous version lacked two major features: iterating over a column with foreach, and iterating over several columns simultaneously.

These features are implemented on top of a new Column primitive. Now, to acquire a column of a CBAM file, one may use: