The idea is to run bootstrap resampling on the control dataset to check that it is appropriate as a control.
python randomize.py <input.csv> <resample_times>
This will generate
resample_times files called
Start by reading a CSV table, then shuffle it N times (N is resample_times). After each shuffle, label the first half of the rows as control and the second half as test, and write a new CSV file for each relabeling.
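The shuffle-and-relabel step could look roughly like the sketch below. It is not the code in randomize.py, only an illustration under several assumptions: the function names, the added "label" column, the out_prefix file naming, and the seed parameter are all hypothetical; the first CSV row is assumed to be a header; and with an odd number of rows the extra row is assigned to test. Note that it shuffles without replacement, exactly as described above.

```python
import csv
import random


def shuffle_and_relabel(rows, n_times, seed=None):
    """Return n_times shuffled copies of rows, each row prefixed with a label.

    Hypothetical helper: the first half of each shuffle is labeled
    "control", the second half "test".
    """
    rng = random.Random(seed)
    half = len(rows) // 2  # assumption: an odd leftover row goes to "test"
    resamples = []
    for _ in range(n_times):
        shuffled = list(rows)  # copy so the input order is left untouched
        rng.shuffle(shuffled)
        labeled = [
            (["control"] if i < half else ["test"]) + list(row)
            for i, row in enumerate(shuffled)
        ]
        resamples.append(labeled)
    return resamples


def write_resamples(input_path, n_times, out_prefix="resample_", seed=None):
    """Write one CSV per relabeling; out_prefix is a made-up naming scheme."""
    with open(input_path, newline="") as handle:
        header, *rows = list(csv.reader(handle))  # assumes a header row
    paths = []
    for index, labeled in enumerate(shuffle_and_relabel(rows, n_times, seed)):
        path = f"{out_prefix}{index}.csv"
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["label"] + header)
            writer.writerows(labeled)
        paths.append(path)
    return paths
```

Passing a fixed seed makes the set of relabeled files reproducible, which may help when comparing runs; leaving it as None gives a different shuffle each time.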
This is the idea of Brian McDonald, for his metagenomics analyses. I (Ivan Kryukov) have implemented a basic framework for the randomization, and assume no responsibility for how it is used.