@drio
Created November 6, 2008 22:43
Dear HGSC fellows,
As you already know, we are facing the challenge of storing the new slx
data while running out of physical storage space at the HGSC. In addition,
the amount of space required for new runs keeps growing.
The way we have been dealing with data so far is not going to scale.
Keeping all analysis data in the cluster volumes is not an option anymore.
Our idea is to archive most of the slx analysis data (almost everything) and
keep on the cluster only the data that is _crucial_ for data analysis
and other relevant scientific experiments.
We are proposing the following:
We will generate SRFs of all the analyses we perform for each flow cell (FC). This will
reduce the space footprint and also keep the data in a more manageable format.
Once the SRF is created, we will keep on the cluster only the reads and qualities
(basically just 1 file per lane in SLX, 2 if PE). The SRF will then be archived
to tape and removed from the cluster volumes.
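To make the plan a bit more concrete, here is a rough sketch of how the files of one FC could be sorted into "keep on the cluster", "archive to tape" and "needs a decision". The file naming used here (s_<lane>_sequence.txt, s_<lane>_<read>_sequence.txt for PE, and a .srf extension) is an assumption for illustration only, and plan_retention is a made-up helper, not an existing tool.

```python
import re

# ASSUMPTION: reads/qualities live in files named s_<lane>_sequence.txt
# (single-end) or s_<lane>_<read>_sequence.txt (paired-end, read = 1 or 2).
# Adjust the pattern to whatever the real run layout looks like.
READS_PATTERN = re.compile(r"^s_\d_(?:[12]_)?sequence\.txt$")


def plan_retention(filenames):
    """Sort the file names of one flow cell into three buckets.

    keep    : reads/qualities that stay on the cluster volumes
    archive : SRF files that go to tape and are then removed from the cluster
    review  : everything else, left for a human decision
    """
    plan = {"keep": [], "archive": [], "review": []}
    for name in filenames:
        if READS_PATTERN.match(name):
            plan["keep"].append(name)
        elif name.lower().endswith(".srf"):
            plan["archive"].append(name)
        else:
            plan["review"].append(name)
    return plan
```

Nothing is deleted by this sketch. It only produces the lists, so the result can be reviewed before anything is moved to tape or removed.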
We will also submit the SRFs. Potentially, once the submission is accepted we can
toss the local SRFs. This is an option we are still exploring.
Please understand that we should not save any data that can be easily regenerated
from the raw reads/qual files. For example, we should not save mapping info, since
it can be recreated using mapping tools.
This being said, if any of you think there are some other files we should keep
in the cluster volumes, please let us know.