Created
November 6, 2008 22:43
Dear HGSC fellows,

As you already know, we face the challenge of storing the new slx data while
running out of physical storage space at the HGSC. In addition, the space
required for each new run keeps growing. The way we have handled data so far
will not scale: keeping all analysis data on the cluster volumes is no longer
an option.

Our idea is to archive most of the slx analysis data (almost everything) and
keep on the cluster only the data that is _crucial_ for data analysis and
other relevant scientific experiments.

We are proposing the following:

We will generate SRFs for all the analyses we perform on each flowcell (FC).
This will reduce the space footprint and keep the data in a more manageable
format. Once the SRF is created, we will keep only the reads and qualities on
the cluster (basically one file per lane in SLX, two if paired-end). The SRF
will then be archived to tape and removed from the cluster volumes.

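As a rough illustration of the keep-versus-remove split described above, here is a minimal Python sketch. It assumes the per-lane read/quality files are named `s_<lane>_sequence.txt` (single-end) or `s_<lane>_<read>_sequence.txt` (paired-end); that naming, and the helper names `plan_flowcell` and `lanes_kept`, are assumptions made for the example and not part of the proposal.

```python
import re
from pathlib import PurePath

# Assumed naming for per-lane read/quality files: s_<lane>_sequence.txt for
# single-end lanes and s_<lane>_<read>_sequence.txt for paired-end lanes.
# This pattern is an illustration, not something the proposal specifies.
KEEP_PATTERN = re.compile(r"^s_(\d)_(?:([12])_)?sequence\.txt$")


def plan_flowcell(paths):
    """Split a flowcell's analysis files into two sorted lists.

    Returns (keep, remove_after_archive): `keep` holds the per-lane
    read/quality files that stay on the cluster; `remove_after_archive`
    holds everything else, to be deleted only once the SRF is on tape.
    """
    keep, remove_after_archive = [], []
    for path in paths:
        name = PurePath(path).name
        if KEEP_PATTERN.match(name):
            keep.append(path)
        else:
            remove_after_archive.append(path)
    return sorted(keep), sorted(remove_after_archive)


def lanes_kept(keep):
    """Map each lane number to the number of files kept for it (1, or 2 if PE)."""
    counts = {}
    for path in keep:
        lane = int(KEEP_PATTERN.match(PurePath(path).name).group(1))
        counts[lane] = counts.get(lane, 0) + 1
    return counts
```

The sketch only produces a plan; it deletes nothing. Checking `lanes_kept` before any cleanup is one way to confirm that every lane still has its one (or two) files on the cluster.
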
We will also submit the SRFs. Potentially, once a submission is accepted, we
can discard the local SRFs. This is an option we are still exploring.

Please understand that we should not save any data that can be easily
regenerated from the raw read/quality files. For example, we should not save
mapping information, since it can be recreated with mapping tools.

That said, if any of you think there are other files we should keep on the
cluster volumes, please let us know.