sbamin/pbjelly_gap_closing_how_to.md

## pbjelly_gap_closing_how_to.md

      
    Raw
  

              pbjelly_gap_closing_how_to.md
            
          
    PBJelly gapclosing

PBJelly is a part of the PBSuite package of programs. PBJelly can be used to try to close or shrink gaps that may be present between contigs after scaffolding.
The official documentation is availible at: http://sourceforge.net/p/pb-jelly/wiki/Home/
Most of PBJelly can be configured through an .xml protocol that you have to create with a text editor. It is best to put this in the directory where your output is going. Below is an example of how this xml protocol can look:
<jellyProtocol>
<reference>/path/to/reference.fasta</reference>
<outputDir>/path/to/outputdirectory/</outputDir>
<blasr>-minMatch 8 -minPctIdentity 70 -bestn 1 -nCandidates 20
-maxScore -500 -nproc 20 -noSplitSubreads</blasr>
<input baseDir="/path/to/readdirectory/">
<job>filtered_subreads_over500bp.fasta</job>
</input>
</jellyProtocol>

The main parts of the protocol you need to change if you want to run with the default settings are:
<reference> This is probably your scaffold. It should be fasta format (not fastq). Note that this protocol does not accept the “~” symbol, so write all the paths from root.
<outputdir> The directory where you want your output to be saved. It is recommended that you put your xml protocol here too.
-nproc is the number of processors that will be used (20 in this example). Only some of the steps are multithreaded.
<input baseDir> This is the path to the directory where your reads are located that you want to map to the reference.
NOTE: if these reads are in .fasta format, they have to have a .qual file in the same folder with the same prefix. If you do not have one you can generate a fake .qual file by using the fakeQuals.pl script that is included in PBJelly like this:
/path/to/fakeQuals.pl <yourreads.fasta>

This may be located in some place like, for example /usr/local/PBSuite_14.9.9/bin/fakeQuals.pl. Check that the .qual files have been created.
To make things a bit easier for yourself, you may want to add the /bin directory of PBSuite to your $PATH, so you don’t have to write the full path to Jelly.py for every step. Additionally, it is required that you source setup.sh, a script in the PBJelly-folder, before you start running PBJelly. Otherwise it won’t be able to find the required python modules that it needs. You can also add the contents of setup.sh  to your own $PATH to not have to do this after every time you log in.
Anyway, you are now ready to enter the command to start the setup step:
Jelly.py setup <jellyprotocol.xml>

This step should be quite fast. To get additional information about these steps, you can just add the -h tag instead of your protocol. After it is done, you can continue with the mapping step. Remember that the following steps may take a little longer depending on your data, so you may want to start a screen with:
screen -S <screenname>

The blasr mapping is then started with:
Jelly.py mapping <jellyprotocol.xml>

Then, continue by supporting the gaps with:
Jelly.py support <jellyprotocol.xml>

Then, to extract the useful reads:
Jelly.py extraction <jellyprotocol.xml>

Then, to finally assemble the gaps.
Jelly.py assembly <jellyprotocol.xml>

Note that you can add the -x “--nproc=4” flag to run the assembly on 4 cores, as an example, if you need to.
To then collect all of the data you have generated into 3 files:
Jelly.py output <jellyprotocol.xml>

Which will collect the results into the files:
<outputDir>/jelly.out.fasta
<outputDir>/jelly.out.qual
<outputDir>/liftOverTable.json

You can look inside jelly.out.fasta to see what has happened to your sequence or look in output.out and output.err. If there are still gaps remaining in your scaffold, you can use the
grep -Ho N jelly.out.fasta | uniq -c

command to see the number of N in the file and compare to the number of N in your original file.
If there is anything further that needs to be added to this tutorial, feel free to do so.
/Alvar