lcolladotor/recountWorkshop2020_questions.md

## recountWorkshop2020_questions.md

Chat messages

Hi! Tomorrow is the workshop! ^^ I hope you are excited about it. You can find the materials at http://research.libd.org/recountWorkshop2020/index.html. I'll start the workshop with some slides about the recount2 project & friends, then we'll run some of the workshop code (it was originally designed for a 2-hour workshop). I'll finish with a few slides about the future and then we can have a Q & A session, though you are more than welcome to keep asking questions after the workshop through the different venues (Bioconductor support site for package questions, GitHub issues for feature requests, Slack for informal questions, etc.). See you tomorrow! Best, Leo
The slides are available at https://speakerdeck.com/lcolladotor
Questions and answers:


"How does the scale_counts() function work?" As described in the workshop and the associated recountWorkflow, it uses the area under the coverage (AUC) and the base-pair coverage counts. In practice, this data is stored in the RangedSummarizedExperiment objects: the AUC in colData(rse_gene) and the coverage counts in assays(rse_gene)$counts. scale_counts() combines them with a short formula to compute the scaled counts; by default, it rescales each sample to a library of 40 million reads (targetSize = 4e7).
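
The following is a minimal Python sketch of the AUC-based idea. It is for illustration only: it is not the recount R code, and the function and argument names are hypothetical. It assumes the default target of 40 million reads and a read length of 100 bp. Each sample's coverage is rescaled so that its total (the AUC) matches a library of target_size reads of read_length bp, and the result is then divided by read_length to express it in reads. The read length cancels out, so the result is simply coverage * target_size / AUC.

```python
def scale_by_auc(coverage_counts, auc, target_size=4e7, read_length=100, round_counts=True):
    """Hypothetical helper illustrating AUC scaling (not the recount implementation).

    coverage_counts: list of rows (one per gene), each holding one base-pair
        coverage sum per sample.
    auc: total coverage (area under coverage) per sample, in the same column order.
    """
    scaled = []
    for row in coverage_counts:
        new_row = []
        for bp, sample_auc in zip(row, auc):
            # Rescale so the sample's total coverage equals target_size * read_length bp...
            scaled_bp = bp * (target_size * read_length) / sample_auc
            # ...then convert base pairs to reads by dividing by the read length.
            # read_length cancels out: this equals bp * target_size / sample_auc.
            reads = scaled_bp / read_length
            new_row.append(round(reads) if round_counts else reads)
        scaled.append(new_row)
    return scaled
```

For example, a sample with an AUC of 4e8 (about 4 million reads of 100 bp) gets a factor of 4e7 / 4e8 = 0.1 per base pair of coverage. A gene with 400 bp of coverage (about 4 reads) therefore becomes 40 reads in the 40-million-read target library.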
"Can you use the misclassification to detect unhealthy tissue?" Maybe. I haven't tried it, so I don't know the answer. Feel free to try it =) and we'll be happy to help you if needed.
"The recount package is a fantastic resource for RNA-seq but is there any ambition or potential for a ChIP-seq resource? Perhaps a "recall" package. Additionally, could the infrastructure developed to analyse RNA-seq samples at scale be ported to other assay types?" We don't have the bandwidth for ChIP-seq data, so unfortunately we cannot promise that we'll be able to deliver something for that type of data, although we are happy to collaborate and explain what we've done for RNA-seq in more detail. As for other data types, Abhinav Nellore and his student are working on methylation data.
"Is recount2 a snapshot at a certain point in time, or is it growing with more samples over time? If so, how frequently?" It wasn't intended as a snapshot, but that's what happened, sorry! recount3 will be a major update and will also be updatable, since that was a major focus of the new design.
"It's an amazing resource. I wanted to know: is there any specific reason for providing the scaled counts and not the read counts? And how different are scaled counts from read counts?" The recount package has functions for converting the base-pair coverage counts to read counts. If we had provided the read counts, we would have lost some of that flexibility. We also did it this way because Rail-RNA never generates a BAM file (due to the costs of storing large files, even temporarily), which is why we came up with this new way of counting RNA-seq data.
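
Roughly, a feature's coverage count is the sum of the per-base coverage over it, so each aligned read contributes about its length in base pairs. Dividing by the average read length therefore approximates the number of reads. The Python sketch below illustrates that arithmetic under the assumption of single-end reads that fall fully inside the feature. The helper name is hypothetical. In R, recount's read_counts() is the package route, and its help page describes how paired-end data is handled, which this sketch does not reproduce.

```python
def approx_read_counts(coverage_counts, avg_read_length):
    """Hypothetical helper: approximate reads per feature from base-pair coverage.

    coverage_counts: list of rows (one per feature), one coverage sum per sample.
    avg_read_length: average mapped read length per sample, in the same column order.
    Reads that only partially overlap a feature contribute fewer base pairs,
    so this is an approximation, not an exact read count.
    """
    return [
        [bp / length for bp, length in zip(row, avg_read_length)]
        for row in coverage_counts
    ]
```

For example, 1,000 bp of coverage in a sample with 100 bp reads corresponds to about 10 reads.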
"If I wanted to compare my own dataset with datasets available in recount2, can I do that? Would I be able to analyse my own dataset the same way?" Yes, but running Rail-RNA is challenging. This won't be the case with recount3. With recount2, you can still compare data, say at the RPKM level, despite the different processing pipelines (see, for example, the main figure in the recount2 Nature Biotechnology paper http://www.nature.com/nbt/journal/v35/n4/full/nbt.3838.html).
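
RPKM (reads per kilobase of feature per million mapped reads) puts samples on a common scale by normalizing for both feature length and sequencing depth. That is why it can serve as a meeting point for data from different pipelines. The sketch below shows only the standard formula in Python. It assumes you already have read-level counts, feature lengths in bp, and total mapped reads per sample. The helper name is hypothetical. In R, recount provides getRPKM() for its own objects.

```python
def rpkm(read_counts, feature_lengths, total_mapped_reads):
    """Hypothetical helper computing standard RPKM values.

    read_counts: list of rows (one per feature), one read count per sample.
    feature_lengths: length in bp of each feature (one per row).
    total_mapped_reads: total mapped reads per sample (one per column).
    """
    result = []
    for row, length in zip(read_counts, feature_lengths):
        kilobases = length / 1e3
        # Reads per kilobase of feature, per million mapped reads.
        result.append([
            count / kilobases / (total / 1e6)
            for count, total in zip(row, total_mapped_reads)
        ])
    return result
```

For example, 500 reads on a 2,000 bp gene in a sample with 1 million mapped reads gives 500 / 2 / 1 = 250 RPKM.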
"Is the library type (single-end or paired-end) taken into account in the calculation by scale_counts()? That is, can different library types be compared?" Yes, they can be compared using scale_counts(), as described in its help page. There are a couple of ways of doing so (see its by argument).
"Have you compared recount2 count values against Kallisto, Salmon, etc.?" Our counts are different from theirs, though with the recount2 data you can estimate (note: estimate, not quantify) transcript-level counts as done in https://www.biorxiv.org/content/early/2018/05/25/247346. Kallisto and Salmon provide better transcript-level counts, though it's very useful to know that you can get a decent estimate from the recount2 data itself. Alternatively, you can use a TPM formula that relies on some assumptions, as done by Sonali et al. at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7026138/
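
For reference, the Python sketch below shows the textbook TPM (transcripts per million) definition for a single sample. It is not necessarily the exact formula or set of assumptions used by Sonali et al. It assumes read-level counts and feature lengths in bp. The helper name is hypothetical. Counts are first divided by length, and the resulting rates are rescaled so that they sum to one million within the sample.

```python
def tpm(read_counts, feature_lengths):
    """Hypothetical helper: textbook TPM for one sample.

    read_counts: read count per feature.
    feature_lengths: length in bp of each feature, in the same order.
    """
    # Length-normalize: reads per base pair of each feature.
    rates = [count / length for count, length in zip(read_counts, feature_lengths)]
    total = sum(rates)
    # Rescale so the sample's values sum to one million.
    return [rate / total * 1e6 for rate in rates]
```

Unlike RPKM, TPM values always sum to the same total within each sample. This can make relative abundances easier to compare across samples.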
"Are the recount data objects compatible with MultiAssayExperiment? For example, to combine the junction tables with mutation information." The recount2 objects are SummarizedExperiment (RangedSummarizedExperiment) objects, so they should be compatible with MultiAssayExperiment.
"What is the best way to find a suitable dataset? Is abstract_search() the only option?" You can search for studies using abstract_search(), https://jhubiostatistics.shinyapps.io/recount/, or https://jhubiostatistics.shinyapps.io/recount-brain/, or use add_metadata() or all_metadata() and filter by your criteria of interest.