@nikhilRP
Last active October 13, 2015 12:52
Utility Scala script to load and filter alignments
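// Spark RDD type
import org.apache.spark.rdd.RDD
// Parquet filter DSL (LongColumn, >=, <=, &&) and the predicate type it builds
import org.apache.parquet.filter2.dsl.Dsl._
import org.apache.parquet.filter2.predicate.FilterPredicate
// ADAM: projectable field names, projections, SparkContext extensions, record type
import org.bdgenomics.adam.projections.AlignmentRecordField._
import org.bdgenomics.adam.projections.Projection
import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.formats.avro.AlignmentRecord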
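// Assumes an ADAM-enabled spark-shell, where `sc` is the SparkContext.
val adamFile = "/user/nikhilrp/encoded-data/mm10/chr1/ENCFF891NNX.adam"
// Read only the columns that are needed.
val proj = Projection(readName, contig, start, end, qual)
// Keep reads that lie entirely within positions 16097631 to 17097631.
val pred: FilterPredicate = LongColumn("start") >= 16097631L && LongColumn("end") <= 17097631L
// Both the projection and the predicate are handed to the Parquet reader.
val adamRDD: RDD[AlignmentRecord] = sc.loadParquetAlignments(adamFile, projection = Some(proj), predicate = Some(pred))
// Count the matching reads.
adamRDD.count()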
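
The predicate above keeps only reads that fall entirely inside the region, so a read that straddles either boundary is dropped. The sketch below illustrates that rule next to an overlap rule using plain Scala collections, with no Spark or ADAM dependency. `Read`, `RegionFilterSketch`, `contained`, and `overlaps` are hypothetical names introduced for illustration, the sample reads are made up, and half-open `[start, end)` coordinates are assumed.

```scala
// Hypothetical stand-in for the projected fields; this is not an ADAM type.
final case class Read(readName: String, start: Long, end: Long)

object RegionFilterSketch {
  // Region bounds copied from the predicate in the snippet above.
  val regionStart = 16097631L
  val regionEnd = 17097631L

  // Same rule as the original predicate: the read lies entirely inside the region.
  def contained(r: Read): Boolean =
    r.start >= regionStart && r.end <= regionEnd

  // Alternative rule (an assumption, not part of the original gist):
  // the read shares at least one position with the region.
  def overlaps(r: Read): Boolean =
    r.start < regionEnd && r.end > regionStart

  // Made-up reads: one inside, one straddling the left edge, one past the right edge.
  val reads: Seq[Read] = Seq(
    Read("inside", 16100000L, 16100036L),
    Read("straddles-left", 16097600L, 16097650L),
    Read("outside", 17097631L, 17097667L)
  )

  def main(args: Array[String]): Unit = {
    println(reads.filter(contained).map(_.readName))
    println(reads.filter(overlaps).map(_.readName))
  }
}
```

If the overlap rule is what you want, the equivalent Parquet predicate would presumably be `LongColumn("end") > 16097631L && LongColumn("start") < 17097631L`; this has not been run against the file above.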