The most widely used technology for next-generation sequencing (NGS) is Illumina sequencing. Other platforms cannot compete with its speed, price and output, so they have specialized in niche applications (not discussed here).
Nevertheless, no sequencing technology can simply read a chromosome from one end to the other.
The approach is therefore to:
- cut the genome into many small pieces (fragments) that can be sequenced individually
- sequence all of these fragments in parallel; the sequence obtained from each fragment is a sequencing read
- map each read to its position in a reference genome
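The cut/sequence/map idea can be illustrated with a toy Python sketch. It assumes error-free reads taken from the forward strand only and uses exact string matching. Real aligners such as BWA or Bowtie2 build an index of the genome, tolerate mismatches and check both strands. The function names are hypothetical.

```python
import random


def fragment_genome(genome: str, read_length: int, n_reads: int, seed: int = 0) -> list[tuple[int, str]]:
    """Toy stand-in for library preparation + sequencing: cut out reads of a
    fixed length at random start positions. Returns (true_start, read) pairs
    so that the mapping step can be checked afterwards."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        reads.append((start, genome[start:start + read_length]))
    return reads


def map_read(read: str, reference: str) -> list[int]:
    """Toy mapper: every 0-based position where the read matches the
    reference exactly (no mismatches, forward strand only)."""
    hits = []
    pos = reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)
    return hits


reference = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
for true_start, read in fragment_genome(reference, read_length=8, n_reads=5):
    print(true_start, read, map_read(read, reference))

print(map_read("GCAT", reference))  # [5, 12, 19]
```

The last line shows why short reads are difficult in repetitive sequence. `GCAT` occurs three times in this reference, so the read alone cannot tell us which copy it came from.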
As a result, every NGS application shares several common steps:
- prepare a library of DNA fragments to be sequenced - this step is called library preparation
- sequence the library - this step is called sequencing
- assess the quality of the raw reads - this step is called quality control
- determine the position of each read in the genome - this step is called mapping
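Quality control starts from the per-base quality scores stored with each read. The following is a minimal sketch of how those scores are decoded and summarized. It assumes the Phred+33 encoding used by current Illumina FASTQ files, and the helper names are hypothetical. Tools such as FastQC report similar summaries, computed per position across millions of reads.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string; by default assumes Phred+33
    (one ASCII character per base, score = ASCII code - 33)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_quality(quality: str) -> float:
    """Average Phred score over all bases of one read."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores)


def fraction_at_least(quality: str, threshold: int = 30) -> float:
    """Fraction of bases with a Phred score >= threshold (e.g. '% >= Q30')."""
    scores = phred_scores(quality)
    return sum(q >= threshold for q in scores) / len(scores)


qual = "II5#"                    # 'I' = Q40, '5' = Q20, '#' = Q2
print(phred_scores(qual))        # [40, 40, 20, 2]
print(mean_quality(qual))        # 25.5
print(fraction_at_least(qual))   # 0.5
print(error_probability(20))     # ~0.01, i.e. about 1 wrong base in 100
```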
From here on, the analysis depends on the application:
- de novo whole-genome sequencing - determine the genome sequence of a species that has never been sequenced before and build a reference genome from it; since there is no reference to map to yet, reads are pieced together by their overlaps - this is called genome assembly
- whole-genome sequencing - determine the genome sequence of an individual from a species that already has a reference genome and identify where it differs from the reference - this is called variant calling
- RNA sequencing (RNA-seq) - convert the RNA molecules in cells to cDNA, sequence the cDNA, map the reads to the genome (mapping) and count how many reads come from each gene - this is called gene expression profiling
- "chromatin profiling" (ChIP-seq, ATAC-seq, DNase-seq) - select the regions of the genome bound by a given protein (ChIP-seq) or in an open, accessible conformation (ATAC-seq, DNase-seq), build a library from those regions only, sequence it and measure the abundance of reads along the genome; regions with more reads mark protein binding sites or open chromatin
- ... and many others, but the ones above account for most (>80%) of use cases.
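RNA-seq and chromatin profiling both come down to counting mapped reads in regions of the genome. For RNA-seq those regions are genes; for ChIP-seq and ATAC-seq they are candidate binding sites or open regions. The following naive sketch uses hypothetical names and assumes the alignments have already been reduced to (chromosome, start, end) coordinates.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def count_reads_per_region(reads, regions) -> dict[str, int]:
    """Count mapped reads overlapping each named region.

    reads:   iterable of (chrom, start, end) alignment coordinates
    regions: iterable of (name, chrom, start, end), e.g. genes or peaks
    All coordinates are 0-based and half-open, as in BED.
    """
    regions = list(regions)
    counts = {name: 0 for name, _, _, _ in regions}
    for r_chrom, r_start, r_end in reads:
        for name, g_chrom, g_start, g_end in regions:
            if r_chrom == g_chrom and overlaps(r_start, r_end, g_start, g_end):
                counts[name] += 1
    return counts


genes = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 300, 400), ("geneC", "chr2", 0, 50)]
reads = [("chr1", 90, 140), ("chr1", 150, 190), ("chr1", 195, 245),
         ("chr1", 250, 300), ("chr1", 350, 400), ("chr2", 10, 60)]
print(count_reads_per_region(reads, genes))  # {'geneA': 3, 'geneB': 1, 'geneC': 1}
```

The read at chr1:250-300 ends exactly where geneB begins, so with half-open coordinates it is not counted. Dedicated counting tools (e.g. featureCounts or htseq-count) also deal with strandedness, spliced reads and reads overlapping several genes. This double loop is only meant for toy data.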
Key terms (make sure you understand these!):
- sequencing library (or simply library)
- library fragment (or simply fragment)
- sequencing read (or simply read)
- mapping
- alignment
- variant
- gene
- transcript
Software tools (look up at least one per category):
- Aligners
  - BWA
  - Bowtie2
- Variant calling
  - GATK
  - samtools
- Differential expression
  - Cufflinks
  - DESeq
- Genome browsers (to visualize reads and regions in the genome)
  - UCSC genome browser
  - IGV genome browser
- General purpose (e.g. format conversion)
  - samtools
  - bedtools
- FastQC (raw read quality control)
File formats (good to know, but don't worry too much right now):
- FASTQ - format to store reads together with a quality score for each base
- SAM/BAM - format to store reads, their alignments to the reference and their quality scores (BAM is the compressed binary form of SAM)
- VCF/BCF - format to store called variants (BCF is the binary form of VCF)
- BED - format to store regions of the genome (intervals) and their annotations
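To make two of these formats less abstract, here is a minimal sketch of reading FASTQ and BED in plain Python. It assumes unwrapped 4-line FASTQ records and BED data lines without `track`, `browser` or comment headers. The class and function names are hypothetical; real pipelines would use established parsers (e.g. Biopython or pysam) instead. BED coordinates are 0-based with an exclusive end, so `end - start` is the interval length.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class FastqRecord(NamedTuple):
    name: str       # header line without the leading '@'
    sequence: str   # bases
    quality: str    # one quality character per base


class BedInterval(NamedTuple):
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # exclusive
    name: Optional[str] = None


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ: @name / sequence / + / quality."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def parse_bed_line(line: str) -> BedInterval:
    """Parse one tab-separated BED data line (chrom, start, end[, name, ...])."""
    fields = line.rstrip("\r\n").split("\t")
    name = fields[3] if len(fields) > 3 else None
    return BedInterval(fields[0], int(fields[1]), int(fields[2]), name)


fastq_text = "@read1\nACGT\n+\nII5#\n@read2\nGGCC\n+\nIIII\n"
for rec in parse_fastq(fastq_text.splitlines()):
    print(rec.name, rec.sequence, rec.quality)

gene = parse_bed_line("chr1\t100\t200\tgeneA")
print(gene.chrom, gene.start, gene.end, "length:", gene.end - gene.start)  # chr1 100 200 length: 100
```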
https://www.youtube.com/watch?v=womKfikWlxM
...