@PeterKneale
Last active May 18, 2023 03:02
study - genetics

Courses

MITx

  • 7.00x Introduction to Biology - The Secret of Life

  • 7.03.1x Genetics: The Fundamentals

  • 7.03.2x Genetics: Analysis and Application

  • 7.05x Biochemistry: Biomolecules, Methods, and Mechanisms

  • 7.QBWx Quantitative Biology Workshop

  • 7.06.1x Cell Biology: Transport & Signaling, open for self-paced learning

  • 7.06.2x Cell Biology: The Cytoskeleton and Cell Cycle, open for self-paced learning

  • 7.06.3x Cell Biology: Cell-Cell Interactions, open for self-paced learning

  • 7.28.1x Molecular Biology: DNA Replication and Repair

  • 7.28.2x Molecular Biology: Transcription and Transposition

  • 7.28.3x Molecular Biology: RNA Processing and Translation

Coursera

Definitions

Basic Genetics:

  • DNA (Deoxyribonucleic acid) - The molecule that encodes genetic information and is the primary hereditary material in most organisms.
  • RNA (Ribonucleic acid) - A nucleic acid molecule involved in various biological roles, including coding, decoding, regulation, and expression of genes.
  • Gene - A segment of DNA that encodes a specific functional product, such as a protein or RNA molecule.
  • Genome - The complete set of genetic material (DNA) in an organism, containing all its genes.
  • Chromosome - A thread-like structure of nucleic acids and proteins found in the nucleus of most living cells, carrying genetic information in the form of genes.
  • Autosome - A chromosome that is not a sex chromosome.
  • Sex chromosome - A chromosome involved in determining the sex of an organism, typically the X and Y chromosomes in mammals.
  • X chromosome - One of the two sex chromosomes in mammals. Females typically have two X chromosomes, while males have one X and one Y chromosome.
  • Y chromosome - One of the two sex chromosomes in mammals, typically present only in males.
  • Allele - One of two or more alternative forms of a gene that arise by mutation and are found at the same place on a chromosome.
  • Homozygous - Having two identical alleles of a particular gene or genes.
  • Heterozygous - Having two different alleles of a particular gene or genes.
  • Hemizygous - Describes an individual who has only one member of a chromosome pair or chromosome segment rather than the usual two. Hemizygosity is often used to describe X-linked genes in males who have only one X chromosome.
  • Dominant allele - An allele that is fully expressed in the phenotype of a heterozygote, masking the effect of the recessive allele.
  • Recessive allele - An allele that is masked by the presence of a dominant allele and only expressed in the phenotype when both copies are recessive.
  • Co-dominant alleles - Alleles that both contribute to the phenotype when present in a heterozygote.
  • Genotype - The genetic makeup of an organism, which determines its heritable characteristics.
  • Phenotype - The observable characteristics of an organism resulting from the interaction of its genotype with the environment.
  • Penetrance - The proportion of individuals with a particular genotype that actually display the phenotype associated with that genotype.
  • Expressivity - The degree to which a genotype is expressed in an individual's phenotype, which can be influenced by genetic background and environmental factors.
  • Mendelian inheritance - The patterns of inheritance of traits determined by single genes with two alleles, one of which may be dominant over the other.
  • Population genetics - The study of genetic variation within populations and how it changes over time and is shaped by factors such as mutation, selection, and genetic drift.
  • Locus - The specific location of a gene or DNA sequence on a chromosome.
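
The relationship between alleles, genotype, zygosity, and phenotype defined above can be illustrated with a single-gene cross. The following is a minimal sketch, assuming one gene with two alleles written as an uppercase (dominant) and lowercase (recessive) letter, complete dominance, and equal probability for each gamete. The function names `cross`, `zygosity`, and `phenotype` are hypothetical helpers, not part of any library.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes from a single-gene cross.

    Each parent is a two-letter genotype such as "Aa". Every gamete is
    assumed to carry one of the parent's two alleles with equal probability.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" count as the same genotype
        # (uppercase letters sort before lowercase ones).
        offspring["".join(sorted((allele1, allele2)))] += 1
    return offspring


def zygosity(genotype: str) -> str:
    """Label a two-allele genotype as homozygous or heterozygous."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"


def phenotype(genotype: str) -> str:
    """Assume complete dominance: one uppercase allele is enough."""
    return "dominant" if any(a.isupper() for a in genotype) else "recessive"


genotypes = cross("Aa", "Aa")
phenotypes = Counter()
for genotype, count in genotypes.items():
    phenotypes[phenotype(genotype)] += count

print(dict(genotypes))   # {'AA': 1, 'Aa': 2, 'aa': 1}
print(dict(phenotypes))  # {'dominant': 3, 'recessive': 1}
print(zygosity("Aa"))    # heterozygous
```

Crossing two heterozygotes gives the 1:2:1 genotype ratio and the 3:1 phenotype ratio expected under Mendelian inheritance with complete dominance.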

DNA Structure and Function:

  • Donor site - In the context of RNA splicing, the donor site refers to the 5' end of an intron, where the GU nucleotide sequence is located, and is involved in the splicing process that removes the intron from the pre-mRNA molecule.
  • Acceptor site - In the context of RNA splicing, the acceptor site refers to the 3' end of an intron, where the AG nucleotide sequence is located, and is involved in the splicing process that removes the intron from the pre-mRNA molecule.
  • Branch point - A conserved adenosine nucleotide within an intron, usually located near its 3' end, whose 2'-hydroxyl group attacks the donor site in the first step of splicing, forming the lariat intermediate.
  • Lariat - A lasso-shaped RNA structure formed during splicing when the branch point adenosine forms a 2',5'-phosphodiester bond with the 5' end of the intron. In the second step of splicing, the exons are joined at the acceptor site and the intron is released as a lariat.
  • Introns - Non-coding sequences within a gene that are removed during RNA splicing.
  • Exons - Sequences within a gene that are retained and joined together during RNA splicing to form the mature mRNA molecule. Exons contain the coding sequence as well as the untranslated regions.
  • Splicing - The process of removing introns from pre-mRNA and joining exons together to form a mature mRNA molecule.
  • Alternative splicing - A regulated process in which different combinations of exons are joined together to create multiple mRNA molecules from a single gene, increasing the diversity of proteins that can be produced.
  • Spliceosome - A complex of RNA and protein molecules that carries out the splicing of pre-mRNA.
  • 5' UTR (Untranslated region) - The region of an mRNA molecule upstream of the coding sequence that is not translated into a protein, but may have regulatory roles.
  • 3' UTR (Untranslated region) - The region of an mRNA molecule downstream of the coding sequence that is not translated into a protein, but may have regulatory roles, such as mRNA stability and translation efficiency.
  • Promoter region - A DNA sequence upstream of a gene that binds RNA polymerase and transcription factors, initiating transcription of the gene.
  • Enhancer region - A DNA sequence that increases the rate of transcription of a gene when bound by specific proteins, often located far from the gene it regulates.
  • Silencer region - A DNA sequence that decreases the rate of transcription of a gene when bound by specific proteins.
  • Transcription factor binding site - A specific DNA sequence recognized and bound by transcription factors, which regulate gene expression.
  • Operon - A group of genes that are transcribed together into a single mRNA molecule, often encoding proteins involved in a common function or pathway, primarily found in prokaryotes.
  • Codon - A sequence of three nucleotides in mRNA that corresponds to a specific amino acid or stop signal during translation.
  • Start codon - The codon (usually AUG) that initiates translation and codes for the amino acid methionine.
  • Stop codon - One of the three codons (UAA, UAG, UGA) that do not code for an amino acid and signal the termination of translation.
  • Reading frame - The way in which nucleotides in a DNA or RNA sequence are grouped into sets of three to be read as codons during translation.
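
The splicing and translation terms above can be tied together in a short sketch. It assumes a toy pre-mRNA with one intron whose coordinates are already known, the standard genetic code, and translation starting at the first AUG. The sequence and the helper names `splice` and `translate` are made up for illustration; real splice-site recognition depends on much more than the GU and AG dinucleotides.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as 0-based, half-open (start, end) intervals."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Check the donor (GU) and acceptor (AG) dinucleotides.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    # The start codon fixes the reading frame: step three bases at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy transcript: exon 1 (with a 2-base 5' UTR), one intron, exon 2.
pre_mrna = "GGAUGGCU" + "GUAAGUCAG" + "UGGUAACC"
mature_mrna = splice(pre_mrna, [(8, 17)])

print(mature_mrna)             # GGAUGGCUUGGUAACC
print(translate(mature_mrna))  # MAW
```

The codons read here are AUG (Met), GCU (Ala), UGG (Trp), and UAA (stop). The bases before AUG and after UAA play the role of the 5' and 3' UTRs.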

Chromosomes and Mutations:

  • Karyotype - The number and appearance of chromosomes in the nucleus of a eukaryotic cell, including their size, shape, and banding patterns.
  • Mitosis - The process by which a cell divides its nucleus, resulting in two genetically identical daughter cells.
  • Meiosis - The process by which a diploid cell undergoes two successive divisions to produce four haploid cells, called gametes, which are used in sexual reproduction.
  • Nondisjunction - The failure of homologous chromosomes or sister chromatids to separate properly during cell division, leading to aneuploidy.
  • Aneuploidy - The presence of an abnormal number of chromosomes in a cell, usually due to nondisjunction.
  • Trisomy - A form of aneuploidy in which an organism has three copies of a particular chromosome instead of the normal two.
  • Monosomy - A form of aneuploidy in which an organism has only one copy of a particular chromosome instead of the normal two.
  • Polyploidy - The presence of more than two complete sets of chromosomes in a cell or organism.
  • Telomere - The repetitive nucleotide sequences at the ends of eukaryotic chromosomes that protect the chromosome from degradation and help maintain genomic stability.
  • Centromere - The region of a chromosome where sister chromatids are joined together and where the spindle fibers attach during cell division.
  • Chromosomal translocation - A type of chromosomal abnormality in which a segment of DNA is moved from one chromosome to another, potentially disrupting gene function.
  • Deletion mutation - A type of mutation in which a segment of DNA is lost, or deleted, potentially disrupting gene function.
  • Duplication mutation - A type of mutation in which a segment of DNA is duplicated, potentially disrupting gene function or leading to the production of extra gene products.
  • Inversion mutation - A type of mutation in which a segment of DNA is reversed in orientation, potentially disrupting gene function.
  • Frameshift mutation - A type of mutation in which a number of nucleotides not divisible by three is inserted or deleted, causing a shift in the reading frame and altering the amino acid sequence of the resulting protein downstream of the mutation.
  • Point mutation - A type of mutation in which a single nucleotide is changed, potentially altering the amino acid sequence of the resulting protein.
  • Missense mutation - A type of point mutation that results in the substitution of one amino acid for another in the protein sequence.
  • Nonsense mutation - A type of point mutation that results in the premature termination of protein synthesis due to the introduction of a stop codon.
  • Silent mutation - A type of point mutation that does not result in a change in the amino acid sequence of the protein.
  • Crossing over - The exchange of genetic material between homologous chromosomes during meiosis, resulting in a recombination of parental genes and increased genetic diversity.
  • Recombination - The process by which genetic material is rearranged, either through crossing over during meiosis or other mechanisms, leading to new combinations of alleles.
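
The distinction between silent, missense, and nonsense point mutations, and the condition for a frameshift, can be expressed in a few lines. This is a minimal sketch assuming the standard genetic code, mRNA codons, and an original codon that encodes an amino acid. The function names `classify_point_mutation` and `causes_frameshift` are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutation(codon: str, mutant_codon: str) -> str:
    """Classify a single-nucleotide change within one sense codon."""
    differences = sum(x != y for x, y in zip(codon, mutant_codon))
    if len(codon) != 3 or len(mutant_codon) != 3 or differences != 1:
        raise ValueError("codons must differ at exactly one position")
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant_codon]
    if before == after:
        return "silent"      # same amino acid
    if after == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # different amino acid


def causes_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the frame unless it is a multiple of 3."""
    return indel_length % 3 != 0


print(classify_point_mutation("GAG", "GAA"))  # silent   (Glu -> Glu)
print(classify_point_mutation("GAG", "GUG"))  # missense (Glu -> Val)
print(classify_point_mutation("GAG", "UAG"))  # nonsense (Glu -> stop)
print(causes_frameshift(1), causes_frameshift(3))  # True False
```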

Genomic Techniques:

  • Sequencing - The process of determining the order of nucleotides in a DNA or RNA molecule.
  • Next-generation sequencing (NGS) - A high-throughput sequencing technology that allows for the rapid sequencing of millions of DNA fragments simultaneously.
  • Whole-genome sequencing (WGS) - A method of determining the complete DNA sequence of an organism's genome.
  • Whole-exome sequencing (WES) - A method of sequencing only the protein-coding regions of a genome, which comprise about 1-2% of the total genome.
  • Sanger sequencing - A method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides during in vitro DNA replication.
  • Polymerase chain reaction (PCR) - A molecular biology technique used to amplify a specific DNA sequence, generating millions of copies in a short period of time.
  • Quantitative PCR (qPCR) - A variation of PCR that enables the quantification of the amount of DNA or RNA in a sample.
  • Reverse transcription PCR (RT-PCR) - A variation of PCR that allows the amplification of RNA by first reverse transcribing the RNA into complementary DNA (cDNA).
  • Gene expression - The process by which the information contained within a gene is used to produce a functional product, such as a protein or RNA molecule.
  • Microarray - A high-throughput technology used to analyze gene expression patterns by hybridizing labeled cDNA or RNA to an array of DNA probes immobilized on a solid surface.
  • RNA-seq - A high-throughput sequencing method used to analyze the transcriptome, or the complete set of RNA molecules, in a cell or tissue at a given time.
  • ChIP-seq - A method used to analyze protein-DNA interactions by combining chromatin immunoprecipitation (ChIP) with high-throughput sequencing.
  • GWAS (Genome-Wide Association Study) - A study in which the genomes of many individuals are analyzed to identify genetic variants associated with a particular trait or disease.
  • Genetic linkage - The phenomenon in which genes that are close together on a chromosome tend to be inherited together because they are less likely to be separated by recombination during meiosis.
  • Linkage mapping - A method of determining the relative positions of genes on a chromosome based on the frequency at which they are inherited together, which is related to their physical distance.
  • CRISPR-Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) - A powerful gene-editing technology that allows for the targeted modification of specific DNA sequences in the genome.
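
Linkage mapping, as defined above, rests on simple arithmetic: the fraction of recombinant offspring estimates how far apart two genes are. The sketch below assumes a test cross in which recombinant offspring can be counted directly, and uses the common approximation that 1% recombination corresponds to 1 centimorgan (cM), which only holds for short distances. The offspring counts are invented for illustration.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of offspring that carry a recombinant allele combination."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def map_distance_cm(recombinants: int, total: int) -> float:
    """Approximate map distance in centimorgans (1% recombination = 1 cM)."""
    recombination_frequency(recombinants, total)  # reuse the input checks
    return 100 * recombinants / total


# Hypothetical test cross: 1000 offspring, 120 of them recombinant.
frequency = recombination_frequency(120, 1000)
distance = map_distance_cm(120, 1000)

print(f"recombination frequency: {frequency:.2f}")  # 0.12
print(f"approximate distance: {distance} cM")       # 12.0 cM
print("linked" if frequency < 0.5 else "consistent with independent assortment")
```

A recombination frequency close to 50% is what unlinked genes produce, so values well below that are what indicate genetic linkage.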

Bioinformatics and Computational Biology:

  • Sequence alignment - The process of comparing and aligning two or more nucleotide or amino acid sequences to identify regions of similarity, which may suggest functional, structural, or evolutionary relationships.
  • Multiple sequence alignment - The process of comparing and aligning three or more nucleotide or amino acid sequences to identify regions of similarity.
  • Homology - The similarity in nucleotide or amino acid sequences between two or more molecules due to shared ancestry.
  • Orthologs - Genes in different species that evolved from a common ancestral gene through speciation, often retaining the same function.
  • Paralogs - Genes within the same species that arose through gene duplication, often resulting in new functions.
  • Phylogenetics - The study of the evolutionary relationships among species, populations, or genes, often represented as a phylogenetic tree.
  • Hidden Markov Model (HMM) - A statistical model that represents a system that transitions between a finite set of hidden states, with each state generating an observable output according to a probability distribution, often used in bioinformatics for sequence analysis.
  • Gene Ontology (GO) - A standardized system for describing the functions, processes, and cellular locations of gene products in a species-independent manner.
  • BLAST (Basic Local Alignment Search Tool) - A widely used algorithm for comparing a query sequence against a database of sequences, identifying regions of similarity.
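
To make the idea of sequence alignment concrete, here is a small sketch of one classic approach, Needleman-Wunsch global alignment by dynamic programming. It is not how BLAST works (BLAST uses a faster heuristic for local alignment), and the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions chosen for illustration.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        substitution = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


best_score, top, bottom = global_align("ACGT", "AGT")
print(best_score)  # 2
print(top)         # ACGT
print(bottom)      # A-GT
```

The gap character marks a position where one sequence has no counterpart in the other. Multiple sequence alignment tools extend the same idea to three or more sequences with additional heuristics.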

File Formats:

  • FASTA format - A text-based format for representing nucleotide or amino acid sequences, using single-letter codes and a header line with a description.
  • FASTQ format - A text-based format for representing nucleotide sequences and their corresponding quality scores, used in next-generation sequencing.
  • SAM (Sequence Alignment/Map) format - A text-based format for representing sequence alignment data, including both aligned and unaligned sequences.
  • BAM (Binary Alignment/Map) format - A binary version of the SAM format, providing more efficient storage and retrieval of sequence alignment data.
  • VCF (Variant Call Format) - A text-based format for representing genetic variation data, such as single nucleotide polymorphisms (SNPs), insertions, and deletions.
  • GFF (General Feature Format) - A text-based format for representing genomic features, such as genes, exons, and regulatory regions, on a reference sequence.
  • GTF (Gene Transfer Format) - A variation of the GFF format that is specifically designed for representing gene and transcript annotations.
  • BED (Browser Extensible Data) format - A text-based format for representing genomic regions of interest, such as genes or regulatory elements, and their associated annotations.
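
The FASTA and FASTQ formats are simple enough to read with a few lines of code. The sketch below assumes well-formed input, four-line FASTQ records, and Phred+33 quality encoding (the quality character's ASCII code minus 33), which is an assumption about the data rather than something stated above. The function names are hypothetical; established libraries such as Biopython provide more robust parsers.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each header line (without '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def parse_fastq(text: str) -> list[tuple[str, str, list[int]]]:
    """Return (read id, sequence, Phred scores) for each four-line record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, _separator, quality = lines[i:i + 4]
        scores = [ord(char) - 33 for char in quality]  # Phred+33 assumed
        records.append((header[1:], sequence, scores))
    return records


def error_probability(phred_score: int) -> float:
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-phred_score / 10)


fasta = ">seq1 example\nACGT\nACGT\n>seq2\nGGCC\n"
fastq = "@read1\nACGT\n+\nII5!\n"

print(parse_fasta(fasta))  # {'seq1 example': 'ACGTACGT', 'seq2': 'GGCC'}
print(parse_fastq(fastq))  # [('read1', 'ACGT', [40, 40, 20, 0])]
```

A Phred score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.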

Software Tools:

  • IGV (Integrative Genomics Viewer) - A visualization tool for the interactive exploration of genomic data, such as sequence alignments, variant calls, and genomic annotations.
  • Bowtie - A fast and memory-efficient tool for aligning short DNA sequences to a large reference genome, such as the human genome.
  • BWA (Burrows-Wheeler Aligner) - A software package for mapping low-divergent sequences against a large reference genome, using the Burrows-Wheeler Transform for efficient indexing.
  • GATK (Genome Analysis Toolkit) - A software package for the analysis of high-throughput sequencing data, including variant discovery, genotyping, and quality control.
  • Bioconductor - An open-source software project for the analysis and comprehension of high-throughput genomic data, providing a collection of R packages for various bioinformatics tasks.
  • Biopython - A set of freely available tools for biological computation written in Python, providing functions for working with sequence data, alignments, phylogenetics, and more.
  • BioPerl - A collection of Perl modules for the development of bioinformatics scripts and applications, providing functionality for working with sequence data, alignments, and other bioinformatics tasks.
  • BioJava - A library of Java classes for bioinformatics, providing tools for working with sequences, alignments, phylogenetics, and other data types.
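
The Burrows-Wheeler Transform mentioned under BWA can be demonstrated on a short string. This is a naive sketch that sorts every rotation of the text, which is fine for illustration but far too slow and memory-hungry for a genome. Real aligners build the transform through suffix arrays and add index structures on top. The `$` terminator is assumed to sort before every other character in the text.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (naive version)."""
    if terminator in text:
        raise ValueError("text must not contain the terminator")
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, terminator: str = "$") -> str:
    """Rebuild the original text by repeatedly prepending and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(char + row for char, row in zip(transformed, table))
    # The row ending in the terminator is the original text.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


print(bwt("banana"))           # annb$aa
print(inverse_bwt("annb$aa"))  # banana
print(inverse_bwt(bwt("GATTACA")))  # GATTACA
```

The transform tends to group identical characters together and is reversible, which is what makes it useful as the basis of a compact, searchable index of a reference genome.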

Databases and Resources:

  • GenBank - A public database of nucleotide sequences and their protein translations, maintained by the National Center for Biotechnology Information (NCBI).
  • RefSeq (Reference Sequence) - A curated, non-redundant database of nucleotide and protein sequences, maintained by the NCBI and designed to provide a standard set of reference sequences for various organisms.
  • UniProt (Universal Protein Resource) - A comprehensive and high-quality resource of protein sequence and functional information, maintained by the UniProt Consortium (EMBL-EBI, the SIB Swiss Institute of Bioinformatics, and the Protein Information Resource).
  • Ensembl - A genome database that provides access to reference genomes, gene annotations, and other genomic information for a variety of organisms, maintained by the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute.
  • ClinVar - A public database of genetic variations and their relationship to human health, maintained by the NCBI.
  • dbSNP (Single Nucleotide Polymorphism Database) - A public database of genetic variations, including single nucleotide polymorphisms (SNPs), insertions, deletions, and other types of variations, maintained by the NCBI.
  • OMIM (Online Mendelian Inheritance in Man) - A comprehensive, authoritative compendium of human genes and genetic phenotypes, focusing on the relationship between genotype and phenotype in humans.
  • STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) - A database of known and predicted protein-protein interactions, including direct physical interactions as well as indirect functional associations.
  • KEGG (Kyoto Encyclopedia of Genes and Genomes) - A database resource for understanding high-level functions and utilities of biological systems, integrating genomic, transcriptomic, and proteomic information.
  • Pfam - A database of protein families and domains, providing multiple sequence alignments, hidden Markov models, and functional annotations for each family.
  • InterPro - A database that integrates protein signature databases, providing a single resource for protein classification and functional annotation.
  • PDB (Protein Data Bank) - A repository for the 3D structural data of proteins, nucleic acids, and complex assemblies, providing structural information derived from experimental methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy.
  • GEO (Gene Expression Omnibus) - A public functional genomics data repository, supporting MIAME-compliant data submissions and providing a wide variety of high-throughput experimental data, such as microarray, next-generation sequencing, and mass spectrometry data.
  • SRA (Sequence Read Archive) - A public database that stores raw sequencing data generated from next-generation sequencing platforms, along with metadata describing the samples and experiments.
  • ArrayExpress - A public repository for microarray and other high-throughput functional genomics data, maintained by the European Bioinformatics Institute (EBI).
  • Reactome - A database of biological pathways and processes, providing a detailed representation of cellular processes and molecular interactions.
  • Cytoscape - An open-source software platform for visualizing complex networks and integrating these networks with any type of attribute data, such as gene expression data or protein-protein interactions.
  • MEME Suite - A collection of tools for the discovery and analysis of sequence motifs, including de novo motif discovery, motif scanning, and motif comparison.
  • HMMER - A software suite for searching sequence databases for homologs of protein sequences, using profile hidden Markov models for increased sensitivity and specificity.
  • Clustal - A widely used software package for multiple sequence alignment, providing both command-line and graphical user interface options for performing alignments.
  • MUSCLE - A software package for multiple sequence alignment, offering high accuracy and fast alignment of large datasets.
  • Jalview - A multiple sequence alignment editor and visualization tool, providing various options for alignment editing, sequence annotation, and structural visualization.
  • T-Coffee - A software package for multiple sequence alignment, combining different alignment methods to produce a single, high-quality alignment.
  • MAFFT - A software package for multiple sequence alignment, providing various alignment strategies and options for large-scale alignments.

Genetic Markers and Variation:

  • RAPD (Random Amplified Polymorphic DNA) - A molecular marker technique that uses short, arbitrary DNA primers to amplify random fragments of genomic DNA, providing a fingerprint of the genome that can be used for genetic analysis.
  • SSR (Simple Sequence Repeat) - A type of molecular marker based on the presence of tandemly repeated nucleotide sequences, also known as microsatellites, which can be used for genetic analysis due to their high level of polymorphism.
  • SNP (Single Nucleotide Polymorphism) - A variation in a single nucleotide that occurs at a specific position in the genome, where each variation is present at a frequency of greater than 1% in the population.
  • CNV (Copy Number Variation) - A type of structural variation in the genome that involves the gain or loss of DNA segments, which can result in the duplication or deletion of genes and other functional elements.
  • Locus - A specific location on a chromosome where a particular gene or other DNA sequence is found.
  • Synteny - The conservation of gene order and orientation between homologous genomic regions in different species, which can provide evidence for shared ancestry and help to identify orthologous genes.
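
The 1% frequency threshold in the SNP definition above can be checked from genotype counts. The sketch below assumes a diploid population sample at a single site with two alleles. The counts and the function names `allele_frequencies` and `is_common_variant` are invented for illustration.

```python
from collections import Counter


def allele_frequencies(genotype_counts: dict[str, int]) -> dict[str, float]:
    """Convert counts of diploid genotypes (e.g. "AG") to allele frequencies."""
    allele_counts: Counter = Counter()
    for genotype, individuals in genotype_counts.items():
        for allele in genotype:
            # Each individual contributes one copy per letter in its genotype.
            allele_counts[allele] += individuals
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}


def is_common_variant(frequencies: dict[str, float], threshold: float = 0.01) -> bool:
    """True if at least two alleles exist and each exceeds the threshold."""
    return len(frequencies) > 1 and min(frequencies.values()) > threshold


# Hypothetical sample of 100 individuals genotyped at one site.
common = allele_frequencies({"AA": 90, "AG": 9, "GG": 1})
# Hypothetical sample of 200 individuals with a single heterozygote.
rare = allele_frequencies({"AA": 199, "AG": 1})

print(is_common_variant(common))  # True
print(is_common_variant(rare))    # False
```

In the first sample the G allele makes up 11 of 200 allele copies (5.5%), so the site meets the threshold. In the second it makes up 1 of 400 (0.25%) and does not.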