Skip to content

Instantly share code, notes, and snippets.

@sirenko
Created January 17, 2018 02:04
Show Gist options
  • Save sirenko/795c2aa60484d148780e4629750cfa0b to your computer and use it in GitHub Desktop.
Save sirenko/795c2aa60484d148780e4629750cfa0b to your computer and use it in GitHub Desktop.
Data Structures and Algorithms, UC San Diego

Data Structures and Algorithms | UC San Diego

Data Structures and Algorithms Specialization

Master Algorithmic Programming Techniques. Learn algorithms through programming and advance your software engineering or data science career

About This Specialization

This specialization is a mix of theory and practice: you will learn algorithmic techniques for solving various computational problems and will implement about 100 algorithmic coding problems in a programming language of your choice. No other online course in Algorithms even comes close to offering you a wealth of programming challenges that you may face at your next job interview. To prepare you, we invested over 3000 hours into designing our challenges as an alternative to multiple choice questions that you usually find in MOOCs. Sorry, we do not believe in multiple choice questions when it comes to learning algorithms…or anything else in computer science! For each algorithm you develop and implement, we designed multiple tests to check its correctness and running time — you will have to debug your programs without even knowing what these tests are! It may sound difficult, but we believe it is the only way to truly understand how the algorithms work and to master the art of programming. The specialization contains two real-world projects: Big Networks and Genome Assembly. You will analyze both road networks and social networks and will learn how to compute the shortest route between New York and San Francisco (1000 times faster than the standard shortest path algorithms!) Afterwards, you will learn how to assemble genomes from millions of short fragments of DNA and how assembly algorithms fuel recent developments in personalized medicine.

COURSE 1 - Algorithmic Toolbox

Current session: Jan 15

Commitment
5 weeks of study, 4-8 hours/week
Subtitles
English, Spanish

The course covers basic algorithmic techniques and ideas for computational problems arising frequently in practical applications: sorting and searching, divide and conquer, greedy algorithms, dynamic programming. We will learn a lot of theory: how to sort data and how it helps for searching; how to break a large problem into pieces and solve them recursively; when it makes sense to proceed greedily; how dynamic programming is used in genomic studies. You will practice solving computational problems, designing new algorithms, and implementing solutions efficiently (so that they run in less than a second).

WEEK 1 - Programming Challenges

Welcome to the first module of Data Structures and Algorithms! Here we will provide an overview of where algorithms and data structures are used (hint: everywhere) and walk you through a few sample programming challenges. The programming challenges represent an important (and often the most difficult!) part of this specialization because the only way to fully understand an algorithm is to implement it. Writing correct and efficient programs is hard; please don’t be surprised if they don’t work as you planned—our first programs did not work either! We will help you on your journey through the specialization by showing how to implement your first programming challenges. We will also introduce testing techniques that will help increase your chances of passing assignments on your first attempt. In case your program does not work as intended, we will show how to fix it, even if you don’t yet know which test your implementation is failing on.

  • Video · Welcome!
  • Programming Assignment · Programming Assignment 1: Programming Challenges
  • Practice Quiz · Solving Programming Challenges
  • Reading · Optional Videos and Screencasts
  • Video · Solving the Sum of Two Digits Programming Challenges (screencast)
  • Video · Solving the Maximum Pairwise Product Programming Challenge: Improving the Naive Solution, Testing, Debugging
  • Video · Stress Test - Implementation
  • Video · Stress Test - Find the Test and Debug
  • Video · Stress Test - More Testing, Submit and Pass!
  • Reading · Acknowledgements

WEEK 2 - Algorithmic Warm-up

In this module you will learn that programs based on efficient algorithms can solve the same problem billions of times faster than programs based on naïve algorithms. You will learn how to estimate the running time and memory of an algorithm without even implementing it. Armed with this knowledge, you will be able to compare various algorithms, select the most efficient ones, and finally implement them as our programming challenges!

  • Video · Why Study Algorithms?
  • Video · Coming Up
  • Video · Problem Overview
  • Video · Naive Algorithm
  • Video · Efficient Algorithm
  • Reading · Resources
  • Video · Problem Overview and Naive Algorithm
  • Video · Efficient Algorithm
  • Reading · Resources
  • Video · Computing Runtimes
  • Video · Asymptotic Notation
  • Video · Big-O Notation
  • Video · Using Big-O
  • Reading · Resources
  • Practice Quiz · Logarithms
  • Practice Quiz · Big-O
  • Practice Quiz · Growth rate
  • Video · Course Overview
  • Programming Assignment · Programming Assignment 2: Algorithmic Warm-up

WEEK 3 - Greedy Algorithms

In this module you will learn about seemingly naïve yet powerful class of algorithms called greedy algorithms. After you will learn the key idea behind the greedy algorithms, you may feel that they represent the algorithmic Swiss army knife that can be applied to solve nearly all programming challenges in this course. But be warned: with a few exceptions that we will cover, this intuitive idea rarely works in practice! For this reason, it is important to prove that a greedy algorithm always produces an optimal solution before using this algorithm. In the end of this module, we will test your intuition and taste for greedy algorithms by offering several programming challenges.

  • Video · Largest Number
  • Video · Car Fueling
  • Video · Car Fueling - Implementation and Analysis
  • Video · Main Ingredients of Greedy Algorithms
  • Practice Quiz · Greedy Algorithms
  • Video · Celebration Party Problem
  • Video · Efficient Algorithm for Grouping Children
  • Video · Analysis and Implementation of the Efficient Algorithm
  • Video · Long Hike
  • Video · Fractional Knapsack - Implementation, Analysis and Optimization
  • Video · Review of Greedy Algorithms
  • Reading · Resources
  • Practice Quiz · Fractional Knapsack
  • Programming Assignment · Programming Assignment 3: Greedy Algorithms

WEEK 4 - Divide-and-Conquer

In this module you will learn about a powerful algorithmic technique called Divide and Conquer. Based on this technique, you will see how to search huge databases millions of times faster than using naïve linear search. You will even learn that the standard way to multiply numbers (that you learned in the grade school) is far from the being the fastest! We will then apply the divide-and-conquer technique to design two efficient algorithms (merge sort and quick sort) for sorting huge lists, a problem that finds many applications in practice. Finally, we will show that these two algorithms are optimal, that is, no algorithm can sort faster!

  • Video · Intro
  • Video · Linear Search
  • Video · Binary Search
  • Video · Binary Search Runtime
  • Reading · Resources
  • Practice Quiz · Linear Search and Binary Search
  • Video · Problem Overview and Naïve Solution
  • Video · Naïve Divide and Conquer Algorithm
  • Video · Faster Divide and Conquer Algorithm
  • Reading · Resources
  • Practice Quiz · Polynomial Multiplication
  • Video · What is the Master Theorem?
  • Video · Proof of the Master Theorem
  • Reading · Resources
  • Practice Quiz · Master Theorem
  • Video · Problem Overview
  • Video · Selection Sort
  • Video · Merge Sort
  • Video · Lower Bound for Comparison Based Sorting
  • Video · Non-Comparison Based Sorting Algorithms
  • Reading · Resources
  • Practice Quiz · Sorting
  • Video · Overview
  • Video · Algorithm
  • Video · Random Pivot
  • Video · Running Time Analysis (optional)
  • Video · Equal Elements
  • Video · Final Remarks
  • Reading · Resources
  • Practice Quiz · Quick Sort
  • Programming Assignment · Programming Assignment 4: Divide and Conquer

WEEK 5 - Dynamic Programming 1

In this final module of the course you will learn about the powerful algorithmic technique for solving many optimization problems called Dynamic Programming. It turned out that dynamic programming can solve many problems that evade all attempts to solve them using greedy or divide-and-conquer strategy. There are countless applications of dynamic programming in practice: from maximizing the advertisement revenue of a TV station, to search for similar Internet pages, to gene finding (the problem where biologists need to find the minimum number of mutations to transform one gene into another). You will learn how the same idea helps to automatically make spelling corrections and to show the differences between two versions of the same text.

  • Video · Change Problem
  • Practice Quiz · Change Money
  • Reading · Resources
  • Video · The Alignment Game
  • Video · Computing Edit Distance
  • Video · Reconstructing an Optimal Alignment
  • Practice Quiz · Edit Distance
  • Reading · Resources
  • Programming Assignment · Programming Assignment 5: Dynamic

Programming 1

WEEK 6 - Dynamic Programming 2

In this module, we continue practicing implementing dynamic programming solutions.

  • Video · Problem Overview
  • Practice Quiz · Knapsack
  • Video · Knapsack with Repetitions
  • Video · Knapsack without Repetitions
  • Video · Final Remarks
  • Reading · Resources
  • Video · Problem Overview
  • Practice Quiz · Maximum Value of an Arithmetic Expression
  • Video · Subproblems
  • Video · Algorithm
  • Video · Reconstructing a Solution
  • Programming Assignment · Programming Assignment 6: Dynamic

Programming 2

COURSE 2 - Data Structures

Current session: Jan 15

Commitment
4 weeks of study, 5-10 hours/week
Subtitles
English

WEEK 1 - Basic Data Structures

In this module, you will learn about the basic data structures used throughout the rest of this course. We start this module by looking in detail at the fundamental building blocks: arrays and linked lists. From there, we build up two important data structures: stacks and queues. Next, we look at trees: examples of how they’re used in Computer Science, how they’re implemented, and the various ways they can be traversed. Once you’ve completed this module, you will be able to implement any of these data structures, as well as have a solid understanding of the costs of the operations, as well as the tradeoffs involved in using each data structure.

  • Reading · Welcome
  • Video · Arrays
  • Video · Singly-Linked Lists
  • Video · Doubly-Linked Lists
  • Reading · Slides and External References
  • Video · Stacks
  • Video · Queues
  • Reading · Slides and External References
  • Video · Trees
  • Video · Tree Traversal
  • Reading · Slides and External References
  • Practice Quiz · Basic Data Structures
  • Reading · Available Programming Languages
  • Reading · FAQ on Programming Assignments
  • Programming Assignment · Programming Assignment 1: Basic Data Structures
  • Reading · Acknowledgements

WEEK 2 - Dynamic Arrays and Amortized Analysis

In this module, we discuss Dynamic Arrays: a way of using arrays when it is unknown ahead-of-time how many elements will be needed. Here, we also discuss amortized analysis: a method of determining the amortized cost of an operation over a sequence of operations. Amortized analysis is very often used to analyse performance of algorithms when the straightforward analysis produces unsatisfactory results, but amortized analysis helps to show that the algorithm is actually efficient. It is used both for Dynamic Arrays analysis and will also be used in the end of this course to analyze Splay trees.

  • Video · Dynamic Arrays
  • Video · Amortized Analysis: Aggregate Method
  • Video · Amortized Analysis: Banker’s Method
  • Video · Amortized Analysis: Physicist’s Method
  • Video · Amortized Analysis: Summary
  • Quiz · Dynamic Arrays and Amortized Analysis
  • Reading · Slides and External References

WEEK 3 - Priority Queues and Disjoint Sets

We start this module by considering priority queues which are used to efficiently schedule jobs, either in the context of a computer operating system or in real life, to sort huge files, which is the most important building block for any Big Data processing algorithm, and to efficiently compute shortest paths in graphs, which is a topic we will cover in our next course. For this reason, priority queues have built-in implementations in many programming languages, including C++, Java, and Python. We will see that these implementations are based on a beautiful idea of storing a complete binary tree in an array that allows to implement all priority queue methods in just few lines of code. We will then switch to disjoint sets data structure that is used, for example, in dynamic graph connectivity and image processing. We will see again how simple and natural ideas lead to an implementation that is both easy to code and very efficient. By completing this module, you will be able to implement both these data structures efficiently from scratch.

  • Video · Introduction
  • Video · Naive Implementations of Priority Queues
  • Reading · Slides
  • Video · Binary Trees
  • Reading · Tree Height Remark
  • Video · Basic Operations
  • Video · Complete Binary Trees
  • Video · Pseudocode
  • Reading · Slides and External References
  • Video · Heap Sort
  • Video · Building a Heap
  • Video · Final Remarks
  • Quiz · Priority Queues: Quiz
  • Reading · Slides and External References
  • Video · Overview
  • Video · Naive Implementations
  • Reading · Slides and External References
  • Video · Trees for Disjoint Sets
  • Video · Union by Rank
  • Video · Path Compression
  • Video · Analysis (Optional)
  • Quiz · Quiz: Disjoint Sets
  • Reading · Slides and External References
  • Practice Quiz · Priority Queues and Disjoint Sets
  • Programming Assignment · Programming Assignment 2: Priority Queues and Disjoint Sets

WEEK 4 - Hash Tables

In this module you will learn about very powerful and widely used technique called hashing. Its applications include implementation of programming languages, file systems, pattern search, distributed key-value storage and many more. You will learn how to implement data structures to store and modify sets of objects and mappings from one type of objects to another one. You will see that naive implementations either consume huge amount of memory or are slow, and then you will learn to implement hash tables that use linear memory and work in O(1) on average! In the end, you will learn how hash functions are used in modern disrtibuted systems and how they are used to optimize storage of services like Dropbox, Google Drive and Yandex Disk!

  • Video · Applications of Hashing
  • Video · Analysing Service Access Logs
  • Video · Direct Addressing
  • Video · List-based Mapping
  • Video · Hash Functions
  • Video · Chaining Scheme
  • Video · Chaining Implementation and Analysis
  • Video · Hash Tables
  • Reading · Slides and External References
  • Video · Phone Book Problem
  • Video · Phone Book Problem - Continued
  • Video · Universal Family
  • Video · Hashing Integers
  • Video · Proof: Upper Bound for Chain Length (Optional)
  • Video · Proof: Universal Family for Integers (Optional)
  • Video · Hashing Strings
  • Video · Hashing Strings - Cardinality Fix
  • Reading · Slides and External References
  • Quiz · Hash Tables and Hash Functions
  • Video · Search Pattern in Text
  • Video · Rabin-Karp’s Algorithm
  • Video · Optimization: Precomputation
  • Video · Optimization: Implementation and Analysis
  • Reading · Slides and External References
  • Video · Instant Uploads and Storage Optimization in Dropbox
  • Video · Distributed Hash Tables
  • Reading · Slides and External References
  • Practice Quiz · Hashing
  • Programming Assignment · Programming Assignment 3: Hash Tables

WEEK 5 - Binary Search Trees

In this module we study binary search trees, which are a data structure for doing searches on dynamically changing ordered sets. You will learn about many of the difficulties in accomplishing this task and the ways in which we can overcome them. In order to do this you will need to learn the basic structure of binary search trees, how to insert and delete without destroying this structure, and how to ensure that the tree remains balanced.

  • Video · Introduction
  • Video · Search Trees
  • Video · Basic Operations
  • Video · Balance
  • Reading · Slides and External References
  • Video · AVL Trees
  • Video · AVL Tree Implementation
  • Video · Split and Merge
  • Reading · Slides and External References
  • Practice Quiz · Binary Search Trees

WEEK 6 - Binary Search Trees 2

In this module we continue studying binary search trees. We study a few non-trivial applications. We then study the new kind of balanced search trees - Splay Trees. They adapt to the queries dynamically and are optimal in many ways.

  • Video · Applications
  • Reading · Slides and External References
  • Video · Splay Trees: Introduction
  • Video · Splay Trees: Implementation
  • Video · (Optional) Splay Trees: Analysis
  • Reading · Slides and External References
  • Practice Quiz · Splay Trees
  • Programming Assignment · Programming Assignment 4: Binary Search Trees

COURSE 3 - Algorithms on Graphs

Current session: Jan 15

Commitment
5 weeks of study, 3-4 hours/week
Subtitles
English

WEEK 1 - Decomposition of Graphs 1

Graphs arise in various real-world situations as there are road networks, computer networks and, most recently, social networks! If you’re looking for the fastest time to get to work, cheapest way to connect set of computers into a network or efficient algorithm to automatically find communities and opinion leaders hot in Facebook, you’re going to work with graphs and algorithms on graphs. In this module, you will learn ways to represent a graph as well as basic algorithms for decomposing graphs into parts. In the programming assignment of this module, you will apply the algorithms that you’ve learned to implement efficient programs for exploring mazes, analyzing Computer Science curriculum, and analyzing road networks. In the first week of the module, we focus on undirected graphs.

  • Reading · Welcome
  • Video · Graph Basics
  • Video · Representing Graphs
  • Reading · Slides and External References
  • Video · Exploring Graphs
  • Video · Connectivity
  • Video · Previsit and Postvisit Orderings
  • Reading · Slides and External References
  • Programming Assignment · Programming Assignment 1: Decomposition of Graphs

WEEK 2 - Decomposition of Graphs 2

This week we continue to study graph decomposition algorithms, but now for directed graphs.

  • Video · Directed Acyclic Graphs
  • Video · Topological Sort
  • Video · Strongly Connected Components
  • Video · Computing Strongly Connected Components
  • Reading · Slides and External References
  • Programming Assignment · Programming Assignment 2: Decomposition of Graphs

WEEK 3 - Paths in Graphs 1

In this module you will study algorithms for finding Shortest Paths in Graphs. These algorithms have lots of applications. When you launch a navigation app on your smartphone like Google Maps or Yandex.Navi, it uses these algorithms to find you the fastest route from work to home, from home to school, etc. When you search for airplane tickets, these algorithms are used to find a route with the minimum number of plane changes. Unexpectedly, these algorithms can also be used to determine the optimal way to do currency exchange, sometimes allowing to earh huge profit! We will cover all these applications, and you will learn Breadth-First Search, Dijkstra’s Algorithm and Bellman-Ford Algorithm. These algorithms are efficient and lay the foundation for even more efficient algorithms which you will learn and implement in the Shortest Paths Capstone Project to find best routes on real maps of cities and countries, find distances between people in Social Networks. In the end you will be able to find Shortest Paths efficiently in any Graph. This week we will study Breadth-First Search algorithm.

  • Video · Most Direct Route
  • Video · Breadth-First Search
  • Video · Breadth-First Search (continued)
  • Video · Implementation and Analysis
  • Video · Proof of Correctness
  • Video · Proof of Correctness (continued)
  • Video · Shortest-Path Tree
  • Video · Reconstructing the Shortest Path
  • Reading · Slides and External References
  • Programming Assignment · Programming Assignment 3: Paths in Graphs

WEEK 4 - Paths in Graphs 2

This week we continue to study Shortest Paths in Graphs. You will learn Dijkstra’s Algorithm which can be applied to find the shortest route home from work. You will also learn Bellman-Ford’s algorithm which can unexpectedly be applied to choose the optimal way of exchanging currencies. By the end you will be able to find shortest paths efficiently in any Graph.

  • Video · Fastest Route
  • Video · Naive Algorithm
  • Video · Dijkstra’s Algorithm: Intuition and Example
  • Video · Dijkstra’s Algorithm: Implementation
  • Video · Dijkstra’s Algorithm: Proof of Correctness
  • Video · Dijkstra’s Algorithm: Running Time
  • Reading · Slides and External References
  • Video · Currency Exchange
  • Video · Currency Exchange: Reduction to Shortest Paths
  • Video · Bellman-Ford Algorithm
  • Video · Bellman-Ford Algorithm: Proof of Correctness
  • Video · Negative Cycles
  • Video · Infinite Arbitrage
  • Reading · Slides and External References
  • Programming Assignment · Programming Assignment 4: Paths in Graphs

WEEK 5 - Minimum Spanning Trees

In this module, we study the minimum spanning tree problem. We will cover two elegant greedy algorithms for this problem: the first one is due to Kruskal and uses the disjoint sets data structure, the second one is due to Prim and uses the priority queue data structure. In the programming assignment for this module you will be computing an optimal way of building roads between cities and an optimal way of partitioning a given set of objects into clusters (a fundamental problem in data mining).

  • Video · Building a Network
  • Video · Greedy Algorithms
  • Video · Cut Property
  • Video · Kruskal’s Algorithm
  • Video · Prim’s Algorithm
  • Reading · Slides and External References
  • Programming Assignment · Programming Assignment 5: Minimum Spanning Trees

WEEK 6 - Advanced Shortest Paths Project (Optional)

In this module, you will learn Advanced Shortest Paths algorithms that work in practice 1000s (up to 25000) of times faster than the classical Dijkstra’s algorithm on real-world road networks and social networks graphs. You will work on a Programming Project based on these algorithms. You will find the shortest paths on the real maps of parts of US and the shortest paths connecting people in the social networks. We encourage you not only to use the ideas from this module’s lectures in your implementations, but also to come up with your own ideas for speeding up the algorithm! We encourage you to compete on the forums to see whose implementation is the fastest one :)

  • Video · Programming Project: Introduction
  • Video · Bidirectional Search
  • Video · Six Handshakes
  • Video · Bidirectional Dijkstra
  • Video · Finding Shortest Path after Meeting in the Middle
  • Video · Computing the Distance
  • Reading · Slides and External References
  • Video · A* Algorithm
  • Video · Performance of A*
  • Video · Bidirectional A*
  • Video · Potential Functions and Lower Bounds
  • Video · Landmarks (Optional)
  • Reading · Slides and External References
  • Video · Highway Hierarchies and Node Importance
  • Video · Preprocessing
  • Video · Witness Search
  • Video · Query
  • Video · Proof of Correctness
  • Video · Node Ordering
  • Reading · Slides and External Refernces
  • Practice Quiz · Bidirectional Dijkstra, A* and Contraction

Hierarchies

  • Practice Programming Assignment · Advanced Shortest Paths

COURSE 4 - Algorithms on Strings

Current session: Jan 15

Commitment
4 weeks of study, 4-8 hours/week
Subtitles
English

WEEK 1 - Suffix Trees

How would you search for a longest repeat in a string in LINEAR time? In 1973, Peter Weiner came up with a surprising solution that was based on suffix trees, the key data structure in pattern matching. Computer scientists were so impressed with his algorithm that they called it the Algorithm of the Year. In this lesson, we will explore some key ideas for pattern matching that will - through a series of trials and errors - bring us to suffix trees.

  • Video · From Genome Sequencing to Pattern Matching
  • Video · Brute Force Approach to Pattern Matching
  • Video · Herding Patterns into Trie
  • Reading · Trie Construction - Pseudocode
  • Video · Herding Text into Suffix Trie
  • Video · Suffix Trees
  • Reading · FAQ
  • Reading · Slides and External References
  • Reading · Available Programming Languages
  • Reading · FAQ on Programming Assignments
  • Programming Assignment · Programming Assignment 1

WEEK 2 - Burrows-Wheeler Transform and Suffix Arrays

Although EXACT pattern matching with suffix trees is fast, it is not clear how to use suffix trees for APPROXIMATE pattern matching. In 1994, Michael Burrows and David Wheeler invented an ingenious algorithm for text compression that is now known as Burrows-Wheeler Transform. They knew nothing about genomics, and they could not have imagined that 15 years later their algorithm will become the workhorse of biologists searching for genomic mutations. But what text compression has to do with pattern matching??? In this lesson you will learn that the fate of an algorithm is often hard to predict – its applications may appear in a field that has nothing to do with the original plan of its inventors.

  • Video · Burrows-Wheeler Transform
  • Video · Inverting Burrows-Wheeler Transform
  • Video · Using BWT for Pattern Matching
  • Reading · Using BWT for Pattern Matching
  • Video · Suffix Arrays
  • Reading · Pattern Matching with Suffix Array
  • Video · Approximate Pattern Matching
  • Reading · FAQ
  • Reading · Slides and External References
  • Programming Assignment · Programming Assignment 2

WEEK 3 - Knuth–Morris–Pratt Algorithm

Congratulations, you have now learned the key pattern matching concepts: tries, suffix trees, suffix arrays and even the Burrows-Wheeler transform! However, some of the results Pavel mentioned remain mysterious: e.g., how can we perform exact pattern matching in O(|Text|) time rather than in O(|Text|*|Pattern|) time as in the naïve brute force algorithm? How can it be that matching a 1000-nucleotide pattern against the human genome is nearly as fast as matching a 3-nucleotide pattern??? Also, even though Pavel showed how to quickly construct the suffix array given the suffix tree, he has not revealed the magic behind the fast algorithms for the suffix tree construction!In this module, Miсhael will address some algorithmic challenges that Pavel tried to hide from you :) such as the Knuth-Morris-Pratt algorithm for exact pattern matching and more efficient algorithms for suffix tree and suffix array construction.

  • Video · Exact Pattern Matching
  • Video · Safe Shift
  • Video · Prefix Function
  • Video · Computing Prefix Function
  • Video · Knuth-Morris-Pratt Algorithm
  • Quiz · Exact Pattern Matching
  • Reading · Programming Assignment 3 lasts for two weeks
  • Reading · Slides and External References

WEEK 4 - Constructing Suffix Arrays and Suffix Trees

In this module we continue studying algorithmic challenges of the string algorithms. You will learn an O(n log n) algorithm for suffix array construction and a linear time algorithm for construction of suffix tree from a suffix array. You will also implement these algorithms and the Knuth-Morris-Pratt algorithm in the last

Programming Assignment in this course.

  • Video · Suffix Array
  • Video · General Strategy
  • Video · Initialization
  • Reading · Counting Sort
  • Video · Sort Doubled Cyclic Shifts
  • Video · SortDouble Implementation
  • Video · Updating Classes
  • Video · Full Algorithm
  • Reading · Slides and External References
  • Quiz · Suffix Array Construction
  • Video · Suffix Array and Suffix Tree
  • Video · LCP Array
  • Video · Computing the LCP Array
  • Reading · Computing the LCP Array - Additional Slides
  • Video · Construct Suffix Tree from Suffix Array and LCP Array
  • Reading · Suffix Tree Construction - Pseudocode
  • Reading · Slides and External References
  • Programming Assignment · Programming Assignment 3

***

COURSE 5 - Advanced Algorithms and Complexity

Current session: Jan 15

Commitment
4 weeks of study, 4-8 hours/week
Subtitles
English

WEEK 1 - Flows in Networks

Network flows show up in many real world situations in which a good needs to be transported across a network with limited capacity. You can see it when shipping goods across highways and routing packets across the internet. In this unit, we will discuss the mathematical underpinnings of network flows and some important flow algorithms. We will also give some surprising examples on seemingly unrelated problems that can be solved with our knowledge of network flows.

  • Reading · Slides and Resources on Flows in Networks
  • Video · Introduction
  • Video · Network Flows
  • Video · Residual Networks
  • Video · Maxflow-Mincut
  • Video · The Ford–Fulkerson Algorithm
  • Video · Slow Example
  • Video · The Edmonds–Karp Algorithm
  • Video · Bipartite Matching
  • Video · Image Segmentation
  • Quiz · Flow Algorithms
  • Reading · Available Programming Languages
  • Reading · FAQ on Programming Assignments
  • Programming Assignment · Programming Assignment 1

WEEK 2 - Linear Programming

Linear programming is a very powerful algorithmic tool. Essentially, a linear programming problem asks you to optimize a linear function of real variables constrained by some system of linear inequalities. This is an extremely versatile framework that immediately generalizes flow problems, but can also be used to discuss a wide variety of other problems from optimizing production procedures to finding the cheapest way to attain a healthy diet. Surprisingly, this very general framework admits efficient algorithms. In this unit, we will discuss some of the importance of linear programming problems along with some of the tools used to solve them.

  • Reading · Slides and Resources on Linear Programming
  • Video · Introduction
  • Video · Linear Programming
  • Video · Linear Algebra: Method of Substitution
  • Video · Linear Algebra: Gaussian Elimination
  • Video · Convexity
  • Video · Duality
  • Video · (Optional) Duality Proofs
  • Video · Linear Programming Formulations
  • Video · The Simplex Algorithm
  • Video · (Optional) The Ellipsoid Algorithm
  • Quiz · Linear Programming Quiz
  • Programming Assignment · Programming Assignment 2

WEEK 3 - NP-complete Problems

Although many of the algorithms you’ve learned so far are applied in practice a lot, it turns out that the world is dominated by real-world problems without a known provably efficient algorithm. Many of these problems can be reduced to one of the classical problems called NP-complete problems which either cannot be solved by a polynomial algorithm or solving any one of them would win you a million dollars (see Millenium Prize Problems) and eternal worldwide fame for solving the main problem of computer science called P vs NP. It’s good to know this before trying to solve a problem before the tomorrow’s deadline :) Although these problems are very unlikely to be solvable efficiently in the nearest future, people always come up with various workarounds. In this module you will study the classical NP-complete problems and the reductions between them. You will also practice solving large instances of some of these problems despite their hardness using very efficient specialized software based on tons of research in the area of NP-complete problems.

  • Reading · Slides and Resources on NP-complete Problems
  • Video · Brute Force Search
  • Video · Search Problems
  • Video · Traveling Salesman Problem
  • Video · Hamiltonian Cycle Problem
  • Video · Longest Path Problem
  • Video · Integer Linear Programming Problem
  • Video · Independent Set Problem
  • Video · P and NP
  • Video · Reductions
  • Video · Showing NP-completeness
  • Video · Independent Set to Vertex Cover
  • Video · 3-SAT to Independent Set
  • Video · SAT to 3-SAT
  • Video · Circuit SAT to SAT
  • Video · All of NP to Circuit SAT
  • Video · Using SAT-solvers
  • Quiz · NP-complete Problems
  • Programming Assignment · Programming Assignment 3

WEEK 4 - Coping with NP-completeness

After the previous module you might be sad: you’ve just went through 5 courses in Algorithms only to learn that they are not suitable for most real-world problems. However, don’t give up yet! People are creative, and they need to solve these problems anyway, so in practice there are often ways to cope with an NP-complete problem at hand. We first show that some special cases on NP-complete problems can, in fact, be solved in polynomial time. We then consider exact algorithms that find a solution much faster than the brute force algorithm. We conclude with approximation algorithms that work in polynomial time and find a solution that is close to being optimal.

  • Reading · Slides and Resources on Coping with NP-completeness
  • Video · Introduction
  • Video · 2-SAT
  • Video · 2-SAT: Algorithm
  • Video · Independent Sets in Trees
  • Video · 3-SAT: Backtracking
  • Video · 3-SAT: Local Search
  • Video · TSP: Dynamic Programming
  • Video · TSP: Branch and Bound
  • Video · Vertex Cover
  • Video · Metric TSP
  • Video · TSP: Local Search
  • Quiz · Coping with NP-completeness
  • Programming Assignment · Programming Assignment 4

WEEK 5 - Streaming Algorithms (Optional)

In most previous lectures we were interested in designing algorithms with fast (e.g. small polynomial) runtime, and assumed that the algorithm has random access to its input, which is loaded into memory. In many modern applications in big data analysis, however, the input is so large that it cannot be stored in memory. Instead, the input is presented as a stream of updates, which the algorithm scans while maintaining a small summary of the stream seen so far. This is precisely the setting of the streaming model of computation, which we study in this lecture. The streaming model is well-suited for designing and reasoning about small space algorithms. It has received a lot of attention in the literature, and several powerful algorithmic primitives for computing basic stream statistics in this model have been designed, several of them impacting the practice of big data analysis. In this lecture we will see one such algorithm (CountSketch), a small space algorithm for finding the top k most frequent items in a data stream.

  • Video · Introduction
  • Video · Heavy Hitters Problem
  • Video · Reduction 1
  • Video · Reduction 2
  • Video · Basic Estimate 1
  • Video · Basic Estimate 2
  • Video · Final Algorithm 1
  • Video · Final Algorithm 2
  • Video · Proofs 1
  • Video · Proofs 2
  • Practice Quiz · Quiz: Heavy Hitters
  • Practice Programming Assignment · (Optional) Programming Assignment 5

COURSE 6 - Genome Assembly Programming Challenge

Upcoming session: Jan 22

Subtitles
English

WEEK 1 - The 2011 European E. coli Outbreak

In April 2011, hundreds of people in Germany were hospitalized with a deadly disease that often started as food poisoning with bloody diarrhea. It was the beginning of the deadliest outbreak in recent history, caused by a mysterious bacterial strain that we will refer to as E. coli X. Within a few months, the outbreak had infected thousands and killed 53 people. To prevent the further spread of the outbreak, computational biologists all over the world had to answer the question “What is the genome sequence of E. coli X?” in order to figure out what new genes it acquired to become pathogenic. The 2011 German outbreak represented an early example of epidemiologists collaborating with computational biologists to stop an outbreak. In this Genome Assembly Programming Challenge, you will follow in the footsteps of the bioinformaticians investigating the outbreak by developing a program to assemble the genome of the deadly E. coli X strain. However, before you embark on building a program for assembling the E. coli X strain, we have to explain some genomic concepts and warm you up by having you solve a simpler problem of assembling a small virus.

  • Video · 2011 European E. coli outbreak
  • Video · Assembling phage genome
  • Reading · What does it mean to assemble a genome?
  • Reading · Project Description
  • Programming Assignment · Programming Assignment 1: Assembling the phi174X Genome Using Overlap Graphs

WEEK 2 - Assembling Genomes Using de Bruijn Graphs

DNA sequencing approach that led to assembly of a small virus in 1977 went through a series of transformations that contributed to the emergence of personalized medicine a few years ago. By the late 1980s, biologists were routinely sequencing viral genomes containing hundreds of thousands of nucleotides, but the idea of sequencing a bacterial (let alone the human) genome containing millions (or even billions) of nucleotides remained preposterous and would cost billions of dollars. In 1988, three biologists (independently and simultaneously!) came up with an idea to reduce sequencing cost and proposed the futuristic and at the time completely implausible method of DNA arrays. None of these three biologists could have possibly imagined that the implications of his own experimental research would eventually bring him face-to-face with challenging algorithmic problems. In this module you will learn about the algorithmic challenge of DNA sequencing using information about short k-mers provided by DNA arrays. You will also travel to the 18the century to learn about the Bridges of Konigsberg and solve a related problem of assembling a jigsaw puzzle!

  • Video · DNA arrays
  • Video · Assembling genomes from k-mers
  • Video · De Bruijn graphs
  • Video · Bridges of Königsberg and universal strings
  • Video · Euler theorem
  • Programming Assignment · Programming Assignment 2: Assembling the phi174X Genome Using De Bruijn Graphs

WEEK 3 - Genome Assembly Faces Real Sequencing Data

Our discussion of genome assembly has thus far relied upon various assumptions. In this module, we will face practical challenges introduced by quirks in modern sequencing technologies and discuss some algorithmic techniques that have been devised to address these challenges. Afterwards, you will assemble the smallest bacterial genome that lives symbiotically inside leafhoppers. Its sheltered life has allowed it to reduce its genome to only about 112,091 nucleotides and 137 genes. And afterwards, you will be ready to assemble the E. coli X genome!

  • Video · Splitting the genome into contigs
  • Video · From reads to read-pairs
  • Video · Genome assembly faces real sequencing data
  • Programming Assignment · Programming Assignment 3: Genome Assembly Faces Real Sequencing Data
@rebelliard
Copy link

How did you generate this? I'd like to do this for other courses.

@sirenko
Copy link
Author

sirenko commented Dec 5, 2018

@rebelliard:
FWIW, roughly, it was done in the following way:

  • open the page you want to convert in your browser of choice, in my case it was Chrome
  • open developer tools and locate tag having the content you want to convert. You can use pointer "select element to inspect for it". Usually desirable content is enclosed into the "<div>..</div>" tag.
  • point the mouse's cursor at the desirable block, and with the right button select from the drop-down menu "Copy > Copy Element"
  • in the text browser of your choice, create valid html document with empty body
  • paste the copied ("<div>") block into the body and save it
  • with the pandoc (can be installed from https://github.com/jgm/pandoc) convert saved html document into the .org (in my case, but you may prefer markdown target format) with the following command:
pandoc --from html-raw_html-native_divs-native_spans saved.html -t org --wrap=none > result.org
  • clean up result .org file in the editor of your choice. In my case it was Emacs with multiple-cursors package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment