chrswt/toolbox_week2.md

## toolbox_week2.md

      
    Raw
  

              toolbox_week2.md
            
          
    Back to Overview of Toolbox

Fibonacci numbers

Problem overview

The fibonacci series is a classic series of numbers where the 0th element is 0, the 1st element is 1, and from thereon, each element is the sum of the previous two elements. For example, the following represents a fibonacci series: 0, 1, 1, 2, 3, 5, 8, 13, 34, ...
Naive algorithm

def fibonacci_recursive(n):
    """Calculate the n'th fibonacci number recursively."""
    if n <= 1:
        return n
    else:
        return fibonacci_recursive(n-1) + fibonacci_recursive(n-2)
This algorithm is slow because we are re-computing the same fibonacci number many times. In fact, the growth rate is similar to the growth rate of fibonacci numbers, which is exponential. Where T(n) represents the number of lines of code that must be run, this algorithm grows at the rate of: T(n) = 3 + T(n - 1) + T(n - 2).
Efficient algorithm

def fibonacci_list(n):
    """Calculate the n'th fibonacci iteratively by generating a list to n."""
    series = [None for _ in range(n)]
    series[0], series[1] = 0, 1

    for i in range(2, n):
        series[i] = series[i-1] + series[i-2]

    return series[n]
This algorithm grows by a constant factor, which allows large fibonacci numbers to be computed efficiently. Where T(n) represents the number of lines of code that must be run, this algorithm grows at the rate of: T(n) = 2n + 2.
Additional resources


Algorithms (1st edition) by Dasgupta, Papadimitrious, Vazirani DPV


Computing fibonacci numbers: section 0.2
Properties of fibonacci numbers: exercises 0.2-0.4


Algorithms by Cormen, Balkcom - section on recursion
Computing fibonacci numbers visualization

Greatest common divisor

Problem overview and naive algorithm

For integers, a and b, their greatest common divisor or gcd(a, b) is the largest integer d so that d divides both a and b.

Input: Integers a, b >= 0
Output: gcd(a, b)
We want to be able to run this on very large numbers

def gcd_naive(a, b):
    """Calculate the gcd of two ints: a, b using a brute force algorithm."""
    best = 0

    for d in range(1, a+b + 1):
        if a % d == 0 and b % d == 0:
            best = d

    return best
However, this algorithm is inefficient because the runtime is approximately a + b, which would be too inefficient for large numbers.
Efficient algorithm

The efficient implementation of gcd() lies in the following key lemma:

Let a' be the remainder when a is divided by b, then

gcd(a, b) = gcd(a', b) = gcd(b, a')

Proof (sketch):

a = a' + bq for some q

d divides a and b if and only if it divides a' and b

Euclid's algorithm is based on this idea that the common divisors of a and b are exactly the same as the common divisors of a' and b. Therefore, the gcd of a and b is the gcd of a' and b. a' is generally smaller than a, which allows us to compute a new gcd recursively for an easier problem.
def gcd_euclid(a, b):
    """Calculate the gcd of two ints: a, b using euclid's algorithm."""

    # Everything divides 0 so we just need the biggest int that divides a
    if b == 0:
        return a
    
    a_prime = a % b
    return gcd_euclid(b, a_prime)
Here are the steps the above algorithm would take for finding the gcd of two large numbers:
>> gcd(3918848, 1653264)
>> gcd(1653264, 612320)
>> gcd(612320, 428624)
>> gcd(428624, 183696)
>> gcd(183696, 61232)
>> gcd(61232, 0)
61232
Each step reduces the size of numbers by about a factor of 2. This results in a runtime of approximately log(ab) steps.
Additional resources


DPV


Greatest common divisor: section 1.2.3


Introduction to Algorithms (3rd edition) by Cormen, Leiserson, Rivest, Stein CLRS


Greatest common divisor: section 31.2


Khan Academy introduction to gcd

Big-O notation

Asymptotic notation

The key idea is that low-level run time analysis can result in confounding factors that multiply runtimes by a (large) constant, so we seek to measure runtime in a way that ignores constant multiples. More specifically, we consider asymptotic runtimes: how does runtime scale with input size? If we have two runtimes: n and n^2, as long as n is sufficiently large, the runtime of n^2 will always be worse than n for any constant multiple.

log n < sqrt(n) < n < n log n < n^2 < 2^n

Big-O notation

The definition of Big-O is:
f(n) = O(g(n)) (f is Big-O of g) or f <= g if there exist constants N and c such that for all n >= N: f(n) <= c * g(n).
All this means is that for sufficiently large inputs, f is bounded above by some constant multiple of g.
Warnings when using Big-O:

Using Big-O loses important information about constant multiples.
Big-O is only asymptotic (in the sense that it only tells us what happens when n is large, but nothing about the runtime given a specific input).

Using big-O

Some of the common rules for applying Big-O notation to analysis are:

Multiplicative constants can be omitted (e.g. 7n^3 = O(n^3)).
Larger exponents grow faster (n^a < n^b for 0 < a < b).
Any exponent grows faster than any polynomial (n^a < b^n for a > 0, b > 1).
Any power of log n grows slower than any power of n ((log n)^a < n^b for a, b > 0).
Smaller terms in a sum of terms can be omitted (e.g. n^2 + n = O(n^2)).

Additional resources


DPV


Big-O notation and growth rate: section 0.3


Khan Academy big-O notation
Khan Academy intro to logarithms


Programming assignment 1

The handout for this programming assignment is available here.
Key excerpts:

Modern computers perform roughly 10^8 - 10^9 operations per second. So, if the maximum size of a dataset in the problem description is n = 10^5, then an algorithm with a quadratic running time would take approximately n^2 = 10^10 operations which would take > 1 second.
To test your program's ability to process large datasets, it makes sense to implement your algorithm as a function like solve(dataset) and then implement an additional procedure generate() that produces a large dataset within the bounds of the problem statement.

Small fibonacci number


Task: Given an integer n, find the nth Fibonacci number F_n.
Input format: The input consists of a single integer n.
Constraints: 0 <= n <= 45.
Output format: Output F_n.
Time limit: Python - 5s.
Memory limit: 512MB.

#!/usr/bin/env python3
def get_nth_fibonacci(n):
    """Calculate the n'th fibonacci number using an iterative generator."""
    if n <= 1:
        return n

    previous, current = 0, 1

    for _ in range(n - 1):
        previous, current = current, previous + current

    return current

if __name__ == '__main__':
    n = int(input())
    print(get_nth_fibonacci(n))
Last digit of a large fibonacci number


Task: Given an integer n, find the last digit of the nth Fibonacci number F_n (that is, F_n % 10).
Input format: The input consists of a single integer n.
Constraints: 0 <= n <= 10^7.
Output format: Output the last digit of F_n.
Time limit: Python - 5s.
Memory limit: 512MB.

#!/usr/bin/env python3
def get_nth_fib_last_digit(n):
    """Calculate last digit of n'th fib using an iterative generator."""
    if n <= 1:
        return n

    previous, current = 0, 1

    for _ in range(n - 1):
        previous, current = current % 10, (previous + current) % 10

    return current

if __name__ == '__main__':
    n = int(input())
    print(get_nth_fib_last_digit(n))
Greatest common divisor


Task: Given two integers a and b, find their greatest common divisor.
Input format: The two integers a, b are given in the same line separated by a space.
Constraints: 1 <= a, b <= 2 * 10^9.
Output format: Output gcd(a, b).
Time limit: Python - 5s.
Memory limit: 512MB.

#!/usr/bin/env python3
def gcd(a, b):
    """Calculate the gcd of 2 int's: a, b with euclid's algorithm."""
    if b == 0:
        return a

    a_prime = a % b
    return gcd(b, a_prime)

if __name__ == '__main__':
    a, b = map(int, input().split())
    print(gcd(a, b))
Least common multiple

The least common multiple of two positive integers a and b is the least positive integer m that is divisible by both a and b.

Task: Given two integers a and b, find their least common multiple.
Input format: The two integers a and b are given in the same line separated by a space.
Constraints: 1 <= a, b <= 2 * 10^9.
Output format: Output the least common multiple of a and b.
Time limit: Python - 5s.
Memory limit: 512MB.


For any two positive integers a and b, lcm(a, b) * gcd(a, b) = ab.

#!/usr/bin/env python3
from gcd import gcd

def lcm(a, b):
    """Calculate the lowest common multiple of 2 int's: a, b using `gcd()`."""
    return a * b // gcd(a, b)

if __name__ == '__main__':
    a, b = map(int, input().split())
    print(lcm(a, b))
(Advanced) Huge fibonacci number modulo m

Your goal is to compute F_n % m, where n may be really huge (up to 10^18), such that an algorithm looping for n iterations will not meet the time requirements. If we examine how F_i % m is distributed for small m's, we observe that the sequences are periodic.
  i      0   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15 
------------------------------------------------------------------------------
  F_i    0   1   1   2   3   5   8   13  21  34  55   89   144  233  377  610
F_i % 2  0   1   1   0   1   1   0   1   1   0   1    1    0    1    1    0
F_i % 3  0   1   1   2   0   2   2   1   0   1   1    2    0    2    2    1

For m = 2, the period is 011 and has length 3. For m = 3, the period is 01120221 and has length 8. Therefore, to compute F_2015 % 3, we just need to find 2015 % 8. Since 2015 % 8 = 7, we conclude that F_2015 % 3 = F_7 % 3 = 1. This is true in general: for any integer m >= 2, the sequence F_n % m is periodic. The period always starts with 01 and is known as the Pisano period.

Task: Given two integers n and m, output F_n % m (that is, the remainder of F_n when divided by m).
Input format: The input consists of two integers n and m given on the same line (separated by a space).
Constraints: 1 <= n <= 10^18, 2 <= m <= 10^5.
Output format: Output F_n % m.
Time limit: Python - 5s.
Memory limit: 512MB.

#!/usr/bin/env python3
def get_nth_fib_mod_m(n, m):
    """Calculate the n'th fib number modulo m using Pisano periods."""
    fib_mod_m = [0, 1]
    found_sequence = False
    sequence_length = 0

    # Search for Pisano period by trying to find '01' (starting sequence)
    while not found_sequence:
        next_number = (fib_mod_m[-1] + fib_mod_m[-2]) % m

        if fib_mod_m[-1] == 0 and next_number == 1:
            fib_mod_m.pop()
            found_sequence = True
            sequence_length = len(fib_mod_m)
        else:
            fib_mod_m.append(next_number)

    return fib_mod_m[n % sequence_length]

if __name__ == '__main__':
    n, m = map(int, input().split())
    print(get_nth_fib_mod_m(n, m))
(Advanced) Sum of fibonacci numbers


Task: Given an integer n, find the last digit of the sum F_0 + F_1 + ... + F_n.
Input format: The input consists of a single integer n.
Constraints: 0 <= n <= 10^14.
Output format: Output the last digit of F_0 + F_1 + ... + F_n.
Time limit: Python - 5s.
Memory limit: 512MB.


We can observe that the sum till the nth Fibonacci term is = F_(n+2) - 1.

Since we want the last digit, we need to mod the output by 10. We know from the previous implementation that the length of the Pisano period for m = 10 is 60, so we can search for the last digit in the series at position (n + 2) % 60.

#!/usr/bin/env python3
from fibonacci_huge import get_nth_fib_mod_m

def get_fibonacci_sum_last_digit(n):
    """Find last digit of sum from 0'th fib number to n'th fib number."""

    # Sum of fib sequence to n = F_(n+2) - 1.
    # Since we want the last digit, we mod the result by 10, we also know that
    # the length of the pisano sequence for m = 10 is 60, so we only need to
    # look in the first 60 entries (the sequence repeats itself).
    # When subtracting 1, we need to mod by 10 again to ensure 0's become 9's.
    return (get_nth_fib_mod_m((n + 2) % 60, 10) - 1) % 10

if __name__ == '__main__':
    n = int(input())
    print(get_fibonacci_sum_last_digit(n))
(Advanced) Partial sum of fibonacci numbers


Task: Given two non-negative integers m and n, where m <= n, find the last digit of the sum F_m + F_(m+1) + ... + F_n.
Input format: The input consists of two non-negative integers m and n separated by a space.
Constraints: 0 <= m <= n <= 10^18.
Output format: Output the last digit of F_m + F_(m+1) + ... + F_n.
Time limit: Python - 5s.
Memory limit: 512MB.

#!/usr/bin/env python3
from fibonacci_sum_last_digit import get_fibonacci_sum_last_digit

def get_fibonacci_partial_sum(m, n):
    """Find the last digit of sum from m'th fib number to n'th fib number."""

    # Last digit of sequence m..n is last digit of sum to n minus the last
    # digit of sum to m-1 (because we want to include m in the sequence). If
    # the number is negative, we need to round it back up via imaginary tens.
    digit_difference = (get_fibonacci_sum_last_digit(n) -
                        get_fibonacci_sum_last_digit(m-1))
    if digit_difference < 0:
        digit_difference += 10

    return digit_difference

if __name__ == '__main__':
    m, n = map(int, input().split())
    print(get_fibonacci_partial_sum(m, n))

Last (Week 1 of Algorithmic Toolbox) | Next (Week 3 of Algorithmic Toolbox)