@dosentmatter
Last active August 22, 2020 09:12
Leetcode 283. "Move Zeroes" Solutions 2/3 Analysis
I don't think Solution 2 is always better than Solution 3
First of all, it doesn't matter unless n is large, since both are O(n).
For both Solutions 2 and 3, you can skip operations when there are non-zeros at the beginning of the array by doing the following:
For Solution 2, you can skip moving when `lastNonZeroFoundAt == i`
For Solution 3, you can skip swapping when `lastNonZeroFoundAt == cur`
For Solution 2, even if you do a memset for the second loop, it is still O(n) work because it has to write each remaining element to 0.
For Solution 3, we can say that a swap is a cost-3 operation, since it does 3 writes, including the write to the temporary variable.
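To make the comparison concrete, here is a minimal sketch of the two solutions (my own Python translation of the LeetCode article's approach, with `lastNonZeroFoundAt` snake_cased and the skip checks described above added):

```python
def move_zeroes_solution2(nums):
    # Solution 2: copy every non-zero forward, then zero-fill the tail.
    last_non_zero_found_at = 0
    for i in range(len(nums)):
        if nums[i] != 0:
            if last_non_zero_found_at != i:  # skip moving when already in place
                nums[last_non_zero_found_at] = nums[i]
            last_non_zero_found_at += 1
    for i in range(last_non_zero_found_at, len(nums)):  # the "memset" loop
        nums[i] = 0


def move_zeroes_solution3(nums):
    # Solution 3: swap each non-zero into place in a single pass.
    last_non_zero_found_at = 0
    for cur in range(len(nums)):
        if nums[cur] != 0:
            if last_non_zero_found_at != cur:  # skip swapping when already in place
                nums[last_non_zero_found_at], nums[cur] = nums[cur], nums[last_non_zero_found_at]
            last_non_zero_found_at += 1
```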
---
Let's calculate the cost for the general case:
Assumptions:
1. Each loop iteration costs 1
2. Reading is 0 cost
3. Comparisons are 0 cost
4. Incrementing `lastNonZeroFoundAt` is 0 cost since both solutions increment it by the same amount (i.e. the number of non-zeros) anyway
5. Writing is 1 cost
6. Swapping costs 3 because it consists of 3 writes.
Let n be the number of elements in the array.
Let C be the number of zeros.
Let k be the number of non-zeros in the incorrect location.
To explain k:
[1, 2, 3] - k = 0 because there are no zeros
[0, 1, 2, 3] - k = 3 because 1, 2, 3 need to be moved to indices 0, 1, 2 respectively
[0, 0, 1, 2, 3] - k = 3, because 1, 2, 3 need to be moved to indices 0, 1, 2 respectively
[0, 1, 2, 0, 3] - k = 3, because 1, 2, 3 need to be moved to indices 0, 1, 2 respectively
Notice that another way to think about k is "the number of non-zeros with at
least one zero before it". This is because if there is at least one zero
before a non-zero, it is out of place. More than one zero before a non-zero
does not change anything. That will only put more zeros at the end. This
knowledge will come in handy later.
---
Cost of Solution 2 for the general case:
For the first loop, we have:
Cost(first loop) = Cost(first loop iterations) + Cost(first loop moves, i.e. writes)
since k of the elements need to be moved:
Cost(first loop) = Cost(n iterations) + Cost(k moves)
Cost(first loop) = n + k
For the second loop (memset), we have:
Cost(second loop) = Cost(second loop iterations) + Cost(second loop writes)
Cost(second loop) = Cost(C iterations) + Cost(C writes)
Cost(second loop) = C + C
Cost(second loop) = 2*C
So in total,
Cost(Solution 2) = Cost(first loop) + Cost(second loop)
Cost(Solution 2) = n + k + 2*C
Cost of Solution 3 for the general case:
Cost(Solution 3) = Cost(loop iterations) + Cost(loop swaps)
Cost(Solution 3) = Cost(n iterations) + Cost(k swaps)
Since a swap costs 3,
Cost(Solution 3) = n + 3*k
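As a quick sanity check, take [0, 1, 2, 0, 3] from above: n = 5, C = 2, k = 3, so
Cost(Solution 2) = 5 + 3 + 2*2 = 12
Cost(Solution 3) = 5 + 3*3 = 14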
---
So let's see when Solution 3 is better than or equal to Solution 2:
Cost(Solution 2) >= Cost(Solution 3)
n + k + 2*C >= n + 3*k
2*C >= 2*k
C >= k, i.e. k <= C
So this means that Cost(Solution 3) <= Cost(Solution 2) when k <= C.
k <= C means the number of non-zeros in the incorrect location is <= the number of zeros.
It would take O(n) to determine k or C.
For C, you do an O(n) loop to count the number of zeros.
For k, you do an O(n) loop to find the first zero and then count the number of non-zeros found afterwards.
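Both counts are straightforward single passes. A sketch in the same Python as above:

```python
def count_zeros(nums):
    # C: count the zeros in one pass.
    return sum(1 for x in nums if x == 0)


def count_misplaced_non_zeros(nums):
    # k: count non-zeros with at least one zero before them,
    # i.e. every non-zero after the first zero.
    k = 0
    seen_zero = False
    for x in nums:
        if x == 0:
            seen_zero = True
        elif seen_zero:
            k += 1
    return k
```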
Even though you would know the optimal solution to use after counting, it might not
be worth it to do an O(n) count, so let's do some probability.
---
So now let's do some probability and calculate what we expect k or C to be
given the following input:
1. We generate an array of length n.
2. Each element is generated independently with a probability p to be a non-zero, else zero.
What are the expected values of C and k, i.e. E[C] and E[k]?
If E[k] <= E[C], probability would suggest that Solution 3 is better.
---
E[C]:
Let C_i be a random variable on whether the element at index i is a zero.
Because of Linearity of Expectation,
E[C] = E[C_0] + E[C_1] + ... E[C_n-1]
There is a (1 - p) chance to generate a zero,
and each element can contribute at most 1 zero, so
E[C_i] = (1 - p) * 1
E[C] = (1 - p) * 1 + (1 - p) * 1 + ... (n times)
E[C] = (1 - p) * n
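For example, with p = 0.3 and n = 10, E[C] = (1 - 0.3) * 10 = 7, i.e. we expect 7 zeros.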
---
E[k]:
This one is a little trickier
Again, we can use Linearity of Expectation.
Let k_i be a random variable on whether the element at index i is a non-zero in the incorrect location.
As mentioned above, another way to think about this is
"a non-zero with at least one zero before it"
E[k] = E[k_0] + E[k_1] + ... E[k_n-1]
There is a p chance to generate a non-zero
Each element can contribute at most 1 incorrect non-zero, but what is the
probability of an incorrect non-zero?
The probability of an incorrect non-zero is the probability of generating a
non-zero with at least one zero before it.
Let P mean "the probability of"
P(having at least one zero) = (1 - P(having all non-zeros))
P(incorrect non-zero at index i)
= P(non-zero at index i with at least one zero before it)
= P(non-zero at index i AND having at least one zero before index i)
Since these two are independent events, the probability factors:
= P(non-zero at index i) * P(having at least one zero before index i)
Since i is indexed starting at 0, there are i elements before index i and we can plug into above formulas:
= P(non-zero) * P(having at least one zero in i elements)
= P(non-zero) * (1 - P(having i non-zeros))
= p * (1 - p**i)
Since each element can contribute at most 1 incorrect non-zero,
E[k_i] = (p * (1 - p**i)) * 1
E[k_i] = p * (1 - p**i)
E[k_i] = p - p**(i + 1)
E[k] = E[k_0] + E[k_1] + ... E[k_n-1]
E[k] = (p - p**(0 + 1)) + (p - p**(1+1)) + ... (p - p**(n - 1 + 1))
E[k] = (p - p**1) + (p - p**2) + ... (p - p**n)
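As a side note, summing the geometric series gives a closed form (for p != 1):
E[k] = n*p - p * (1 - p**n) / (1 - p)
Here is a quick Monte Carlo check of both formulas (my own sketch, not part of the original problem):

```python
import random

def expected_k_closed_form(n, p):
    # E[k] = n*p - (p + p**2 + ... + p**n), summed as a geometric series.
    if p == 1:
        return 0.0
    return n * p - p * (1 - p ** n) / (1 - p)

def simulate(n, p, trials=100_000):
    # Average C and k over many randomly generated arrays.
    total_c = total_k = 0
    for _ in range(trials):
        nums = [1 if random.random() < p else 0 for _ in range(n)]
        total_c += nums.count(0)
        seen_zero = False
        for x in nums:
            if x == 0:
                seen_zero = True
            elif seen_zero:
                total_k += 1
    return total_c / trials, total_k / trials

# With n = 20 and p = 0.4: E[C] = 12 and E[k] ~= 7.33, so E[k] <= E[C].
print(simulate(20, 0.4), expected_k_closed_form(20, 0.4))
```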
---
I took some time to analyze this with graphs, so I will just prove the key propositions:
p is a probability so it is in the range [0, 1]
Proposition: "If 0 <= p <= 0.5, then E[k] <= E[C]"
Let's prove this by induction on n:
Proof:
Let P(n), where n is the number of elements in the array, be the statement that E[k] <= E[C]
Base Case:
Let's special-case n=0, since the formula for E[k] above doesn't work for n=0
When n=0,
E[C] = (1 - p) * 0 = 0, or you can think of it as 0 because there are no zeros in an empty array
E[k] = 0, because there are no incorrect non-zeros in an empty array
So for n=0, E[k] <= E[C]
Let's start induction from n=1,
When n=1,
E[C] = (1 - p) * 1 = 1 - p
E[k] = (p - p**1) = 0
So for n=1, E[k] <= E[C]
Inductive step:
Assume that P(x) is true for some arbitrary x >= 1
We have to prove P(x + 1) is true.
P(x + 1) means
E[k_(x+1) elements] <= E[C_(x+1) elements]
(p - p**1) + (p - p**2) + ... + (p - p**x) + (p - p**(x + 1)) <= (1 - p) * (x + 1)
(p - p**1) + (p - p**2) + ... + (p - p**x) + (p - p**(x + 1)) <= (1 - p) * x + (1 - p)
The first chunk of each side is just P(x), I will put them in square brackets:
[ (p - p**1) + (p - p**2) + ... + (p - p**x) ] + (p - p**(x + 1)) <= [ (1 - p) * x ] + (1 - p)
The part in the square brackets,
[ (p - p**1) + (p - p**2) + ... + (p - p**x) ] <= [ (1 - p) * x ]
which is
E[k_x elements] <= E[C_x elements]
is just P(x), which we have assumed to be true.
How can we use P(x) to prove P(x + 1)?
We can use this fact:
If a <= b and c <= d are true,
then a + c <= b + d is also true
In our case,
a = E[k_x elements] = (p - p**1) + (p - p**2) + ... + (p - p**x) = left terms inside square brackets
b = E[C_x elements] = (1 - p) * x = right terms inside square brackets
c = (p - p**(x + 1)) = left term outside square brackets
d = (1 - p) = right term outside square brackets
This leaves us with having to prove the following in order to prove P(x + 1).
Note that this is just the expected value of the next term added when adding
one more element to the array.
(p - p**(x + 1)) <= (1 - p)
p - p**(x + 1) <= 1 - p
2*p <= 1 + p**(x + 1)
Since 0 <= p <= 0.5, the greatest 2*p can be is 1, and 1 + p**(x + 1) is always at least 1, so 2*p <= 1 + p**(x + 1).
This means that the original c <= d inequality is true and P(x + 1) is true.
Thus, P(n) is true for n being any integer >= 0.
In other words:
"If 0 <= p <= 0.5 and n is an integer >= 0, then E[k] <= E[C]"
This means that for 0 <= p <= 0.5, Solution 3 is always better.
---
How about 0.5 < p <= 1?
Assumptions:
1. n is very large. This is an okay assumption since we only care about
optimizing Solution 2/3 when n is large.
Proposition: "If 0.5 < p <= 1, then E[k] >= E[C]"
We will special case p=1.
E[C] = (1 - p) * n
= (1 - 1) * n
= (0) * n
= 0
E[k] = (p - p**1) + (p - p**2) + ... (p - p**n)
= (1 - 1**1) + (1 - 1**2) + ... (1 - 1**n)
= (1 - 1) + (1 - 1) + ... (1 - 1)
= (0) + (0) + ... (0)
= 0
When p=1, the array is filled with non-zeros. No work is done and E[k] = E[C],
but E[k] >= E[C] still holds true.
Now let's exclude p=1 and prove the rest where 0.5 < p < 1.
For this proposition, I will not do a formal proof. I will only analyze E[k] and E[C]
and show that E[k] grows faster than E[C] for large n.
Let's start with E[C]
E[C] = (1 - p) * n
Since 0.5 < p < 1, we know that 0 < (1 - p) < 0.5
As n grows large, we keep adding more 0 < (1 - p) < 0.5 terms.
How about E[k]?
E[k] = (p - p**1) + (p - p**2) + ... (p - p**n)
Keep in mind 0.5 < p < 1.
For earlier terms such as (p - p**1) or (p - p**2), the terms will be pretty small, because
the exponent is small, so p**i is still close to p and the amount subtracted is large.
However for later terms, and when n is large, (p - p**i) can be approximated as just (p),
since p**i ~= 0 when i is large. This is because 0.5 < p < 1, so each additional factor of p shrinks p**i.
When n is large, there will be many terms that will be approximated as just p.
As n grows large, we keep adding more approximate 0.5 < p < 1 terms.
Compare the statements above about E[C] and E[k].
For large n, E[C] keeps adding 0 < (1 - p) < 0.5 terms
while E[k] keeps adding 0.5 < p < 1 terms
The added E[k] terms are larger than the E[C] terms.
This means the proposition holds true for large n, since E[k] grows faster than E[C] as you increase n.
Note that for low n, it's possible for the opposite to be true. E[C] might be
greater. You can graph it out on a calculator to see visually. But for low n, optimization is unnecessary.
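For a concrete example, take p = 0.8 and n = 1000: E[C] = 0.2 * 1000 = 200, while the closed form above gives E[k] = 800 - 0.8 * (1 - 0.8**1000) / 0.2 ~= 800 - 4 = 796, so E[k] is far larger.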
This means for large n and 0.5 < p <= 1, Solution 2 is always better.
---
So what have we learned?
Use Solution 3 for 0 <= p <= 0.5
Use Solution 2 for 0.5 < p <= 1 (for large n)
---
How do I apply this to the general case?
In the general case, there are a lot of unknowns. If there are some knowns, you
can use the above findings to determine the optimal algorithm.
For the general case, you are given a large n-size array. You don't know the p
value used to generate the array. You don't know the value of C or k.
It's not worth determining C or k because that would take O(n).
You can approximate p by sampling a few values and computing the fraction of
non-zeros you see.
How many values should you sample? We can use Chebyshev's inequality:
See Chapter 8.1 of
https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf
and 5a of
https://inst.eecs.berkeley.edu/~ee126/fa19/hw/hw5-sol.pdf
P(|X - u| >= eps) <= Var(X) / eps**2, where u is the mean E[X]
Let each coin flip be a Bernoulli trial, with X_i as the random variable for flip i.
There is a p chance for the coin to flip heads (non-zero element)
and a (1 - p) chance for tails (zero element)
S_n = X_0 + X_1 + ... X_n-1
P(|S_n / n - p| >= eps) <= Var(S_n / n) / eps**2
P(|S_n / n - p| >= eps) <= Var(S_n) / (n**2 * eps**2)
P(|S_n / n - p| >= eps) <= n * Var(X_i) / (n**2 * eps**2)
P(|S_n / n - p| >= eps) <= Var(X_i) / (n * eps**2)
Var(X_i) = E[X_i**2] - E[X_i]**2
= [ (1 - p) * 0**2 + p * 1**2 ] - [ (1 - p) * 0 + p * 1 ] ** 2
= [ p * 1**2 ] - [ p * 1 ] ** 2
= [ p ] - [ p ] ** 2
= p - p**2
= p * (1 - p)
P(|S_n / n - p| >= eps) <= Var(X_i) / (n * eps**2)
P(|S_n / n - p| >= eps) <= p * (1 - p) / (n * eps**2)
We set d = p * (1 - p) / (n * eps**2)
n * eps**2 = p * (1 - p) / d
n = p * (1 - p) / (d * eps**2)
We can use this formula to determine a good value for n, the sample size.
eps is the error threshold of the estimated probability vs the actual p:
|S_n / n - p| >= eps
d is the probability of the estimated probability breaching the error threshold eps:
P(|S_n / n - p| >= eps) <= d
Formula:
n = p * (1 - p) / (d * eps**2)
To be more concrete, we can upper-bound p * (1 - p) by its maximum, which occurs at p = 0.5:
n = 0.5 * (0.5) / (d * eps**2)
n = 1 / (4 * d * eps**2)
We can set eps = 0.1 and d = 0.1:
n = 1 / (4 * 0.1 * 0.1**2)
n = 250 elements to sample
For large n, 250 elements isn't much. You can adjust eps and d depending on how accurate you need to be.
Keep in mind, the cutoff for deciding between Solution 2 and 3 is at p = 0.5.
So your eps should be small enough to distinguish the cutoff, but large enough that n isn't too large to sample.
Set d depending on how accurate you want your estimate to be.
If n gets too large, you can also cap the sample size at 10% of the array size.
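Putting it all together, here is a rough sketch of the adaptive dispatch (again my own Python, reusing the functions sketched earlier and the eps = 0.1, d = 0.1 sample size from above):

```python
import random

def move_zeroes_adaptive(nums):
    # Estimate p from a small random sample, then pick a solution.
    if not nums:
        return
    # 250 comes from the Chebyshev bound above, capped at 10% of the array.
    sample_size = min(250, max(1, len(nums) // 10))
    sample = random.sample(nums, min(sample_size, len(nums)))
    p_estimate = sum(1 for x in sample if x != 0) / len(sample)
    if p_estimate <= 0.5:
        move_zeroes_solution3(nums)  # swap-based wins for p <= 0.5
    else:
        move_zeroes_solution2(nums)  # copy + zero-fill wins for p > 0.5
```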
@HimanshuMittal01

WOW, such an in-depth analysis of the two approaches. Thanks!!

@dosentmatter
Author

@HimanshuMittal01, you're welcome, and thanks for the compliment. I'm glad you liked it 😀.
