import functools
import itertools
import operator


def is_canonical(n_tuple):
    # A tuple and its negation-reversal give the same |s_j| values, so we
    # canonically require:
    # |b1| <= |bn| ==> -b1 <= bn ==> b1 + bn >= 0
    return n_tuple[-1] + n_tuple[0] >= 0


def beta_values_to_p(n_tuple, n):
    # Convert beta values to the nonnegative gap variables from the notes:
    # beta1 = -1 - p1 ==> p1 = -1 - beta1
    # and p_i = beta_i - beta_{i-1} - 1 >= 0 for i > 1.
    p = [None] * n
    p[0] = -1 - n_tuple[0]
    for i in range(1, n):
        p[i] = n_tuple[i] - n_tuple[i - 1] - 1
    return p


def advance_tuple(n_tuple, n):
    # Yield neighbors of ``n_tuple``: first shift every entry down by one,
    # then shift the trailing entries (from ``index`` onward) up by one.
    first_update = n_tuple[:]
    for tuple_index in range(n):
        first_update[tuple_index] -= 1
    yield first_update
    for index in range(1, n):
        next_update = n_tuple[:]
        for tuple_index in range(index, n):
            next_update[tuple_index] += 1
        yield next_update


def advance_stage(stage, n):
    # Expand the search front by one step, de-duplicating via a set.
    new_stage = set()
    for n_tuple in stage:
        for new_tuple in advance_tuple(n_tuple, n):
            new_stage.add(tuple(new_tuple))
    return sorted(map(list, new_stage), reverse=True)


def compute_values(curr_stage, stage, sd, sd_plus_one):
    # Print the canonical tuples in this stage that satisfy s_d = 0,
    # together with their s_{d + 1} values.
    print('Stage: %d' % (curr_stage,))
    print('-' * 40)
    for n_tuple in stage:
        sd_val = sd(n_tuple)
        sd_plus_one_val = sd_plus_one(n_tuple)
        message = '%20s -> %s' % (n_tuple, sd_plus_one_val)
        if sd_val == 0:
            if is_canonical(n_tuple):
                print(message)


def advance_front(n, sd, sd_plus_one, num_stages=3):
    # Start from the strictly increasing tuple (-1, 0, ..., n - 2).
    stage = [list(range(-1, -1 + n))]
    compute_values(0, stage, sd, sd_plus_one)
    for curr_stage in range(num_stages):
        stage = advance_stage(stage, n)
        compute_values(curr_stage + 1, stage, sd, sd_plus_one)


def make_sj(j):
    # Return the j-th elementary symmetric polynomial as a function of a
    # tuple of beta values.
    def sj(n_tuple):
        result = 0
        for val_tuple in itertools.combinations(n_tuple, j):
            result += functools.reduce(operator.mul, val_tuple, 1)
        return result
    return sj


if __name__ == '__main__':
    s1 = make_sj(1)
    s2 = make_sj(2)
    print('')
    advance_front(5, s1, s2, num_stages=11)
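# A minimal usage sketch (not part of the original script): checking the
# (n = 3, k = 2) entry of the minima table in the notes, i.e. that the
# tuple (-1, 0, 1) has s_1 = 0 and |s_2| = 1:
#
#   >>> s1, s2 = make_sj(1), make_sj(2)
#   >>> s1((-1, 0, 1))
#   0
#   >>> s2((-1, 0, 1))
#   -1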
\documentclass[a4paper,10pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amsthm,amssymb}
\usepackage{hyperref}
\usepackage{fancyhdr}
\renewcommand{\qed}{\(\blacksquare\)}
\hypersetup{colorlinks=true,urlcolor=blue}
\pagestyle{fancy}
\lhead{}
\rhead{Toying Around with Finite Difference Approximations \\ Danny Hermes}
\renewcommand{\headrulewidth}{0pt}
\begin{document}
\section*{Some fun facts about approximations}
\ \vspace{-.2in}
\paragraph{Putting a max on the order of approximation of \(u'\)} If we
seek to approximate the value \(u'(c)\) from \(n\) points, we'll show
that we cannot achieve an \(\mathcal{O}(h^{n + 1})\) approximation. Consider
the approximation given by
\[\sum_{i = 1}^n \alpha_i u(c + \beta_i h)\]
where both the coefficients of the outputs \(u(\cdot)\) and the distances of
the points (\(\beta_i h\)) from the center \(c\) may vary. Currently the only
conditions on the points are that the \(\beta_i\) are distinct (else we would
have fewer than \(n\) points) and that the differences \(\beta_i h\) are
\(\mathcal{O}(h)\) (which we've indicated clearly by the presence of \(h\) in
the difference) so that we can use a Taylor approximation.
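For example, the centered difference
\[D_0 u(c) = \frac{u(c + h) - u(c - h)}{2h}\]
is of this form with \(n = 2\): \(\alpha_1 = -\frac{1}{2h}\),
\(\alpha_2 = \frac{1}{2h}\) and \(\beta_1 = -1\), \(\beta_2 = 1\); note that
the \(\alpha_i\) are \(\mathcal{O}\left(\frac{1}{h}\right)\).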
Taylor expanding \(u(c + \beta_i h)\) about \(c\) we have
\[u(c + \beta_i h) = \sum_{k = 0}^{n + 1} u^{(k)}(c) \frac{\left(\beta_i
h\right)^k}{k!} + \mathcal{O}(h^{n + 2}).\]
Since we want \(\sum_{i = 1}^n \alpha_i u(c + \beta_i h)\) to approximate
\(u'(c)\), we'll want to set
\[u'(c) = \sum_{i = 1}^n \alpha_i \left[u'(c) \beta_i h\right]\]
so that the \(\alpha_i = \mathcal{O}\left(\frac{1}{h}\right)\). Due to this,
in order to have a \(\mathcal{O}(h^{n + 1})\) approximation we'd need
\[\sum_{i = 1}^n \alpha_i u^{(k)}(c) \frac{\left(\beta_i h\right)^k}{k!} = 0\]
for \(k = 0\) and \(2 \leq k \leq n + 1\). These encompass all the remaining
terms, and the error would be \(\mathcal{O}(\alpha_i h^{n + 2}) =
\mathcal{O}(h^{n + 1})\).
Since we want to do this for generic \(u(\cdot)\) we assume that \(u^{(k)}(c)
\neq 0\) so that dividing by \(u^{(k)}(c) h^k / k!\) we have the system
\[\sum_{i = 1}^n \alpha_i \beta_i = \frac{1}{h} \quad \text{ and } \quad
\sum_{i = 1}^n \alpha_i \beta_i^k = 0 \quad \text{for } k \neq 1.\]
This is a system of \(n + 2\) equations in \(2n\) variables, so it seems
on the surface that we should have a solution, hence should be able to get
an \(\mathcal{O}(h^{n + 1})\) approximation of \(u'(c)\) with \(n\) points.
However, we'll show this system does not have a solution. Writing this
as a matrix equation, we have
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\beta_1^2 & \cdots & \beta_n^2 \\
\vdots & \ddots & \vdots \\
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1} \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\alpha_2 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[ \begin{array}{c}
0 \\
\frac{1}{h} \\
0 \\
\vdots \\
0 \end{array}\right].\]
Dropping the second and third equations from this system, we have a square
system given by
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1^3 & \cdots & \beta_n^3 \\
\vdots & \ddots & \vdots \\
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1} \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\alpha_2 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[ \begin{array}{c}
0 \\
0 \\
\vdots \\
0 \end{array}\right].\]
If we define the augmented Vandermonde matrix \(A_n\) by
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1^3 & \cdots & \beta_n^3 \\
\vdots & \ddots & \vdots \\
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1} \end{array}\right],\]
then our system is \(A_n \boldsymbol{\alpha} = \mathbf{0}\).
(For notation's sake, we should probably just introduce a notation like
\[A_n(0, 3, \ldots, n + 1)\] to represent the exponents of each row and
we should also probably note that this depends on the \(\beta_i\), but
we didn't -- so there.)
If we can show that \(A_n\) is invertible then we'd have
\(\boldsymbol{\alpha} = A_n^{-1} \mathbf{0} = \mathbf{0}\) forcing our
approximation of \(u'(c)\) to be uniformly \(0\). This clearly is not
\(\mathcal{O}(h^{n + 1})\).
\paragraph{Showing that this system is degenerate} With the same indeterminates
\(\beta_1, \ldots, \beta_n\), define the Vandermonde matrix \(V_n\) as the
matrix
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^{n - 1} & \cdots & \beta_n^{n - 1} \end{array}\right].\]
(This is the transpose of what's typically called the Vandermonde but it's
fine this way, we're over it.)
To show that \(A_n\) is invertible we'll define a fancy matrix \(F_n\) and show
that \(A_n = F_n V_n\). This will be important, since the determinant of
\(V_n\) is well-known and (we'll see) the determinant of \(F_n\) can be easily
computed. But first, the definition
\[\left[ \begin{array}{c c c c c c c}
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 \\
f_{n - 1,1} & f_{n - 1, 2} & f_{n - 1, 3} & f_{n - 1, 4} & \cdots &
f_{n - 1, n - 1} & f_{n - 1, n} \\
f_{n,1} & f_{n, 2} & f_{n, 3} & f_{n, 4} & \cdots & f_{n, n - 1} &
f_{n, n} \end{array}\right].\]
This display alone doesn't pin \(F_n\) down, but it indicates that the last
two rows are where the real content is. The first \(n - 2\) rows make plenty
of sense. We have row \(1\) equal to \(e_1^T\) and then row \(i\) for \(2 \leq i \leq
n - 2\) equal to \(e_{i + 2}^T\).
Before defining what the unnamed \(f_{i, j}\) are, we can compute both
\(\det(F_n)\) and the product \(F_n V_n\) in terms of these. First, since
we have a shifted identity in the first \(n - 2\) rows, the only
terms in the cofactor expansion that are nonzero will be from the second and
third columns and the last two rows. Hence, up to a sign we know
\[\pm \det(F_n) = f_{n - 1, 2} f_{n, 3} - f_{n, 2} f_{n - 1, 3}.\]
%% (See the \underline{\hyperref[det-sign]{note}} below to see that this is
(See the \hyperref[det-sign]{note} below to see that this is
the correct sign.)
To compute some parts of \(F_n V_n\) note that the product \(e_i^T M\)
represents the \(i\)th row of \(M\). Hence, the majority of the
multiplication just moves rows:
\[F_n V_n = \left[ \begin{array}{c}
e_1^T \\ e_4^T \\ \vdots \\ e_n^T \\ f_{n - 1}^T \\ f_n^T \end{array}\right]
\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^{n - 1} & \cdots & \beta_n^{n - 1} \end{array}\right] =
\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1^3 & \cdots & \beta_n^3 \\
\vdots & \ddots & \vdots \\
\beta_1^{n - 1} & \cdots & \beta_n^{n - 1} \\
\multicolumn{3}{ c }{f_{n - 1}^T V_n} \\
\multicolumn{3}{ c }{f_n^T V_n} \end{array}\right].\]
The first \(n - 2\) rows of \(F_n V_n\) match those of \(A_n\) so to show
equality we need to show the last \(2\) match as well. Hence at this point
we'll define the last \(2\) rows of \(F_n\). These will be defined in terms
of the elementary symmetric polynomials.
We'll use \(s_k\) to denote the \(k\)th elementary symmetric polynomial; for
example \(s_1 = \beta_1 + \cdots + \beta_n\), \(s_2 = \sum_{1 \leq i < j \leq n}
\beta_i \beta_j\) and \(s_n = \beta_1 \cdots \beta_n\).
For row \(n - 1\), we have
\[f_{n - 1, k} = (-1)^{n - k} s_{n - k + 1}.\]
For row \(n\), we first have
\[f_{n, 1} = (-1)^{n - 1} s_1 s_n\]
and
\[f_{n, k} = (-1)^{n - k} \left[s_1 s_{n - k + 1} - s_{n - k + 2}\right]\]
for \(2 \leq k \leq n\). Viewing this it makes sense that we defined
\(f_{n, 1}\) separately since for \(k = 1\), \(s_{n - k + 2} = s_{n + 1}\),
which is not defined since we are drawing from only \(n\) indeterminates.
With this, we are ready to compute \(\det(F_n)\) and confirm the last two
rows of the product. If we define
\[g(X) = (X - \beta_1) \cdots (X - \beta_n)\]
then we know by Vieta's relations that
\[g(X) = X^n - s_1 X^{n - 1} + s_2 X^{n - 2} + \cdots + (-1)^{n} s_n\]
(and by construction that \(g(\beta_i) = 0\) for each \(i\)). To compute row
\(n - 1\) of \(F_n V_n\) note that since each column of \(V_n\) is
\(\left[\begin{array}{c c c c} 1 & \beta_i & \cdots & \beta_i^{n - 1}
\end{array}\right]^T\), entry \(i\) in row \(n - 1\) is
\begin{align*}
\sum_{k = 1}^n f_{n - 1, k} \cdot \beta_i^{k - 1} &= \sum_{k = 1}^n (-1)^{n - k}
s_{n - k + 1} \beta_i^{k - 1} \\
&= (-1)^{n - 1} s_n + (-1)^{n - 2} s_{n - 1} \beta_i + \cdots + (-1)^0 s_1
\beta_i^{n - 1} \\
&= -\left(g(\beta_i) - \beta_i^n\right) = \beta_i^n.
\end{align*}
By similar logic, entry \(i\) in row \(n\) is
\begin{align*}
\sum_{k = 1}^n f_{n, k} \cdot \beta_i^{k - 1} &= (-1)^{n - 1} s_1 s_n +
\sum_{k = 2}^n (-1)^{n - k} \left[s_1 s_{n - k + 1} -
s_{n - k + 2}\right] \beta_i^{k - 1} \\
&= s_1 \left[(-1)^{n - 1} s_n + (-1)^{n - 2} s_{n - 1} \beta_i + \cdots +
(-1)^0 s_1 \beta_i^{n - 1}\right] \\
&- \left[(-1)^{n - 2} s_n \beta_i + \cdots + (-1)^0 s_2
\beta_i^{n - 1}\right] \\
&= s_1 \left[\beta_i^n\right] - \beta_i \left[(-1)^n s_n + \cdots + (-1)^2
s_2 \beta_i^{n - 2}\right] \\
&= s_1 \beta_i^n - \beta_i \left[g(\beta_i) - \beta_i^n + s_1 \beta_i^{n - 1}
\right] \\
&= s_1 \left[\beta_i^n - \beta_i \cdot \beta_i^{n - 1}\right] + \beta_i \cdot
\beta_i^n \\
&= \beta_i^{n + 1}.
\end{align*}
To compute \(\det(F_n)\), note that it involves the following:
\begin{align*}
f_{n - 1, 2} &= (-1)^{n - 2} s_{n - 1} = (-1)^n s_{n - 1} \\
f_{n - 1, 3} &= (-1)^{n - 3} s_{n - 2} = (-1)^{n - 1} s_{n - 2} \\
f_{n, 2} &= (-1)^{n - 2}\left[s_1 s_{n - 1} - s_n\right] = (-1)^n
\left[s_1 s_{n - 1} - s_n\right] \\
f_{n, 3} &= (-1)^{n - 3}\left[s_1 s_{n - 2} - s_{n - 1}\right] =
(-1)^{n - 1} \left[s_1 s_{n - 2} - s_{n - 1}\right].
\end{align*}
Combining these with our previous definition
\begin{align*}
\det(F_n) &= f_{n - 1, 2} f_{n, 3} - f_{n, 2} f_{n - 1, 3} \\
&= (-1)^{2n - 1} s_{n - 1} \left[s_1 s_{n - 2} - s_{n - 1}\right] -
(-1)^{2n - 1} \left[s_1 s_{n - 1} - s_n\right] s_{n - 2} \\
&= s_1 s_{n - 1} s_{n - 2} - s_n s_{n - 2} -
\left[s_1 s_{n - 1} s_{n - 2} - s_{n - 1}^2\right] \\
&= s_{n - 1}^2 - s_n s_{n - 2}.
\end{align*}
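As a quick sanity check, take \(n = 3\), so that \(g(X) = X^3 - s_1 X^2 +
s_2 X - s_3\). Then
\[F_3 = \left[ \begin{array}{c c c}
1 & 0 & 0 \\
s_3 & -s_2 & s_1 \\
s_1 s_3 & -(s_1 s_2 - s_3) & s_1^2 - s_2 \end{array}\right],\]
and dotting row \(2\) with a column of \(V_3\) gives \(s_3 - s_2 \beta_i +
s_1 \beta_i^2 = \beta_i^3\) while row \(3\) gives \(s_1 s_3 - (s_1 s_2 - s_3)
\beta_i + (s_1^2 - s_2) \beta_i^2 = \beta_i^4\), so \(F_3 V_3 = A_3\).
Likewise \(\det(F_3) = (-s_2)(s_1^2 - s_2) + s_1 (s_1 s_2 - s_3) =
s_2^2 - s_1 s_3\), agreeing with \(s_{n - 1}^2 - s_n s_{n - 2}\).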
\paragraph{Showing \(A_n\) is invertible} Assuming the \(\beta_i\) are
distinct, we can show \(A_n\) is invertible by showing that it has
nonzero determinant. Since we showed \(A_n = F_n V_n\) we know that
\[\det(A_n) = (s_{n - 1}^2 - s_n s_{n - 2}) \prod_{1 \leq i < j \leq n} (
\beta_j - \beta_i)\]
where the product is a well-known result about the Vandermonde determinant.
Since the \(\beta_i\) are distinct, the only way this can be zero is if
\(s_{n - 1}^2 - s_n s_{n - 2} = 0\). We'll rewrite this expression as a sum of
squares and will show that it can't be \(0\) when the \(\beta_i\) are
distinct. Since the subsets of size \(k\) and \(n - k\) are in bijection
(via complement) we can write
\[s_{n - 1} = \sum_{i = 1}^n \prod_{j \neq i} \beta_j.\]
If we define the formal sum
\[s_{-1} = \sum_{i = 1}^n \frac{1}{\beta_i}\]
then we see that \(s_{n - 1} = s_n s_{-1}\). This sum may not necessarily be
defined (e.g. if some \(\beta_i = 0\)), but using it to manipulate
sums formally is perfectly fine so long as it is multiplied by \(s_n\) since
this will remove all the potentially problematic terms from the denominator.
Similarly, define \(s_{-2}\) so that \(s_{n - 2} = s_n s_{-2}\):
\[s_{-2} = \sum_{1 \leq i < j \leq n} \frac{1}{\beta_i \beta_j}.\]
With this, \(s_{n - 1}^2 - s_n s_{n - 2}\) becomes \(s_n^2\left(s_{-1}^2 -
s_{-2}\right)\). To analyze this, note that (formally)
\[s_{-1}^2 - s_{-2} = \left(\sum_{i = 1}^n \frac{1}{\beta_i^2} + 2 s_{-2}
\right) - s_{-2} = \sum_{i = 1}^n \frac{1}{\beta_i^2} + s_{-2}.\]
To write this as a sum of squares, we combine the above fact about
\(s_{-1}^2\) with another formal sum:
\[\sum_{1 \leq i < j \leq n} \left(\frac{1}{\beta_i} + \frac{1}{\beta_j}
\right)^2 = (n - 1) \sum_{i = 1}^n \frac{1}{\beta_i^2} + 2 s_{-2}\]
(where the coefficient is \(n - 1\) since each \(\beta_i\) is paired with
\(\binom{n - 1}{1}\) other \(\beta_j\)). Combining these (formally) we have a
sum of squares:
\begin{align*}
& (n - 3)s_{-1}^2 + \sum_{1 \leq i < j \leq n} \left(\frac{1}{\beta_i} +
\frac{1}{\beta_j}\right)^2 \\
&= ((n - 3) + (n - 1)) \sum_{i = 1}^n \frac{1}{\beta_i^2} + (2(n - 3) + 2)
s_{-2} \\
&= (2n - 4)\left[\sum_{i = 1}^n \frac{1}{\beta_i^2} + s_{-2}\right] \\
&= 2(n - 2)\left[s_{-1}^2 - s_{-2}\right].
\end{align*}
Multiplying back by \(s_n^2\), we have
\[s_{n - 1}^2 - s_n s_{n - 2} = \frac{(n - 3) s_{n - 1}^2 +
\displaystyle \sum_{1 \leq i < j \leq n} \left(\frac{s_n}{\beta_i} +
\frac{s_n}{\beta_j}\right)^2}{2(n - 2)}.\]
(Note that \(n = 2\) will cause problems, but we won't in the end
be considering \(n\) below \(3\).) Since the numerator is a sum of squares,
the expression can only be \(0\) when
\[\frac{s_n}{\beta_i} + \frac{s_n}{\beta_j} = 0 \text{ for all } 1 \leq
i < j \leq n\]
and (so long as \(n > 3\)) when \(s_{n - 1} = 0\) as well.
Introducing the intermediate terms \(S_i = \frac{s_n}{\beta_i} =
\prod_{k \neq i} \beta_k\), we know that \(s_{n - 1} = \sum_{k = 1}^n S_k\)
and the first condition becomes \(S_i = - S_j\) for all \(j \neq i\). (When
\(n = 3\) this alone forces \(S_1 = -S_2 = S_3 = -S_1\), so every \(S_i =
0\).) For \(n > 3\), this forces
\[0 = \sum_{k = 1}^n S_k = S_i + \sum_{k \neq i} S_k = S_i + (n - 1)(-S_i) =
(2 - n) S_i.\]
Again, since \(n > 2\), this forces \(-S_j = S_i = 0\). Now since
\(\prod_{k \neq i} \beta_k = S_i = 0\), we must have \(\beta_{k_0} = 0\)
for some \(k_0 \neq i\). But we also know that \(0 = S_{k_0} =
\prod_{k \neq k_0} \beta_k\) hence there is some \(k_1 \neq k_0\) such
that \(\beta_{k_1} = 0 = \beta_{k_0}\). But since \(k_1 \neq k_0\), this
would give us nondistinct \(\beta_i\) (both \(0\)) which is not allowed.
Hence, the sum of squares \(s_{n - 1}^2 - s_n s_{n - 2}\) can never be
\(0\) and so neither can \(\det(A_n)\).
\paragraph{Note about Sign of Determinant} \label{det-sign}
We can show the sign is correct by considering the two permutations
\(\tau, \sigma\) that result in a nonzero product in
\[\det F_n = \sum_{\sigma' \in S_n} \operatorname{sgn}(\sigma')
\prod_{i = 1}^n f_{i, \sigma'(i)}.\]
We need \(\sigma(1) = \tau(1) = 1\) (so \(1\) is not part of the cycle
decomposition). We also need \(\sigma(i) = \tau(i) = i + 2\) for \(2 \leq
i \leq n - 2\) due to the column shift. This leaves us with only two
choices for the remaining inputs. Let \(\tau(n - 1) = 2, \tau(n) = 3\) and
\(\sigma(n - 1) = 3, \sigma(n) = 2\); then we have
\[\det F_n = \operatorname{sgn}(\tau) f_{n - 1, 2} f_{n, 3} +
\operatorname{sgn}(\sigma) f_{n - 1, 3} f_{n, 2}\]
since \(f_{1, 1} = f_{i, i + 2} = 1\) for all other \(i\). Clearly, since
\(\sigma = (2 3) \cdot \tau\) we know that \(\operatorname{sgn}(\tau) = -
\operatorname{sgn}(\sigma)\) so it's enough to find \(\operatorname{sgn}(
\tau)\). If \(n\) is even, then we have
\[2 \xrightarrow{\tau} 4 \xrightarrow{\tau} \cdots \xrightarrow{\tau} n
\xrightarrow{\tau} 3 \xrightarrow{\tau} 5 \xrightarrow{\tau} \cdots
\xrightarrow{\tau} (n - 1) \xrightarrow{\tau} 2\]
hence \(\tau = (2 \, 4 \cdots n \, 3 \cdots (n - 1))\) is a cycle of length
\(n - 1\) (it contains every number except \(1\)) hence \(\operatorname{sgn}(
\tau) = 1\) (cycles of odd length are even permutations). Similarly, if \(n\)
is odd, then
\[2 \xrightarrow{\tau} 4 \xrightarrow{\tau} \cdots \xrightarrow{\tau} (n - 1)
\xrightarrow{\tau} 2 \quad \text{and} \quad 3 \xrightarrow{\tau} 5
\xrightarrow{\tau} \cdots \xrightarrow{\tau} n \xrightarrow{\tau} 3.\]
Hence \(\tau\) is the product of two disjoint cycles of the same length,
so it has sign \((\pm 1)^2 = 1\) here as well. Hence we always have
\[1 = \operatorname{sgn}(\tau) = - \operatorname{sgn}(\sigma)\]
forcing
\[\det(F_n) = f_{n - 1, 2} f_{n, 3} - f_{n, 2} f_{n - 1, 3}.\]
\end{document}
\documentclass[letterpaper,10pt]{article}
% http://stackoverflow.com/a/11331659/1068170
\usepackage{upquote}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amsthm,amssymb,hyperref,fancyhdr}
%% https://tex.stackexchange.com/a/396829/32270
\allowdisplaybreaks
\renewcommand*{\today}{November 27, 2013}
\renewcommand{\qed}{\(\blacksquare\)}
\hypersetup{colorlinks=true,urlcolor=blue}
\pagestyle{fancy}
\lhead{}
\rhead{Toying Around with Finite Difference Approximations \\ Danny Hermes}
\renewcommand{\headrulewidth}{0pt}
\DeclareMathOperator{\rank}{rank}
\title{Finite Approximations of \(u^{(k)}(c)\)}
\begin{document}
\maketitle
\pagebreak
\tableofcontents
\section{Introduction}
If we seek to approximate the derivative \(u^{(k)}\) (of a function of one
real variable) with \(n\) points, we can do so by taking advantage of
Taylor's theorem with well-chosen points. Doing so is in many ways quite
natural, and when creating these approximations a few natural questions
arise.
\section{Motivation} Take the following as motivation. Due to the
definition of the derivative, we can approximate \(u'(c)\) with \(2\) points
via quantities like
\[D_{+} u(c) = \frac{u(c + h) - u(c)}{h}, \, D_{-} u(c) = \frac{u(c) -
u(c - h)}{h}, \text{ and } D_0 u(c) = \frac{u(c + h) - u(c - h)}{2h}.\]
Taylor's theorem tells us that
\[u(c \pm h) = u(c) \pm h u'(c) + \frac{h^2}{2} u''(c) \pm \frac{h^3}{6}
u'''(c) + \frac{h^4}{24} u^{(4)}(c) + \mathcal{O}(h^5),\]
so that
\[D_{\pm} u(c) - u'(c) = \frac{1}{h} \mathcal{O}\left(\pm \frac{h^2}{2} u''(c)
\right) = \mathcal{O}(h) \text{ and } D_0 u(c) - u'(c) = \frac{1}{h}
\mathcal{O}\left(\frac{2 h^3}{6} u'''(c)\right) = \mathcal{O}(h^2).\]
Additionally, in a mean-value theorem inspired mode, we define
\[D^2 u(c) = \frac{\frac{u(c + h) - u(c)}{h} - \frac{u(c) - u(c - h)}{h}}{h} =
\frac{u(c + h) - 2 u(c) + u(c - h)}{h^2}\]
as an approximation for \(u''(c)\) so that
\[D^2 u(c) - u''(c) = \frac{1}{h^2} \mathcal{O}\left(\frac{2 h^4}{24}
u^{(4)}(c)\right) = \mathcal{O}(h^2).\]
We see that we can (easily) find three different approximations for \(u'(c)\)
using only \textbf{two} points and with orders of accuracy \(\mathcal{O}(h)\)
and \(\mathcal{O}(h^2)\). Similarly, we can find an approximation for \(u''(c)\)
using \textbf{three} points with order of accuracy \(\mathcal{O}(h^2)\).
\section{Natural Questions} Along with the conclusions of our motivating
examples, a few natural questions arise:
\begin{itemize}
\item Given an approximation of \(u^{(k)}(c)\) with \(n\) points, what is the
maximum order of accuracy?
\item What is the largest \(k\) such that \(u^{(k)}(c)\) can be approximated
with \(n\) points?
\item What conditions on the points must be satisfied to achieve the maximum
order of accuracy?
\end{itemize}
\section{Framework for approximations} Before answering these questions, we'll
introduce some notation and explicitly write down what it means to say
``approximate with \(n\) points''.
In general, if we want to approximate \(u^{(k)}(c)\) with \(n\) points, we
seek to find coefficients \(\left\{\alpha_i\right\}_{i = 1}^n\) and
\textbf{distinct} points \(\left\{c_i\right\}_{i = 1}^n\) so that
\[u^{(k)}(c) \approx \sum_{i = 1}^n \alpha_i u(c_i).\]
In order to get the order in terms of \(\mathcal{O}(h^p)\) we instead write
our points as \(\left\{c + \beta_i h\right\}_{i = 1}^n\), which are centered
around \(c\) and with differences from the center \(\left\{c_i - c\right\}\)
that are \(\mathcal{O}(h)\). The condition that the \(c_i\) be distinct
also means that the \(\beta_i\) must be distinct.
Using these points in a Taylor approximation of order \(p\) we have
\[u(c + \beta_i h) = \sum_{j = 0}^p u^{(j)}(c) \frac{\left(
\beta_i h\right)^j}{j!} + \mathcal{O}\left(h^{p + 1}\right).\]
Expanding these within \(\sum_{i = 1}^n \alpha_i u(c + \beta_i h)\) we have
\[u^{(k)}(c) \approx \sum_{j = 0}^p u^{(j)}(c) \left[\sum_{i = 1}^n \alpha_i
\beta_i^j\right] \frac{h^j}{j!} + \mathcal{O}\left(h^{p + 1} \sum_{i = 1}^n
\alpha_i \right).\]
We approximate \(u^{(k)}(c)\) by prescribing the coefficients of the
\(u^{(j)}(c)\) in the sum; for the \(u^{(k)}(c)\) term to appear at all we
need \(p \geq k\). In order for all the terms in the Taylor expansion
to produce our derivative (and nothing else), we need to satisfy the system
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^p & \cdots & \beta_n^p \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[\underbrace{0 \, \cdots \, 0}_{\text{first } k \text{ terms}} \,
\frac{k!}{h^k} \, \underbrace{0 \, \cdots \, 0}_{\text{last } (p - k)
\text{ terms}}\right]^T.\]
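For example (a worked instance of the system): take \(n = 2\), \(k = 1\),
\(p = 2\) and \(\beta_1 = -1\), \(\beta_2 = 1\). The system reads
\[\left[ \begin{array}{c c}
1 & 1 \\
-1 & 1 \\
1 & 1 \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\alpha_2 \end{array}\right] =
\left[ \begin{array}{c}
0 \\
\frac{1}{h} \\
0 \end{array}\right],\]
whose solution \(\alpha_1 = -\frac{1}{2h}\), \(\alpha_2 = \frac{1}{2h}\)
recovers the centered difference \(D_0 u(c)\) from the motivation, with its
\(\mathcal{O}(h^2)\) accuracy coming from satisfying the third row for free.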
%% H/T: http://tex.stackexchange.com/a/180269/32270
\section[\texorpdfstring{Largest $k$ given $n$}{Largest k given n}]{Largest \(k\) given \(n\)}
Using the above description we can show
that we must have \(k < n\).
Assuming the contrary --- \(k \geq n\) --- the first \(n\) rows of the
right-hand side of the system must be \(0\). This is because they correspond
to the terms
\[u(c) h^0, u'(c) h^1, \ldots, u^{(n - 1)}(c) h^{n - 1},\]
all of which come before \(u^{(k)}(c)\) since \(k \geq n\). These first \(n\)
rows result in the square system \(V_n \boldsymbol{\alpha} = \mathbf{0}\) where
\(V_n\) is the Vandermonde matrix
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^{n - 1} & \cdots & \beta_n^{n - 1} \end{array}\right].\]
(This is the transpose of what's typically called the Vandermonde but this
is the matrix that arises for our problem.) However, it's well-known that
\(\det(V_n) = \displaystyle \prod_{1 \leq i < j \leq n} (\beta_j -
\beta_i)\). Since our \(\beta_i\) are distinct, we see \(\det(V_n) \neq 0\),
which tells us that \(\boldsymbol{\alpha} = V_n^{-1} \mathbf{0} =
\mathbf{0}\). This of course results in the approximation
\[u^{(k)}(c) \approx 0\]
which is \(\mathcal{O}(1)\) and hence is a very poor approximation. Effectively,
this means that we can't approximate (in any reasonable way) \(u^{(k)}(c)\)
using \(n\) points when \(k \geq n\).
\section{Maximum order of accuracy} To determine the order of accuracy of a
given approximation, assume there is a solution \(\left\{\alpha_i\right\},
\left\{\beta_i\right\}\) of
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^p & \cdots & \beta_n^p \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[\underbrace{0 \, \cdots \, 0}_{\text{first } k \text{ terms}} \,
\frac{k!}{h^k} \, \underbrace{0 \, \cdots \, 0}_{\text{last } (p - k)
\text{ terms}}\right]^T.\]
If we consider the \(\beta_i\) to be fixed, we have \(\alpha_i =
\mathcal{O}\left(\frac{1}{h^k}\right)\) due to the strong influence
of the nonzero item on the right hand side of the system. This results
in the approximation
\begin{align*}
& u^{(k)}(c) \left[\sum_{i = 1}^n \alpha_i
\beta_i^k\right] \frac{h^k}{k!} + \sum_{0 \leq j \leq p, j \neq k}
u^{(j)}(c) \left[\sum_{i = 1}^n \alpha_i \beta_i^j\right] \frac{h^j}{j!} +
\mathcal{O}\left(h^{p + 1} \sum_{i = 1}^n \alpha_i\right) \\
&= u^{(k)}(c) \left[\frac{k!}{h^k}\right] \frac{h^k}{k!} + \sum_{0 \leq j
\leq p, j \neq k} u^{(j)}(c) \left[0\right] \frac{h^j}{j!} + \mathcal{O}\left(
h^{p + 1} \frac{1}{h^k}\right) \\
&= u^{(k)}(c) + \mathcal{O}\left(h^{p - k + 1}\right).
\end{align*}
This of course depends on the existence of a solution. We've already said that
we need \(p \geq k\) and \(n > k\). We claim (and will show) that
\begin{enumerate}
\item No solution exists for \(p \geq n + 1\).
\item A family of solutions does exist for \(p = n\).
\end{enumerate}
Assuming both of the above facts, this actually tells us that the best
approximation we can get is
\[\sum_{i = 1}^n \alpha_i u(c + \beta_i h) - u^{(k)}(c) = \mathcal{O}
\left(h^{n - k + 1}\right).\]
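(This matches the motivation: \(D_0\) has \(n = 2\), \(k = 1\) and achieves
\(\mathcal{O}(h^{n - k + 1}) = \mathcal{O}(h^2)\), while \(D^2\) has
\(n = 3\), \(k = 2\) and achieves \(\mathcal{O}(h^2)\) as well.)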
\section{A preliminary fact} In order to actually show that no solution
exists when \(p \geq n + 1\) and that a solution does exist for \(p = n\),
we will show later that some matrix is invertible and that some other
matrix has a dimension one kernel. As we'll see, doing this will involve
factoring out the Vandermonde matrix \(V_n\) from some other matrix and
hence will require representing powers \(\beta_i^j\) in terms of the
columns of \(V_n\); these must be linear combinations of \(\left\{1, \beta_i,
\ldots, \beta_i^{n - 1}\right\}\). If we want to find \(\beta_i^j\) for
\(0 \leq j \leq n - 1\), it is trivially such a linear combination. However,
if we want higher powers, we'll need to utilize Vieta's relations.
To do so, we define
\[g(X) = (X - \beta_1) \cdots (X - \beta_n)\]
and we know by Vieta's relations that
\[g(X) = X^n - s_1 X^{n - 1} + s_2 X^{n - 2} + \cdots + (-1)^{n} s_n\]
and by construction that \(g(\beta_i) = 0\) for each \(i\).
In the above, the \(s_j\) are the elementary symmetric polynomials in the
\(\beta_i\). For example
\[s_1 = \beta_1 + \cdots + \beta_n, \, s_2 = \sum_{1 \leq i < j \leq n}
\beta_i \beta_j, \text{ and } s_n = \beta_1 \cdots \beta_n.\]
Using these, we can write both \(\beta_i^n\) and \(\beta_i^{n + 1}\) as a
linear combination (with coefficients determined by the \(s_j\)) of the
lower powers of \(\beta_i\), as desired. First,
\[0 = g(\beta_i) = \beta_i^n - s_1 \beta_i^{n - 1} + s_2 \beta_i^{n - 2} +
\cdots + (-1)^{n} s_n \Rightarrow \beta_i^n = s_1 \beta_i^{n - 1} - s_2
\beta_i^{n - 2} + \cdots + (-1)^{n - 1} s_n.\]
If we wanted to compute this in terms of \(V_n\), we can see this as the dot
product
\[\left[\begin{array}{c c c c c} (-1)^{n - 1} s_n & (-1)^{n - 2} s_{n - 1} &
\cdots & -s_2 & s_1 \end{array}\right] (V_n)_i\]
(where \((V_n)_i\) is the \(i\)th column). For future use, we'll denote this
row vector as \(R_1^T\) (we use \(1\) since it's the vector which produces
\(1\) extra power of \(\beta_i\) from the first \(n\) nonnegative powers).
By similar logic, to compute \(\beta_i^{n + 1}\) we see that
\begin{align*}
0 = \beta_i g(\beta_i) &= \beta_i^{n + 1} - s_1 \beta_i^n + s_2
\beta_i^{n - 1} + \cdots + (-1)^{n} s_n \beta_i \\
\Rightarrow \beta_i^{n + 1} &= s_1 \beta_i^n - s_2 \beta_i^{n - 1} + \cdots +
(-1)^{n - 1} s_n \beta_i \\
&= s_1 \left[s_1 \beta_i^{n - 1} - s_2 \beta_i^{n - 2} + \cdots + (-1)^{n - 1}
s_n\right] \\
&- s_2 \beta_i^{n - 1} + \cdots + (-1)^{n - 1} s_n \beta_i \\
&= \left[s_1 s_1 - s_2\right] \beta_i^{n - 1} - \left[s_1 s_2 - s_3\right]
\beta_i^{n - 2} + \cdots \\
&+ (-1)^{n - 2} \left[s_1 s_{n - 1} - s_n\right] \beta_i + (-1)^{n - 1}
s_1 s_n.
\end{align*}
Again, viewing this in terms of matrix computations we see it as the dot
product
\[\left[\begin{array}{c c c c c} (-1)^{n - 1} s_1 s_n & (-1)^{n - 2} \left(
s_1 s_{n - 1} - s_n\right) & \cdots & -\left(s_1 s_2 - s_3\right) & \left(
s_1 s_1 - s_2\right) \end{array}\right] (V_n)_i.\]
For future use, we'll denote this row vector as \(R_2^T\).
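For concreteness, when \(n = 3\) these vectors are
\[R_1^T = \left[\begin{array}{c c c} s_3 & -s_2 & s_1 \end{array}\right]
\quad \text{and} \quad
R_2^T = \left[\begin{array}{c c c} s_1 s_3 & -(s_1 s_2 - s_3) &
s_1^2 - s_2 \end{array}\right],\]
and one can check directly that \(R_1^T \left[\begin{array}{c c c} 1 &
\beta_i & \beta_i^2 \end{array}\right]^T = \beta_i^3\) and \(R_2^T
\left[\begin{array}{c c c} 1 & \beta_i & \beta_i^2 \end{array}\right]^T =
\beta_i^4\).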
\section[\texorpdfstring{No solution exists for $p \geq n + 1$}{No solution exists for p geq n + 1}]{No solution exists for \(p \geq n + 1\)}
Now with the machinery
we need, we're ready to prove that accuracy greater than or equal to
\((n + 1) - k + 1\) is not possible. For such \(p\), the system
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^p & \cdots & \beta_n^p \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[\underbrace{0 \, \cdots \, 0}_{\text{first } k \text{ terms}} \,
\frac{k!}{h^k} \, \underbrace{0 \, \cdots \, 0}_{\text{last } (p - k)
\text{ terms}}\right]^T\]
must contain the square subsystem
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\vdots & \ddots & \vdots \\
\beta_1^n & \cdots & \beta_n^n \\
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1} \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[ \begin{array}{c}
0 \\
\vdots \\
0 \end{array}\right]\]
where the matrix on the left does not contain the row corresponding to
\(u^{(k)}(c)\) or the row directly after it. (This last condition is to make
the system square since the exponents \(0, \ldots, n + 1\) are in \(n + 2\)
rows and we need to remove \(2\) of those rows, one of which must be the
nonzero row corresponding to \(k < n\).) When \(k = n - 1\), removing the
row directly after \(u^{(k)}(c)\) would remove the row of powers \(\beta_i^n\),
so in this case we remove the row directly before it since we'll end up
working with those powers.
We introduce the notation \(A_n(k)\) for the ``augmented'' Vandermonde
matrix described above. Note that the last statement is equivalent to
\(A_n(n - 1) = A_n(n - 2)\) and for \(k < n - 1\) the rows of \(A_n(k)\)
correspond to the powers \(\left\{0, \ldots, k - 1, k + 2, \ldots, n,
n + 1\right\}\) of the \(\beta_i\).
Since we seek to show that no solution exists for \(p \geq n + 1\), we'll
show that \(A_n(k)\) is invertible, which means the system \(A_n(k)
\boldsymbol{\alpha} = \mathbf{0}\) has only the trivial solution \(
\boldsymbol{\alpha} = \mathbf{0}\). As we saw above, this would mean our
approximation was \(u^{(k)}(c) \approx 0\) which is \(\mathcal{O}(1)\) and
not a valuable approximation.
\section{Invertibility of first augmented Vandermonde} To show that
\(A_n(k)\) is invertible, we proceed much like we did before but first show
that we can factor this as \(A_n(k) = C_n(k) V_n\) for some coefficient matrix
\(C_n(k)\) which depends on \(k\). By construction, the first \(n - 2\) rows
of \(A_n(k)\) are powers of \(\beta_i\) lower than \(n\) hence they are rows
of \(V_n\). Clearly \(e_j^T V_n\) produces the \(j\)th row of \(V_n\) which
is the row of \(\beta_i^{j - 1}\). As we showed above, \(R_1^T V_n\) will
produce the row of \(\beta_i^n\) and \(R_2^T V_n\) will produce the row of
\(\beta_i^{n + 1}\). Altogether, the row exponents \(\left\{0, \ldots, k - 1,
k + 2, \ldots, n, n + 1 \right\}\) of \(A_n(k)\) correspond to
multiplications by \(\left\{e_1^T, \ldots, e_k^T, e_{k + 3}^T, \ldots,
R_1^T, R_2^T\right\}\). So if we define
\[C_n(k) = \left[\begin{array}{c c c c c c c} e_1 & \cdots & e_k &
e_{k + 3} & \cdots & R_1 & R_2 \end{array}\right]^T\]
then we must have \(A_n(k) = C_n(k) V_n\). (As before, since we define
\(A_n(n - 1) = A_n(n - 2)\) we'll also define \(C_n(n - 1) = C_n(n - 2)\).
Also, in the case that \(k = n - 2\), \(e_{k + 3} = e_{n + 1}\), so the
description of \(C_n(n - 2)\) shouldn't actually include \(e_{k + 3}\).
This makes perfect sense looking back at the exponents since the exponent
\(k + 2 = n\) corresponds to the exponent given by \(R_1\) rather than
the exponent given by \(e_{k + 3}\).)
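(One concrete tie-in: for \(k = 1\) the exponents are \(\left\{0, 3, \ldots,
n, n + 1\right\}\), so \(C_n(1) = \left[\begin{array}{c c c c c} e_1 & e_4 &
\cdots & R_1 & R_2 \end{array}\right]^T\) is exactly the matrix \(F_n\) from
the companion note ``Some fun facts about approximations'', and \(A_n(1)\)
is the matrix \(A_n\) studied there.)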
In order to show that \(A_n(k)\) is invertible, it's enough to show that
it has nonzero determinant for \(1 \leq k \leq n - 2\). But
\[\det(A_n(k)) = \det(C_n(k)) \det(V_n).\]
As we showed above, \(\det(V_n) \neq 0\) since the \(\beta_i\) are distinct
so it remains to show that \(\det(C_n(k)) \neq 0\). Since the determinant is
multilinear, in general
\[\det(\ldots, e_j, \ldots, v, \ldots) = \det(\ldots, e_j, \ldots, v -
v_j e_j, \ldots);\]
in other words, when computing the determinant, if one of the columns is the
basis vector \(e_j\) then we can assume all the other entries in row \(j\) are
\(0\). We know the first \(n - 2\) columns of \(C_n(k)^T\) are \(e_j\) for
\(1 \leq j \leq n\) except for \(j = k + 1, k + 2\). Hence, the only entries of
\(R_1, R_2\) that contribute to the determinant will be in rows \(k + 1,
k + 2\) of \(C_n(k)^T\). After removing all entries of \(R_1, R_2\) but those
in rows \(k + 1, k + 2\) we have a \(2 \times 2\) square matrix in the final
two columns and an augmented copy of \(I_{n - 2}\) in the first \(n - 2\)
columns hence
\[\pm \det(C_n(k)) = R_{1, k + 1} R_{2, k + 2} - R_{1, k + 2} R_{2, k + 1}\]
where the sign may be \(\pm 1\) depending on the cofactor expansion. (We
can actually show it's always \(1\) but it's not needed here.)
Hence to show \(\det(A_n(k)) \neq 0\) it's enough to show that \(R_{1, k + 1}
R_{2, k + 2} - R_{1, k + 2} R_{2, k + 1} \neq 0\). Converting this to the
symmetric polynomials involved, since \(k \leq n - 2\) we have
\[R_{1, k + 1} = (-1)^{n - k - 1} s_{n - k}, R_{1, k + 2} = (-1)^{n - k - 2}
s_{n - k - 1}\]
and
\[R_{2, k + 1} = (-1)^{n - k - 1} \left(s_1 s_{n - k} - s_{n - k + 1}\right),
R_{2, k + 2} = (-1)^{n - k - 2} \left(s_1 s_{n - k - 1} - s_{n - k}\right).\]
(We don't need to worry about \(R_{2, 1} = (-1)^{n - 1} s_1 s_n\) --- which
has a different form than the rest of \(R_2\) --- since \(k + 1 \geq 2\).
Also since \(k \leq n - 2\) the terms \(R_{r, k + 2}\) will always be
defined.) With these, we have
\begin{align*}
R_{1, k + 1} R_{2, k + 2} - R_{1, k + 2} R_{2, k + 1} &= (-1)^{n - k - 1}
s_{n - k} (-1)^{n - k - 2} \left(s_1 s_{n - k - 1} - s_{n - k}\right) \\
&- (-1)^{n - k - 2} s_{n - k - 1} (-1)^{n - k - 1} \left(s_1 s_{n - k} -
s_{n - k + 1}\right) \\
&= (-1)^{-3} \left(s_{n - k} s_1 s_{n - k - 1} - s_{n - k}^2\right)\\
&- (-1)^{-3} \left(s_{n - k - 1} s_1 s_{n - k} - s_{n - k - 1}
s_{n - k + 1}\right) \\
&= s_{n - k}^2 - s_{n - k - 1} s_{n - k + 1}.
\end{align*}
However, by \href{http://en.wikipedia.org/wiki/Newton's\_inequalities}
{Newton's inequalities}, we know that
\[\left(\frac{s_{n - k}}{\binom{n}{n - k}}\right)^2 \geq
\frac{s_{n - k - 1}}{\binom{n}{n - k - 1}} \frac{s_{n - k + 1}}{\binom{n}
{n - k + 1}} \Longleftrightarrow s_{n - k}^2 \geq \left(\frac{k + 1}{n - k}
s_{n - k - 1}\right) \left(\frac{n - k + 1}{k} s_{n - k + 1}\right).\]
Since both of the fractions \(\frac{k + 1}{k}\) and \(\frac{n - k + 1}
{n - k}\) are greater than \(1\) when \(1 \leq k \leq n - 2\), we actually
see that \(s_{n - k}^2 > s_{n - k - 1} s_{n - k + 1}\) whenever
\(s_{n - k - 1} s_{n - k + 1} > 0\). If \(s_{n - k - 1} s_{n - k + 1} < 0\)
then \(s_{n - k}^2 \geq 0 > s_{n - k - 1} s_{n - k + 1}\) so the case
\(s_{n - k - 1} s_{n - k + 1} = 0\) is the only other potentially
problematic case.
We'd like to show that \(s_{n - k}^2 - s_{n - k - 1} s_{n - k + 1} = 0\)
can't occur when \(s_{n - k - 1} s_{n - k + 1} = 0\) and will show this by
contradiction. Assuming the two hold, then we also have \(s_{n - k}^2 = 0\),
forcing \(s_{n - k} = s_{n - k - 1} = 0\) or \(s_{n - k} = s_{n - k + 1}
= 0\). So it's enough to show that \(s_{j - 1} = s_j = 0\) forces
\(\beta_{i_0} = \beta_{i_1}\) for some \(i_0 \neq i_1\), which contradicts
our requirement of distinct \(\beta_i\). (This assumes \(n \geq 2\). If
\(n = 1\) we can't approximate any derivatives by the above so it's not
worth addressing.)
\section[\texorpdfstring{The equality $s_{j - 1} = s_j = 0$ can't occur for distinct $\beta_i$}{The equality s(j-1) = sj = 0 can't occur for distinct beta}]{The equality \(s_{j - 1} = s_j = 0\) can't occur for distinct \(\beta_i\)}
If \(j = n\) then \(0 = s_j = s_n\) means that the product of
all the \(\beta_i\) is \(0\), forcing \(\beta_{i_0} = 0\) for some \(i_0\).
Since all the \(\binom{n}{n - 1}\) terms in the sum \(s_{n - 1}\) involve
\(\beta_{i_0}\) except for \(\binom{n - 1}{n - 1} = 1\) of them,
\(\beta_{i_0} = 0\) combined with \(s_{n - 1} = 0\) tells us that
\[0 = s_{n - 1} = \beta_{i_0} \cdot \left(\text{something}\right) +
\prod_{i \neq i_0} \beta_i = \prod_{i \neq i_0} \beta_i.\]
Hence, there is also \(i_1 \neq i_0\) with \(\beta_{i_1} = 0\).
To do this for \(j < n\), we'll need a \hyperref[reduction-lemma]{lemma}
which reduces symmetric polynomials of \(n\) points to symmetric polynomials
of \(n - 1\) points. The lemma shows that for a given \(j < n\)
\[\frac{s_j(\beta_1, \ldots, \beta_n)}{\binom{n}{j}} = \frac{s_j\left(
\beta^{(1)}_1, \ldots, \beta^{(1)}_{n - 1}\right)}{\binom{n - 1}{j}}\]
where \(\left\{\beta^{(1)}_i\right\}\) is a set of \(n - 1\) points, each
of which lies between two consecutive \(\beta_i\). (Note we have used
\(s_j(\, \cdot \,)\) without defining it. This is simply the elementary
symmetric polynomial with respect to the variables listed.) Hence we see that
\[s_j(\beta_1, \ldots, \beta_n) = 0 \Longleftrightarrow s_j\left(
\beta^{(1)}_1, \ldots, \beta^{(1)}_{n - 1}\right) = 0.\]
From the \(\beta^{(1)}_{i}\) we can get another set of \(n - 2\) points that
also have \(s_j = 0\) and so on to even lower numbers of points. Let
\(\beta^{(r)}_1, \ldots, \beta^{(r)}_{n - r}\) denote this set of points after
\(r\) reductions (and for completeness of the notation let \(\beta^{(0)}_i =
\beta_i\)). We'll show that \(s_{j - 1} = s_j = 0\) forces at least two
members of the \(\beta^{(n - j)}_i\) to be zero which in turn forces at
least \(N_r = n - j - r + 2\) members of the \(\beta^{(r)}_i\) to be zero
for \(0 \leq r \leq n - j\). Hence when \(r = 0\) we have at least \(N_0 =
n - j + 2 \geq 1 + 2 = 3\) members equal to \(0\) which is not possible
since we have distinct \(\beta^{(0)}_i = \beta_i\).
First, when \(r = n - j\), we have \(n - r = j\) points with
\[s_{j - 1}\left(\beta^{(r)}_1, \ldots, \beta^{(r)}_j\right) = s_j\left(
\beta^{(r)}_1, \ldots, \beta^{(r)}_j\right) = 0.\]
As we showed above, this means that at least two members of the
\(\beta^{(r)}_i\) are equal to \(0\). Since we know that each
\(\beta^{(r)}_i\) lies between two of the \(\beta^{(r - 1)}_i\), this also
means that at least one of the \(\beta^{(r - 1)}_i\) lies between
the two \(\beta^{(r)}_i\) which are equal to \(0\). Clearly the only points
\(p\) with \(0 \leq p \leq 0\) are \(p = 0\), hence we see at least one of the
\(\beta^{(r - 1)}_i\) is \(0\), say \(\beta^{(r - 1)}_{i_0} = 0\). Since \(n -
(r - 1) = j + 1\), we have \(j + 1\) points at this level and of the
\(\binom{j + 1}{j}\) terms in the sum \(s_j\) only \(\binom{j}{j} = 1\) of
them doesn't involve \(\beta^{(r - 1)}_{i_0}\). Hence the elementary symmetric
polynomial \(\widehat{s}_j\) for the \(j\) points excluding
\(\beta^{(r - 1)}_{i_0}\) satisfies \(\widehat{s}_j = 0\) as well since
\[0 = s_j = \beta^{(r - 1)}_{i_0} \cdot \left(\text{something}\right) +
\widehat{s}_j = 0 + \widehat{s}_j = \widehat{s}_j.\]
Similarly, of the \(\binom{j + 1}{j - 1}\) terms in
the sum \(s_{j - 1}\) only \(\binom{j}{j - 1} = j\) of them don't involve
\(i_0\). Hence the elementary symmetric polynomial \(\widehat{s}_{j - 1}\)
for the \(j\) points excluding \(\beta^{(r - 1)}_{i_0}\) satisfies
\(\widehat{s}_{j - 1} = 0\) as well. Thus we are again reduced to the
situation where \(\widehat{s}_{j - 1} = \widehat{s}_j = 0\) with \(j\)
points and so find distinct \(i_1, i_2\) from the set excluding \(i_0\) with
\(\beta^{(r - 1)}_{i_1} = \beta^{(r - 1)}_{i_2} = 0\). This gives us \(3\)
points which agrees with \(N_r = n - (n - j - 1) - j + 2 = 3\) when \(r = n -
j - 1\).
Carrying forward, we see that if we have at least \(N_r\) of the
\(\beta^{(r)}_i\) equal to \(0\) then backing up a step will result in
\(N_r - 1\) of the \(\beta^{(r - 1)}_i\) equal to \(0\) since the points at
the \(r\)th reduction must lie between the points at the \((r - 1)\)th
reduction. But this reduces the \(n - (r - 1)\) points \(\beta^{(r - 1)}_i\)
to \(n - (r - 1) - (N_r - 1) = n - r - N_r + 2\) potentially nonzero points.
For \(r = n - j\) and \(r = n - j - 1\) we saw that \(N_r = n - r - j + 2\) so
that at each stage we reduce to \(n - r - N_r + 2 = j\) potentially nonzero
points and have \(\widehat{s}_{j - 1} = \widehat{s}_j = 0\). Again, by the
same logic as above this gives two more \(\beta^{(r - 1)}_i = 0\) bringing
the total to \(N_{r - 1} = (N_r - 1) + 2 = N_r + 1\). So in general,
reducing \(r\) by \(1\) raises \(N_r\) by \(1\) and the value
\[j = n - r - N_r + 2\]
remains invariant (this is needed to reduce to \(n - r - N_r + 2 = j\) points
when backing up a step/reducing \(r\)).
\section[\texorpdfstring{A family of solutions exists for $p = n$}{A family of solutions exists for p = n}]{A family of solutions exists for \(p = n\)}
Since we showed
the system is degenerate when \(p > n\), we know the maximal order for our
approximation which has not yet been ruled out occurs when \(p = n\). We'll
show both that it's possible for the system to have a solution and give a
condition on such a solution.
When \(p = n\) we have a system of \(p + 1 = n + 1\) equations
in \(2n\) indeterminates:
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^n & \cdots & \beta_n^n \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[\underbrace{0 \, \cdots \, 0}_{\text{first } k \text{ terms}} \,
\frac{k!}{h^k} \, \underbrace{0 \, \cdots \, 0}_{\text{last } (n - k)
\text{ terms}}\right]^T.\]
Before, to get a square system with all zeroes on the right, we had to remove
\textbf{two} of the first \(n + 2\) rows but now we have only one to remove
and a unique choice at that: the row corresponding to \(\beta_i^k\). As
before we introduce the notation \(B_n(k)\) for the ``augmented'' Vandermonde
matrix where the rows of \(B_n(k)\) correspond to the powers \(\left\{0,
\ldots, k - 1, k + 1, \ldots, n\right\}\) of the \(\beta_i\). By similar
calculations as above we must have
\[B_n(k) = \left[\begin{array}{c c c c c c c} e_1 & \cdots & e_k &
e_{k + 2} & \cdots & R_1 \end{array}\right]^T V_n = D_n(k) V_n\]
for some coefficient matrix \(D_n(k)\). (In the case that \(k = n - 1\),
the \(e_{k + 2}\) above does not belong since \(k + 1\) corresponds to
\(R_1\) directly and the first \(0, \ldots, k - 1\) are the shifted
identity. As before we have \(k \leq n - 1\) since we can't approximate any
higher derivatives with \(n\) points.)
In order for a nontrivial solution to exist (the opposite of what we wanted
above), \(B_n(k)\) can't be invertible (also the opposite). Hence we need
\(\det(B_n(k)) = 0\). As we've seen, for distinct \(\beta_i\), \(\det(V_n)
\neq 0\) so we need \(\det(D_n(k)) = 0\). Again, due to multilinearity of
the determinant, all the entries in the last row --- \(R_1\) --- of
\(D_n(k)\) can be considered \(0\) except for \(R_{1,k + 1}\) since
\(e_{k + 1}\) is the only basis vector missing in the first \(n - 1\) rows.
Hence
\[\pm \det(D_n(k)) = R_{1, k + 1}\]
where the sign may be \(\pm 1\) depending on the cofactor expansion. (We
can actually show it's always \((-1)^{n - k - 1}\) but it's not needed here.)
Since \(R_{1, k + 1} = (-1)^{n - k - 1} s_{n - k}\), we see that a
\fbox{nontrivial solution can only exist if \(s_{n - k} = 0\)}.
\section{Inspecting the solutions} Having \(s_{n - k} = 0\) means that
we can pick any \(\beta_1, \ldots, \beta_{n - 1}\) we like and \(\beta_n\)
will be uniquely determined. To see this, denote by \(\widehat{s}_j\) the
\(j\)th symmetric sum in \(\beta_1, \ldots, \beta_{n - 1}\) (note that
\(\beta_n\) is excluded). With this, we have
\[s_{n - k} = \beta_n \widehat{s}_{n - k - 1} + \widehat{s}_{n - k}\]
so that \(s_{n - k} = 0\) gives us the unique (for this fixed \(k\)) choice
\[\beta_n = - \frac{\widehat{s}_{n - k}}{\widehat{s}_{n - k - 1}}.\]
(We define \(\widehat{s}_0 = 1\). This is needed for the
case \(s_1 = \beta_n \widehat{s}_0 + \widehat{s}_1\) to make sense and also
fits with the general case where \(1\) is just the value of an empty product.)
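As a quick concrete instance: for \(n = 3\), \(k = 1\), choosing \(\beta_1 =
-2\) and \(\beta_2 = 3\) gives \(\widehat{s}_1 = 1\), \(\widehat{s}_2 = -6\)
and hence \(\beta_3 = -\frac{\widehat{s}_2}{\widehat{s}_1} = 6\); indeed
\(s_2(-2, 3, 6) = -6 - 12 + 18 = 0\), and \((-2, 3, 6)\) is exactly the
tuple appearing in the table of minima below.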
We see that if \(\widehat{s}_{n - k - 1} = 0\) then there is no solution,
so there is some part of \(\mathbf{R}^{n - 1}\) where no solution exists, but
this is a measure \(0\) set. Once we've picked our \(\left(\beta_1, \ldots,
\beta_{n - 1}\right)\), to find \(\boldsymbol{\alpha}\) we need to determine
the kernel of \(B_n(k)\) since \(B_n(k) \boldsymbol{\alpha} = \mathbf{0}\).
Since the first \(n - 1\) rows of \(D_n(k)\) are linearly independent
standard basis vectors we have \(\rank(D_n(k)) \geq n - 1\). Since
\(\det(D_n(k)) = 0\) we know it can't have full rank hence the rank is
exactly \(n - 1\), meaning the kernel of \(D_n(k)\) is one dimensional.
Hence the same is true of \(B_n(k)\) since \(V_n\) is invertible. This is
quite helpful since it limits our choices of \(\boldsymbol{\alpha}\) to a
single scalar multiple of a basis vector of the kernel. To find a choice for
this basis vector we start with
\[B_n(k) v = 0 \Longleftrightarrow (D_n(k) V_n)v = 0 \Longleftrightarrow
D_n(k) w = 0, w = V_n v.\]
But since the first \(n - 1\) rows of \(D_n(k)\) are the standard basis vectors
this means \(0 = e_j^T w = w_j\) for \(j \neq k + 1\), making \(w_{k + 1}\)
the only nonzero component. Since \(R_1^T e_{k + 1} = R_{1, k + 1} = 0\) by
construction, \(w = e_{k + 1}\) will do the trick as our basis for the kernel
of \(D_n(k)\) hence \(v = V_n^{-1} w = V_n^{-1} e_{k + 1}\) works as a basis
for the kernel of \(B_n(k)\).
So we have \(\boldsymbol{\alpha} = \ell V_n^{-1} e_{k + 1}\) for some scalar
\(\ell\) and the final condition is
\[\left[\begin{array}{c c c} \beta_1^k & \cdots & \beta_n^k \end{array}
\right] \boldsymbol{\alpha} = \frac{k!}{h^k} \Longleftrightarrow \ell
\left[\begin{array}{c c c} \beta_1^k & \cdots & \beta_n^k \end{array}
\right] V_n^{-1} e_{k + 1} = \frac{k!}{h^k}.\]
As it turns out \(\left[\begin{array}{c c c} \beta_1^k & \cdots & \beta_n^k
\end{array} \right] V_n^{-1} e_{k + 1} = 1\) which tells us that \(\ell =
\frac{k!}{h^k}\). To see this note that \(e_j^T V_n\) gives the \(j\)th row
of \(V_n\), hence
\[e_{k + 1}^T V_n = \left[\begin{array}{c c c} \beta_1^k & \cdots & \beta_n^k
\end{array} \right] \Rightarrow \left(\left[\begin{array}{c c c} \beta_1^k &
\cdots & \beta_n^k \end{array} \right] V_n^{-1}\right) e_{k + 1} =
e_{k + 1}^T e_{k + 1} = 1.\]
Thus the solution is \(\boldsymbol{\alpha} = \frac{k!}{h^k} V_n^{-1} e_{k +
1}\); for one, this confirms our notion that \(\alpha_i = \mathcal{O}
\left(\frac{1}{h^k}\right)\). Alternatively, this is the same as the solution
we would have obtained had we just dropped the \((n + 1)\)st row from the
system and tried to solve \(V_n \boldsymbol{\alpha} = \frac{k!}{h^k}
e_{k + 1}\) (this would not have guaranteed the order condition, which
we needed \(s_{n - k} = 0\) for).
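Continuing the \(n = 2\), \(k = 1\), \(\beta = (-1, 1)\) example from the
framework section: here
\(V_2 = \left[\begin{array}{c c} 1 & 1 \\ -1 & 1 \end{array}\right]\),
\(V_2^{-1} = \frac{1}{2}\left[\begin{array}{c c} 1 & -1 \\ 1 & 1
\end{array}\right]\), and the formula gives \(\boldsymbol{\alpha} =
\frac{1}{h} V_2^{-1} e_2 = \left[\begin{array}{c c} -\frac{1}{2h} &
\frac{1}{2h} \end{array}\right]^T\), i.e. the centered difference once again.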
\section[\texorpdfstring{Choosing the $n$ points}{Choosing the n points}]{Choosing the \(n\) points}
Though we know the theoretical bound
on the estimate and a condition on points to achieve this bound, this
doesn't narrow down the choices for our \(\beta_i\). Given an approximation
for \(u^{(k)}(c)\), we know by the above that the \(u^{(n + 1)}(c)\) terms
in the Taylor expansion can never sum to \(0\). However, we may be able to
make the coefficient of this term small to get a better estimate.
For \(\left\{\alpha_i\right\}\), \(\left\{\beta_i\right\}\) which give a
maximum order approximation of \(u^{(k)}(c)\) we have
\[u^{(k)}(c) \approx \sum_{i = 1}^n \alpha_i u(c + \beta_i h) = u^{(k)}(c)
+ u^{(n + 1)}(c) \frac{h^{n + 1}}{(n + 1)!} \sum_{i = 1}^n \alpha_i
\beta_i^{n + 1} + \mathcal{O}\left(h^{n + 2} \sum_{i = 1}^n \alpha_i
\right).\]
So the part of the coefficient we can control is determined by
\[\sum_{i = 1}^n \alpha_i \beta_i^{n + 1} = \left[\begin{array}{c c c}
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1}\end{array}\right] \left[
\begin{array}{c} \alpha_1 \\
\vdots \\
\alpha_n \end{array}\right].\]
However, as we showed above \(R_2^T V_n = \left[\begin{array}{c c c}
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1}\end{array} \right]\) hence
the coefficient is
\[R_2^T V_n \boldsymbol{\alpha} = R_2^T V_n \left(\frac{k!}{h^k} V_n^{-1}
e_{k + 1}\right) = \frac{k!}{h^k} R_2^T e_{k + 1} = \frac{k!}{h^k}
R_{2, k + 1}.\]
Since \(k + 1 \geq 2\) (\(R_{2, 1}\) is different than the rest of the entries
in \(R_2\)) we know \(R_{2, k + 1}\) has the form
\[R_{2, k + 1} = (-1)^{n - k - 1} \left(s_1 s_{n - k} - s_{n - k + 1}
\right) = (-1)^{n - k} s_{n - k + 1}\]
since we already have \(s_{n - k} = 0\).
Since \(s_{n - k}, s_{n - k + 1}\) are homogeneous we can make \(s_{n - k +
1}\) as small as we want by simply scaling the \(\beta_i\) that make \(s_{n -
k} = 0\). Thus the infimum of \(\left|s_{n - k + 1}\right|\) is \(0\) but
this can never be attained since we showed before that the distinctness
condition makes \(s_{n - k} = s_{n - k + 1} = 0\) impossible. However, it's
still worth noting that our approximation takes the form
\[u^{(k)}(c) - \frac{k! \cdot u^{(n + 1)}(c) \cdot s_{n - k + 1}}{(n + 1)!}
(-h)^{n - k + 1} + \mathcal{O}\left(h^{n - k + 2}\right).\]
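As a check against the motivation: for \(D_0\) we have \(n = 2\), \(k = 1\)
and \(s_{n - k + 1} = s_2 = \beta_1 \beta_2 = -1\), so the formula above gives
\[u'(c) - \frac{1! \cdot u'''(c) \cdot (-1)}{3!} (-h)^{2} +
\mathcal{O}\left(h^{3}\right) = u'(c) + \frac{h^2}{6} u'''(c) +
\mathcal{O}\left(h^{3}\right),\]
matching the expansion of \(\frac{u(c + h) - u(c - h)}{2h}\) term by term.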
\section{Reasonable Choices} Since we'd be approximating on a grid, it's | |
much more reasonable to assume that \((\beta_1, \ldots, \beta_n) \in | |
\mathbf{Z}^n\) than the much less restricted \(\mathbf{R}^n\). Now, the | |
equality | |
\[\beta_n = - \frac{\widehat{s}_{n - k}}{\widehat{s}_{n - k - 1}}\] | |
has some more meaning. | |
The computations iterate over tuples which satisfy \(s_{n - k} = 0\) and a | |
few more constraints. Since we seek to minimize \(s_{n - k + 1}\), we | |
require the tuple has no common divisor \(d > 1\), for if it did we could | |
divide by it and reduce the size of \(s_{n - k + 1}\) by a factor of | |
\(d^{n - k + 1}\). In addition, since tuples are the same up to re-ordering | |
we require that \(\beta_1 < \cdots < \beta_n\) (with the strictness due to | |
the uniqueness constraint on the \(\beta_i\)). Finally, since scaling by | |
\(-1\) has no impact on the zero condition we require that | |
\[\left|\beta_n\right| = \max_i \left|\beta_i\right|\] | |
(really only \(\left|\beta_1\right|\) and \(\left|\beta_n\right|\) were | |
candidates, one being the smallest negative, the other the largest positive). | |
With these tuples in hand we compute some minima of \(\left|s_{n - k + 1} | |
\right|\): | |
\begin{align*} | |
%% n = 2 | |
n &= 2 \\ | |
\left|s_{2}\left(-1, 1\right)\right| &= 1 \text{ when } k = 1 \\ | |
%% s_{2 - 1}\left(-1, 1\right) &= 0 \\ | |
%% \left|s_{2 - 1 + 1}\left(-1, 1\right)\right| &= 1 \\ | |
%% n = 3 | |
n &= 3 \\ | |
\left|s_{3}\left(-2, 3, 6\right)\right| &= 36 \text{ when } k = 1 \\ | |
\left|s_{2}\left(-1, 0, 1\right)\right| &= 1 \text{ when } k = 2 \\ | |
%% s_{3 - 1}\left(-2, 3, 6\right) &= 0 \\ | |
%% \left|s_{3 - 1 + 1}\left(-2, 3, 6\right)\right| &= 36 \\ | |
%% s_{3 - 2}\left(-1, 0, 1\right) &= 0 \\ | |
%% \left|s_{3 - 2 + 1}\left(-1, 0, 1\right)\right| &= 1 \\ | |
%% n = 4 | |
n &= 4 \\ | |
\left|s_{4}\left(-2, -1, 1, 2\right)\right| &= 4 \text{ when } k = 1 \\ | |
\left|s_{3}\left(-2, 1, 2, 4\right)\right| &= 20 \text{ when } k = 2 \\ | |
\left|s_{2}\left(-2, -1, 1, 2\right)\right| &= 5 \text{ when } k = 3 \\ | |
%% s_{4 - 1}\left(-2, -1, 1, 2\right) &= 0 \\ | |
%% \left|s_{4 - 1 + 1}\left(-2, -1, 1, 2\right)\right| &= 4 \\ | |
%% s_{4 - 2}\left(-2, 1, 2, 4\right) &= 0 \\ | |
%% \left|s_{4 - 2 + 1}\left(-2, 1, 2, 4\right)\right| &= 20 \\ | |
%% s_{4 - 3}\left(-2, -1, 1, 2\right) &= 0 \\ | |
%% \left|s_{4 - 3 + 1}\left(-2, -1, 1, 2\right)\right| &= 5 \\ | |
%% n = 5 | |
n &= 5 \\ | |
\left|s_{5}\left(-2, -1, 1, 3, 6\right)\right| &= 36 \text{ when } k = 1 \\ | |
\left|s_{4}\left(-2, -1, 0, 1, 2\right)\right| &= 4 \text{ when } k = 2 \\ | |
\left|s_{3}\left(-2, 0, 1, 2, 4\right)\right| &= 20 \text{ when } k = 3 \\ | |
\left|s_{2}\left(-2, -1, 0, 1, 2\right)\right| &= 5 \text{ when } k = 4 \\ | |
%% s_{5 - 1}\left(-2, -1, 1, 3, 6\right) &= 0 \\ | |
%% \left|s_{5 - 1 + 1}\left(-2, -1, 1, 3, 6\right)\right| &= 36 \\ | |
%% s_{5 - 2}\left(-2, -1, 0, 1, 2\right) &= 0 \\ | |
%% \left|s_{5 - 2 + 1}\left(-2, -1, 0, 1, 2\right)\right| &= 4 \\ | |
%% s_{5 - 3}\left(-2, 0, 1, 2, 4\right) &= 0 \\ | |
%% \left|s_{5 - 3 + 1}\left(-2, 0, 1, 2, 4\right)\right| &= 20 \\ | |
%% s_{5 - 4}\left(-2, -1, 0, 1, 2\right) &= 0 \\ | |
%% \left|s_{5 - 4 + 1}\left(-2, -1, 0, 1, 2\right)\right| &= 5 | |
n &= 6 \\ | |
\left|s_{6}\left(-3, -2, -1, 1, 2, 3\right)\right| &= 36 \text{ when } k = 1 \\ | |
\left|s_{5}\left(-2, -1, 0, 1, 3, 6\right)\right| &= 36 \text{ when } k = 2 \\ | |
\left|s_{4}\left(-3, -2, -1, 1, 2, 3\right)\right| &= 49 \text{ when } k = 3 \\ | |
\left|s_{3}\left(-2, -1, 0, 1, 3, 7\right)\right| &= 50 \text{ when } k = 4 \\ | |
\left|s_{2}\left(-3, -2, -1, 1, 2, 3\right)\right| &= 14 \text{ when } k = 5 \\ | |
n &= 7 \\ | |
\left|s_{7}\left(-3, -2, -1, 1, 2, 4, 12\right)\right| = \left|s_{7}\left(-4, | |
-2, -1, 1, 3, 4, 6\right)\right| &= 576 \text{ when } k = 1 \\ | |
\left|s_{6}\left(-3, -2, -1, 0, 1, 2, 3\right)\right| &= 36 \text{ when } | |
k = 2 \\ | |
\left|s_{5}\left(-5, -4, -3, -1, 0, 2, 14\right)\right| &= 2036 \text{ when } | |
k = 3 \\ | |
\left|s_{4}\left(-3, -2, -1, 0, 1, 2, 3\right)\right| &= 49 \text{ when } | |
k = 4 \\ | |
\left|s_{3}\left(-3, -1, 0, 1, 2, 3, 5\right)\right| &= 70 \text{ when } | |
k = 5 \\ | |
\left|s_{2}\left(-3, -2, -1, 0, 1, 2, 3\right)\right| &= 14 \text{ when } | |
k = 6 \\ | |
n &= 8 \\ | |
\left|s_{8}\left(-4, -3, -2, -1, 1, 2, 3, 4\right)\right| &= 576 \text{ when } | |
k = 1 \\ | |
\left|s_{7}\left(-3, -2, -1, 0, 1, 2, 4, 12\right)\right| = \left|s_{7} | |
\left(-4, -2, -1, 0, 1, 3, 4, 6\right)\right| &= 576 \text{ when } k = 2 \\ | |
\left|s_{6}\left(-4, -3, -2, -1, 1, 2, 3, 4\right)\right| &= 820 \text{ when } | |
k = 3 \\ | |
\left|s_{5}\left(-8, -4, -3, -2, 0, 1, 2, 11\right)\right| &= 3276 \text{ when } | |
k = 4 \\ | |
\left|s_{4}\left(-2, -1, 0, 1, 2, 3, 4, 5\right)\right| &= 231 \text{ when } | |
k = 5 \\ | |
&\text{(this broke the pattern for } s_{2m} \text{)} \\
\left|s_{3}\left(-3, -2, -1, 0, 1, 3, 4, 9\right)\right| &= 182 \text{ when } | |
k = 6 \\ | |
\left|s_{2}\left(-4, -3, -2, -1, 1, 2, 3, 4\right)\right| &= 30 \text{ when } | |
k = 7 | |
\end{align*} | |
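Each entry above can be reproduced by exhaustive search over a box of integer
tuples. A minimal Python sketch (Python 3.8+ for \texttt{math.prod}; the
search bound \(8\) is an ad hoc choice for illustration):
\begin{verbatim}
from itertools import combinations
from math import prod

def s(j, beta):
    # j-th elementary symmetric polynomial of the entries of beta
    return sum(prod(c) for c in combinations(beta, j))

def table_entry(n, k, bound=8):
    # min |s_{n-k+1}| over increasing integer tuples with s_{n-k} = 0,
    # restricted to canonical representatives (beta_1 + beta_n >= 0)
    hits = ((abs(s(n - k + 1, b)), b)
            for b in combinations(range(-bound, bound + 1), n)
            if s(n - k, b) == 0 and b[0] + b[-1] >= 0)
    return min(hits)

print(table_entry(3, 1))  # (36, (-2, 3, 6))
print(table_entry(4, 2))  # (20, (-2, 1, 2, 4))
\end{verbatim}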
\section{Finding Minimizers: Programming} | |
We can freely abstract our problem as | |
\[\min_{\boldsymbol{\beta}} \left|s_{d + 1}\right| \qquad \text{such that }
s_d = 0, \; \boldsymbol{\beta} \in \mathbf{Z}^n, \;
\beta_i \neq \beta_j \; \forall \, i \neq j.\]
This is only well-defined if \(d + 1 \leq n\), since with only \(n\) points
there are no elementary symmetric polynomials beyond \(s_n\). We also must have
\(0 < d\) since \(s_0 = 1\) can never equal \(0\).
Since these are symmetric, we can WLOG say | |
\[\beta_1 \leq \beta_2 \leq \cdots \leq \beta_n.\] | |
Since the \(\left\{\beta_j\right\}\) must be distinct integers, we can say | |
\begin{align*} | |
\beta_1 + 1 &\leq \beta_2 \\ | |
\beta_2 + 1 &\leq \beta_3 \\ | |
&\vdots \\ | |
\beta_{n - 1} + 1 &\leq \beta_n. | |
\end{align*} | |
What's more, we must have \(\beta_1 < 0\). (If not then we'd have \(s_d > 0\) | |
since there would be no negative terms to cancel with.) | |
With this, we can move this problem into the positive orthant | |
\(\mathbf{Z}_{\geq 0}^n\). We write | |
\[\beta_1 = - 1 - p_1 \leq -1 \qquad \text{for } p_1 \geq 0\] | |
and further define | |
\begin{align*} | |
p_2 = \beta_2 - \beta_1 - 1 &\geq 0 \\ | |
p_3 = \beta_3 - \beta_2 - 1 &\geq 0 \\ | |
&\vdots \\ | |
p_n = \beta_n - \beta_{n - 1} - 1 &\geq 0
\end{align*} | |
Now by construction we have | |
\(\mathbf{p} = \left(p_1, \ldots, p_n\right) \in \mathbf{Z}_{\geq 0}^n\). | |
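This change of variables is mechanical in both directions; a small Python
sketch (function names are mine):
\begin{verbatim}
def p_to_beta(p):
    # beta_1 = -1 - p_1, then beta_{i+1} = beta_i + 1 + p_{i+1}
    beta = [-1 - p[0]]
    for gap in p[1:]:
        beta.append(beta[-1] + 1 + gap)
    return beta

def beta_to_p(beta):
    # inverse map; assumes strictly increasing integers with beta_1 <= -1
    return [-1 - beta[0]] + [b - a - 1 for a, b in zip(beta, beta[1:])]

assert p_to_beta([0, 0, 0]) == [-1, 0, 1]
assert beta_to_p([-2, 3, 6]) == [1, 4, 2]
\end{verbatim}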
\subsection{Example: $n = 2$} | |
We have | |
\[\beta_1 = -1 - p_1, \quad \beta_2 = 1 + \beta_1 + p_2 = -p_1 + p_2\] | |
The only allowable value is \(d = 1\). We have
\[s_1 = \beta_1 + \beta_2 = -1 - 2 p_1 + p_2\] | |
and | |
\[s_2 = \beta_1 \beta_2 = \left(-1 - p_1\right)\left(-p_1 + p_2\right).\] | |
We then solve
\[s_1 = 0 \Longrightarrow p_2 = 1 + 2 p_1 \Longrightarrow \left|s_2\right| = | |
\left|\left(-1 - p_1\right)\left(1 + p_1\right)\right| = | |
\left(1 + p_1\right)^2.\] | |
For \(p_1 \geq 0 > -1\), this function is increasing, so choosing
\(p_1 = 0\) minimizes it, which gives \(p_2 = 1 + 2(0) = 1\).
Converting back to \(\left\{\beta_j\right\}\), we confirm the solution | |
we've seen before: | |
\[\boxed{\beta_1 = -1 - 0 = -1, \qquad \beta_2 = -0 + 1 = 1}.\] | |
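The same reduction can be checked symbolically; a minimal sketch assuming
\texttt{sympy} is available:
\begin{verbatim}
import sympy

p1, p2 = sympy.symbols('p1 p2', nonnegative=True)
b1, b2 = -1 - p1, -p1 + p2
s1, s2 = b1 + b2, b1 * b2
constraint = sympy.solve(s1, p2)[0]           # p2 = 2*p1 + 1
print(sympy.factor(s2.subs(p2, constraint)))  # -(p1 + 1)**2
\end{verbatim}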
\subsection{Example: $n = 3$} | |
We add to the above with | |
\[\beta_3 = 1 + \beta_2 + p_3 = 1 - p_1 + p_2 + p_3\] | |
and | |
\begin{align*} | |
s_1 &= \left[-1 - 2 p_1 + p_2\right] + \beta_3 = - 3 p_1 + 2 p_2 + p_3 \\ | |
s_2 &= \left[\left(-1 - p_1\right)\left(-p_1 + p_2\right)\right] + | |
\beta_3 \left[-1 - 2 p_1 + p_2\right] \\ | |
&= 3 p_{1}^{2} - 4 p_{1} p_{2} - 2 p_{1} p_{3} + p_{2}^{2} + | |
p_{2} p_{3} - p_{2} - p_{3} - 1 \\ | |
s_3 &= \left(-1 - p_1\right)\left(-p_1 + p_2\right)\left(1 - p_1 + p_2 + p_3\right) | |
\end{align*} | |
\paragraph{When \(d = 1\):} We can use | |
\begin{align*} | |
s_1 &= 0 \\ | |
\Longrightarrow p_3 &= 3 p_1 - 2 p_2 \\ | |
\Longrightarrow s_2 &= | |
- 3 p_{1}^{2} + 3 p_{1} p_{2} - 3 p_{1} - p_{2}^{2} + p_{2} - 1. | |
\end{align*} | |
Viewing this as a parabola in \(p_2\), it is downward opening (leading
coefficient \(-1\)). Since
\[\frac{\partial s_2}{\partial p_2} = 3 p_1 - 2 p_2 + 1\] | |
the max value of \(s_2\) (the down-opening parabola) occurs when | |
\(p_2 = \frac{3 p_1 + 1}{2}\), at which point | |
\[s_2\left(p_1, \frac{3 p_1 + 1}{2}\right) = | |
- \frac{3}{4}\left(p_1 + 1\right)^2.\] | |
Thus \(s_2\) is non-positive for \textbf{any} (real) choice of \(p_1\), | |
hence | |
\[\left|s_2\right| = 3 p_{1}^{2} - 3 p_{1} p_{2} + | |
3 p_{1} + p_{2}^{2} - p_{2} + 1.\] | |
To minimize \(\left|s_2\right|\), we first try the vertex value above. For a
fixed \(p_1\), the ``best'' choice of \(p_2\) gives
\[p_3 = 3 p_1 - 2 \left(\frac{3 p_1 + 1}{2}\right) = -1.\] | |
This is not allowed, so we need to consider other choices of \(p_2\) (again | |
for a fixed \(p_1\)). Letting | |
\[p_2 = \frac{3 p_1 + 1}{2} + \delta \Longrightarrow p_3 = -1 - 2 \delta, | |
\; \left|s_2(p_1, p_2)\right| = \frac{3}{4}\left(p_1 + 1\right)^2 + \delta^2.\] | |
Thus to minimize \(\left|s_2\right|\) (for a fixed \(p_1\)) we want an | |
acceptable \(\delta\) that is as close to zero as possible. We see that | |
\[0 \leq p_3 = -1 - 2 \delta \Longrightarrow \delta \leq -\frac{1}{2}.\] | |
The parity of \(p_1\) determines the value of \(\delta\) nearest to | |
\(-\frac{1}{2}\) such that \(p_2 \in \mathbf{Z}\): | |
\[\delta = \begin{cases} | |
-\frac{1}{2} & \text{if } p_1 \equiv 0 \bmod{2} \\ | |
-1 & \text{if } p_1 \equiv 1 \bmod{2} | |
\end{cases} \Longrightarrow \min \left|s_2\right| = \begin{cases} | |
\frac{3}{4}\left(p_1 + 1\right)^2 + \frac{1}{4} & | |
\text{if } p_1 \in \left\{0, 2, 4, \ldots\right\} \\ | |
\frac{3}{4}\left(p_1 + 1\right)^2 + 1 & | |
\text{if } p_1 \in \left\{1, 3, 5, \ldots\right\} | |
\end{cases}.\] | |
Noting that \(\left(p_1 + 1\right)^2\) is increasing for \(p_1 \geq 0 > -1\), we | |
have two candidates for \(\min \left|s_2\right|\): | |
\[\frac{3}{4}\left(0 + 1\right)^2 + \frac{1}{4} = 1 \quad \text{and} \quad | |
\frac{3}{4}\left(1 + 1\right)^2 + 1 = 4\] | |
Thus we choose | |
\[p_1 = 0 \Longrightarrow p_2 = \frac{3 \cdot 0 + 1}{2} - \frac{1}{2} = 0 | |
\Longrightarrow p_3 = 3 (0) - 2(0) = 0.\] | |
From this we confirm the solution we've seen before: | |
\[\boxed{\beta_1 = -1 - 0 = -1, \qquad \beta_2 = -0 + 0 = 0, \qquad | |
\beta_3 = 1 - 0 + 0 + 0 = 1}.\] | |
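The parity formula is easy to double-check by brute force over \(p_2\) for
small fixed \(p_1\); a minimal Python sketch:
\begin{verbatim}
def min_abs_s2(p1):
    # minimize |s_2| over integers p2 >= 0 with p3 = 3 p1 - 2 p2 >= 0
    return min(abs(3*p1**2 - 3*p1*p2 + 3*p1 + p2**2 - p2 + 1)
               for p2 in range(3 * p1 // 2 + 1))

for p1 in range(6):
    parity_term = 1 if p1 % 2 == 0 else 4
    predicted = (3 * (p1 + 1)**2 + parity_term) // 4
    print(p1, min_abs_s2(p1), predicted)   # the two columns agree
\end{verbatim}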
\paragraph{When \(d = 2\):} We can use | |
\begin{align*} | |
0 &= s_2 = 3 p_{1}^{2} - 4 p_{1} p_{2} - 2 p_{1} p_{3} + p_{2}^{2} + | |
p_{2} p_{3} - p_{2} - p_{3} - 1 \\ | |
\Longrightarrow p_3 &= | |
\frac{3 p_1^2 - 4 p_1 p_2 + p_2^2 - p_2 - 1}{2 p_1 - p_2 + 1} \\ | |
&= \frac{\left(2 p_1 - p_2 + 1\right)^2 - \left(p_1 + 1\right)^2 - | |
\left(2 p_1 - p_2 + 1\right)}{2 p_1 - p_2 + 1} \\ | |
&= \left(2 p_1 - p_2\right) - | |
\frac{\left(p_1 + 1\right)^2}{2 p_1 - p_2 + 1} | |
\end{align*} | |
This fails only when the denominator vanishes, i.e.\ when
\[0 = 2 p_1 - p_2 + 1\] | |
which in turn forces (via \(s_2 = 0\)): | |
\[3 p_1^2 - 4 p_1 p_2 + p_2^2 - p_2 - 1 = 0\] | |
Solving this system of two equations in two unknowns gives
\(p_1 = p_2 = -1\). Thus we know that no triple satisfying | |
\(s_2 = 0\) can have \(2 p_1 - p_2 + 1 = 0\) since by definition | |
\(p_1, p_2 \geq 0\). | |
One can also show that \(p_3 \geq 0\) forces | |
\[\frac{p_2 - 1}{3} \leq p_1 \leq \frac{p_2 - 2}{2} \quad \text{or} | |
\quad p_2 + 1 \leq p_1.\] | |
With this, we have | |
\[s_3 = \frac{\left(p_1 + 1\right)^2 \left(p_1 - p_2\right)^2}{2 p_1 - p_2 + 1} | |
= \frac{E^2 \left(E - D\right)^2}{D}\] | |
where | |
\[D = 2 p_1 - p_2 + 1, E = p_1 + 1 \Longrightarrow p_2 - p_1 = p_1 - D + 1\] | |
When minimizing \(s_3\), neither \(E = 0\) nor \(E - D = 0\) is allowed: the first
requires \(p_1 = -1\), violating \(p_1 \geq 0\), and the second forces \(p_2 = p_1\),
which in turn forces \(p_3 = -1\). With this substitution we have
\[p_3 = D - 1 - \frac{E^2}{D}.\] | |
In particular that tells us \(D \mid E^2\) since | |
\(\frac{E^2}{D} = D - 1 - p_3 \in \mathbf{Z}\). To minimize | |
\(\left|s_3\right|\), we consider the cases \(D\) positive and negative | |
separately and then try to minimize for a fixed \(E\). | |
\[\left|s_3\right| = \begin{cases} | |
\frac{E^2 \left(E - D'\right)^2}{D'} & \text{if } D' = D > 0 \\ | |
\frac{E^2 \left(E + D'\right)^2}{D'} & \text{if } -D' = D < 0 | |
\end{cases} = \begin{cases} | |
E^2 \left(\frac{E^2}{D'} + D' - 2 E\right) & \text{if } D' = D > 0 \\ | |
E^2 \left(\frac{E^2}{D'} + D' + 2 E\right) & \text{if } -D' = D < 0 | |
\end{cases}\] | |
Thus minimizing corresponds to positive \(D'\) that minimize | |
\[0 < \frac{\left(E - D'\right)^2}{D'} = \frac{E^2}{D'} + D' - 2 E \quad | |
\text{and} \quad | |
0 < \frac{\left(E + D'\right)^2}{D'} = \frac{E^2}{D'} + D' + 2 E.\] | |
Though \(\frac{E^2}{D'} + D' - 2 E < \frac{E^2}{D'} + D' + 2 E\) (since
\(E = p_1 + 1\) is always positive), values of \(D'\) that
are usable for one expression may not be usable for the other. This
is because we need to satisfy | |
\[0 \leq p_2 = \begin{cases} | |
2E - 1 - D' & \text{if } D' = D > 0 \\ | |
2E - 1 + D' & \text{if } -D' = D < 0 | |
\end{cases} \quad \text{and} \quad 0 \leq p_3 = \begin{cases} | |
\left(D' - \frac{E^2}{D'}\right) - 1 & \text{if } D' = D > 0 \\ | |
\left(\frac{E^2}{D'} - D'\right) - 1 & \text{if } -D' = D < 0 | |
\end{cases}.\] | |
These inequalities mean that | |
\[E < D' \leq 2E - 1 \text{ when } D' = D \quad \text{and} \quad | |
1 \leq D' < E \text{ when } D' = -D.\] | |
We know \(\frac{E^2}{D'} + D'\) is minimized at \(D' = E\) (by | |
AM-GM inequality, for example) but we've seen above that | |
\(D' = E\) is not allowed in either case. As \(D'\) moves out | |
from \(E\) in either direction, that value increases from the | |
minimum value \(2E\). Thus the best value of \(D'\) in either | |
\(\left[1, E\right)\) or \(\left(E, 2E - 1\right]\) is the | |
one closest to \(E\). | |
Since we must have \(D' \mid E^2\), there is a unique choice | |
\(D' \in \left[1, E\right)\) that is closest to \(E\). Correspondingly, the
complementary divisor \(D'' = \frac{E^2}{D'}\) is the unique choice
\(D'' \in \left(E, E^2\right]\) that is closest to \(E\). | |
If \(D'' \in \left(E, 2E - 1\right]\), then we must have
\[\frac{E^2}{D''} + D'' - 2E = D' + \frac{E^2}{D'} - 2E < | |
\frac{E^2}{D'} + D' + 2E.\] | |
Hence for a fixed \(E\), we have | |
\[\min \left|s_3\right| = \begin{cases} | |
E^2 \left(\frac{E^2}{D'} + D' - 2 E\right) & \text{if } | |
\exists E < D' \leq 2E - 1 \text{ such that } D' \mid E^2 \\ | |
E^2 \left(\frac{E^2}{D'} + D' + 2 E\right) & \text{otherwise} | |
\end{cases}\] | |
where the values of \(D'\) above are understood to be the factors of \(E^2\)
nearest to \(E\) (above and below). | |
We \textbf{claim} that \(\min \left|s_3\right| = 36\). Since the parenthesized
factor above is a positive integer, \(\left|s_3\right| \geq E^2 > 36\) once
\(E \geq 7\), so it suffices to check all values \(1 \leq E \leq 6\) (a
programmatic version of this scan appears at the end of the subsection).
\begin{itemize} | |
\item \(E = 1\): No possible values since | |
\(\left[1, E\right) = \left(E, 2E - 1\right] = \varnothing\) | |
\item \(E = 2\): The only possible values are \(D' = 1 \in \left[1, E\right)\) and
\(D' = 3 \in \left(E, 2E - 1\right]\). But since \(3 \nmid E^2\) our only
choice is \(-D = D' = 1\). This yields
\[\left|s_3\right| = 2^2\left(\frac{2^2}{1} + 1 + 2(2)\right) = 36.\] | |
\item \(E = 3\): The only value in \(\left[1, E\right) = \left[1, 3\right)\) | |
dividing \(E^2 = 9\) is \(D' = 1\) and no value in | |
\(\left(E, 2E - 1\right] = \left(3, 5\right]\) divides \(E^2\). | |
Thus our only choice gives | |
\[\left|s_3\right| = 3^2\left(\frac{3^2}{1} + 1 + 2(3)\right) = 144.\] | |
\item \(E = 4\): The values in \(\left[1, E\right) = \left[1, 4\right)\) | |
dividing \(E^2 = 16\) are \(D' = 1, 2\) and no value in | |
\(\left(E, 2E - 1\right] = \left(4, 7\right]\) divides \(E^2\). | |
Thus our choice nearest \(E = 4\) gives | |
\[\left|s_3\right| = 4^2\left(\frac{4^2}{2} + 2 + 2(4)\right) = 288.\] | |
\item \(E = 5\): The only value in \(\left[1, E\right) = \left[1, 5\right)\) | |
dividing \(E^2 = 25\) is \(D' = 1\) and no value in | |
\(\left(E, 2E - 1\right] = \left(5, 9\right]\) divides \(E^2\). | |
Thus our only choice gives | |
\[\left|s_3\right| = 5^2\left(\frac{5^2}{1} + 1 + 2(5)\right) = 900.\] | |
\item \(E = 6\): The values in \(\left[1, E\right) = \left[1, 6\right)\) | |
dividing \(E^2 = 36\) are \(D' = 1, 2, 3, 4\) and the only value in | |
\(\left(E, 2E - 1\right] = \left(6, 11\right]\) dividing \(E^2\) is | |
\(D' = 9\). Thus we take the choice above \(E = 6\), which gives | |
\[\left|s_3\right| = 6^2\left(\frac{6^2}{9} + 9 - 2(6)\right) = 36.\] | |
\end{itemize} | |
Thus the minimum \(\left|s_3\right| = 36\) occurs when | |
\[E = 2, D = -D' = -1 \Longrightarrow p_1 = 1, p_2 = 2E - D - 1 = 4, | |
p_3 = D - 1 - \frac{E^2}{D} = 2\] | |
and | |
\[E = 6, D = D' = 9 \Longrightarrow p_1 = 5, p_2 = 2E - D - 1 = 2, | |
p_3 = D - 1 - \frac{E^2}{D} = 4\] | |
From this we confirm the solution(s) we've seen before: | |
\[\boxed{\beta_1 = -1 - 1 = -2, \qquad \beta_2 = -1 + 4 = 3, \qquad | |
\beta_3 = 1 - 1 + 4 + 2 = 6}.\] | |
and | |
\[\boxed{\beta_1 = -1 - 5 = -6, \qquad \beta_2 = -5 + 2 = -3, \qquad | |
\beta_3 = 1 - 5 + 2 + 4 = 2}.\] | |
\textbf{NOTE}: These are actually the same since multiplying by \(-1\) | |
preserves \(s_d = 0\) and also preserves the value of | |
\(\left|s_{d + 1}\right|\). If we seek to find a \textbf{unique} | |
solution we could require that | |
\[\left|\beta_1\right| \leq \left|\beta_n\right| \Longleftrightarrow | |
-\beta_1 \leq \beta_n \Longleftrightarrow 0 \leq \beta_1 + \beta_n\] | |
(noting that just as \(\beta_1 < 0\) is required for cancellation, | |
\(\beta_n > 0\) is also required). | |
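The scan over \(E\) promised above translates directly into code; a minimal
Python sketch of the divisor search:
\begin{verbatim}
def best_for_E(E):
    # divisors of E^2 in (E, 2E - 1] (case D > 0) and in [1, E) (case D < 0)
    above = [d for d in range(E + 1, 2 * E) if E * E % d == 0]
    below = [d for d in range(1, E) if E * E % d == 0]
    candidates = []
    if above:   # when available, D = D' > E is the better case
        d = min(above)
        candidates.append(E**2 * (E**2 // d + d - 2 * E))
    if below:
        d = max(below)
        candidates.append(E**2 * (E**2 // d + d + 2 * E))
    return min(candidates) if candidates else None

for E in range(1, 7):
    print(E, best_for_E(E))   # None, 36, 144, 288, 900, 36
\end{verbatim}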
\subsection{Example: $n = 4$} | |
We add to the above with | |
\[\beta_4 = 1 + \beta_3 + p_4 = 2 - p_1 + p_2 + p_3 + p_4\] | |
and | |
\begin{align*} | |
s_1 &= \left[- 3 p_1 + 2 p_2 + p_3\right] + \beta_4
= 2 - 4 p_1 + 3 p_2 + 2 p_3 + p_4 \\
s_2 &= \left[\beta_1 \beta_2 + \beta_2 \beta_3 + \beta_3 \beta_1\right] | |
+ \beta_4\left[\beta_1 + \beta_2 + \beta_3\right] \\ | |
&= \left[3 p_{1}^{2} - 4 p_{1} p_{2} - 2 p_{1} p_{3} + p_{2}^{2} + | |
p_{2} p_{3} - p_{2} - p_{3} - 1\right] \\ | |
&+ \left(2 - p_1 + p_2 + p_3 + p_4\right)\left[- 3 p_1 + 2 p_2 + p_3\right] | |
\end{align*} | |
(We'll leave out the rest for now.) | |
\paragraph{When \(d = 1\):} We have | |
\[s_1 = 0 \Longrightarrow \beta_4 = -\left(\beta_1 + \beta_2 + \beta_3\right)\] | |
hence | |
\begin{align*} | |
s_2 &= \left[\beta_1 \beta_2 + \beta_2 \beta_3 + \beta_3 \beta_1\right] | |
+ \beta_4\left[\beta_1 + \beta_2 + \beta_3\right] \\ | |
&= \left(\beta_1 \beta_2 + \beta_2 \beta_3 + \beta_3 \beta_1\right) | |
- \left(\beta_1 + \beta_2 + \beta_3\right)^2 \\ | |
&= -\left(\beta_1^2 + \beta_1 \beta_2 + \beta_1 \beta_3 + | |
\beta_2^2 + \beta_2 \beta_3 + \beta_3^2\right) \\ | |
&= - \frac{1}{2} \left(\beta_1 + \beta_2\right)^2 | |
- \frac{1}{2} \left(\beta_1 + \beta_3\right)^2 | |
- \frac{1}{2} \left(\beta_2 + \beta_3\right)^2 | |
\end{align*} | |
Luckily this means we can just deal with minimizing | |
\begin{align*} | |
2\left|s_2\right| &= \left(\beta_1 + \beta_2\right)^2 | |
+ \left(\beta_1 + \beta_3\right)^2 + \left(\beta_2 + \beta_3\right)^2 \\ | |
&= \left(- 2 p_{1} + p_{2} - 1\right)^{2} + \left(- 2 p_{1} + p_{2} + | |
p_{3}\right)^{2} + \left(- 2 p_{1} + 2 p_{2} + p_{3} + 1\right)^{2}. | |
\end{align*} | |
Setting \(A = -2 p_1 + p_2\), this becomes
\[2\left|s_2\right| = \left(A - 1\right)^2 + \left(A + p_3\right)^2
+ \left(A + p_2 + p_3 + 1\right)^2.\]
Note also that
\[\beta_4 = 3 p_{1} - 2 p_{2} - p_{3} \Longrightarrow p_4 =
4 p_{1} - 3 p_{2} - 2 p_{3} - 2,\]
so the requirement \(p_4 \geq 0\) further constrains \((p_1, p_2, p_3)\).
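We stop the hand computation here, but the corresponding table entry is cheap
to confirm by brute force; a minimal Python sketch (with \(s_1 = 0\) we have
\(2\left|s_2\right| = \sum_i \beta_i^2\), so small tuples suffice):
\begin{verbatim}
from itertools import combinations
from math import prod

def s(j, beta):
    return sum(prod(c) for c in combinations(beta, j))

hits = [(abs(s(2, b)), b) for b in combinations(range(-8, 9), 4)
        if s(1, b) == 0 and b[0] + b[-1] >= 0]
print(min(hits))   # (5, (-2, -1, 1, 2)), matching the n = 4, k = 3 entry
\end{verbatim}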
\subsection{Algorithm: Advancing Front} | |
We use an advancing front | |
\begin{align*} | |
p_1 + \cdots + p_n &= 0 \\
p_1 + \cdots + p_n &= 1 \\
p_1 + \cdots + p_n &= 2 \\
& \vdots | |
\end{align*} | |
until some maximum value is reached. | |
As a proxy, when \(n = 2m\) is even, note that | |
\[\boldsymbol{\beta} = \left(-m, -(m - 1), \ldots, -1, 1, \ldots, | |
m - 1, m\right)\] | |
gives \(s_{2j - 1} = 0\): multiplying every entry by \(-1\) merely
permutes the tuple, yet it scales \(s_{2j - 1}\) by a factor of
\((-1)^{2j - 1} = -1\).
Similarly for the odd sums when \(n = 2m + 1\) is odd, we can | |
use | |
\[\boldsymbol{\beta} = \left(-m, -(m - 1), \ldots, -1, 0, | |
1, \ldots, m - 1, m\right)\] | |
to get \(s_{2j - 1} = 0\). | |
Both of these give us worst cases (upper bounds) for \(\left|s_{2j}\right|\) and
will tell us when to \textbf{stop advancing our front}.
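A minimal Python sketch of this enumeration (levels are
\(p_1 + \cdots + p_n = 0, 1, 2, \ldots\); a fixed maximum level stands in for
the stopping bound just described):
\begin{verbatim}
from itertools import combinations
from math import prod

def s(j, beta):
    return sum(prod(c) for c in combinations(beta, j))

def p_to_beta(p):
    beta = [-1 - p[0]]
    for gap in p[1:]:
        beta.append(beta[-1] + 1 + gap)
    return beta

def level_tuples(total, n):
    # all p in Z_{>=0}^n with p_1 + ... + p_n == total (one front level)
    if n == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in level_tuples(total - first, n - 1):
            yield (first,) + rest

def front_search(n, d, max_level):
    best = None
    for level in range(max_level + 1):
        for p in level_tuples(level, n):
            beta = p_to_beta(p)
            if s(d, beta) == 0 and beta[0] + beta[-1] >= 0:
                val = abs(s(d + 1, beta))
                if best is None or val < best[0]:
                    best = (val, tuple(beta))
    return best

print(front_search(3, 2, 12))   # (36, (-2, 3, 6))
\end{verbatim}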
\section{Lemma: Symmetric Polynomial Order Reduction} \label{reduction-lemma} | |
For real values \(X_1 \leq \cdots \leq X_n\), define the | |
\(j\)th symmetric polynomial | |
\[s_j(X_1, \ldots, X_n) = \sum_{1 \leq i_1 < \cdots < i_j \leq n} X_{i_1} | |
\cdots X_{i_j}\] | |
\textit{Claim}: There exist \(n - 1\) points \(Y_1, | |
\ldots, Y_{n - 1}\) \textbf{between} the \(\left\{X_j\right\}\), i.e. satisfying | |
\[X_1 \leq Y_1 \leq X_2 \leq \cdots \leq X_{n - 1} \leq Y_{n - 1} \leq X_n,\] | |
such that | |
\[\frac{s_j(X_1, \ldots, X_n)}{\binom{n}{j}} = \frac{s_j(Y_1, | |
\ldots, Y_{n - 1})}{\binom{n - 1}{j}}.\] | |
\begin{proof} | |
We know the polynomial | |
\[g(X) = (X - X_1) \cdots (X - X_n)\] | |
takes the form | |
\begin{align*} | |
g(X) &= X^n - s_1(X_1, \ldots, X_n) X^{n - 1} + \cdots + (-1)^n | |
s_n(X_1, \ldots, X_n) \\ | |
&= X^n + \sum_{i = 1}^n (-1)^i s_i(X_1, \ldots, X_n) X^{n - i} | |
\end{align*} | |
by Vieta's relations. Since \(g\) is degree \(n\) and has \(n\) real | |
roots we know that \(g'(X)\) has \(n - 1\) real roots. Between any two
distinct roots \(X_i < X_{i + 1}\), Rolle's Theorem tells us that there
exists \(X_i < Y_i < X_{i + 1}\) such that \(g'(Y_i) = 0\). For
nondistinct \(X_i\), if \(X_i = \cdots = X_{i + r - 1}\) for some number | |
of repetitions \(r\) then \((X - X_i)^r\) divides \(g(X)\) hence \((X - | |
X_i)^{r - 1}\) divides \(g'(X)\). Thus, whether the roots are distinct,
fully repeated, or somewhere in between, there are always \(r - 1\) roots of
\(g'\) between \(X_i\) and \(X_{i + r - 1}\). Combining the Rolle points
between distinct consecutive roots with these repeated roots, we obtain
\(n - 1\) roots \(Y_i\) of \(g'(X)\) satisfying:
\[X_1 \leq Y_1 \leq \cdots \leq Y_{n - 1} \leq X_n.\]
Since \(g'(X) = n X^{n - 1} + \cdots\) we know that | |
\[g'(X) = n (X - Y_1) \cdots (X - Y_{n - 1})\] | |
and by Vieta | |
\[\frac{g'(X)}{n} = X^{n - 1} - s_1\left(Y_1, \ldots, Y_{n - 1}\right) X^{n - 2} + | |
\cdots + (-1)^{n - 1} s_{n - 1}\left(Y_1, \ldots, Y_{n - 1}\right).\] | |
On the other hand, differentiating the original expansion we have
\[g'(X) = n X^{n - 1} - (n - 1) s_1(X_1, \ldots, X_n) X^{n - 2} +
\cdots + (-1)^{n - 1} s_{n - 1}\left(X_1, \ldots, X_n\right).\]
So for \(1 \leq j \leq n - 1\) (i.e. \(s_j\) among \(s_1, \ldots, s_{n - 1}\)), | |
we have | |
\[(n - j) s_j\left(X_1, \ldots, X_n\right) = n s_j\left(Y_1, \ldots, Y_{n - 1}\right).\] | |
Noting that \((n - j) \binom{n}{j} = n \binom{n - 1}{j}\), we | |
conclude that | |
\[\frac{s_j(X_1, \ldots, X_n)}{\binom{n}{j}} = \frac{s_j(Y_1, | |
\ldots, Y_{n - 1})}{\binom{n - 1}{j}}. \tag*{\qedhere}\] | |
\end{proof} | |
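A numerical spot check of the lemma (a minimal Python sketch assuming
\texttt{numpy}; the test points are arbitrary):
\begin{verbatim}
import numpy as np
from itertools import combinations
from math import comb, prod

def s(j, t):
    return sum(prod(c) for c in combinations(t, j))

X = [-2.0, 1.0, 3.0, 7.0]                  # arbitrary X_1 <= ... <= X_n
g = np.poly(X)                             # coefficients of (x-X_1)...(x-X_n)
Y = sorted(np.roots(np.polyder(g)).real)   # roots of g', interlacing the X_i
n = len(X)
for j in range(1, n):
    print(s(j, X) / comb(n, j), s(j, Y) / comb(n - 1, j))  # columns agree
\end{verbatim}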
\end{document} |