# Search (stage by stage) over integer n-tuples beta, starting from
# (-1, 0, ..., n - 2), printing s_{d + 1}(beta) for the "canonical" tuples
# that satisfy s_d(beta) = 0.
import functools
import itertools
import operator


def is_canonical(n_tuple):
    # |b1| <= |bn| ==> -b1 <= bn ==> b1 + bn >= 0
    return n_tuple[-1] + n_tuple[0] >= 0


def beta_values_to_p(n_tuple, n):
    # beta1 = -1 - p1 ==> p1 = -1 - beta1
    p = [None] * n
    p[0] = -1 - n_tuple[0]
    for i in range(1, n):
        p[i] = n_tuple[i] - n_tuple[i - 1] - 1
    return p


def advance_tuple(n_tuple, n):
    # Decrease every entry by 1 ...
    first_update = n_tuple[:]
    for tuple_index in range(n):
        first_update[tuple_index] -= 1
    yield first_update
    # ... or increase the trailing entries (from position ``index`` on) by 1.
    for index in range(1, n):
        next_update = n_tuple[:]
        for tuple_index in range(index, n):
            next_update[tuple_index] += 1
        yield next_update


def advance_stage(stage, n):
    new_stage = set()
    for n_tuple in stage:
        for new_tuple in advance_tuple(n_tuple, n):
            new_stage.add(tuple(new_tuple))
    return sorted(map(list, new_stage), reverse=True)


def compute_values(curr_stage, stage, sd, sd_plus_one):
    print('Stage: %d' % (curr_stage,))
    print('-' * 40)
    for n_tuple in stage:
        sd_val = sd(n_tuple)
        sd_plus_one_val = sd_plus_one(n_tuple)
        message = '%20s -> %s' % (n_tuple, sd_plus_one_val)
        if sd_val == 0:
            if is_canonical(n_tuple):
                print(message)


def advance_front(n, sd, sd_plus_one, num_stages=3):
    stage = [list(range(-1, -1 + n))]
    compute_values(0, stage, sd, sd_plus_one)
    for curr_stage in range(num_stages):
        stage = advance_stage(stage, n)
        compute_values(curr_stage + 1, stage, sd, sd_plus_one)


def make_sj(j):
    # Elementary symmetric polynomial s_j evaluated at the entries of a tuple.
    def sj(n_tuple):
        result = 0
        for val_tuple in itertools.combinations(n_tuple, j):
            result += functools.reduce(operator.mul, val_tuple, 1)
        return result
    return sj


if __name__ == '__main__':
    s1 = make_sj(1)
    s2 = make_sj(2)
    print('')
    advance_front(5, s1, s2, num_stages=11)
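A quick sanity check (illustrative, not part of the original gist) of the make_sj helper against a tuple that appears in the write-up below: (-2, 3, 6) has s_2 = 0 and |s_3| = 36.

# Illustrative check; assumes make_sj from the script above is in scope.
s2_check = make_sj(2)
s3_check = make_sj(3)
assert s2_check([-2, 3, 6]) == 0        # (-2)(3) + (-2)(6) + (3)(6) = 0
assert abs(s3_check([-2, 3, 6])) == 36  # (-2)(3)(6) = -36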
\documentclass[a4paper,10pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amsthm,amssymb}
\usepackage{hyperref}
\usepackage{fancyhdr}
\renewcommand{\qed}{\(\blacksquare\)}
\hypersetup{colorlinks=true,urlcolor=blue}
\pagestyle{fancy}
\lhead{}
\rhead{Toying Around with Finite Difference Approximations \\ Danny Hermes}
\renewcommand{\headrulewidth}{0pt}
\begin{document}
\section*{Some fun facts about approximations}
\ \vspace{-.2in}
\paragraph{Putting a max on the order of approximation of \(u'\)} If we
seek to approximate the value \(u'(c)\) from \(n\) points, we'll show
that we cannot achieve an \(\mathcal{O}(h^{n + 1})\) approximation. Consider
the approximation given by
\[\sum_{i = 1}^n \alpha_i u(c + \beta_i h)\]
where both the coefficients of the outputs \(u(\cdot)\) and the distances of
the points (\(\beta_i h\)) from the center \(c\) may vary. Currently the only
conditions on the points are that the \(\beta_i\) are distinct (else we would
have fewer than \(n\) points) and that the differences \(\beta_i h\) are
\(\mathcal{O}(h)\) (which we've indicated clearly by the presence of \(h\) in
the difference) so that we can use a Taylor approximation.
Taylor expanding \(u(c + \beta_i h)\) about \(c\) we have
\[u(c + \beta_i h) = \sum_{k = 0}^{n + 1} u^{(k)}(c) \frac{\left(\beta_i
h\right)^k}{k!} + \mathcal{O}(h^{n + 2}).\]
Since we want \(\sum_{i = 1}^n \alpha_i u(c + \beta_i h)\) to approximate
\(u'(c)\), we'll want to match the \(k = 1\) term:
\[u'(c) = \sum_{i = 1}^n \alpha_i \left[u'(c) \beta_i h\right]\]
so that the \(\alpha_i = \mathcal{O}\left(\frac{1}{h}\right)\). Due to this,
in order to have an \(\mathcal{O}(h^{n + 1})\) approximation we'd need
\[\sum_{i = 1}^n \alpha_i u^{(k)}(c) \frac{\left(\beta_i h\right)^k}{k!} = 0\]
for \(k = 0\) and \(2 \leq k \leq n + 1\). These conditions cover all the
remaining terms, and the error would be \(\mathcal{O}(\alpha_i h^{n + 2}) =
\mathcal{O}(h^{n + 1})\).
Since we want to do this for generic \(u(\cdot)\) we assume that \(u^{(k)}(c)
\neq 0\), so dividing by \(u^{(k)}(c) h^k / k!\) we have the system
\[\sum_{i = 1}^n \alpha_i \beta_i = \frac{1}{h} \quad \text{ and } \quad
\sum_{i = 1}^n \alpha_i \beta_i^k = 0 \quad \text{for } k \neq 1.\]
This is a system of \(n + 2\) equations in \(2n\) variables, so it seems
on the surface that we should have a solution, hence should be able to get
an \(\mathcal{O}(h^{n + 1})\) approximation of \(u'(c)\) with \(n\) points.
However, we'll show this system does not have a solution. Writing this
as a matrix equation, we have
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\beta_1^2 & \cdots & \beta_n^2 \\
\vdots & \ddots & \vdots \\
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1} \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\alpha_2 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[ \begin{array}{c}
0 \\
\frac{1}{h} \\
0 \\
\vdots \\
0 \end{array}\right].\]
Dropping the second and third equations from this system, we have a square
system given by
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1^3 & \cdots & \beta_n^3 \\
\vdots & \ddots & \vdots \\
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1} \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\alpha_2 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[ \begin{array}{c}
0 \\
0 \\
\vdots \\
0 \end{array}\right].\]
If we define the augmented Vandermonde matrix \(A_n\) by
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1^3 & \cdots & \beta_n^3 \\
\vdots & \ddots & \vdots \\
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1} \end{array}\right],\]
then our system is \(A_n \boldsymbol{\alpha} = \mathbf{0}\).
(For notation's sake, we should probably just introduce a notation like
\[A_n(0, 3, \ldots, n + 1)\] to represent the exponents of each row and
we should also probably note that this depends on the \(\beta_i\), but
we didn't -- so there.)
If we can show that \(A_n\) is invertible then we'd have
\(\boldsymbol{\alpha} = A_n^{-1} \mathbf{0} = \mathbf{0}\) forcing our
approximation of \(u'(c)\) to be uniformly \(0\). This clearly is not
\(\mathcal{O}(h^{n + 1})\).
\paragraph{Showing that this system is degenerate} With the same values
\(\beta_1, \ldots, \beta_n\), define the Vandermonde matrix \(V_n\) as the
matrix
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^{n - 1} & \cdots & \beta_n^{n - 1} \end{array}\right].\]
(This is the transpose of what's typically called the Vandermonde but it's
fine this way, we're over it.)
To show that \(A_n\) is invertible we'll define a fancy matrix \(F_n\) and show
that \(A_n = F_n V_n\). This will be important, since the determinant of
\(V_n\) is well-known and (we'll see) the determinant of \(F_n\) can be easily
computed. But first, the definition
\[\left[ \begin{array}{c c c c c c c}
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 \\
f_{n - 1,1} & f_{n - 1, 2} & f_{n - 1, 3} & f_{n - 1, 4} & \cdots &
f_{n - 1, n - 1} & f_{n - 1, n} \\
f_{n,1} & f_{n, 2} & f_{n, 3} & f_{n, 4} & \cdots & f_{n, n - 1} &
f_{n, n} \end{array}\right].\]
This is clearly not enough, but just to indicate that the last two rows are
where the real content is. The first \(n - 2\) rows make plenty of sense.
We have row \(1\) equal to \(e_1^T\) and then row \(i\) for \(2 \leq i \leq
n - 2\) is equal to \(e_{i + 2}^T\).
Before defining what the unnamed \(f_{i, j}\) are, we can compute both
\(\det(F_n)\) and the product \(F_n V_n\) in terms of these. First, since
we have a shifted identity in the first \(n - 2\) rows, the only
terms in the cofactor expansion that are nonzero will be from the second and
third columns and the last two rows. Hence, up to a sign we know
\[\pm \det(F_n) = f_{n - 1, 2} f_{n, 3} - f_{n, 2} f_{n - 1, 3}.\]
%% (See the \underline{\hyperref[det-sign]{note}} below to see that this is
(See the \hyperref[det-sign]{note} below to see that this is
the correct sign.)
To compute some parts of \(F_n V_n\) note that the product \(e_i^T M\)
represents the \(i\)th row of \(M\). Hence, the majority of the
multiplication just moves rows:
\[F_n V_n = \left[ \begin{array}{c}
e_1^T \\ e_4^T \\ \vdots \\ e_n^T \\ f_{n - 1}^T \\ f_n^T \end{array}\right]
\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^{n - 1} & \cdots & \beta_n^{n - 1} \end{array}\right] =
\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1^3 & \cdots & \beta_n^3 \\
\vdots & \ddots & \vdots \\
\beta_1^{n - 1} & \cdots & \beta_n^{n - 1} \\
\multicolumn{3}{ c }{f_{n - 1}^T V_n} \\
\multicolumn{3}{ c }{f_n^T V_n} \end{array}\right].\]
The first \(n - 2\) rows of \(F_n V_n\) match those of \(A_n\) so to show
equality we need to show the last \(2\) match as well. Hence at this point
we'll define the last \(2\) rows of \(F_n\). These will be defined in terms
of the elementary symmetric polynomials.
We'll use \(s_k\) to denote the \(k\)th elementary symmetric polynomial; for example
\(s_1 = \beta_1 + \cdots + \beta_n\), \(s_2 = \sum_{1 \leq i < j \leq n}
\beta_i \beta_j\) and \(s_n = \beta_1 \cdots \beta_n\).
For row \(n - 1\), we have
\[f_{n - 1, k} = (-1)^{n - k} s_{n - k + 1}.\]
For row \(n\), we first have
\[f_{n, 1} = (-1)^{n - 1} s_1 s_n\]
and
\[f_{n, k} = (-1)^{n - k} \left[s_1 s_{n - k + 1} - s_{n - k + 2}\right].\]
for \(2 \leq k \leq n\). Viewing this, it makes sense that we defined
\(f_{n, 1}\) separately since for \(k = 1\), \(s_{n - k + 2} = s_{n + 1}\)
which is not defined since we are drawing from only \(n\) indeterminates.
With this, we are ready to compute \(\det(F_n)\) and confirm the last two
rows of the product. If we define
\[g(X) = (X - \beta_1) \cdots (X - \beta_n)\]
then we know by Vieta's relations that
\[g(X) = X^n - s_1 X^{n - 1} + s_2 X^{n - 2} + \cdots + (-1)^{n} s_n\]
(and by construction that \(g(\beta_i) = 0\) for each \(i\)). To compute row
\(n - 1\) of \(F_n V_n\) note that since each column of \(V_n\) is
\(\left[\begin{array}{c c c c} 1 & \beta_i & \cdots & \beta_i^{n - 1}
\end{array}\right]^T\), entry \(i\) in row \(n - 1\) is
\begin{align*}
\sum_{k = 1}^n f_{n - 1, k} \cdot \beta_i^{k - 1} &= \sum_{k = 1}^n (-1)^{n - k}
s_{n - k + 1} \beta_i^{k - 1} \\
&= (-1)^{n - 1} s_n + (-1)^{n - 2} s_{n - 1} \beta_i + \cdots + (-1)^0 s_1
\beta_i^{n - 1} \\
&= -\left(g(\beta_i) - \beta_i^n\right) = \beta_i^n.
\end{align*}
By similar logic, entry \(i\) in row \(n\) is
\begin{align*}
\sum_{k = 1}^n f_{n, k} \cdot \beta_i^{k - 1}
&= (-1)^{n - 1} s_1 s_n + \sum_{k = 2}^n (-1)^{n - k} \left[s_1 s_{n - k + 1} -
s_{n - k + 2}\right] \beta_i^{k - 1} \\
&= s_1 \left[(-1)^{n - 1} s_n + (-1)^{n - 2} s_{n - 1} \beta_i + \cdots +
(-1)^0 s_1 \beta_i^{n - 1}\right] \\
&- \left[(-1)^{n - 2} s_n \beta_i + \cdots + (-1)^0 s_2
\beta_i^{n - 1}\right] \\
&= s_1 \left[\beta_i^n\right] - \beta_i \left[(-1)^n s_n + \cdots + (-1)^2
s_2 \beta_i^{n - 2}\right] \\
&= s_1 \beta_i^n - \beta_i \left[g(\beta_i) - \beta_i^n + s_1 \beta_i^{n - 1}
\right] \\
&= s_1 \left[\beta_i^n - \beta_i \cdot \beta_i^{n - 1}\right] + \beta_i \cdot
\beta_i^n \\
&= \beta_i^{n + 1}.
\end{align*}
To compute \(\det(F_n)\), note that it involves the following:
\begin{align*}
f_{n - 1, 2} &= (-1)^{n - 2} s_{n - 1} = (-1)^n s_{n - 1} \\
f_{n - 1, 3} &= (-1)^{n - 3} s_{n - 2} = (-1)^{n - 1} s_{n - 2} \\
f_{n, 2} &= (-1)^{n - 2}\left[s_1 s_{n - 1} - s_n\right] = (-1)^n
\left[s_1 s_{n - 1} - s_n\right] \\
f_{n, 3} &= (-1)^{n - 3}\left[s_1 s_{n - 2} - s_{n - 1}\right] =
(-1)^{n - 1} \left[s_1 s_{n - 2} - s_{n - 1}\right].
\end{align*}
Combining these with our previous definition
\begin{align*}
\det(F_n) &= f_{n - 1, 2} f_{n, 3} - f_{n, 2} f_{n - 1, 3} \\
&= (-1)^{2n - 1} s_{n - 1} \left[s_1 s_{n - 2} - s_{n - 1}\right] -
(-1)^{2n - 1} \left[s_1 s_{n - 1} - s_n\right] s_{n - 2} \\
&= s_1 s_{n - 1} s_{n - 2} - s_n s_{n - 2} -
\left[s_1 s_{n - 1} s_{n - 2} - s_{n - 1}^2\right] \\
&= s_{n - 1}^2 - s_n s_{n - 2}.
\end{align*}
\paragraph{Showing \(A_n\) is invertible} Assuming the \(\beta_i\) are
distinct, we can show \(A_n\) is invertible by showing that it has
nonzero determinant. Since we showed \(A_n = F_n V_n\) we know that
\[\det(A_n) = (s_{n - 1}^2 - s_n s_{n - 2}) \prod_{1 \leq i < j \leq n} (
\beta_j - \beta_i)\]
where the product is a well-known result about the Vandermonde determinant.
Since the \(\beta_i\) are distinct, the only way this can be zero is if
\(s_{n - 1}^2 - s_n s_{n - 2} = 0\). We'll rewrite this product as a sum of
squares and will show that it can't be \(0\) when the \(\beta_i\) are
distinct. Since the subsets of size \(k\) and \(n - k\) are in full bijection
(via complement) we can write
\[s_{n - 1} = \sum_{i = 1}^n \prod_{j \neq i} \beta_j.\]
If we define the formal sum
\[s_{-1} = \sum_{i = 1}^n \frac{1}{\beta_i}\]
then we see that \(s_{n - 1} = s_n s_{-1}\). This sum may not necessarily be
defined (e.g. if some \(\beta_i = 0\)), but using it for manipulation of
sums formally is perfectly fine so long as it is multiplied by \(s_n\) since
this will remove all the potentially problematic terms from the denominator.
Similarly, define \(s_{-2}\) so that \(s_{n - 2} = s_n s_{-2}\):
\[s_{-2} = \sum_{1 \leq i < j \leq n} \frac{1}{\beta_i \beta_j}.\]
With this, \(s_{n - 1}^2 - s_n s_{n - 2}\) becomes \(s_n^2\left(s_{-1}^2 -
s_{-2}\right)\). To analyze this, note that (formally)
\[s_{-1}^2 - s_{-2} = \left(\sum_{i = 1}^n \frac{1}{\beta_i^2} + 2 s_{-2}
\right) - s_{-2} = \sum_{i = 1}^n \frac{1}{\beta_i^2} + s_{-2}.\]
To write this as a sum of squares, we combine the above fact about
\(s_{-1}^2\) with another formal sum:
\[\sum_{1 \leq i < j \leq n} \left(\frac{1}{\beta_i} + \frac{1}{\beta_j}
\right)^2 = (n - 1) \sum_{i = 1}^n \frac{1}{\beta_i^2} + 2 s_{-2},\]
(where the coefficient is \(n - 1\) since each \(\beta_i\) is paired with
\(\binom{n - 1}{1}\) other \(j\)). Combining these (formally) we have a sum
of squares:
\begin{align*}
& (n - 3)s_{-1}^2 + \sum_{1 \leq i < j \leq n} \left(\frac{1}{\beta_i} +
\frac{1}{\beta_j}\right)^2 \\
&= ((n - 3) + (n - 1)) \sum_{i = 1}^n \frac{1}{\beta_i^2} + (2(n - 3) + 2)
s_{-2} \\
&= (2n - 4)\left[\sum_{i = 1}^n \frac{1}{\beta_i^2} + s_{-2}\right] \\
&= 2(n - 2)\left[s_{-1}^2 - s_{-2}\right].
\end{align*}
Multiplying back by \(s_n^2\), we have
\[s_{n - 1}^2 - s_n s_{n - 2} = \frac{(n - 3) s_{n -1}^2 +
\displaystyle \sum_{1 \leq i < j \leq n} \left(\frac{s_n}{\beta_i} +
\frac{s_n}{\beta_j}\right)^2}{2(n - 2)}.\]
(Note that \(n = 2\) will cause problems, but we won't in the end
be considering \(n\) below \(3\).) Since the numerator is a sum of squares,
\(s_{n - 1}^2 - s_n s_{n - 2}\) can only vanish when \(s_{n - 1} = 0\) (so
long as \(n > 3\)) and
\[\frac{s_n}{\beta_i} + \frac{s_n}{\beta_j} = 0 \text{ for all } 1 \leq
i < j \leq n.\]
Introducing the intermediate terms \(S_i = \frac{s_n}{\beta_i} =
\prod_{k \neq i} \beta_k\), we know that \(s_{n - 1} = \sum_{k = 1}^n S_k\)
and the other condition becomes \(S_i = - S_j\) for all \(j \neq i\). This
forces
\[0 = \sum_{k = 1}^n S_k = S_i + \sum_{k \neq i} S_k = S_i + (n - 1)(-S_i) =
(2 - n) S_i.\]
Again, since \(n > 2\), this forces \(-S_j = S_i = 0\). Now since
\(\prod_{k \neq i} \beta_k = S_i = 0\), we must have \(\beta_{k_0} = 0\)
for some \(k_0 \neq i\). But we also know that \(0 = S_{k_0} =
\prod_{k \neq k_0} \beta_k\) hence there is some \(k_1 \neq k_0\) such
that \(\beta_{k_1} = 0 = \beta_{k_0}\). But since \(k_1 \neq k_0\), this
would give us nondistinct \(\beta_i\) (both \(0\)) which is not allowed.
Hence, the sum of squares \(s_{n - 1}^2 - s_n s_{n - 2}\) can never be
\(0\) and so neither can \(\det(A_n)\).
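As a concrete sanity check of this determinant formula (with the illustrative
values \(\beta_1 = 1, \beta_2 = 2, \beta_3 = 3\), chosen only for this check):
here \(n = 3\), \(s_1 = 6\), \(s_2 = 11\), \(s_3 = 6\), so \(s_{n - 1}^2 -
s_n s_{n - 2} = 121 - 36 = 85\) and the Vandermonde factor is
\((2 - 1)(3 - 1)(3 - 2) = 2\), predicting \(\det(A_3) = 170\). Directly,
\[\det \left[ \begin{array}{c c c}
1 & 1 & 1 \\
1 & 8 & 27 \\
1 & 16 & 81 \end{array}\right] = (648 - 432) - (81 - 27) + (16 - 8) = 170.\]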
\paragraph{Note about Sign of Determinant} \label{det-sign}
We can show the sign is correct by considering the only two permutations
\(\tau, \sigma\) that result in a nonzero product in
\[\det F_n = \sum_{\sigma' \in S_n} \operatorname{sgn}(\sigma')
\prod_{i = 1}^n f_{i, \sigma'(i)}.\]
We need \(\sigma(1) = \tau(1) = 1\) (so \(1\) is not part of the cycle
decomposition). We also need \(\sigma(i) = \tau(i) = i + 2\) for \(2 \leq
i \leq n - 2\) due to the column shift. This leaves us with only two
choices for the remaining inputs. Let \(\tau(n - 1) = 2, \tau(n) = 3\) and
\(\sigma(n - 1) = 3, \sigma(n) = 2\), then we have
\[\det F_n = \operatorname{sgn}(\tau) f_{n - 1, 2} f_{n, 3} +
\operatorname{sgn}(\sigma) f_{n - 1, 3} f_{n, 2}\]
since \(f_{1, 1} = f_{i, i + 2} = 1\) for all other \(i\). Clearly, since
\(\sigma = (2 3) \cdot \tau\) we know that \(\operatorname{sgn}(\tau) = -
\operatorname{sgn}(\sigma)\) so it's enough to find \(\operatorname{sgn}(
\tau)\). If \(n\) is even, then we have
\[2 \xrightarrow{\tau} 4 \xrightarrow{\tau} \cdots \xrightarrow{\tau} n
\xrightarrow{\tau} 3 \xrightarrow{\tau} 5 \xrightarrow{\tau} \cdots
\xrightarrow{\tau} (n - 1) \xrightarrow{\tau} 2\]
hence \(\tau = (2 \, 4 \cdots n \, 3 \cdots (n - 1))\) is a cycle of length
\(n - 1\) (it contains every number except \(1\)), so \(\operatorname{sgn}(
\tau) = 1\) (a cycle of odd length is an even permutation). Similarly, if \(n\) is odd,
then
\[2 \xrightarrow{\tau} 4 \xrightarrow{\tau} \cdots \xrightarrow{\tau} (n - 1)
\xrightarrow{\tau} 2; \quad 3 \xrightarrow{\tau} 5 \xrightarrow{\tau} \cdots
\xrightarrow{\tau} n \xrightarrow{\tau} 3\]
Hence \(\tau\) is the product of two disjoint cycles of the same length,
so it has sign \((\pm 1)^2 = 1\) here as well. Hence we always have
\[1 = \operatorname{sgn}(\tau) = - \operatorname{sgn}(\sigma)\]
forcing
\[\det(F_n) = f_{n - 1, 2} f_{n, 3} - f_{n, 2} f_{n - 1, 3}.\]
\end{document}
\documentclass[letterpaper,10pt]{article}
% http://stackoverflow.com/a/11331659/1068170
\usepackage{upquote}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amsthm,amssymb,hyperref,fancyhdr}
%% https://tex.stackexchange.com/a/396829/32270
\allowdisplaybreaks
\renewcommand*{\today}{November 27, 2013}
\renewcommand{\qed}{\(\blacksquare\)}
\hypersetup{colorlinks=true,urlcolor=blue}
\pagestyle{fancy}
\lhead{}
\rhead{Toying Around with Finite Difference Approximations \\ Danny Hermes}
\renewcommand{\headrulewidth}{0pt}
\DeclareMathOperator{\rank}{rank}
\title{Finite Approximations of \(u^{(k)}(c)\)}
\begin{document}
\maketitle
\pagebreak
\tableofcontents
\section{Introduction}
If we seek to approximate the derivative \(u^{(k)}\) (of a function of one
real variable) with \(n\) points, we can do so by taking advantage of
Taylor's theorem with well-chosen points. Doing so is in many ways quite
natural, and when creating these approximations a few natural questions arise.
\section{Motivation} Take the following as a motivation. Due to the
definition of the derivative, we can approximate \(u'(c)\) with \(2\) points
via quantities like
\[D_{+} u(c) = \frac{u(c + h) - u(c)}{h}, \, D_{-} u(c) = \frac{u(c) -
u(c - h)}{h}, \text{ and } D_0 u(c) =\frac{u(c + h) - u(c - h)}{2h}. \]
Taylor's theorem tells us that
\[u(c \pm h) = u(c) \pm h u'(c) + \frac{h^2}{2} u''(c) \pm \frac{h^3}{6}
u'''(c) + \frac{h^4}{24} u^{(4)}(c) + \mathcal{O}(h^5),\]
so that
\[D_{\pm} u(c) - u'(c) = \frac{1}{h} \mathcal{O}\left(\pm \frac{h^2}{2} u''(c)
\right) = \mathcal{O}(h) \text{ and } D_0 u(c) - u'(c) = \frac{1}{h}
\mathcal{O}\left(\frac{2 h^3}{6} u'''(c)\right) = \mathcal{O}(h^2).\]
Additionally, in a mean-value-theorem-inspired spirit, we define
\[D^2 u(c) = \frac{\frac{u(c + h) - u(c)}{h} - \frac{u(c) - u(c - h)}{h}}{h} =
\frac{u(c + h) - 2 u(c) + u(c - h)}{h^2}\]
as an approximation for \(u''(c)\) so that
\[D^2 u(c) - u''(c) = \frac{1}{h^2} \mathcal{O}\left(\frac{2 h^4}{24}
u^{(4)}(c)\right) = \mathcal{O}(h^2).\]
We see that we can (easily) find three different approximations for \(u'(c)\)
using only \textbf{two} points and with orders of accuracy \(\mathcal{O}(h)\)
and \(\mathcal{O}(h^2)\). Similarly, we can find an approximation for \(u''(c)\)
using \textbf{three} points with order of accuracy \(\mathcal{O}(h^2)\).
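As a concrete (purely illustrative) instance of the second-order behavior,
take \(u(x) = e^x\), \(c = 0\) and \(h = 0.1\); then
\[D_0 u(0) = \frac{e^{0.1} - e^{-0.1}}{0.2} \approx 1.001667,\]
an error of roughly \(1.67 \times 10^{-3}\), matching the predicted leading
term \(\frac{h^2}{6} u'''(0) \approx 1.67 \times 10^{-3}\).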
\section{Natural Questions} Along with the conclusions of our motivating
examples, a few natural questions arise:
\begin{itemize}
\item Given an approximation of \(u^{(k)}(c)\) with \(n\) points, what is the
maximum order of accuracy?
\item What is the largest \(k\) such that \(u^{(k)}(c)\) can be approximated
with \(n\) points?
\item What conditions on the points must be satisfied to achieve the maximum
order of accuracy?
\end{itemize}
\section{Framework for approximations} Before answering these questions, we'll
introduce some notation and explicitly write down what it means to say
``approximate with \(n\) points''.
In general, if we want to approximate \(u^{(k)}(c)\) with \(n\) points, we
seek to find coefficients \(\left\{\alpha_i\right\}_{i = 1}^n\) and
\textbf{distinct} points \(\left\{c_i\right\}_{i = 1}^n\) so that
\[u^{(k)}(c) \approx \sum_{i = 1}^n \alpha_i u(c_i).\]
In order to express the order as \(\mathcal{O}(h^p)\) we instead write
our points as \(\left\{c + \beta_i h\right\}_{i = 1}^n\), centered
around \(c\) with differences from the center \(\left\{c_i - c\right\}\)
that are \(\mathcal{O}(h)\). The condition that the \(c_i\) be distinct
also means that the \(\beta_i\) must be distinct.
Using these points in a Taylor approximation of order \(p\) we have:
\[u(c + \beta_i h) = \sum_{j = 0}^p u^{(j)}(c) \frac{\left(
\beta_i h\right)^j}{j!} + \mathcal{O}\left(h^{p + 1}\right).\]
Expanding these within \(\sum_{i = 1}^n \alpha_i u(c + \beta_i h)\) we have
\[u^{(k)}(c) \approx \sum_{j = 0}^p u^{(j)}(c) \left[\sum_{i = 1}^n \alpha_i
\beta_i^j\right] \frac{h^j}{j!} + \mathcal{O}\left(h^{p + 1} \sum_{i = 1}^n
\alpha_i \right).\]
We approximate \(u^{(k)}(c)\) by choosing the coefficients of the \(u^{(j)}(c)\)
in the sum. For \(u^{(k)}(c)\) to appear in the sum at all we need
\(p \geq k\). In order for the full combination of Taylor expansions
to approximate our derivative, we need to satisfy the system:
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^p & \cdots & \beta_n^p \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[\underbrace{0 \, \cdots \, 0}_{\text{first } k \text{ terms}} \,
\frac{k!}{h^k} \, \underbrace{0 \, \cdots \, 0}_{\text{last } (p - k)
\text{ terms}}\right]^T.\]
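As a small worked instance of this framework (not spelled out in the original
text), take \(n = 2\), \(k = 1\), \(p = 1\). The system reads
\[\left[ \begin{array}{c c}
1 & 1 \\
\beta_1 & \beta_2 \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\alpha_2 \end{array}\right] =
\left[ \begin{array}{c}
0 \\
\frac{1}{h} \end{array}\right],\]
so \(\alpha_1 = \frac{-1}{h\left(\beta_2 - \beta_1\right)}\) and \(\alpha_2 =
\frac{1}{h\left(\beta_2 - \beta_1\right)}\); choosing \(\beta_1 = -1\),
\(\beta_2 = 1\) recovers the centered difference \(D_0 u(c)\) from the
motivation.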
%% H/T: http://tex.stackexchange.com/a/180269/32270
\section[\texorpdfstring{Largest $k$ given $n$}{Largest k given n}]{Largest \(k\) given \(n\)}
Using the above description we can show
that we must have \(k < n\).
Assuming the contrary --- \(k \geq n\) --- the first \(n\) rows of the
system must be \(0\). This is because they correspond to the terms
\[u(c) h^0, u'(c) h^1, \ldots, u^{(n - 1)}(c) h^{n - 1},\]
all of which come before \(u^{(k)}(c) = u^{(n)}(c)\). These first \(n\) rows
result in the square system \(V_n \boldsymbol{\alpha} = \mathbf{0}\) where
\(V_n\) is the Vandermonde matrix
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^{n - 1} & \cdots & \beta_n^{n - 1} \end{array}\right].\]
(This is the transpose of what's typically called the Vandermonde but this
is the matrix that arises for our problem.) However, it's well-known that
\(\det(V_n) = \displaystyle \prod_{1 \leq i < j \leq n} (\beta_j -
\beta_i)\). Since our \(\beta_i\) are distinct, we see \(\det(V_n) \neq 0\),
which tells us that \(\boldsymbol{\alpha} = V_n^{-1} \mathbf{0} =
\mathbf{0}\). This of course results in the approximation
\[u^{(k)}(c) \approx 0\]
which is \(\mathcal{O}(1)\) and hence is a very poor approximation. Effectively,
this means that we can't approximate (in any reasonable way) \(u^{(k)}(c)\)
using \(n\) points when \(k \geq n\).
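For instance (an illustrative case), with \(n = k = 2\) the first two rows
read \(\alpha_1 + \alpha_2 = 0\) and \(\alpha_1 \beta_1 + \alpha_2 \beta_2 =
0\); for \(\beta_1 \neq \beta_2\) these already force \(\alpha_1 =
\alpha_2 = 0\), so no two-point combination can pick out \(u''(c)\).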
\section{Maximum order of accuracy} To determine the order of accuracy of a
given approximation, assume there is a solution \(\left\{\alpha_i\right\},
\left\{\beta_i\right\}\) of
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^p & \cdots & \beta_n^p \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[\underbrace{0 \, \cdots \, 0}_{\text{first } k \text{ terms}} \,
\frac{k!}{h^k} \, \underbrace{0 \, \cdots \, 0}_{\text{last } (p - k)
\text{ terms}}\right]^T.\]
If we consider the \(\beta_i\) to be fixed, we have \(\alpha_i =
\mathcal{O}\left(\frac{1}{h^k}\right)\) due to the strong influence
of the nonzero item on the right hand side of the system. This results
in the approximation
\begin{align*}
& u^{(k)}(c) \left[\sum_{i = 1}^n \alpha_i
\beta_i^k\right] \frac{h^k}{k!} + \sum_{0 \leq j \leq p, j \neq k}
u^{(j)}(c) \left[\sum_{i = 1}^n \alpha_i \beta_i^j\right] \frac{h^j}{j!} +
\mathcal{O}\left(h^{p + 1} \sum_{i = 1}^n \alpha_i\right) \\
&= u^{(k)}(c) \left[\frac{k!}{h^k}\right] \frac{h^k}{k!} + \sum_{0 \leq j
\leq p, j \neq k} u^{(j)}(c) \left[0\right] \frac{h^j}{j!} + \mathcal{O}\left(
h^{p + 1} \frac{1}{h^k}\right) \\
&= u^{(k)}(c) + \mathcal{O}\left(h^{p - k + 1}\right).
\end{align*}
This of course depends on the existence of a solution. We've already said that
we need \(p \geq k\) and \(n > k\). We claim (and will show) that
\begin{enumerate}
\item No solution exists for \(p \geq n + 1\).
\item A family of solutions does exist for \(p = n\).
\end{enumerate}
Assuming both of the above facts, this actually tells us that the best
approximation we can get is
\[\sum_{i = 1}^n \alpha_i u(c + \beta_i h) - u^{(k)}(c) = \mathcal{O}
\left(h^{n - k + 1}\right).\]
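This bound is consistent with the motivating examples: for \(u'(c)\) with
\(n = 2\) points it gives \(\mathcal{O}\left(h^{2 - 1 + 1}\right) =
\mathcal{O}(h^2)\), attained by \(D_0 u(c)\), and for \(u''(c)\) with
\(n = 3\) points it again gives \(\mathcal{O}(h^2)\), attained by
\(D^2 u(c)\).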
\section{A preliminary fact} In order to actually show that no solution
exists when \(p \geq n + 1\) and that a solution does exist for \(p = n\),
we will show later that some matrix is invertible and that some other
matrix has a one-dimensional kernel. As we'll see, doing this will involve
factoring out the Vandermonde matrix \(V_n\) from some other matrix and
hence will require representing powers \(\beta_i^j\) in terms of the
columns of \(V_n\); these must be linear combinations of \(\left\{1, \beta_i,
\ldots, \beta_i^{n - 1}\right\}\). If we want to find \(\beta_i^j\) for
\(0 \leq j \leq n - 1\), this is trivial to find as a linear combination
of the above elements. However, if we want higher powers, we'll need to
utilize Vieta's relations.
To do so, we define
\[g(X) = (X - \beta_1) \cdots (X - \beta_n)\]
and we know by Vieta's relations that
\[g(X) = X^n - s_1 X^{n - 1} + s_2 X^{n - 2} + \cdots + (-1)^{n} s_n\]
and by construction that \(g(\beta_i) = 0\) for each \(i\).
In the above, the \(s_j\) are the elementary symmetric polynomials in the
\(\beta_i\). For example
\[s_1 = \beta_1 + \cdots + \beta_n, \, s_2 = \sum_{1 \leq i < j \leq n}
\beta_i \beta_j, \text{ and } s_n = \beta_1 \cdots \beta_n.\]
Using these, we can write both \(\beta_i^n\) and \(\beta_i^{n + 1}\) as a
linear combination (with coefficients determined by the \(s_j\)) of the
lower powers of \(\beta_i\), as desired. First,
\[0 = g(\beta_i) = \beta_i^n - s_1 \beta_i^{n - 1} + s_2 \beta_i^{n - 2} +
\cdots + (-1)^{n} s_n \Rightarrow \beta_i^n = s_1 \beta_i^{n - 1} - s_2
\beta_i^{n - 2} + \cdots + (-1)^{n - 1} s_n.\]
If we wanted to compute this in terms of \(V_n\), we can see this as the dot
product
\[\left[\begin{array}{c c c c c} (-1)^{n - 1} s_n & (-1)^{n - 2} s_{n - 1} &
\cdots & -s_2 & s_1 \end{array}\right] (V_n)_i\]
(where \((V_n)_i\) is the \(i\)th column). For future use, we'll denote this
row vector as \(R_1^T\) (we use \(1\) since it's the vector which produces
\(1\) extra power of \(\beta_i\) from the first \(n\) nonnegative powers).
By similar logic, to compute \(\beta_i^{n + 1}\) we see that
\begin{align*}
0 = \beta_i g(\beta_i) &= \beta_i^{n + 1} - s_1 \beta_i^n + s_2
\beta_i^{n - 1} + \cdots + (-1)^{n} s_n \beta_i \\
\Rightarrow \beta_i^{n + 1} &= s_1 \beta_i^n - s_2 \beta_i^{n - 1} + \cdots +
(-1)^{n - 1} s_n \beta_i \\
&= s_1 \left[s_1 \beta_i^{n - 1} - s_2 \beta_i^{n - 2} + \cdots + (-1)^{n - 1}
s_n\right] \\
&- s_2 \beta_i^{n - 1} + \cdots + (-1)^{n - 1} s_n \beta_i \\
&= \left[s_1 s_1 - s_2\right] \beta_i^{n - 1} - \left[s_1 s_2 - s_3\right]
\beta_i^{n - 2} + \cdots \\
&+ (-1)^{n - 2} \left[s_1 s_{n - 1} - s_n\right] \beta_i + (-1)^{n - 1}
s_1 s_n.
\end{align*}
Again, viewing this in terms of matrix computations we see it as the dot
product
\[\left[\begin{array}{c c c c c} (-1)^{n - 1} s_1 s_n & (-1)^{n - 2} \left(
s_1 s_{n - 1} - s_n\right) & \cdots & -\left(s_1 s_2 - s_3\right) & \left(
s_1 s_1 - s_2\right) \end{array}\right] (V_n)_i.\]
For future use, we'll denote this row vector as \(R_2^T\).
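For example (a small illustrative case), when \(n = 2\) we have \(g(X) = X^2
- s_1 X + s_2\), so \(R_1^T = \left[\begin{array}{c c} -s_2 & s_1
\end{array}\right]\) and \(R_2^T = \left[\begin{array}{c c} -s_1 s_2 &
s_1 s_1 - s_2 \end{array}\right]\); indeed \(\beta_i^2 = s_1 \beta_i - s_2\)
and \(\beta_i^3 = \left(s_1^2 - s_2\right) \beta_i - s_1 s_2\) for each root
\(\beta_i\) of \(g\).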
\section[\texorpdfstring{No solution exists for $p \geq n + 1$}{No solution exists for p geq n + 1}]{No solution exists for \(p \geq n + 1\)}
Now with the machinery
we need, we're ready to prove that accuracy greater than or equal to
\((n + 1) - k + 1\) is not possible. For such \(p\), the system
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^p & \cdots & \beta_n^p \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[\underbrace{0 \, \cdots \, 0}_{\text{first } k \text{ terms}} \,
\frac{k!}{h^k} \, \underbrace{0 \, \cdots \, 0}_{\text{last } (p - k)
\text{ terms}}\right]^T\]
must contain the square subsystem
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\vdots & \ddots & \vdots \\
\beta_1^n & \cdots & \beta_n^n \\
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1} \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[ \begin{array}{c}
0 \\
\vdots \\
0 \end{array}\right]\]
where the matrix on the left does not contain the row corresponding to
\(u^{(k)}(c)\) or the row directly after it. (This last condition is to make
the system square since the exponents \(0, \ldots, n + 1\) are in \(n + 2\)
rows and we need to remove \(2\) of those rows, one of which must be the
nonzero row corresponding to \(k < n\).) When \(k = n - 1\), removing the
row directly after \(u^{(k)}(c)\) would remove the powers of \(\beta_i^n\)
so in this case we remove the row directly before it since we'll end up
working with those powers.
We introduce the notation \(A_n(k)\) for the ``augmented'' Vandermonde
matrix described above. Note that the last statement is equivalent to
\(A_n(n - 1) = A_n(n - 2)\) and for \(k < n - 1\) the rows of \(A_n(k)\)
correspond to the powers \(\left\{0, \ldots, k - 1, k + 2, \ldots, n,
n + 1\right\}\) of the \(\beta_i\).
Since we seek to show that no solution exists for \(p \geq n + 1\), we'll
show that \(A_n(k)\) is invertible, which means the system \(A_n(k)
\boldsymbol{\alpha} = \mathbf{0}\) has only the trivial solution \(
\boldsymbol{\alpha} = \mathbf{0}\). As we saw above, this would mean our
approximation was \(u^{(k)}(c) \approx 0\) which is \(\mathcal{O}(1)\) and
not a valuable approximation.
\section{Invertibility of first augmented Vandermonde} To show that
\(A_n(k)\) is invertible, we proceed much like we did before but first show
that we can factor this as \(A_n(k) = C_n(k) V_n\) for some coefficient matrix
\(C_n(k)\) which depends on \(k\). By construction, the first \(n - 2\) rows
of \(A_n(k)\) are powers of \(\beta_i\) lower than \(n\) hence they are rows
of \(V_n\). Clearly \(e_j^T V_n\) produces the \(j\)th row of \(V_n\) which
is the row of \(\beta_i^{j - 1}\). As we showed above, \(R_1^T V_n\) will
produce the row of \(\beta_i^n\) and \(R_2^T V_n\) will produce the row of
\(\beta_i^{n + 1}\). Altogether, the row exponents \(\left\{0, \ldots, k - 1,
k + 2, \ldots, n, n + 1 \right\}\) of \(A_n(k)\) correspond to
multiplications by \(\left\{e_1^T, \ldots, e_k^T, e_{k + 3}^T, \ldots,
R_1^T, R_2^T\right\}\). So if we define
\[C_n(k) = \left[\begin{array}{c c c c c c c} e_1 & \cdots & e_k &
e_{k + 3} & \cdots & R_1 & R_2 \end{array}\right]^T\]
then we must have \(A_n(k) = C_n(k) V_n\). (As before, since we define
\(A_n(n - 1) = A_n(n - 2)\) we'll also define \(C_n(n - 1) = C_n(n - 2)\).
Also, in the case that \(k = n - 2\), \(e_{k + 3} = e_{n + 1}\), so the
description of \(C_n(n - 2)\) shouldn't actually include \(e_{k + 3}\).
This makes perfect sense looking back at the exponents since the exponent
\(k + 2 = n\) corresponds to the exponent given by \(R_1\) rather than
the exponent given by \(e_{k + 3}\).)
In order to show that \(A_n(k)\) is invertible, it's enough to show that
it has nonzero determinant for \(1 \leq k \leq n - 2\). But
\[\det(A_n(k)) = \det(C_n(k)) \det(V_n).\]
As we showed above, \(\det(V_n) \neq 0\) since the \(\beta_i\) are distinct
so it remains to show that \(\det(C_n(k)) \neq 0\). Since determinant is
multilinear, in general
\[\det(\ldots, e_j, \ldots, v, \ldots) = \det(\ldots, e_j, \ldots, v -
v_j e_j, \ldots);\]
in other words, when computing the determinant, if one of the columns is the
basis vector \(e_j\) then we can assume all the other entries in row \(j\) are
\(0\). We know the first \(n - 2\) columns of \(C_n(k)^T\) are \(e_j\) for
\(1 \leq j \leq n\) except for \(j = k + 1, k + 2\). Hence, the only entries of
\(R_1, R_2\) that contribute to the determinant will be in rows \(k + 1,
k + 2\) of \(C_n(k)^T\). After removing all entries of \(R_1, R_2\) but those
in rows \(k + 1, k + 2\) we have a \(2 \times 2\) square matrix in the final
two columns and an augmented copy of \(I_{n - 2}\) in the first \(n - 2\)
columns hence
\[\pm \det(C_n(k)) = R_{1, k + 1} R_{2, k + 2} - R_{1, k + 2} R_{2, k + 1}\]
where the sign may be \(\pm 1\) depending on the cofactor expansion. (We
can actually show it's always \(1\) but it's not needed here.)
Hence to show \(\det(A_n(k)) \neq 0\) it's enough to show that \(R_{1, k + 1}
R_{2, k + 2} - R_{1, k + 2} R_{2, k + 1} \neq 0\). Converting this to the
symmetric polynomials involved, since \(k \leq n - 2\) we have
\[R_{1, k + 1} = (-1)^{n - k - 1} s_{n - k}, R_{1, k + 2} = (-1)^{n - k - 2}
s_{n - k - 1}\]
and
\[R_{2, k + 1} = (-1)^{n - k - 1} \left(s_1 s_{n - k} - s_{n - k + 1}\right),
R_{2, k + 2} = (-1)^{n - k - 2} \left(s_1 s_{n - k - 1} - s_{n - k}\right).\]
(We don't need to worry about \(R_{2, 1} = (-1)^{n - 1} s_1 s_n\) --- which
has a different form than the rest of \(R_2\) --- since \(k + 1 \geq 2\).
Also since \(k \leq n - 2\) the terms \(R_{r, k + 2}\) will always be
defined.) With these, we have
\begin{align*}
R_{1, k + 1} R_{2, k + 2} - R_{1, k + 2} R_{2, k + 1} &= (-1)^{n - k - 1}
s_{n - k} (-1)^{n - k - 2} \left(s_1 s_{n - k - 1} - s_{n - k}\right) \\
&- (-1)^{n - k - 2} s_{n - k - 1} (-1)^{n - k - 1} \left(s_1 s_{n - k} -
s_{n - k + 1}\right) \\
&= (-1)^{-3} \left(s_{n - k} s_1 s_{n - k - 1} - s_{n - k}^2\right)\\
&- (-1)^{-3} \left(s_{n - k - 1} s_1 s_{n - k} - s_{n - k - 1}
s_{n - k + 1}\right) \\
&= s_{n - k}^2 - s_{n - k - 1} s_{n - k + 1}.
\end{align*}
However, by \href{http://en.wikipedia.org/wiki/Newton's\_inequalities}
{Newton's inequalities}, we know that
\[\left(\frac{s_{n - k}}{\binom{n}{n - k}}\right)^2 \geq
\frac{s_{n - k - 1}}{\binom{n}{n - k - 1}} \frac{s_{n - k + 1}}{\binom{n}
{n - k + 1}} \Longleftrightarrow s_{n - k}^2 \geq \left(\frac{k + 1}{n - k}
s_{n - k - 1}\right) \left(\frac{n - k + 1}{k} s_{n - k + 1}\right).\]
Since both of the fractions \(\frac{k + 1}{k}\) and \(\frac{n - k + 1}
{n - k}\) are greater than \(1\) when \(1 \leq k \leq n - 2\), we actually
see that \(s_{n - k}^2 > s_{n - k - 1} s_{n - k + 1}\) whenever
\(s_{n - k - 1} s_{n - k + 1} > 0\). If \(s_{n - k - 1} s_{n - k + 1} < 0\)
then \(s_{n - k}^2 \geq 0 > s_{n - k - 1} s_{n - k + 1}\) so the case
\(s_{n - k - 1} s_{n - k + 1} = 0\) is the only other potentially
problematic case.
We'd like to show that \(s_{n - k}^2 - s_{n - k - 1} s_{n - k + 1} = 0\)
can't occur when \(s_{n - k - 1} s_{n - k + 1} = 0\) and will show this by
contradiction. Assuming the two hold, then we also have \(s_{n - k}^2 = 0\)
forcing \(s_{n - k} = s_{n - k - 1} = 0\) or \(s_{n - k} = s_{n - k + 1}
= 0\). So it's enough to show that \(s_{j - 1} = s_j = 0\) forces
\(\beta_{i_0} = \beta_{i_1}\) for some \(i_0 \neq i_1\) which contradicts
our requirement of distinct \(\beta_i\). (This assumes \(n \geq 2\). If
\(n = 1\) we can't approximate any derivatives by the above so it's not
worth addressing.)
\section[\texorpdfstring{The equality $s_{j - 1} = s_j = 0$ can't occur for distinct $\beta_i$}{The equality s\{j-1\} = sj = 0 can't occur for distinct beta}]{The equality \(s_{j - 1} = s_j = 0\) can't occur for distinct \(\beta_i\)}
If \(j = n\) then \(0 = s_j = s_n\) means that the product of
all the \(\beta_i\) is \(0\), forcing \(\beta_{i_0} = 0\) for some \(i_0\).
Since all the \(\binom{n}{n - 1}\) terms in the sum \(s_{n - 1}\) involve
\(\beta_{i_0}\) except for \(\binom{n - 1}{n - 1} = 1\) of them,
\(\beta_{i_0} = 0\) combined with \(s_{n - 1} = 0\) tells us that
\[0 = s_{n - 1} = \beta_{i_0} \cdot \left(\text{something}\right) +
\prod_{i \neq i_0} \beta_i = \prod_{i \neq i_0} \beta_i.\]
Hence, there is also \(i_1 \neq i_0\) with \(\beta_{i_1} = 0\).
To do this for \(j < n\), we'll need a \hyperref[reduction-lemma]{lemma}
which reduces symmetric polynomials of \(n\) points to symmetric polynomials
of \(n - 1\) points. The lemma shows that for a given \(j < n\)
\[\frac{s_j(\beta_1, \ldots, \beta_n)}{\binom{n}{j}} = \frac{s_j\left(
\beta^{(1)}_1, \ldots, \beta^{(1)}_{n - 1}\right)}{\binom{n - 1}{j}}\]
where \(\left\{\beta^{(1)}_i\right\}\) is a set of \(n - 1\) points, each
of which lies between two consecutive \(\beta_i\). (Note we have used
\(s_j(\, \cdot \,)\) without defining it. This is simply the elementary
symmetric polynomial with respect to the variables listed.) Hence we see that
\[s_j(\beta_1, \ldots, \beta_n) = 0 \Longleftrightarrow s_j\left(
\beta^{(1)}_1, \ldots, \beta^{(1)}_{n - 1}\right) = 0.\]
From the \(\beta^{(1)}_{i}\) we can get another set of \(n - 2\) points that
also have \(s_j = 0\) and so on to even lower numbers of points. Let
\(\beta^{(r)}_1, \ldots, \beta^{(r)}_{n - r}\) denote this set of points after
\(r\) reductions (and for completeness of the notation let \(\beta^{(0)}_i =
\beta_i\)). We'll show that \(s_{j - 1} = s_j = 0\) forces at least two
members of the \(\beta^{(n - j)}_i\) to be zero which in turn forces at
least \(N_r = n - j - r + 2\) members of the \(\beta^{(r)}_i\) to be zero
for \(0 \leq r \leq n - j\). Hence when \(r = 0\) we have at least \(N_0 =
n - j + 2 \geq 1 + 2 = 3\) members equal to \(0\) which is not possible
since we have distinct \(\beta^{(0)}_i = \beta_i\).
First, when \(r = n - j\), we have \(n - r = j\) points with
\[s_{j - 1}\left(\beta^{(r)}_1, \ldots, \beta^{(r)}_j\right) = s_j\left(
\beta^{(r)}_1, \ldots, \beta^{(r)}_j\right) = 0.\]
As we showed above, this means that at least two members of the
\(\beta^{(r)}_i\) are equal to \(0\). Since we know that each
\(\beta^{(r)}_i\) lies between two of the \(\beta^{(r - 1)}_i\), this also
means that at least one of the \(\beta^{(r - 1)}_i\) lies between
the two \(\beta^{(r)}_i\) which are equal to \(0\). Clearly the only point
lying between \(0\) and \(0\) is \(0\) itself, hence we see at least one of the
\(\beta^{(r - 1)}_i\) is \(0\), say \(\beta^{(r - 1)}_{i_0} = 0\). Since \(n -
(r - 1) = j + 1\), we have \(j + 1\) points at this level and of the
\(\binom{j + 1}{j}\) terms in the sum \(s_j\) only \(\binom{j}{j} = 1\) of
them don't involve \(\beta^{(r - 1)}_{i_0}\). Hence the elementary symmetric
polynomial \(\widehat{s}_j\) for the \(j\) points excluding
\(\beta^{(r - 1)}_{i_0}\) satisfies \(\widehat{s}_j = 0\) as well since
\[0 = s_j = \beta^{(r - 1)}_{i_0} \cdot \left(\text{something}\right) +
\widehat{s}_j = 0 + \widehat{s}_j = \widehat{s}_j.\]
Similarly, of the \(\binom{j + 1}{j - 1}\) terms in
the sum \(s_{j - 1}\) only \(\binom{j}{j - 1} = j\) of them don't involve
\(i_0\). Hence the elementary symmetric polynomial \(\widehat{s}_{j - 1}\)
for the \(j\) points excluding \(\beta^{(r - 1)}_{i_0}\) satisfies
\(\widehat{s}_{j - 1} = 0\) as well. Thus we are again reduced to the
situation where \(\widehat{s}_{j - 1} = \widehat{s}_j = 0\) with \(j\)
points and so find distinct \(i_1, i_2\) from the set excluding \(i_0\) with
\(\beta^{(r - 1)}_{i_1} = \beta^{(r - 1)}_{i_2} = 0\). This gives us \(3\) zero
points, which agrees with \(N_{n - j - 1} = n - (n - j - 1) - j + 2 = 3\), the
claimed count when \(r = n - j - 1\).
Carrying forward, we see that if we have at least \(N_r\) of the
\(\beta^{(r)}_i\) equal to \(0\) then backing up a step will result in
\(N_r - 1\) of the \(\beta^{(r - 1)}_i\) equal to \(0\) since the points at
the \(r\) reduction must lie between the points at the \(r - 1\) reduction.
But this reduces the \(n - (r - 1)\) points \(\beta^{(r - 1)}_i\) to \(n -
(r - 1) - (N_r - 1) = n - r - N_r + 2\) potentially nonzero points. For
\(r = n - j\) and \(r = n - j - 1\) we saw that \(N_r = n - r - j + 2\) so
that at each stage we reduce to \(n - r - N_r + 2 = j\) potentially nonzero
points and have \(\widehat{s}_{j - 1} = \widehat{s}_j = 0\). Again, by the
same logic as above this gives two more \(\beta^{(r - 1)}_i = 0\) bringing
the total to \(N_{r - 1} = (N_r - 1) + 2 = N_r + 1\). So in general,
reducing \(r\) by \(1\) raises \(N_r\) by \(1\) and the value
\[j = n - r - N_r + 2\]
remains invariant (this is needed to reduce to \(n - r - N_r + 2 = j\) points
when backing up a step/reducing \(r\)).
\section[\texorpdfstring{A family of solutions exists for $p = n$}{A family of solutions exists for p = n}]{A family of solutions exists for \(p = n\)}
Since we showed
the system is degenerate when \(p > n\), we know the maximal order for our
approximation which has not yet been ruled out occurs when \(p = n\). We'll
show both that it's possible for the system to have a solution and give a
condition on such a solution.
When \(p = n\) we have a system of \(p + 1 = n + 1\) equations
in \(2n\) indeterminates:
\[\left[ \begin{array}{c c c}
1 & \cdots & 1 \\
\beta_1 & \cdots & \beta_n \\
\vdots & \ddots & \vdots \\
\beta_1^n & \cdots & \beta_n^n \end{array}\right]
\left[ \begin{array}{c}
\alpha_1 \\
\vdots \\
\alpha_n \end{array}\right] =
\left[\underbrace{0 \, \cdots \, 0}_{\text{first } k \text{ terms}} \,
\frac{k!}{h^k} \, \underbrace{0 \, \cdots \, 0}_{\text{last } (n - k)
\text{ terms}}\right]^T.\]
Before, to get a square system with all zeroes on the right, we had to remove
\textbf{two} of the first \(n + 2\) rows but now we have only one to remove
and a unique choice at that: the row corresponding to \(\beta_i^k\). As
before we introduce the notation \(B_n(k)\) for the ``augmented'' Vandermonde
matrix where the rows of \(B_n(k)\) correspond to the powers \(\left\{0,
\ldots, k - 1, k + 1, \ldots, n\right\}\) of the \(\beta_i\). By similar
calculations as above we must have
\[B_n(k) = \left[\begin{array}{c c c c c c c} e_1 & \cdots & e_k &
e_{k + 2} & \cdots & R_1 \end{array}\right]^T V_n = D_n(k) V_n\]
for some coefficient matrix \(D_n(k)\). (In the case that \(k = n - 1\),
the \(e_{k + 2}\) above does not belong since \(k + 1\) corresponds to
\(R_1\) directly and the first \(0, \ldots, k - 1\) are the shifted
identity. As before we have \(k \leq n - 1\) since we can't approximate any
higher derivatives with \(n\) points.)
In order for a nontrivial solution to exist (the opposite of what we wanted
above), \(B_n(k)\) can't be invertible (also the opposite). Hence we need
\(\det(B_n(k)) = 0\). As we've seen, for distinct \(\beta_i\), \(\det(V_n)
\neq 0\) so we need \(\det(D_n(k)) = 0\). Again, due to multilinearity of
the determinant, all the entries in the last row --- \(R_1\) --- of
\(D_n(k)\) can be considered \(0\) except for \(R_{1,k + 1}\) since
\(e_{k + 1}\) is the only basis vector missing in the first \(n - 1\) rows.
Hence
\[\pm \det(D_n(k)) = R_{1, k + 1}\]
where the sign may be \(\pm 1\) depending on the cofactor expansion. (We
can actually show it's always \((-1)^{n - k - 1}\) but it's not needed here.)
Since \(R_{1, k + 1} = (-1)^{n - k - 1} s_{n - k}\), we see that a
\fbox{nontrivial solution can only exist if \(s_{n - k} = 0\)}.
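This matches the motivation (an observation, not in the original text): the
centered difference \(D_0 u(c)\) has \(n = 2\), \(k = 1\) and \(\beta =
(-1, 1)\), so indeed \(s_{n - k} = s_1 = 0\).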
\section{Inspecting the solutions} Having \(s_{n - k} = 0\) means that
we can pick any \(\beta_1, \ldots, \beta_{n - 1}\) we like and \(\beta_n\)
will be uniquely determined. To see this, denote \(\widehat{s}_j\) as the
\(j\)th symmetric sum in the \(\beta_1, \ldots, \beta_{n - 1}\) (note that
\(\beta_n\) is excluded). With this, we have
\[s_{n - k} = \beta_n \widehat{s}_{n - k - 1} + \widehat{s}_{n - k}\]
so that \(s_{n - k} = 0\) gives us the unique (for a given \(k\)) choice
\[\beta_n = - \frac{\widehat{s}_{n - k}}{\widehat{s}_{n - k - 1}}.\]
(We define \(\widehat{s}_0 = 1\). This is needed for the
case \(s_1 = \beta_n \widehat{s}_0 + \widehat{s}_1\) to make sense and also
fits with the general case where \(1\) is just the value of an empty product.)
We see that if \(\widehat{s}_{n - k - 1} = 0\) then there is no solution,
so there is some part of \(\mathbf{R}^{n - 1}\) where no solution exists but
this is a measure \(0\) set. Once we've picked our \(\left(\beta_1, \ldots,
\beta_{n - 1}\right)\) to find \(\boldsymbol{\alpha}\) we need to determine
the kernel of \(B_n(k)\) since \(B_n(k) \boldsymbol{\alpha} = \mathbf{0}\).
Since the first \(n - 1\) rows of \(D_n(k)\) are linearly independent
standard basis vectors we have \(\rank(D_n(k)) \geq n - 1\). Since
\(\det(D_n(k)) = 0\) we know it can't have full rank hence the rank is
exactly \(n - 1\) meaning the kernel of \(D_n(k)\) is one dimensional.
Hence the same is true of \(B_n(k)\) since \(V_n\) is invertible. This is
quite helpful since it limits our choices of \(\boldsymbol{\alpha}\) to a
single scalar multiple of a basis vector of the kernel. To find a choice for
this basis vector we start with
\[B_n(k) v = 0 \Longleftrightarrow (D_n(k) V_n)v = 0 \Longleftrightarrow
D_n(k) w = 0, w = V_n v.\]
But since the first \(n - 1\) rows of \(D_n(k)\) are the standard basis vectors
this means \(0 = e_j^T w = w_j\) for \(j \neq k + 1\), making \(w_{k + 1}\)
the only nonzero component. Since \(R_1^T e_{k + 1} = R_{1, k + 1} = 0\) by
construction, \(w = e_{k + 1}\) will do the trick as our basis for the kernel
of \(D_n(k)\) hence \(v = V_n^{-1} w = V_n^{-1} e_{k + 1}\) works as a basis
for the kernel of \(B_n(k)\).
So we have \(\boldsymbol{\alpha} = \ell V_n^{-1} e_{k + 1}\) for some scalar
\(\ell\) and the final condition is
\[\left[\begin{array}{c c c} \beta_1^k & \cdots & \beta_n^k \end{array}
\right] \boldsymbol{\alpha} = \frac{k!}{h^k} \Longleftrightarrow \ell
\left[\begin{array}{c c c} \beta_1^k & \cdots & \beta_n^k \end{array}
\right] V_n^{-1} e_{k + 1} = \frac{k!}{h^k}.\]
As it turns out \(\left[\begin{array}{c c c} \beta_1^k & \cdots & \beta_n^k
\end{array} \right] V_n^{-1} e_{k + 1} = 1\) which tells us that \(\ell =
\frac{k!}{h^k}\). To see this note that \(e_j^T V_n\) gives the \(j\)th row
of \(V_n\), hence
\[e_{k + 1}^T V_n = \left[\begin{array}{c c c} \beta_1^k & \cdots & \beta_n^k
\end{array} \right] \Rightarrow \left(\left[\begin{array}{c c c} \beta_1^k &
\cdots & \beta_n^k \end{array} \right] V_n^{-1}\right) e_{k + 1} =
e_{k + 1}^T e_{k + 1} = 1.\]
Thus the solution is \(\boldsymbol{\alpha} = \frac{k!}{h^k} V_n^{-1} e_{k +
1}\); for one, this confirms our notion that \(\alpha_i = \mathcal{O}
\left(\frac{1}{h^k}\right)\). Alternatively, this is the same as the solution
we would have obtained had we just dropped the \((n + 1)\)st row from the
system and tried to solve \(V_n \boldsymbol{\alpha} = \frac{k!}{h^k}
e_{k + 1}\) (this would not have guaranteed the order condition, which
we needed \(s_{n - k} = 0\) for).
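As a worked instance (illustrative, not spelled out in the original text):
for \(n = 3\), \(k = 2\) and \(\beta = (-1, 0, 1)\) (which satisfies
\(s_{n - k} = s_1 = 0\)), solving \(V_3 v = e_3\) for the third column of
\(V_3^{-1}\) gives
\[\left[ \begin{array}{c c c}
1 & 1 & 1 \\
-1 & 0 & 1 \\
1 & 0 & 1 \end{array}\right] v =
\left[ \begin{array}{c}
0 \\
0 \\
1 \end{array}\right] \Longrightarrow v =
\left[ \begin{array}{c}
\frac{1}{2} \\
-1 \\
\frac{1}{2} \end{array}\right],\]
so \(\boldsymbol{\alpha} = \frac{2!}{h^2} v = \left(\frac{1}{h^2},
-\frac{2}{h^2}, \frac{1}{h^2}\right)^T\), which is exactly the approximation
\(D^2 u(c)\) from the motivation.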
\section[\texorpdfstring{Choosing the $n$ points}{Choosing the n points}]{Choosing the \(n\) points}
Though we know the theoretical bound
on the estimate and a condition on points to achieve this bound, this
doesn't narrow down the choices for our \(\beta_i\). Given an approximation
for \(u^{(k)}(c)\), we know by the above that the \(u^{(n + 1)}(c)\) terms
in the Taylor expansion can never sum to \(0\). However, we may be able to
make the coefficient of this term small to get a better estimate.
For \(\left\{\alpha_i\right\}\), \(\left\{\beta_i\right\}\) which give a
maximum order approximation of \(u^{(k)}(c)\) we have
\[u^{(k)}(c) \approx \sum_{i = 1}^n \alpha_i u(c + \beta_i h) = u^{(k)}(c)
+ u^{(n + 1)}(c) \frac{h^{n + 1}}{(n + 1)!} \sum_{i = 1}^n \alpha_i
\beta_i^{n + 1} + \mathcal{O}\left(h^{n + 2} \sum_{i = 1}^n \alpha_i
\right).\]
So the part of the coefficient we can control is determined by
\[\sum_{i = 1}^n \alpha_i \beta_i^{n + 1} = \left[\begin{array}{c c c}
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1}\end{array}\right] \left[
\begin{array}{c} \alpha_1 \\
\vdots \\
\alpha_n \end{array}\right].\]
However, as we showed above \(R_2^T V_n = \left[\begin{array}{c c c}
\beta_1^{n + 1} & \cdots & \beta_n^{n + 1}\end{array} \right]\) hence
the coefficient is
\[R_2^T V_n \boldsymbol{\alpha} = R_2^T V_n \left(\frac{k!}{h^k} V_n^{-1}
e_{k + 1}\right) = \frac{k!}{h^k} R_2^T e_{k + 1} = \frac{k!}{h^k}
R_{2, k + 1}.\]
Since \(k + 1 \geq 2\) (\(R_{2, 1}\) is different than the rest of the entries
in \(R_2\)) we know \(R_{2, k + 1}\) has the form
\[R_{2, k + 1} = (-1)^{n - k - 1} \left(s_1 s_{n - k} - s_{n - k + 1}
\right) = (-1)^{n - k} s_{n - k + 1}\]
since we already have \(s_{n - k} = 0\).
Since \(s_{n - k}, s_{n - k + 1}\) are homogeneous we can make \(s_{n - k +
1}\) as small as we want by simply scaling the \(\beta_i\) that make \(s_{n -
k} = 0\). Thus the infimum of \(\left|s_{n - k + 1}\right|\) is \(0\), but
this can never be attained since we showed before that the distinctness of the
\(\beta_i\) makes \(s_{n - k} = s_{n - k + 1} = 0\) impossible. However, it's
still worth noting that our approximation takes the form
\[u^{(k)}(c) - \frac{k! \cdot u^{(n + 1)}(c) \cdot s_{n - k + 1}}{(n + 1)!}
(-h)^{n - k + 1} + \mathcal{O}\left(h^{n - k + 2}\right).\]
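Checking this form against the centered difference (an illustrative check):
with \(n = 2\), \(k = 1\), \(\beta = (-1, 1)\) we have \(s_{n - k + 1} = s_2 =
-1\), so the expression above becomes
\[u'(c) - \frac{1! \cdot u'''(c) \cdot (-1)}{3!} (-h)^{2} +
\mathcal{O}\left(h^{3}\right)
= u'(c) + \frac{h^2}{6} u'''(c) + \mathcal{O}\left(h^{3}\right),\]
exactly the leading behavior of \(D_0 u(c)\) from the motivation.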
\section{Reasonable Choices} Since we'd be approximating on a grid, it's
much more reasonable to assume that \((\beta_1, \ldots, \beta_n) \in
\mathbf{Z}^n\) than the much less restricted \(\mathbf{R}^n\). Now, the
equality
\[\beta_n = - \frac{\widehat{s}_{n - k}}{\widehat{s}_{n - k - 1}}\]
has some more meaning.
The computations iterate over tuples which satisfy \(s_{n - k} = 0\) and a
few more constraints. Since we seek to minimize \(s_{n - k + 1}\), we
require the tuple has no common divisor \(d > 1\), for if it did we could
divide by it and reduce the size of \(s_{n - k + 1}\) by a factor of
\(d^{n - k + 1}\). In addition, since tuples are the same up to re-ordering
we require that \(\beta_1 < \cdots < \beta_n\) (with the strictness due to
the distinctness constraint on the \(\beta_i\)). Finally, since scaling by
\(-1\) has no impact on the zero condition we require that
\[\left|\beta_n\right| = \max_i \left|\beta_i\right|\]
(really only \(\left|\beta_1\right|\) and \(\left|\beta_n\right|\) were
candidates, one being the smallest negative, the other the largest positive).
With these tuples in hand we compute some minima of \(\left|s_{n - k + 1}
\right|\):
\begin{align*}
%% n = 2
n &= 2 \\
\left|s_{2}\left(-1, 1\right)\right| &= 1 \text{ when } k = 1 \\
%% s_{2 - 1}\left(-1, 1\right) &= 0 \\
%% \left|s_{2 - 1 + 1}\left(-1, 1\right)\right| &= 1 \\
%% n = 3
n &= 3 \\
\left|s_{3}\left(-2, 3, 6\right)\right| &= 36 \text{ when } k = 1 \\
\left|s_{2}\left(-1, 0, 1\right)\right| &= 1 \text{ when } k = 2 \\
%% s_{3 - 1}\left(-2, 3, 6\right) &= 0 \\
%% \left|s_{3 - 1 + 1}\left(-2, 3, 6\right)\right| &= 36 \\
%% s_{3 - 2}\left(-1, 0, 1\right) &= 0 \\
%% \left|s_{3 - 2 + 1}\left(-1, 0, 1\right)\right| &= 1 \\
%% n = 4
n &= 4 \\
\left|s_{4}\left(-2, -1, 1, 2\right)\right| &= 4 \text{ when } k = 1 \\
\left|s_{3}\left(-2, 1, 2, 4\right)\right| &= 20 \text{ when } k = 2 \\
\left|s_{2}\left(-2, -1, 1, 2\right)\right| &= 5 \text{ when } k = 3 \\
%% s_{4 - 1}\left(-2, -1, 1, 2\right) &= 0 \\
%% \left|s_{4 - 1 + 1}\left(-2, -1, 1, 2\right)\right| &= 4 \\
%% s_{4 - 2}\left(-2, 1, 2, 4\right) &= 0 \\
%% \left|s_{4 - 2 + 1}\left(-2, 1, 2, 4\right)\right| &= 20 \\
%% s_{4 - 3}\left(-2, -1, 1, 2\right) &= 0 \\
%% \left|s_{4 - 3 + 1}\left(-2, -1, 1, 2\right)\right| &= 5 \\
%% n = 5
n &= 5 \\
\left|s_{5}\left(-2, -1, 1, 3, 6\right)\right| &= 36 \text{ when } k = 1 \\
\left|s_{4}\left(-2, -1, 0, 1, 2\right)\right| &= 4 \text{ when } k = 2 \\
\left|s_{3}\left(-2, 0, 1, 2, 4\right)\right| &= 20 \text{ when } k = 3 \\
\left|s_{2}\left(-2, -1, 0, 1, 2\right)\right| &= 5 \text{ when } k = 4 \\
%% s_{5 - 1}\left(-2, -1, 1, 3, 6\right) &= 0 \\
%% \left|s_{5 - 1 + 1}\left(-2, -1, 1, 3, 6\right)\right| &= 36 \\
%% s_{5 - 2}\left(-2, -1, 0, 1, 2\right) &= 0 \\
%% \left|s_{5 - 2 + 1}\left(-2, -1, 0, 1, 2\right)\right| &= 4 \\
%% s_{5 - 3}\left(-2, 0, 1, 2, 4\right) &= 0 \\
%% \left|s_{5 - 3 + 1}\left(-2, 0, 1, 2, 4\right)\right| &= 20 \\
%% s_{5 - 4}\left(-2, -1, 0, 1, 2\right) &= 0 \\
%% \left|s_{5 - 4 + 1}\left(-2, -1, 0, 1, 2\right)\right| &= 5
n &= 6 \\
\left|s_{6}\left(-3, -2, -1, 1, 2, 3\right)\right| &= 36 \text{ when } k = 1 \\
\left|s_{5}\left(-2, -1, 0, 1, 3, 6\right)\right| &= 36 \text{ when } k = 2 \\
\left|s_{4}\left(-3, -2, -1, 1, 2, 3\right)\right| &= 49 \text{ when } k = 3 \\
\left|s_{3}\left(-2, -1, 0, 1, 3, 7\right)\right| &= 50 \text{ when } k = 4 \\
\left|s_{2}\left(-3, -2, -1, 1, 2, 3\right)\right| &= 14 \text{ when } k = 5 \\
n &= 7 \\
\left|s_{7}\left(-3, -2, -1, 1, 2, 4, 12\right)\right| = \left|s_{7}\left(-4,
-2, -1, 1, 3, 4, 6\right)\right| &= 576 \text{ when } k = 1 \\
\left|s_{6}\left(-3, -2, -1, 0, 1, 2, 3\right)\right| &= 36 \text{ when }
k = 2 \\
\left|s_{5}\left(-5, -4, -3, -1, 0, 2, 14\right)\right| &= 2036 \text{ when }
k = 3 \\
\left|s_{4}\left(-3, -2, -1, 0, 1, 2, 3\right)\right| &= 49 \text{ when }
k = 4 \\
\left|s_{3}\left(-3, -1, 0, 1, 2, 3, 5\right)\right| &= 70 \text{ when }
k = 5 \\
\left|s_{2}\left(-3, -2, -1, 0, 1, 2, 3\right)\right| &= 14 \text{ when }
k = 6 \\
n &= 8 \\
\left|s_{8}\left(-4, -3, -2, -1, 1, 2, 3, 4\right)\right| &= 576 \text{ when }
k = 1 \\
\left|s_{7}\left(-3, -2, -1, 0, 1, 2, 4, 12\right)\right| = \left|s_{7}
\left(-4, -2, -1, 0, 1, 3, 4, 6\right)\right| &= 576 \text{ when } k = 2 \\
\left|s_{6}\left(-4, -3, -2, -1, 1, 2, 3, 4\right)\right| &= 820 \text{ when }
k = 3 \\
\left|s_{5}\left(-8, -4, -3, -2, 0, 1, 2, 11\right)\right| &= 3276 \text{ when }
k = 4 \\
\left|s_{4}\left(-2, -1, 0, 1, 2, 3, 4, 5\right)\right| &= 231 \text{ when }
k = 5 \\
\text{this broke the} &\text{ pattern for } s_{2m} \\
\left|s_{3}\left(-3, -2, -1, 0, 1, 3, 4, 9\right)\right| &= 182 \text{ when }
k = 6 \\
\left|s_{2}\left(-4, -3, -2, -1, 1, 2, 3, 4\right)\right| &= 30 \text{ when }
k = 7
\end{align*}
\section{Finding Minimizers: Programming}
We can freely abstract our problem as
\[\min_{\boldsymbol{\beta}} \left|s_{d + 1}\right| \qquad \text{such that }
s_d = 0, \boldsymbol{\beta} \in \mathbf{Z}^n, \;
\beta_i \neq \beta_j \; \forall i, j.\]
This is only well-defined if \(d + 1 \leq n\) since there are no more
elementary symmetric polynomials with only \(n\) points. We also must have
\(0 < d\) since \(s_0 = 1\) can never equal \(0\).
Since these are symmetric, we can WLOG say
\[\beta_1 \leq \beta_2 \leq \cdots \leq \beta_n.\]
Since the \(\left\{\beta_j\right\}\) must be distinct integers, we can say
\begin{align*}
\beta_1 + 1 &\leq \beta_2 \\
\beta_2 + 1 &\leq \beta_3 \\
&\vdots \\
\beta_{n - 1} + 1 &\leq \beta_n.
\end{align*}
What's more, we must have \(\beta_1 < 0\). (If not then we'd have \(s_d > 0\)
since there would be no negative terms to cancel with.)
With this, we can move this problem into the positive orthant
\(\mathbf{Z}_{\geq 0}^n\). We write
\[\beta_1 = - 1 - p_1 \leq -1 \qquad \text{for } p_1 \geq 0\]
and further define
\begin{align*}
p_2 = \beta_2 - \beta_1 - 1 &\geq 0 \\
p_3 = \beta_3 - \beta_2 - 1 &\geq 0 \\
&\vdots \\
p_n = \beta_n - \beta_{n - 1} - 1 &\geq 0.
\end{align*}
Now by construction we have
\(\mathbf{p} = \left(p_1, \ldots, p_n\right) \in \mathbf{Z}_{\geq 0}^n\).
\subsection{Example: $n = 2$}
We have
\[\beta_1 = -1 - p_1, \quad \beta_2 = 1 + \beta_1 + p_2 = -p_1 + p_2.\]
The only allowable value is \(d = 1\). We have
\[s_1 = \beta_1 + \beta_2 = -1 - 2 p_1 + p_2\]
and
\[s_2 = \beta_1 \beta_2 = \left(-1 - p_1\right)\left(-p_1 + p_2\right).\]
Imposing the constraint,
\[s_1 = 0 \Longrightarrow p_2 = 1 + 2 p_1 \Longrightarrow \left|s_2\right| =
\left|\left(-1 - p_1\right)\left(1 + p_1\right)\right| =
\left(1 + p_1\right)^2.\]
For \(p_1 \geq 0 > -1\), this function is increasing, so choosing
\(p_1 = 0\) minimizes it, which gives \(p_2 = 1 + 2(0) = 1\).
Converting back to \(\left\{\beta_j\right\}\), we confirm the solution
we've seen before:
\[\boxed{\beta_1 = -1 - 0 = -1, \qquad \beta_2 = -0 + 1 = 1}.\]
\subsection{Example: $n = 3$}
We add to the above with
\[\beta_3 = 1 + \beta_2 + p_3 = 1 - p_1 + p_2 + p_3\]
and
\begin{align*}
s_1 &= \left[-1 - 2 p_1 + p_2\right] + \beta_3 = - 3 p_1 + 2 p_2 + p_3 \\
s_2 &= \left[\left(-1 - p_1\right)\left(-p_1 + p_2\right)\right] +
\beta_3 \left[-1 - 2 p_1 + p_2\right] \\
&= 3 p_{1}^{2} - 4 p_{1} p_{2} - 2 p_{1} p_{3} + p_{2}^{2} +
p_{2} p_{3} - p_{2} - p_{3} - 1 \\
s_3 &= \left(-1 - p_1\right)\left(-p_1 + p_2\right)\left(1 - p_1 + p_2 + p_3\right)
\end{align*}
\paragraph{When \(d = 1\):} We can use
\begin{align*}
s_1 &= 0 \\
\Longrightarrow p_3 &= 3 p_1 - 2 p_2 \\
\Longrightarrow s_2 &=
- 3 p_{1}^{2} + 3 p_{1} p_{2} - 3 p_{1} - p_{2}^{2} + p_{2} - 1.
\end{align*}
Viewed as a parabola in \(p_2\), this is downward opening (leading coefficient
\(-1\)). Since
\[\frac{\partial s_2}{\partial p_2} = 3 p_1 - 2 p_2 + 1\]
the max value of \(s_2\) (the down-opening parabola) occurs when
\(p_2 = \frac{3 p_1 + 1}{2}\), at which point
\[s_2\left(p_1, \frac{3 p_1 + 1}{2}\right) =
- \frac{3}{4}\left(p_1 + 1\right)^2.\]
Thus \(s_2\) is non-positive for \textbf{any} (real) choice of \(p_1\),
hence
\[\left|s_2\right| = 3 p_{1}^{2} - 3 p_{1} p_{2} +
3 p_{1} + p_{2}^{2} - p_{2} + 1.\]
To minimize \(\left|s_2\right|\), we first look to the vertex value above. For
a fixed \(p_1\), the ``best'' choice of \(p_2\) gives
\[p_3 = 3 p_1 - 2 \left(\frac{3 p_1 + 1}{2}\right) = -1.\]
This is not allowed, so we need to consider other choices of \(p_2\) (again
for a fixed \(p_1\)). Letting
\[p_2 = \frac{3 p_1 + 1}{2} + \delta \Longrightarrow p_3 = -1 - 2 \delta,
\; \left|s_2(p_1, p_2)\right| = \frac{3}{4}\left(p_1 + 1\right)^2 + \delta^2.\]
Thus to minimize \(\left|s_2\right|\) (for a fixed \(p_1\)) we want an
acceptable \(\delta\) that is as close to zero as possible. We see that
\[0 \leq p_3 = -1 - 2 \delta \Longrightarrow \delta \leq -\frac{1}{2}.\]
The parity of \(p_1\) determines the value of \(\delta\) nearest to
\(-\frac{1}{2}\) such that \(p_2 \in \mathbf{Z}\):
\[\delta = \begin{cases}
-\frac{1}{2} & \text{if } p_1 \equiv 0 \bmod{2} \\
-1 & \text{if } p_1 \equiv 1 \bmod{2}
\end{cases} \Longrightarrow \min \left|s_2\right| = \begin{cases}
\frac{3}{4}\left(p_1 + 1\right)^2 + \frac{1}{4} &
\text{if } p_1 \in \left\{0, 2, 4, \ldots\right\} \\
\frac{3}{4}\left(p_1 + 1\right)^2 + 1 &
\text{if } p_1 \in \left\{1, 3, 5, \ldots\right\}
\end{cases}.\]
Noting that \(\left(p_1 + 1\right)^2\) is increasing for \(p_1 \geq 0 > -1\), we
have two candidates for \(\min \left|s_2\right|\):
\[\frac{3}{4}\left(0 + 1\right)^2 + \frac{1}{4} = 1 \quad \text{and} \quad
\frac{3}{4}\left(1 + 1\right)^2 + 1 = 4\]
Thus we choose
\[p_1 = 0 \Longrightarrow p_2 = \frac{3 \cdot 0 + 1}{2} - \frac{1}{2} = 0
\Longrightarrow p_3 = 3 (0) - 2(0) = 0.\]
From this we confirm the solution we've seen before:
\[\boxed{\beta_1 = -1 - 0 = -1, \qquad \beta_2 = -0 + 0 = 0, \qquad
\beta_3 = 1 - 0 + 0 + 0 = 1}.\]
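This answer is also easy to confirm by brute force. The following is a minimal
Python sketch (standard library only; the helper names \texttt{elem\_sym} and
\texttt{brute\_force} are ours) that enumerates increasing integer
\(n\)-tuples in a box and minimizes \(\left|s_{d+1}\right|\) subject to
\(s_d = 0\):
\begin{verbatim}
import itertools
from math import prod

def elem_sym(beta, j):
    # j-th elementary symmetric polynomial of the entries of beta.
    return sum(prod(c) for c in itertools.combinations(beta, j))

def brute_force(n, d, bound):
    # Minimize |s_{d+1}| over strictly increasing integer n-tuples drawn
    # from [-bound, bound], subject to s_d = 0.
    best = None
    for beta in itertools.combinations(range(-bound, bound + 1), n):
        if elem_sym(beta, d) != 0:
            continue
        value = abs(elem_sym(beta, d + 1))
        if best is None or value < best[0]:
            best = (value, beta)
    return best

# n = 3, d = 1: expect (1, (-1, 0, 1)).
print(brute_force(3, 1, 6))
\end{verbatim}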
\paragraph{When \(d = 2\):} We can use
\begin{align*}
0 &= s_2 = 3 p_{1}^{2} - 4 p_{1} p_{2} - 2 p_{1} p_{3} + p_{2}^{2} +
p_{2} p_{3} - p_{2} - p_{3} - 1 \\
\Longrightarrow p_3 &=
\frac{3 p_1^2 - 4 p_1 p_2 + p_2^2 - p_2 - 1}{2 p_1 - p_2 + 1} \\
&= \frac{\left(2 p_1 - p_2 + 1\right)^2 - \left(p_1 + 1\right)^2 -
\left(2 p_1 - p_2 + 1\right)}{2 p_1 - p_2 + 1} \\
&= \left(2 p_1 - p_2\right) -
\frac{\left(p_1 + 1\right)^2}{2 p_1 - p_2 + 1}
\end{align*}
The only case in which this fails is when
\[0 = 2 p_1 - p_2 + 1\]
which in turn forces (via \(s_2 = 0\)):
\[3 p_1^2 - 4 p_1 p_2 + p_2^2 - p_2 - 1 = 0\]
Solving this system of two equations in two unknowns gives
\(p_1 = p_2 = -1\). Thus we know that no triple satisfying
\(s_2 = 0\) can have \(2 p_1 - p_2 + 1 = 0\) since by definition
\(p_1, p_2 \geq 0\).
One can also show that \(p_3 \geq 0\) forces
\[\frac{p_2 - 1}{3} \leq p_1 \leq \frac{p_2 - 2}{2} \quad \text{or}
\quad p_2 + 1 \leq p_1.\]
With this, we have
\[s_3 = \frac{\left(p_1 + 1\right)^2 \left(p_1 - p_2\right)^2}{2 p_1 - p_2 + 1}
= \frac{E^2 \left(E - D\right)^2}{D}\]
where
\[D = 2 p_1 - p_2 + 1, E = p_1 + 1 \Longrightarrow p_2 - p_1 = p_1 - D + 1\]
To minimize \(s_3\), neither \(E = 0\) nor \(E - D = 0\) is allowed. The first
requires \(p_1 = -1\), which violates \(p_1 \geq 0\), and the second forces
\(p_2 = p_1\), which in turn forces \(p_3 = -1\). With this substitution we have
\[p_3 = D - 1 - \frac{E^2}{D}.\]
In particular that tells us \(D \mid E^2\) since
\(\frac{E^2}{D} = D - 1 - p_3 \in \mathbf{Z}\). To minimize
\(\left|s_3\right|\), we consider the cases \(D\) positive and negative
separately and then try to minimize for a fixed \(E\).
\[\left|s_3\right| = \begin{cases}
\frac{E^2 \left(E - D'\right)^2}{D'} & \text{if } D' = D > 0 \\
\frac{E^2 \left(E + D'\right)^2}{D'} & \text{if } -D' = D < 0
\end{cases} = \begin{cases}
E^2 \left(\frac{E^2}{D'} + D' - 2 E\right) & \text{if } D' = D > 0 \\
E^2 \left(\frac{E^2}{D'} + D' + 2 E\right) & \text{if } -D' = D < 0
\end{cases}\]
Thus minimizing corresponds to positive \(D'\) that minimize
\[0 < \frac{\left(E - D'\right)^2}{D'} = \frac{E^2}{D'} + D' - 2 E \quad
\text{and} \quad
0 < \frac{\left(E + D'\right)^2}{D'} = \frac{E^2}{D'} + D' + 2 E.\]
Though \(\frac{E^2}{D'} + D' - 2 E < \frac{E^2}{D'} + D' + 2 E\) (since
\(E = p_1 + 1\) is always positive), values of \(D'\) that are usable for
one expression may not be usable for the other. This is because we need
to satisfy
\[0 \leq p_2 = \begin{cases}
2E - 1 - D' & \text{if } D' = D > 0 \\
2E - 1 + D' & \text{if } -D' = D < 0
\end{cases} \quad \text{and} \quad 0 \leq p_3 = \begin{cases}
\left(D' - \frac{E^2}{D'}\right) - 1 & \text{if } D' = D > 0 \\
\left(\frac{E^2}{D'} - D'\right) - 1 & \text{if } -D' = D < 0
\end{cases}.\]
These inequalities mean that
\[E < D' \leq 2E - 1 \text{ when } D' = D \quad \text{and} \quad
1 \leq D' < E \text{ when } D' = -D.\]
We know \(\frac{E^2}{D'} + D'\) is minimized at \(D' = E\) (by
AM-GM inequality, for example) but we've seen above that
\(D' = E\) is not allowed in either case. As \(D'\) moves out
from \(E\) in either direction, that value increases from the
minimum value \(2E\). Thus the best value of \(D'\) in either
\(\left[1, E\right)\) or \(\left(E, 2E - 1\right]\) is the
one closest to \(E\).
Since we must have \(D' \mid E^2\), there is a unique choice
\(D' \in \left[1, E\right)\) that is closest to \(E\). Conversely, the
equivalent value \(D'' = \frac{E^2}{D'}\) is the unique choice
\(D'' \in \left(E, E^2\right]\) that is closest to \(E\).
In the case that \(D'' \in \left(E, 2E - 1\right]\), we must have
\[\frac{E^2}{D''} + D'' - 2E = D' + \frac{E^2}{D'} - 2E <
\frac{E^2}{D'} + D' + 2E.\]
Hence for a fixed \(E\), we have
\[\min \left|s_3\right| = \begin{cases}
E^2 \left(\frac{E^2}{D'} + D' - 2 E\right) & \text{if }
\exists E < D' \leq 2E - 1 \text{ such that } D' \mid E^2 \\
E^2 \left(\frac{E^2}{D'} + D' + 2 E\right) & \text{otherwise}
\end{cases}\]
where the values of \(D'\) above are understood to be the factors of \(E^2\)
nearest to \(E\) (above and below, respectively).
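The case analysis below can also be reproduced mechanically. Here is a minimal
Python sketch (the helper name is ours) that, for a fixed \(E\), scans the
divisors \(D'\) of \(E^2\) lying in the two admissible windows and reports the
smallest resulting \(\left|s_3\right|\):
\begin{verbatim}
def min_abs_s3_for_E(E):
    # |s_3| = E^2 * (E^2/Dp + Dp - 2E) for Dp = D in (E, 2E - 1], or
    # |s_3| = E^2 * (E^2/Dp + Dp + 2E) for Dp = -D in [1, E).
    candidates = []
    for Dp in range(1, E * E + 1):
        if E * E % Dp != 0:
            continue
        base = E * E // Dp + Dp
        if E < Dp <= 2 * E - 1:
            candidates.append(E * E * (base - 2 * E))
        elif 1 <= Dp < E:
            candidates.append(E * E * (base + 2 * E))
    return min(candidates) if candidates else None

# E = 1, ..., 6 gives [None, 36, 144, 288, 900, 36].
print([min_abs_s3_for_E(E) for E in range(1, 7)])
\end{verbatim}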
We \textbf{claim} that \(\min \left|s_3\right| = 36\). Since \(D' \mid E^2\),
the parenthesized factor above is a positive integer, so
\(\left|s_3\right| \geq E^2 \geq 49 > 36\) whenever \(E \geq 7\); hence it
suffices to check all values \(1 \leq E \leq 6\).
\begin{itemize}
\item \(E = 1\): No possible values since
\(\left[1, E\right) = \left(E, 2E - 1\right] = \varnothing\)
\item \(E = 2\): The only possible values are \(1 = D' \in \left[1, E\right)\)
and \(3 = D' \in \left(E, 2E - 1\right]\). But since \(3 \nmid E^2\), our only
choice is \(-D = D' = 1\). This yields
\[\left|s_3\right| = 2^2\left(\frac{2^2}{1} + 1 + 2(2)\right) = 36.\]
\item \(E = 3\): The only value in \(\left[1, E\right) = \left[1, 3\right)\)
dividing \(E^2 = 9\) is \(D' = 1\) and no value in
\(\left(E, 2E - 1\right] = \left(3, 5\right]\) divides \(E^2\).
Thus our only choice gives
\[\left|s_3\right| = 3^2\left(\frac{3^2}{1} + 1 + 2(3)\right) = 144.\]
\item \(E = 4\): The values in \(\left[1, E\right) = \left[1, 4\right)\)
dividing \(E^2 = 16\) are \(D' = 1, 2\) and no value in
\(\left(E, 2E - 1\right] = \left(4, 7\right]\) divides \(E^2\).
Thus our choice nearest \(E = 4\) gives
\[\left|s_3\right| = 4^2\left(\frac{4^2}{2} + 2 + 2(4)\right) = 288.\]
\item \(E = 5\): The only value in \(\left[1, E\right) = \left[1, 5\right)\)
dividing \(E^2 = 25\) is \(D' = 1\) and no value in
\(\left(E, 2E - 1\right] = \left(5, 9\right]\) divides \(E^2\).
Thus our only choice gives
\[\left|s_3\right| = 5^2\left(\frac{5^2}{1} + 1 + 2(5)\right) = 900.\]
\item \(E = 6\): The values in \(\left[1, E\right) = \left[1, 6\right)\)
dividing \(E^2 = 36\) are \(D' = 1, 2, 3, 4\) and the only value in
\(\left(E, 2E - 1\right] = \left(6, 11\right]\) dividing \(E^2\) is
\(D' = 9\). Thus we take the choice above \(E = 6\), which gives
\[\left|s_3\right| = 6^2\left(\frac{6^2}{9} + 9 - 2(6)\right) = 36.\]
\end{itemize}
Thus the minimum \(\left|s_3\right| = 36\) occurs when
\[E = 2, D = -D' = -1 \Longrightarrow p_1 = 1, p_2 = 2E - D - 1 = 4,
p_3 = D - 1 - \frac{E^2}{D} = 2\]
and
\[E = 6, D = D' = 9 \Longrightarrow p_1 = 5, p_2 = 2E - D - 1 = 2,
p_3 = D - 1 - \frac{E^2}{D} = 4\]
From this we confirm the solution(s) we've seen before:
\[\boxed{\beta_1 = -1 - 1 = -2, \qquad \beta_2 = -1 + 4 = 3, \qquad
\beta_3 = 1 - 1 + 4 + 2 = 6}.\]
and
\[\boxed{\beta_1 = -1 - 5 = -6, \qquad \beta_2 = -5 + 2 = -3, \qquad
\beta_3 = 1 - 5 + 2 + 4 = 2}.\]
\textbf{NOTE}: These are actually the same solution, since multiplying every
\(\beta_i\) by \(-1\) preserves \(s_d = 0\) and also preserves the value of
\(\left|s_{d + 1}\right|\). If we seek to find a \textbf{unique}
solution we could require that
\[\left|\beta_1\right| \leq \left|\beta_n\right| \Longleftrightarrow
-\beta_1 \leq \beta_n \Longleftrightarrow 0 \leq \beta_1 + \beta_n\]
(noting that just as \(\beta_1 < 0\) is required for cancellation,
\(\beta_n > 0\) is also required).
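As a sanity check, a direct scan over small integer triples with \(s_2 = 0\)
(a self-contained Python sketch along the same lines as before) recovers
\(\min \left|s_3\right| = 36\) and the minimizer:
\begin{verbatim}
import itertools
from math import prod

def elem_sym(beta, j):
    # j-th elementary symmetric polynomial of the entries of beta.
    return sum(prod(c) for c in itertools.combinations(beta, j))

best = min(
    (abs(elem_sym(b, 3)), b)
    for b in itertools.combinations(range(-10, 11), 3)
    if elem_sym(b, 2) == 0
)
# Expect (36, (-6, -3, 2)); its mirror image (-2, 3, 6) ties.
print(best)
\end{verbatim}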
\subsection{Example: $n = 4$}
We add to the above with
\[\beta_4 = 1 + \beta_3 + p_4 = 2 - p_1 + p_2 + p_3 + p_4\]
and
\begin{align*}
s_1 &= \left[- 3 p_1 + 2 p_2 + p_3\right] + \beta_4
= 2 - 4 p_1 + 3 p_2 + 2 p_3 + p_4 \\
s_2 &= \left[\beta_1 \beta_2 + \beta_2 \beta_3 + \beta_3 \beta_1\right]
+ \beta_4\left[\beta_1 + \beta_2 + \beta_3\right] \\
&= \left[3 p_{1}^{2} - 4 p_{1} p_{2} - 2 p_{1} p_{3} + p_{2}^{2} +
p_{2} p_{3} - p_{2} - p_{3} - 1\right] \\
&+ \left(2 - p_1 + p_2 + p_3 + p_4\right)\left[- 3 p_1 + 2 p_2 + p_3\right]
\end{align*}
(We'll leave out the rest for now.)
\paragraph{When \(d = 1\):} We have
\[s_1 = 0 \Longrightarrow \beta_4 = -\left(\beta_1 + \beta_2 + \beta_3\right)\]
hence
\begin{align*}
s_2 &= \left[\beta_1 \beta_2 + \beta_2 \beta_3 + \beta_3 \beta_1\right]
+ \beta_4\left[\beta_1 + \beta_2 + \beta_3\right] \\
&= \left(\beta_1 \beta_2 + \beta_2 \beta_3 + \beta_3 \beta_1\right)
- \left(\beta_1 + \beta_2 + \beta_3\right)^2 \\
&= -\left(\beta_1^2 + \beta_1 \beta_2 + \beta_1 \beta_3 +
\beta_2^2 + \beta_2 \beta_3 + \beta_3^2\right) \\
&= - \frac{1}{2} \left(\beta_1 + \beta_2\right)^2
- \frac{1}{2} \left(\beta_1 + \beta_3\right)^2
- \frac{1}{2} \left(\beta_2 + \beta_3\right)^2
\end{align*}
Luckily this means we can just deal with minimizing
\begin{align*}
2\left|s_2\right| &= \left(\beta_1 + \beta_2\right)^2
+ \left(\beta_1 + \beta_3\right)^2 + \left(\beta_2 + \beta_3\right)^2 \\
&= \left(- 2 p_{1} + p_{2} - 1\right)^{2} + \left(- 2 p_{1} + p_{2} +
p_{3}\right)^{2} + \left(- 2 p_{1} + 2 p_{2} + p_{3} + 1\right)^{2}.
\end{align*}
Setting \(A = -2 p_1 + p_2\), this becomes
\[2\left|s_2\right| = \left(A - 1\right)^2 + \left(A + p_3\right)^2
+ \left(A + p_2 + p_3 + 1\right)^2\]
Note also that \(s_1 = 0\) forces
\[\beta_4 = 3 p_{1} - 2 p_{2} - p_{3} \Longrightarrow p_4 =
4 p_{1} - 3 p_{2} - 2 p_{3} - 2,\]
so the constraint \(p_4 \geq 0\) must be satisfied as well.
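The sum-of-squares identity used above is easy to double-check symbolically;
here is a minimal sketch assuming \texttt{sympy} is available:
\begin{verbatim}
import sympy

p1, p2, p3 = sympy.symbols('p1 p2 p3')
b1 = -1 - p1
b2 = -p1 + p2
b3 = 1 - p1 + p2 + p3
b4 = -(b1 + b2 + b3)  # enforce s_1 = 0
s2 = b1*b2 + b1*b3 + b1*b4 + b2*b3 + b2*b4 + b3*b4
difference = -2 * s2 - ((b1 + b2)**2 + (b1 + b3)**2 + (b2 + b3)**2)
print(sympy.expand(difference) == 0)  # True
\end{verbatim}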
\subsection{Algorithm: Advancing Front}
We use an advancing front over the levels
\begin{align*}
p_1 + \cdots + p_n &= 0 \\
p_1 + \cdots + p_n &= 1 \\
p_1 + \cdots + p_n &= 2 \\
& \vdots
\end{align*}
until some maximum value is reached.
As a proxy, when \(n = 2m\) is even, note that
\[\boldsymbol{\beta} = \left(-m, -(m - 1), \ldots, -1, 1, \ldots,
m - 1, m\right)\]
gives \(s_{2j - 1} = 0\): multiplying each \(\beta_i\) by \(-1\) permutes the
values (leaving every \(s_j\) of the set unchanged) yet scales \(s_{2j - 1}\)
by \((-1)^{2j - 1} = -1\), so \(s_{2j - 1} = -s_{2j - 1} = 0\).
Similarly, to zero the odd-index sums when \(n = 2m + 1\) is odd, we can use
\[\boldsymbol{\beta} = \left(-m, -(m - 1), \ldots, -1, 0,
1, \ldots, m - 1, m\right)\]
to get \(s_{2j - 1} = 0\).
Both of these give us worst-case values for \(\left|s_{2j}\right|\) and
tell us when to \textbf{stop advancing our front}.
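A quick self-contained Python check (standard library only) confirms that
these symmetric configurations zero out every odd \(s_j\), and also produces
the even-index values that serve as our stopping bounds (for \(m = 3\) they
reproduce \(\left|s_2\right| = 14\), \(\left|s_4\right| = 49\),
\(\left|s_6\right| = 36\) from the listing earlier):
\begin{verbatim}
import itertools
from math import prod

def elem_sym(beta, j):
    # j-th elementary symmetric polynomial of the entries of beta.
    return sum(prod(c) for c in itertools.combinations(beta, j))

m = 3
even_case = [b for b in range(-m, m + 1) if b != 0]  # n = 2m points
odd_case = list(range(-m, m + 1))                    # n = 2m + 1 points
for beta in (even_case, odd_case):
    values = [elem_sym(beta, j) for j in range(1, len(beta) + 1)]
    print(beta, values)  # every odd-index entry is 0
\end{verbatim}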
\section{Lemma: Symmetric Polynomial Order Reduction} \label{reduction-lemma}
For real values \(X_1 \leq \cdots \leq X_n\), define the
\(j\)th symmetric polynomial
\[s_j(X_1, \ldots, X_n) = \sum_{1 \leq i_1 < \cdots < i_j \leq n} X_{i_1}
\cdots X_{i_j}\]
\textit{Claim}: There exist \(n - 1\) points \(Y_1,
\ldots, Y_{n - 1}\) \textbf{between} the \(\left\{X_j\right\}\), i.e. satisfying
\[X_1 \leq Y_1 \leq X_2 \leq \cdots \leq X_{n - 1} \leq Y_{n - 1} \leq X_n,\]
such that
\[\frac{s_j(X_1, \ldots, X_n)}{\binom{n}{j}} = \frac{s_j(Y_1,
\ldots, Y_{n - 1})}{\binom{n - 1}{j}}.\]
\begin{proof}
We know the polynomial
\[g(X) = (X - X_1) \cdots (X - X_n)\]
takes the form
\begin{align*}
g(X) &= X^n - s_1(X_1, \ldots, X_n) X^{n - 1} + \cdots + (-1)^n
s_n(X_1, \ldots, X_n) \\
&= X^n + \sum_{i = 1}^n (-1)^i s_i(X_1, \ldots, X_n) X^{n - i}
\end{align*}
by Vieta's relations. Since \(g\) is degree \(n\) and has \(n\) real
roots we know that \(g'(X)\) has \(n - 1\) real roots. Between any two
distinct roots \(X_i < X_{i + 1}\), Rolle's Theorem tells us that there
exists \(X_i < Y_i < X_{i + 1}\) such that \(g'(Y_i) = 0\). For
nondistinct \(X_i\), if \(X_i = \cdots = X_{i + r - 1}\) for some number
of repetitions \(r\) then \((X - X_i)^r\) divides \(g(X)\) hence \((X -
X_i)^{r - 1}\) divides \(g'(X)\). Thus, whether the roots are distinct,
fully repeated, or somewhere in between, \(g'\) always has at least \(r - 1\)
roots (counted with multiplicity) between \(X_i\) and \(X_{i + r - 1}\).
Letting \(i = 1\), \(r = n\), we see
that we have roots \(Y_i\) of \(g'(X)\) satisfying:
\[X_1 \leq Y_1 \leq \cdots \leq Y_{n - 1} \leq X_n.\]
Since \(g'(X) = n X^{n - 1} + \cdots\) we know that
\[g'(X) = n (X - Y_1) \cdots (X - Y_{n - 1})\]
and by Vieta
\[\frac{g'(X)}{n} = X^{n - 1} - s_1\left(Y_1, \ldots, Y_{n - 1}\right) X^{n - 2} +
\cdots + (-1)^{n - 1} s_{n - 1}\left(Y_1, \ldots, Y_{n - 1}\right).\]
On the other hand, differentiating the original form directly gives
\[g'(X) = n X^{n - 1} - (n - 1) s_1(X_1, \ldots, X_n) X^{n - 2} +
\cdots + (-1)^{n - 1} s_{n - 1}\left(X_1, \ldots, X_n\right).\]
So for \(1 \leq j \leq n - 1\) (i.e. \(s_j\) among \(s_1, \ldots, s_{n - 1}\)),
we have
\[(n - j) s_j\left(X_1, \ldots, X_n\right) = n s_j\left(Y_1, \ldots, Y_{n - 1}\right).\]
Noting that \((n - j) \binom{n}{j} = n \binom{n - 1}{j}\), we
conclude that
\[\frac{s_j(X_1, \ldots, X_n)}{\binom{n}{j}} = \frac{s_j(Y_1,
\ldots, Y_{n - 1})}{\binom{n - 1}{j}}. \tag*{\qedhere}\]
\end{proof}
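The lemma is also easy to sanity-check numerically. Here is a minimal sketch
(assuming \texttt{numpy} is available; the test points are arbitrary) that
compares the normalized symmetric polynomials of the \(X_i\) with those of the
critical points \(Y_i\) of \(g\):
\begin{verbatim}
import itertools
from math import comb, prod

import numpy as np

def elem_sym(vals, j):
    # j-th elementary symmetric polynomial.
    return sum(prod(c) for c in itertools.combinations(vals, j))

X = [-2.0, -1.0, 0.5, 3.0]  # arbitrary real test points
n = len(X)
# Critical points of g (real, by Rolle's Theorem).
Y = np.real_if_close(np.roots(np.polyder(np.poly(X))))
for j in range(1, n):
    lhs = elem_sym(X, j) / comb(n, j)
    rhs = elem_sym(Y, j) / comb(n - 1, j)
    print(j, lhs, rhs)  # columns agree up to floating-point error
\end{verbatim}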
\end{document}