## Archive for December, 2008

### Distance between two lines

December 28, 2008

Suppose we are given two lines in $\mathbb{R}^n$, $\mathcal{L}_1$ and $\mathcal{L}_2$. Line $\mathcal{L}_1$ passes through the point $x_0 \in \mathbb{R}^n$ and its direction is given by vector $u \in \mathbb{R}^n \setminus \{0_n\}$

$\mathcal{L}_1 = \left\{ x_0 + \lambda u \mid \lambda \in \mathbb{R} \right\}$.

Line $\mathcal{L}_2$ passes through the point $y_0 \in \mathbb{R}^n$ and its direction is given by vector $v \in \mathbb{R}^n \setminus \{0_n\}$

$\mathcal{L}_2 = \left\{ y_0 + \gamma v \mid \gamma \in \mathbb{R} \right\}$.

What is the distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$?

__________

Lines as functions

Let vector-valued functions $x : \mathbb{R} \rightarrow \mathbb{R}^n$ and $y : \mathbb{R} \rightarrow \mathbb{R}^n$ be defined as

$x(\lambda) = x_0 + \lambda u, \quad{} y(\gamma) = y_0 + \gamma v$.

Lines $\mathcal{L}_1$ and $\mathcal{L}_2$ are thus the images of set $\mathbb{R}$ under functions $x : \mathbb{R} \rightarrow \mathbb{R}^n$ and $y : \mathbb{R} \rightarrow \mathbb{R}^n$, respectively

$\mathcal{L}_1 = \left\{ x(\lambda) \mid \lambda \in \mathbb{R} \right\}, \quad{} \mathcal{L}_2 = \left\{ y(\gamma) \mid \gamma \in \mathbb{R} \right\}$.

Function $x: \mathbb{R} \rightarrow \mathbb{R}^n$ maps each $\lambda \in \mathbb{R}$ into point $x(\lambda) \in \mathcal{L}_1$. Therefore, choosing a particular value of $\lambda$ corresponds to choosing a point on line $\mathcal{L}_1$, and vice versa (because function $x$ is injective). Thus, each point on the line $\mathcal{L}_1$ can be unambiguously specified by a value of $\lambda$. Similarly, following the same line of thought, each point on the line $\mathcal{L}_2$ can be unambiguously specified by a value of $\gamma$.

__________

Distance between two points

In order to compute the distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$, we first need to know how to compute the distance between two points in $\mathbb{R}^n$, i.e., we need to choose a metric on $\mathbb{R}^n$. For the sake of simplicity, we will use the Euclidean metric, i.e., we will use the 2-norm to measure lengths and distances. In other words, we will be working in normed vector space $(\mathbb{R}^n, \| \cdot \|_2)$.

Let us fix parameters $\lambda$ and $\gamma$, i.e., $\lambda = \bar{\lambda}$ and $\gamma = \bar{\gamma}$, and compute the distance between fixed points $x(\bar{\lambda})$ and $y(\bar{\gamma})$

$\textbf{dist} ( x(\bar{\lambda}), y(\bar{\gamma}) ) = \| x( \bar{\lambda} ) - y(\bar{\gamma} )\|_2$.

Pictorially, we have that $\textbf{dist} ( x(\bar{\lambda}), y(\bar{\gamma}) )$ is the length of the line segment connecting fixed points $x(\bar{\lambda})$ and $y(\bar{\gamma})$

To each pair of parameters $(\lambda, \gamma)$, it is assigned a non-negative scalar $\textbf{dist} ( x(\lambda), y(\gamma) )$

$(\lambda, \gamma) \mapsto ( x(\lambda), y(\gamma) ) \mapsto \textbf{dist} ( x(\lambda), y(\gamma) )$.

__________

Distance between a point and a line

Suppose now that we vary $\gamma$, while $\lambda$ remains fixed, $\lambda = \bar{\lambda}$. As we vary $\gamma$, we travel along line $\mathcal{L}_2$ and hence the distance between free point $y(\gamma) \in \mathcal{L}_2$ and fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$ is given by

$\textbf{dist} (x(\bar{\lambda}), y(\gamma)) = \| x(\bar{\lambda}) - y(\gamma)\|_2$.

Since $\gamma$ is now free, $\textbf{dist} (x(\bar{\lambda}), y(\gamma))$ is a function of $\gamma$.

For example, let us pick three distinct values for $\gamma$, say $\gamma_1, \gamma_2, \gamma_3$. The image below depicts the line segments connecting fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$ and points $y(\gamma_1), y(\gamma_2), y(\gamma_3) \in \mathcal{L}_2$. The length of these line segments gives us the distance between the fixed point and each of the three points on line $\mathcal{L}_2$.

The distance between fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$ and line $\mathcal{L}_2$ is thus

$\textbf{dist} (x(\bar{\lambda}), \mathcal{L}_2) = \displaystyle \min_{\gamma \in \mathbb{R}} \| x(\bar{\lambda}) - y(\gamma) \|_2$.

We have an optimization problem: we would like to find the value of parameter $\gamma \in \mathbb{R}$ which minimizes function $\textbf{dist} (x(\bar{\lambda}), y(\gamma))$. This minimizer is (see appendix A)

$\gamma^* = \displaystyle \arg \min_{\gamma \in \mathbb{R}} \| x(\bar{\lambda}) - y(\gamma) \|_2 \Rightarrow \gamma^* = \displaystyle \frac{v^T ( x(\bar{\lambda}) - y_0 )}{ v^T v}$.

Evaluating function $y$ at $\gamma^*$ we obtain $y( \gamma^*)$, the point on line $\mathcal{L}_2$ that is closest to fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$

$y( \gamma^*) = \displaystyle y_0 + P_v ( x(\bar{\lambda}) - y_0 )$,

where

$P_v = \displaystyle \frac{v v^T}{v^T v}$

is a symmetric, idempotent ($P_v^2 = P_v$) projection matrix, and $P_v ( x(\bar{\lambda}) - y_0 )$ is the orthogonal projection of vector $x(\bar{\lambda}) - y_0$ onto vector $v$. It can be shown that vectors  $x(\bar{\lambda}) - y( \gamma^*)$ and $v$ are orthogonal (see appendix B).

For example, let us consider the $n=3$ case: we have two lines in $\mathbb{R}^3$, $\mathcal{L}_1$ and $\mathcal{L}_2$, where $\mathcal{L}_2$ lies on plane $\mathcal{P}$. Pictorially, we have

We would like to find the point on line $\mathcal{L}_2$ that is closest to fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$. Projecting vector $x(\bar{\lambda}) - y_0$ orthogonally onto vector $v$, we obtain

The image above illustrates rather nicely that vectors  $x(\bar{\lambda}) - y( \gamma^*)$ and $v$ are orthogonal (see appendix B). The distance between fixed point $x(\bar{\lambda})$ and line $\mathcal{L}_2$ is

$\textbf{dist} (x(\bar{\lambda}), \mathcal{L}_2) = \| x(\bar{\lambda}) - y( \gamma^*) \|_2$,

where $y( \gamma^*) = \displaystyle y_0 + P_v ( x(\bar{\lambda}) - y_0 )$. Please note that plane $\mathcal{P}$ serves no purpose other than to make it easier to visualize in 3-d (plane $\mathcal{P}$ gives an impression of depth). In other words, plane $\mathcal{P}$ has some “artistic value”, but no “mathematical value”.

__________

Distance between two lines

Let us now vary both $\lambda$ and $\gamma$. As we vary $\lambda$ and $\gamma$, we travel along lines $\mathcal{L}_1$ and $\mathcal{L}_2$, respectively. Since both parameters are free, the distance between free points $x(\lambda) \in \mathcal{L}_1$ and $y(\gamma) \in \mathcal{L}_2$ is

$\textbf{dist} (x(\lambda), y(\gamma)) = \| x(\lambda) - y(\gamma) \|_2$,

which is a function of both $\lambda$ and $\gamma$. The distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$ is thus

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{(\lambda, \gamma) \in \mathbb{R}^2} \| x(\lambda) - y(\gamma) \|_2$.

The original problem of computing the distance between two lines has thus been cast as an optimization problem: we would like to find the pair $(\lambda, \gamma) \in \mathbb{R}^2$ which minimizes $\textbf{dist} (x(\lambda), y(\gamma))$. Once we have found the pair of minimizers $(\lambda^*, \gamma^*)$, the distance between $\mathcal{L}_1$ and $\mathcal{L}_2$ is

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \| x(\lambda^*) - y(\gamma^*) \|_2$.

The pair $(\lambda^*, \gamma^*)$ is not necessarily unique. For example, if the lines are parallel to each other, then there are infinitely many pairs of minimizers $(\lambda^*, \gamma^*)$.

Trying to find the pair $(\lambda, \gamma) \in \mathbb{R}^2$ which minimizes $\textbf{dist} (x(\lambda), y(\gamma))$ is a one-stage 2-dimensional optimization problem. However, we can solve this optimization problem in two stages, where at each stage we must solve a 1-dimensional optimization problem

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2$.

Note that

$\displaystyle \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2 = \textbf{dist} (x(\lambda), \mathcal{L}_2)$

is the distance between point $x(\lambda)$ and line $\mathcal{L}_2$, and thus

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \textbf{dist} (x(\lambda), \mathcal{L}_2)$.

We already know how to compute the distance between a point on line $\mathcal{L}_1$ and line $\mathcal{L}_2$: given a value of $\lambda$, the choice of $\gamma$ which minimizes $\textbf{dist} (x(\lambda), y(\gamma))$ is $\gamma = \Gamma (\lambda)$, where function $\Gamma : \mathbb{R} \rightarrow \mathbb{R}$ is defined as (see appendix A)

$\Gamma (\lambda) = \displaystyle \frac{v^T ( x(\lambda) - y_0 )}{ v^T v}$,

and the point on line $\mathcal{L}_2$ which is closest to $x(\lambda)$ is

$y(\Gamma (\lambda)) = y_0 + P_v \left( x(\lambda) - y_0 \right)$.

Therefore, for a given value of $\lambda$, we have that the distance between $x(\lambda)$ and line $\mathcal{L}_2$ is

$\textbf{dist} (x(\lambda), \mathcal{L}_2) = \textbf{dist} (x(\lambda), y(\Gamma(\lambda))) = \| x(\lambda) - y( \Gamma (\lambda)) \|_2$.

Note that $\textbf{dist} (x(\lambda), \mathcal{L}_2)$ is a function of $\lambda$ only, as we have already optimized function $\textbf{dist} (x(\lambda), y(\gamma))$ for $\gamma$. Finally, the distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$ can be found by solving a 1-dimensional optimization problem in $\lambda$

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \|x(\lambda) - y( \Gamma (\lambda) )\|_2$.

Once we have found the $\lambda^*$ which minimizes the distance $\| x(\lambda) - y( \Gamma (\lambda)) \|_2$, the distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$ is simply

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \| x(\lambda^*) - y( \Gamma (\lambda^*)) \|_2$.

Note that:

• $\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = 0$: lines $\mathcal{L}_1$ and $\mathcal{L}_2$ do intersect.
• $\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) > 0$: lines $\mathcal{L}_1$ and $\mathcal{L}_2$ do not intersect.

The value of $\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2)$ can be written in terms of points $x_0, y_0$ and direction vectors $u, v$, as I will show on some future posts.

__________

The shortest distance game

It might seem a bit obscure that the one-stage 2-dimensional optimization problem

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{(\lambda, \gamma) \in \mathbb{R}^2} \| x(\lambda) - y(\gamma) \|_2$

is equivalent to a two-stage optimization problem where at each stage one must solve a 1-dimensional optimization problem

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2$.

Personally, I like to look at this from a game-theoretic viewpoint. Let us play a sequential game! We shall call the shortest distance game. I play first, then you play:

1. I choose a value for $\lambda \in \mathbb{R}$, I tell you what $\lambda$ I chose.
2. You then choose $\gamma = \Gamma (\lambda)$ such that $\textbf{dist} (x(\lambda), y(\gamma))$ is minimized.

Note that you are required to play optimally, i.e., to minimize distance. Hence you have no freedom in the choice of $\gamma$: you choose $\gamma$ as a response to my choice of $\lambda$. Your choice is now coupled to mine, and thus, we are left with only one degree of freedom, the choice of $\lambda$. I now choose the value of $\lambda$ which minimizes $\| x(\lambda) - y( \Gamma (\lambda)) \|_2$

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \| x(\lambda) - y( \Gamma (\lambda)) \|_2$.

Fortunately, I can use calculus to find the minimizer $\lambda^*$. Trying all values in $\mathbb{R}$ for parameter $\lambda$ would be rather tedious ;-)

__________

Appendix A

Note that the value of parameter $\gamma$ which minimizes the distance $\| x(\bar{\lambda}) - y(\gamma) \|_2$ is the same that minimizes the squared distance $\| x(\bar{\lambda}) - y(\gamma) \|_2^2$. Finding the minimizer of the squared norm is much easier, thus we introduce a cost function $C : \mathbb{R} \rightarrow [0, \infty)$ defined as

$C(\gamma) = \displaystyle \frac{1}{2} \| x(\bar{\lambda}) - y(\gamma) \|_2^2$.

Differentiating with respect to $\gamma$, we obtain

$\begin{array}{rl} \displaystyle \frac{d C}{d \gamma} (\gamma) &= \displaystyle ( x(\bar{\lambda}) - y(\gamma) )^T (- v)\\ &= \displaystyle v^T ( y(\gamma) - x(\bar{\lambda}) ) \\ &= \displaystyle v^T \left( \gamma v - ( x(\bar{\lambda}) - y_0 ) \right) \\ &=\displaystyle \|v\|_2^2 \gamma - v^T ( x(\bar{\lambda}) - y_0 ).\\ \end{array}$

The first derivative of $C(\cdot)$ must vanish at extrema. From the geometry of the problem, we know that there are no maxima (note that the cost function is unbounded above). Therefore we can find minima by checking the first derivative alone, without having to check the second derivative. We have

$\displaystyle \frac{d C}{d \gamma} (\gamma^*) = 0 \Rightarrow \displaystyle \|v\|_2^2 \gamma^* - v^T ( x(\bar{\lambda}) - y_0 ) = 0$,

and since $v \neq 0_n$, the minimizer is

$\gamma^* = \displaystyle \frac{v^T ( x(\bar{\lambda}) - y_0 )}{ v^T v}$.

Note that this minimizer depends on the value of fixed parameter $\bar{\lambda}$.

__________

Appendix B

We would like to show that vectors $x(\bar{\lambda}) - y( \gamma^*)$ and $v$ are orthogonal. Recall that $y( \gamma^*) = \displaystyle y_0 + P_v ( x(\bar{\lambda}) - y_0 )$, and thus

$\displaystyle x(\bar{\lambda}) - y( \gamma^*) = ( I_n - P_v) \left(x(\bar{\lambda}) - y_0\right)$.

Multiplying both sides on the left by $P_v$, we obtain

$\displaystyle P_v \left(x(\bar{\lambda}) - y( \gamma^*)\right) = P_v \left( I_n - P_v\right) \left(x(\bar{\lambda}) - y_0\right)$

and since $P_v (I_n - P_v) = P_v - P_v^2 = O_{n \times n}$ (recall that $P_v^2 = P_v$), then we finally have

$P_v \left( x(\bar{\lambda}) - y( \gamma^*) \right) = 0_n$,

which means that vectors $x(\bar{\lambda}) - y( \gamma^*)$ and $v$ are orthogonal.

__________

Related:

### Matrix decompositions with sparsity constraints

December 12, 2008

Suppose we are given a finite square matrix $A \in \mathbb{R}^{n \times n}$. We would like to factorize $A$ into $N$ (where $N$ is finite) matrices

$A = W_{N-1} W_{N-2} \ldots W_ {2} W_{1} W_{0}$,

where factors $W_0, W_1, \ldots, W_{N-1} \in \mathbb{R}^{n \times n}$ are required to have a certain sparsity pattern, i.e., a subset of their $n^2$ entries must be zero.

Definition: Let $\mathcal{I}_n = \{1, 2, \ldots, n\}$ and $\mathcal{I}_n^2 = \mathcal{I}_n \times \mathcal{I}_n$. A sparsity pattern $\mathcal{S}_n$ is a set of ordered pairs $(i,j) \in \mathcal{I}_n^2$ that indicate which entries of a $n \times n$ matrix are constrained to be zero.

Note that $\mathcal{S}_n \subseteq \mathcal{I}_n^2$. If there are no sparsity constraints, then $\mathcal{S}_n$ is an empty set. Set $\mathcal{I}_n^2 \setminus \mathcal{S}_n$ contains the indices of the entries of a $n \times n$ matrix which are unconstrained, i.e., “free”.

__________

Problem: Given a matrix $A \in \mathbb{R}^{n \times n}$ and a particular sparsity pattern $\mathcal{S}_n$, how can we find if it is possible to factorize $A$ into $N$ factor matrices (where $N$ is finite) whose sparsity pattern is the specified one? If such factorization is possible, how can we compute each factor $W_1, W_2, \ldots, W_{N-1}$ and $N$?

__________

This problem is easy to pose, but I have found it very hard to attack. I have tried various approaches… and none worked. Does anyone have any ideas?

__________

Example: suppose we are given a $5 \times 5$ real matrix

$A = \left[\begin{array}{ccccc} 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ \end{array}\right]$.

We would like to factorize matrix $A$ into $N$ factors $\{W_k\}_{k =0}^{N-1} \in \mathbb{R}^{5 \times 5}$ whose sparsity pattern is represented pictorially by

$\left[\begin{array}{ccccc} \times & \times & 0 & 0 & 0\\ \times & \times & \times & 0 & 0\\ 0 & \times & \times & \times & 0\\ 0 & 0 & \times & \times & \times\\ 0 & 0 & 0 & \times & \times\\\end{array}\right]$,

where a “$\times$” in the $(i,j)$ entry means that the $(i,j)$ entry is unconstrained, i.e., it is allowed to take any value in $\mathbb{R}$. If you are familiar with numerical PDE’s, you certainly encountered matrices with this sparsity pattern when solving diffusion or wave equations in one spatial dimension using finite difference methods.

__________

Even though this example seems fairly “simple” (it’s only a $5 \times 5$ matrix after all!), I have not yet found an approach that works.

### The Connection Machine

December 11, 2008

One day when I was having lunch with Richard Feynman, I mentioned to him that I was planning to start a company to build a parallel computer with a million processors. His reaction was unequivocal, “That is positively the dopiest idea I ever heard.” For Richard a crazy idea was an opportunity to either prove it wrong or prove it right. Either way, he was interested.

Danny Hillis [1]

When Danny Hillis (b. 1956) was a graduate student at MIT‘s Artificial Intelligence (AI) lab in the early 1980s, he built a massively parallel supercomputer which he named the Connection Machine. At the time, computers were still somewhat ENIAC-like: sequential, single-processor machines. By contrast, the Connection Machine had over 65,000 processors! The Connection Machine was the topic of Danny Hillis’ PhD thesis [2], supervised by Prof. Gerald Jay Sussman.

[ Connection Machine 2 ]

Danny Hillis‘ motivation to build such a parallel computer stemmed from his desire to explore new paradigms of computation outside the traditional von Neumann architecture. Hillis believed that Computer Science needed fresh ideas, and he turned to Physics as a source of inspiration [3].

Although the Connection Machine was initially intended for applications in Artificial Intelligence, later versions of it (CM-2, CM-5) were used with remarkable success in Computational Physics (e.g., Lattice QCD [4]).

In 1983 Danny Hillis co-founded a company named Thinking Machines Corporation (TMC)  in order to exploit possible commercial applications of the Connection Machine paradigm. Folly, poor business sense, and competition from established supercomputer firms forced TMC into bankruptcy in 1993 [5].

__________

References:

[1] Danny Hillis, Richard Feynman and the Connection Machine.

[2] Danny Hillis, The Connection Machine, Ph.D. thesis, Department of Electrical Engineering and Computer Science (EECS), MIT, 1988.

[3] Danny Hillis, New Computer Architectures and their relationship to Physics, International Journal of Theoretical Physics, Vol 21, Nos. 3/4, 1982.

[4] Geoffrey C. Fox, Roy D. Williams, Paul C. Messina, Parallel Computing Works: QCD on the Connection Machine, Morgan Kaufmann Publishers, 1994.

[5] Gary A. Taubes, The Rise and Fall of Thinking Machines, Inc.com, September 15, 1995.

### Building a polynomial from its roots II

December 10, 2008

Suppose we are given an $n$-tuple of (not necessarily distinct) complex numbers $(w_1, w_2, \ldots, w_n)$ and we construct a monic univariate polynomial of degree $n$ over field $\mathbb{C}$ whose $n$ roots are the elements of $(w_1, w_2, \ldots, w_n)$

$p_n(z) = \displaystyle\prod_{i=1}^n (z - w_i) = \displaystyle\sum_{m=0}^n a(m,n) z^m$,

where $a(m,n) \in \mathbb{C}$ is the coefficient of monomial $z^m$ of degree $m$ in polynomial $p_n(z)$ of degree $n$. This double-indexing might seem a bit confusing, but its usefulness will be made evident later on.

Usually, we take a univariate polynomial and try to find its roots. Now we are interested in the reverse operation, i.e., we would like to find the values of the coefficients $\{a(m,n)\}_{m=0}^{n}$ of polynomial $p_n(z)$ in terms of the roots $(w_1, w_2, \ldots, w_n)$. Note that every permutation of the $n$-tuple $(w_1, w_2, \ldots, w_n)$ results in the same monic polynomial of degree $n$.

_____

Constructing polynomials

Say we know the coefficients of polynomial $p_k(z)$ of degree $k$. Then we can write the coefficients $\{a(m, k+1)\}_{m=0}^{k+1}$ of polynomial $p_{k+1}(z)$ in terms of the coefficients  $\{a(m, k)\}_{m=0}^{k}$ and the $(k+1)$-th element of $n$-tuple $(w_1, w_2, \ldots, w_n)$. On my previous post on this topic, I derived the recursion

$a(m,k+1) = \displaystyle\left\{\begin{array}{rl} - w_{k+1} a(0,k) & \text{if} \quad{} m = 0\\ a(m-1,k) - w_{k+1} a(m,k) & \text{if} \quad{} 1 \leq m \leq k\\ a(k,k) & \text{if} \quad{} m=k+1\\ \end{array}\right.$

Let us start with a monic polynomial of degree zero, $p_0(z) = 1$, whose only coefficient is $a(0,0) = 1$. We take the first element of the $n$-tuple $(w_1, w_2, \ldots, w_n)$ and construct the monic polynomial $p_1(z) = z - w_1$ of degree $1$ using the recursion above. Then we take the 2nd root $w_2$ and construct the monic polynomial of degree $2$ whose roots are $w_1, w_2$

$p_2(z) = z^2 - (w_1 + w_2) z + w_1 w_2$.

Iterating successively, we obtain a monic polynomial $p_n(z)$ of degree $n$ whose roots are $w_1, w_2, \ldots, w_n$.

_____

A graphical approach

Consider the following set of 15 nodes arranged in 5 columns and 5 rows. Each node is labelled with an ordered pair $(i,j)$, where $i$ is the row index and $j$ is the column index.

We now attach a complex number $a(i,j)$ to each node $(i,j)$. The coefficient of monomial $z^m$ in polynomial $p_n(z)$ can be thought of as the number attached to node $(m,n)$. For example, the coefficients of polynomial $p_2(z)$ are the numbers attached to the nodes of the 3rd column (note that we count rows and columns starting at the zero index, like in the C programming language). The double-indexing now makes some sense ;-)

Finally, we add directed edges linking nodes in adjacent columns, and obtain the following directed graph (which reminds me of the trellis diagrams used in convolutional coding)

Recall the recursion formula which allows us to write the coefficients of polynomial $p_{k+1}(z)$ in terms of the coefficients of polynomial $p_{k}(z)$

$a(m,k+1) = \displaystyle\left\{\begin{array}{rl} - w_{k+1} a(0,k) & \text{if} \quad{} m = 0\\ a(m-1,k) - w_{k+1} a(m,k) & \text{if} \quad{} 1 \leq m \leq k\\ a(k,k) & \text{if} \quad{} m=k+1\\ \end{array}\right.$.

Since the coefficients $a(\cdot, \cdot)$ are attached to the nodes of the directed graph depicted above, we can use the recursion formula to compute the values attached to the nodes of a given column as a function of the values attached to the nodes in the column to the left.

This graphical approach is (most likely) of no practical use. Nonetheless, this approach does reveal the “structure” of the process of constructing a polynomial from its roots, and such structure is of much greater beauty than cumbersome algebraic manipulation.

_____

An example

Suppose we want to compute the coefficients of the monic polynomial $p_4(z) = (z - 1)^4$, which has a root of multiplicity $4$ at $z = 1$. In this case we have $w_1 = w_2 = w_3 = w_4 = 1$. From the well-known binomial theorem, we know that

$p_4(z) = z^4 - 4 z^3 + 6 z^2 - 4 z + 1$,

and therefore, the coefficients are $a(0,4) = 1$, $a(1,4) = -4$, $a(2,4) = 6$, $a(3,4) = -4$, and $a(4,4) = 1$. Note that the absolute values of these coefficients are the binomial coefficients and can be found in the 5th row of Pascal’s triangle.

Let us now construct this polynomial one step at a time using the graphical approach described above. We start with monic polynomial $p_0(z) = 1$, whose only coefficient is $a(0,0) = 1$

and then we compute the coefficients of the monic polynomial $p_1(z) = z-1$ using the recursion formula. Graphically, this means that the values of the 2 nodes in the 2nd column can be computed from $a(0,0)$, as follows

where the directed edges now have weights. For example, the edge linking nodes $(0,0)$ and $(0,1)$ has weight $-1$, while the edge linking $(0,0)$ and $(1,1)$ has weight $1$. If the weight of an edge is equal to $1$, then we do not label the edge in order to avoid cluttering the diagram.

From the coefficients of $p_1(z)$ and the second root, $w_2 = 1$, we can compute the coefficients of $p_2(z)$ using the recursion again, i.e., the values of the nodes in the 3rd column can be computed from the values in the nodes in the 2nd column, as follows

The values of the nodes in the 3rd column are known, and since the 3rd root is $w_3 = 1$ we can now obtain the coefficients of $p_3(z)$

and since $w_4 = 1$, we can finally obtain the coefficients of the monic polynomial $p_4(z)$, which are the values of the nodes in the 5th column

In this example the roots I chose have zero imaginary parts, but this approach still works if the roots have nonzero imaginary parts. This method could be used to compute the coefficients of polynomials of degrees larger than $4$, of course.