Archive for December, 2008

Distance between two lines

December 28, 2008

Suppose we are given two lines in \mathbb{R}^n, \mathcal{L}_1 and \mathcal{L}_2. Line \mathcal{L}_1 passes through the point x_0 \in \mathbb{R}^n and its direction is given by vector u \in \mathbb{R}^n \setminus \{0_n\}

\mathcal{L}_1 = \left\{ x_0 + \lambda u \mid \lambda \in \mathbb{R} \right\}.

Line \mathcal{L}_2 passes through the point y_0 \in \mathbb{R}^n and its direction is given by vector v \in \mathbb{R}^n \setminus \{0_n\}

\mathcal{L}_2 = \left\{ y_0 + \gamma v \mid \gamma \in \mathbb{R} \right\}.

What is the distance between lines \mathcal{L}_1 and \mathcal{L}_2?

__________

Lines as functions

Let vector-valued functions x : \mathbb{R} \rightarrow \mathbb{R}^n and y : \mathbb{R} \rightarrow \mathbb{R}^n be defined as

x(\lambda) = x_0 + \lambda u, \quad{} y(\gamma) = y_0 + \gamma v.

Lines \mathcal{L}_1 and \mathcal{L}_2 are thus the images of set \mathbb{R} under functions x : \mathbb{R} \rightarrow \mathbb{R}^n and y : \mathbb{R} \rightarrow \mathbb{R}^n, respectively

\mathcal{L}_1 = \left\{ x(\lambda) \mid \lambda \in \mathbb{R} \right\}, \quad{} \mathcal{L}_2 = \left\{ y(\gamma) \mid \gamma \in \mathbb{R} \right\}.

Function x: \mathbb{R} \rightarrow \mathbb{R}^n maps each \lambda \in  \mathbb{R} into point x(\lambda) \in \mathcal{L}_1. Therefore, choosing a particular value of \lambda corresponds to choosing a point on line \mathcal{L}_1, and vice versa (because function x is injective). Thus, each point on the line \mathcal{L}_1 can be unambiguously specified by a value of \lambda. Similarly, following the same line of thought, each point on the line \mathcal{L}_2 can be unambiguously specified by a value of \gamma.

__________

Distance between two points

In order to compute the distance between lines \mathcal{L}_1 and \mathcal{L}_2, we first need to know how to compute the distance between two points in \mathbb{R}^n, i.e., we need to choose a metric on \mathbb{R}^n. For the sake of simplicity, we will use the Euclidean metric, i.e., we will use the 2-norm to measure lengths and distances. In other words, we will be working in normed vector space (\mathbb{R}^n,  \| \cdot \|_2).

Let us fix parameters \lambda and \gamma, i.e., \lambda = \bar{\lambda} and \gamma = \bar{\gamma}, and compute the distance between fixed points x(\bar{\lambda}) and y(\bar{\gamma})

\textbf{dist} ( x(\bar{\lambda}), y(\bar{\gamma}) ) = \| x( \bar{\lambda} ) - y(\bar{\gamma} )\|_2.

Pictorially, we have that \textbf{dist} ( x(\bar{\lambda}), y(\bar{\gamma}) ) is the length of the line segment connecting fixed points x(\bar{\lambda}) and y(\bar{\gamma})

Distance between 2 fixed points

To each pair of parameters (\lambda, \gamma), it is assigned a non-negative scalar \textbf{dist} ( x(\lambda), y(\gamma) )

(\lambda, \gamma) \mapsto ( x(\lambda), y(\gamma) ) \mapsto \textbf{dist} ( x(\lambda), y(\gamma) ).

__________

Distance between a point and a line

Suppose now that we vary \gamma, while \lambda remains fixed, \lambda = \bar{\lambda}. As we vary \gamma, we travel along line \mathcal{L}_2 and hence the distance between free point y(\gamma) \in \mathcal{L}_2 and fixed point x(\bar{\lambda}) \in \mathcal{L}_1 is given by

\textbf{dist} (x(\bar{\lambda}), y(\gamma)) = \| x(\bar{\lambda}) - y(\gamma)\|_2.

Since \gamma is now free, \textbf{dist} (x(\bar{\lambda}), y(\gamma)) is a function of \gamma.

For example, let us pick three distinct values for \gamma, say \gamma_1, \gamma_2, \gamma_3. The image below depicts the line segments connecting fixed point x(\bar{\lambda}) \in \mathcal{L}_1 and points y(\gamma_1), y(\gamma_2), y(\gamma_3) \in \mathcal{L}_2. The length of these line segments gives us the distance between the fixed point and each of the three points on line \mathcal{L}_2.

Distances between a fixed point in L1 and three points in L2

The distance between fixed point x(\bar{\lambda}) \in \mathcal{L}_1 and line \mathcal{L}_2 is thus

\textbf{dist} (x(\bar{\lambda}), \mathcal{L}_2) = \displaystyle \min_{\gamma \in \mathbb{R}} \| x(\bar{\lambda}) - y(\gamma) \|_2.

We have an optimization problem: we would like to find the value of parameter \gamma \in \mathbb{R} which minimizes function \textbf{dist} (x(\bar{\lambda}), y(\gamma)). This minimizer is (see appendix A)

\gamma^* = \displaystyle \arg \min_{\gamma \in \mathbb{R}} \| x(\bar{\lambda}) - y(\gamma) \|_2 \Rightarrow \gamma^* = \displaystyle \frac{v^T  ( x(\bar{\lambda}) - y_0 )}{ v^T v}.

Evaluating function y at \gamma^* we obtain y( \gamma^*), the point on line \mathcal{L}_2 that is closest to fixed point x(\bar{\lambda}) \in \mathcal{L}_1

y( \gamma^*) = \displaystyle y_0 + P_v  ( x(\bar{\lambda}) - y_0 ),

where

P_v = \displaystyle \frac{v v^T}{v^T v}

is a symmetric, idempotent (P_v^2 = P_v) projection matrix, and P_v ( x(\bar{\lambda}) - y_0 ) is the orthogonal projection of vector x(\bar{\lambda}) - y_0 onto vector v. It can be shown that vectors  x(\bar{\lambda}) - y( \gamma^*) and v are orthogonal (see appendix B).

For example, let us consider the n=3 case: we have two lines in \mathbb{R}^3, \mathcal{L}_1 and \mathcal{L}_2, where \mathcal{L}_2 lies on plane \mathcal{P}. Pictorially, we have

Two lines in 3-space

We would like to find the point on line \mathcal{L}_2 that is closest to fixed point x(\bar{\lambda}) \in \mathcal{L}_1. Projecting vector x(\bar{\lambda}) - y_0 orthogonally onto vector v, we obtain

Two lines in 3-space - distance between fixed point in L1 and L2The image above illustrates rather nicely that vectors  x(\bar{\lambda}) - y( \gamma^*) and v are orthogonal (see appendix B). The distance between fixed point x(\bar{\lambda}) and line \mathcal{L}_2 is

\textbf{dist} (x(\bar{\lambda}), \mathcal{L}_2) = \| x(\bar{\lambda}) - y( \gamma^*) \|_2,

where y( \gamma^*) = \displaystyle y_0 + P_v  ( x(\bar{\lambda}) - y_0 ). Please note that plane \mathcal{P} serves no purpose other than to make it easier to visualize in 3-d (plane \mathcal{P} gives an impression of depth). In other words, plane \mathcal{P} has some “artistic value”, but no “mathematical value”.

__________

Distance between two lines

Let us now vary both \lambda and \gamma. As we vary \lambda and \gamma, we travel along lines \mathcal{L}_1 and \mathcal{L}_2, respectively. Since both parameters are free, the distance between free points x(\lambda) \in \mathcal{L}_1 and y(\gamma) \in \mathcal{L}_2 is

\textbf{dist} (x(\lambda), y(\gamma)) = \| x(\lambda) - y(\gamma) \|_2,

which is a function of both \lambda and \gamma. The distance between lines \mathcal{L}_1 and \mathcal{L}_2 is thus

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{(\lambda, \gamma) \in \mathbb{R}^2} \| x(\lambda) - y(\gamma) \|_2.

The original problem of computing the distance between two lines has thus been cast as an optimization problem: we would like to find the pair (\lambda, \gamma) \in \mathbb{R}^2 which minimizes \textbf{dist} (x(\lambda), y(\gamma)). Once we have found the pair of minimizers (\lambda^*, \gamma^*), the distance between \mathcal{L}_1 and \mathcal{L}_2 is

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \| x(\lambda^*) - y(\gamma^*) \|_2.

The pair (\lambda^*, \gamma^*) is not necessarily unique. For example, if the lines are parallel to each other, then there are infinitely many pairs of minimizers (\lambda^*, \gamma^*).

Trying to find the pair (\lambda, \gamma) \in \mathbb{R}^2 which minimizes \textbf{dist} (x(\lambda), y(\gamma)) is a one-stage 2-dimensional optimization problem. However, we can solve this optimization problem in two stages, where at each stage we must solve a 1-dimensional optimization problem

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2.

Note that

\displaystyle \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2 = \textbf{dist} (x(\lambda), \mathcal{L}_2)

is the distance between point x(\lambda) and line \mathcal{L}_2, and thus

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \textbf{dist} (x(\lambda), \mathcal{L}_2).

We already know how to compute the distance between a point on line \mathcal{L}_1 and line \mathcal{L}_2: given a value of \lambda, the choice of \gamma which minimizes \textbf{dist} (x(\lambda), y(\gamma)) is \gamma = \Gamma (\lambda), where function \Gamma : \mathbb{R} \rightarrow \mathbb{R} is defined as (see appendix A)

\Gamma (\lambda) =  \displaystyle \frac{v^T  ( x(\lambda) - y_0 )}{ v^T v},

and the point on line \mathcal{L}_2 which is closest to x(\lambda) is

y(\Gamma (\lambda)) = y_0 + P_v \left( x(\lambda) - y_0 \right).

Therefore, for a given value of \lambda, we have that the distance between x(\lambda) and line \mathcal{L}_2 is

\textbf{dist} (x(\lambda), \mathcal{L}_2) = \textbf{dist} (x(\lambda), y(\Gamma(\lambda))) = \| x(\lambda) - y( \Gamma (\lambda)) \|_2.

Note that \textbf{dist} (x(\lambda), \mathcal{L}_2) is a function of \lambda only, as we have already optimized function \textbf{dist} (x(\lambda), y(\gamma)) for \gamma. Finally, the distance between lines \mathcal{L}_1 and \mathcal{L}_2 can be found by solving a 1-dimensional optimization problem in \lambda

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \|x(\lambda) - y( \Gamma (\lambda) )\|_2.

Once we have found the \lambda^* which minimizes the distance \| x(\lambda) - y( \Gamma (\lambda)) \|_2, the distance between lines \mathcal{L}_1 and \mathcal{L}_2 is simply

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \| x(\lambda^*) - y( \Gamma (\lambda^*)) \|_2.

Note that:

  • \textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = 0: lines \mathcal{L}_1 and \mathcal{L}_2 do intersect.
  • \textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) > 0: lines \mathcal{L}_1 and \mathcal{L}_2 do not intersect.

The value of \textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) can be written in terms of points x_0, y_0 and direction vectors u, v, as I will show on some future posts.

__________

The shortest distance game

It might seem a bit obscure that the one-stage 2-dimensional optimization problem

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{(\lambda, \gamma) \in \mathbb{R}^2} \| x(\lambda) - y(\gamma) \|_2

is equivalent to a two-stage optimization problem where at each stage one must solve a 1-dimensional optimization problem

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2.

Personally, I like to look at this from a game-theoretic viewpoint. Let us play a sequential game! We shall call the shortest distance game. I play first, then you play:

  1. I choose a value for \lambda \in \mathbb{R}, I tell you what \lambda I chose.
  2. You then choose \gamma = \Gamma (\lambda) such that \textbf{dist} (x(\lambda), y(\gamma)) is minimized.

Note that you are required to play optimally, i.e., to minimize distance. Hence you have no freedom in the choice of \gamma: you choose \gamma as a response to my choice of \lambda. Your choice is now coupled to mine, and thus, we are left with only one degree of freedom, the choice of \lambda. I now choose the value of \lambda which minimizes \| x(\lambda) - y( \Gamma (\lambda)) \|_2

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \| x(\lambda) - y( \Gamma (\lambda)) \|_2.

Fortunately, I can use calculus to find the minimizer \lambda^*. Trying all values in \mathbb{R} for parameter \lambda would be rather tedious ;-)

__________

Appendix A

Note that the value of parameter \gamma which minimizes the distance \| x(\bar{\lambda}) - y(\gamma) \|_2 is the same that minimizes the squared distance \| x(\bar{\lambda}) - y(\gamma) \|_2^2. Finding the minimizer of the squared norm is much easier, thus we introduce a cost function C : \mathbb{R} \rightarrow [0, \infty) defined as

C(\gamma) = \displaystyle \frac{1}{2} \| x(\bar{\lambda}) - y(\gamma) \|_2^2.

Differentiating with respect to \gamma, we obtain

\begin{array}{rl} \displaystyle \frac{d C}{d \gamma} (\gamma) &= \displaystyle ( x(\bar{\lambda}) - y(\gamma) )^T (- v)\\ &=  \displaystyle v^T ( y(\gamma) - x(\bar{\lambda}) ) \\ &= \displaystyle v^T \left( \gamma v - ( x(\bar{\lambda}) - y_0 ) \right) \\ &=\displaystyle \|v\|_2^2 \gamma - v^T ( x(\bar{\lambda}) - y_0 ).\\ \end{array}

The first derivative of C(\cdot) must vanish at extrema. From the geometry of the problem, we know that there are no maxima (note that the cost function is unbounded above). Therefore we can find minima by checking the first derivative alone, without having to check the second derivative. We have

\displaystyle \frac{d C}{d \gamma} (\gamma^*) = 0 \Rightarrow \displaystyle \|v\|_2^2 \gamma^* - v^T ( x(\bar{\lambda}) - y_0 ) = 0,

and since v \neq 0_n, the minimizer is

\gamma^* =  \displaystyle \frac{v^T  ( x(\bar{\lambda}) - y_0 )}{ v^T v}.

Note that this minimizer depends on the value of fixed parameter \bar{\lambda}.

__________

Appendix B

We would like to show that vectors x(\bar{\lambda}) - y( \gamma^*) and v are orthogonal. Recall that y( \gamma^*) = \displaystyle y_0 + P_v ( x(\bar{\lambda}) - y_0 ), and thus

\displaystyle x(\bar{\lambda}) - y( \gamma^*) = ( I_n - P_v) \left(x(\bar{\lambda}) - y_0\right).

Multiplying both sides on the left by P_v, we obtain

\displaystyle P_v \left(x(\bar{\lambda}) - y( \gamma^*)\right) = P_v \left( I_n - P_v\right) \left(x(\bar{\lambda}) - y_0\right)

and since P_v (I_n - P_v) = P_v - P_v^2 = O_{n \times n} (recall that P_v^2 = P_v), then we finally have

P_v \left( x(\bar{\lambda}) - y( \gamma^*) \right) = 0_n,

which means that vectors x(\bar{\lambda}) - y( \gamma^*) and v are orthogonal.

__________

Related:

Matrix decompositions with sparsity constraints

December 12, 2008

Suppose we are given a finite square matrix A \in \mathbb{R}^{n \times n}. We would like to factorize A into N (where N is finite) matrices

A = W_{N-1} W_{N-2} \ldots W_ {2} W_{1} W_{0},

where factors W_0, W_1, \ldots, W_{N-1}  \in \mathbb{R}^{n \times n} are required to have a certain sparsity pattern, i.e., a subset of their n^2 entries must be zero.

Definition: Let \mathcal{I}_n = \{1, 2, \ldots, n\} and \mathcal{I}_n^2 = \mathcal{I}_n \times \mathcal{I}_n. A sparsity pattern \mathcal{S}_n is a set of ordered pairs (i,j) \in \mathcal{I}_n^2 that indicate which entries of a n \times n matrix are constrained to be zero.

Note that \mathcal{S}_n \subseteq \mathcal{I}_n^2. If there are no sparsity constraints, then \mathcal{S}_n is an empty set. Set \mathcal{I}_n^2 \setminus \mathcal{S}_n contains the indices of the entries of a n \times n matrix which are unconstrained, i.e., “free”.

__________

Problem: Given a matrix A \in \mathbb{R}^{n \times n} and a particular sparsity pattern \mathcal{S}_n, how can we find if it is possible to factorize A into N factor matrices (where N is finite) whose sparsity pattern is the specified one? If such factorization is possible, how can we compute each factor W_1, W_2, \ldots, W_{N-1} and N?

__________

This problem is easy to pose, but I have found it very hard to attack. I have tried various approaches… and none worked. Does anyone have any ideas?

__________

Example: suppose we are given a 5 \times 5 real matrix

A = \left[\begin{array}{ccccc} 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ \end{array}\right].

We would like to factorize matrix A into N factors \{W_k\}_{k =0}^{N-1} \in \mathbb{R}^{5 \times 5} whose sparsity pattern is represented pictorially by

\left[\begin{array}{ccccc} \times & \times & 0 & 0 & 0\\ \times & \times & \times & 0 & 0\\ 0 & \times & \times & \times & 0\\ 0 & 0 & \times & \times & \times\\ 0 & 0 & 0 & \times & \times\\\end{array}\right],

where a “\times” in the (i,j) entry means that the (i,j) entry is unconstrained, i.e., it is allowed to take any value in \mathbb{R}. If you are familiar with numerical PDE’s, you certainly encountered matrices with this sparsity pattern when solving diffusion or wave equations in one spatial dimension using finite difference methods.

__________

Even though this example seems fairly “simple” (it’s only a 5 \times 5 matrix after all!), I have not yet found an approach that works.

The Connection Machine

December 11, 2008

One day when I was having lunch with Richard Feynman, I mentioned to him that I was planning to start a company to build a parallel computer with a million processors. His reaction was unequivocal, “That is positively the dopiest idea I ever heard.” For Richard a crazy idea was an opportunity to either prove it wrong or prove it right. Either way, he was interested.

Danny Hillis [1]

When Danny Hillis (b. 1956) was a graduate student at MIT‘s Artificial Intelligence (AI) lab in the early 1980s, he built a massively parallel supercomputer which he named the Connection Machine. At the time, computers were still somewhat ENIAC-like: sequential, single-processor machines. By contrast, the Connection Machine had over 65,000 processors! The Connection Machine was the topic of Danny Hillis’ PhD thesis [2], supervised by Prof. Gerald Jay Sussman.

Connection Machine 2

[ Connection Machine 2 ]

Danny Hillis‘ motivation to build such a parallel computer stemmed from his desire to explore new paradigms of computation outside the traditional von Neumann architecture. Hillis believed that Computer Science needed fresh ideas, and he turned to Physics as a source of inspiration [3].

Although the Connection Machine was initially intended for applications in Artificial Intelligence, later versions of it (CM-2, CM-5) were used with remarkable success in Computational Physics (e.g., Lattice QCD).

In 1983 Danny Hillis co-founded a company named Thinking Machines Corporation (TMC)  in order to exploit possible commercial applications of the Connection Machine paradigm. Competition from established supercomputer firms forced TMC into bankruptcy in 1993.

__________

References:

[1] Danny Hillis, Richard Feynman and the Connection Machine.

[2] Danny Hillis, The Connection Machine, PhD thesis, Department of Electrical Engineering and Computer Science (EECS), MIT, 1988.

[3] Danny Hillis, New Computer Architectures and their relationship to Physics, International Journal of Theoretical Physics, Vol 21, Nos. 3/4, 1982.

__________

Related:

Building a polynomial from its roots II

December 10, 2008

Suppose we are given an n-tuple of (not necessarily distinct) complex numbers (w_1, w_2, \ldots, w_n) and we construct a monic univariate polynomial of degree n over field \mathbb{C} whose n roots are the elements of (w_1, w_2, \ldots, w_n)

p_n(z) = \displaystyle\prod_{i=1}^n (z - w_i) = \displaystyle\sum_{m=0}^n a(m,n) z^m,

where a(m,n) \in \mathbb{C} is the coefficient of monomial z^m of degree m in polynomial p_n(z) of degree n. This double-indexing might seem a bit confusing, but its usefulness will be made evident later on.

Usually, we take a univariate polynomial and try to find its roots. Now we are interested in the reverse operation, i.e., we would like to find the values of the coefficients \{a(m,n)\}_{m=0}^{n} of polynomial p_n(z) in terms of the roots (w_1, w_2, \ldots, w_n). Note that every permutation of the n-tuple (w_1, w_2, \ldots, w_n) results in the same monic polynomial of degree n.

_____

Constructing polynomials

Say we know the coefficients of polynomial p_k(z) of degree k. Then we can write the coefficients \{a(m, k+1)\}_{m=0}^{k+1} of polynomial p_{k+1}(z) in terms of the coefficients  \{a(m, k)\}_{m=0}^{k} and the (k+1)-th element of n-tuple (w_1, w_2, \ldots, w_n). On my previous post on this topic, I derived the recursion

a(m,k+1) = \displaystyle\left\{\begin{array}{rl} - w_{k+1} a(0,k) & \text{if} \quad{} m = 0\\ a(m-1,k) - w_{k+1} a(m,k) & \text{if} \quad{} 1 \leq m \leq k\\ a(k,k) & \text{if} \quad{} m=k+1\\ \end{array}\right.

Let us start with a monic polynomial of degree zero, p_0(z) = 1, whose only coefficient is a(0,0) = 1. We take the first element of the n-tuple (w_1, w_2, \ldots, w_n) and construct the monic polynomial p_1(z) = z - w_1 of degree 1 using the recursion above. Then we take the 2nd root w_2 and construct the monic polynomial of degree 2 whose roots are w_1, w_2

p_2(z) = z^2 - (w_1 + w_2) z + w_1 w_2.

Iterating successively, we obtain a monic polynomial p_n(z) of degree n whose roots are w_1, w_2, \ldots, w_n.

_____

A graphical approach

Consider the following set of 15 nodes arranged in 5 columns and 5 rows. Each node is labelled with an ordered pair (i,j), where i is the row index and j is the column index.

graph-labels1

We now attach a complex number a(i,j) to each node (i,j). The coefficient of monomial z^m in polynomial p_n(z) can be thought of as the number attached to node (m,n). For example, the coefficients of polynomial p_2(z) are the numbers attached to the nodes of the 3rd column (note that we count rows and columns starting at the zero index, like in the C programming language). The double-indexing now makes some sense ;-)

Finally, we add directed edges linking nodes in adjacent columns, and obtain the following directed graph (which reminds me of the trellis diagrams used in convolutional coding)

graph-with-directed-edges

Recall the recursion formula which allows us to write the coefficients of polynomial p_{k+1}(z) in terms of the coefficients of polynomial p_{k}(z)

a(m,k+1) = \displaystyle\left\{\begin{array}{rl} - w_{k+1} a(0,k) & \text{if} \quad{} m = 0\\ a(m-1,k) - w_{k+1} a(m,k) & \text{if} \quad{} 1 \leq m \leq k\\ a(k,k) & \text{if} \quad{} m=k+1\\ \end{array}\right..

Since the coefficients a(\cdot, \cdot) are attached to the nodes of the directed graph depicted above, we can use the recursion formula to compute the values attached to the nodes of a given column as a function of the values attached to the nodes in the column to the left.

This graphical approach is (most likely) of no practical use. Nonetheless, this approach does reveal the “structure” of the process of constructing a polynomial from its roots, and such structure is of much greater beauty than cumbersome algebraic manipulation.

_____

An example

Suppose we want to compute the coefficients of the monic polynomial p_4(z) = (z - 1)^4, which has a root of multiplicity 4 at z = 1. In this case we have w_1 = w_2 = w_3 = w_4 = 1. From the well-known binomial theorem, we know that

p_4(z) = z^4 - 4 z^3 + 6 z^2 - 4 z + 1,

and therefore, the coefficients are a(0,4) = 1, a(1,4) = -4, a(2,4) = 6, a(3,4) = -4, and a(4,4) = 1. Note that the absolute values of these coefficients are the binomial coefficients and can be found in the 5th row of Pascal’s triangle.

Let us now construct this polynomial one step at a time using the graphical approach described above. We start with monic polynomial p_0(z) = 1, whose only coefficient is a(0,0) = 1

example-step0and then we compute the coefficients of the monic polynomial p_1(z) = z-1 using the recursion formula. Graphically, this means that the values of the 2 nodes in the 2nd column can be computed from a(0,0), as follows

example-step1

where the directed edges now have weights. For example, the edge linking nodes (0,0) and (0,1) has weight -1, while the edge linking (0,0) and (1,1) has weight 1. If the weight of an edge is equal to 1, then we do not label the edge in order to avoid cluttering the diagram.

From the coefficients of p_1(z) and the second root, w_2 = 1, we can compute the coefficients of p_2(z) using the recursion again, i.e., the values of the nodes in the 3rd column can be computed from the values in the nodes in the 2nd column, as follows

example-step2

The values of the nodes in the 3rd column are known, and since the 3rd root is w_3 = 1 we can now obtain the coefficients of p_3(z)

example-step3

and since w_4 = 1, we can finally obtain the coefficients of the monic polynomial p_4(z), which are the values of the nodes in the 5th column

example-step4

In this example the roots I chose have zero imaginary parts, but this approach still works if the roots have nonzero imaginary parts. This method could be used to compute the coefficients of polynomials of degrees larger than 4, of course.


Follow

Get every new post delivered to your Inbox.

Join 47 other followers