Posts Tagged ‘Mathematics’

Matrix decompositions with sparsity constraints

December 12, 2008

Suppose we are given a finite square matrix A \in \mathbb{R}^{n \times n}. We would like to factorize A into N (where N is finite) matrices

A = W_{N-1} W_{N-2} \ldots W_ {2} W_{1} W_{0},

where factors W_0, W_1, \ldots, W_{N-1}  \in \mathbb{R}^{n \times n} are required to have a certain sparsity pattern, i.e., a subset of their n^2 entries must be zero.

Definition: Let \mathcal{I}_n = \{1, 2, \ldots, n\} and \mathcal{I}_n^2 = \mathcal{I}_n \times \mathcal{I}_n. A sparsity pattern \mathcal{S}_n is a set of ordered pairs (i,j) \in \mathcal{I}_n^2 that indicate which entries of a n \times n matrix are constrained to be zero.

Note that \mathcal{S}_n \subseteq \mathcal{I}_n^2. If there are no sparsity constraints, then \mathcal{S}_n is an empty set. Set \mathcal{I}_n^2 \setminus \mathcal{S}_n contains the indices of the entries of a n \times n matrix which are unconstrained, i.e., “free”.

__________

Problem: Given a matrix A \in \mathbb{R}^{n \times n} and a particular sparsity pattern \mathcal{S}_n, how can we find if it is possible to factorize A into N factor matrices (where N is finite) whose sparsity pattern is the specified one? If such factorization is possible, how can we compute each factor W_1, W_2, \ldots, W_{N-1} and N?

__________

This problem is easy to pose, but I have found it very hard to attack. I have tried various approaches… and none worked. Does anyone have any ideas?

__________

Example: suppose we are given a 5 \times 5 real matrix

A = \left[\begin{array}{ccccc} 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 & 1\\ \end{array}\right].

We would like to factorize matrix A into N factors \{W_k\}_{k =0}^{N-1} \in \mathbb{R}^{5 \times 5} whose sparsity pattern is represented pictorially by

\left[\begin{array}{ccccc} \times & \times & 0 & 0 & 0\\ \times & \times & \times & 0 & 0\\ 0 & \times & \times & \times & 0\\ 0 & 0 & \times & \times & \times\\ 0 & 0 & 0 & \times & \times\\\end{array}\right],

where a “\times” in the (i,j) entry means that the (i,j) entry is unconstrained, i.e., it is allowed to take any value in \mathbb{R}. If you are familiar with numerical PDE’s, you certainly encountered matrices with this sparsity pattern when solving diffusion or wave equations in one spatial dimension using finite difference methods.

__________

Even though this example seems fairly “simple” (it’s only a 5 \times 5 matrix after all!), I have not yet found an approach that works.

Building a polynomial from its roots II

December 10, 2008

Suppose we are given an n-tuple of (not necessarily distinct) complex numbers (w_1, w_2, \ldots, w_n) and we construct a monic univariate polynomial of degree n over field \mathbb{C} whose n roots are the elements of (w_1, w_2, \ldots, w_n)

p_n(z) = \displaystyle\prod_{i=1}^n (z - w_i) = \displaystyle\sum_{m=0}^n a(m,n) z^m,

where a(m,n) \in \mathbb{C} is the coefficient of monomial z^m of degree m in polynomial p_n(z) of degree n. This double-indexing might seem a bit confusing, but its usefulness will be made evident later on.

Usually, we take a univariate polynomial and try to find its roots. Now we are interested in the reverse operation, i.e., we would like to find the values of the coefficients \{a(m,n)\}_{m=0}^{n} of polynomial p_n(z) in terms of the roots (w_1, w_2, \ldots, w_n). Note that every permutation of the n-tuple (w_1, w_2, \ldots, w_n) results in the same monic polynomial of degree n.

_____

Constructing polynomials

Say we know the coefficients of polynomial p_k(z) of degree k. Then we can write the coefficients \{a(m, k+1)\}_{m=0}^{k+1} of polynomial p_{k+1}(z) in terms of the coefficients  \{a(m, k)\}_{m=0}^{k} and the (k+1)-th element of n-tuple (w_1, w_2, \ldots, w_n). On my previous post on this topic, I derived the recursion

a(m,k+1) = \displaystyle\left\{\begin{array}{rl} - w_{k+1} a(0,k) & \text{if} \quad{} m = 0\\ a(m-1,k) - w_{k+1} a(m,k) & \text{if} \quad{} 1 \leq m \leq k\\ a(k,k) & \text{if} \quad{} m=k+1\\ \end{array}\right.

Let us start with a monic polynomial of degree zero, p_0(z) = 1, whose only coefficient is a(0,0) = 1. We take the first element of the n-tuple (w_1, w_2, \ldots, w_n) and construct the monic polynomial p_1(z) = z - w_1 of degree 1 using the recursion above. Then we take the 2nd root w_2 and construct the monic polynomial of degree 2 whose roots are w_1, w_2

p_2(z) = z^2 - (w_1 + w_2) z + w_1 w_2.

Iterating successively, we obtain a monic polynomial p_n(z) of degree n whose roots are w_1, w_2, \ldots, w_n.

_____

A graphical approach

Consider the following set of 15 nodes arranged in 5 columns and 5 rows. Each node is labelled with an ordered pair (i,j), where i is the row index and j is the column index.

graph-labels1

We now attach a complex number a(i,j) to each node (i,j). The coefficient of monomial z^m in polynomial p_n(z) can be thought of as the number attached to node (m,n). For example, the coefficients of polynomial p_2(z) are the numbers attached to the nodes of the 3rd column (note that we count rows and columns starting at the zero index, like in the C programming language). The double-indexing now makes some sense ;-)

Finally, we add directed edges linking nodes in adjacent columns, and obtain the following directed graph (which reminds me of the trellis diagrams used in convolutional coding)

graph-with-directed-edges

Recall the recursion formula which allows us to write the coefficients of polynomial p_{k+1}(z) in terms of the coefficients of polynomial p_{k}(z)

a(m,k+1) = \displaystyle\left\{\begin{array}{rl} - w_{k+1} a(0,k) & \text{if} \quad{} m = 0\\ a(m-1,k) - w_{k+1} a(m,k) & \text{if} \quad{} 1 \leq m \leq k\\ a(k,k) & \text{if} \quad{} m=k+1\\ \end{array}\right..

Since the coefficients a(\cdot, \cdot) are attached to the nodes of the directed graph depicted above, we can use the recursion formula to compute the values attached to the nodes of a given column as a function of the values attached to the nodes in the column to the left.

This graphical approach is (most likely) of no practical use. Nonetheless, this approach does reveal the “structure” of the process of constructing a polynomial from its roots, and such structure is of much greater beauty than cumbersome algebraic manipulation.

_____

An example

Suppose we want to compute the coefficients of the monic polynomial p_4(z) = (z - 1)^4, which has a root of multiplicity 4 at z = 1. In this case we have w_1 = w_2 = w_3 = w_4 = 1. From the well-known binomial theorem, we know that

p_4(z) = z^4 - 4 z^3 + 6 z^2 - 4 z + 1,

and therefore, the coefficients are a(0,4) = 1, a(1,4) = -4, a(2,4) = 6, a(3,4) = -4, and a(4,4) = 1. Note that the absolute values of these coefficients are the binomial coefficients and can be found in the 5th row of Pascal’s triangle.

Let us now construct this polynomial one step at a time using the graphical approach described above. We start with monic polynomial p_0(z) = 1, whose only coefficient is a(0,0) = 1

example-step0and then we compute the coefficients of the monic polynomial p_1(z) = z-1 using the recursion formula. Graphically, this means that the values of the 2 nodes in the 2nd column can be computed from a(0,0), as follows

example-step1

where the directed edges now have weights. For example, the edge linking nodes (0,0) and (0,1) has weight -1, while the edge linking (0,0) and (1,1) has weight 1. If the weight of an edge is equal to 1, then we do not label the edge in order to avoid cluttering the diagram.

From the coefficients of p_1(z) and the second root, w_2 = 1, we can compute the coefficients of p_2(z) using the recursion again, i.e., the values of the nodes in the 3rd column can be computed from the values in the nodes in the 2nd column, as follows

example-step2

The values of the nodes in the 3rd column are known, and since the 3rd root is w_3 = 1 we can now obtain the coefficients of p_3(z)

example-step3

and since w_4 = 1, we can finally obtain the coefficients of the monic polynomial p_4(z), which are the values of the nodes in the 5th column

example-step4

In this example the roots I chose have zero imaginary parts, but this approach still works if the roots have nonzero imaginary parts. This method could be used to compute the coefficients of polynomials of degrees larger than 4, of course.

Building a polynomial from its roots

November 22, 2008

Suppose we are given a set of n \geq 2 distinct real numbers \mathcal{R} = \{r_1, r_2, \ldots, r_n\}, and we build a monic univariate polynomial (over field \mathbb{R}) of degree n whose n distinct roots are the elements of set \mathcal{R}

p_n(x) = \displaystyle\prod_{i=1}^n (x - r_i) = \displaystyle\sum_{m=0}^n a_{m,n} x^m,

where a_{m,n} \in \mathbb{R} is the coefficient of monomial x^m. The second subscript in a_{m,n} tells us that the coefficient corresponds to a polynomial whose degree is n.

Problem: for n \geq 2, what are the values of the coefficients \{a_{m,n}\}_{m=0}^{n} in terms of the roots \{r_i\}_{i=1}^n?

At first glance, this problem seems elementary. However, I had never thought of it and I found it quite interesting. I prefer to think of examples before attempting to see the “big picture”, so let us consider the simplest cases.

_____

Case n=1

The monic polynomial of degree 1 whose root is r_1 is

p_1(x) = x - r_1

and the coefficients are

\left[\begin{array}{c}a_{0,1}\\a_{1,1}\end{array}\right] = \left[\begin{array}{c} -r_1\\1\end{array}\right].

_____

Case n=2

The monic polynomial of degree 2 whose roots are r_1, r_2 is

\begin{array}{rl} p_2(x) &= (x - r_1) (x - r_2)\\ &= x^2 - (r_1 + r_2) x + r_1 r_2\end{array}

and the coefficients are

\left[\begin{array}{c}a_{0,2}\\a_{1,2}\\a_{2,2}\end{array}\right] = \left[\begin{array}{c}r_1 r_2\\-(r_1+r_2)\\1\end{array}\right].

_____

Case n=3

The monic polynomial of degree 3 whose roots are r_1, r_2, r_3 is

\begin{array}{rl} p_3(x) &= (x - r_1) (x - r_2) (x - r_3)\\ &= x^3 - (r_1+r_2 + r_3) x^2 + (r_1 r_2 + r_1 r_3 + r_2 r_3) x - r_1 r_2 r_3\\\end{array}

and the coefficients are

\left[\begin{array}{c}a_{0,3}\\a_{1,3}\\a_{2,3}\\a_{3,3}\end{array}\right] = \left[\begin{array}{c} - r_1 r_2 r_3\\ r_1 r_2 + r_1 r_3 + r_2 r_3\\ -(r_1+r_2 + r_3)\\1\end{array}\right].

_____

The coefficients \{a_{m,n}\}_{m=0}^{n} can be computed in a recursive fashion. Consider the monic polynomials

p_{k}(x) = \displaystyle\prod_{i=1}^{k} (x - r_i)

and

p_{k+1}(x) = \displaystyle\prod_{i=1}^{k+1} (x - r_i) = (x - r_{k+1}) \displaystyle\prod_{i=1}^k (x - r_i),

then p_{k+1} =  (x - r_{k+1}) p_k(x), and thus

\begin{array}{rl} p_{k+1}(x) &= (x - r_{k+1})\displaystyle\sum_{m=0}^k a_{m,k} x^m\\ &= \displaystyle\sum_{m=0}^k a_{m,k} x^{m+1} - \displaystyle\sum_{m=0}^k r_{k+1} a_{m,k} x^m.\\\end{array}

Note that

\begin{array}{rl} \displaystyle\sum_{m=0}^k a_{m,k} x^{m+1} &= \displaystyle\sum_{m=1}^k a_{m-1,k} x^m + a_{k,k} x^{k+1}\\ \displaystyle\sum_{m=0}^{k} r_{k+1} a_{m,k} x^m &= r_{k+1} a_{0,k} + \displaystyle\sum_{m=1}^k r_{k+1} a_{m,k} x^m\end{array}

and therefore

p_{k+1}(x) = a_{k,k} x^{k+1} + \displaystyle\sum_{m=1}^k \left(a_{m-1,k} - r_{k+1} a_{m,k}\right) x^m - r_{k+1} a_{0,k}.

We can write p_{k+1}(x) in the expanded form in terms of the coefficients \{a_{m,k+1}\}_{m=0}^{k+1}

p_{k+1}(x) = a_{k+1,k+1} x^{k+1} + \displaystyle\sum_{m=1}^k a_{m,k+1} x^m + a_{0,k+1}

and therefore we can write the coefficients \{a_{m,k+1}\}_{m=0}^{k+1} in terms of the coefficients \{a_{m,k}\}_{m=0}^{k}, as follows

a_{m,k+1} = \displaystyle\left\{\begin{array}{rl} - r_{k+1} a_{0,k} & \text{if} \quad{} m = 0\\ a_{m-1,k} - r_{k+1} a_{m,k} & \text{if} \quad{} 1 \leq m \leq k\\ a_{k,k} & \text{if} \quad{} m=k+1\\ \end{array}\right.

Hence, starting with the zero-degree monic polynomial p_0(x), whose only coefficient is a_{0,0} = 1, we can build p_1(x), the monic polynomial of degree 1 whose root is r_1. From p_1(x), and given a second root r_2, we can build p_2(x), the monic polynomial of degree 2 whose roots are r_1, r_2. Iterating successively, we can build p_n(x), the monic polynomial of degree n whose roots are r_1,r_2, \ldots, r_n.

Representing complex numbers as 2×2 matrices

November 9, 2008

We can represent complex numbers as 2 \times 2 real matrices, such that arithmetic operations (+, -, \times, \div) on complex numbers become equivalent to arithmetic operations on their matrix representations.

Consider a matrix-valued function M: \mathbb{C} \rightarrow \mathbb{R}^{2 \times 2} defined as

M(z) = \left[\begin{array}{cc} \Re\{z\} & -\Im\{z\}\\ \Im\{z\} & \Re\{z\}\\ \end{array}\right],

where \Re\{z\} and \Im\{z\} are the real and imaginary parts of z \in \mathbb{C}, respectively. It can be shown that for all z_1, z_2 \in \mathbb{C}, we have that

  • M(z_1 + z_2) = M(z_1) + M(z_2).
  • M(z_1 - z_2) = M(z_1) - M(z_2).
  • M(z_1 z_2) = M(z_1) M(z_2).
  • M(z_1 / z_2) = [M(z_2)]^{-1} M(z_1), if z_2 \neq 0.

There seems to be an homomorphism here.

__________

Matrix Representation of a Complex Number

Let z = a + i b, where i = \sqrt{-1} and a,b \in \mathbb{R}, be a complex number. Its matrix representation is

M(z) = M(a + i b) = \left[\begin{array}{cc} a & -b\\ b & a\\ \end{array}\right] = a \left[\begin{array}{cc} 1 & 0\\ 0 & 1\\ \end{array}\right] + b \left[\begin{array}{cc} 0 & -1\\ 1 & 0\\ \end{array}\right].

Let matrices I_2, Q \in \mathbb{R}^{2 \times 2} be

I_2 = \left[\begin{array}{cc} 1 & 0\\ 0 & 1\\ \end{array}\right], \quad{} Q = \left[\begin{array}{cc} 0 & -1\\ 1 & 0\\ \end{array}\right],

then the matrix representation of z \in \mathbb{C} is

M(z) = M(a + i b) = a I_2 + b Q.

Note that both I_2 and Q are orthogonal matrices, and therefore Q^{-1} = Q^T. Since I_2 is the 2 \times 2 identity matrix, we have that I_2 and Q commute, i.e., I_2 Q = Q I_2.

The matrix representation of the complex conjugate of z is

M(z^*) = M(a - i b) = a I_2 - b Q = M^T (z),

which follows from the fact that matrix Q is skew-symmetric, i.e., Q^T = -Q. The complex conjugate of a complex number can thus be computed by transposing its matrix representation. Since Q^T = Q^{-1} and Q^T = - Q, we have Q^2 = - I_2.

The square of the magnitude of a complex number, |z|^2 = zz^*, is equal to the determinant of its matrix representation, as follows

\det(M(z)) = \det(M(a + i b)) = \begin{vmatrix} a & -b \\ b & a\\ \end{vmatrix} = a^2 + b^2 = |z|^2.

A complex number written in the cartesian form, z = a + i b, can be converted to its polar form, z = \rho e^{i \theta}, where \rho = \sqrt{a^2 + b^2} and \theta = \arg(z). Given a complex number in the polar form z = \rho e^{i \theta}, its cartesian form is

z = \rho (\cos(\theta) + i \sin(\theta)),

and its matrix representation is

M(z) = \rho \left[\begin{array}{cc} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta)\\ \end{array}\right] = \rho R(\theta),

where

R(\theta) = \left[\begin{array}{cc} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta)\\ \end{array} \right]

is a rotation matrix that rotates a vector in \mathbb{R}^2 by a counterclockwise angle \theta radians, leaving the vector’s \|\cdot\|_2 unchanged. Matrix R(\theta) is orthogonal, and thus R^{-1} (\theta) = R^T (\theta). Given that

\sin(-\theta) = - \sin(\theta), \quad{} \cos(-\theta) = \cos(\theta),

we have R(-\theta) = R^T (\theta) = R^{-1}. As \det(R(\theta)) = 1 for any value of \theta, we can conclude that R(\theta) \in \text{SO}(2).

Finally, note that Q = R(\frac{\pi}{2}), i.e., matrix Q rotates a vector in \mathbb{R}^2 \pi/2 radians counter-clockwise. If we multiply a complex number by i, we rotate it by a counterclockwise angle of \pi/2 radians

i z = i \rho e^{i \theta} = \rho e^{i (\theta + \frac{\pi}{2})},

and in terms of matrix representations, this is equivalent to multiplying M(z) by matrix Q. It does not matter if we multiply by Q on the left or on the right, because matrices I_2 and Q do commute.

__________

Arithmetic Operations

Let z_1 = a_1 + i b_1 and z_2 = a_2 + i b_2 be two complex numbers. Their addition/subtraction is thus

z_1 \pm z_2 = (a_1 \pm a_2) + i (b_1 \pm b_2),

and from the rules of matrix addition / subtraction it can easily be shown  that M(z_1) \pm M(z_2) = M(z_1 \pm z_2).

It’s easier to compute products of complex numbers if we write them in the polar form, i.e., z_1 = \rho_1 e^{i \theta_1} and z_2 = \rho_2 e^{i \theta_2}. The product is thus z_1 z_2 = \rho_1 \rho_2 e^{i (\theta_1 + \theta_2)}. The product of the matrix representations is

M(z_1) M(z_2) = \rho_1 R(\theta_1) \rho_2 R(\theta_2) = \rho_1 \rho_2 R(\theta_1 + \theta_2) = M(z_1 z_2),

since R(\theta_1) R(\theta_2) = R(\theta_1 + \theta_2). Dividing z_1 \in \mathbb{C} by z_2 \in \mathbb{C} \setminus \{0\} yields

\displaystyle\frac{z_1}{z_2} = \frac{\rho_1}{\rho_2} e^{i (\theta_1 - \theta_2)},

whose corresponding matrix representation is

\displaystyle M(z_1 / z_2) = \frac{\rho_1}{\rho_2} R(\theta_1 - \theta_2).

Note that

M^{-1} (z_2) M(z_1) = \displaystyle \frac{\rho_1}{\rho_2} R(-\theta_2) R(\theta_1) = \frac{\rho_1}{\rho_2} R(\theta_1 - \theta_2)

and therefore M^{-1} (z_2) M(z_1) = M(z_1 / z_2).

We can conclude that arithmetic operations on complex numbers are equivalent to arithmetic operations on the matrices representing such complex numbers.

I have absolutely no idea what use this matrix representation could possibly have, but it’s quite interesting nonetheless.

__________

Related:

Thinking about permutations

January 18, 2008

Consider a set S with |S|=n elements (where all elements of S are distinct). We can build ordered lists of n elements by taking elements from set S and using them (with no repetition) to build n-tuples. With n elements we can build n! distinct n-tuples.

Let us think in terms of vectors in \mathbb{R}^n, where each vector represents a particular ordered list of the elements of some set with n elements. If x, y \in \mathbb{R}^n are permutations of one another, then they contain the same elements, but in a different order; consequently, for every p \in \mathbb{N} we have

\displaystyle\sum_{i=1}^n x_i^p = \displaystyle\sum_{i=1}^n y_i^p.

Let I: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \{0,1\} be an indicator function, such that I(x,y) = 1 if and only if x,y are permutations of one another (I(x,y)=0 otherwise). Let g_p: \mathbb{R}^n \rightarrow \mathbb{R} be defined as

g_p(z) = \displaystyle\sum_{i=1}^n z_i^p,

for all p \in \mathbb{N}. We can then conclude that

I(x,y) = 1 \Rightarrow g_p(x) = g_p(y), \quad{} \forall p \in \mathbb{N}.

The condition g_p(x) = g_p(y), \quad{} \forall p \in \mathbb{N} is necessary. Is it sufficient?! In other words, if g_p(x) = g_p(y), \quad{} \forall p \in \mathbb{N} is true, can we conclude that vectors x,y are permutations of one another?

This is an interesting problem. Think about it!

Update (Jan. 25, 2008): this post was featured on the 25th edition (Silver Jubilee) of the Carnival of Mathematics.


Follow

Get every new post delivered to your Inbox.

Join 76 other followers