Optimal Account Balancing

June 20, 2009 by Rod Carvalho

Suppose that Alice owes Bob $5 and is owed $10 by Bob. These debts can be represented as the weighted directed graph

IOU graph - Alice and Bobwhich we will call an IOU graph. Alice pays Bob $5 and Bob pays Alice $10. The debts are paid and $15 have been exchanged in a total of two transactions. Let us now consider the following cash flow diagram

FBD - Alice and Boband, accounting for the net cash flows, we obtain

FBD - Alice and Bob 2which illustrates that if Bob paid Alice $5, then the debts would still be paid, though only $5 would be exchanged and only one transaction would be made. This looks much cleverer and more efficient than exchanging $15 in a total of two transactions. Hence, the following IOU graph

IOU graph - Alice and Bob 2is “equivalent” to the original IOU graph. In both IOU graphs, if debts are paid then Alice will be $5 up, while Bob will be $5 down.

We add a third agent, Charlie. Consider now the IOU graph

IOU graph - Alice Bob and Charliewhere:

  • Alice must pay a total of $20 and be paid a total of $30.
  • Bob must pay a total of $15 and be paid a total of $20.
  • Charlie must pay a total of $35 and be paid a total of $20.

Hence, after the debts are paid, Alice will have $30 – $20 = $10 more in her account, Bob will have $20 – $15 = $5 more in his account, and Charlie will have $20 – $35 = -$15 more (= $15 less) in his account. Note that $10 + $5 – $15 = $0, i.e., money is conserved. In total, $70 have been exchanged in six transactions. We can do better than that. For example, we could pay all debts in just three transactions, as illustrated in the following IOU graph

iou-graph-alice-bob-and-charlie-2bwhere a total of $20 are exchanged. Note that Alice is $10 up, Bob is $5 up, and Charlie is $15 down, just like before. We can do even better than this. If Charlie pays Bob’s debt to Alice, then all debts can be paid in just two transactions as shown in the IOU graph below

IOU graph - Alice Bob and Charlie 3where only $15 are exchanged. This is an “optimal” debt-paying scheme because the debts cannot be paid in less than two transactions, nor exchanging less than $15. Do you agree?

__________

Debts and IOU Graphs

An IOU graph is a weighted directed graph given by an ordered triple \mathcal{G} = (\mathcal{N}, \mathcal{E}, w), where:

  • \mathcal{N} is the node set. Each node of the IOU graph represents a different agent.
  • \mathcal{E} \subset \mathcal{N} \times \mathcal{N} is the edge set. Each directed edge is an ordered pair (i,j), where i is the edge’s tail and j is the edge’s head. If (i,j) \in \mathcal{E} then node i owes money to node j. We do not allow nodes to owe money to themselves and, therefore, for each i \in \mathcal{N} we must have (i,i) \notin \mathcal{E}.
  • w: \mathcal{E} \rightarrow \mathbb{R}_{\geq 0} is a weight function that assigns positive weights to each edge in \mathcal{E}. The weight of edge (i, j) \in \mathcal{E} is given by w\left( (i, j) \right) > 0, and it tells us the amount of money that node i owes to node j.

We define the in-neighborhood and out-neighborhood of node i as follows

\begin{array}{ll} \mathcal{N}_i^{in} &= \displaystyle\{ j \in \mathcal{N} \mid (j,i) \in \mathcal{E}\}\\\\ \mathcal{N}_i^{out} &= \displaystyle\{ j \in \mathcal{N} \mid (i,j) \in \mathcal{E}\}\end{array}

and illustrate them below

IN & OUT neighborhoods

The divergence [1] of node i of graph \mathcal{G} is a scalar defined as

\textbf{div}_i (\mathcal{G}) = \displaystyle \sum_{j \in \mathcal{N}_i^{out}} x_{ij} - \displaystyle \sum_{j \in \mathcal{N}_i^{in}} x_{ji}

and it gives us the amount of money node i owes to other nodes minus the amount of money node i is owed to by other nodes. The divergence of a node can be viewed as the discrete, graph-theoretic analogue of the divergence operator in vector calculus. We define also the divergence vector of graph \mathcal{G} as the vector whose i-th component is \textbf{div}_i (\mathcal{G})

\textbf{div} (\mathcal{G}) = \left[\begin{array}{c} \vdots\\ \textbf{div}_i (\mathcal{G})\\ \vdots \\ \end{array}\right].

The divergence vector of an IOU graph \mathcal{G} tells us how much money each node in the graph will lose after debts have been paid. We say that two IOU graphs (on the same node set) are equi-divergent if their divergence vectors are the same. This means that, although the debt relationships are different, both IOU graphs result in every node getting paid what he is owed and paying what he owes.

__________

Optimal Account Balancing

Suppose we are given an IOU graph \mathcal{G}_0 = (\mathcal{N}, \mathcal{E}_0, w_0). We would like to find a new IOU graph \mathcal{G} = (\mathcal{N}, \mathcal{E}, w) on the same node set \mathcal{N}, such that the two IOU graphs are equi-divergent

\textbf{div} (\mathcal{G}) = \textbf{div} (\mathcal{G}_0).

Moreover, we would like the new IOU graph to be “optimal” in some sense. I can think of the two following optimality criteria:

minimum money flow: we want to minimize the total amount of money being exchanged between the nodes. This is the same as minimizing the sum of the (positive) weights of the edges of the IOU graph.

least number of transactions: we want to minimize the total number of transactions being made. This is the same as minimizing the number of edges in the IOU graph, i.e., maximizing the sparsity of the IOU graph.

Can you think of other interesting optimality criteria?

It would be reasonable to consider the case where the transaction fees are nonzero. If these are variable and proportional to the amount of money being transferred, then we would like to minimize the amount of money flowing in the network of agents. If the transaction fees are fixed, then we would like to minimize the total number of transactions being made. For example, in the two examples above the minimum money flow solution is the same as the least number of transactions solution. Is this always the case?

In order to find the optimal IOU graph, we start with a weighted directed graph on node set \mathcal{N} whose edge set is the one of the directed complete graph on \mathcal{N}. Hence, every two distinct nodes are connected by two directed edges with opposite directions. The weight of edge (i,j) is denoted by x_{ij} \geq 0. Since some of these weights might be zero, there might be edges in this graph which do not represent debt relationships! Therefore, although this graph is weighted and directed, it is not necessarily an IOU graph. We optimize for the n (n-1) nonnegative variables \{x_{ij}\} and, in the end, we remove the edges whose weight is zero so that we obtain an IOU graph. I view this approach as somewhat akin to scaffolding ;-)

For example, considering the minimum money flow criterion, we have the following minimum cost flow problem [1]

\begin{array}{ll} \text{minimize} & \displaystyle \sum_{i \in \mathcal{N}}\sum_{j \neq i} x_{ij}\\\\ \text{subject to} & \displaystyle \sum_{j \neq i} x_{ij} - \displaystyle \sum_{j \neq i} x_{ji} = \textbf{div}_i (\mathcal{G}_0), \quad{} \forall i \in \mathcal{N}\\\\ & x_{ij} \geq 0, \quad{} \forall i \neq j \in \mathcal{N}\\\\\end{array}

which can be solved via linear programming [2]. In words: we want to find the debts x_{ij} \geq 0 such that all debts are paid (each node should be paid what it is owed minus what it owes to other nodes) and the total amount of debt is minimized, which results in the total amount of money flowing to be also minimized.

The least number of transactions criterion is much harder to tackle because we want to maximize the sparsity of the graph, while ensuring that the graph and the original IOU graph equi-divergent. We want to optimize the topology of the graph and, hence, a combinatorial flavor is added to the problem. I will write more about it on future posts.

__________

A Numerical Example

Let us again consider the 3-agent example where the agents are Alice, Bob and Charlie. The IOU graph is

IOU graph - Alice Bob and Charliewhere the node set is \mathcal{N} = \{\text{a}, \text{b}, \text{c}\}. We introduce vector

x = (x_{ab}, x_{ac}, x_{ba}, x_{bc}, x_{ca}, x_{cb})

where x_{ij} \geq 0 is the amount of money node i owes to node j. The divergence of the new graph at each node must be the same as the divergence of the given IOU graph at each node, which yields the equality constraint C x = b, where

C = \left[\begin{array}{rrrrrr} -1 & -1 & 1 & 0 & 1 & 0\\ 1 & 0 & -1 & -1 & 0 & 1\\ 0 & 1 & 0 & 1 & -1 & -1\\\end{array}\right]

is the incidence matrix of the graph, and b = [\begin{array}{ccc} 10 & 5 & -15\end{array}]^T is the opposite of the divergence vector of the given IOU graph above. This equality constraint guarantees that the given IOU graph and the new graph we obtain are equi-divergent.

Let I_6 be the 6 \times 6 identity matrix, let 1_6 be the 6-dimensional vector of ones, and let 0_6 be the 6-dimensional vector of zeros. The 6 nonnegativity constraints x_{ij} \geq 0 can be written in matrix form as (-I_6) x \leq 0_6. Considering the minimum money flow criterion, we obtain the linear program

\begin{array}{ll} \text{minimize} & 1_6^T x\\\\ \text{subject to} & C x = b, \quad{} (-I_6) x \leq 0_6\\\\\end{array}

Note that the equality constraint C x = b is equivalent to two inequality constraints: C x \leq b and C x \geq b. Thus, we can transform the linear program above into a linear program with inequality constraints only

\begin{array}{ll} \text{minimize} & 1_6^T x\\\\ \text{subject to} & \left[\begin{array}{r} C\\ -C\\ -I_6\\\end{array}\right] x \leq \left[\begin{array}{r} b\\ -b\\ 0_6\\\end{array}\right]\\\\\end{array}

If you have MATLAB, you can easily solve these linear programs using function linprog. The following Python 2.5 script solves the latter linear program using CVXOPT:


from cvxopt import matrix
from cvxopt import solvers

# -------------------
# auxiliary matrices
# -------------------

# defines incidence matrix
C = matrix([ [-1.0, 1.0, 0.0],
             [-1.0, 0.0, 1.0],
             [1.0, -1.0, 0.0],
             [0.0, -1.0, 1.0],
             [1.0, 0.0, -1.0],
             [0.0, 1.0, -1.0] ])

# defines desired divergence vector
d = matrix( [-10.0, -5.0, 15.0] )

# defines 6x6 identity matrix   
Id = matrix([ [1, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1] ])

# ---------------
# linear program
# ---------------

# defines objective vector
c = matrix(1.0, (6,1))

# defines inequality constraint matrices
G = matrix( [C, -C, -Id] )
h = matrix( [-d, d, matrix(0.0, (6,1))] )

# solves linear program
solvers.options['show_progress'] = True
solution = solvers.lp(c, G, h)

# prints solution
print solution['x']

The solution is

x = \left[\begin{array}{r}5.86\text{e-}008\\ -1.50\text{e-}008\\ -1.36\text{e-}008\\ -1.99\text{e-}008\\ 1.00\text{e+}001\\ 5.00\text{e+}000\\\end{array}\right],

which is “contaminated” with numerical noise. Adjusting the tolerances of the numerical solver, one can reduce the numerical noise and obtain the true solution, which is x = (0, 0, 0, 0, 10, 5). Removing the edges with zero weight, we obtain an IOU graph with 2 edges

IOU graph - Alice Bob and Charlie 3

and, hence, only two transactions are required, and only $15 need to be exchanged. This is the exact same solution we obtained previously via heuristic methods! :-)

__________

Concluding Remarks

The two optimal debt-paying schemes considered on this post rely on an “accountant” who has perfect global information about all the debts in the network of agents. This accountant solves an optimization problem to find out what transactions should be carried out. This is a centralized approach, as the decision power is centralized in the accountant. Another (and arguably more interesting) approach would be a decentralized one, where the nodes have only local information (i.e., each node only knows who owes him money and to whom he owes money) and where each node solves a local optimization problem using, for example, message-passing algorithms. I will write about it on future posts.

The only modeling of this problem I have found was on Demetri Spanos‘ PhD thesis [3]. If you know of any papers on this topic, please let me know.

Last but not least, allow me to add a personal note: I first started thinking of this problem years ago, while going to dinners with friends and trying to figure out how to split the bill in an efficient manner. The real world is a great source of inspiration when it comes to interesting problems.

__________

References

[1]  Dimitri Bertsekas, Network Optimization: Continuous and Discrete Models, Athena Scientific, 1998.

[2] Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

[3] Demetri Spanos, Distributed gradient systems and dynamic coordination, Ph.D. thesis, California Institute of Technology, December 2005.

Interview with C.N. Yang and Jim Simons

June 16, 2009 by Rod Carvalho

Here’s an interview with renowned theoretical physicist and Nobel laureate Chen Ning Yang (of Yang-Mills fame) and renowned mathematician, hedge fund manager and philanthropist James Harris Simons (of Chern-Simons fame) on the deep connections between Geometry and Physics, the new Simons Center for Geometry and Physics, their careers and their views on various topics.

[ video courtesy of Stony Brook University ]

Related:

How would you design this contract?

May 15, 2009 by Rod Carvalho

"I give you all my risk and you give me all your capital. That's a mathematically optimal exchange."[ source ]

Let us start with the following joke:

Two business partners asked their lawyer to hold $20,000, making him promise to get both of their signatures before disbursing any of it. As soon as one partner left town, the other pressed the lawyer for $15,000, citing an emergency. The lawyer reluctantly gave it to him, and he disappeared. On his return, the other partner was irate, so the lawyer explained that he had donated the $15,000 out of his own pocket.

“Then give me the $20,000 you’re holding,” said the partner.

“All right,” said the lawyer. “Give me the two signatures.”

__________

These two business partners and their lawyer should have signed a contract. If the lawyer disbursed any of the partners’ money without having both of their signatures, he would face legal action for breach of contract. However, this contract would still not be immune to collusion: one of the two partners could split the money with the lawyer and disappear, and the other partner would never have the two signatures required by contract to have his money back. How would you design a better contract? By “better contract”, I mean:

  • a contract that would NOT give any of the parties involved an incentive to break it.
  • a contract that would be robust (and, ideally, immune) to collusion.

I make no claims that such a “perfect contract” even exists, but maybe one could devise a contract that would be better than the aforementioned one. Purely for recreational purposes, I propose a challenge: try to design the best contract you can think of!

If you have any ideas, please feel free to share them.

Radiology Art

April 4, 2009 by Rod Carvalho

The Radiology Art project was started by artist and medical student Satre Stuelke in the Summer of 2007. Stuelke uses a CT scanner to acquire DICOM images of various objects (e.g., electronic apparatuses, toys) and then processes them with OsiriX Imaging Software and Adobe Photoshop. Colors are assigned based on the varying densities of materials present throughout the object. The post-processed images are stunning.

For example, here’s a CT scan of a Teslovak KT88S vacuum tube:

CT scan of a Teslovak KT88S vacuum tube[ image courtesy of Satre Stuelke ]

Related:

The Trials of J. Robert Oppenheimer

February 27, 2009 by Rod Carvalho

In the days of my almost infinitely prolonged adolescence, I hardly took an action, hardly did anything that did not arouse in me a very great sense of revulsion and of wrong. My feeling about myself was always one of extreme discontent.

- J. Robert Oppenheimer (1904-1967)

The Trials of J. Robert Oppenheimer is a two-hour documentary film on the life of the brilliant and charismatic American nuclear physicist J. Robert Oppenheimer (1904-1967), from his youth and emergence as one of America’s leading physicists, to his leadership of the Manhattan Project, and his most tragic humiliation during the anti-Communist witch-hunt in the 1950s.

oppenheimer

The film was produced by David Grubin, and it features actor David Strathairn as Oppenheimer. You can watch it online for free. The transcript is also available.

__________

Related:

Distance between two words

January 31, 2009 by Rod Carvalho

Consider a finite set \Sigma with |\Sigma| distinct elements which we will call symbols. We will call set \Sigma an alphabet. A word (also known as string) of length n over set \Sigma is formed by constructing an n-tuple of symbols from alphabet \Sigma

w = (w_1, w_2, w_3, \ldots, w_n)

where w_i \in \Sigma for all  i \in \{1, 2, \ldots, n\}. A string of length n will be called an n-string. The set of all n-strings is given by the cartesian product

\Sigma^n = \Sigma \times \Sigma \times \ldots \times \Sigma = \displaystyle\prod_{i=1}^n \Sigma.

For example, in coding theory one usually works with binary words, i.e., words over alphabet \mathbb{F}_2 = \{0,1\}. The set of all binary words of length n is \mathbb{F}_2^n = \{0,1\}^n, whose cardinality is | \mathbb{F}_2^n | = 2^n.

In this post we will work only with words over the alphabet

\Sigma = \{\text{a} ,\text{b} ,\text{c} ,\text{d} ,\text{e} ,\text{f} ,\text{g} ,\text{h} ,\text{i} ,\text{j} ,\text{k} ,\text{l} ,\text{m} ,\text{n} ,\text{o} ,\text{p} ,\text{q} ,\text{r} ,\text{s} ,\text{t} ,\text{u} ,\text{v} ,\text{w} ,\text{x} ,\text{y} ,\text{z}\}

which is known as the Latin alphabet, and whose 26 symbols are known as letters. For example,  \textbf{cat} and \textbf{car} are represented by the 3-strings s_1 = (\text{c}, \text{a}, \text{t}) and s_2 = (\text{c}, \text{a}, \text{r}), respectively, where s_1, s_2 \in \Sigma^3.

__________

Hamming distance

Given two strings of equal length, the Hamming distance between them is the number of positions for which the corresponding symbols are different. In other words, the Hamming distance between two strings of equal length is the minimum number of symbol substitutions required to change one string into the other.

Let function d_{H} : \Sigma^n \times \Sigma^n \rightarrow \mathbb{Z}_0^{+} be defined as

d_{H} (v, w) = \displaystyle \sum_{i=1}^n \mu \left( v_i, w_i \right)

where function \mu : \Sigma \times \Sigma \rightarrow \{0,1\} is defined as

\mu (a, b) = \displaystyle\left\{\begin{array}{rl} 0 & \text{if} \quad{} a = b\\ 1 & \text{if} \quad{} a \neq b\\ \end{array}\right.

then d_H (s_1, s_2) is the Hamming distance between n-strings s_1 and s_2. For example, if s_1 = (\text{c}, \text{a}, \text{t}) and s_2 = (\text{c}, \text{a}, \text{r}), then we have d_H (s_1, s_2) = 1 because s_1 and s_2 only differ in one symbol (the last one).

Suppose we are given an n-string over the Latin alphabet \Sigma. If we replace the letter at position i with another letter in \Sigma, where i \in \{1, 2, \ldots, n\}, we can obtain a total of |\Sigma| - 1 new n-strings, each within Hamming distance 1 of the original n-string. Let us call these n-strings neighbors of the original n-string. Thus, an n-string has a total of (|\Sigma| - 1) n neighbors.

__________

Word graphs

If two n-strings are neighbors of one another when their Hamming distance is equal to 1, then a graphical approach would be the natural way to go. Let G = (N,E) be an undirected graph whose set of nodes is N = \Sigma^n (set of all n-strings over alphabet \Sigma), and whose set of edges is E \subset \Sigma^n \times \Sigma^n. Two nodes are connected by an edge if their Hamming distance is equal to 1. Hence the total number of nodes in G is |N| = |\Sigma^n| = 26^n, and the total number of edges in G is

|E| = \displaystyle\frac{n | \Sigma^n|}{2} \left(| \Sigma| - 1\right) = \displaystyle\frac{n}{2} | \Sigma|^n\left(| \Sigma| - 1\right),

because each node has (|\Sigma| - 1) n neighbors, there are |\Sigma^n| nodes, and each edge is counted twice so we need to divide the total by 2. Note that G is a regular graph (of degree |\Sigma| - 1) because all nodes have the same number of neighbors.

For example, let n=3 and let us consider the 3-string (\text{c}, \text{a}, \text{r}), which has (|\Sigma| - 1) n = 75 neighbors. Out of these 75 words, I could count 21 words which actually mean something in the English language:

bar, ear, far, gar, jar, mar, oar, par, tar, war, yar, cor, cur, cab, cad, cam, can, cap, cat, caw, cay.

Please let me know if I forgot to include any word in this list. These words form a sub-graph of the graph of all 3-strings. Note that the words

bar, car, ear, far, gar, jar, mar, oar, par, tar, war, yar

are all neighbors of one another, and hence, they form a clique of size 12. The words

car, cor, cur

form a clique of size 3. The words

cab, cad, cam, can, cap, car, cat, caw, cay

are all neighbors of one another as well, and thus they form a clique of size 9. The subgraph formed by the word “car” and its neighbors which make sense in the English language is illustrated below, where the three cliques are easy to identify

Undirected Word Graph with 3 Cliques

[ graph generated with Graphviz ]

It is not practical to depict the whole graph of 3-strings over alphabet \Sigma since it contains |\Sigma^3| = |\Sigma|^3 = 26^3 = 17576 nodes and |E| = 659100 edges!

__________

Python code

In case you are wondering: no, I did not type the 26 symbols of the Latin alphabet in \LaTeX. That would have been way too cumbersome. All it took was one single line of Python code:


", ".join([("\text{" + chr(i) + "}") for i in range(ord('a'),ord('z')+1)])

Suppose we would like to find all strings within Hamming distance of 1 of string (\text{c}, \text{a}, \text{t}). The following Python script automates the process:


def MutateString(s, alphabet, index):
    li = []
    for ch in alphabet:
        if ch != s[index]:
            li.append(s[:index] + ch + s[index+1:])
    return li

# build Latin alphabet (lowercase symbols only)
Sigma = [chr(i) for i in range(ord('a'),ord('z')+1)]

# obtain list of mutated strings
print "Mutate 1st symbol:\n" + ", ".join(MutateString('cat', Sigma, 0)) + '\n'
print "Mutate 2nd symbol:\n" + ", ".join(MutateString('cat', Sigma, 1)) + '\n'
print "Mutate 3rd symbol:\n" + ", ".join(MutateString('cat', Sigma, 2)) + '\n'

The graph of the word “car” and its neighbors was generated also by a Python script using the GvGen library.


def HammingDistance(s1, s2):
    """Computes the Hamming distance between strings s1 and s2"""
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

import gvgen

# list of words
words = ['car', 'bar', 'ear', 'far', 'gar', 'jar', 'mar', 'oar', 'par', 'tar', 'war', 'yar', 'cor', 'cur', 'cab', 'cad', 'cam', 'can', 'cap', 'cat', 'caw', 'cay']

# creates the new graph instance
graph = gvgen.GvGen()
graph.smart_mode = 1

# creates graph nodes
nodes = []
for elem in words:
    nodes.append(graph.newItem(elem))

# creates graph edges
for i in nodes:
    for j in nodes:

        # extracts the labels of each node
        node1 = i['properties']['label']
        node2 = j['properties']['label']

        # links nodes i and j if the Hamming
        # distance between their labels is 1
        if HammingDistance(node1,node2) == 1:
            graph.newLink(i,j)

# outputs the Graphviz code to a .dot file
filename = "test.dot"
FileOut = open(filename,"w")
graph.dot(FileOut)
FileOut.close()

The script outputs a DOT file which can be read by Graphviz. In particular, I used GVedit (with layout engine = circo) to generate the diagram depicted above.

Remark: these snippets of code worked with Python 2.5.2. I can not guarantee that they work with other versions.

Composing music with Noteflight

January 16, 2009 by Rod Carvalho

A start-up company named Noteflight has created a web application that allows one to compose and play music within one’s web browser. For example, here’s a sheet of Mozart’s Piano Sonata No. 11 (Klaviersonate Nr. 11):

Mozart's Klaviersonate Nr.11

You can play it and it sounds quite good for a web app.

This is an interesting idea: not only can one write music without having to buy an application, but one can also share one’s creations with the entire community of Noteflight’s users. You can use the web app for free, although you need to create an account if you want to compose music (no sign up is required if you just want to play music).

This web app raises some interesting questions regarding copyright infringement. If I write a sheet of a song which was composed by someone else, am I violating any copyright law? If I share that music sheet with other Noteflight users, am I doing anything illegal? Will Noteflight be held liable for promoting the distribution of copyrighted material?

Colored Water Figures

January 1, 2009 by Rod Carvalho

Fotoopa is a photographer from Belgium who has shot amazing high speed photos of dancing and splashing colored water drops. Here’s an example of a droplet captured in mid-splash in vivid detail:

Colored Waterfigure - by Fotoopa

[ photo courtesy of fotoopa ]

Apart from the standard photographic equipment (camera, lenses, multiple flashes), fotoopa uses some high-tech hardware more commonly found on a physics research lab: lasers, LEDs, homemade electronics, power amplifiers, microcontrollers, FPGAs. It’s very, very impressive indeed.

Distance between two lines

December 28, 2008 by Rod Carvalho

Suppose we are given two lines in \mathbb{R}^n, \mathcal{L}_1 and \mathcal{L}_2. Line \mathcal{L}_1 passes through the point x_0 \in \mathbb{R}^n and its direction is given by vector u \in \mathbb{R}^n \setminus \{0_n\}

\mathcal{L}_1 = \left\{ x_0 + \lambda u \mid \lambda \in \mathbb{R} \right\}.

Line \mathcal{L}_2 passes through the point y_0 \in \mathbb{R}^n and its direction is given by vector v \in \mathbb{R}^n \setminus \{0_n\}

\mathcal{L}_2 = \left\{ y_0 + \gamma v \mid \gamma \in \mathbb{R} \right\}.

What is the distance between lines \mathcal{L}_1 and \mathcal{L}_2?

__________

Lines as functions

Let vector-valued functions x : \mathbb{R} \rightarrow \mathbb{R}^n and y : \mathbb{R} \rightarrow \mathbb{R}^n be defined as

x(\lambda) = x_0 + \lambda u, \quad{} y(\gamma) = y_0 + \gamma v.

Lines \mathcal{L}_1 and \mathcal{L}_2 are thus the images of set \mathbb{R} under functions x : \mathbb{R} \rightarrow \mathbb{R}^n and y : \mathbb{R} \rightarrow \mathbb{R}^n, respectively

\mathcal{L}_1 = \left\{ x(\lambda) \mid \lambda \in \mathbb{R} \right\}, \quad{} \mathcal{L}_2 = \left\{ y(\gamma) \mid \gamma \in \mathbb{R} \right\}.

Function x: \mathbb{R} \rightarrow \mathbb{R}^n maps each \lambda \in  \mathbb{R} into point x(\lambda) \in \mathcal{L}_1. Therefore, choosing a particular value of \lambda corresponds to choosing a point on line \mathcal{L}_1, and vice versa (because function x is injective). Thus, each point on the line \mathcal{L}_1 can be unambiguously specified by a value of \lambda. Similarly, following the same line of thought, each point on the line \mathcal{L}_2 can be unambiguously specified by a value of \gamma.

__________

Distance between two points

In order to compute the distance between lines \mathcal{L}_1 and \mathcal{L}_2, we first need to know how to compute the distance between two points in \mathbb{R}^n, i.e., we need to choose a metric on \mathbb{R}^n. For the sake of simplicity, we will use the Euclidean metric, i.e., we will use the 2-norm to measure lengths and distances. In other words, we will be working in normed vector space (\mathbb{R}^n,  \| \cdot \|_2).

Let us fix parameters \lambda and \gamma, i.e., \lambda = \bar{\lambda} and \gamma = \bar{\gamma}, and compute the distance between fixed points x(\bar{\lambda}) and y(\bar{\gamma})

\textbf{dist} ( x(\bar{\lambda}), y(\bar{\gamma}) ) = \| x( \bar{\lambda} ) - y(\bar{\gamma} )\|_2.

Pictorially, we have that \textbf{dist} ( x(\bar{\lambda}), y(\bar{\gamma}) ) is the length of the line segment connecting fixed points x(\bar{\lambda}) and y(\bar{\gamma})

Distance between 2 fixed points

To each pair of parameters (\lambda, \gamma), it is assigned a non-negative scalar \textbf{dist} ( x(\lambda), y(\gamma) )

(\lambda, \gamma) \mapsto ( x(\lambda), y(\gamma) ) \mapsto \textbf{dist} ( x(\lambda), y(\gamma) ).

__________

Distance between a point and a line

Suppose now that we vary \gamma, while \lambda remains fixed, \lambda = \bar{\lambda}. As we vary \gamma, we travel along line \mathcal{L}_2 and hence the distance between free point y(\gamma) \in \mathcal{L}_2 and fixed point x(\bar{\lambda}) \in \mathcal{L}_1 is given by

\textbf{dist} (x(\bar{\lambda}), y(\gamma)) = \| x(\bar{\lambda}) - y(\gamma)\|_2.

Since \gamma is now free, \textbf{dist} (x(\bar{\lambda}), y(\gamma)) is a function of \gamma.

For example, let us pick three distinct values for \gamma, say \gamma_1, \gamma_2, \gamma_3. The image below depicts the line segments connecting fixed point x(\bar{\lambda}) \in \mathcal{L}_1 and points y(\gamma_1), y(\gamma_2), y(\gamma_3) \in \mathcal{L}_2. The length of these line segments gives us the distance between the fixed point and each of the three points on line \mathcal{L}_2.

Distances between a fixed point in L1 and three points in L2

The distance between fixed point x(\bar{\lambda}) \in \mathcal{L}_1 and line \mathcal{L}_2 is thus

\textbf{dist} (x(\bar{\lambda}), \mathcal{L}_2) = \displaystyle \min_{\gamma \in \mathbb{R}} \| x(\bar{\lambda}) - y(\gamma) \|_2.

We have an optimization problem: we would like to find the value of parameter \gamma \in \mathbb{R} which minimizes function \textbf{dist} (x(\bar{\lambda}), y(\gamma)). This minimizer is (see appendix A)

\gamma^* = \displaystyle \arg \min_{\gamma \in \mathbb{R}} \| x(\bar{\lambda}) - y(\gamma) \|_2 \Rightarrow \gamma^* = \displaystyle \frac{v^T  ( x(\bar{\lambda}) - y_0 )}{ v^T v}.

Evaluating function y at \gamma^* we obtain y( \gamma^*), the point on line \mathcal{L}_2 that is closest to fixed point x(\bar{\lambda}) \in \mathcal{L}_1

y( \gamma^*) = \displaystyle y_0 + P_v  ( x(\bar{\lambda}) - y_0 ),

where

P_v = \displaystyle \frac{v v^T}{v^T v}

is a symmetric, idempotent (P_v^2 = P_v) projection matrix, and P_v ( x(\bar{\lambda}) - y_0 ) is the orthogonal projection of vector x(\bar{\lambda}) - y_0 onto vector v. It can be shown that vectors  x(\bar{\lambda}) - y( \gamma^*) and v are orthogonal (see appendix B).

For example, let us consider the n=3 case: we have two lines in \mathbb{R}^3, \mathcal{L}_1 and \mathcal{L}_2, where \mathcal{L}_2 lies on plane \mathcal{P}. Pictorially, we have

Two lines in 3-space

We would like to find the point on line \mathcal{L}_2 that is closest to fixed point x(\bar{\lambda}) \in \mathcal{L}_1. Projecting vector x(\bar{\lambda}) - y_0 orthogonally onto vector v, we obtain

Two lines in 3-space - distance between fixed point in L1 and L2The image above illustrates rather nicely that vectors  x(\bar{\lambda}) - y( \gamma^*) and v are orthogonal (see appendix B). The distance between fixed point x(\bar{\lambda}) and line \mathcal{L}_2 is

\textbf{dist} (x(\bar{\lambda}), \mathcal{L}_2) = \| x(\bar{\lambda}) - y( \gamma^*) \|_2,

where y( \gamma^*) = \displaystyle y_0 + P_v  ( x(\bar{\lambda}) - y_0 ). Please note that plane \mathcal{P} serves no purpose other than to make it easier to visualize in 3-d (plane \mathcal{P} gives an impression of depth). In other words, plane \mathcal{P} has some “artistic value”, but no “mathematical value”.

__________

Distance between two lines

Let us now vary both \lambda and \gamma. As we vary \lambda and \gamma, we travel along lines \mathcal{L}_1 and \mathcal{L}_2, respectively. Since both parameters are free, the distance between free points x(\lambda) \in \mathcal{L}_1 and y(\gamma) \in \mathcal{L}_2 is

\textbf{dist} (x(\lambda), y(\gamma)) = \| x(\lambda) - y(\gamma) \|_2,

which is a function of both \lambda and \gamma. The distance between lines \mathcal{L}_1 and \mathcal{L}_2 is thus

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{(\lambda, \gamma) \in \mathbb{R}^2} \| x(\lambda) - y(\gamma) \|_2.

The original problem of computing the distance between two lines has thus been cast as an optimization problem: we would like to find the pair (\lambda, \gamma) \in \mathbb{R}^2 which minimizes  \textbf{dist} (x(\lambda), y(\gamma)). Once we have found the pair of minimizers (\lambda^*, \gamma^*), the distance between \mathcal{L}_1 and \mathcal{L}_2 is

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \| x(\lambda^*) - y(\gamma^*) \|_2.

The pair (\lambda^*, \gamma^*) is not necessarily unique. For example, if the lines are parallel to each other, then there are infinitely many pairs of minimizers (\lambda^*, \gamma^*).

Trying to find the pair (\lambda, \gamma) \in \mathbb{R}^2 which minimizes \textbf{dist} (x(\lambda), y(\gamma)) is a one-stage 2-dimensional optimization problem. However, we can solve this optimization problem in two stages, where at each stage we must solve a 1-dimensional optimization problem

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2.

Note that

\displaystyle \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2 = \textbf{dist} (x(\lambda), \mathcal{L}_2)

is the distance between point x(\lambda) and line \mathcal{L}_2, and thus

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \textbf{dist} (x(\lambda), \mathcal{L}_2).

We already know how to compute the distance between a point on line \mathcal{L}_1 and line \mathcal{L}_2: given a value of \lambda, the choice of \gamma which minimizes \textbf{dist} (x(\lambda), y(\gamma)) is \gamma = \Gamma (\lambda), where function \Gamma : \mathbb{R} \rightarrow \mathbb{R} is defined as (see appendix A)

\Gamma (\lambda) =  \displaystyle \frac{v^T  ( x(\lambda) - y_0 )}{ v^T v},

and the point on line \mathcal{L}_2 which is closest to x(\lambda) is

y(\Gamma (\lambda)) = y_0 + P_v \left( x(\lambda) - y_0 \right).

Therefore, for a given value of \lambda, we have that the distance between x(\lambda) and line \mathcal{L}_2 is

\textbf{dist} (x(\lambda), \mathcal{L}_2) = \textbf{dist} (x(\lambda), y(\Gamma(\lambda))) = \| x(\lambda) - y( \Gamma (\lambda)) \|_2.

Note that \textbf{dist} (x(\lambda), \mathcal{L}_2) is a function of \lambda only, as we have already optimized function \textbf{dist} (x(\lambda), y(\gamma)) for \gamma. Finally, the distance between lines \mathcal{L}_1 and \mathcal{L}_2 can be found by solving a 1-dimensional optimization problem in \lambda

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \|x(\lambda) - y( \Gamma (\lambda) )\|_2.

Once we have found the \lambda^* which minimizes the distance \| x(\lambda) - y( \Gamma (\lambda)) \|_2, the distance between lines \mathcal{L}_1 and \mathcal{L}_2 is simply

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \| x(\lambda^*) - y( \Gamma (\lambda^*)) \|_2.

Note that:

  • \textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = 0: lines \mathcal{L}_1 and \mathcal{L}_2 do intersect.
  • \textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) > 0: lines \mathcal{L}_1 and \mathcal{L}_2 do not intersect.

The value of \textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) can be written in terms of points x_0, y_0 and direction vectors u, v, as I will show on some future posts.

__________

The shortest distance game

It might seem a bit obscure that the one-stage 2-dimensional optimization problem

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{(\lambda, \gamma) \in \mathbb{R}^2} \| x(\lambda) - y(\gamma) \|_2

is equivalent to a two-stage optimization problem where at each stage one must solve a 1-dimensional optimization problem

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2.

Personally, I like to look at this from a game-theoretic viewpoint. Let us play a sequential game! We shall call the shortest distance game. I play first, then you play:

  1. I choose a value for \lambda \in \mathbb{R}, I tell you what \lambda I chose.
  2. You then choose \gamma = \Gamma (\lambda) such that \textbf{dist} (x(\lambda), y(\gamma)) is minimized.

Note that you are required to play optimally, i.e., to minimize distance. Hence you have no freedom in the choice of \gamma: you choose \gamma as a response to my choice of \lambda. Your choice is now coupled to mine, and thus, we are left with only one degree of freedom, the choice of \lambda. I now choose the value of \lambda which minimizes  \| x(\lambda) - y( \Gamma (\lambda)) \|_2

\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \| x(\lambda) - y( \Gamma (\lambda)) \|_2.

Fortunately, I can use calculus to find the minimizer \lambda^*. Trying all values in \mathbb{R} for parameter \lambda would be rather tedious ;-)

__________

Appendix A

Note that the value of parameter \gamma which minimizes the distance \| x(\bar{\lambda}) - y(\gamma) \|_2 is the same that minimizes the squared distance \| x(\bar{\lambda}) - y(\gamma) \|_2^2. Finding the minimizer of the squared norm is much easier, thus we introduce a cost function C : \mathbb{R} \rightarrow [0, \infty) defined as

C(\gamma) = \displaystyle \frac{1}{2} \| x(\bar{\lambda}) - y(\gamma) \|_2^2.

Differentiating with respect to \gamma, we obtain

\begin{array}{rl} \displaystyle \frac{d C}{d \gamma} (\gamma) &= \displaystyle ( x(\bar{\lambda}) - y(\gamma) )^T (- v)\\ &=  \displaystyle v^T ( y(\gamma) - x(\bar{\lambda}) ) \\ &= \displaystyle v^T \left( \gamma v - ( x(\bar{\lambda}) - y_0 ) \right) \\ &=\displaystyle \|v\|_2^2 \gamma - v^T ( x(\bar{\lambda}) - y_0 ).\\ \end{array}

The first derivative of C(\cdot) must vanish at extrema. From the geometry of the problem, we know that there are no maxima (note that the cost function is unbounded above). Therefore we can find minima by checking the first derivative alone, without having to check the second derivative. We have

\displaystyle \frac{d C}{d \gamma} (\gamma^*) = 0 \Rightarrow \displaystyle \|v\|_2^2 \gamma^* - v^T ( x(\bar{\lambda}) - y_0 ),

and since v \neq 0_n, the minimizer is

\gamma^* =  \displaystyle \frac{v^T  ( x(\bar{\lambda}) - y_0 )}{ v^T v}.

Note that this minimizer depends on the value of fixed parameter \bar{\lambda}.

__________

Appendix B

We would like to show that vectors x(\bar{\lambda}) - y( \gamma^*) and v are orthogonal. Recall that y( \gamma^*) = \displaystyle y_0 + P_v ( x(\bar{\lambda}) - y_0 ), and thus

\displaystyle x(\bar{\lambda}) - y( \gamma^*) = ( I_n - P_v) \left(x(\bar{\lambda}) - y_0\right).

Multiplying both sides on the left by P_v, we obtain

\displaystyle P_v \left(x(\bar{\lambda}) - y( \gamma^*)\right) = P_v \left( I_n - P_v\right) \left(x(\bar{\lambda}) - y_0\right)

and since P_v (I_n - P_v) = P_v - P_v^2 = O_{n \times n} (recall that P_v^2 = P_v), then we finally have

P_v \left( x(\bar{\lambda}) - y( \gamma^*) \right) = 0_n,

which means that vectors x(\bar{\lambda}) - y( \gamma^*) and v are orthogonal.

__________

Related:

Discrete Mechanics and Optimal Control

December 22, 2008 by Rod Carvalho

Last November 25, Prof. Jerrold Marsden (Caltech) was at UC Berkeley to give a lecture on Discrete Mechanics and Optimal Control (flyer / video):

If you are interested in Discrete Mechanics and Optimal Control, I suggest you take a look at these papers:

Related: