## Archive for the ‘Optimization’ Category

### Distance between two lines

December 28, 2008

Suppose we are given two lines in $\mathbb{R}^n$, $\mathcal{L}_1$ and $\mathcal{L}_2$. Line $\mathcal{L}_1$ passes through the point $x_0 \in \mathbb{R}^n$ and its direction is given by vector $u \in \mathbb{R}^n \setminus \{0_n\}$

$\mathcal{L}_1 = \left\{ x_0 + \lambda u \mid \lambda \in \mathbb{R} \right\}$.

Line $\mathcal{L}_2$ passes through the point $y_0 \in \mathbb{R}^n$ and its direction is given by vector $v \in \mathbb{R}^n \setminus \{0_n\}$

$\mathcal{L}_2 = \left\{ y_0 + \gamma v \mid \gamma \in \mathbb{R} \right\}$.

What is the distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$?

__________

Lines as functions

Let vector-valued functions $x : \mathbb{R} \rightarrow \mathbb{R}^n$ and $y : \mathbb{R} \rightarrow \mathbb{R}^n$ be defined as

$x(\lambda) = x_0 + \lambda u, \quad{} y(\gamma) = y_0 + \gamma v$.

Lines $\mathcal{L}_1$ and $\mathcal{L}_2$ are thus the images of set $\mathbb{R}$ under functions $x : \mathbb{R} \rightarrow \mathbb{R}^n$ and $y : \mathbb{R} \rightarrow \mathbb{R}^n$, respectively

$\mathcal{L}_1 = \left\{ x(\lambda) \mid \lambda \in \mathbb{R} \right\}, \quad{} \mathcal{L}_2 = \left\{ y(\gamma) \mid \gamma \in \mathbb{R} \right\}$.

Function $x: \mathbb{R} \rightarrow \mathbb{R}^n$ maps each $\lambda \in \mathbb{R}$ into point $x(\lambda) \in \mathcal{L}_1$. Therefore, choosing a particular value of $\lambda$ corresponds to choosing a point on line $\mathcal{L}_1$, and vice versa (because function $x$ is injective). Thus, each point on the line $\mathcal{L}_1$ can be unambiguously specified by a value of $\lambda$. Similarly, following the same line of thought, each point on the line $\mathcal{L}_2$ can be unambiguously specified by a value of $\gamma$.

__________

Distance between two points

In order to compute the distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$, we first need to know how to compute the distance between two points in $\mathbb{R}^n$, i.e., we need to choose a metric on $\mathbb{R}^n$. For the sake of simplicity, we will use the Euclidean metric, i.e., we will use the 2-norm to measure lengths and distances. In other words, we will be working in normed vector space $(\mathbb{R}^n, \| \cdot \|_2)$.

Let us fix parameters $\lambda$ and $\gamma$, i.e., $\lambda = \bar{\lambda}$ and $\gamma = \bar{\gamma}$, and compute the distance between fixed points $x(\bar{\lambda})$ and $y(\bar{\gamma})$

$\textbf{dist} ( x(\bar{\lambda}), y(\bar{\gamma}) ) = \| x( \bar{\lambda} ) - y(\bar{\gamma} )\|_2$.

Pictorially, $\textbf{dist} ( x(\bar{\lambda}), y(\bar{\gamma}) )$ is the length of the line segment connecting the fixed points $x(\bar{\lambda})$ and $y(\bar{\gamma})$.

To each pair of parameters $(\lambda, \gamma)$, we assign a non-negative scalar $\textbf{dist} ( x(\lambda), y(\gamma) )$

$(\lambda, \gamma) \mapsto ( x(\lambda), y(\gamma) ) \mapsto \textbf{dist} ( x(\lambda), y(\gamma) )$.
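This mapping is easy to make concrete. Below is a minimal NumPy sketch, with hypothetical points and direction vectors in $\mathbb{R}^3$, that evaluates $\textbf{dist} ( x(\lambda), y(\gamma) )$ for a given pair of parameters:

```python
import numpy as np

# Hypothetical data in R^3: base points and (nonzero) direction vectors.
x0, u = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
y0, v = np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])

def x(lam):
    """Point on line L1 for parameter lambda."""
    return x0 + lam * u

def y(gam):
    """Point on line L2 for parameter gamma."""
    return y0 + gam * v

def dist(lam, gam):
    """Euclidean distance between x(lambda) and y(gamma)."""
    return np.linalg.norm(x(lam) - y(gam))

print(dist(0.0, 0.0))  # distance between x0 and y0: 1.0
```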

__________

Distance between a point and a line

Suppose now that we vary $\gamma$, while $\lambda$ remains fixed, $\lambda = \bar{\lambda}$. As we vary $\gamma$, we travel along line $\mathcal{L}_2$ and hence the distance between free point $y(\gamma) \in \mathcal{L}_2$ and fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$ is given by

$\textbf{dist} (x(\bar{\lambda}), y(\gamma)) = \| x(\bar{\lambda}) - y(\gamma)\|_2$.

Since $\gamma$ is now free, $\textbf{dist} (x(\bar{\lambda}), y(\gamma))$ is a function of $\gamma$.

For example, let us pick three distinct values for $\gamma$, say $\gamma_1, \gamma_2, \gamma_3$. The image below depicts the line segments connecting fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$ and points $y(\gamma_1), y(\gamma_2), y(\gamma_3) \in \mathcal{L}_2$. The length of these line segments gives us the distance between the fixed point and each of the three points on line $\mathcal{L}_2$.

The distance between fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$ and line $\mathcal{L}_2$ is thus

$\textbf{dist} (x(\bar{\lambda}), \mathcal{L}_2) = \displaystyle \min_{\gamma \in \mathbb{R}} \| x(\bar{\lambda}) - y(\gamma) \|_2$.

We have an optimization problem: we would like to find the value of parameter $\gamma \in \mathbb{R}$ which minimizes function $\textbf{dist} (x(\bar{\lambda}), y(\gamma))$. This minimizer is (see appendix A)

$\gamma^* = \displaystyle \arg \min_{\gamma \in \mathbb{R}} \| x(\bar{\lambda}) - y(\gamma) \|_2 \Rightarrow \gamma^* = \displaystyle \frac{v^T ( x(\bar{\lambda}) - y_0 )}{ v^T v}$.

Evaluating function $y$ at $\gamma^*$ we obtain $y( \gamma^*)$, the point on line $\mathcal{L}_2$ that is closest to fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$

$y( \gamma^*) = \displaystyle y_0 + P_v ( x(\bar{\lambda}) - y_0 )$,

where

$P_v = \displaystyle \frac{v v^T}{v^T v}$

is a symmetric, idempotent ($P_v^2 = P_v$) projection matrix, and $P_v ( x(\bar{\lambda}) - y_0 )$ is the orthogonal projection of vector $x(\bar{\lambda}) - y_0$ onto vector $v$. It can be shown that vectors  $x(\bar{\lambda}) - y( \gamma^*)$ and $v$ are orthogonal (see appendix B).
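These formulas translate directly into code. The NumPy sketch below (with hypothetical data) computes $\gamma^*$, the closest point $y(\gamma^*)$, and checks both the idempotence of $P_v$ and the orthogonality claimed in appendix B:

```python
import numpy as np

# Hypothetical data: a fixed point on L1, and line L2 in R^3.
x_bar = np.array([1.0, 2.0, 3.0])          # x(lambda_bar), fixed
y0, v = np.array([0.0, 1.0, 0.0]), np.array([1.0, 1.0, 1.0])

# Minimizer gamma* = v^T (x_bar - y0) / (v^T v)   (appendix A)
gamma_star = v @ (x_bar - y0) / (v @ v)

# Projection matrix P_v = v v^T / (v^T v) and the closest point on L2.
P_v = np.outer(v, v) / (v @ v)
y_star = y0 + P_v @ (x_bar - y0)

# P_v is idempotent, and the residual is orthogonal to v (appendix B).
assert np.allclose(P_v @ P_v, P_v)
assert abs(v @ (x_bar - y_star)) < 1e-12

dist_point_line = np.linalg.norm(x_bar - y_star)
```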

For example, let us consider the $n=3$ case: we have two lines in $\mathbb{R}^3$, $\mathcal{L}_1$ and $\mathcal{L}_2$, where $\mathcal{L}_2$ lies on plane $\mathcal{P}$. Pictorially, we have

We would like to find the point on line $\mathcal{L}_2$ that is closest to fixed point $x(\bar{\lambda}) \in \mathcal{L}_1$. Projecting vector $x(\bar{\lambda}) - y_0$ orthogonally onto vector $v$, we obtain the closest point $y(\gamma^*)$.

The image above illustrates rather nicely that vectors  $x(\bar{\lambda}) - y( \gamma^*)$ and $v$ are orthogonal (see appendix B). The distance between fixed point $x(\bar{\lambda})$ and line $\mathcal{L}_2$ is

$\textbf{dist} (x(\bar{\lambda}), \mathcal{L}_2) = \| x(\bar{\lambda}) - y( \gamma^*) \|_2$,

where $y( \gamma^*) = \displaystyle y_0 + P_v ( x(\bar{\lambda}) - y_0 )$. Please note that plane $\mathcal{P}$ serves no purpose other than to make it easier to visualize in 3-d (plane $\mathcal{P}$ gives an impression of depth). In other words, plane $\mathcal{P}$ has some “artistic value”, but no “mathematical value”.

__________

Distance between two lines

Let us now vary both $\lambda$ and $\gamma$. As we vary $\lambda$ and $\gamma$, we travel along lines $\mathcal{L}_1$ and $\mathcal{L}_2$, respectively. Since both parameters are free, the distance between free points $x(\lambda) \in \mathcal{L}_1$ and $y(\gamma) \in \mathcal{L}_2$ is

$\textbf{dist} (x(\lambda), y(\gamma)) = \| x(\lambda) - y(\gamma) \|_2$,

which is a function of both $\lambda$ and $\gamma$. The distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$ is thus

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{(\lambda, \gamma) \in \mathbb{R}^2} \| x(\lambda) - y(\gamma) \|_2$.

The original problem of computing the distance between two lines has thus been cast as an optimization problem: we would like to find the pair $(\lambda, \gamma) \in \mathbb{R}^2$ which minimizes $\textbf{dist} (x(\lambda), y(\gamma))$. Once we have found the pair of minimizers $(\lambda^*, \gamma^*)$, the distance between $\mathcal{L}_1$ and $\mathcal{L}_2$ is

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \| x(\lambda^*) - y(\gamma^*) \|_2$.

The pair $(\lambda^*, \gamma^*)$ is not necessarily unique. For example, if the lines are parallel to each other, then there are infinitely many pairs of minimizers $(\lambda^*, \gamma^*)$.

Trying to find the pair $(\lambda, \gamma) \in \mathbb{R}^2$ which minimizes $\textbf{dist} (x(\lambda), y(\gamma))$ is a one-stage 2-dimensional optimization problem. However, we can solve this optimization problem in two stages, where at each stage we must solve a 1-dimensional optimization problem

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2$.

Note that

$\displaystyle \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2 = \textbf{dist} (x(\lambda), \mathcal{L}_2)$

is the distance between point $x(\lambda)$ and line $\mathcal{L}_2$, and thus

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \textbf{dist} (x(\lambda), \mathcal{L}_2)$.

We already know how to compute the distance between a point on line $\mathcal{L}_1$ and line $\mathcal{L}_2$: given a value of $\lambda$, the choice of $\gamma$ which minimizes $\textbf{dist} (x(\lambda), y(\gamma))$ is $\gamma = \Gamma (\lambda)$, where function $\Gamma : \mathbb{R} \rightarrow \mathbb{R}$ is defined as (see appendix A)

$\Gamma (\lambda) = \displaystyle \frac{v^T ( x(\lambda) - y_0 )}{ v^T v}$,

and the point on line $\mathcal{L}_2$ which is closest to $x(\lambda)$ is

$y(\Gamma (\lambda)) = y_0 + P_v \left( x(\lambda) - y_0 \right)$.

Therefore, for a given value of $\lambda$, we have that the distance between $x(\lambda)$ and line $\mathcal{L}_2$ is

$\textbf{dist} (x(\lambda), \mathcal{L}_2) = \textbf{dist} (x(\lambda), y(\Gamma(\lambda))) = \| x(\lambda) - y( \Gamma (\lambda)) \|_2$.

Note that $\textbf{dist} (x(\lambda), \mathcal{L}_2)$ is a function of $\lambda$ only, as we have already optimized function $\textbf{dist} (x(\lambda), y(\gamma))$ for $\gamma$. Finally, the distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$ can be found by solving a 1-dimensional optimization problem in $\lambda$

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \|x(\lambda) - y( \Gamma (\lambda) )\|_2$.

Once we have found the $\lambda^*$ which minimizes the distance $\| x(\lambda) - y( \Gamma (\lambda)) \|_2$, the distance between lines $\mathcal{L}_1$ and $\mathcal{L}_2$ is simply

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \| x(\lambda^*) - y( \Gamma (\lambda^*)) \|_2$.

Note that:

• $\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = 0$: lines $\mathcal{L}_1$ and $\mathcal{L}_2$ do intersect.
• $\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) > 0$: lines $\mathcal{L}_1$ and $\mathcal{L}_2$ do not intersect.

The value of $\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2)$ can be written in terms of points $x_0, y_0$ and direction vectors $u, v$, as I will show in future posts.
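In the meantime, the result is easy to check numerically: minimizing $\| x(\lambda) - y(\gamma) \|_2$ is a linear least-squares problem in the pair $(\lambda, \gamma)$, which NumPy solves directly. This is a sketch of that joint formulation, not the closed-form expression promised above; `lstsq` returns a minimizer even when the lines are parallel and the pair $(\lambda^*, \gamma^*)$ is not unique:

```python
import numpy as np

def line_distance(x0, u, y0, v):
    """Distance between lines x0 + lam*u and y0 + gam*v in the 2-norm.

    Minimizing ||x0 + lam*u - (y0 + gam*v)||_2 is linear least squares
    in (lam, gam); lstsq returns a minimizer even for parallel lines.
    """
    x0, u = np.asarray(x0, float), np.asarray(u, float)
    y0, v = np.asarray(y0, float), np.asarray(v, float)
    A = np.column_stack([u, -v])          # unknowns are (lam, gam)
    b = y0 - x0
    (lam, gam), *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.linalg.norm(x0 + lam * u - (y0 + gam * v))

# Skew lines in R^3: the x-axis, and the line {z = 1, x = 0} along y.
d_skew = line_distance([0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0])

# Parallel lines: infinitely many minimizers, distance still well defined.
d_par = line_distance([0, 0, 0], [1, 0, 0], [0, 2, 0], [1, 0, 0])
```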

__________

The shortest distance game

It might seem a bit obscure that the one-stage 2-dimensional optimization problem

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{(\lambda, \gamma) \in \mathbb{R}^2} \| x(\lambda) - y(\gamma) \|_2$

is equivalent to a two-stage optimization problem where at each stage one must solve a 1-dimensional optimization problem

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \min_{\gamma \in \mathbb{R}} \| x(\lambda) - y(\gamma) \|_2$.

Personally, I like to look at this from a game-theoretic viewpoint. Let us play a sequential game! We shall call it the shortest distance game. I play first, then you play:

1. I choose a value for $\lambda \in \mathbb{R}$, I tell you what $\lambda$ I chose.
2. You then choose $\gamma = \Gamma (\lambda)$ such that $\textbf{dist} (x(\lambda), y(\gamma))$ is minimized.

Note that you are required to play optimally, i.e., to minimize distance. Hence you have no freedom in the choice of $\gamma$: you choose $\gamma$ as a response to my choice of $\lambda$. Your choice is now coupled to mine, and thus, we are left with only one degree of freedom, the choice of $\lambda$. I now choose the value of $\lambda$ which minimizes $\| x(\lambda) - y( \Gamma (\lambda)) \|_2$

$\textbf{dist} (\mathcal{L}_1, \mathcal{L}_2) = \displaystyle \min_{\lambda \in \mathbb{R}} \| x(\lambda) - y( \Gamma (\lambda)) \|_2$.

Fortunately, I can use calculus to find the minimizer $\lambda^*$. Trying all values in $\mathbb{R}$ for parameter $\lambda$ would be rather tedious ;-)

__________

Appendix A

Note that the value of parameter $\gamma$ which minimizes the distance $\| x(\bar{\lambda}) - y(\gamma) \|_2$ is the same one that minimizes the squared distance $\| x(\bar{\lambda}) - y(\gamma) \|_2^2$. Finding the minimizer of the squared norm is much easier, so we introduce a cost function $C : \mathbb{R} \rightarrow [0, \infty)$ defined as

$C(\gamma) = \displaystyle \frac{1}{2} \| x(\bar{\lambda}) - y(\gamma) \|_2^2$.

Differentiating with respect to $\gamma$, we obtain

$\begin{array}{rl} \displaystyle \frac{d C}{d \gamma} (\gamma) &= \displaystyle ( x(\bar{\lambda}) - y(\gamma) )^T (- v)\\ &= \displaystyle v^T ( y(\gamma) - x(\bar{\lambda}) ) \\ &= \displaystyle v^T \left( \gamma v - ( x(\bar{\lambda}) - y_0 ) \right) \\ &=\displaystyle \|v\|_2^2 \gamma - v^T ( x(\bar{\lambda}) - y_0 ).\\ \end{array}$

The first derivative of $C(\cdot)$ must vanish at extrema. Since $C$ is a quadratic in $\gamma$ with positive leading coefficient $\frac{1}{2}\|v\|_2^2$, it is strictly convex and unbounded above, so there are no maxima: the unique stationary point is the global minimum, and there is no need to check the second derivative. We have

$\displaystyle \frac{d C}{d \gamma} (\gamma^*) = 0 \Rightarrow \displaystyle \|v\|_2^2 \gamma^* - v^T ( x(\bar{\lambda}) - y_0 ) = 0$,

and since $v \neq 0_n$, the minimizer is

$\gamma^* = \displaystyle \frac{v^T ( x(\bar{\lambda}) - y_0 )}{ v^T v}$.

Note that this minimizer depends on the value of fixed parameter $\bar{\lambda}$.
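As a sanity check on this derivation, one can compare the closed-form derivative against a central finite difference (hypothetical data; the agreement is exact up to rounding because $C$ is quadratic):

```python
import numpy as np

# Hypothetical data: fixed point x(lambda_bar) and line y0 + gamma * v.
x_bar = np.array([2.0, -1.0, 0.5])
y0, v = np.array([1.0, 0.0, 0.0]), np.array([0.0, 3.0, 4.0])

def cost(gam):
    """C(gamma) = (1/2) ||x_bar - (y0 + gam * v)||_2^2"""
    r = x_bar - (y0 + gam * v)
    return 0.5 * (r @ r)

def dC(gam):
    """Closed form from the derivation: ||v||^2 * gamma - v^T (x_bar - y0)."""
    return (v @ v) * gam - v @ (x_bar - y0)

# Central differences agree with the closed-form derivative...
h = 1e-6
for g in (-1.0, 0.0, 2.5):
    central_diff = (cost(g + h) - cost(g - h)) / (2.0 * h)
    assert abs(central_diff - dC(g)) < 1e-4

# ...and the derivative vanishes at the closed-form minimizer.
gamma_star = v @ (x_bar - y0) / (v @ v)
assert abs(dC(gamma_star)) < 1e-12
```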

__________

Appendix B

We would like to show that vectors $x(\bar{\lambda}) - y( \gamma^*)$ and $v$ are orthogonal. Recall that $y( \gamma^*) = \displaystyle y_0 + P_v ( x(\bar{\lambda}) - y_0 )$, and thus

$\displaystyle x(\bar{\lambda}) - y( \gamma^*) = ( I_n - P_v) \left(x(\bar{\lambda}) - y_0\right)$.

Multiplying both sides on the left by $P_v$, we obtain

$\displaystyle P_v \left(x(\bar{\lambda}) - y( \gamma^*)\right) = P_v \left( I_n - P_v\right) \left(x(\bar{\lambda}) - y_0\right)$

and since $P_v (I_n - P_v) = P_v - P_v^2 = O_{n \times n}$ (recall that $P_v^2 = P_v$), then we finally have

$P_v \left( x(\bar{\lambda}) - y( \gamma^*) \right) = 0_n$,

which means that vectors $x(\bar{\lambda}) - y( \gamma^*)$ and $v$ are orthogonal.

__________


### Bandwidth-Sharing Games

September 18, 2008

In chapter 1 of Algorithmic Game Theory, Éva Tardos and Vijay Vazirani discuss the well-known Tragedy of the Commons in the context of bandwidth sharing:

Example: Suppose that $n$ players would like to have part of a shared resource: each player wants to send information along a shared channel of known maximum capacity, say $1$. In this game each player has an infinite set of strategies, where player $i$’s strategy is to send $x_i \in [0,1]$ units of flow along the channel.

Assume that each player would like to have a large fraction of the bandwidth, but assume also that the quality of the channel deteriorates with the total bandwidth used. We will describe this game by a simple model, using a benefit or payoff function for each set of strategies. If the total bandwidth $\sum_j x_j$ exceeds the channel capacity, no player gets any benefit. If $\sum_j x_j < 1$ then the value for player $i$ is $x_i (1 - \sum_j x_j)$. This models exactly the kind of trade-off we have in mind: the benefit for a player deteriorates as the total assigned bandwidth increases, but it increases with his own share (up to a point).

__________

In my humble opinion, the authors’ original wording is a bit confusing, so I took the liberty of slightly rephrasing their example.

The authors’ definition of stable strategies is:

Definition: a set of strategies is stable if all players are playing their optimal selfish strategy, given the strategies of all other players.

Using some quick, back-of-the-envelope calculations, the authors show that the stable strategy for each player $i$ is to send

$x_i = \displaystyle\frac{1}{2}\left(1 - \displaystyle\sum_{j \neq i} x_j\right)$

units of flow. Note that each player’s optimal selfish strategy depends on all other players’ strategies. There’s a unique stable solution: each player $i$ sends

$x_i = \displaystyle\frac{1}{n+1}$

units of flow. The total bandwidth being used is thus

$\displaystyle\sum_{i = 1}^n x_i = \displaystyle\frac{n}{n+1}$,

which is less than the channel capacity.

The authors omit a rigorous derivation of this solution (due to time and editorial constraints, I suppose). Therefore, I figured I could present here a more detailed, step-by-step derivation.

__________

Introductory Remarks

Let $\mathcal{I} = \{1, 2, \ldots, n\}$ be a finite set of players. The set of available actions (pure strategies) for each player $i \in \mathcal{I}$ is $\mathcal{X}_i = [0,1]$. We denote by $x_i \in \mathcal{X}_i$ an action for player $i$, and by

$x_{-i} = (x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$

the vector of actions for all players in $\mathcal{I}$ except $i$. The $n$-tuple $x = (x_1, x_2, \ldots, x_n)$ is an action profile or outcome of the game, i.e., $x$ contains the actions (pure strategies) chosen by all players in $\mathcal{I}$. We denote by

$\mathcal{X} = \displaystyle\prod_{i \in \mathcal{I}} \mathcal{X}_i = [0,1]^n$

the set of action profiles (set of outcomes), and by

$\mathcal{X}_{-i} = \displaystyle\prod_{j \in \mathcal{I} \setminus \{i\}} \mathcal{X}_j = [0,1]^{n-1}$

the set of action profiles for all players except $i$. Note that

$x_i \in \mathcal{X}_i$, $x \in \mathcal{X}$, and $x_{-i} \in \mathcal{X}_{-i}$.

What we have is a one-shot simultaneous move game:

• it’s a one-shot game because there’s only one “round” or “stage”, and thus the players only have one chance to choose their actions.
• it’s a simultaneous move game because all players simultaneously choose an action $x_i$ from their set of available actions $\mathcal{X}_i = [0,1]$.

Each particular outcome benefits each player differently. Given two outcomes in $\mathcal{X}$, player $i$ will (in general) prefer one over the other. This preference can be encoded in each player’s utility function $u_i: \mathcal{X} \rightarrow \mathbb{R}$, which is defined as

$u_i(x) = \left\{\begin{array}{cl} x_i \displaystyle\left(1 - \displaystyle\sum_{j \in \mathcal{I}} x_j\right) & \textrm{if} \quad \sum_j x_j < 1\\ 0 & \textrm{if} \quad{} \sum_j x_j \geq 1\end{array}\right.$

but, for the sake of simplicity, we shall assume that the condition $\sum_j x_j < 1$ holds, so we can write the utility function simply as

$u_i(x) = x_i \displaystyle\left(1 - \displaystyle\sum_{j \in \mathcal{I}} x_j\right)$.

Let $\mathbf{1}_n = (1, 1, \ldots, 1)$ be the $n$-dimensional vector of ones. Then, the utility function for player $i \in \mathcal{I}$ can be written as

$u_i(x) = x_i \displaystyle\left(1 - \mathbf{1}_n^T x\right)$,

where $\mathbf{1}_n^T x$ is the total bandwidth being used. Alternatively, we can consider a utility function $u_i: \mathcal{X}_i \times \mathcal{X}_{-i} \rightarrow \mathbb{R}$ defined as

$u_i(x_i, x_{-i}) = x_i \left(1 - x_i - \mathbf{1}_{n-1}^T x_{-i}\right)$.

Since this is a simultaneous move game, all players must choose their actions in a simultaneous manner. It’s therefore impossible for any player to know the other players’ actions before choosing his own action.

Assumption: we assume that all players seek to maximize their utility. This implies, of course, that each player knows his utility function.

If player $i$ somehow knew the other players’ actions, his best response would be to choose action $x_i = B_i(x_{-i})$, where mapping $B_i: \mathcal{X}_{-i} \rightarrow \mathcal{X}_i$ is defined as

$B_i(x_{-i}) = \{ x_i \in \mathcal{X}_i \mid u_i(x_i, x_{-i}) \geq u_i(x_i', x_{-i}), \forall x_i' \in \mathcal{X}_i\}$.

In other words, given $x_{-i}$, player $i$’s best move is to choose action $x_i = B_i(x_{-i})$, so that his utility is maximized. The maximum utility that player $i$ can attain (given $x_{-i}$) is $u_i(B_i(x_{-i}), x_{-i})$.

__________

My Derivation

Note that, given $x_{-i}$, player $i$’s best response is

$B_i(x_{-i}) = \displaystyle\arg \max_{x_i \in \mathcal{X}_i} u_i(x_i, x_{-i})$.

Assuming that $\sum_j x_j < 1$, we have

$B_i(x_{-i}) = \displaystyle\arg \max_{x_i \in \mathcal{X}_i} \left\{x_i \left(1 - x_i - \mathbf{1}_{n-1}^T x_{-i}\right)\right\}$.

The maximizer is computed by seeking points where the partial derivative of $u_i(x_i, x_{-i})$ with respect to $x_i$ vanishes. Note that

$\displaystyle\frac{\partial u_i}{\partial x_i} (x_i, x_{-i}) = (1 - \mathbf{1}_{n-1}^T x_{-i}) - 2 x_i$,

and since $\frac{\partial^2 u_i}{\partial x_i^2} (x_i, x_{-i}) = -2 < 0$, the stationary point is indeed a maximizer. Setting the partial derivative to zero yields a unique solution

$x_i = B_i(x_{-i}) = \displaystyle \frac{1 - \mathbf{1}_{n-1}^T x_{-i}}{2}$.
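The best-response map is one line of code, and we can already verify that the profile $x_i = \frac{1}{n+1}$ quoted earlier is a fixed point of it. A sketch, with an arbitrary choice of $n$:

```python
import numpy as np

def best_response(x, i):
    """B_i(x_{-i}) = (1 - sum_{j != i} x_j) / 2."""
    return (1.0 - (x.sum() - x[i])) / 2.0

n = 5
x = np.full(n, 1.0 / (n + 1))   # candidate stable profile

# At the stable profile, every player is already playing a best response.
assert all(abs(best_response(x, i) - x[i]) < 1e-12 for i in range(n))
```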

We assumed that all players are utility-maximizers. Let us assume that all players are choosing best response actions, i.e., $x_i = B_i(x_{-i})$ for all $i \in \mathcal{I}$. The action profile $x \in \mathcal{X}$ can then be computed by solving the following system of linear equations (“linear system”)

$\left[\begin{array}{c}x_1\\ x_2\\ x_3\\ \vdots\\ x_n\end{array}\right] = \displaystyle\frac{1}{2} \left[\begin{array}{c} 1\\ 1\\ 1\\ \vdots\\ 1\end{array}\right] - \displaystyle\frac{1}{2} \left[\begin{array}{ccccc} 0 & 1 & 1 & \ldots & 1\\ 1 & 0 & 1 & \ldots & 1\\ 1 & 1 & 0 & \ldots & 1\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & 1 & 1 &\ldots & 0 \end{array} \right] \left[\begin{array}{c}x_1\\ x_2\\ x_3\\ \vdots\\ x_n\end{array}\right]$

which can be written more compactly as

$x = \frac{1}{2} \mathbf{1}_n - \frac{1}{2} (\mathbf{1}_n \mathbf{1}_n^T - I_n) x$,

where $I_n$ is the $n \times n$ identity matrix. Rearranging the terms, the system becomes

$(I_n + \mathbf{1}_n \mathbf{1}_n^T) x = \mathbf{1}_n$.

It can be shown that $(I_n + \mathbf{1}_n \mathbf{1}_n^T)$ is invertible (see Appendix A), and therefore the solution of the linear system exists and is unique

$x = (I_n + \mathbf{1}_n \mathbf{1}_n^T)^{-1} \mathbf{1}_n$.

We can use a “trick” to solve the linear system. Note that

$(I_n + \mathbf{1}_n \mathbf{1}_n^T)^{-1} (I_n + \mathbf{1}_n \mathbf{1}_n^T) = I_n$

and therefore

$(I_n + \mathbf{1}_n \mathbf{1}_n^T)^{-1} = I_n - (I_n + \mathbf{1}_n \mathbf{1}_n^T)^{-1} \mathbf{1}_n \mathbf{1}_n^T$.

The (unique) solution of the linear system can thus be written as

$\begin{array}{rl} x &= \displaystyle (I_n + \mathbf{1}_n \mathbf{1}_n^T)^{-1} \mathbf{1}_n\\ &= \mathbf{1}_n - \underbrace{(I_n + \mathbf{1}_n \mathbf{1}_n^T)^{-1} \mathbf{1}_n}_{= x}\underbrace{\mathbf{1}_n^T \mathbf{1}_n}_{= n}.\end{array}$

Finally, we have

$(n+1) x = \mathbf{1}_n$,

which yields

$x = \left(\frac{1}{n+1}\right) \mathbf{1}_n$.

When all players choose best response actions, then $x_i = \frac{1}{n+1}$ for each $i \in \mathcal{I}$. This is the same solution which the authors derived using back-of-the-envelope calculations. Note that $\sum_j x_j < 1$ and therefore our assumption that the total bandwidth being used is less than the channel capacity does hold.
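The same solution drops out of a direct numerical solve of $(I_n + \mathbf{1}_n \mathbf{1}_n^T) x = \mathbf{1}_n$. A sketch, with an arbitrary choice of $n$:

```python
import numpy as np

n = 4
ones = np.ones(n)
A = np.eye(n) + np.outer(ones, ones)   # I_n + 1_n 1_n^T

x = np.linalg.solve(A, ones)           # unique solution, since A is invertible
assert np.allclose(x, ones / (n + 1))  # x_i = 1/(n+1) for every player
```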

__________

Tragedy of the Commons

The value of each player’s utility function is

$u_i(x) \mid_{x = \left(\frac{1}{n+1}\right) \mathbf{1}_n} = \displaystyle\frac{1}{n+1} \left(1 - \frac{n}{n+1}\right) = \frac{1}{(n+1)^2}$.

The sum of all players’ utility functions is

$\displaystyle \sum_{i \in \mathcal{I}} u_i(x) \mid_{x = \left(\frac{1}{n+1}\right) \mathbf{1}_n} = \frac{n}{(n+1)^2} \approx \frac{1}{n}$.

As pointed out by Éva Tardos and Vijay Vazirani, this solution is a tragedy: when all players choose best response actions, the utility for each player is very low. Moreover, the more players try to use the channel, the less “value” each player gets (which is intuitive).

This example illustrates the Tragedy of the Commons: each player benefits from using the shared channel and wants to use it as much as possible, while the “costs” of using the channel are distributed among all of those who use it. Players are selfish and want to maximize their own utility, which leads to over-exploitation of the shared resource. Selfishness does not always pay off.

__________

Appendix A

We would like to show that matrix $I_n + \mathbf{1}_n \mathbf{1}_n^T$ is invertible.

For starters, note that the $n^2$ entries of $n \times n$ matrix $\mathbf{1}_n \mathbf{1}_n^T$ are ones. Therefore, the columns/rows of $\mathbf{1}_n \mathbf{1}_n^T$ are not linearly independent. In fact, since all columns/rows of $\mathbf{1}_n \mathbf{1}_n^T$ are the same, we have $\textrm{rank}(\mathbf{1}_n \mathbf{1}_n^T) = 1$. The nullspace of $\mathbf{1}_n \mathbf{1}_n^T$ has dimension $n-1$, and therefore $n-1$ of the $n$ eigenvalues of $\mathbf{1}_n \mathbf{1}_n^T$ are zero.

Note also that matrix $\mathbf{1}_n \mathbf{1}_n^T$ is symmetric, and thus, its eigenvalues are real and its eigenvectors are orthogonal. Note that

$\mathbf{1}_n \mathbf{1}_n^T = n \left(\frac{1}{\sqrt{n}} \mathbf{1}_n\right)\left(\frac{1}{\sqrt{n}} \mathbf{1}_n\right)^T$

can be regarded as an eigendecomposition of $\mathbf{1}_n \mathbf{1}_n^T$: the only nonzero eigenvalue of $\mathbf{1}_n \mathbf{1}_n^T$ is $\lambda_1 = n$, and the eigenvector associated with it is $q_1 = \frac{1}{\sqrt{n}} \mathbf{1}_n$. Matrix $\mathbf{1}_n \mathbf{1}_n^T$ has $n-1$ zero eigenvalues: $\lambda_2 = \lambda_3 = \ldots = \lambda_n = 0$. Consequently, the nullspace of $\mathbf{1}_n \mathbf{1}_n^T$ is an $(n-1)$-dimensional linear subspace spanned by an orthonormal vector basis $\{q_2, q_3, \ldots, q_n\}$. We now form $n \times n$ matrices

$Q = \displaystyle [q_1 \mid q_2 \mid \ldots \mid q_n]$

and $\Lambda = \textrm{diag}(n, 0, \ldots, 0)$. We thus have an eigendecomposition

$\mathbf{1}_n \mathbf{1}_n^T = Q \Lambda Q^T$.

Since $\{q_2, q_3, \ldots, q_n\}$ is an orthonormal vector basis, and since $q_1$ is orthogonal to any vector in the nullspace, we have $Q Q^T = I_n$. Therefore

$I_n + \mathbf{1}_n \mathbf{1}_n^T = Q Q^T + Q \Lambda Q^T = Q (I_n + \Lambda) Q^T$.

Finally, we can conclude that the eigenvalues of $I_n + \mathbf{1}_n \mathbf{1}_n^T$ are the eigenvalues of $\mathbf{1}_n \mathbf{1}_n^T$ plus one

$\lambda_i (I_n + \mathbf{1}_n \mathbf{1}_n^T) = 1 + \lambda_i$

and thus the spectrum of matrix $I_n + \mathbf{1}_n \mathbf{1}_n^T$ is $\{n+1, 1, \ldots, 1\}$. Since none of the eigenvalues of $I_n + \mathbf{1}_n \mathbf{1}_n^T$ is zero, we can conclude that matrix $I_n + \mathbf{1}_n \mathbf{1}_n^T$ is invertible.
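The claimed spectrum is easy to confirm numerically (a sketch; $n$ is arbitrary):

```python
import numpy as np

n = 6
M = np.eye(n) + np.ones((n, n))            # I_n + 1_n 1_n^T

eigvals = np.sort(np.linalg.eigvalsh(M))   # M is symmetric, so eigvalsh applies
assert np.allclose(eigvals, [1.0] * (n - 1) + [n + 1.0])

# No zero eigenvalue, hence M is invertible; det(M) = product of eigenvalues.
assert not np.isclose(np.linalg.det(M), 0.0)
```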