1 – Problem Statement
Say we have a network of agents where each agent takes measurements and communicates with a few other agents. The agents form a network. Each agent
- measures a given quantity
- stores the measurement in its internal state
- updates its internal state by replacing its own state with a weighted average of its neighboring agents’ states and its own state.
Our objective: we want each agent’s internal state to converge to the arithmetic average of all measurements.
What we want to know: which weights should each agent use to compute the weighted average?
What we want to devise is a sort of “network protocol” that allows a network of agents to compute the arithmetic average of all agents’ measurements. We want this protocol to be completely distributed, decentralized and non-hierarchical, i.e., all agents are “equal” (there is no hierarchy), and each agent follows the same rules.
The agents could compute the average in a centralized manner:
- the agents would elect a “leader” to be the network’s sink node
- the agents would pass messages to their neighbors and each node would route the messages to the sink node
- the sink node would collect all messages from all agents, and it would then compute the average of all measurements
- the sink node would broadcast the average value to its neighbors and the average would be diffused across the network.
Nevertheless, such a centralized approach has two major weaknesses:
- lack of robustness: if the sink node is removed, nothing works. The nodes will be routing messages to a sink node that no longer exists, and the average will not be computed nor broadcast.
- flooding: routing messages to the sink node will “flood” the network with messages. If we could somehow “combine” incoming messages at a given node and then pass just one message containing a linear combination of the incoming messages, we would save a lot of messages and thus avoid flooding the network. Some of you may think of this as a sort of Network Coding, and there are in fact similarities with Network Coding. However, in Network Coding one works with finite fields (otherwise the size of the alphabet would increase), while in this distributed averaging framework one works with the real field
.
One can then conclude that it would be great to devise a distributed, decentralized, non-hierarchical approach in which all nodes are simultaneously source nodes, routers and sink nodes. [1]
__________
2 – The Network as a Graph
Let the network be modeled as an undirected connected graph , where:
-
is the set of nodes.
is the set of edges. Graph
has no self-edges
.
The set of neighbors of node (i.e., the neighborhood of node
) is denoted by
.
Each edge of undirected graph is represented by an unordered pair
, where
,
, and
. If the graph were directed, then each edge would be represented by an ordered pair
, where node
would be the edge’s tail and node
would be the edge’s head.
The number of nodes and edges is given by the cardinality of sets and
: we have
nodes and
edges, where
and
. The extreme cases are:
is where graph
is a tree.
is where graph
is complete (i.e., every two distinct nodes are connected by an edge).
As mentioned before, graph has no self-edges. Consider now the “augmented” graph
, where the new (“augmented”) set of edges is
.
Note that . In other words, the augmented graph
is graph
plus
self-edges.
__________
3 – Distributed Averaging
Each agent, i.e., each network node takes a measurement
, and stores it in its internal state
. The state
is thus initialized with the measurement,
, and then the state is propagated forwards in time according to the recursion formula proposed by Lin Xiao and Stephen Boyd [2]
.
Note that this recursion can be interpreted as follows: node initializes its internal state with the measurement
, and then updates its state by replacing it with the weighted average of its own and its neighboring agents’ states. Note that
is node
’s self weight.
-
is the weight used by node
to weigh
, where
.
Note that each agent can only pass a message to its neighboring agents. Since graph is connected, a message from any given node will eventually reach any other node in the network.
We will denote
where:
is the vector whose entries are all ones.
is the vector of measurements.
Our goal is to find the set of weights and the set of self-weights
so that all agents’ internal states converge to the average value of all agents’ measurements
.
Let function be defined as
,
then the aforementioned recursive equations give us an algorithm that (if some conditions are satisfied) will compute in a decentralized and distributed manner, where each node in the network uses only local information (the value of its neighbors’ states). If all nodes could use global information, i.e., if all nodes knew all other nodes’ measurements, then computing the average would be extremely easy. Imposing the constraint that each node can only use local information is what makes this problem interesting.
__________
4 – Geometric Interpretation
Let
be the state vector at time step , then the aforementioned recursion equation can be written more compactly as
,
where:
is the weight matrix, whose sparsity pattern is imposed by graph
.
- the state vector is initialized with the measurement vector,
.
Since , we have
.
From a “geometric” point of view, our goal is to steer the state vector from to the desired / target state
in the least possible time. We are thus faced with the problem of finding a weight matrix with a sparsity pattern imposed by graph
such that the state vector converges to the desired state
.
Following Lin Xiao and Stephen Boyd’s rationale [2], we would like the sum of the vector’s entries to be preserved
so that the arithmetic average is also preserved. Imposing the constraint that the sum / mean should be preserved is equivalent to imposing the constraint that the state vector must lie on the hyperplane
.
Note that hyperplane is defined by the measurement vector
. Given two distinct measurement vectors
and
, we will have two distinct (and parallel) hyperplanes
and
.
In the next section, we will discuss some properties of the weight matrix.
__________
5 – The Weight Matrix
The weight matrix has a sparsity pattern imposed by the “structure” of the augmented graph
:
-
(in general) if
if
.
The fewer edges graph has, the more sparse matrix
is. For example, if the graph is the one depicted below
then the weight matrix is
,
and the adjacency matrix is
.
Note that the sparsity of the adjacency matrix is imposed by graph , while the sparsity of the weight matrix is imposed by graph
.
Let matrix be
,
then we have
.
Note that imposing the constraint that each node can only use local information is equivalent to imposing the constraint that the weight matrix must have a sparsity pattern given by graph
. If we relax this constraint and allow nodes to use global information, which is equivalent to allowing the weight matrix NOT to have a sparsity pattern imposed by graph
, then the desired / target vector can be reached in one single iteration:
and
.
We want to impose the sparsity constraint on the weight matrix, and we want the state vector to converge to the desired / target state
.
We thus have the following limit
.
Note also that imposing the condition implies that
and consequently
,
which means that the sum of the entries of each column of the weight matrix must be
.
If all weights are nonnegative, then is a left stochastic matrix.
Once again following Lin Xiao and Stephen Boyd’s rationale [2], we would like to be a fixed point of recursion
, which implies that
,
which means that
,
and consequently
,
i.e., the sum of the entries of each row of the weight matrix must be
.
If all weights are nonnegative, then is a right stochastic matrix.
Note that if the sum of the entries of each row and each column of the weight matrix must be
then
,
which yields
,
and thus
.
If , then
and
will necessarily be zero. In other words,
,
but if is true,
is not necessarily symmetric.
Summarizing, the weight matrix should have the following properties:
- sparsity pattern imposed by graph
.
to ensure sum / mean preservation.
to ensure that
is a fixed point of the recursion.
We now know the properties of matrix . The difficulty is now to find a matrix with the desired properties. As I will show on a later occasion, finding such matrix is not particularly difficult. What is difficult is to find a weight matrix that yields fast convergence. In [2] Lin Xiao and Stephen Boyd use optimization methods to find the weight matrix that yields the higher rate of convergence.
For a more in-depth analysis and discussion, I recommend you take a look at Lin Xiao’s PhD thesis [3].
__________
6 – Concluding Remarks
In this post, we talked about an abstract network of “agents”. This network could be a wireless sensor network, a team of robotic vehicles or a social network (among other possibilities). Irrespective of what the agents are, the network is always represented by a connected graph, with “agents” at the nodes and communications links at the edges.
We showed that finding a distributed, decentralized, non-hierarchical averaging protocol boils down to finding a weight matrix with certain desired properties. How to find such weight matrices will be addressed in future posts.
__________
7 – References
[1] Dimitri P. Bertsekas and John N. Tsitsiklis, “Parallel and Distributed Computation: Numerical Methods”, Prentice-Hall, Englewood Cliffs, New Jersey, 1989.
[2] Lin Xiao, Stephen Boyd, “Fast Linear Iterations for Distributed Averaging“, Systems and Control Letters, vol. 53, pp. 65-78, 2004.
[3] Lin Xiao, “Decomposition and Fast Distributed Iterations for Optimization of Networked Systems”, PhD thesis, Aeronautics and Astronautics Department, Stanford University, June 2004.
Tags: Average Consensus, Distributed Averaging, Distributed Computation


March 10, 2008 at 10:40 am |
Gosh, this is the longest post I ever wrote. It took me several days to write it. IMHO it’s also the best post I ever wrote.
Despite my late-night proof-reading, I am sure there are still typos. I will correct them as I find them.
In the future, I might update this post slightly in case I find a better, more simple or more elegant way to present the ideas I addressed in here.
March 10, 2008 at 12:42 pm |
Hey i enjoyed your post. It is the best post you have written in my opinion as well. I enjoy your posts otherwise.
But i generally like posts that involve a lot of personal research. I like posts that are like research essays themselves! I like writing in this style myself whenever i get the time. It is a different case that i have only put 2 or 4 in my personal page so far only!
Don’t worry about typos.
Keep it up man. FANTASTIC post. FANTASTIC!
March 10, 2008 at 1:23 pm |
As you pointed out, networks with a central “guiding brain” have a lot of problems. Like one point of operation means one point of failure. Also if the network size is increased many times. Say a thousand times, a lot of BW would be lost in the various nodes communicating their information to the central node. And global information is frequently out of date!So decentralized systems counter that very well. Giving a very robust alternative!
Something that came to my mind while reading this post were some papers on adaptive routing in MANETs, which (by definition) dynamically form a temporary network, without using any existing network infrastructure or centralized administration. Among the many advantages you counted on because of decentralisation. One more could be of power saving as we can then work out most used and least used routes in the network better as well.
A very good post indeed! The part on the weighted matrix especially.
March 15, 2008 at 10:21 pm |
I don’t quite get how the observations enter into the picture — after all, if the matrix is nonnegative, irreducible, aperiodic, the initial state is forgotten when your recursion is
.
My first impression is that the recursion should be
. Then the task would be to find a matrix with given sparsity pattern and large spectral gap which generates
as the solution of that fixed point equation
March 19, 2008 at 8:31 pm |
How long do you have to compute the average and to what precision?
March 20, 2008 at 8:31 am |
@ Yaroslav
My knowledge of matrix theory is kind of rusty right now, but I will be thinking on your suggestions. The recursion that I presented in this post is the one I have seen in most papers in the area of average consensus / distributed averaging. And there’s a reason for it: the recursion
can be regarded as a gradient descent method in which we want to minimize a convex, quadratic “cost” function. I will write more about it in my next post. It’s quite interesting.
@ David
The graph is connected, and therefore in a number of iterations equal to the graph’s diameter, each node should be able to compute the exact average of all measurements. If the network is too large and we can tolerate a small error in the computation of the average, then fewer recursions will be needed.
March 25, 2008 at 3:35 pm |
[...] of Reasonable Deviations explains how the natural problem Say we have a network of agents where each agent takes measurements and [...]
March 27, 2008 at 4:03 pm |
That post was a great read! We need more people blogging about their research! Keep up the good work.
April 10, 2008 at 4:22 am |
@ Aurelie
Thanks for the kudos :-)
I must say I am surprised that anyone had the “courage” (i.e., time and patience) to read the whole post, eh eh eh
April 28, 2008 at 6:35 pm |
Hey, nice post. I was searching the net for info on distributed averaging.
Has anybody ever worked on the problem where each agent’s measurement is not fixed but changes as a function of the agent’s current belief in the global average?