## Leakers versus plumbers

The Official Secrets Act is not to protect secrets, it is to protect officials.

– Sir Humphrey Appleby, Yes Minister

Imagine that you work for an organization in which secrecy is cherished and enforced. A sensitive document which proves that your organization has engaged in unethical / illegal activities has recently been leaked to a major newspaper, thus provoking a scandal that blemishes your organization’s reputation. As expected, your organization is greatly displeased with the leak and, therefore, a team of plumbers is created in order to find the leakers. If you are a plumber, you would like to maximize your chances of finding the leakers, whereas if you are a leaker, you would like to maximize your chances of evading the plumbers.

__________

Canary Traps

Suppose you are a plumber. You set up a classical canary trap in order to expose the leakers: you give slightly different versions of a sensitive document to different people, you wait until one of those versions is leaked to the press, and knowledge of which version got leaked allows you to determine who the leaker is.

For example, suppose further that you have five suspects. Let $D = \{D_1, D_2, \dots, D_5\}$ be the set of five documents (five slightly different versions of an original document), and let $S = \{S_1, S_2, \dots, S_5\}$ denote the set of five suspects. A canary trap is thus a bijective mapping $C: D \to S$ that assigns suspects to documents. Pictorially, this bijection can be represented by the following graph

Since a single suspect is assigned to each document (otherwise $C$ wouldn’t be a mapping), knowing which version is leaked allows the plumbers to find the leaker. Let us suppose that version $D_2$ is leaked to the press. Then, the plumbers immediately know that $S_2 = C(D_2)$ is the leaker, as depicted below

There is a problem, however. If the one who leaked the sensitive document to the press is not clueless, he will be conservative and assume that there is an ongoing counter-intelligence operation within the organization and, therefore, he will keep a low-profile and abstain from leaking any more documents. If none of the documents in $D$ are leaked, then the plumbers cannot determine who the leaker is. The canary trap fails.

The leaker can also steal someone else’s document and leak it to the press, so that the plumbers will hunt the wrong suspect. Note, however, that stealing other suspect’s version could leave an electronic trail (computer access is monitored, surveillance video cameras are everywhere) that could eventually serve to expose the actual leaker. Therefore, this countermeasure does not seem very wise to me.

__________

Pursuit and Evasion

As the canary trap failed to expose the leaker, the plumbers take drastic measures and have the five suspects polygraph-tested. Suppose that all five polygraph tests were negative. The plumbers then conclude that either the leaker is not among the five suspects, or there’s at least one leaker among the suspects and he managed to cheat the polygraph test.

At this point, one may wonder how were the plumbers able to narrow down their search to five suspects only. Let us suppose that the information leaked to the newspaper could have come from three reports only, and that these three reports were available to five people (the same people who later became the five suspects, of course). The plumbers then build the following access graph

which assigns suspects to reports, so that one can know which suspects had access to a given report. Please do note that, contrary to what happened with the canary trap, one can now assign more than one suspect to each report! This suggests that we should use binary relations instead of mappings. However, I will exploit a mathematical loophole, and use mappings nonetheless. Let $R = \{R_1, R_2, R_3\}$ be the set of three reports, and let $2^S$ be the power set of $S$ (i.e., the set of all subsets of $S$). Let $A : R \to 2^S$ be the access mapping, that assigns sets of suspects to each report (note that the codomain of $A$ is $2^S$, not $S$).

For example, suppose that the newspaper that published the leak included an excerpt from report $R_1$ in their article. The plumbers then know that the leaker is in the subset $A(R_1) = \{S_1, S_2, S_3\}$, as illustrated below

The hunt for the leaker has thus been further narrowed down, and now the plumbers have only three suspects. However, if the plumbers find evidence in the newspaper article that the leaked report was $R_2$, then they are able to immediately determine that the leaker was suspect $S_3$, as $A(R_2) = \{S_3\}$. A cautious, wise leaker might want to make sure he knows the access graph before leaking any document.

__________

Concluding Remarks

What exactly does one learn from this post? From the (admittedly simplistic and superficial) analysis above, one can conclude that the leaker should strive to maximally confuse the plumbers by leaking files that a lot of people have access to. The more suspects the plumbers have to deal with, the more time the leaker buys. Perhaps an innocent suspect will fail the polygraph test. Would the plumbers, in their own self-interest, crucify the innocent one in order to avoid a potentially long hunt for the true leaker? This question is rhetorical, of course.

The plumbers can also gain some insight from the analysis, I believe. The lesson to be learned is that the access mapping $A : R \to 2^S$ is the most valuable asset a plumber can have. If $R$ is a set of sensitive reports and $R_L \subseteq R$ is the subset of leaked reports, then computing the intersection $\bigcap_{r \in R_L} A(r)$ should allow the plumbers to maximally narrow down the list of suspects. But from then on, the analysis presented in this post cannot help much. Polygraph-test all the suspects that are left, and hope there won’t be a lot of false positives!

It would not be unreasonable to think of this leakers versus plumbers game as a pursuit-evasion one: the leaker wants to maximize the number of suspects on the plumbers’ list, while the plumbers want to minimize the number of suspects as much as they can. The leaker tries to maximize the plumbers’ uncertainty, the plumbers try to minimize it. A word of caution is in order, however. Assumptions do matter a lot. For example, we assumed that the leaker had perfect knowledge of the access graph, but what if he does not? Then he risks falling into a canary trap. Mathematical reasoning does serve to clarify one’s thought, but let us not be so arrogant as to believe that a simple model can capture all of the real world’s complexity. Common sense should not be abandoned.

As you might have noticed, I make no moral judgements in this post. That is due to the fact that I find leaking of classified information to be amoral. In certain circumstances, leaking is a moral imperative. In other circumstances, leaking might compromise a country’s national security. Secrecy often serves to protect a country, but it might also serve to protect the interests of a few people who, either due to malice or incompetence, did fail to fulfil their duty.

### 3 Responses to “Leakers versus plumbers”

1. Onkar Says:

Hi Rod, amazing post.

Something similar to what you have highlighted occurs in textbook copy protection. For e.g. in a book with 100 maths problems, the publisher deliberately inserts two wrong solutions. If a person engages in copying he will copy the 98 correct and 2 incorrect problems also. This can then be highlighted in a court of law.

As always your stuff and the way you combine the problem/solution with math is mind blowing. However, the frequency of your posts could improve a bit :-)

Cheers and Great work.

2. syntaxfree Says:

You really need to put weights on false positives and false negatives for an optimal plumbing strategy. You don’t want to lose highly-trained operatives or be sued for defamation.

That’s essentially the problem with inference as we see it nowadays: you can increase the power of a test arbitrarily, and there’s no good way to decide when to stop. I have a few ideas about this, but they hinge on far out dissident ideas on the philosophy of science.

Cool stuff. Nice to find out you never disappeared after all.

• Rod Carvalho Says:

My post is based on the assumption that different people have access to different documents. This would be a reasonable assumption if the organization under study happens to be an intelligence agency, where classified documents have different classification levels (e.g., top secret, secret, confidential, restricted, unclassified). The knowledge of the access graph / mapping allows the plumbers to narrow down the list of suspects as much as possible.

I agree that one should use weights, but only after the suspect list has been maximally culled. After all the set intersections are made, it would be wise to take into account the value of each suspect when allocating the plumbers’ resources. Ideally, the more valuable the suspect, the more time and effort should be invested in making sure one does not obtain a false positive.