Why data breaches have a lognormal distribution
I had an interesting discussion this morning about data breaches, and the following idea about the distribution of breach sizes came up.
It certainly looks like the size of data breaches follows a lognormal distribution. That is, the number of records exposed in breaches doesn't follow a normal distribution, but the logarithm of the number of records exposed does.
Why should we expect this to be true?
One approach to understanding this gets fairly arcane. You might start from axioms for a reasonable metric of either security or vulnerability and then look for the maximum entropy distribution that fits the constraints those axioms suggest.
But there's probably a simpler approach.
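The size of organizations also seems to follow a lognormal distribution. Let's suppose that the amount of sensitive data that an organization has is roughly proportional to the size of the organization. Multiplying a lognormal quantity by a constant only shifts its logarithm, so the amount of sensitive data held would be lognormal too. If data breaches are becoming a virtual certainty, so that breached organizations look more and more like a representative sample of all organizations, we'd then expect to see that lognormal distribution of breach sizes from that alone, wouldn't we?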
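Here is a rough simulation of that argument, as a sketch rather than evidence. Every number in it is an assumption made up for illustration: the lognormal parameters `mu` and `sigma` for organization size, the hypothetical `records_per_employee` constant of proportionality, and the simplification that every organization is breached and exposes everything it holds. None of it is fitted to real breach data.

```python
import math
import random
import statistics


def simulate_breach_sizes(n_orgs=100_000, mu=5.0, sigma=1.5,
                          records_per_employee=20.0, seed=0):
    """Draw hypothetical breach sizes under the assumptions in the text."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_orgs):
        # Assumption: organization size (employees) is lognormal.
        employees = rng.lognormvariate(mu, sigma)
        # Assumption: sensitive records are proportional to size,
        # and a breach exposes all of them.
        sizes.append(records_per_employee * employees)
    return sizes


def log_summary(sizes):
    """Return mean, standard deviation, and skewness of log(size)."""
    logs = [math.log(s) for s in sizes]
    mean = statistics.fmean(logs)
    sd = statistics.pstdev(logs)
    skew = sum(((x - mean) / sd) ** 3 for x in logs) / len(logs)
    return mean, sd, skew


if __name__ == "__main__":
    mean, sd, skew = log_summary(simulate_breach_sizes())
    # If breach sizes are lognormal, log(size) should look normal:
    # mean near mu + log(records_per_employee), sd near sigma, skew near 0.
    print(f"mean of log size: {mean:.3f}")
    print(f"sd of log size:   {sd:.3f}")
    print(f"skewness:         {skew:.3f}")
```

The proportionality constant only adds log(records_per_employee) to the mean of the logarithm and leaves its spread and shape alone, which is why the lognormal shape survives the scaling. If only a random fraction of each organization's data were exposed, or if larger organizations were more likely to be breached, the result could differ, and the sketch does not model either case.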