A closer look at the data
Here's some more information about the data shown in the previous two days' posts. The size of breaches appears to be lognormal, even when we look only at particular industry sectors or particular ways in which a breach happens. Here's a summary. For each case, it shows how many data points I looked at from the Open Security Foundation's data breach database, the logmean (mean of the base 10 logarithm) and log deviation (standard deviation of the base 10 logarithm) of the breach sizes, and the p-value from a Kolmogorov-Smirnov test for normality applied to the base 10 logarithm of the data. In each case the fit to a lognormal distribution is reasonable, in the sense that the test does not reject it at the 5% level (p > 0.05). In some cases the fit is very good.
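For anyone who wants to try the same kind of check on their own numbers, here is a minimal sketch of how the three statistics could be computed. It is not the code behind the summary. The function names are made up for illustration, the sample standard deviation is an assumption (the population version would work just as well), and the p-value uses the standard asymptotic Kolmogorov series with a common small-sample correction. Because the mean and deviation are estimated from the same data being tested, a p-value computed this way tends to come out on the optimistic side.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def kolmogorov_sf(lam, terms=100):
    """Asymptotic Kolmogorov tail: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # Q is indistinguishable from 1 here, and the series is slow
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return max(0.0, min(1.0, 2.0 * total))


def lognormal_summary(sizes):
    """Return count, logmean, log deviation and KS p-value for breach sizes.

    Assumption: records with a size of zero or less are skipped, since
    they have no logarithm.
    """
    logs = sorted(math.log10(s) for s in sizes if s > 0)
    n = len(logs)
    logmean = statistics.fmean(logs)
    logdev = statistics.stdev(logs)  # sample standard deviation (assumption)

    # KS statistic: largest gap between the empirical CDF and the fitted normal CDF.
    d = 0.0
    for i, x in enumerate(logs):
        cdf = normal_cdf(x, logmean, logdev)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)

    # Small-sample correction before applying the asymptotic formula.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    return {"n": n, "logmean": logmean, "logdev": logdev,
            "ks_stat": d, "p_value": kolmogorov_sf(lam)}


if __name__ == "__main__":
    # Synthetic data for illustration only, NOT taken from the breach database.
    rng = random.Random(0)
    fake_sizes = [10 ** rng.gauss(3.0, 1.0) for _ in range(200)]
    print(lognormal_summary(fake_sizes))
```

To look at a single industry sector or breach type, filter the records first and pass only those sizes to `lognormal_summary`.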