A closer look at the data

Here's some more information about the data shown in the previous two days' posts. The size of breaches appears to be lognormal, even when we look only at particular industry sectors or particular ways in which a breach happens. The summary below shows, for each case, how many data points I looked at from the Open Security Foundation's data breach database, the logmean (mean of the base 10 logarithm of the breach size), the log deviation (standard deviation of the base 10 logarithm), and the p-value from a Kolmogorov-Smirnov test for normality applied to the base 10 logarithm of the data. In each case the test does not reject a lognormal distribution at the 5% level (p > 0.05), so the fit is at least reasonable. In some cases the fit is very good.

Category    N      Logmean    Logdeviation    P-value
Biz         845    3.33       1.35            0.12
Gov         430    3.52       1.31            0.66
Edu         492    3.24       1.03            0.95
Med         419    3.53       1.15            0.57
Hack        311    3.89       1.22            0.75
Lost        264    3.49       1.33            0.82
Stolen      724    3.50       1.12            0.49
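
For anyone who wants to try this kind of check on their own numbers, here is a minimal sketch of how the columns above could be computed. It is not the exact procedure used for the table: the function name `lognormal_summary` is made up for illustration, the sample is synthetic rather than taken from the breach database, and the p-value uses a standard asymptotic approximation to the Kolmogorov distribution. Note also that fitting the mean and standard deviation from the same data tends to make a plain Kolmogorov-Smirnov p-value look more favorable than it should, so treat the result as a rough indication.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def lognormal_summary(sizes):
    """Return (n, logmean, logdeviation, ks_statistic, p_value).

    Hypothetical helper: takes positive breach sizes, works on their
    base 10 logarithms, and compares them to a fitted normal distribution.
    """
    logs = sorted(math.log10(s) for s in sizes)
    n = len(logs)
    logmean = mean(logs)   # mean of log10(size)
    logdev = stdev(logs)   # sample standard deviation of log10(size)
    fitted = NormalDist(logmean, logdev)

    # KS statistic: largest gap between the empirical CDF (a step function)
    # and the fitted normal CDF, checked on both sides of each step.
    d = 0.0
    for i, x in enumerate(logs):
        cdf = fitted.cdf(x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)

    # Asymptotic Kolmogorov p-value with a small-sample correction
    # (an approximation; parameters estimated from the data are ignored).
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    p = min(1.0, max(0.0, p))
    return n, logmean, logdev, d, p


# Synthetic example only: sizes whose log10 is normal(3.5, 1.2) by construction.
rng = random.Random(0)
sizes = [10 ** rng.gauss(3.5, 1.2) for _ in range(500)]
n, logmean, logdev, d, p = lognormal_summary(sizes)
print(f"N={n} logmean={logmean:.2f} logdeviation={logdev:.2f} p={p:.2f}")
```

A p-value above 0.05 here means only that the test found no evidence against lognormality, not that the data are proven to be lognormal.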
