Data breaches have a lognormal distribution
In many parts of information security, there's very little reliable data to help you make decisions. Data breaches are an exception: the Open Security Foundation publishes a database of almost 2,000 breaches on its web site, datalossdb.org. There's lots of information in this database, and it shows some interesting patterns. In particular, here's what you see if you plot the number of records compromised by each breach ("TotalAffected" in the OSF database) from 2006 to the present. It's hard to see a pattern in this data.
On the other hand, if we make a histogram of the logarithm of the data, we get the following graph. That certainly looks like data from a normal distribution, and a quantity whose logarithm is normally distributed is, by definition, lognormally distributed. So the size of data breaches follows a lognormal distribution fairly closely; that distribution is the blue line in this graph. That's a fairly good fit, isn't it?
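If you want to try this kind of check yourself, here is a minimal sketch in Python. It is not the code behind the graphs in this post. It assumes natural logarithms, and because the OSF data isn't bundled here it uses synthetic breach sizes drawn from a lognormal distribution with made-up parameters (8.0 and 2.5), so the names and numbers are illustrative only. Swap in the real "TotalAffected" values to repeat the comparison on actual breaches.

```python
import math
import random
import statistics


def fit_lognormal(sizes):
    """Estimate (mu, sigma) of the normal distribution of log(size)."""
    logs = [math.log(s) for s in sizes if s > 0]
    return statistics.fmean(logs), statistics.stdev(logs)


def log_histogram(sizes, bins=10):
    """Bin log(size) into equal-width bins; return (low edge, width, counts)."""
    logs = [math.log(s) for s in sizes if s > 0]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in logs:
        # Clamp so the maximum value lands in the last bin.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    return lo, width, counts


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# ASSUMPTION: synthetic stand-in data, not the OSF database.
rng = random.Random(0)
sizes = [rng.lognormvariate(8.0, 2.5) for _ in range(2000)]

mu, sigma = fit_lognormal(sizes)
lo, width, counts = log_histogram(sizes)

print(f"fitted mu = {mu:.2f}, sigma = {sigma:.2f}")
print("log(size)  observed  expected")
for i, observed in enumerate(counts):
    centre = lo + (i + 0.5) * width
    # Expected count in a bin is roughly n * bin width * density at the centre.
    expected = len(sizes) * width * normal_pdf(centre, mu, sigma)
    print(f"{centre:9.2f}  {observed:8d}  {expected:8.1f}")
```

If the observed and expected columns track each other closely, the lognormal model is a reasonable description of the data. A formal goodness-of-fit test would be the next step, but the side-by-side counts are the same eyeball check as comparing the histogram to the blue line.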