Problems with the Ponemon data breach studies
The Ponemon data breach studies are one of the few sources of information that we have about data breaches, but their results may either overestimate or underestimate the true cost of a data breach, because the breaches examined in these studies aren't representative of all breaches.
As I've noted before, the size of data breaches follows a lognormal distribution fairly closely. Historically this distribution has had a logmean (base 10) of about 3.4 and a logdeviation (base 10) of about 1.2. In other words, the base 10 logarithm of the breach size follows a normal distribution or "bell curve."
But the breaches that the Ponemon studies examine don't seem to be representative of all breaches. The 2010 report (PDF), for example, looked at US breaches that exposed between 5,010 and 101,000 records. Here's what we get when we graph that range (of the log) of breach sizes:
So it certainly looks like that range of breaches isn't really representative of all breaches. It only includes breaches that are larger than the median breach (about 2,500 records, or 10^3.4) but aren't too big, and it represents only about 31 percent of all breaches.
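The 31 percent figure can be checked with a few lines of Python. This is only a minimal sketch: it assumes breach sizes are exactly lognormal with the base 10 logmean of 3.4 and logdeviation of 1.2 quoted above, and the function names (normal_cdf, fraction_in_range) are made up for this illustration.

```python
from math import erf, log10, sqrt

# Parameters quoted in the text: log10(breach size) is assumed to be normal.
LOG_MEAN = 3.4  # mean of log10(records exposed)
LOG_SD = 1.2    # standard deviation of log10(records exposed)


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def fraction_in_range(low, high, log_mean=LOG_MEAN, log_sd=LOG_SD):
    """Share of breaches exposing between low and high records,
    under the assumed lognormal model."""
    upper = normal_cdf(log10(high), log_mean, log_sd)
    lower = normal_cdf(log10(low), log_mean, log_sd)
    return upper - lower


# Range of breach sizes covered by the 2010 Ponemon report.
share = fraction_in_range(5_010, 101_000)
print(f"Share of breaches in the 2010 report's range: {share:.1%}")
```

Under those assumptions the result comes out at roughly 31 percent, so about two-thirds of all breaches fall outside the range the report looked at.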
The Ponemon reports claim that they're carefully tailored to be representative of companies that suffer data breaches. As their 2010 U.S. Cost of a Data Breach report said,
This benchmark study examines data breach costs resulting in the loss or theft of protected personal data. As a benchmark study, Cost of a Data Breach differs greatly from the standard survey study, which typically requires hundreds of respondents for the findings to be statistically valid. Benchmark studies are valid because the sample is designed to represent the population studied. They intentionally limit the number of organizations participating and involve an entirely different data-gathering process.
A more representative sample of breaches would also include companies that suffered breaches both much larger and much smaller than those examined in the 2010 report. Because those breaches weren't considered in this report, there's a good chance that the report either overestimates or underestimates the true cost of data breaches. Maybe we'll find out which one in a future report.