Securing Big Data in the IoT age
McDonald’s sold its 1 billionth hamburger in 1963. The company’s signage used to keep track in increments of 5 billion. It sold the 100 billionth in 1993 and switched to the “Billions and Billions Served” slogan. Now it doesn’t mention the number sold anymore.
Sounds a lot like the growth of data storage. In 1987, a megabyte of hard disk storage cost $15. Now it costs 3 cents per gigabyte, and manufacturers ship seven exabytes a year. This has driven the phenomenal growth of big data and the Internet of Things (IoT).
Soon there will be billions and billions and even trillions of devices connected to the Internet, tracking everything. Including exactly where you are. Which is no big deal: We’ve been tracking your food’s physical movement through restaurants for years.
Big data and IoT security challenges
How will the security industry respond to this tremendous growth in the storage of personally identifiable information (PII), especially protected health information (PHI)? Laws exist to protect us from breaches and federal agencies levy large fines, but we still read about breaches on a monthly, if not weekly, basis. I myself have been hit by the recent Anthem and OPM data breaches. Are we going to be able to keep exabytes of PII and PHI data secure?
Billions and billions of encryption keys served
The way to protect sensitive data is via encryption, and the way to keep encryption granular is to use unique keys. The more people who have access to a particular encryption key, the greater the possibility data can be breached. If we serve more keys, then fewer people have access to each individual key. Which is what we want if we are to boost security.
So if we’re protecting exabytes of data, and this is growing rapidly, it seems logical that the number of keys we’ll need, maybe not this instant but someday soon enough, will be in the billions. Static key managers top out in the single-digit millions. To seriously consider protecting big data and IoT sources, we need dynamic key servers.
Static key servers, explained
A static key server, as its name implies, first generates one key for an identity pattern and then stores this key for future use. The key’s value does not change over time. To implement key rotation, key servers usually append some metadata to the identity—for example, the date or rotation group number—thus creating a new identity for a new key. To access a key, a user first authenticates and then requests the key associated with an identity. If authorized, the key server retrieves the existing key and provides it to the user.
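As a rough illustration of the behavior described above, here is a minimal Python sketch of a static key server. The class name, the capacity parameter, and the "identity|rotation" naming convention are assumptions made for this example, not any vendor's API, and authentication and authorization are left out for brevity.

```python
import secrets


class StaticKeyServer:
    """Toy static key server: generate a key once, store it, return the stored copy."""

    def __init__(self, capacity: int = 2_000_000):
        self.capacity = capacity   # assumed storage limit, for illustration only
        self._store = {}           # full identity -> key bytes

    def get_key(self, identity: str, rotation: str = "") -> bytes:
        # Rotation is modeled by appending metadata to the identity,
        # which creates a new identity and therefore a new stored key.
        full_identity = f"{identity}|{rotation}" if rotation else identity
        if full_identity not in self._store:
            if len(self._store) >= self.capacity:
                raise RuntimeError("key store is full")
            self._store[full_identity] = secrets.token_bytes(32)
        return self._store[full_identity]

    def stored_keys(self) -> int:
        return len(self._store)


static_server = StaticKeyServer(capacity=3)
first = static_server.get_key("disk-0001", rotation="group-1")
again = static_server.get_key("disk-0001", rotation="group-1")    # same stored key
rotated = static_server.get_key("disk-0001", rotation="group-2")  # new identity, new key
```

Every distinct identity, including every rotation, consumes one slot in the store. That is why capacity becomes the limiting factor.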
Static key servers work well when the number of keys required over a system’s lifetime is in the low millions. This capacity is more than sufficient for data-at-rest scenarios. Consider protecting every spindle of a 10-petabyte disk array: using 4-terabyte drives at RAID 5 (20% redundancy) requires 3,125 drives. Rotating keys once a month requires 375,000 keys over the course of ten years. This is easily supported by static key servers with storage limits of 1 or 2 million keys.
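The disk array arithmetic can be checked in a few lines. This sketch assumes decimal units (1 PB = 1,000 TB) and reads "20% redundancy" as one parity drive in every group of five; both are assumptions made here to reproduce the figures, and the variable names are illustrative.

```python
array_capacity_tb = 10 * 1000    # 10 PB in decimal terabytes (assumption)
drive_capacity_tb = 4
data_drives_per_group = 4        # assumed RAID 5 layout: 4 data drives + 1 parity drive
drives_per_group = 5             # so 1 in 5 drives (20%) holds parity

data_drives = array_capacity_tb // drive_capacity_tb                      # 2500
total_drives = data_drives // data_drives_per_group * drives_per_group   # 3125

rotations_per_year = 12          # monthly rotation
years = 10
disk_keys_needed = total_drives * rotations_per_year * years             # 375000

static_limit = 2_000_000
print(total_drives, disk_keys_needed, disk_keys_needed < static_limit)
```

Under these assumptions the array needs 3,125 drives and 375,000 keys over ten years, well inside a 1 or 2 million-key limit.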
In this corner: Mr. Dynamic
A dynamic key server also generates a key for an identity pattern, but it does not store that key. Access to a key works the same way as with a static key server, except that the key is regenerated on each request instead of being read from storage. A dynamic key server relies on a deterministic derivation function that maps an identity to a key: If the same identity is presented multiple times, the same key will be generated.
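One way to picture a derivation of this kind is a keyed hash over the identity. The sketch below uses HMAC-SHA256 with a server-held master secret. That choice is an assumption for illustration; the article does not say which derivation function any particular product uses, and the class name and sample identities are hypothetical.

```python
import hashlib
import hmac


class DynamicKeyServer:
    """Toy dynamic key server: derive the key on every request, store nothing per key."""

    def __init__(self, master_secret: bytes):
        self._master = master_secret   # the only secret the server has to keep

    def get_key(self, identity: str) -> bytes:
        # Deterministic: the same identity always yields the same 32-byte key.
        return hmac.new(self._master, identity.encode("utf-8"), hashlib.sha256).digest()


dynamic_server = DynamicKeyServer(b"demo-master-secret-not-for-production")
key_one = dynamic_server.get_key("sensor-42")
key_two = dynamic_server.get_key("sensor-42")   # regenerated, not retrieved
key_other = dynamic_server.get_key("sensor-43")
```

Because nothing is stored per identity, the number of keys the server can hand out is not bounded by a key database.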
Continuing further, a dynamic key server supports automatic key rotation by appending a time to the identity. If rotation is defined for a particular key, the dynamic server will automatically calculate at what time to deliver what key. There is no need for the application or user to keep track of what rotation is needed for a particular use case. In this way, a dynamic key server allows more automation.
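Time-based rotation can be sketched by folding a rotation period number into the identity before deriving the key. The function names, the "|epoch=" suffix, and the use of HMAC-SHA256 are assumptions for this example only.

```python
import hashlib
import hmac

WEEK_SECONDS = 7 * 24 * 3600


def rotation_epoch(timestamp: int, period_seconds: int) -> int:
    # Number of whole rotation periods elapsed at the given timestamp.
    return timestamp // period_seconds


def derive_rotating_key(master: bytes, identity: str, timestamp: int,
                        period_seconds: int = WEEK_SECONDS) -> bytes:
    # Appending the period number to the identity yields a new key each period.
    rotated_identity = f"{identity}|epoch={rotation_epoch(timestamp, period_seconds)}"
    return hmac.new(master, rotated_identity.encode("utf-8"), hashlib.sha256).digest()


demo_master = b"demo-master-secret-not-for-production"
day = 24 * 3600
key_day0 = derive_rotating_key(demo_master, "mailbox-17", 0)
key_day3 = derive_rotating_key(demo_master, "mailbox-17", 3 * day)   # same week
key_day8 = derive_rotating_key(demo_master, "mailbox-17", 8 * day)   # following week
```

The caller supplies only the identity and the time of the data. The server works out which rotation period applies, so the application does not have to track rotation state.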
Example: Why securing mail with static key servers is difficult
Do you remember the release of Pretty Good Privacy (PGP) in the early ’90s? PGP was created and given away for free by Phil Zimmermann in response to federal intentions to require back doors in secure communications equipment. PGP uses public-private key pairs to prevent accidental disclosure of an encrypted communication. If a message is sent to three recipients, it is encrypted three different times and sent to three different destinations. That way each message may be decrypted only via the private key of the intended recipient. This blocks disclosure if a message arrives in the custody of an unintended recipient.
While revolutionary at its introduction, this scheme did not scale well with the growth of email. Imagine an email distributed to a hundred, a thousand, or even more recipients. Encrypting a unique copy of the message using the public key of each recipient places a tremendous computation, security, and storage burden on the email system, especially with static key management.
Now let’s consider an alternative scenario where the sender and the recipient list together form the identity used to generate a message encryption key. A message sent from Alice to Bob and Chris would use a different key than one sent from Alice to just Chris. We may now use symmetric keys, since Bob would not be authorized to receive a key for a message where he is not a recipient. Using this scheme, the email system can send the same message to multiple recipients, and each recipient decrypts the message with the same symmetric key. We prevent accidental disclosure by simply not providing the decryption key to users who are not message recipients. This scheme, which does scale well, is known as identity-based encryption.
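The Alice, Bob, and Chris scenario can be sketched as follows. The identity format, the function names, the example addresses, and the HMAC-SHA256 derivation are all assumptions made for illustration. A real system would also authenticate the requester before this check.

```python
import hashlib
import hmac

MAIL_MASTER = b"demo-master-secret-not-for-production"


def message_identity(sender: str, recipients: list) -> str:
    # Sort and de-duplicate so the same recipient set always maps to the same identity.
    normalized = sorted({r.lower() for r in recipients})
    return sender.lower() + "->" + ",".join(normalized)


def request_key(master: bytes, requester: str, sender: str, recipients: list) -> bytes:
    # Only the sender and the listed recipients are authorized to obtain the key.
    participants = {sender.lower()} | {r.lower() for r in recipients}
    if requester.lower() not in participants:
        raise PermissionError(f"{requester} is not a party to this message")
    identity = message_identity(sender, recipients)
    return hmac.new(master, identity.encode("utf-8"), hashlib.sha256).digest()


alice, bob, chris = "alice@example.com", "bob@example.com", "chris@example.com"
key_for_bob = request_key(MAIL_MASTER, bob, alice, [bob, chris])
key_for_chris = request_key(MAIL_MASTER, chris, alice, [chris, bob])   # order does not matter
key_chris_only = request_key(MAIL_MASTER, chris, alice, [chris])       # different recipient list
```

Bob and Chris obtain the same symmetric key for the shared message, while a request from Bob for the message addressed only to Chris is refused.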
Using symmetric keys solves the scaling problem for the email system but not the limits of a static key server. For this scenario to work in practice, we must instead use a dynamic key server. Consider a user who sends a hundred emails a day. Suppose 80% of these emails are replies, while 20% are new messages. This implies that one user generates about 20 unique new recipient lists per day.
Thus, over a five-day workweek, a single user generates 100 unique identity patterns. A one-week rotation policy results in 5,200 unique keys per user per year. Multiply this by 500 mailboxes, and the total reaches 2.6 million keys a year, which exhausts the 2 million-key limit of most static key servers. This example shows why dynamic key servers are more useful for high-volume data protection applications.
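The email arithmetic is easy to reproduce. The figures come from the example above; the variable names are illustrative, and the five-day workweek is the assumption that turns 20 new lists a day into 100 a week.

```python
emails_per_day = 100
new_message_percent = 20          # 80% replies, 20% new messages
workdays_per_week = 5             # assumed five-day workweek
weeks_per_year = 52
mailboxes = 500
static_key_limit = 2_000_000

new_lists_per_day = emails_per_day * new_message_percent // 100    # 20
identities_per_week = new_lists_per_day * workdays_per_week        # 100
keys_per_user_per_year = identities_per_week * weeks_per_year      # 5200
mail_keys_per_year = keys_per_user_per_year * mailboxes            # 2600000

print(mail_keys_per_year, mail_keys_per_year > static_key_limit)
```

With these inputs, 500 mailboxes need about 2.6 million keys in a single year, which is already beyond a 2 million-key store.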
Know what key server is best for your application
Now, I’m not saying that static key managers are useless. Quite the contrary; for some applications, such as data-at-rest protection of hard disk farms, they’re perfectly fine. But if your organization either depends on, is starting to depend on, or will depend on a big data or IoT project, carefully weigh the risks of not using a dynamic key server. Limiting the number of keys in the system because the key server can’t handle enough is too risky. If you don’t believe me, wait until you read about the upcoming data breach in next week’s paper.