The Philosophy of Tokenization

In 1846, Edgar Allan Poe published the essay "The Philosophy of Composition," in which he described how he came to write "The Raven." What Poe says in this essay probably isn't true, but it almost certainly tells you how Poe wanted people to believe he came to write the poem. In other words, Poe may have invented more than the detective story. He may also have invented marketing.

Over 160 years after Poe wrote "The Philosophy of Composition," we see security vendors telling us things about their products that aren't really true but that they want us to believe. What some tokenization vendors are claiming about their technology is a good example of this.

A tokenization system implements two functions: tokenize and detokenize. When a requesting application calls the tokenize function with a plaintext input, the tokenization system creates a token from the plaintext using a proprietary tokenization algorithm. It then encrypts the plaintext using a cryptographic key that it gets from a key server, archives a copy of both the resulting ciphertext and the key used to encrypt it, and returns the token to the requesting application.

When a requesting application calls the detokenize function with a token input, the tokenization system retrieves the archived ciphertext that corresponds to the token from its encrypted data archive, along with the key that was used to encrypt it from its key archive. It then decrypts the ciphertext and returns the decrypted value to the requesting application.
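To make that flow concrete, here's a minimal sketch of the two functions in Python. The class name, the in-memory dicts standing in for the key archive and the encrypted data archive, and the random surrogate value standing in for a proprietary tokenization algorithm are all illustrative assumptions, not any vendor's actual design.

```python
import secrets
from cryptography.fernet import Fernet  # pip install cryptography

class TokenizationSystem:
    def __init__(self):
        self._key_archive = {}   # token -> encryption key (stand-in for a key archive)
        self._data_archive = {}  # token -> ciphertext (stand-in for the data archive)

    def tokenize(self, plaintext: bytes) -> str:
        # Random surrogate value standing in for a proprietary tokenization algorithm.
        token = secrets.token_hex(16)
        # Get a key (stand-in for a call to the key server), encrypt the
        # plaintext, and archive both the ciphertext and the key.
        key = Fernet.generate_key()
        self._data_archive[token] = Fernet(key).encrypt(plaintext)
        self._key_archive[token] = key
        return token

    def detokenize(self, token: str) -> bytes:
        # Look up the archived ciphertext and the key used to encrypt it,
        # then decrypt and return the original value.
        key = self._key_archive[token]
        return Fernet(key).decrypt(self._data_archive[token])
```

A requesting application would then hold only the token: after `token = ts.tokenize(b"4111 1111 1111 1111")`, the plaintext lives solely in the system's archives until `ts.detokenize(token)` is called.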

Note that the secure operation of a tokenization system relies on the secure operation of several components: the tokenization server itself, the key server that it uses, the key archive, and the encrypted data archive. The failure or compromise of any one of these components compromises the entire system.

The chances of an application being inappropriately implemented, configured, or maintained are much greater than the chances of an adversary defeating modern cryptography, so the reliability of a security system is limited by the chances of some component failing rather than by an attacker breaking the cryptography. This means that tokenization systems are inherently less secure than systems with fewer components, which includes systems that simply implement encryption.
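Back-of-the-envelope arithmetic makes the point. Suppose, purely as an assumption for illustration, that each component is correctly implemented and operated 99% of the time:

```python
# Illustrative arithmetic only -- the 99% per-component figure is an assumption,
# not a measured value.
per_component = 0.99
components = 4  # tokenization server, key server, key archive, data archive
print(f"{per_component ** components:.3f}")  # ~0.961, vs 0.990 for a single component
```

Four components that must all work correctly drag the overall figure below that of any one of them.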

The security of a tokenization system also relies on the security of a proprietary tokenization algorithm. The operation performed by this algorithm is essentially the same as the one performed by well-known cryptographic algorithms like hash functions or encryption algorithms: transforming a message into a form from which it's infeasible to recover the original message. Oddly, the fact that its output is called a "token" instead of a "ciphertext" or a "message digest" has let tokenization algorithms avoid the careful public review that other cryptographic algorithms are subject to.
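To see why a token generator is cryptography by another name, consider how one could be built from a publicly reviewed primitive. This HMAC-SHA-256 sketch is an illustration of the kind of keyed, one-way transformation involved, not any vendor's proprietary algorithm:

```python
import hashlib
import hmac

def token_for(plaintext: bytes, key: bytes) -> str:
    # Without the key, recovering the plaintext from this value is
    # infeasible -- exactly the property a hash function or cipher provides.
    return hmac.new(key, plaintext, hashlib.sha256).hexdigest()
```

If a vendor's proprietary algorithm does the same job, it deserves the same scrutiny that constructions like HMAC have received.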

"Security through obscurity" has been known to be a bad system design principle for over 125 years – ever since Auguste Kerckoffs wrote "La cryptographie miltaire" in 1883. There's no reason why tokenization algorithms should be exempt from this general principle, and anyone thinking of using tokenization to protect sensitive data should ensure that the technology that they're considering is secure enough to withstand careful public review.

Tokenization really isn't more secure than encryption. Tokenization vendors just want us to think that it is.
