Tokenization of Credit Card Numbers and the CAP Theorem
In the payment card industry, a set of security standards and best practices is defined by the Payment Card Industry Security Standards Council (PCI) with the goal of protecting cardholder data. Entities such as brick-and-mortar merchants, e-commerce merchants, payment card processors, and acquirers are all required to follow the standards and best practices defined by PCI.
Protection of cardholder data includes encryption and tokenization. In this article we shall focus on tokenization, in particular the concept of a card data vault and how, under the CAP theorem, the practical choice comes down to giving up consistency of the card data vault.
What is Tokenization?
The PCI DSS Tokenization Guidelines define tokenization as:
Tokenization is a process by which the primary account number (PAN) is replaced with a surrogate value called a “token.”
De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value.
There can be different implementations of a tokenization as described in the guidelines. One such approach is to define a card data vault that stores PANs and associated tokens.
On the surface it would appear that building such a system would be easy, since the card vault can be implemented in a data store (either an RDBMS or a NoSQL store) and the data store's schema could be simple, containing just the PAN, the token, and perhaps some timestamp information. There are plenty of companies that have attempted to build their own card vaults and many vendors offering commercial products. However, we shall see later in this article that designing a card vault requires a distributed data store, and a decision is needed on which compromises of the CAP Theorem your system is willing to accept.
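To make the "simple schema" concrete, here is a minimal sketch of such a single-node card vault in Python. The class and method names (`CardVault`, `tokenize`, `detokenize`) are illustrative assumptions, not any product's API; the point is only that the PAN-to-token mapping itself is trivial:

```python
import secrets
import time

class CardVault:
    """Toy single-node card vault: PAN -> random surrogate token.

    Illustrative only; a real vault would also encrypt stored PANs
    and enforce access control.
    """

    def __init__(self):
        self._pan_to_token = {}
        self._token_to_pan = {}

    def tokenize(self, pan: str) -> str:
        # Return the existing token if this PAN was seen before.
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        # Otherwise mint a random 16-digit surrogate value.
        token = "".join(secrets.choice("0123456789") for _ in range(16))
        self._pan_to_token[pan] = token
        self._token_to_pan[token] = (pan, time.time())
        return token

    def detokenize(self, token: str) -> str:
        pan, _created_at = self._token_to_pan[token]
        return pan

vault = CardVault()
t = vault.tokenize("4111111111111111")
assert vault.detokenize(t) == "4111111111111111"
assert vault.tokenize("4111111111111111") == t  # same PAN, same token
```

On one node this is indeed easy; the difficulty appears only once the mapping must live on more than one node, which is where the CAP Theorem enters.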
The CAP Theorem
The CAP Theorem was initially stated as a conjecture by University of California, Berkeley computer scientist Eric Brewer in 2000, and a proof was later published by Seth Gilbert and Nancy Lynch. There are lots of good articles on the Internet about the CAP Theorem. In summary, the theorem states that for distributed data storage systems a system designer can choose at most two of the three following menu items:
- Consistency. In the card vault example this would imply that no matter which distributed tokenization service was used for tokenization or de-tokenization they would all return the exact same token for a given PAN. It would not be permissible, for example, to return two different tokens for a given PAN.
- Availability. The card vault is always available to service a request to tokenize or de-tokenize.
- Partition tolerance. This is perhaps the least understood of the three choices. In summary, a partition-tolerant distributed storage system can continue to operate even in the event of a failure of the underlying data communications network, or a hardware failure in a node.
Do I have a choice between C-A-P?
Let us now use the example of a Token Service Provider (TSP). Imagine the following use case for tokenization:
- Customer Alice is checking out at the local supermarket using her credit card. The merchant, to reduce PCI scope, does not want to store any card holder data in their systems. The merchant has contracted with BobPay payment processor that also acts as a TSP.
- Alice swipes her credit card, the transaction is sent (encrypted using end-to-end encryption) to BobPay, and in the authorization response a token representing the original PAN is returned to the merchant.
- The merchant stores the token for later use, such as settlement and chargebacks.
BobPay provides services to merchants throughout the USA, and as such has multiple data centers.
Partition Tolerance is not optional
There is a nice article by Henry Robinson that explains that any distributed storage system will suffer network and hardware outages, and therefore partition tolerance is not optional. After all, that is probably the reason BobPay has multiple data centers across the USA.
Can Alice wait?
Going back to our example of Alice at the merchant's checkout: if BobPay's tokenization server is unavailable, how long would Alice wait before walking out of the merchant's store without purchasing? What about the other lanes, other stores, and BobPay's other merchants? Clearly availability of the tokenization solution is critical to BobPay's business. Availability of the card vault is a requirement.
Consistency on the chopping block
We are now down to one item left on the menu – consistency. We will have to compromise on consistency of the card vault. What does this mean for a card vault and for the merchant systems?
In the card vault it means that it is possible to have a single PAN represented by more than one token. For example, consider the following flow where the card vault system is fully functional, including a connected network between the two data centers.
In this diagram the flow is as follows:
- PAN is given to the card vault TS1 for tokenization
- The card vault creates a random token
- The PAN->token entry is replicated to the other site, TS2
- The token is returned
- A request for the same PAN on TS2…
- …yields the same result
Compare this flow to the following diagram where the network has been partitioned due to network failure:
- PAN is given to the card vault TS1 for tokenization
- The card vault creates a random token
- The token is returned
- The same PAN is presented to the card vault TS2
- A different token is generated…
- …and returned
We can see that to meet the availability requirement it must be possible to tokenize the PAN on TS2, even though a token mapping exists on TS1.
This implies that applications using the tokenization service must expect more than one token for a given PAN.
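The two flows above can be simulated in a few lines. The node names TS1 and TS2 match the diagrams; `replicate` stands in for whatever replication mechanism the data store uses (an assumption for illustration), and the partitioned case simply never calls it:

```python
import secrets

class VaultNode:
    """Toy tokenization node (TS1/TS2 in the diagrams); names are illustrative."""

    def __init__(self, name: str):
        self.name = name
        self.pan_to_token = {}

    def tokenize(self, pan: str) -> str:
        if pan not in self.pan_to_token:
            self.pan_to_token[pan] = "".join(
                secrets.choice("0123456789") for _ in range(16)
            )
        return self.pan_to_token[pan]

def replicate(src: VaultNode, dst: VaultNode) -> None:
    # Stand-in for the data store's replication between sites.
    dst.pan_to_token.update(src.pan_to_token)

pan = "4111111111111111"

# Healthy network: TS1 tokenizes, the mapping replicates, TS2 agrees.
ts1, ts2 = VaultNode("TS1"), VaultNode("TS2")
t1 = ts1.tokenize(pan)
replicate(ts1, ts2)
assert ts2.tokenize(pan) == t1

# Partitioned network: replication never happens, TS2 mints its own token.
ts1, ts2 = VaultNode("TS1"), VaultNode("TS2")
t1 = ts1.tokenize(pan)
t2 = ts2.tokenize(pan)  # no replicate() call: the sites are partitioned
assert t1 != t2  # overwhelmingly likely for independent random 16-digit tokens
```

Both nodes stay available throughout, which is exactly the availability requirement; the price, visible in the last assertion, is two live tokens for one PAN.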
Minimizing the impact
There are some techniques that can minimize the impact of non-unique PAN-Token mappings as follows:
- Eventual consistency. The data store will recover from the partition event and the data will eventually become consistent. Restoring application-level consistency will require application logic. In our merchant example the merchant should accept a message of the form: “Replace token1 with token2”.
- Minimize the impact and remove complexity. Let us consider the properties of a card vault. On day one the vault will be empty, and it will then grow constantly in size (in theory the size is bounded by the number of possible PANs). If we can also assume that each PAN-token entry is immutable (which is a reasonable assumption), then optimizations can be made. For example, using modern data storage techniques such as Hadoop together with real-time computation such as Storm, the system could reduce and isolate the consistency issue. There is a good blog post called How to beat the CAP theorem that explains how to use Hadoop and Storm in this way, and the approach could be applied to the card vault.
What other choices exist if I want CAP?
If you want a tokenization card vault solution then Voltage Security has a scalable solution, but what if you want all three menu items: consistency, availability, and partition tolerance for tokenization? The answer is obvious: don't store state, and thus don't use a card vault.
One can completely eliminate the need for a card vault by using tokenization techniques that utilize format-preserving encryption (FPE). No card vault means no CAP Theorem, and the tokenization service can provide CAP.
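To see why no state is needed, here is a toy sketch of keyed, reversible, format-preserving tokenization. This is emphatically not a real FPE mode (standardized FPE uses constructions such as NIST's FF1) and is not secure; it only illustrates the structural point that, given the key, any node can tokenize and de-tokenize without a shared mapping store:

```python
import hashlib
import hmac

KEY = b"demo-key"  # in practice this would come from a key management system

def _keystream(key: bytes, length: int) -> list[int]:
    # Derive a digit keystream from the key via HMAC-SHA256.
    # Toy construction for illustration only -- NOT NIST FF1/FF3.
    digest = hmac.new(key, b"pan-tokenization", hashlib.sha256).digest()
    return [digest[i % len(digest)] % 10 for i in range(length)]

def tokenize(pan: str, key: bytes = KEY) -> str:
    # Shift each digit by the keystream, mod 10: format is preserved.
    ks = _keystream(key, len(pan))
    return "".join(str((int(d) + k) % 10) for d, k in zip(pan, ks))

def detokenize(token: str, key: bytes = KEY) -> str:
    # Reverse the shift: no stored PAN->token mapping is required.
    ks = _keystream(key, len(token))
    return "".join(str((int(d) - k) % 10) for d, k in zip(token, ks))

pan = "4111111111111111"
tok = tokenize(pan)
assert detokenize(tok) == pan                   # reversible with the key alone
assert len(tok) == len(pan) and tok.isdigit()   # format preserved
assert tokenize(pan) == tok                     # deterministic: no vault needed
```

Because every node holding the key computes the identical token, consistency comes for free and there is no distributed mapping to partition.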
However, an FPE mechanism of tokenization will require key management. If the underlying key management system stores keys in a distributed key store, then the CAP Theorem applies to the key management system itself! Luckily the Voltage Key Management system is stateless and is thus not subject to the CAP Theorem.
We have seen that tokenization using card vaults, like any other distributed data store, falls prey to the CAP Theorem, and the only viable option is to provide an eventually consistent solution, requiring application logic to resolve duplicate tokens for the same PAN. If you are looking for a card vault tokenization solution and a vendor tells you they offer all three of consistency, availability, and partition tolerance, the claim is too good to be true.