Hashing and the PCI DSS

Some people want to use hashing to render cardholder data unreadable, but a closer look at hash functions shows that the technique is either insecure or, when it's done securely, equivalent to encryption, because its security then depends on keeping a secret value secret.

The FAQ for the PCI DSS has the following to say about using a cryptographic hash function to render cardholder data unreadable:

Are hashed Primary Account Numbers (PAN) considered cardholder data that must be protected in accordance with PCI DSS?

One-way hashing meets the intent of rendering the PAN unreadable in storage; however the hashing process and results, as well as the system(s) that perform the hashing, would still be in scope to assure that the PAN cannot be recovered. If the hashing result is transferred and stored within a separate environment, the hashed data in that separate environment would no longer be considered cardholder data and the system(s) storing the hashed data would be out of scope of PCI DSS. If however, the system hashes and stores the data on the same system, that system is considered to be storing cardholder data and is within PCI DSS scope. The difference lies in where the data is hashed and then stored. More on hashing: A hash is intended to be irreversible by taking a variable-length input and producing a fixed-length string of cipher text. As the PAN has been 'replaced', it should most often be considered out of scope in the same manner receipt of truncated PANs are out of scope. However, PCI DSS Requirement 3.4 also states that the hash must be strong and one-way. This implies that the algorithm must use strong cryptography (e.g. collisions would not occur frequently) and the hash cannot be recovered or easily determined during an attack. It is also a recommended practice, but not specified requirement, that a salt be included. Since the intent of hashing is that the merchant or service provider will never need to recover the PAN again, a recommended practice is to simply remove the PAN rather than allowing the possibility of a compromise cracking the hash and revealing the original PAN. If the merchant or service provider intends to recover and use the PAN, then hashing is not an option and they should evaluate a strong encryption method.

Note that including a salt is recommended but not required. The PCI SSC should consider revising this to require a salt, and should reconsider how that affects which systems are in scope for a PCI DSS assessment and which are not.

A hash function H takes a message M and calculates a message digest or hash D=H(M) from it. A cryptographic hash function is one in which the following three operations are adequately hard:

  1. Finding two messages M1 and M2 such that H(M1)=H(M2). This is called finding a collision.

  2. Given a message digest D, finding a message M with H(M)=D. This is called finding a preimage.

  3. Given a message M1 and its digest D=H(M1), finding another message M2 that produces the same digest, that is, D=H(M2). This is called finding a second preimage.
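As a minimal sketch of the notation D=H(M), the snippet below uses SHA-256 from Python's standard library as a stand-in for H. The choice of SHA-256 and the helper name `digest` are assumptions made for illustration; the FAQ quoted above does not name a specific hash function.

```python
import hashlib


def digest(message: bytes) -> str:
    """Return D = H(M) as a hex string, with SHA-256 standing in for H."""
    return hashlib.sha256(message).hexdigest()


# Inputs of very different lengths produce digests of the same length.
short_digest = digest(b"abc")
long_digest = digest(b"a" * 1_000_000)
print(len(short_digest), len(long_digest))  # 64 64

# The function is deterministic: the same message always gives the same digest.
print(digest(b"abc") == short_digest)  # True
```

Determinism is what makes guessing attacks possible: anyone who can guess M can confirm the guess by hashing it and comparing digests.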

When a hash function is used to render cardholder data unreadable, what we need is for it to be hard to find a preimage for a given message digest. If that's easy, then an attacker can recover a PAN from a hash of the PAN, which means that the hashed PAN wasn't really unreadable. Making a PAN unreadable this way requires more than just running it through a cryptographic hash function, because there aren't that many possible PANs: an attacker can simply hash every candidate and compare the results.

You can divide a 16-digit PAN into three parts. The first six digits are the Issuer Identification Number (IIN). The next nine digits are an account number. The last digit is a checksum that's calculated from the previous 15 digits.
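The checksum on payment cards is normally computed with the Luhn algorithm. The sketch below assumes the usual 6 + 9 + 1 layout of a 16-digit PAN; the function names `luhn_check_digit` and `split_pan` are hypothetical helpers, and the PAN used is a widely published test number, not a real account.

```python
from typing import Tuple


def luhn_check_digit(partial: str) -> int:
    """Return the Luhn check digit for a PAN that is missing its last digit."""
    total = 0
    # Walk right to left; every second digit, starting with the rightmost, is doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the doubled value
        total += d
    return (10 - total % 10) % 10


def split_pan(pan: str) -> Tuple[str, str, str]:
    """Split a 16-digit PAN into (IIN, account number, check digit)."""
    return pan[:6], pan[6:15], pan[15]


iin, account, check = split_pan("4111111111111111")
print(iin, account, check)                    # 411111 111111111 1
print(luhn_check_digit(iin + account))        # 1
```

Because the last digit is fully determined by the first 15, it adds no uncertainty for an attacker who is enumerating candidate PANs.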

With a 16-digit PAN, there are at most 10^16 possible PANs. Calculating all 10^16 message digests for these PANs sounds hard, but it takes far less effort than breaking other forms of cryptography. Because 10^16 is about 2^53, it's roughly equivalent to the work required to break a 53-bit cryptographic key. That's a non-trivial amount of work, but not enough to be considered secure against hackers today.
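The key-size comparison is just a base-2 logarithm of the number of candidates. The short sketch below works it out; `equivalent_key_bits` is a hypothetical helper name, and the second figure assumes the attacker uses the checksum to skip invalid PANs.

```python
import math


def equivalent_key_bits(candidates: int) -> float:
    """Size in bits of a key whose exhaustive search needs the same number of trials."""
    return math.log2(candidates)


all_pans = 10 ** 16    # every 16-digit string
valid_pans = 10 ** 15  # the check digit is fixed by the other 15 digits

print(f"{equivalent_key_bits(all_pans):.1f} bits")    # 53.2 bits
print(f"{equivalent_key_bits(valid_pans):.1f} bits")  # 49.8 bits
```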

In practice it's probably even easier than that to reverse a hash of a PAN, because an attacker can often guess the first six digits, the IIN.

The IIN just tells you what type of card a PAN is from and what bank issued the card. If you're a hacker who manages to breach the security of a particular bank, for example, then it's easy to narrow the range of possible IINs considerably, leaving only the account number and the checksum unknown.

If you know the first six digits of a PAN, then recovering the PAN from its hash is very easy. You only have to calculate 10^10 possible message digests (fewer still if you use the checksum to skip invalid PANs), which is roughly the work required to break a 33-bit cryptographic key. That's an amount of work that's fairly easy with today's computers, and one that's feasible for many hackers to do.

This means that if an attacker knows the IIN part of a PAN, then replacing the PAN with a hash of the PAN doesn't provide much security for the PAN. It provides some security, but not enough to defeat a moderately determined attacker.
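To make the attack concrete, here is a minimal sketch of the exhaustive search, assuming the PAN was hashed with unsalted SHA-256 as ASCII digits (the source does not say which hash function or encoding a given system uses). The names `recover_pan` and `luhn_check_digit` are hypothetical. The `account_digits` parameter exists only so the demonstration finishes quickly; a real 16-digit PAN would use 9.

```python
import hashlib
from typing import Optional


def luhn_check_digit(partial: str) -> int:
    """Return the Luhn check digit for a PAN that is missing its last digit."""
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def recover_pan(target_hex: str, iin: str, account_digits: int = 9) -> Optional[str]:
    """Try every account number under a known IIN until the digest matches."""
    for n in range(10 ** account_digits):
        partial = iin + str(n).zfill(account_digits)
        # Appending the computed check digit skips all Luhn-invalid candidates.
        pan = partial + str(luhn_check_digit(partial))
        if hashlib.sha256(pan.encode("ascii")).hexdigest() == target_hex:
            return pan
    return None


# Toy demonstration with a shortened account number so it runs in well under a second.
toy_partial = "411111" + "0042"
toy_pan = toy_partial + str(luhn_check_digit(toy_partial))
toy_digest = hashlib.sha256(toy_pan.encode("ascii")).hexdigest()
print(recover_pan(toy_digest, "411111", account_digits=4) == toy_pan)  # True
```

With the full nine account digits the loop is the same, only longer, and it parallelizes trivially across cores or GPUs.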

One way to make it harder for an attacker to recover a PAN from a hash of the PAN is to combine the PAN with additional information, called a salt, before hashing it. So instead of calculating D=H(PAN), you calculate D=H(PAN||SALT), where || denotes concatenation. This makes recovery much harder for an attacker, but only if the value of the salt is kept secret.

If the salt isn't secret, then using it doesn't make it any harder for an attacker to find a preimage of D: the attacker just appends the known salt to each candidate PAN before hashing it, so recovering a PAN from its hash is no more difficult than before. In that case, replacing a PAN with a hash of the PAN no longer makes sense, because the hash is effectively reversible.
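The sketch below illustrates why a known salt adds nothing. It assumes D=H(PAN||SALT) with SHA-256 as H and a small, made-up candidate list; `salted_hash` and `recover_with_known_salt` are hypothetical names.

```python
import hashlib
from typing import Iterable, Optional


def salted_hash(pan: str, salt: bytes) -> str:
    """Compute D = H(PAN || SALT), with SHA-256 standing in for H."""
    return hashlib.sha256(pan.encode("ascii") + salt).hexdigest()


def recover_with_known_salt(
    target_hex: str, salt: bytes, candidates: Iterable[str]
) -> Optional[str]:
    """The same exhaustive search as for an unsalted hash; the salt is just appended."""
    for pan in candidates:
        if salted_hash(pan, salt) == target_hex:
            return pan
    return None


salt = b"not-a-secret"
# Toy candidate list of 16-digit strings (not Luhn-checked), for illustration only.
candidates = [f"411111{n:09d}1" for n in range(1000)]
target = salted_hash(candidates[123], salt)
print(recover_with_known_salt(target, salt, candidates) == candidates[123])  # True
```

The attacker does exactly one hash per candidate, the same as without a salt.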

On the other hand, if the difficulty of recovering a PAN from a hash of the PAN depends on the secrecy of a salt, then there's no real difference between the protection provided by replacing a PAN with a salted hash of the PAN and replacing it with an encrypted version of the PAN. In both cases the protection depends on a secret value: with encryption we call it a cryptographic key, and with a salted hash we call it a salt. In both cases, reversing the transformation is easy for an attacker who has access to that secret. This means that for the purposes of complying with the PCI DSS, the two probably ought to be considered equivalent.
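One common way to build a hash that depends on a secret is HMAC. The sketch below uses HMAC-SHA-256 from Python's standard library in place of the plain concatenation H(PAN||SALT) described above; that substitution, the 32-byte key length, and the name `keyed_hash` are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def keyed_hash(pan: str, key: bytes) -> str:
    """Return HMAC-SHA-256 of the PAN under a secret key, as a hex string."""
    return hmac.new(key, pan.encode("ascii"), hashlib.sha256).hexdigest()


key = secrets.token_bytes(32)        # the secret; it must be protected like an encryption key
other_key = secrets.token_bytes(32)  # what an attacker without the secret would be guessing
pan = "4111111111111111"             # widely published test number, not a real account

stored = keyed_hash(pan, key)

# Whoever holds the key can confirm a guessed PAN, so the search from earlier still works.
print(hmac.compare_digest(stored, keyed_hash(pan, key)))        # True
# Without the key, a correct PAN guess cannot be confirmed.
print(hmac.compare_digest(stored, keyed_hash(pan, other_key)))  # False
```

The second comparison is false with overwhelming probability, since the two random keys differ. All of the protection sits in the key, which is the sense in which this is equivalent to encryption.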
