Why I like RC5 and RC6: Part 2
In Part 1 I talked about how great RC5 was. Secure, easy to analyze, fast and small, it seems like there's nothing bad to say about it. Well, here is the number 1 complaint: The rotation count in a round is based on 5 of 32 bits. I'll explain.
Here are the guts of RC5 (one roumd).
A = A XOR B
A = A rotate left by B
A = A + keyTable[i]
B = B XOR A
B = B rotate left by A
B = B + keyTable[i+1]
A 32-bit number can be rotated 0 to 31 bits. Any other rotate count is the same as some value in the range 0 to 31. That is, a 32-bit rotate is equivalent to a 0-bit rotate, a 33-bit and a 1-bit rotate are equivalent, and so on. Hence, when rotating A left by B, what is really happening is "rotate A left by the number that is in the last 5 bits of B." For example, if A = 0x4C1D7152 and B = 0xB199AF31, then A will be rotated by 17 bits.
0x4C1D7152 ROTL 0xB199AF31 =
0x4C1D7152 ROTL 0x00000011 = 0xE2A4983A
Why is this bad? Because so much of the security of RC5 comes from the rotate, and two similar plaintext blocks will produce very similar results. Maybe it's easiest to show with examples.
A = 0xFD84DE63
B = 0xB199AF31
A = A XOR B A = 0x4C1D7152
A ROTL B A = 0xE2A4983A
A = A + table[i] A = 0x7105B361
Now let's suppose the A and B are slightly different (one bit difference, the most significant bit of B)
A = 0xFD84DE63
B = 0x3199AF31
A = A XOR B A = 0xCC1D7152
A ROTL B A = 0xE2A5983A
A = A + table[i] A = 0x7106B361
Compare the results of the two.
0x7105B361 case 1
0x7106B361 case 2
When you have two plaintext blocks that are very similar (they differ by only one bit), if the ciphertext is similar, you have a problem. An attacker can look at two ciphertext blocks, see the similarities and deduce things about the plaintext.
In the case above, the two ciphertexts differ by only 2 bits out of 32 (7105… vs. 7106…). What you'd like is to have, on average, 16 bits differ.
On the other hand, that's only one round of RC5. Remember, there will be 20. In the next round, maybe one of the rotation counts will be different. In fact, in the example above, in the second round, after the first rotate, the two cases would look like this.
0xD72829EB case 1
0xEB341481 case 2
The two ciphertexts are starting to differ more. In other words, two similar plaintexts produced similar ciphertexts after 1 round, but after 2 rounds, they are very different. And the differences will only compound in the later rounds (as Ron Rivest put it, "an avalanche of change").
So why the complaint? Because if the rotation count were based on ALL the bits of the "other word", then the divergence would happen in the first round, not the second. Furthermore, analysts were able to come up with cases where the differences were smaller than the example here, and the divergence didn't happen until the third or fourth or even a later round.
Some analysis claimed that by the tenth round, the probability of no divergence was so low that it was virtually impossible. That is, the avalanche will begin by the 10th round at the latest. It was also true that it would be a very rare case that the avalanche begins later than the 3rd round. However, other analysis showed that it was possible the divergence could be so late that if only 12 rounds were used, it would be possible to apply differential cryptanalysis.
When the AES competition opened, Ron Rivest thought it might be a good idea to adjust RC5 so that all the bits of one word would be used to compute a rotation count. That would silence the only real technical criticism of the algorithm, thereby making the adjusted algorithm a much better candidate. The solution was to introduce a multiply in the round.
r = B * ((B shift left 1) + 1)
r ROTL 5
A = A XOR r
A ROTL r
A = A + table[i]
(Note that this is not the exact RC6. However, for ease of illustrating the point I have "adjusted" it slightly, but the operations are the same.)
If we used this form of RC5 in our example, the two cases would have generated the following r values in the first round.
0xD0B7BE7E case 1
0x3E57BE66 case 2
In case 1, the round count would be 30, and in case 2 it would be 6. The divergence would happen in the first round, not the second.
In fact, researchers showed that RC6 is highly resistant to differential cryptanalysis. The avalanche began in the first round virtually every time.
However, there is the problem of the multiply. This slows down the algorithm. People believe a multiply is expensive. Futhermore, one important use of encryption is on Smart cards, and many of them were originally 8-bit processors so a 32-bit multiply would be disastrous.
On most chips, an add and XOR are 1 cycle each. A rotate is either 1 cycle if a rotate instruction exists (such as Pentium) or 4 cycles if not (it has to be done using a copy, 2 shifts, and an add). On the other hand, back in the 80's, a multiply on most chips was 20 cycles.
But we weren't talking about the 80's. In the 90's, chips could do multiplies in 6 – 8 cycles. By 2000, Intel had a 2-cycle multiply, and almost all chip manufacturers had multiplies that were 4 or fewer cycles (Sparc was a notable exception, in 2000 their multiply took over 20 cycles. Why? I don't know and I'd like to talk to the people who were responsible to find out). Even the Smart card makers, by 2000, used 32-bit processors that had fast multiplies. However, the preception was still there, multiplies are expensive.
In Part 3 I'll compare RC5 and RC6 to other algorithms and look at why RC6 was not selected as the AES algorithm.