Marvin, the cryptographic bug that will never be fixed

October 11, 2023

There has been a flaw in the fundamental RSA encryption algorithm for 25 years. It reappears from time to time under different names and in different forms. They have been re-analyzing the problem for three years and have discovered, among other things, that it will never be fixed.

They have had to make such impressive achievements such as being able to perceive changes in processing time of the order of a few CPU cycles difference in response, across a network in production for miles by hopping six routers along the way. They have christened it Marvin.

In the novel "The Hitchhiker's Guide to the Galaxy" (from which the term "42" also comes from), Marvin is the Android that lasts until the end of the universe.

The bug dates back to 1998, when Daniel Bleichenbacher discovered that error messages thrown by SSL servers when processing the padding of the PKCS#1 (v1.5) format allowed attackers to decrypt the pre-shared key. Understanding the original bug is not that difficult and helps a lot to see why Marvin is a new-old problem.

Where is the problem coming from?

PKCS#1 is a format used within the RSA algorithm set that formats messages when they are too short. Padding is used to prevent them from being easily decrypted. In SSL/TLS a message is sent as a pre-shared-key, which is not very long and therefore must be padded to the length of the module. If the pre-shared-key is 48 bytes, then whatever remains up to 256 bytes (2058 bits key) is padding. All concatenated information is filled with this information:

0x00 + 0x02 + random numbers always non-zero to fill in+ 0x00 + swap-key.

The 0x00 and 0x02 mark the beginning. The SSL/TLS server will thus know that the message starts, that it must delete everything up to zero and then from 0x00 onwards it keeps what it is interested in.

The SSL/TLS server is used as an oracle for the attack. It is constantly bombarded with random strings and depending on its response to whether the padding is correct or not, we are guessing data. This was the origin of Bleichenbacher's original paper describing "the million message attack".

Given the way RSA works, the attacker must clear the m from this equation.

ms (mod n)

Where s are the random messages you send.

The server "oracle" will receive the completely random string s as if it were encrypted, it decrypts with the key that the server knows but we do not.

If we are lucky enough that when decrypting the random s, we get a 0x00, 0x02 at the beginning and then a bunch of random data without a zero, and after the 0x00 a string (something that will happen every 30,000 or 130,000 attempts) the server will respond that it is well formatted, but it is not encrypted with the right password.

And with that binary response, even if the message was not decrypted properly, the attacker will have managed to narrow down the range of possible keys. Clearing the equation, the result of ms (mod n) will be in a very specific range that the attacker can continue to narrow down with more and more questions to the server. After several million queries, it might know the answer.

But what if the server doesn't respond – no oracle? That's when side channel attacks come into play.

Even if it doesn't respond, there may be millisecond differences in its response if, although incorrect, the message is well formatted. These slight time differences will give you the clue. This is an important detail to understand Marvin.

Successive attacks

Further improvements were made to PKCS to counter this attack, so that completely random data could not be sent, and the padding obeyed a circumstantial hash (RSA-OAEP). However, with slight differences, the attack was still possible thanks to time side channels and that many SSL servers still use PKCS 1.5.

In fact, a variant of the attack was discovered in 2018. They called it ROBOT, "the Return of Bleichenbacher's Oracle Threat." Many servers were vulnerable. An attempt was made to fix it, playing with the timing. ROBOT showed that the mitigation implemented for the original bug was implemented incorrectly and attacks with the same basics as the original were still possible.

Successive fixes so that slight millisecond differences in server responses did not help the attacker were thought to be sufficient. But no.

Marvin, this new attack, which in reality is basically the ROBOT attack but more precise, has come to give it a major twist. They suspected that since 2020 the error would still be there, and they started an investigation that concludes that, thanks to much more exhaustive measurement and more precise statistics, there is no solution, because implementing it for "Marvin" is even more complicated than ROBOT.

Time-based side channel attacks

Fundamentally, what they have achieved is precision with timing. They haven't invented anything. Simply put, it was believed that timing attacks (from a side channel) were no longer possible because they wouldn't provide any clues to the attacker due to their insignificance. It was agreed that as long as the responses remained in the order of nanoseconds, they would no longer be considered exploitable.

According to the discoverers, however, it wasn't testing well. It took them three years to increase the amount and accuracy of testing until they were able to perceive differences on the order of a few CPU cycles difference in response, across a production network for miles, hopping six routers along the way. All thanks to sufficient data available and better statistical techniques. Impressive.

For instance, in GnuTLS a time difference in the response was leaked in other parts of the code (the one that decided what kind of error to show if the debug was active). Implementing a workaround for this involves modifying the software too much, and in practice, impossible. These imperceptible details have resurrected side channel attacks that were thought to be dead.

Therefore, having discovered that since there is no possible implementation that is "side channel free", anything that uses padding in RSA with PKCS#1 v1.5 could be susceptible to attack with this new method. It affects almost everything: OpenSSL, GnuTLS, Mozilla's NSS (even after the patch, according to the discoverer), M2Crypto....

The solution?

Stop using PKCS#1 in RSA for TLS (TLS 1.3 already does it, abandon all padding as much as possible after many patches in its previous versions).

Nowadays it is also possible to switch to ECDSA instead of RSA.

Four cyber security milestones that shaped the future of malware

Image from Fabrikasimf on Freepik.