This article explains how modern computers work and how the Meltdown bug exploited them. It walks through how an attacker can abuse a weakness in the CPU’s out-of-order execution to steal sensitive information from the system.
If you follow IT news, you may remember the Meltdown and Spectre security issues, which were publicly disclosed in January 2018. These two bugs are powerful vulnerabilities that let an attacker read information without anyone noticing, regardless of which program or operating system holds it. Traditional attack tools such as viruses and malware leave traces of the attack, which makes it easier for antivirus software to catch them. Meltdown and Spectre, by contrast, leave almost no trace and can reach data at the very heart of the system. That is a consequence of how they work. In this article, we’ll explain how modern computers work and how the Meltdown bug works.
First, let’s understand how computers work. For our purposes, a computer can be divided into two main parts: the CPU for computation and RAM for storage. CPU stands for Central Processing Unit and is responsible for calculating and processing data according to instructions. RAM stands for Random Access Memory and is responsible for storing the series of instructions that the CPU needs to execute, as well as intermediate results during calculations. RAM also holds the data that programs work with, which can be as small as the player’s health in the game you’re currently playing, or as large as the data of the operating system and its kernel, the heart of what makes your computer work.
The CPU reads and writes data in RAM via addresses: every location in RAM has a number, and the CPU uses that number to say where the data goes. Computers can’t write numbers and letters freely the way we do in an exercise book. Everything is stored in bytes, where a byte is a string of eight zeros or ones. Converted to decimal, a single byte can hold only a number from 0 to 255. When writing characters, each character is converted to its corresponding number and stored. For example, the string “apple” is converted to 97, 112, 112, 108, 101, and stored in RAM in that order.
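The “apple” example can be checked with a few lines of Python. This is only a sketch: the `ram` array and the starting address 16 are made up for illustration, and the ASCII encoding is assumed because it matches the numbers given above.

```python
# Convert the string to its byte values (ASCII encoding assumed).
text = "apple"
encoded = text.encode("ascii")
byte_values = list(encoded)
print(byte_values)  # [97, 112, 112, 108, 101]

# Model a tiny RAM as an array of bytes; each index plays the role of an address.
ram = bytearray(32)
base_address = 16  # hypothetical address chosen for this example
ram[base_address:base_address + len(encoded)] = encoded

# Show which number ends up at which address.
for offset in range(len(encoded)):
    address = base_address + offset
    print(address, ram[address], chr(ram[address]))
```

Each cell of `ram` can hold only a value from 0 to 255, which is why the text has to be turned into numbers before it can be stored.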
RAM is far faster than a hard disk, but as CPUs evolved, they needed memory even faster than RAM. The prime example is cache. Cache is memory that serves the same purpose as RAM but is separate from it. It is built into the CPU and is much faster than RAM, but has a much smaller capacity. When the CPU processes instructions, it keeps copies of recently used data in the cache, so that data likely to be needed again can be fetched from there instead of from RAM, which greatly speeds up processing.
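To make the speed difference concrete, here is a toy model of a cache sitting in front of RAM. It is a sketch under simplifying assumptions: the `ToyCache` class, the cost values (1 versus 100 “cycles”), the capacity of four entries, and the least-recently-used eviction rule are all illustrative and do not describe any particular CPU.

```python
from collections import OrderedDict

RAM_COST = 100   # illustrative cost of a read served by RAM
CACHE_COST = 1   # illustrative cost of a read served by the cache


class ToyCache:
    """A tiny cache that remembers the most recently read addresses."""

    def __init__(self, ram, capacity):
        self.ram = ram
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> cached value, oldest first

    def read(self, address):
        """Return (value, simulated_cost) for a read of `address`."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as recently used
            return self.lines[address], CACHE_COST
        value = self.ram[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used entry
        return value, RAM_COST


ram = bytearray(range(256))
cache = ToyCache(ram, capacity=4)
first = cache.read(42)    # not cached yet, so RAM has to answer
second = cache.read(42)   # now cached, so the read is cheap
print(first, second)      # (42, 100) (42, 1)
```

The second read of the same address is cheap because the value is already in the cache. The small capacity also matters: once enough other addresses have been read, the old entry is pushed out and the next read is slow again.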
As computer performance has increased, so have the processing speeds that programs demand of the CPU, and classical sequential processing, in which each instruction waits for the previous one to finish, could no longer keep up. Simply making the CPU run faster was reaching practical physical limits. So CPU manufacturers adopted a technique called out-of-order execution, which became common in mainstream processors in the mid-1990s. The idea is that instructions are stored sequentially in RAM, but the CPU does not wait for a slow instruction to finish. Instead, it executes later instructions ahead of time and holds on to the results, so that when the program actually reaches those instructions, the results are already available, which speeds up overall processing.
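The benefit can be illustrated with a very idealized model. In the sketch below, the four-instruction `program`, the latencies, and both counting functions are invented for illustration. The out-of-order model assumes an unlimited number of execution units and ignores many real constraints, so the numbers show the principle only.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str        # name of the value this instruction produces
    sources: tuple   # names of the values it needs as input
    latency: int     # illustrative number of cycles it takes


program = [
    Instr("a", (), 100),      # slow load from RAM
    Instr("b", ("a",), 1),    # needs the loaded value
    Instr("c", (), 1),        # independent of the load
    Instr("d", ("c",), 1),    # depends only on c
]


def in_order_cycles(program):
    """Strictly sequential: each instruction waits for the previous one."""
    return sum(instr.latency for instr in program)


def out_of_order_cycles(program):
    """Idealized: an instruction starts as soon as its inputs are ready."""
    ready_at = {}
    finish = 0
    for instr in program:
        start = max((ready_at[s] for s in instr.sources), default=0)
        ready_at[instr.dest] = start + instr.latency
        finish = max(finish, ready_at[instr.dest])
    return finish


print(in_order_cycles(program))      # 103
print(out_of_order_cycles(program))  # 101
```

In the out-of-order model, `c` and `d` finish while the slow load is still waiting on RAM, so only `b` has to wait for it. The saving looks small here, but it grows with the amount of independent work that can overlap with slow memory reads.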
But what if an instruction that was executed ahead of time later turns out to be invalid? For example, if an ordinary program asks for the password data of an administrator account stored in protected RAM, the CPU must refuse to carry out the instruction. However, with out-of-order execution, the instruction can be executed before the permission check has completed, and the data it touched ends up in the cache. When the check fails, the CPU discards the result, so a program that requests it in the normal way is denied, even though the instruction has already been executed. The cache, however, is not restored to its earlier state, and Meltdown exploits this weakness.
Now let’s look at the structure of a Meltdown attack program. It needs a precise timer, a stopwatch of sorts, to measure how long each memory read takes. First, the cache is cleared, for example by reading enough unrelated data to fill its entire capacity. This resets the contents of the cache and leaves nothing to chance. Next, the program reads the data at a specific address in RAM that contains the important data we want to know (let’s call this data α) and then accesses the memory at address 1000+α. The first read is rejected because it fetches data from a protected address (it is the operating system’s information, so it should be protected), but due to out-of-order execution, both instructions are executed before the rejection takes effect, and the data at address 1000+α ends up stored in the cache.
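One practical detail is simplified in the description above. Real caches do not store single bytes but whole blocks of neighboring addresses, called cache lines, so neighboring values of α would land in the same block and be indistinguishable. The sketch below illustrates this under assumptions: the 64-byte line size is typical of current x86 CPUs but not universal, the stride of 4096 follows the convention used in published proof-of-concept code, and the function names are made up for this example.

```python
CACHE_LINE_SIZE = 64   # bytes per cache line (assumed, typical on x86)
PROBE_BASE = 1000      # start of the probe area, as in the text


def probe_address(alpha, stride):
    """Address touched for a given secret byte value alpha."""
    return PROBE_BASE + alpha * stride


def cache_line(address):
    """Index of the cache line that contains `address`."""
    return address // CACHE_LINE_SIZE


def distinguishable_values(stride):
    """How many of the 256 possible byte values map to distinct cache lines."""
    return len({cache_line(probe_address(a, stride)) for a in range(256)})


print(distinguishable_values(1))     # 5
print(distinguishable_values(4096))  # 256
```

With a stride of 1, which corresponds to using 1000+α directly, the 256 possible values fall into only 5 cache lines, so timing could narrow α down to a range but not pin it to a single value. Spacing the addresses far apart, for example 1000+α×4096, gives every value its own cache line. The rest of this article keeps the simpler 1000+α notation.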
The attacker then reads through the memory from address 1000 to address 1255, one address for each possible value of α, measuring the time it takes to read the data at each address. While the access time is generally constant, one address can be read dramatically faster than the others. This is address 1000+α, which was pulled into the cache in the previous step. Once we know which address is in the cache, we know the value of 1000+α, and from it the value of α, which is important information about the system. Repeat the process for the next protected address and, byte by byte, you can read out information that is critical to the security of the computer, including passwords to administrator accounts.
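The whole procedure can be imitated in a harmless simulation. The sketch below does not touch real hardware and is not an exploit: `ToyCPU`, its methods, the `SECRET` value, the base address 5000, and the hit and miss costs are all invented for illustration. The toy cache tracks individual addresses, so the 1000+α indexing from the text works directly.

```python
SECRET = b"hunter2"        # hypothetical protected data
PROBE_BASE = 1000          # probe area starts at address 1000, as in the text
KERNEL_BASE = 5000         # hypothetical address of the protected data
CACHE_HIT, CACHE_MISS = 1, 100   # illustrative read costs


class ToyCPU:
    """Toy model: a protected memory region plus a set of cached addresses."""

    def __init__(self, protected):
        self.protected = protected   # address -> secret byte
        self.cached = set()

    def flush_cache(self):
        self.cached.clear()

    def timed_read(self, address):
        """Return the simulated cost of reading `address`, then cache it."""
        cost = CACHE_HIT if address in self.cached else CACHE_MISS
        self.cached.add(address)
        return cost

    def transient_read(self, secret_address):
        """Model of the rejected instruction pair.

        The secret never reaches the caller, but the access to
        PROBE_BASE + alpha leaves its mark in the cache before the
        permission error is raised.
        """
        alpha = self.protected[secret_address]
        self.cached.add(PROBE_BASE + alpha)
        raise PermissionError("access to protected memory denied")


def leak_byte(cpu, secret_address):
    cpu.flush_cache()                       # step 1: start from an empty cache
    try:
        cpu.transient_read(secret_address)  # step 2: the read is rejected ...
    except PermissionError:
        pass                                # ... but the cache has changed
    # Step 3: time a read for every possible byte value and pick the fastest.
    timings = {value: cpu.timed_read(PROBE_BASE + value) for value in range(256)}
    return min(timings, key=timings.get)


cpu = ToyCPU({KERNEL_BASE + i: byte for i, byte in enumerate(SECRET)})
recovered = bytes(leak_byte(cpu, KERNEL_BASE + i) for i in range(len(SECRET)))
print(recovered)  # b'hunter2'
```

The attacking code never receives the secret as a return value. It learns each byte only from which probe address is fast to read, which is exactly the side effect the rejected instruction left behind. On real hardware the timings are noisy, so an attacker would typically repeat each measurement several times.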
What makes Meltdown so scary is that out-of-order execution is used across Intel CPUs, and is also used in the processors that companies such as Qualcomm and Samsung make for mobile devices. In other words, a very large share of the electronic devices on the planet rely on the technique that the attack abuses. Intel’s competitor AMD stated that its processors are not exposed to Meltdown because permissions are checked before the data is used during out-of-order execution, but they are exposed to the harder-to-handle “Spectre,” a bug that attacks vulnerabilities in speculative execution rather than a missing permission check. As a result, nearly all modern high-performance CPUs are exposed to at least one of the Spectre and Meltdown bugs.
Many operating systems, including Windows, issued emergency patches for this bug, but because it’s a silent attack, it’s practically impossible to know whether you were attacked before the patch. Additionally, the patches did not fix the CPU itself. They worked around the problem by keeping the kernel’s memory unmapped while ordinary programs run, which makes every switch between a program and the kernel more expensive and slowed down some workloads noticeably. Since this is a CPU design issue, the underlying flaw can only be removed in hardware, by using a newer CPU with a corrected design.
So far, we’ve explained how modern computers work and how Meltdown exploits them. The CPU, which is responsible for the computations in a computer, uses out-of-order execution, in which instructions are executed ahead of time and their results are held until needed. This dramatically speeds up processing, but a weakness in the process led to a serious security flaw. Along with Meltdown, there’s also a bug called Spectre, which we won’t discuss here because explaining it requires a closer look at speculative execution and further attack techniques. Like Meltdown, though, Spectre is an example of a technology that was introduced to improve CPU performance and ended up causing problems. Hopefully, this will serve as a lesson, even for those outside the computer field, to think hard when pursuing performance improvements about whether we’re missing something important.