At Eurosoft, diagnosing computer hardware issues is what we are all about. Recently a customer wrote to us asking about a current hardware issue and which of our tests address it. The issue is known as “row hammer”, and it affects DDR3 and DDR4 memory modules. Row hammer can cause data corruption on these modules, which are extremely widespread: they currently occupy millions of memory slots on server motherboards in data centers around the world.
First, we’ll give some brief background needed to explain the issue properly.
Capacitors are electronic subcomponents that hold an electrical charge. A capacitor is made of two conductive surfaces that are close to, but not physically touching, each other, with an insulating material between them. When a voltage is applied across a capacitor, opposite charges build up on the two surfaces, creating an electric field between them. When this occurs, the capacitor is said to be charged.
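As a rough illustration, the charge a capacitor holds follows the standard relation below, written here for an idealized parallel-plate capacitor. Real DRAM cell capacitors are trench or stacked structures, so the formula is only indicative of the trends involved.

```latex
Q = C\,V, \qquad C = \frac{\varepsilon A}{d}
```

Here Q is the stored charge, V the applied voltage, C the capacitance, \varepsilon the permittivity of the insulator, A the area of each conductive surface and d the gap between them. Shrinking A shrinks C, so at the same voltage a very small capacitor stores only a very small charge.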
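Some applications repeatedly read the same row of memory while waiting for a task to complete. Whenever such a read misses the CPU cache and the row is not already open in its DRAM bank, the memory controller issues an “Activate” command to open the row, so a tight polling loop can generate a very large number of Activate commands.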
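A toy model makes the role of the Activate command concrete. The sketch below is hypothetical and written only for illustration (the names `ToyBank` and `count_activates` are not from any real tool): it assumes a single DRAM bank with an open-row policy, no CPU cache and no request reordering, so an Activate is needed only when the requested row is not already open in the bank’s row buffer. Under those assumptions, polling a row that stays open costs a single Activate, while alternating between two rows of the same bank costs one Activate per read. Alternating two rows while flushing them from the cache is the access pattern used in published row hammer demonstrations.

```python
from typing import Dict, List, Optional


class ToyBank:
    """Hypothetical model of one DRAM bank that counts Activate (ACT) commands.

    Assumptions: open-row policy, no CPU cache, no request reordering, and the
    open row is only closed when a different row of the bank is needed.
    """

    def __init__(self) -> None:
        self.open_row: Optional[int] = None  # row currently held in the row buffer
        self.activates: Dict[int, int] = {}  # row number -> ACT commands issued

    def read(self, row: int) -> None:
        if self.open_row != row:
            # Row-buffer miss: the open row is closed (precharged) and the
            # requested row must be activated before it can be read.
            self.activates[row] = self.activates.get(row, 0) + 1
            self.open_row = row
        # Row-buffer hit: data comes straight from the open row, no new ACT.


def count_activates(rows: List[int]) -> Dict[int, int]:
    """Replay a sequence of row reads against one bank and return ACT counts."""
    bank = ToyBank()
    for row in rows:
        bank.read(row)
    return bank.activates


# Polling one row that stays open: only the first read needs an ACT.
print(count_activates([7] * 1000))    # {7: 1}
# Alternating between two rows of the same bank: every read needs an ACT.
print(count_activates([7, 9] * 500))  # {7: 500, 9: 500}
```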
DDR3 and DDR4 memory cells are tiny capacitors arranged in rows. If a row of memory cells is repeatedly activated, or “hammered”, by the memory controller’s “Activate” command, it can cause cells in physically adjacent rows to lose their charge. The resulting errors are known as bit flips or coupled bits. Because other applications may be using those adjacent rows, coupled bits can cause data corruption. The charge loss is thought to be caused by electromagnetic coupling between neighboring rows, by leakage through conductive bridges, or by hot-carrier injection. Below is a graphic depicting coupled bits as a result of electromagnetic coupling.
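How hammering one row turns into coupled bits in its neighbors can be sketched with a deliberately simplified model. Everything in it is an assumption for illustration: the threshold `HAMMER_THRESHOLD`, the set `WEAK_CELLS` of leak-prone cells and the convention that a charged cell reads as 1 are made-up values, not measurements of real DDR3/DDR4 modules. The write, hammer, read-back check shown is a generic approach and not Eurosoft’s Microtopology algorithm. Activation counts are treated as accumulated within one refresh window (DDR3 and DDR4 cells are normally refreshed every 64 ms), because a refresh restores the charge that hammering drains.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical parameters chosen only to illustrate the idea.
HAMMER_THRESHOLD = 50_000  # ACTs to one row within one refresh window that disturb its neighbors
WEAK_CELLS: Set[Tuple[int, int]] = {(2, 3), (4, 3), (4, 10), (6, 0)}  # (row, column) of leak-prone cells

Grid = List[List[int]]


def apply_disturbance(memory: Grid, activates: Dict[int, int]) -> None:
    """Toy row hammer model applied at the end of one refresh window.

    A row activated at least HAMMER_THRESHOLD times is an "aggressor". Weak
    cells in the physically adjacent rows (row - 1 and row + 1) lose their
    charge; a charged cell is stored as 1 and a discharged one as 0.
    """
    for aggressor, count in activates.items():
        if count < HAMMER_THRESHOLD:
            continue
        for victim in (aggressor - 1, aggressor + 1):
            if 0 <= victim < len(memory):
                for col in range(len(memory[victim])):
                    if (victim, col) in WEAK_CELLS:
                        memory[victim][col] = 0


def find_coupled_bits(expected: Grid, observed: Grid) -> List[Tuple[int, int]]:
    """Return (row, column) of every bit that no longer matches what was written."""
    return [
        (r, c)
        for r, (exp_row, obs_row) in enumerate(zip(expected, observed))
        for c, (e, o) in enumerate(zip(exp_row, obs_row))
        if e != o
    ]


# Write an all-ones (fully charged) pattern to an 8 x 16 toy array.
written = [[1] * 16 for _ in range(8)]
memory = [row[:] for row in written]

# Row 3 is hammered; row 5 is only read occasionally.
apply_disturbance(memory, {3: 60_000, 5: 100})
print(find_coupled_bits(written, memory))  # [(2, 3), (4, 3), (4, 10)]
```

The hammered row itself (row 3) reads back correctly; only its neighbors show errors, which is why row hammer can corrupt data belonging to other applications.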
According to a report by Yoongu Kim (Yoongu Kim, “Flipping Bits in Memory Without Accessing Them: DRAM Disturbance Errors”, http://users.ece.cmu.edu/~omutlu/pub/dram-row-hammer_kim_talk_isca14.pdf), roughly 85% of the DRAM modules tested, which included modules produced from 2010 onward, were affected by row hammer.
The Microtopology Memory Tests in Pc-Check and Pc-Check Windows are designed to detect and report the row hammer problem. The issue is reported as a Coupled Bits Detected error.
Eurosoft’s Microtopology Tests provide an exceptionally rigorous method of testing PC memory and produce a highly reliable diagnostic report. They are time-based tests that use a special “Microtopological Locality” algorithm, and they are exceptionally sensitive to noise and timing issues in the memory system as a whole.