Another approach is equalization. This has been on the drawing board for some time, but it finally appears to be gaining traction. “Equalization improves the data you receive at the end of your channel,” Synopsys’ Allan explained. “In very simple terms, it addresses intersymbol interference. When a series of bits is being transferred across the channel, by the time one bit is done, it’s actually into the next bit’s time domain. Signals going up and down and switching from ones to zeroes take longer than one unit interval. You’re not starting from a steady-state low potential. You’re starting from a higher state. You’re offsetting the sampling point in your input receiver using decision feedback equalization. So how can I now optimize my input receivers such that I have a similar margin for a one and a zero detection? I’m not really sensing something to put the reference voltage exactly in the middle.”
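The feedback idea Allan describes can be sketched in a few lines. This is an illustrative toy, not Synopsys’ implementation: it assumes a channel where a hypothetical 40% of the previous symbol bleeds into the current one, NRZ levels of ±1, and a single feedback tap.

```python
# Toy 1-tap decision feedback equalizer (DFE) sketch.
# All numbers are hypothetical, chosen only to illustrate the idea.

def dfe(samples, feedback_tap=0.4):
    """Slice received samples, first subtracting the intersymbol
    interference contributed by the previous decision."""
    decisions = []
    prev = -1.0  # assume the link starts from a steady low
    for x in samples:
        # Offset the effective sampling threshold by the prior bit's residue
        adjusted = x - feedback_tap * prev
        bit = 1.0 if adjusted > 0 else -1.0
        decisions.append(bit)
        prev = bit
    return decisions

# Bits 1, 1, 0 after a channel where 40% of the previous symbol
# leaks into the current unit interval:
rx = [0.6, 1.4, -0.6]
print(dfe(rx))  # recovers [1.0, 1.0, -1.0]
```

Note that the first sample (0.6) sits uncomfortably close to a fixed mid-rail threshold; subtracting the previous decision’s contribution restores a full ±1.0 margin for both ones and zeroes, which is the “similar margin” Allan refers to.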
Also on the horizon is in-memory computing. While there have been a couple of commercial approaches using MRAM, researchers at Princeton University in a 2019 paper demonstrated an external DRAM controller in an FPGA that can be used with off-the-shelf DRAM to perform massively parallel computation. The researchers claimed this approach overcomes the so-called memory wall, in which logic performance has outpaced memory bandwidth.
So how much SRAM gets used versus DRAM? There is no simple formula for this, because it isn’t an apples-to-apples comparison.
“There really is no magic way to do this,” said Cheng Wang, CTO and co-founder of Flex Logix. “Most of our design tradeoffs come from a performance estimation that models SRAM bandwidth, SRAM capacity, and DRAM bandwidth. Those are our three primary knobs. And basically, we have four standard sizes of computing, with different amounts of SRAM and DRAM bandwidth for our standard offering for IP. That’s based on our empirical data of running models to determine what works better. Some models can run better if we have 2X the amount of SRAM. If you can almost double your performance by doubling the SRAM, and you add another 20% of area for 2X performance, that’s great. But there are a lot of other models that wouldn’t be able to benefit from the additional SRAM, and then you’re adding that area for nothing. That’s why it’s important to have cycle-accurate performance estimates. It’s not accurate down to a single cycle in our case, but it is accurate to within 8%, which is more than what we need. And then you can do a lot of architectural analysis of proper SRAM/DRAM compute tradeoffs, which may differ by the type of workloads.”
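The kind of estimation Wang describes can be approximated with a roofline-style model: a layer takes at least as long as its slowest resource, whether that is compute, SRAM bandwidth, or DRAM bandwidth. The sketch below is illustrative only; every number in it is hypothetical, not Flex Logix data.

```python
# Toy roofline-style estimator for the SRAM/DRAM tradeoff.
# All figures are hypothetical, chosen to illustrate the method.

def layer_time(macs, bytes_from_sram, bytes_from_dram,
               compute_macs_per_s, sram_bw, dram_bw):
    """A layer is bound by its slowest resource: compute,
    SRAM bandwidth, or DRAM bandwidth (all rates per second)."""
    return max(macs / compute_macs_per_s,
               bytes_from_sram / sram_bw,
               bytes_from_dram / dram_bw)

# Baseline: weights don't fit on-chip, so most traffic hits DRAM.
base = layer_time(1e9, 2e8, 8e8, 1e12, 1e11, 2e10)

# Doubling SRAM (assumed to also double banked SRAM bandwidth)
# keeps more weights on-chip, shifting traffic away from DRAM.
more_sram = layer_time(1e9, 8e8, 4e8, 1e12, 2e11, 2e10)

print(base / more_sram)  # ~2x speedup for this DRAM-bound workload
```

For this DRAM-bound workload, doubling SRAM roughly doubles performance, matching Wang’s “20% more area for 2X performance” case. A compute-bound workload run through the same model would show almost no gain, which is exactly the model that “wouldn’t be able to benefit from the additional SRAM.”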
This is complex math, and it’s becoming more complex as systems are disaggregated into heterogeneous elements, such as chiplets. “SRAM requires more transistors per bit to implement. It is less dense and more expensive than DRAM, and has higher power consumption during read and write,” said Takeo Tomine, principal product manager at Ansys. “Currently, SRAM is designed on advanced finFET technology nodes that a CPU typically is designed with, and a finFET device is more prone to thermal effects (self-heat) due to higher thermal resistance of the device.”
In some cases, what type of memory to use, and where to use it, may come down to the expected lifespan of a device. “There are two major reliability concerns that lead to lifespan reduction of memories,” Tomine said. “One is interconnect reliability. As technology nodes shrink, self-heat causes severe electromigration (EM), which is one of the most critical reliability issues and shortens the lifespan of memories. EM lifetime improvements by material and process technologies continue, along with technology scaling. The second is reliability challenges from different device architectures. In moving from finFETs to nanosheets to CFETs, thermal resistivity increases drastically, which translates to a higher delta T in the device channel. Device self-heat will couple with metal Joule heating. Self-heating of a device will impact gate oxide breakdown (time-dependent dielectric breakdown), and also will degrade HCI (hot carrier injection), which will worsen the BTI (bias temperature instability) of the device.”
Reliability is a measure of the ability of a memory device to perform without failure for a given amount of time. That timeframe can be very different for a smartphone, which is expected to last 4 years, versus automotive, military, or financial server applications, where the life expectancy can be 10 to 15 years (or more). Being able to understand the potential interactions that can affect the lifespan of memories is critical, and they can vary by architecture and by memory type and usage.
That also affects what kind of memory is used, and the overall system architecture. So if memories can be swapped out, lifespans are less relevant than if those memories are embedded into some type of advanced package and sealed up. “It’s like having a pool of DRAM cards, which can be upgraded today,” said Renesas’ Kanigicherla. “With HBM, you can’t do anything if something goes wrong, so you’re throwing away a very expensive chip. On the CPU side, the servers are very closely attached, and there is not much you can do to upgrade anything. That’s why this global shared memory concept works. Some of these solutions come in automatically.”
Latency adds another tradeoff. “Especially with HBM, you’re putting the processor and the DRAM very close together,” said Frank Ferro, group director for product marketing in Cadence’s IP Group. “There are a lot of advantages to doing that. HBM has been advancing pretty rapidly. We see improvements almost every two years in performance. So that curve is steep. But from a system design standpoint, 2.5D is still a challenge. Optimizing the interposer and helping customers design that is really a key part of the conversation.”
Since its invention in 1967, DRAM has been a linchpin for computing. While numerous memory technologies have challenged it over the years, nothing has displaced it. And given the frenzy of activity surrounding this technology, nothing will displace it in the foreseeable future.
Rather than one type of DRAM, there now are many types, and each of them is evolving and spawning new ideas. There is innovation on every level, from the physical connection of the memory to the processing elements, to the pooling of memory outside a rack of servers. And there is work underway to shorten the distance signals need to travel between memory and processor cores, which would reduce the amount of power needed to move that data and the time it takes per cycle.
Put in perspective, DRAM remains a dynamic and innovative field, and there are more innovations on the horizon and different ways to put together a memory solution that can have a big impact on performance, cost, reliability, and longevity.
1. Direct comparisons between SRAM and DRAM costs are not always clear-cut, according to Objective Analysis’ Handy, because SRAM is sold by the chip and DRAM by the byte. On the spot market, which is a clearinghouse for excess inventory, untested and unmarked DRAM sells for as little as $1 per gigabyte, while the same amount of SRAM would cost roughly $5,000.
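The arithmetic behind the footnote’s cost gap is straightforward, using only the article’s spot-market figures:

```python
# Cost-per-gigabyte gap implied by the footnote's spot-market figures.
dram_per_gb = 1.0     # untested, unmarked DRAM: ~$1 per GB
sram_per_gb = 5000.0  # roughly what a GB's worth of SRAM chips costs

print(sram_per_gb / dram_per_gb)  # SRAM is ~5,000x the cost per byte
```

That roughly 5,000x per-byte gap is why the comparison in the body of the article hinges on bandwidth and latency rather than raw capacity cost.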