The 2023 Open Compute Project Global Summit happened almost two weeks ago, and it may very well have been the best, and perhaps best attended, Summit yet. The CXL track was excellent, with a good balance of today’s memory expansion use cases plus the more forward-thinking pooling and disaggregation sessions, and I was very happy to see the step-function increase in optical interconnect related sessions! Unfortunately, I was also confronted with a frustrating “Pink Spotted Elephant” of sorts during the keynotes on the first day, and I found I had trouble shaking the concern for the remainder of the Summit.

I decided to call this a Pink Spotted Elephant because, just like the pachyderm of lore, it stands obtrusively in the middle of the room yet no one seems to see it, or at least they don’t speak of it.

Zane Ball (VP Intel Corp.) Keynote at 2023 OCP Summit1

The Elephant appeared for me as Zane Ball (Vice President, Intel) was showing off a couple of Quanta DC-MHS motherboards on stage. Right in the middle of a quite informative presentation on Intel’s work supporting modular ecosystems and its moves in support of sustainability, I found that all I could see and focus on was just how much area on those boards was taken up by multi-decade-old DIMM memory sockets. It’s as if they were glaring at me!

The visual should not have been an epiphany. I, and many others with a history in the world of data center infrastructure, have seen countless motherboards in innumerable permutations of form factors, socket counts, and technology generations, with DIMM memory sockets serving as a common thread across them all. Perhaps it was the lighting, which highlighted the shiny DIMM sockets, or maybe it was the irony that these boards were being shown off as archetypes of new thinking and standardization in modular system design, yet they are visually dominated by a relatively old (although necessarily evolved through the years) technology.

For those who are not familiar with the history, DIMMs have been used from the early SDRAM days up to the current DDR5, and they evolved from the simpler yet very similar SIMM technology used with the ancient fast page mode and EDO DRAMs of the pre-Intel-Pentium era. Combining these two generations, one could say that the baseline memory interconnect technology we use today originated in the 1980s! To be fair, I could rightfully be accused of some hyperbole, given that early memory bus speeds were in the 33-66MT/s range while modern DDR5 platforms run at 6400MT/s, roughly two orders of magnitude improvement in transfer rate and bandwidth. Even recognizing the tremendous work from Industry that was required to achieve this, in SoC/DRAM PHY and connector technology improvements, and the waves of pain that occurred along the way (I particularly remember the days circa DDR1-2 when motherboard and DIMM designers had to learn that transmission lines are not just those wires on poles that bring power to the data center!), the DRAM interface is still composed of a wide, single-ended, bidirectional bus with a connector to match, just as it was in those early SIMM days!
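
For the curious, here is the back-of-envelope math behind that claim (a minimal sketch; the ~66MT/s EDO-era figure and the DDR5-6400 baseline are round-number assumptions on my part):

```python
# Rough ratio of a mainstream DDR5 transfer rate to the EDO/early-SDRAM era bus.
# Assumed figures: ~66 MT/s for the old bus, 6400 MT/s for today's volume DDR5.
old_rate_mts = 66
ddr5_rate_mts = 6400
print(f"Transfer rate improvement: ~{ddr5_rate_mts / old_rate_mts:.0f}x")  # ~97x, about two orders of magnitude
```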

One of the server boards Zane introduced us to is based on Intel “Monument Creek” technology (Note: there is a roughly equivalent standardized version being developed in JEDEC called MRDIMM2). It’s nothing short of a bandwidth beast, with 12 channels running at 8800MT/s for an aggregate of roughly 845GB/s of peak memory bandwidth per socket! As impressive as this is, I submit that its existence is beginning to show the spots on our Elephant.
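
That aggregate is easy to sanity-check (a quick sketch assuming the standard 64-bit data path per DDR5 channel, ignoring ECC bits):

```python
# Peak bandwidth for a 12-channel socket running 8800 MT/s (MCR/MRDIMM-class) channels.
# Assumes a 64-bit (8-byte) data path per channel; ECC bits excluded.
transfers_per_sec = 8800e6
bytes_per_transfer = 8
channels = 12

per_channel_gbs = transfers_per_sec * bytes_per_transfer / 1e9   # 70.4 GB/s
socket_gbs = per_channel_gbs * channels                          # 844.8 GB/s
print(f"Per channel: {per_channel_gbs:.1f} GB/s, per socket: {socket_gbs:.1f} GB/s")
```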

Quanta 12 Memory Channel Granite Rapids DC-MHS Board on display during the keynote.

CPU manufacturers are capable of producing SoCs with hundreds of cores, and the count keeps increasing. Although the “single thread” performance of any one core is only growing incrementally, that combined with the core count growth means there is an ever-increasing need for memory bandwidth, such as that needed to feed this 288-core Sierra Forest-capable monster. (To be fair, AMD has its own beast with the 256-thread, 12-memory-channel EPYC Bergamo.)

Our metaphorical elephant begins to materialize when we look at the data rates and channel counts this memory interconnect technology is being pushed to in order to feed the never-ending hunger for server platform scaling. Systems evolved from four to eight, then ten, and now twelve memory channels (and the future will likely reveal even higher counts, with 16 channels rumored for AMD’s future “Venice” CPU3) to scale memory capacity and bandwidth in the face of growing core counts and the general demise of multi-socket scale-out platforms. A quick count reveals about 150-ish signals from a CPU for a DDR5 channel, so 12 channels require ~1800 pins of the CPU socket! That is an incredible number of signals that have to be routed on package from the chiplet die to package/socket assignments, then converted to sets of printed circuit board (PCB) traces via a package “breakout”, and routed across the PCB in parallel with other 150-ish count channels to their final DIMM socket destinations, all while maintaining good ordering (so they don’t have to cross back and forth over each other), preserving the signal integrity of the traces, avoiding crosstalk between the traces, ensuring proper reference planes and return current paths for the signaling, etc. Even if you are not a designer and don’t know exactly what I am talking about, you should take away the fact that adding channels is getting very hard to do. Hard means more complex and larger CPU packaging, more layers on the motherboard simply to route the signals, and in some cases more expensive motherboard material better suited to carrying these high-speed signals.
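
A rough tally makes the point (a sketch; the ~150 signals per channel is the same ballpark used above, not an exact pin-out):

```python
# Rough CPU socket pin budget consumed by DDR5 memory channels.
# ~150 signals per channel is a ballpark (data, CA, clocks, control), not an exact pin-out.
signals_per_channel = 150
channels = 12
memory_pins = signals_per_channel * channels
print(f"~{memory_pins} socket pins just for memory")                 # ~1800 pins

# Bandwidth delivered per signal pin at 8800 MT/s (64-bit data path assumed).
per_channel_gbs = 8800e6 * 8 / 1e9
print(f"~{per_channel_gbs / signals_per_channel:.2f} GB/s per pin")   # ~0.47 GB/s per pin
```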

As we refer back to the server board from the keynote, we notice not only the twelve channels but also the introduction of 8800MT/s MCR-enabled channels (Zane used the term “Monument Creek”, but that appears to be equivalent to the MCR (Multiplexer Combined Ranks) technology publicized by SKHynix and Intel4).

MCR Technology depiction (annotated SKHynix graphic)

This is an exciting technology, enabled by a special data buffer designed into the MCR DIMM, but it exists for a reason. A standard RDIMM (which has no such data buffer), used in by far the majority of servers today, is simply not capable of these bus speeds at this point, with 6400MT/s being the leading edge for production volume. It also appears that MCR technology may be limited to one DIMM per channel. What all this means is that the ability of the existing single-ended, multi-DIMM DRAM channel to scale much further in bandwidth is hitting a wall. Although standard DIMMs at up to 8800MT/s are on the industry roadmap, I would not expect to see them until later in the decade, and they will almost certainly be limited to one DIMM per channel and likely a maximum of two ranks (a limit on how many DRAMs can be on the DIMM). The need for MCR DIMMs is, in a way, Intel’s (and soon JEDEC’s) way of saying “we’re starting to see the elephant!”.
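
The arithmetic behind MCR’s appeal is simple (a sketch of the concept as publicized; the assumption that each rank runs at half the host rate is mine):

```python
# MCR idea: the on-DIMM data buffer reads two ranks in parallel at a lower DRAM-side
# rate and multiplexes them onto the host channel at roughly double that rate.
host_rate_mts = 8800
rank_rate_mts = host_rate_mts / 2           # each rank at ~4400 MT/s (assumed)
rdimm_rate_mts = 6400                       # leading-edge standard RDIMM today

host_bw  = host_rate_mts * 1e6 * 8 / 1e9    # 70.4 GB/s per channel
rdimm_bw = rdimm_rate_mts * 1e6 * 8 / 1e9   # 51.2 GB/s per channel
print(f"MCR channel: {host_bw:.1f} GB/s vs RDIMM channel: {rdimm_bw:.1f} GB/s "
      f"({host_bw / rdimm_bw - 1:.1%} uplift)")  # ~37.5% uplift
```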

If channel counts are the elephant’s ears, and the bandwidth or signal speed limits its trunk, then the physical DIMM connectors are its tusks.

Looking at this photo taken from the OCP Expo Floor of an Intel DC-MHS motherboard, one can estimate that the DIMM sockets take up a whopping ~30% of the motherboard real estate in order to support the desired memory channel count while still allowing a tiny bit of room between the DIMMs for cooling airflow.

Not only do today’s DIMMs monopolize real estate, but they are losing the battle with I/O speeds and signal integrity. DDR5 DIMMs are a FAR cry from their earlier SDRAM/DDR1 brethren: they are now optimized for surface mount, have an absolute minimum profile, and tremendous work has gone into the interconnect mechanicals to maximize reliability while minimizing signal discontinuities. Even with these advances, the connector technology brings more and more crosstalk and signal discontinuity complexities as speeds increase. It might be worth noting that NVIDIA chose not to put the Grace Hopper LPDDR memory on any sort of DIMM-style field-replaceable module. Speed and density were, I’m sure, prime considerations, but there also had to be some thought given to the inevitability of a few-dollar LPDDR component someday bringing down a $xx,xxx unit that has no field replacement capability.

Although I found myself bothered by the visuals during the keynote, saying “I see the elephant!” isn’t meant as a foretelling of doom, but rather of opportunity. Dell and IBM, along with their partners, have seen this opportunity and are having an impact on where Industry may go next. IBM began the effort in the DDR4 generation, in support of their P10 product, with what is called a DDIM or Differential DIMM.

SMART Modular DDR4 Differential DIMM (DDIM)

IMHO the DDIM is a foreshadowing of things to come for memory interconnect. For the DDR4 generation, it achieved the same memory bandwidth as a top-end DDR4 RDIMM, but with far fewer signals and a much smaller connector. To do this, it broke from the single-ended, bidirectional protocol used in DIMM applications and instead used very high speed (25Gb/s vs DDR4’s 3.2Gb/s) differential signaling, an optimized OpenCAPI Memory Interface (OMI) protocol, and a much smaller connector. Like the MCR DIMM discussed above, it also uses a buffer between the memory channel and the DRAMs, but this is likely the new norm we need to accept for removable memory modules in the future. The Power CPU/platform team at IBM has developed a revolutionary, extensible memory platform with the DDIM, and Industry would be well served by learning from this example as we look beyond DDR5.
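
To make the “same bandwidth, far fewer signals” point concrete, here is a sketch assuming the commonly described x8 OMI link at 25.6Gb/s per lane and the same ~150-signal DDR4 channel ballpark used earlier (exact counts vary by implementation):

```python
# Compare a parallel DDR4-3200 channel with a serial OMI (DDIM) link of similar bandwidth.
# Assumptions: 64-bit DDR4 data path, ~150 signals per DDR4 channel,
# OMI as 8 lanes x 25.6 Gb/s per direction (8 TX + 8 RX differential pairs).
ddr4_bw_gbs = 3.2e9 * 8 / 1e9                # 25.6 GB/s per channel
ddr4_signals = 150

omi_lanes = 8
omi_rate_gbps = 25.6
omi_bw_gbs = omi_lanes * omi_rate_gbps / 8   # 25.6 GB/s per direction
omi_signals = omi_lanes * 2 * 2              # TX + RX pairs, 2 wires each = 32 signals

print(f"DDR4: {ddr4_bw_gbs:.1f} GB/s over ~{ddr4_signals} signals")
print(f"OMI : {omi_bw_gbs:.1f} GB/s over ~{omi_signals} signals")
```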

Dell is also leading a revolutionary charge to move away from DIMMs (or their cousin, the SODIMM, in client usage such as laptops) with their CAMM technology. CAMM (or “Compression Attached Memory Module”) provides a very dense and high-speed-friendly interconnect. In this configuration the memory sits in a mezzanine form rather than in vertical DIMMs, but one could imagine creative ways to stack these modules and fill the volume above a server board with memory. This technology improves both the signal density (and PCB connector area) and the signal fidelity through the connector for ever-increasing interconnect speeds.

Dell CAMM depiction

Where am I going with this? To some degree this is an Industry “call to action”. I know from numerous conversations with companies in leadership positions within Industry and JEDEC that the problem is recognized, but I feel it is still being attacked with incremental solutions and thinking. Perhaps it’s time for all to say “we see the Elephant”, ramp up some of the revolutionary thinking we are seeing from a few (such as those pointed out above), and revamp the memory subsystem interconnect for the server platform generations yet to come!

Note to readers who attended the OCP Summit: No actual pachyderms were spotted in the San Jose Convention Center, so you didn’t miss out on any excitement!

  1. Open Compute Project – YouTube, “From Edge to Cloud System Design in the Era of AI”
  2. https://www.tomshardware.com/news/amd-advocates-ddr5-mrdimms-with-speeds-up-to-17600-mts
  3. https://www.inspire2rise.com/amd-next-gen-epyc-venice-processor.html
  4. Based on Intel Hot Chips presentations