Current AI accelerator chips like the AMD MI300 or the NVIDIA H100 have a thermal design power (TDP, the amount of heat the cooling solution must carry away for the chip to remain operational) on the order of 500 to 750 W, depending on the variant. This is roughly the limit of what very advanced air cooling can deliver. Moreover, such chips are usually assembled on larger blades that go into a server rack, and such racks now draw a total power of several tens of kilowatts, again at the limit of what air cooling can handle. So either designers find a way to make these AI chips more power-efficient and run cooler, or we need a better way to cool the next generation of more performant and power-hungry AI chips and server racks.
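To get a feel for why these numbers strain air cooling, a rough sensible-heat estimate (Q = rho * V_dot * cp * dT) shows how much airflow is needed to carry away a single accelerator's heat versus a full rack's. This is a minimal Python sketch: the heat loads and the 15 K air temperature rise are illustrative assumptions, not vendor figures.

```python
# Back-of-envelope estimate of the airflow needed to carry away a heat load
# with air, using Q = rho * V_dot * cp * dT. All inputs are illustrative.

RHO_AIR = 1.2    # kg/m^3, density of air near room temperature
CP_AIR = 1005.0  # J/(kg*K), specific heat of air

def airflow_m3_per_s(heat_w, delta_t_k):
    """Volumetric airflow (m^3/s) to remove heat_w watts at a delta_t_k air temperature rise."""
    return heat_w / (RHO_AIR * CP_AIR * delta_t_k)

def m3s_to_cfm(flow_m3_per_s):
    """Convert m^3/s to cubic feet per minute."""
    return flow_m3_per_s * 2118.88

for load_w in (700.0, 30_000.0):  # one accelerator vs. a ~30 kW rack (assumed loads)
    flow = airflow_m3_per_s(load_w, delta_t_k=15.0)  # 15 K inlet-to-outlet rise (assumed)
    print(f"{load_w / 1000:.1f} kW -> {flow:.2f} m^3/s (~{m3s_to_cfm(flow):.0f} CFM)")
```

A single chip already needs tens of CFM pushed through a small heatsink; a full rack needs thousands of CFM, which is where fans, acoustics, and facility airflow start to become the bottleneck.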
Many attempts are ongoing to design more power-efficient AI chips, ranging from in-memory computing to analog and optical computing. That topic justifies a separate analysis. In this article, we take a closer look at the different cooling options.
Liquid cooling comes in many guises. The most common technique used today is the so-called cold plate. Cold plates are direct replacements for air heatsinks, the finned metal blocks, often topped with a fan, that you see when opening your computer. Replace the fan with a pump and the air with a liquid, and you have a single-phase liquid cold plate. Obviously, the liquid must be carefully contained inside the heatsink so as not to damage the electronics. A similar scenario applies when the liquid evaporates under the chip’s heat; such a system is known as a two-phase cold plate. Liquid cold plates present a lower thermal resistance to heat transfer than a traditional air-cooled heatsink, which means they can dissipate more heat with a minimal rise in the chip’s temperature.
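As a minimal sketch of what a lower thermal resistance buys you: since dT = Q * R_th, the maximum power a cooler can handle within a given junction-to-coolant temperature budget is dT / R_th. The resistance values and the 40 K budget below are assumed, round-number illustrations rather than measured data.

```python
# How much heat a cooler can handle for a given junction-to-coolant budget:
# dT = Q * R_th, so Q_max = dT_allowed / R_th. Resistance values are assumed.

def max_power_w(delta_t_allowed_k, r_th_k_per_w):
    """Maximum heat (W) that keeps the junction within delta_t_allowed_k of the coolant."""
    return delta_t_allowed_k / r_th_k_per_w

DELTA_T_BUDGET_K = 40.0  # allowed junction-to-coolant rise (assumed)

for name, r_th in [("high-end air heatsink", 0.20), ("liquid cold plate", 0.05)]:
    print(f"{name:22s} R_th = {r_th:.2f} K/W -> up to ~{max_power_w(DELTA_T_BUDGET_K, r_th):.0f} W")
```

With these illustrative numbers, the same 40 K budget supports roughly four times more heat on the cold plate than on the air heatsink.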
Ultimately, the best solution is to eliminate the thermal barriers and bring the coolant closer to the chip. A first option is to spray a liquid directly onto the backside of the chip, as demonstrated by imec. These so-called jet impingement coolers direct jets of cold liquid at the hottest parts of the chip, with higher flow rates where more cooling is needed. Because the coolant is in direct contact with the chip, this provides more effective cooling than traditional cold plates. Disadvantages are that care has to be taken so that the liquid does not interfere with the electrical operation of the chip, and that thermal warping can occur, leading to mechanical yield issues.
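The benefit of direct liquid contact can be sketched with the surface heat transfer coefficient h: the temperature rise across the coolant boundary layer is roughly dT = q'' / h. The heat flux and the h values below are order-of-magnitude assumptions for illustration only.

```python
# Temperature rise across the coolant boundary layer, dT = q'' / h.
# Heat flux and heat transfer coefficients are assumed, order-of-magnitude values.

HEAT_FLUX_W_PER_CM2 = 100.0           # assumed local heat flux on the die
q_flux = HEAT_FLUX_W_PER_CM2 * 1e4    # W/m^2

for name, h_w_m2k in [("single-phase cold plate", 10_000.0),
                      ("jet impingement", 50_000.0)]:
    print(f"{name:23s} h = {h_w_m2k:>8.0f} W/(m^2*K) -> dT ~ {q_flux / h_w_m2k:5.1f} K")
```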
An alternative, potentially even more effective approach, is to bring the liquids directly into the chip substrate as demonstrated at EPFL and at TSMC. With this approach, microchannels are etched into the substrate. These provide very close access for the liquids to the hot transistor junctions, reducing the thermal resistance and increasing the efficiency of this cooling method. A drawback is that such channels may impact the electrical performance of the transistors as well as the mechanical stability of the die.
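One way to see why in-substrate microchannels help is to model the junction-to-coolant path as a series stack of thermal resistances; etching channels into the substrate removes the interface-material and lid contributions from that stack. All values in this sketch are assumed for illustration.

```python
# Junction-to-coolant thermal resistance as a series stack of layers.
# Etching microchannels into the substrate removes the thermal interface
# material and lid from the path. All values are illustrative assumptions.

cold_plate_stack = {
    "die": 0.02,
    "thermal interface material": 0.05,
    "lid": 0.03,
    "cold plate": 0.04,
}
microchannel_stack = {
    "die": 0.02,
    "etched microchannels": 0.03,
}

HEAT_W = 700.0  # assumed chip power

for name, stack in [("lidded package + cold plate", cold_plate_stack),
                    ("in-substrate microchannels", microchannel_stack)]:
    r_total = sum(stack.values())  # series resistances simply add up
    print(f"{name:28s} R_th = {r_total:.2f} K/W -> dT ~ {HEAT_W * r_total:.0f} K at {HEAT_W:.0f} W")
```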
An alternative method involves submerging the entire server in a dedicated dielectric liquid. This technology is known as immersion cooling. Such systems can use a single-phase or two-phase cooling loop, depending on whether the server’s heat changes the phase of the dielectric liquid. Because the liquid is in direct contact with the chips, these systems can be very effective coolers. Their disadvantage is the size and weight of the liquid baths containing the servers, which require drastic design changes in data centers. Maintenance and uptime are also often cited as concerns when it comes to immersion cooling.
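The difference between the two loop types comes down to sensible versus latent heat: a single-phase loop carries cp * dT per kilogram of coolant, while a two-phase loop carries roughly h_fg per kilogram through evaporation. The fluid properties below are rough, representative values for a fluorinated dielectric coolant and should be treated as assumptions, not datasheet numbers.

```python
# Heat carried per kilogram of dielectric coolant: sensible heat in a
# single-phase loop (cp * dT) versus latent heat in a two-phase loop (h_fg).
# Fluid properties are rough, representative values, not a datasheet.

CP_J_PER_KG_K = 1100.0     # specific heat of the dielectric liquid (assumed)
H_FG_J_PER_KG = 110_000.0  # latent heat of vaporization (assumed)
DT_K = 10.0                # allowed temperature rise in the single-phase loop (assumed)

single_phase_j_per_kg = CP_J_PER_KG_K * DT_K  # ~11 kJ per kg of coolant
two_phase_j_per_kg = H_FG_J_PER_KG            # ~110 kJ per kg of coolant

print(f"single-phase: {single_phase_j_per_kg / 1000:.0f} kJ/kg")
print(f"two-phase:    {two_phase_j_per_kg / 1000:.0f} kJ/kg "
      f"(~{two_phase_j_per_kg / single_phase_j_per_kg:.0f}x more per kg)")
```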
Recently, a cross-over promising the best of both cold plates and immersion systems has been emerging. Companies like Ferveret and Iceotope have come up with solutions in which compute blades are individually submerged in stand-alone, liquid-cooled chassis. These systems combine the efficiency of immersion cooling with the modularity of cold plates.