Clock-throttling isn't the answer: Innovative thermal design supports real-time military application needs
July 21, 2016
The extended temperature requirements of many military applications often mean slowing down the clock of a hot CPU. Clock-throttling of today's powerful processors works against military users' needs for high performance, but reliability and size/weight requirements of typical systems leave them few other valid choices. Instead, innovative cooling-system design offers the full advantage of high-performance processors without running over the device's recommended temperature range.
Multicore processor architectures offer high-performance computing platforms that are optimized for the size, weight, and power (SWaP) requirements of defense and electronic warfare (EW) programs. The challenge, however, lies in designing a conduction-cooled platform that maximizes processor performance and does not compromise system reliability in extended-temperature environments.
Processor clock-throttling – a feature of modern processors that reduces the operating frequency of the processor in response to over-temperature conditions – has become an accepted design practice when operating over the industrial temperature range of -40 °C to +85 °C. However, for military customers with mission-critical, real-time applications, this approach may be unacceptable when lives are at risk, such as in gunfire-control systems, EW countermeasures platforms, or radar high-power attack-mode operating systems.
Rugged system designs that employ clock-throttling are typically a patchwork of external temperature sensors, thermal gap-pads, heat sinks, and metal cooling plates. Together these provide a near-non-clock-throttling solution with a temperature delta of 20 °C to 25 °C between the rugged system’s processor core and the system’s cold plate (the system’s conduction-cooling interface). Qualitatively, this is not a bad thermal solution, but quantitatively it is not optimal. The challenge is to design a thermal-management solution that minimizes clock-throttling but prevents the processor from operating at temperatures beyond its rated temperature limits.
A better option is a top-down, conduction-cooled, true non-clock-throttling design with a thermal delta of 10 °C between the rugged system’s processor core and cold plate over the industrial temperature range of -40 °C to +85 °C. This design enables military applications to take full advantage of high-performance processors without operating beyond specified system temperature limits or diminishing system reliability.
The thermal delta design challenge
To minimize clock-throttling while staying within the operational temperature limits of the processor, most system designers employ a disjointed amalgam of thermal gap-pads, heat sinks, cold plates, and thermal sensors under software thermal-management control. These approaches are integral to any good thermal-management solution, but taken separately, each of these design elements is limited in how far it can reduce the thermal delta at the interface between the base of the conduction-cooled system and the processor’s die. Bottom line: the thermal delta is where the rubber hits the road.
Given that the maximum temperature at the base of a conduction-cooled system (thermal mounting interface) is 85 °C by specification, designers must produce a thermal-management solution that minimizes the thermal delta at the interface between the base of the conduction-cooled system and the processor’s die. For example, a design that produces a thermal delta of 20 °C to 25 °C when operating in environmental conditions of extreme heat, with a processor that has a maximum die temperature of 105 °C, will be forced to de-rate or throttle the operating frequency of the processor in order to maintain die temperatures within the manufacturer’s specification. Throttling will be required because, for constant power dissipation, the thermal delta is constant. In other words, the die temperature “tracks” the base temperature linearly over the industrial temperature range of -40 °C to +85 °C. Therefore, if the base temperature of a conduction-cooled system under review is at +85 °C and the thermal delta for the system is 20 °C to 25 °C (up from the base temperature), then the processor will operate at 105 °C to 110 °C, which is at or above its rated maximum.
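The arithmetic in this example can be written out as a short sketch. It assumes the simple steady-state model described above (die temperature equals base temperature plus a constant thermal delta). The function names are illustrative, and treating a die temperature exactly at the rated maximum as "must throttle" is an assumption made here, since no margin remains at that point.

```python
TJ_MAX_C = 105.0    # maximum die temperature cited in the article
BASE_MAX_C = 85.0   # maximum base (thermal mounting interface) temperature


def die_temperature(base_c: float, delta_c: float) -> float:
    """Steady-state die temperature at constant power: base plus thermal delta."""
    return base_c + delta_c


def must_throttle(base_c: float, delta_c: float, tj_max_c: float = TJ_MAX_C) -> bool:
    """True when the die would sit at or above its rated maximum.

    Assumption: a die exactly at the limit has no margin left, so it is
    counted as needing throttling.
    """
    return die_temperature(base_c, delta_c) >= tj_max_c


# Evaluate the 20 C and 25 C deltas from the text at the maximum base temperature.
for delta in (20.0, 25.0):
    die = die_temperature(BASE_MAX_C, delta)
    print(f"delta {delta:.0f} C: die at {die:.0f} C, throttle needed: "
          f"{must_throttle(BASE_MAX_C, delta)}")
```

Both deltas put the die at or beyond 105 °C when the base is at 85 °C, which is why such a design has to de-rate the processor.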
One important goal of a rugged, conduction-cooled system is therefore to minimize the thermal delta between the base of the conduction-cooled system and the processor’s die. If the thermal delta can be constrained to only 10 °C, the processor die temperature will be 95 °C when the base of the system is at the maximum industrial temperature of 85 °C. Under constant-power operation, the CPU would not reach the maximum die temperature of 105 °C. (Note: Most modern processors are “cavity down” and the top of the CPU package is actually the back side of the CPU die. Therefore, Tcase and Tjunction (die) are treated the same.)
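Another way to read these numbers is to ask how hot the base may get before the die reaches its limit. The sketch below assumes the same constant-delta model; the helper names are hypothetical and the figures are the ones quoted in the text.

```python
TJ_MAX_C = 105.0    # maximum die temperature cited in the article
BASE_MAX_C = 85.0   # maximum industrial base temperature


def max_base_at_full_clock(delta_c: float, tj_max_c: float = TJ_MAX_C) -> float:
    """Base temperature at which the die just reaches its rated maximum."""
    return tj_max_c - delta_c


def headroom_at(base_c: float, delta_c: float, tj_max_c: float = TJ_MAX_C) -> float:
    """Margin between the die temperature and its limit; negative means over limit."""
    return tj_max_c - (base_c + delta_c)


for delta in (10.0, 20.0, 25.0):
    print(f"delta {delta:.0f} C: die hits limit at base "
          f"{max_base_at_full_clock(delta):.0f} C, "
          f"headroom at {BASE_MAX_C:.0f} C base = {headroom_at(BASE_MAX_C, delta):.0f} C")
```

Under this model, a 10 °C delta leaves 10 °C of headroom at an 85 °C base, while a 25 °C delta runs out of headroom once the base passes 80 °C.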
A new approach for minimizing the thermal delta
Another approach employs a corrugated alloy slug with an extremely low thermal resistance to act as a heat spreader at the processor die instead of the typical use of thermal gap-pads to conduct heat from the CPU to the system’s interface to the cold plate. Once the heat is spread over a much larger area, a liquid silver compound in a sealed chamber transfers the heat from the spreader to the system’s enclosure. This approach yields a temperature delta of 10 °C or less from the CPU core to the cold plate, compared with more than 25 °C for typical approaches. This approach also requires that all printed circuit boards (PCBs) be designed with multiple power and ground planes, with specific thermal-management techniques for optimizing the heat flow from the CPU and other high-power dissipation devices to the system’s base. The base is the “base plate” as shown in Figure 1. In addition, internal and external thermal sensors that are monitored by layers of thermal-management software provide protection in the case of power spikes at unexpected processor loads. This multifaceted approach enables systems using Intel processors with a TjMax of 105 °C to operate in an industrial temperature environment (-40 °C to +85 °C) at full operational load without throttling the processor.
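The article does not describe how the thermal-management software is built. As a rough illustration of the kind of check one monitoring layer might perform, the sketch below classifies sensor readings by their remaining margin. Everything in it is an assumption: the sensor names, the per-sensor limits, the 5 °C warning margin, and the three states are hypothetical and do not describe General Micro Systems' software.

```python
from enum import Enum
from typing import Mapping


class ThermalState(Enum):
    OK = "ok"
    WARN = "warn"
    PROTECT = "protect"


def assess(readings_c: Mapping[str, float],
           limits_c: Mapping[str, float],
           warn_margin_c: float = 5.0) -> ThermalState:
    """Classify the system by the smallest margin across all sensors.

    readings_c and limits_c are keyed by hypothetical sensor names;
    warn_margin_c is an assumed early-warning band below each limit.
    """
    worst_margin = min(limits_c[name] - temp for name, temp in readings_c.items())
    if worst_margin <= 0.0:
        return ThermalState.PROTECT   # at or over a limit: take protective action
    if worst_margin <= warn_margin_c:
        return ThermalState.WARN      # close to a limit, e.g. during a power spike
    return ThermalState.OK


# Die at 95 C against a 105 C TjMax, as in the 10 C delta case at an 85 C base.
print(assess({"cpu_die": 95.0}, {"cpu_die": 105.0}))
```

Keeping the limits per sensor lets internal and external sensors carry different thresholds, which is one plausible way to layer the protection the text mentions.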
Figure 1: An illustration of General Micro Systems’ conduction-cooling system design.
Controlled experiment
We performed experiments on two configurations of rugged systems, one with the corrugated alloy design shown in the illustration, and the second without this technology, but with a carefully designed application of heat sinks and thermal gap-pads. In both experiments, throttling was disabled in BIOS settings and the onboard Intel Core i7 processors were loaded to 100 percent performance via third-party, off-the-shelf software. A typical profile for a formal thermal cycle as conducted in a controlled thermal chamber is illustrated in Figure 2.
Figure 2: A typical profile for a formal thermal cycle, illustrating the temperature of the base of a rugged system as measured by precision thermocouples. The flat parts of the curves represent steady-state soak (dwell) times.
For the newly designed system, the processor temperature tracked linearly at approximately 10 °C above the base temperature of the system (as illustrated in Figure 3) over the rising temperature cycle as shown.
Figure 3: An illustration of a 10 °C delta between the base and processor as measured on a military embedded system using General Micro Systems’ conduction-cooled technology. The red line is a measure of the temperature of the system’s base, while the blue line is a measure of the system’s processor.
In the second, conventional, configuration, the processor temperature tracked linearly at approximately 20 °C above the base temperature of the system, as illustrated in Figure 4. If throttling were enabled in the BIOS settings, this system would clock-throttle.
Figure 4: An illustration of a 20 °C delta between the base and processor in a system using traditional heat sinks and thermal gap-pads. The red line is a measure of the temperature of the system’s base, while the orange line is a measure of the system’s processor.
The application of the conduction-cooling technology kept the processor temperature below the specified maximum when operating at 100 percent processor load and at a system base temperature of +85 °C while throttling was disabled.
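The tracking behavior reported for the two configurations can be checked from logged (base, processor) temperature pairs taken at the steady-state dwell points. The sketch below shows one simple way to do that: it computes the mean delta and the largest deviation from that mean. The sample values are hypothetical placeholders chosen for illustration, not the measurements behind the figures.

```python
from statistics import mean
from typing import Sequence, Tuple


def estimate_delta(samples: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (mean delta, largest deviation from the mean) for (base, cpu) pairs in C."""
    deltas = [cpu_c - base_c for base_c, cpu_c in samples]
    avg = mean(deltas)
    spread = max(abs(d - avg) for d in deltas)
    return avg, spread


# Hypothetical steady-state readings in degrees C, NOT data from the article.
example_log = [(-40.0, -30.0), (0.0, 10.0), (40.0, 50.0), (85.0, 95.0)]

avg_delta, spread = estimate_delta(example_log)
print(f"mean delta {avg_delta:.1f} C, max deviation {spread:.1f} C")
```

A small deviation across the whole cycle is what "tracks linearly" implies: the delta stays essentially constant while the base temperature rises.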
It’s clear that clock-throttling at maximum processor performance need not be an accepted design practice. A well-designed conduction-cooled system that minimizes the thermal delta at the interface between the base of the conduction-cooled system and the processor die maximizes processor performance without clock-throttling. Moreover, this thermal performance meets the high-performance, high-reliability, and minimum-SWaP demands of military customers.
Rick Neil is a senior hardware design and production engineer at General Micro Systems, responsible for the GMS flagship SB1002-MD and SB1002-MDv3 rugged computing systems. He has 25-plus years of military and commercial design experience at Xerox, TRW Space Systems, Rainbow Technologies, Hughes Electronics, Lockheed Martin, SAIC, and SCE. Rick holds a BS in electrical engineering with an emphasis in computer architecture from the California State Polytechnic University, Pomona. Contact the author at [email protected].
General Micro Systems www.gms4sbc.com