GPU and FPGA go head-to-head
Story | November 16, 2020
Today’s embedded-system designers can select from a wide variety of processor types, with FPGAs [field-programmable gate arrays] and GPUs [graphics processing units] offering their own advantages and disadvantages compared with the more familiar CPUs [central processing units]. Understanding these characteristics and how FPGAs and GPUs stack up can help system integrators make the right choice when selecting and installing a processor, whether it is used individually or in combination with other processor types.
FPGAs are hardware implementations of algorithms, and since a hardware implementation usually operates faster than a software implementation, they perform very well. Unlike FPGAs, GPUs execute software; executing a complex algorithm takes many sequential GPU instructions compared to an FPGA’s hardware implementation. The advantage of a GPU is its high core count, which enables certain parallel algorithms to run much faster than on a CPU, especially those using floating-point calculations. A 1,000-core GPU can run 1,000 floating-point calculations every clock cycle. For signal- and image-processing applications, the GPU is a natural fit. GPUs typically outperform CPUs for highly parallel, math-intensive applications, and they are approaching parity with FPGAs in performance per watt.
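The data-parallel advantage described above can be sketched in Python with NumPy (a CPU-side stand-in for per-core GPU parallelism; the array contents are purely illustrative):

```python
import numpy as np

# 1,000 independent floating-point multiply-adds, one per "lane" --
# loosely analogous to a 1,000-core GPU retiring one floating-point
# operation per core per clock cycle.
a = np.arange(1000, dtype=np.float32)
b = np.full(1000, 2.0, dtype=np.float32)

parallel = a * b + 1.0  # data-parallel: one expression, 1,000 lanes

# The sequential equivalent a single core would have to step through:
sequential = np.empty_like(a)
for i in range(1000):
    sequential[i] = a[i] * b[i] + 1.0

assert np.allclose(parallel, sequential)
```

The two computations produce identical results; the difference is that the first expresses all 1,000 operations as one parallel step, which is the form a GPU (or an FPGA’s parallel logic fabric) can exploit.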
Historically, one drawback of FPGAs is that they are much harder to program than CPUs and GPUs. Software for CPUs is typically written in one of many readily available programming languages, such as Java, C, or Python. FPGAs are programmed with a hardware description language (HDL) such as Verilog or VHDL [VHSIC hardware description language], which translates directly to FPGA logic cells. GPUs are often programmed using a software framework that shields the user from having to write code specifically for the GPU; instead, code is written at a high level. The same is becoming true for FPGAs: Software development frameworks are being designed that enable FPGA programming without HDL. FPGA vendors have made frameworks available and built toolkits into their development environments, eliminating the need for direct HDL programming.
Heterogeneity/fabric connectivity
Embedded applications often require heterogeneous system architectures that combine CPU, FPGA, and GPU elements. While traditional embedded applications may include a single CPU and GPU processing element, some processor-intensive platforms integrate multiple CPU, GPU, and FPGA engines, implemented on one or more discrete cards connected over high-speed PCI Express (PCIe) or Ethernet fabric backplanes to communicate and execute tasks in parallel. Alternatively, some of the latest standalone GPU-accelerated modules offered by NVIDIA (e.g., Jetson AGX Xavier) integrate more than a half dozen different compute engines on a single system-on-module (SoM), including a CPU, GPU, deep-learning accelerator, vision accelerator, and multimedia engine. An example of a rugged commercial off-the-shelf (COTS) system based on this technology is Curtiss-Wright’s Parvus DuraCOR AGX Xavier small form factor modular mission computer, which integrates the Jetson AGX Xavier’s NVIDIA CUDA-core accelerated graphics processing, artificial intelligence/deep learning inference, and edge-computing capabilities. (See above, Figure 1.)
One important feature of FPGAs is their any-to-any I/O connection that enables them to connect to a sensor, network, or storage device without a host CPU. A high-end radar system, for example, may need a number of discrete processing elements and compute stages to support multiple high-speed data inputs; FPGAs have some advantages in this case, as they can directly connect to these high-speed sensors and offer very high bandwidth.
Latency and determinism
As bus speeds increase, latency is expected to decrease for newer CPUs and GPUs; however, the latency of an FPGA is more deterministic. With an FPGA, it is feasible to have latency around one µs, whereas CPU latency tends to be around 50 µs.
Using a real-time operating system (RTOS) rather than a traditional OS may help with determinism, but it doesn’t necessarily provide better latency. In other words, an RTOS provides a tighter guarantee of how quickly the processor will respond, but it may not result in faster execution on average.
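The distinction between average latency and determinism can be illustrated with a simple jitter measurement (a sketch for a general-purpose OS; the loop count and the 1 ms nominal delay are illustrative choices, not values from the article):

```python
import time

# Measure the spread (jitter) of a nominally fixed 1 ms sleep on a
# general-purpose OS. An RTOS tightens the worst case (determinism)
# without necessarily lowering the average (speed).
samples = []
for _ in range(100):
    start = time.perf_counter()
    time.sleep(0.001)  # request ~1 ms
    samples.append(time.perf_counter() - start)

mean_us = 1e6 * sum(samples) / len(samples)
worst_us = 1e6 * max(samples)
print(f"mean: {mean_us:.0f} us, worst case: {worst_us:.0f} us")
```

On a desktop OS the worst case can far exceed the mean; a deterministic platform, such as an FPGA with fixed-cycle hardware latency, bounds that gap.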
Many variables are in play when selecting a certain processor for a particular application. At the beginning of any new design program it is helpful to consult with your trusted supplier’s system architects, who solve these problems and make these types of decisions daily. The right choice can make all the difference.
Mike Southworth is product line manager for Curtiss-Wright Defense Solutions.
Curtiss-Wright Defense Solutions
https://www.curtisswrightds.com/