Making the case for using ATCA in military signal processing
December 13, 2013
Data in the modern battlefield has become as essential as munitions. Detection, target tracking, and the decisions that must be made, based on data acquired from sensors and cameras mounted to Unmanned Aerial Vehicles (UAVs) or a myriad of radar and sonar devices on a cruiser, all require sophisticated algorithms executing on powerful computing equipment. Traditional methods of Digital Signal Processing (DSP) have used specialized FPGA equipment, multiprocessor VME, and OpenVPX solutions, but a new class of computing has the potential to replace some of those expensive and highly specialized processing elements.
Advanced Telecom Computing Architecture (AdvancedTCA or ATCA) is an open computing standard that is very valuable to military applications requiring a huge amount of processing. Advances in microprocessor technology and accompanying software will make ATCA a very powerful technology for complex signal-processing applications of the future.
ATCA’s new DSP applications include subfields such as audio/speech signal processing, sonar and radar signal processing, sensor array processing, spectral estimation, statistical signal processing, digital image processing, signal processing for communications, control of systems, biomedical signal processing, and seismic data processing.
New technologies are emerging that will enable ATCA to address DSP applications, especially those in defense and aerospace.
These technologies include:
- High-performance multicore processors
- Updated vector processing units in cores
- High-speed fabrics in the ATCA backplane
- Advanced flow-control software on ATCA switches and blades
- Repurposing packet-processing software to target DSP applications
The trends driving the opportunity for defense contractors include the steady cadence of improvements in Intel Xeon processor performance and functionality, and underlying fabric interfaces moving from 10 G to 40 G with the release of PICMG 3.1R2.
The inherent ruggedness of ATCA, having been designed for the telecom industry’s NEBS standards, lends itself to semi-rugged deployments such as shipboard manned, airborne, and transit-case applications. There is now an opportunity for defense contractors to leverage packet processing blades and software originally developed for telecom networks for very dense computing and signal processing.
This new category of ATCA blades, based on general-purpose processors but applied to DSP applications, can be termed algorithm processing blades.
Digital signal processing
In this instance, DSP can be defined as the mathematical manipulation of an information signal to modify or improve it in some way. The basic concept in a defense application can be characterized as follows:
1. Some kind of sensor device detects objects
2. A high-speed interface transfers this data to a rack with computing equipment
3. Analog data is either:
a. Converted to digital at the sensor, or
b. Converted to digital at the signal processing unit
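As a minimal illustration of these steps, the sketch below digitizes a sampled analog signal and then smooths it with a simple moving-average filter. It is not taken from any particular defense system: the 12-bit resolution, the 50 Hz test tone, the 1 kHz sample rate, and the function names are all assumptions chosen for the example.

```python
import math


def quantize(samples, bits=12, full_scale=1.0):
    """Map analog values in [-full_scale, +full_scale] to signed integer codes."""
    max_code = 2 ** (bits - 1) - 1
    codes = []
    for x in samples:
        clipped = max(-full_scale, min(full_scale, x))  # clip out-of-range input
        codes.append(round(clipped / full_scale * max_code))
    return codes


def moving_average(codes, window=4):
    """Smooth a digitized signal with a simple boxcar (FIR) filter."""
    out = []
    for i in range(len(codes) - window + 1):
        out.append(sum(codes[i:i + window]) / window)
    return out


# Hypothetical sensor return: a 50 Hz tone sampled at 1 kHz (assumed values).
analog = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(40)]
digital = quantize(analog)          # analog-to-digital conversion step
smoothed = moving_average(digital)  # the "mathematical manipulation" step
```

Whether the conversion happens at the sensor or at the signal processing unit, the processing stage sees the same kind of integer sample stream.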
Traditionally, DSP subsystems have been based on VME technology; there is a push for high-speed serial interfaces to replace the VME parallel bus, which is an ideal opportunity to evaluate technologies such as OpenVPX and ATCA. These computing architectures offer multiprocessor boards that support high-level DSP libraries and a host processor to manage data flows, as well as a range of ruggedization levels depending on the requirements of the application.
High-performance processing core
The latest generation of Intel Xeon processors, such as the Intel Xeon E5-2600 v2 processor family (formerly code named Ivy Bridge), features many high-speed interfaces into the processors. Beyond the 10 multithreaded cores running at up to 2.4 GHz clock speed, these processors also offer a large 25 MB Level 3 (L3) cache. Thanks to four integrated memory controllers, the memory interfaces provide a very fast method for moving data that is sent to the blade into the processor itself. A dual-processor ATCA blade offers very high-speed dual Intel QuickPath Interconnect (QPI) connections between the processors should the application need to move data between processors. These new processors feature 40 lanes per socket of 3rd generation PCI Express connectivity directly to the processors, whereas earlier generation devices offered PCI Express connectivity in a host bridge. This direct connectivity can be leveraged for high-speed fabric interfaces in ATCA. Along with the processors, Intel is developing hardware acceleration functionality of the kind more traditionally seen in a dedicated packet processor.
Intel Advanced Vector Extensions
Introduced in the 2nd generation Intel Xeon family processors, Intel Advanced Vector Extensions (AVX) is a set of instructions for doing Single Instruction Multiple Data (SIMD) operations on Intel architecture CPUs. The 128-bit SIMD registers of Intel Streaming SIMD Extensions (SSE) have been expanded to 256 bits. This expansion potentially doubles floating-point operation performance when using single-precision floating-point numbers. Intel AVX also offers specific instructions that support signal-processing applications, and libraries optimized for AVX are available. Optimized VSIPL libraries are also available from third parties.
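The potential doubling follows directly from the register widths. The short sketch below works through the arithmetic; the helper names and the 1,024-sample block size are assumptions made for illustration, and the instruction counts are an idealized model that ignores memory bandwidth and other real-world limits.

```python
import math

SSE_REGISTER_BITS = 128
AVX_REGISTER_BITS = 256
SINGLE_PRECISION_BITS = 32
DOUBLE_PRECISION_BITS = 64


def simd_lanes(register_bits, element_bits):
    """Number of elements one SIMD register holds."""
    return register_bits // element_bits


def vector_ops_needed(num_elements, lanes):
    """Idealized count of vector instructions for one pass over an array."""
    return math.ceil(num_elements / lanes)


sse_lanes = simd_lanes(SSE_REGISTER_BITS, SINGLE_PRECISION_BITS)   # 4 floats
avx_lanes = simd_lanes(AVX_REGISTER_BITS, SINGLE_PRECISION_BITS)   # 8 floats
avx_double_lanes = simd_lanes(AVX_REGISTER_BITS, DOUBLE_PRECISION_BITS)

# One pass over a hypothetical block of 1,024 single-precision samples.
sse_ops = vector_ops_needed(1024, sse_lanes)
avx_ops = vector_ops_needed(1024, avx_lanes)
ideal_speedup = sse_ops / avx_ops
```

In this model the AVX pass needs half as many vector instructions as the SSE pass, which is where the "potentially doubles" figure comes from.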
For applications such as radar detection, signal processing frequently requires multiple processors, which are often distributed across multiple blades. The performance boost provided by Intel AVX implemented in an ATCA system helps developers reduce processor and blade counts, thereby lowering Bill of Materials (BOM) cost and design complexity. The reduced processor count and inherent efficiency of the ATCA bladed architecture can significantly lower power consumption.
Two Fabric configurations are available for 40 G operation. The first is a 4x 10GBASE-KR Fabric configuration, defined in the ATCA specification as PICMG 3.1R2 “Option 3-KR,” with 10 Gigabit Ethernet (GbE) links through separate MACs and data running over four individual Fabric lanes. The second option is a single 40GBASE-KR4 Fabric configuration, defined in the ATCA specification as PICMG 3.1R2 “Option 9-KR,” with a single 40 Gbps link to a single 40 G MAC. Both options provide a total signaling (baud) rate of 41.25 Gbaud for a bit rate of 40 Gbps.
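The relationship between the 41.25 G and 40 G figures can be checked with a few lines of arithmetic. The per-lane signaling rate of 10.3125 Gbaud and the 64b/66b line encoding used below come from the IEEE 802.3 definitions of 10GBASE-KR and 40GBASE-KR4 rather than from this article, and the variable names are illustrative only.

```python
from fractions import Fraction

# Assumed from IEEE 802.3: each backplane lane signals at 10.3125 Gbaud
# and carries 64 payload bits in every 66 transmitted bits (64b/66b).
LANE_BAUD_GBAUD = Fraction(103125, 10000)
ENCODING_EFFICIENCY = Fraction(64, 66)
FABRIC_LANES = 4

total_baud = FABRIC_LANES * LANE_BAUD_GBAUD            # aggregate signaling rate
lane_bit_rate = LANE_BAUD_GBAUD * ENCODING_EFFICIENCY  # usable rate per lane
total_bit_rate = total_baud * ENCODING_EFFICIENCY      # usable aggregate rate

print(f"Aggregate signaling rate: {float(total_baud)} Gbaud")
print(f"Per-lane bit rate: {float(lane_bit_rate)} Gbps")
print(f"Aggregate bit rate: {float(total_bit_rate)} Gbps")
```

Exact fractions are used so that the 64/66 ratio does not introduce floating-point rounding into the comparison.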
Figure 1: 40 G ATCA switch blade architecture (Emerson ATCA-F140).
Figure 1 shows the architecture of a 40 G ATCA switch blade, such as Emerson’s ATCA-F140. The new 40 G interfaces allow for inbound and outbound traffic at 40 G while still supporting the older 1 G and 10 G standards. How, exactly, does such a setup get the data coming into the system to the correct processor payload blades? Advanced flow-management software has been developed to take individual IP streams, classify them, and then direct them to specific boards in the system. Furthermore, the software optimizes the return flow of the data as it exits the system.
Software such as Emerson’s FlowPilot add-on package performs just these functions, using software and hardware capabilities of the 40 G switch on the ATCA-F140. This software ensures fast packet handling inside the system, with multiple configuration options to tailor the function of FlowPilot to the feature set that is actually required. More importantly, FlowPilot will distribute flows across a number of configured blades according to configured parameters, ensuring that they remain constant over time and that the same inspection device receives the entire flow. Additional functions include health check on an application level along with link transparency, connecting left-side and right-side cables to a virtual connection.
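One common way to keep an entire flow on the same blade is to hash the fields that identify the flow and use the result to select a blade. The sketch below illustrates that general technique only. It is not FlowPilot's actual algorithm or configuration interface, and the blade names, the choice of CRC-32, and the function names are all assumptions.

```python
import zlib


def flow_key(src_ip, dst_ip, src_port, dst_port, protocol):
    """Build a direction-independent key from the flow's 5-tuple.

    Sorting the two endpoints means traffic in both directions of the
    same conversation produces the same key (an assumed design choice).
    """
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{first[0]}:{first[1]}|{second[0]}:{second[1]}|{protocol}".encode()


def pick_blade(key, blades):
    """Map a flow key to one blade; the same key always gives the same blade."""
    return blades[zlib.crc32(key) % len(blades)]


# Hypothetical payload blades configured to receive traffic.
blades = ["blade-1", "blade-2", "blade-3", "blade-4"]

forward = flow_key("10.0.0.5", "10.0.1.9", 5000, 6000, "udp")
reverse = flow_key("10.0.1.9", "10.0.0.5", 6000, 5000, "udp")

forward_blade = pick_blade(forward, blades)
reverse_blade = pick_blade(reverse, blades)
```

A plain modulo mapping like this reshuffles flows whenever the blade count changes, so a production system would likely need something more stable when blades are added or fail a health check.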
Figure 2: 40 G ATCA packet processing blade based on dual 10-core Intel Xeon E5-2600 family processors (Emerson ATCA-7475).
Figure 2 shows the architecture of Emerson’s ATCA-7475 as an example 40 G ATCA packet processing blade based on dual 10-core Intel Xeon E5-2600 v2 family processors. Each CPU is connected to one 40 G Mellanox ConnectX-3 Ethernet controller using 3rd generation PCI Express, allowing maximum throughput between controller and memory together with direct connection to a processing unit. Additional 10 G ports to the external world, which add preprocessing capabilities, can be provided by a Rear Transition Module (RTM, a plug-in card that is connected to the rear of a blade inside the chassis to add interfaces and features) with four to six 10 G ports.
The blade is designed to take advantage of the packet-processing capabilities of the Intel Communications Chipset 89x0. This device provides offloaded hardware acceleration to improve the cryptographic and compression performance of the processors. The ATCA-7475 also allows the mounting of a mezzanine module featuring two more Intel Communications Chipset 8920 devices to take further advantage of the offload capabilities.
Intel Data Plane Development Kit
Intel has made available a lightweight runtime environment for Intel architecture processors, offering low overhead and run-to-completion mode to maximize packet processing performance: the Intel Data Plane Development Kit (Intel DPDK). The Intel DPDK focuses on how the individual processor cores can be more tightly managed, free of any encumbrance from operating system activity, and allows those cores to act in quite a deterministic fashion. Additional libraries for memory, queue, and buffer management help manage how the data moves to individual cores, between cores, or to another core outside the system.
It provides a selection of optimized and efficient libraries, also known as the Environment Abstraction Layer (EAL), which are responsible for initializing and allocating low-level resources, hiding the environment specifics from the applications and libraries, and gaining access to the low-level resources, such as memory space, PCI devices, timers, and consoles.
The EAL provides an optimized Poll Mode Driver (PMD); memory and buffer management; and timer, debug, and packet-handling APIs, some of which may also be provided by the Linux OS.
To facilitate interaction with application layers, the EAL, together with the standard GNU C Library (GLIBC), provides full APIs for integration with higher-level applications.
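The poll-mode, run-to-completion idea can be modeled in a few lines. The sketch below is a pure Python model of the concept, not DPDK code: the Intel DPDK is a C library, and `rx_burst`, `worker_loop`, and the burst size of 32 are hypothetical names and values that do not correspond to its API.

```python
from collections import deque

BURST_SIZE = 32  # assumed number of packets fetched per poll


def rx_burst(ring, max_packets):
    """Fetch up to max_packets from the ring without blocking or interrupts."""
    burst = []
    while ring and len(burst) < max_packets:
        burst.append(ring.popleft())
    return burst


def worker_loop(ring, process, max_polls):
    """Poll the ring and run each packet to completion on this core.

    A real poll-mode worker would spin forever on a dedicated core;
    max_polls bounds the loop so the example terminates.
    """
    results = []
    for _ in range(max_polls):
        for packet in rx_burst(ring, BURST_SIZE):
            # The packet is fully processed before the next one is touched;
            # nothing is handed off to another thread or scheduler.
            results.append(process(packet))
    return results


ring = deque(range(100))  # stand-in for a receive queue holding 100 packets
results = worker_loop(ring, lambda packet: packet * 2, max_polls=4)
```

Because the worker polls rather than waiting for interrupts, its timing depends only on its own loop, which is what gives the deterministic behavior described above.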
The 40 G ATCA blade based on dual Intel Xeon E5-2600 v2 processors, such as Emerson Network Power’s ATCA-7475 packet processing blade, is tailored for digital signal processing to create an algorithm processing blade. One physical core on each device is dedicated to control plane applications based on Linux. This core works in tandem with the 40 G network interface controller to move the data in and out of the other processor cores at optimal speed. The rest of the cores, meanwhile, are available to run individual DSP algorithms. A section of data would be distributed to each core and processed to completion without interruption. The combination of high-performance processors with Intel AVX and 40 G blade interfaces creates a set of DSP engines from general-purpose processors that can run at a very high speed.
The resulting data flow through the system is as follows:
1. Packetized sensor data enters the ATCA switch as 10 G or 40 G data
2. Flow control software on the switch load balances and distributes the data to the appropriate processor board
3. Flow control software on the blade then load balances and distributes the data to the specific algorithm running in a specific thread of a specific core
4. With the assistance of the AVX vector units in each core, the DSP algorithm is completed without interruption
5. Flow control on the board and switch then directs the results to another payload board either for further processing or out of the system
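The blade-level part of this flow can be sketched as follows, assuming a dual 10-core blade with one core per processor reserved for the control plane. Which core is reserved (core 0 here), the contiguous split of the data, and the signal-energy calculation standing in for a DSP algorithm are all assumptions made for illustration.

```python
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 10
CONTROL_CORES_PER_SOCKET = 1  # assumed to be core 0 on each socket


def worker_cores():
    """List the (socket, core) pairs left over for DSP algorithms."""
    cores = []
    for socket in range(SOCKETS_PER_BLADE):
        for core in range(CONTROL_CORES_PER_SOCKET, CORES_PER_SOCKET):
            cores.append((socket, core))
    return cores


def split(samples, parts):
    """Split samples into `parts` nearly equal contiguous sections."""
    base, extra = divmod(len(samples), parts)
    sections, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        sections.append(samples[start:start + size])
        start += size
    return sections


def energy(section):
    """Stand-in DSP algorithm: sum of squared sample values."""
    return sum(x * x for x in section)


cores = worker_cores()
samples = list(range(180))  # hypothetical block of sensor samples
assignments = dict(zip(cores, split(samples, len(cores))))

# Each core processes its own section to completion, independently.
results = {core: energy(section) for core, section in assignments.items()}
```

With these assumptions, 18 of the blade's 20 cores are available as DSP engines, and each one handles its section of the data without depending on the others.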
ATCA right platform for defense, aero
ATCA is an ideal computing platform to address digital signal processing applications, especially those in defense and aerospace. The other benefits of ATCA for defense and aerospace contractors, such as being a truly open architecture, with inherent ruggedness and power efficiency, mean that the time is right to leverage the packet processing blades and software originally developed for telecom networks for very dense computing and future complex signal-processing applications.
Rob Persons is a senior field applications engineer, Embedded Computing, with Emerson Network Power. He applies his experience in embedded real-time systems, VMEbus and ATCA hardware, and real-time software to help Emerson’s embedded computing customers accelerate their projects. His 30-year career has included avionics software development and field support of military, aerospace, telecom, and medical-equipment customers. He has also represented Emerson on standards bodies and conference advisory boards. Rob holds a dual Bachelor of Science degree in Computer Science and Zoology from the University of Central Florida. Readers can reach Rob at [email protected].
Emerson Network Power 614 888 0246 www.emersonnetworkpower.com