AVX: A leap forward for DSP performance


June 20, 2011

Steve Edwards

Curtiss-Wright

AVX doubles peak floating-point performance over AltiVec and leaves the door open for a long future in DSP applications.

Intel’s recent introduction of its Advanced Vector Extensions (AVX) is a significant advance in calculation performance for DSP applications. AVX doubles the width of the vector registers from the 128 bits provided by earlier vector extensions such as AltiVec, supported by Freescale’s processors, and Intel’s own Streaming SIMD Extensions (SSE). Intel first publicized AVX in March 2008 and launched it at the beginning of this year with the introduction of the second-generation Core i7 processors previously codenamed Sandy Bridge.

Doubling floating-point operations

With the doubling to 256 bits, AVX registers can hold twice as many integer or floating-point values as the 128-bit AltiVec and SSE registers. Vector extensions support integer calculation, but for DSP applications the important format is 32-bit single-precision floating point. Floating point is the preferred format in military signal processing applications, primarily because it makes software development more productive than managing the inevitable underflows and overflows of integer registers. AltiVec, first introduced with the PowerPC 7400, has always featured 128-bit registers, each of which can hold four “floats.” The PPC 7400 processors, and later Intel processors with SSE, were able to perform eight simultaneous floating-point operations per clock cycle. By doubling the register size to 256 bits, AVX fits eight 32-bit values into each register, enabling 16 floating-point operations per cycle. In one fell swoop, this doubles peak floating-point performance.
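The register arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the figure of two operations per lane per cycle is an assumption inferred from the numbers quoted above (eight operations per cycle with four floats, 16 with eight), not a statement about how any particular processor schedules its instructions.

```python
FLOAT_BITS = 32  # width of a single-precision float, as stated in the article


def lanes(register_bits, element_bits=FLOAT_BITS):
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits


def peak_flops_per_cycle(register_bits, ops_per_lane=2):
    """Peak floating-point operations per clock cycle.

    ops_per_lane=2 is an assumption inferred from the article's figures
    (8 ops/cycle with 4 lanes, 16 ops/cycle with 8 lanes).
    """
    return lanes(register_bits) * ops_per_lane


for name, bits in (("AltiVec/SSE", 128), ("AVX", 256)):
    print(f"{name}: {lanes(bits)} floats per register, "
          f"{peak_flops_per_cycle(bits)} FLOPs per cycle")
```

Doubling the register width doubles the lane count, and the peak operation count follows directly from it.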

Putting AVX to work

The Fast Fourier Transform (FFT) is at the core of many DSP applications. It decomposes a sampled signal into its constituent sine waves, with the strength of each reported in a frequency bin. Because it is so widely used, the FFT is a de facto figure of merit for gauging processor performance.
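To make the idea of frequency bins concrete, here is a minimal sketch of a naive discrete Fourier transform in Python. It is not an FFT and not the VSIPL routine used in the tests described in this article; an FFT produces the same result with far fewer operations. The test tone, the window length, and all names are hypothetical choices for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform.

    An FFT computes the same bins in O(N log N) operations.
    """
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples))
        for k in range(n)
    ]


N = 16      # hypothetical window length in samples
CYCLES = 3  # hypothetical test tone: 3 full cycles across the window

signal = [math.sin(2 * math.pi * CYCLES * t / N) for t in range(N)]
magnitudes = [abs(c) for c in dft(signal)]

# For a real input the upper half of the bins mirrors the lower half,
# so only the first N/2 bins are searched for the peak.
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
print("strongest bin:", peak_bin)
```

A sine wave that completes exactly three cycles in the window puts its energy into bin 3, and the remaining bins in the lower half stay at essentially zero.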

How does AVX stack up against AltiVec and SSE? In application tests using VSIPL, the combination of AVX and a second-generation Intel Core i7 (2.1 GHz) processor did well when compared to AltiVec running on a Freescale 8640 (1 GHz) processor. Running a range of 1D complex FFTs with sample sizes ranging from 256 bytes to 512 Kbytes on a single core, the Intel/AVX platform was 5 to 14x faster than the 8640. On multiple 1D complex FFTs, the Intel/AVX platform measured 5 to 10x faster. On a complex matrix transpose, it was 7 to 26x faster. And in a test of vector multiply speed, the Intel/AVX platform was 4.6 to 72x faster than the 8640/AltiVec platform.

While some of this performance comes from the faster clock of the Intel processor, these tests and others show that the AVX instruction set contributes greatly to overall processor performance. Intel has published direct SSE versus AVX comparisons, with benchmarks run on the same second-generation Core i7 processor, that show FFT performance improvements ranging from 1.2 to 1.8x. It is difficult to create a benchmark measuring pure AltiVec versus pure AVX because the amount of memory available to the two types of processors differs. At the upper end of sample sizes, memory performance becomes more important, and the newer Core i7 features larger caches and higher memory bandwidth than are available to the 8640. These results underline how significant a doubling of raw floating-point performance is for DSP applications. (For a copy of the test data, please contact the author.)
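One rough way to see how much of the measured gap is not explained by clock speed is to divide each reported speedup by the ratio of the two clocks (2.1 GHz versus 1 GHz). The Python sketch below does this for the ranges quoted above. It is a crude normalization offered only as an illustration: it assumes performance scales linearly with clock rate and ignores the cache and memory differences already noted, and the names are hypothetical.

```python
INTEL_CLOCK_GHZ = 2.1      # second-generation Core i7 used in the tests
FREESCALE_CLOCK_GHZ = 1.0  # Freescale 8640 used in the tests
CLOCK_RATIO = INTEL_CLOCK_GHZ / FREESCALE_CLOCK_GHZ

# (low, high) speedup ranges as reported in the article
reported = {
    "1D complex FFT": (5, 14),
    "multiple 1D complex FFTs": (5, 10),
    "complex matrix transpose": (7, 26),
    "vector multiply": (4.6, 72),
}


def per_clock(speedup, clock_ratio=CLOCK_RATIO):
    """Speedup remaining after dividing out the clock-rate advantage."""
    return speedup / clock_ratio


for test, (low, high) in reported.items():
    print(f"{test}: {per_clock(low):.1f}x to {per_clock(high):.1f}x per clock")
```

Even at the low end of each range, the per-clock figure stays above 2x, which is consistent with the point that clock speed alone does not account for the difference.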

Curtiss-Wright Controls Embedded Computing (CWCEC) has several AVX-enabled products in development, including the DSP-oriented CHAMP-AV8 dual Core i7 board (Figure 1) aimed at leveraging this jump in performance.

Figure 1: The 6U OpenVPX CHAMP-AV8 from Curtiss-Wright Controls Embedded Computing features dual quad-core, second-generation Core i7 processors with 269 GFLOPS peak.
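The 269 GFLOPS peak figure can be reproduced with simple arithmetic. The Python sketch below is one plausible reconstruction, not a published derivation: it assumes the 2.1 GHz clock of the Core i7 used in the benchmark tests (the caption does not state the board's clock speed) and the 16 floating-point operations per cycle per core described earlier. The function name is hypothetical.

```python
def peak_gflops(processors, cores_per_processor, clock_ghz, flops_per_cycle):
    """Theoretical peak: every core retires its maximum FLOPs on every cycle."""
    return processors * cores_per_processor * clock_ghz * flops_per_cycle


champ_av8 = peak_gflops(
    processors=2,           # dual processors, per the caption
    cores_per_processor=4,  # quad-core, per the caption
    clock_ghz=2.1,          # assumption: clock of the benchmarked Core i7
    flops_per_cycle=16,     # AVX peak per core, per the article
)
print(f"{champ_av8:.1f} GFLOPS")
```

Under these assumptions the result is 268.8 GFLOPS, which rounds to the 269 quoted in the caption. It is a theoretical ceiling; sustained performance depends on memory behavior and on the instruction mix.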


Changing the DSP performance landscape

AltiVec, long the de facto standard for vector math calculation in military DSP platforms, changed the landscape for many years and enabled the use of general-purpose processors as an alternative to dedicated DSP chips. AVX is a significant upgrade, and greater improvements might yet be in store. When Intel upgraded the SSE instruction set for AVX, it rearchitected the instruction set and made it more easily extendable. This leaves the door open for easier transitions to even larger registers and keeps Intel’s processors competitive as new alternatives for DSP, such as General Purpose Graphics Processing Units (GPGPUs), begin to emerge.

To learn more, e-mail Steve at [email protected].