Many algorithms in digital signal processing are based on the use of linear discrete signal transforms. Mathematically, such a transform is a matrix-vector multiplication y=M .x where is the sampled signal, and M is the transform over some base field K. Crucial for the applicability of a signal transform M is the existence of fast algorithms that allow its computation with O(n\log n) operations (or better) compared to O(n^2) arising from a direct matrix-vector multiplication. The problem of finding these algorithms for different transforms has been a major research topic leading to a vast number of publications in signal processing and mathematics.

Probably the most famous example of a signal transform is the discrete Fourier transform (DFT), which is used in harmonic analysis to decompose a signal into its frequency components. Important algorithms for the DFT include the “fast Fourier transform” (FFT) found by Cooley/Tukey in 1965 (originally due to Gauss), Rader’s algorithm for prime size (1968), Winograd’s algorithms (1980), and several others.

Another important class of transforms consists of the 8 different types of discrete cosine and sine transforms (DCTs and DSTs), respectively, also called discrete trigonometric transforms
(DTTs). Their applications include image and video compression, (see the 1990 book by Rao). Important algorithms for the trigonometric transforms have been developed by many, including by Chen, Smith and Fralick (Chen:77), Wang (Wang:84), Yip and Rao (Yip:84,Yip:88), Vetterli and Nussbaumer (Vetterli:84), Lee (Lee:84), Feig (Feig:90), Chan and Ho (Chan:90), Steidl and Tasche (Steidl:91), and Feig and Winograd (Feig:92).

Each of these algorithms has been derived through insightful manipulation of the transform matrix entries. The algorithms are highly structured, a property that can be used to write them as sparse factorizations of the transform matrix in a very concise way using mathematical operators.

In SMART (see papers below), we present an algebraic approach to the class of the 16 trigonometric transforms in the framework of algebra representation theory. In particular, the work presents the discrete cosine and sine transforms as decomposition matrices of certain regular modules associated to four series of Chebyshev polynomials. SMART uses algebraic methods to derive most of their known fast algorithms. SMART gives insight into the structure and the existence of these algorithms and extends the relationship between signal processing and algebra that is currently mainly restricted to the discrete Fourier transform.

SMART has been supported by two grants from NSF, CISE Division, CCR Program.

SMART Papers

(For additional papers visit the SMART web site .

DSP on Graphs

We are interested in developing digital signal processing concepts like the Fourier transform, filtering, spectrum for signals that are indexed by arbitrary graphs. This builds on the algebraic signal processing, part of a previous project, called SMART.


With the growing complexity and diversity of computer platforms, the design of high-performance software implementations of digital signal processing (DSP) algorithms has become an increasingly more difficult task. When high performance is an issue, the designers have to fine tune software implementations to utilize the specific features of the target platform. This requires expert knowledge in both algorithm development and computer architecture, and the hand-coding of the implementations becomes a tedious and time-consuming task. Furthermore, the ever-changing hardware and compiler technology requires frequent re-implementation, since the platforms are often changed or upgraded. These problems are tackled by what are commonly known as automatic software performance tuning systems, which are automatically adapting software implementations to a wide range of platforms. Because of the complexity of the problem, most performance tuning systems implement only basic functions that are used as building blocks in more complex applications. There is a growing interest in automatic tuning of DSP algorithms since DSP applications typically require high-performance algorithm implementations. Many DSP applications include digital filtering and wavelet-based processing, and the number of new applications is increasing fast. In our work, we develop SPIRAL, a generator of Libraries of tuned software implementations of fast DSP transforms. SPIRAL generates fast code for a number of different transforms, including the discrete Fourier transform, the discrete trigonometric transforms, i.e., the discrete cosine and discrete sine transforms (all their sixteen variants), the Walsh-Hadamard transform, the wavelet transform, as well as other digital signal processing kernels like filtering. SPIRAL is now looking also at reconfigurable hardware (e.g., FPGAs) implementations, as well as joint harwdare/ software implementations, of signal processing algorithms.

The work in SPIRAL is featured in a paper in the February 2005 IEEE Proceedings Special Issue on Performance Tuning.

Major support for SPIRAL was initially provided by the DARPA ACMP OPAL Initiative through an ARMY research grant. SPIRAL has been supported by an NSF ITR medium size grant as well as several other NSF grants, by a major grant from DARPA under the Discovery and Exploitation of Structure in Algorithms (DESA) Program, and an ONR grant. SPIRAL has been also supported by several industrial grants from INTEL, Mercury among others. INTEL licensed SPIRAL technology. SPIRAL is currently commercialized by SPIRALGEN, a start-up cofunded by Moura and four coleagues that licensed the SPIRAL Technology from CMU.

SPIRAL Journal Papers (for additional papers see the SPIRALweb site)

  • Franz Franchetti, Markus Püschel, Yevgen Voronenko, Srinivas Chellappa, and José M. F. Moura, “Discrete Fourier Transform on Multicores,” IEEE Signal Processing Magazine, Vol. 26:6, pp.: 90-102, November 2009; DOI: 10.1109/MSP.2009.934155. (IEEEXplore link here.)
  • Markus Püschel, José M. F. Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan W. Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo “SPIRAL: Code Generation for DSP Transforms,” IEEE Proceedings, Volume:93, number 2, pp. 232-275 , February, 2005. Invited paperSpecial issue on Program Generation, Optimization, and Platform Adaptation.
  • Markus Püschel, José M. F. Moura, Bryan Singer, Jianxin Xiong, Jeremy Johnson, David Padua, Manuela Veloso, and Robert Johnson, “SPIRAL: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms,” The International Journal of High Performance Computing and Applications, vol. 18:1, pp.21-45, Spring 2004.

Some of the First SPIRAL Conference Papers

(Find additional papers at the

  • Thammanit Pipatsrisawat, Aca Gacic, Franz Franchetti, Markus Püschel, and José M. F. Moura, “Performance Analysis of the Filtered Backprojection Image Reconstruction Algorithm,” ICASSP’05IEEE International Conference on Signal Processing, vol. 5, pp.:153-156, Philadelphia, PA, March 18-23, 2005.
  • Aca Gacic, Markus Püschel, and José M. F. Moura, “Automatically Generated High-Performance Code for Discrete Wavelet Transforms,” ICASSP�04, IEEE International Conference on Signal Processing, Montreal, Qu�bec, Canada, May 17-21, 2004.
  • José M. F. Moura and Markus Püschel, “SPIRAL: An Overview,” ODES: 1st Workshop on Optimizations for DSP and Embedded Systems, in conjunction with International Symposium on Code Generation and Optimization (CGO), Hyatt at Fisherman’s Wharf, San Francisco, CA, March 23, 2003.
  • Aca Gacic, Markus Püschel, and José M. F. Moura, “Fast Automatic Software Implementations of FIR Filters,” ICASSP’03, IEEE International Conference on Signal Processing, Hong Kong, April, 2003.
  • Markus Püschel and José M. F. Moura, ” Generation and Manipulation of DSP Transform Algorithms,” IEEE Digital Signal Processing Workshop, Atlanta, Georgia. September 2002.
  • Markus Püschel, Bryan Singer, M. Manuela Veloso, José M. F. Moura, “Fast Automatic Generation of DSP Algorithms,” in Computational Science – ICCS 2001, eds. V. N. Alexandrov, J. J. Dongarra, B. A. Juliano, R. S. Renner, C. J. K. Tan, ICCS’01, The 2001 International Conference on Computational Science, presented in Special Session on “Architecture-Specific Automatic Performance Tuning,” Hilton San Francisco and Towers, San Francisco, May 28 – 30, 2001.

Some Outdated SPIRAL Presentations

  • José M. F. Moura, “SPIRAL: Tuning DSP Transforms to Computing Platforms,” Seminar at IBM Watson Research Center, Yorktown Heights, 2002.
  • Franz Franchetti, M. Püschel, Christoph Ueberhuber, and José M. F. Moura, “Short Vector SIMD Code Generation for DSP Algorithms,” HPEC’02, VIII High Performance Embedding Computing Workshop, Lincoln Lab., Lexington, MA, September, 2002.
  • José M. F. Moura, “SPIRAL: Tuning DSP Transforms to Computing Platforms,” Seminar at University of Florida, Gainsville, FL, June 17, 2002.
  • José M. F. Moura, J. Johnson, R. W. Johnson, D. Padua, Victor Prasanna, M. Püschel, Bryan Singer, Manuela Veloso, and Jianxin Xiong “Generating Platform Adapted DSP Libraries Using SPIRAL,” HPEC’01, VII High Performance Embedding Computing Workshop, Lincoln Lab., Lexington, MA, November 27-29, 2001.
  • José M. F. Moura, J. Johnson, R. W. Johnson, M. Püschel, D. Padua, and M. M. Veloso, “SPIRAL: Automatic Implementation of Signal Processing Algorithms,” HPEC’00, VI High Performance Embedding Computing Workshop, Lincoln Lab., Lexington, MA, September 20-22, 2000.

For additional team information, papers, presentations and information visit the SPIRAL website: