Esteban Gutiérrez1, Rodrigo Cádiz2,3, Carlos Sing Long4, Frederic Font1, and Xavier Serra1
1 Department of Information and Communications Technologies, Universitat Pompeu Fabra
2 Music Institute, Pontificia Universidad Católica de Chile
3 Department of Electrical Engineering, Pontificia Universidad Católica de Chile
4 Instituto de Ingeniería Matemática y Computacional, Pontificia Universidad Católica de Chile
This webpage provides supplementary materials for our paper "Fractional Fourier Sound Synthesis", to be presented at the International Computer Music Conference (ICMC) 2025 in Boston, USA.
This paper explores the innovative application of the Fractional Fourier Transform (FrFT) in sound synthesis, highlighting its potential to redefine time-frequency analysis in audio processing. As an extension of the classical Fourier Transform, the FrFT introduces fractional order parameters, enabling a continuous interpolation between time and frequency domains and unlocking unprecedented flexibility in signal manipulation. Crucially, the FrFT also opens the possibility of directly synthesizing sounds in the \(\alpha\)-domain, providing a unique framework for creating timbral and dynamic characteristics unattainable through conventional methods. This work delves into the mathematical principles of the FrFT, its historical evolution, and its capabilities for synthesizing complex audio textures. Through experimental analyses, we showcase novel sound design techniques, such as \(\alpha\)-synthesis and \(\alpha\)-filtering, which leverage the FrFT’s time-frequency rotation properties to produce innovative sonic results. The findings affirm the FrFT’s value as a transformative tool for composers, sound designers, and researchers seeking to push the boundaries of auditory creativity.
In this section, we explore sound synthesis and manipulation methods strongly based on the FrFT. While we believe much more can be done, we have chosen to keep this simple, as our aim is to consolidate the foundations of the creative use of the FrFT in audio.
The FrFT can transform any type of sound, and while it might generate interesting results, the transform itself is extremely complex; applied indiscriminately, it is difficult to feel in control of the outcome. This problem diminishes, however, when the input is fixed to a sound whose spectrogram is fully understood. By doing so, and recalling the spectrogram rotation property of the FrFT, one can anticipate the results generated by the transform and gain some control over it.
Based on this, we introduce the \(\alpha\)-Synthesis method as the application of the FrFT, with the previously mentioned limitations, on pure sinusoids. Sinusoids are clear examples of sounds with simple spectrograms that are easy to understand, allowing the use of the FrFT to generate complex yet intelligible sounds.
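As an illustration of \(\alpha\)-synthesis, the sketch below applies a discrete FrFT to a pure sinusoid and keeps the real part as the synthesized audio. The discrete FrFT construction used here (eigenvectors of the Dickinson–Steiglitz matrix, which commutes with the DFT, following Candan et al.) is one standard choice and not necessarily the implementation behind the sound examples; all parameter values are illustrative only.

```python
import numpy as np

def dfrft_matrix(N, alpha):
    """Discrete FrFT matrix of order alpha (alpha = 1 gives the ordinary DFT).

    Built from the eigenvectors of the Dickinson-Steiglitz matrix, a real
    symmetric matrix that commutes with the unitary DFT, so its eigenvectors
    are shared with the DFT and can be raised to fractional powers.
    """
    n = np.arange(N)
    S = np.diag(2.0 * np.cos(2.0 * np.pi * n / N) - 4.0)
    S += np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
    S[0, -1] = S[-1, 0] = 1.0
    w, V = np.linalg.eigh(S)
    V = V[:, np.argsort(w)[::-1]]     # Hermite-like ordering of eigenvectors
    k = np.arange(N)
    if N % 2 == 0:
        k[-1] = N                     # even N: index N replaces N - 1
    lam = np.exp(-0.5j * np.pi * alpha * k)
    return (V * lam) @ V.T            # V diag(lam) V^T

# alpha-synthesis sketch: transform a pure sinusoid, keep the real part
fs, dur, f0, alpha = 8000, 0.128, 440.0, 0.3   # illustrative parameters
t = np.arange(int(fs * dur)) / fs
x = np.sin(2 * np.pi * f0 * t)
y = np.real(dfrft_matrix(len(x), alpha) @ x)   # audible alpha-domain signal
```

By construction this transform is unitary and additive in the order (\(F^{a}F^{b} = F^{a+b}\)), which is what makes the spectrogram-rotation intuition usable in practice.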
It is well known that the Fourier Transform can be used to create filters in the frequency domain through simple multiplication, and the Convolution Theorem allows this concept to be translated to the time domain via convolutions. This property led to the development of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, which can be designed to be quite accurate and, in some cases, computed more efficiently. Unfortunately, no analog of the Convolution Theorem is known for the FrFT, so filtering a signal with this transform must be done by direct multiplication in the transformed domain.
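As a minimal reminder of the Fourier case, the sketch below (in NumPy, with an arbitrary 8-tap moving-average kernel as a toy low-pass filter) checks that multiplying spectra reproduces time-domain convolution, provided both signals are zero-padded to the full linear-convolution length:

```python
import numpy as np

# Toy FIR kernel: 8-tap moving average (a crude low-pass filter)
h = np.ones(8) / 8.0
x = np.random.default_rng(0).standard_normal(256)

# Time domain: direct (linear) convolution
y_time = np.convolve(x, h)

# Frequency domain: multiply spectra, zero-padding to the full
# linear-convolution length to avoid circular wrap-around
n = len(x) + len(h) - 1
y_freq = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

assert np.allclose(y_time, y_freq)  # the Convolution Theorem in action
```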
Given this, what exactly is filtering using the FrFT? The human auditory system has a well-studied sensitivity to frequency, making frequency domain filters a reasonable and understandable process in audio contexts. On the other hand, the FrFT transforms signals into the \(\alpha\)-domain, a mixture of frequency and time. To our knowledge, the human auditory system does not have a well-established understanding of this domain. Specifically, in the frequency domain, each band corresponds to a pure sinusoid in the time domain. However, a band in the \(\alpha\)-domain corresponds to a signal that heavily depends on the value of \(\alpha\), and to our understanding, no studies have explored human perception of such signals.
For our purposes, we propose the \(\alpha\)-Filtering method as a natural generalization of frequency domain filtering. Specifically, a signal is first transformed into the \(\alpha\)-domain using the FrFT, then multiplied with a kernel, and finally transformed back to the time domain using the corresponding inverse FrFT. Similar to the frequency domain, one can define \(\alpha\)-low pass, \(\alpha\)-band pass, and \(\alpha\)-high pass filters using appropriate kernels. However, it must be understood that "low," "band," and "high" in this context will generally not correspond to frequency. For example, an \(\alpha\)-low pass filtered signal might have richer high-frequency content than its \(\alpha\)-high pass counterpart.
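A minimal sketch of this pipeline is given below: forward FrFT, pointwise multiplication with a kernel, then the inverse FrFT of order \(-\alpha\). The discrete FrFT construction (Dickinson–Steiglitz eigenvectors) and the step kernel are illustrative assumptions, not the exact implementation behind the sound examples; the "low" here refers to position in the \(\alpha\)-domain, which, as noted above, generally does not correspond to frequency.

```python
import numpy as np

def dfrft_matrix(N, alpha):
    # Discrete FrFT via the Dickinson-Steiglitz commuting matrix
    n = np.arange(N)
    S = np.diag(2.0 * np.cos(2.0 * np.pi * n / N) - 4.0)
    S += np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
    S[0, -1] = S[-1, 0] = 1.0
    w, V = np.linalg.eigh(S)
    V = V[:, np.argsort(w)[::-1]]
    k = np.arange(N)
    if N % 2 == 0:
        k[-1] = N
    return (V * np.exp(-0.5j * np.pi * alpha * k)) @ V.T

def alpha_filter(x, kernel, alpha):
    """Forward FrFT -> pointwise multiply -> inverse FrFT (order -alpha)."""
    F = dfrft_matrix(len(x), alpha)
    Finv = dfrft_matrix(len(x), -alpha)
    return np.real(Finv @ (kernel * (F @ x)))

# Example: an "alpha-low pass" keeping the lower half of the alpha-domain bins
N = 512
t = np.arange(N) / 8000.0
x = np.sin(2 * np.pi * 220.0 * t)
kernel = (np.arange(N) < N // 2).astype(float)  # illustrative step kernel
y = alpha_filter(x, kernel, 0.25)
```

Note that an all-ones kernel returns the input unchanged, since \(F^{-\alpha}F^{\alpha}\) is the identity.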
Several videos and sound examples built using the techniques discussed herein are included below, together with a brief explanation of each visual and sonic example.
The rotation property of the FrFT is well studied; however, we include an example here because it is fundamental to understanding its implications for sound. For this example, we considered a sinusoid with a frequency of \(10025\) Hz (exactly in the middle of the sampleable frequency domain) and computed both the spectrogram of its FrFT and the real part of its FrFT for values of \(\alpha\) ranging from \(0\) to \(1\). Note that the FrFT in this case is computed over the entire signal. Each video corresponds to one of the transformations mentioned above.
In our first group of sound examples, we used Method 1 to generate audio from the real part of the FrFT of several sinusoids with various window lengths. Specifically, one second of audio is generated from a sinusoid for each frequency in \(\{55, 220, 880\}\) Hz. These sounds are then processed using the FrFT with window sizes of \(\{0.5, 1\}\) seconds, hop sizes equal to half of the window sizes, and angles in \(\{0, 0.01, 0.05, 0.1, 0.25, 0.5\}\). Each video corresponds to one frequency and one window size setting, and all transforms are displayed sequentially with angles increasing as mentioned. Spectrum and spectrogram representations are available in separate videos.
In this group of sound examples, we use Method 1 by computing the FrFT of several sinusoids while manipulating the value of \(\alpha\) throughout the transform. Specifically, one second of audio is generated from a sinusoid for each frequency in \(\{55, 220, 880\}\) Hz. These sounds are then processed using the FrFT with window sizes of approximately \(0.046, 0.092, 0.18,\) and \(0.32\) seconds, with hop sizes equal to half of the respective window sizes, and values of \(\alpha\) increasing linearly from \(0\) to \(0.5\). Each video corresponds to one frequency, with all transforms displayed sequentially as the window sizes increase according to the values mentioned. Spectrum and spectrogram representations are provided in separate videos.
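The windowed processing with a time-varying order can be sketched as an overlap-add loop in which each frame gets its own \(\alpha\). The 50% hop matches the descriptions above, but the periodic Hann window, the discrete FrFT construction, and all parameter values are our assumptions for illustration, not the exact implementation behind the examples.

```python
import numpy as np

def dfrft_matrix(N, alpha):
    # Discrete FrFT via the Dickinson-Steiglitz commuting matrix
    n = np.arange(N)
    S = np.diag(2.0 * np.cos(2.0 * np.pi * n / N) - 4.0)
    S += np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
    S[0, -1] = S[-1, 0] = 1.0
    w, V = np.linalg.eigh(S)
    V = V[:, np.argsort(w)[::-1]]
    k = np.arange(N)
    if N % 2 == 0:
        k[-1] = N
    return (V * np.exp(-0.5j * np.pi * alpha * k)) @ V.T

def windowed_frft(x, win_size, alphas):
    """Frame x (50% hop, periodic Hann), apply a per-frame FrFT whose
    order is taken from `alphas`, and overlap-add the real parts."""
    hop = win_size // 2
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_size) / win_size)
    y = np.zeros(len(x))
    for i, s in enumerate(range(0, len(x) - win_size + 1, hop)):
        F = dfrft_matrix(win_size, alphas[i])  # recomputed per frame for clarity
        y[s:s + win_size] += np.real(F @ (w * x[s:s + win_size]))
    return y

# alpha sweeps linearly from 0 to 0.5 over the frames, as in the examples
fs, win = 8000, 256
t = np.arange(fs) / fs                        # one second of audio
x = np.sin(2 * np.pi * 220.0 * t)
n_frames = (len(x) - win) // (win // 2) + 1
y = windowed_frft(x, win, np.linspace(0.0, 0.5, n_frames))
```

With \(\alpha = 0\) in every frame the loop reduces to a plain overlap-add of windowed frames, which (away from the edges) reconstructs the input exactly, since the periodic Hann window at 50% overlap sums to one.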
In our third group of sound examples, we used Method 2 to filter two sinusoids with \(\alpha\)-band pass filters. These filters are applied by multiplying, in the \(\alpha\)-domain, with the spectrum of an impulse response of the form $$ IR(t) = \exp\left(-\tfrac{1}{2}(tb)^2\right) \cos(2\pi c t), $$ where \(t\) denotes time, \(b\) the bandwidth of the filter, and \(c\) its center frequency. The filters are applied over time using fixed values of \(\alpha\) and bandwidth \(b\), while varying the center frequency \(c\). Specifically, two seconds of two sinusoids at frequencies \(220\) Hz and \(3520\) Hz are generated and \(\alpha\)-band pass filtered using the spectra of impulse responses of the form above with \(b=1\), \(c\) increasing exponentially (base \(2\)) from \(100\) to \(10000\) Hz, and values of \(\alpha\) in \(\{0.01, 0.05, 0.1, 0.25, 0.5\}\). The filtering is done using window sizes of approximately \(0.19\) and \(0.38\) seconds, with hop sizes equal to half of the respective window sizes. Each video corresponds to a specific frequency and window size setting, with all transforms displayed sequentially as the angles increase according to the specified values.
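The impulse response above is straightforward to sample. The sketch below (with an illustrative sample rate and parameter values of our choosing) builds it and locates the peak of its magnitude spectrum, which serves as the filtering kernel and sits at the center frequency \(c\):

```python
import numpy as np

def band_ir(t, b, c):
    """Impulse response IR(t) = exp(-0.5 * (t*b)**2) * cos(2*pi*c*t)."""
    return np.exp(-0.5 * (t * b) ** 2) * np.cos(2 * np.pi * c * t)

fs = 8000.0
t = np.arange(-1.0, 1.0, 1.0 / fs)   # two seconds centred at t = 0
b, c = 1.0, 1000.0                   # bandwidth and centre frequency
ir = band_ir(t, b, c)

# The alpha-filtering kernel is the spectrum of this IR; its magnitude
# peaks at the centre frequency c, with width governed by b
spec = np.abs(np.fft.rfft(ir))
freqs = np.fft.rfftfreq(len(ir), 1.0 / fs)
peak = freqs[np.argmax(spec)]        # close to c
```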
For this example, various sound sources are transformed using the Fractional Fourier Transform (FrFT) with differing parameter settings. All audio materials originate from original recordings created by the first author for a previous project. A link to further examples is provided below.
This research was supported by ANID Fondecyt Regular Grant #1230926, ANID Anillo ATE220041, Government of Chile, and the project “IA y Música: Cátedra en Inteligencia Artificial y Música (TSI-100929-2023-1)” funded by the “Secretaría de Estado de Digitalización e Inteligencia Artificial and the Unión Europea-Next Generation EU”. We would also like to thank Diego Vera for his contributions to an early version of this project during his undergraduate research.