Figure 2: A Digital Signal Processing System
In real-world DSP systems, we often convert a continuous-time signal into a discrete one via sampling (as shown in Figure 2). Because input is constantly streaming in, we can't process all of it at once, especially in real-time applications. Instead, we process blocks of length $L$ at a time. This is accomplished by multiplying our sampled signal by a window function $w[n]$.
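All window functions are real, even, and finite. This means they have real and symmetric DTFTs. The simplest window is a box window (a $\operatorname{sinc}$ in the frequency domain). When the signal is multiplied by a window, it amounts to a periodic convolution in the frequency domain:
$$x[n]\,w[n] \;\longleftrightarrow\; \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\theta})\,W(e^{j(\omega-\theta)})\,d\theta$$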
This periodic convolution means our choice of window function has an impact on our ability to resolve frequencies in the frequency domain.
- 1. If $W(e^{j\omega})$ has a wide "main lobe" around DC, then the spectrum of $x[n]$ will be blurred.
- 2. If $W(e^{j\omega})$ has large "side lobes" away from DC, then spectral leakage occurs because energy at one frequency bleeds into other frequencies.
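To make the main-lobe and side-lobe tradeoff concrete, here is a minimal sketch that compares a box window with a Hann window. The Hann window, the helper name `lobe_stats`, the window length, and the FFT size are illustrative assumptions, not part of the notes above.

```python
import numpy as np

def lobe_stats(window, n_fft=4096):
    """Return (main-lobe width in rad/sample, peak side-lobe level in dB)."""
    # A zero-padded FFT gives a dense sampling of the window's DTFT on [0, pi].
    magnitude = np.abs(np.fft.rfft(window, n_fft))
    magnitude_db = 20 * np.log10(magnitude / magnitude[0] + 1e-12)
    # The main lobe ends at the first local minimum away from DC.
    first_null = int(np.argmax(np.diff(magnitude_db) > 0))
    main_lobe_width = 2 * first_null * (2 * np.pi / n_fft)  # both sides of DC
    # Everything past the first null belongs to the side lobes.
    peak_side_lobe_db = float(magnitude_db[first_null:].max())
    return main_lobe_width, peak_side_lobe_db

L = 64  # assumed window length
rect_width, rect_psl = lobe_stats(np.ones(L))
hann_width, hann_psl = lobe_stats(np.hanning(L))

print(f"box : main lobe {rect_width:.3f} rad, peak side lobe {rect_psl:.1f} dB")
print(f"hann: main lobe {hann_width:.3f} rad, peak side lobe {hann_psl:.1f} dB")
```

In this sketch the box window should show the narrower main lobe (less blurring) but the higher side lobes (more leakage), while the Hann window trades a wider main lobe for lower side lobes.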
Another factor which impacts our ability to resolve frequencies in the frequency domain is the length of the window. Because an $L$-point DFT samples the DTFT at $L$ points, taking a longer window will resolve the DTFT better. If we don't want to increase the window length (because doing so would increase the latency of our system), we can zero pad after windowing, because zero padding has no impact on the DFT other than sampling the same DTFT at more finely spaced points.
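The zero-padding claim can be checked numerically. The sketch below evaluates the DTFT of a short block directly and compares it to a zero-padded FFT; the helper `dtft`, the sizes `L = 8` and `N = 32`, and the random test signal are assumptions chosen for illustration.

```python
import numpy as np

def dtft(x, omega):
    """Evaluate the DTFT of a finite-length signal x at the frequencies omega."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(omega, n)) @ x

L, N = 8, 32  # block length and zero-padded DFT length (assumed values)
rng = np.random.default_rng(0)
x = rng.standard_normal(L)

X_short = np.fft.fft(x)       # L-point DFT: L samples of the DTFT
X_padded = np.fft.fft(x, N)   # fft pads x with zeros up to length N

omega_dense = 2 * np.pi * np.arange(N) / N
matches_dtft = np.allclose(X_padded, dtft(x, omega_dense))
# Every (N // L)-th sample of the padded DFT is one of the original L samples.
contains_short = np.allclose(X_padded[:: N // L], X_short)
print(matches_dtft, contains_short)
```

Both checks compare samples of the same underlying DTFT, so zero padding adds no new information about the signal; it only places the samples closer together.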
By looking at the DFT of a signal $x[n]$, we only get the frequency information across the entire duration of the signal. Likewise, just by looking at $x[n]$, we get no frequency information and only temporal information. The Short-Time Fourier Transform (STFT) is a tool to see both at once.
Essentially, we slide a window function around $x[n]$ and compute the DTFT at every time point.
Just like before, we take our window and slide it around the signal, computing DFTs at every time point. If the DFT length $N$ is larger than the window length $L$ ($N > L$), then we are essentially computing a zero-padded DFT. The discrete STFT (DSTFT) produces a spectrogram which we can display digitally.
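As a rough sketch of the sliding-window procedure, the code below computes one zero-padded DFT per window position. The function name `dstft`, the `hop` parameter, and the convention of indexing each frame by its starting sample are assumptions made for illustration; the notes' own definition may index the frames differently.

```python
import numpy as np

def dstft(x, window, n_fft, hop=1):
    """Return an array whose row r is the n_fft-point DFT of the windowed
    block of x starting at sample r * hop."""
    L = len(window)
    starts = range(0, len(x) - L + 1, hop)
    # Multiply each length-L block by the window.
    frames = np.stack([x[s:s + L] * window for s in starts])
    # n_fft > L zero-pads every frame before the DFT.
    return np.fft.fft(frames, n=n_fft, axis=1)

# Usage: a complex exponential at the frequency of DFT bin 8 (for N = 64).
N, L = 64, 32
n = np.arange(128)
x = np.exp(1j * 2 * np.pi * 8 * n / N)
spectrogram = dstft(x, np.hanning(L), n_fft=N, hop=16)
peak_bins = np.argmax(np.abs(spectrogram), axis=1)
print(spectrogram.shape, peak_bins)
```

Each row of `spectrogram` is one time slice, so plotting `np.abs(spectrogram).T` as an image gives the usual spectrogram picture with time on the horizontal axis and frequency on the vertical axis.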
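As long as the window is never 0 and the windows don't overlap, the original signal can be recovered by inverting the DFT of each block and dividing by the window.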
When we compute the spectrogram of a signal, we can think of each coefficient as "tiling" the time-frequency plane. If we consider the normal $N$-point DFT, each DFT coefficient is supported by $N$ points in the time domain. Since the DFT samples the DTFT, it divides the range $[0, 2\pi)$ into $N$ segments of width $\frac{2\pi}{N}$. Each coefficient represents a section of this space, leading to a tiling looking like Figure 3 (for a 5 point DFT).
Figure 3: Time-Frequency tiling of a 5 point DFT
Thinking about the DSTFT, each coefficient is computed using $L$ points of the original signal. Each coefficient still represents intervals of $\frac{2\pi}{N}$ in the frequency axis, so it will lead to a tiling which looks like Figure 4.
Figure 4: Time-Frequency tiling of the DSTFT
What these tilings show us is that because we have discretized time and frequency, there is some uncertainty regarding which times and frequencies each coefficient represents.
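We can formalize this idea by considering a general transform. All transforms are really an inner product with a set of basis functions $\phi_\gamma$:
$$T_x(\gamma) = \langle x, \phi_\gamma \rangle = \int_{-\infty}^{\infty} x(t)\,\phi_\gamma^*(t)\,dt$$
For each $\gamma$, $T_x(\gamma)$ is the projection of the signal onto the basis vector $\phi_\gamma$. We can use Parseval's relationship to see that
$$T_x(\gamma) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(j\omega)\,\Phi_\gamma^*(j\omega)\,d\omega = \frac{1}{2\pi}\langle X, \Phi_\gamma \rangle$$
where $X$ and $\Phi_\gamma$ are the Fourier transforms of $x$ and $\phi_\gamma$.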
This means that we can think of our transform not only as a projection of the signal onto a new basis, but also as a projection of the spectrum of our signal onto the spectrum of our basis function. Remember that projection essentially asks "How much of a signal can be explained by the basis". We can formalize this by looking at the signal in a statistical sense and treat it as a probability distribution.
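For a unit-energy signal $x(t)$ with spectrum $X(j\omega)$, both $|x(t)|^2$ and $\frac{1}{2\pi}|X(j\omega)|^2$ integrate to 1, so we can define
$$m_t = \int_{-\infty}^{\infty} t\,|x(t)|^2\,dt \qquad m_\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty} \omega\,|X(j\omega)|^2\,d\omega$$
$$\sigma_t^2 = \int_{-\infty}^{\infty} (t - m_t)^2\,|x(t)|^2\,dt \qquad \sigma_\omega^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} (\omega - m_\omega)^2\,|X(j\omega)|^2\,d\omega$$
$m_t$ and $m_\omega$ are the means of the signal and the spectrum. $\sigma_t^2$ and $\sigma_\omega^2$ are the variances. Together, they localize where our signal "lives" in the time-frequency plane. The uncertainty principle says
$$\sigma_t\,\sigma_\omega \geq \frac{1}{2}$$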
This means there is nothing we can do to get completely accurate time resolution and frequency resolution, and any decisions we make will lead to a tradeoff between them.
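The tradeoff can be illustrated numerically. The sketch below estimates the time and frequency spreads of a sampled signal by treating $|x|^2$ and $|X|^2$ as discrete probability distributions. The helper `time_frequency_spread`, the grid size, and the choice of test signals (Gaussians, which are known to meet the bound with equality, and a box) are assumptions made for illustration.

```python
import numpy as np

def time_frequency_spread(x, dt):
    """Estimate (sigma_t, sigma_omega) for samples x taken every dt seconds."""
    n = len(x)
    t = (np.arange(n) - n // 2) * dt
    omega = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=dt))
    X = np.fft.fftshift(np.fft.fft(x))
    # Normalize the energy densities so each sums to 1, like a distribution.
    p_t = np.abs(x) ** 2 / np.sum(np.abs(x) ** 2)
    p_w = np.abs(X) ** 2 / np.sum(np.abs(X) ** 2)
    m_t = np.sum(t * p_t)
    m_w = np.sum(omega * p_w)
    sigma_t = np.sqrt(np.sum((t - m_t) ** 2 * p_t))
    sigma_w = np.sqrt(np.sum((omega - m_w) ** 2 * p_w))
    return sigma_t, sigma_w

n, dt = 4096, 0.01
t = (np.arange(n) - n // 2) * dt

wide = time_frequency_spread(np.exp(-t**2 / (2 * 1.0**2)), dt)
narrow = time_frequency_spread(np.exp(-t**2 / (2 * 0.25**2)), dt)
box = time_frequency_spread((np.abs(t) <= 0.5).astype(float), dt)

for name, (s_t, s_w) in [("wide gaussian", wide), ("narrow gaussian", narrow), ("box", box)]:
    print(f"{name}: sigma_t={s_t:.3f} sigma_w={s_w:.3f} product={s_t * s_w:.3f}")
```

Narrowing the Gaussian in time shrinks $\sigma_t$ but grows $\sigma_\omega$ by the same factor, so the product stays at the bound. The box is compact in time, but its slowly decaying spectrum gives it a much larger product on this grid.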
While the STFT gives us a better picture of a signal than a full-length DFT, one of its shortcomings is that each coefficient is supported by the same amount of time and frequency. Low frequencies don't change as quickly as high frequencies do, so a low-frequency component needs more time support to be resolved, whereas a fast-changing component requires less time support to be resolved properly.
The Wavelet transform essentially makes all of the boxes in Figure 4 different sizes.
We need an infinite number of functions to fully represent all frequencies properly, but beyond a certain scale we no longer care about resolving them any better, so we stop scaling and use a low-frequency function $\phi(t)$ (the father wavelet) to "plug" the remaining bandwidth.
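In discrete time, the wavelet transform becomes
$$d_{s,u} = \sum_{n} x[n]\,\psi_{s,u}[n] \qquad a_{s,u} = \sum_{n} x[n]\,\phi_{s,u}[n]$$
where $\psi_{s,u}$ and $\phi_{s,u}$ are the mother and father wavelets at scale $s$ and shift $u$. The $d_{s,u}$ coefficients are the detail coefficients and are computed using the mother wavelet. They capture higher-frequency information. The $a_{s,u}$ coefficients are the approximation coefficients computed using the father wavelet. They represent lower-frequency information. The time-frequency tiling for the DWT looks like Figure 5.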
Figure 5: Time Frequency tiling of wavelets
Notice how each wavelet coefficient is supported by a different amount of time and frequency. We can choose different mother and father wavelets to describe our signals depending on how we want to tile the time-frequency plane.
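As one concrete instance, here is a minimal sketch of a multi-level DWT using the Haar wavelet. The Haar choice, the function name `haar_dwt`, and the requirement that the signal length be divisible by $2^{\text{levels}}$ are assumptions for illustration; the notes do not commit to a particular mother or father wavelet.

```python
import numpy as np

def haar_dwt(x, levels):
    """Return (approximation coefficients, list of detail coefficients per level).

    Assumes len(x) is divisible by 2**levels. details[0] holds the finest
    scale (highest frequencies); later entries hold coarser scales.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        # Mother wavelet: scaled difference of neighbors (high-pass).
        details.append((even - odd) / np.sqrt(2))
        # Father wavelet: scaled sum of neighbors (low-pass).
        approx = (even + odd) / np.sqrt(2)
    return approx, details

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
approx, details = haar_dwt(x, levels=3)

detail_lengths = [len(d) for d in details]
energy_in = np.sum(x**2)
energy_out = np.sum(approx**2) + sum(np.sum(d**2) for d in details)
print(detail_lengths, len(approx), np.isclose(energy_in, energy_out))
```

The number of detail coefficients halves at each level, which mirrors the tiling: fine scales get many coefficients with short time support, and coarse scales get few coefficients with long time support.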