Audio Filtering Array Exercise
Chris Tralie
Overview
The purpose of this exercise is to give students practice with dynamic arrays, loops, and methods in C++ by implementing a "filtering" method, in which parts of an audio clip are removed based on different criteria.
Code
You can obtain the code/audio for this exercise by typing
git clone https://github.com/ursinus-cs174-f2022/Week1_AudioFiltering.git
Overview
Digital audio can be represented as an array of doubles between -1 and 1, which we refer to as samples. In this exercise, you will fill in a method filter that takes in a sequence of audio samples x and creates a new array of audio samples y. There are three options, and you should implement at least one. For each option, you should copy over only the values of x that satisfy a certain threshold criterion.
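Although the exact method signature is defined in the starter code, the overall pattern is a loop that copies a sample when its criterion passes the threshold and silences it otherwise. Here is a minimal sketch in C++; the names x, N, criterion, and thresh are illustrative assumptions rather than the repository's actual parameters:

// Sketch of the general filtering pattern (hypothetical signature).
// x: audio samples, N: number of samples,
// criterion: an array parallel to x (e.g. energy or zero crossings),
// thresh: the cutoff value for that criterion.
double* filter(double* x, int N, double* criterion, double thresh) {
    double* y = new double[N];
    for (int i = 0; i < N; i++) {
        if (criterion[i] > thresh) {
            y[i] = x[i]; // Keep this sample
        } else {
            y[i] = 0;    // Silence samples that do not pass the threshold
        }
    }
    return y; // Caller is responsible for delete[]
}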
Below is more information about what the thresholds are for the three different options:
Option 1: Loudness Filtering
The method getEnergy returns an array parallel to the array of audio samples that contains the "energy" of the audio in a small window around each sample. In particular, energy[i] holds a number between 0 and 1 which is the mean of the square of the audio samples in x between indices i-win and i+win. energy[i] is higher if the audio samples in x around index i are louder, and lower otherwise. Let's say, for example, that we have the following audio:
Below is a plot that shows the original samples of x (top plot), followed by the energy array (middle plot). The regions in which the person is speaking have higher energy, and the regions between speech have lower energy. If we choose a cutoff of 0.005 in energy (shown as the dotted line), and we only fill in samples in y whose energy is above this cutoff, then we get the following audio (which is also shown as the bottom plot):
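The starter code provides getEnergy for you, but to make the definition above concrete, a windowed mean of squares could be computed along the following lines. This is only a sketch under assumed names and edge handling; the provided method may differ in its details:

// Sketch of a windowed energy computation (illustrative only).
// energy[i] is the mean of the squared samples of x between indices
// i-win and i+win, clamped to the array bounds. Since each sample is
// between -1 and 1, each energy value lies between 0 and 1.
double* getEnergy(double* x, int N, int win) {
    double* energy = new double[N];
    for (int i = 0; i < N; i++) {
        int lo = (i - win < 0) ? 0 : i - win;
        int hi = (i + win >= N) ? N - 1 : i + win;
        double sum = 0;
        for (int j = lo; j <= hi; j++) {
            sum += x[j] * x[j];
        }
        energy[i] = sum / (hi - lo + 1);
    }
    return energy;
}

With an array like this in hand, Option 1 amounts to keeping only the samples whose energy exceeds the 0.005 cutoff.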
Option 2: Consonant Filtering
There's also a method called getCrossings which fills in a proxy for consonant/vowel detection. In particular, crossings[i] holds the number of zero crossings in x between indices i-win and i+win. Generally, crossings[i] is higher if x[i] is in the middle of a consonant, and it is lower otherwise. This is because consonants are higher frequency than vowels, which means they have a smaller period (i.e. they go through more cycles in the same amount of time, which means they cross the x axis more often). Let's say, for example, that we have the same audio clip:
If we create a new array y in which we only keep samples whose zero crossing count in a window of size 4001 is at least 150, we get the following audio:
Below is a plot that shows how this happens. As before, the original audio is up top, and the filtered audio is at the bottom. The middle plot shows the zero crossings counted in windows of size 4001 around each sample, and the threshold of 150 is drawn as a dotted line. As an example, consider the circled region around the word "six": the zero crossings peak around the "s" and the "x", but they dip around the "i".
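As with getEnergy, the method getCrossings is provided for you, but the idea of counting zero crossings in a window can be sketched as follows. The names, return type, and edge handling here are assumptions for illustration, not necessarily what the repository uses:

// Sketch of a windowed zero crossing count (illustrative only).
// crossings[i] is the number of adjacent sample pairs between indices
// i-win and i+win whose signs differ, i.e. the number of times the
// waveform crosses the x axis in that window.
double* getCrossings(double* x, int N, int win) {
    double* crossings = new double[N];
    for (int i = 0; i < N; i++) {
        int lo = (i - win < 0) ? 0 : i - win;
        int hi = (i + win >= N) ? N - 1 : i + win;
        int count = 0;
        for (int j = lo; j < hi; j++) {
            if ((x[j] <= 0 && x[j+1] > 0) || (x[j] > 0 && x[j+1] <= 0)) {
                count++; // Sign change between consecutive samples
            }
        }
        crossings[i] = count;
    }
    return crossings;
}

In this sketch, a window of size 4001 corresponds to win = 2000, since the window runs from index i-win to index i+win.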
Option 3: Vowel Filtering
You can also choose to create a method which is almost identical to the consonant filtering method, but where you only keep samples if the zero crossings are below a certain amount. Instead of keeping the consonants, this will keep the vowels. For example, if we create a new array y in which we only keep samples whose zero crossing count in a window of size 4001 is at most 150, we get the following audio:
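Under the same assumed names as the sketches above, the only change from the consonant version is the direction of the comparison against the threshold:

// Sketch of the vowel-filtering loop (hypothetical names, as above).
// A sample is kept only when the crossing count around it is at or
// below the threshold (e.g. 150 for a window of size 4001).
double* filterVowels(double* x, double* crossings, int N, double thresh) {
    double* y = new double[N];
    for (int i = 0; i < N; i++) {
        if (crossings[i] <= thresh) {
            y[i] = x[i]; // Few crossings: likely a vowel, so keep it
        } else {
            y[i] = 0;    // Many crossings: likely a consonant, so silence it
        }
    }
    return y;
}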