# Preparation
As input, chatter simply takes a folder of raw WAV files; the recordings do not need to be denoised, filtered, or normalized beforehand. The input folder can have a recursive structure, for example if you keep separate subfolders for recordings from different years, locations, or individuals.
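For example, an input folder might be organized like this (all file and folder names here are purely illustrative):

```text
recordings/
├── 2022/
│   ├── site_A/
│   │   ├── bird01_001.wav
│   │   └── bird01_002.wav
│   └── site_B/
│       └── bird02_001.wav
└── 2023/
    └── site_A/
        └── bird03_001.wav
```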
chatter also requires a dictionary of configuration parameters that control the analysis pipeline.
## Spectrogram Parameters

| Parameter | Description | Default Value |
|---|---|---|
|  | Sample rate for audio processing. |  |
|  | FFT window size. |  |
|  | Window length for FFT. |  |
|  | Hop length between FFT windows. |  |
|  | Number of Mel bands to generate. |  |
| `fmin` | Minimum frequency for Mel spectrogram. |  |
| `fmax` | Maximum frequency for Mel spectrogram. |  |
|  | The dimensions spectrograms are resized to. |  |
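To give a sense of what these settings control, here is a minimal sketch of a Mel spectrogram computed with librosa-style parameters. The numeric values and the use of librosa are assumptions for illustration, not chatter's internal code, configuration keys, or defaults:

```python
# Illustrative only: how spectrogram settings of this kind typically feed a
# Mel spectrogram computation. Values and calls are assumptions, not chatter's.
import librosa
import numpy as np

y, sr = librosa.load("example.wav", sr=22050)   # resample to the working sample rate

mel = librosa.feature.melspectrogram(
    y=y,
    sr=sr,
    n_fft=1024,       # FFT window size
    win_length=1024,  # window length for the FFT
    hop_length=128,   # hop length between FFT windows
    n_mels=128,       # number of Mel bands to generate
    fmin=1700,        # minimum frequency of the Mel filterbank (Hz)
    fmax=6000,        # maximum frequency of the Mel filterbank (Hz)
)
mel_db = librosa.power_to_db(mel, ref=np.max)   # convert power to decibels
```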
## Preprocessing Parameters

| Parameter | Description | Default Value |
|---|---|---|
| `high_pass` | High-pass filter cutoff frequency in Hz. If … |  |
| `low_pass` | Low-pass filter cutoff frequency in Hz. If … |  |
|  | Target dBFS for normalization. |  |
|  | Noise reduction strength parameter. |  |
|  | Amount of dynamic range compression to apply (dB). |  |
|  | Output limiter threshold (dB). |  |
|  | If … |  |
|  | Length of fade-in and fade-out applied to each clip (milliseconds). |  |
|  | Number of seconds to skip at the start when estimating noise. |  |
|  | If … |  |
|  | Name of the BioDenoising model to use. |  |
|  | If … |  |
|  | Optional global noise floor for preprocessing (dBFS). If … |  |
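As a rough illustration of what the filtering and normalization parameters describe, here is a minimal band-pass filter plus peak-based normalization sketch using scipy and soundfile. It is a conceptual example under assumed values, not chatter's preprocessing implementation, and it omits the noise reduction, compression, limiting, and fade steps:

```python
# Illustrative only: band-pass filtering and simple peak-based normalization.
# Not chatter's preprocessing code; all numeric values are assumptions.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

audio, sr = sf.read("example.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # assume a mono signal; mix down if not

# keep only the frequency band of interest (high-pass 1700 Hz, low-pass 6000 Hz)
sos = butter(4, [1700, 6000], btype="bandpass", fs=sr, output="sos")
filtered = sosfiltfilt(sos, audio)

# bring the peak level to a target dBFS value
target_dbfs = -20.0
peak = np.max(np.abs(filtered))
normalized = filtered * (10 ** (target_dbfs / 20) / peak)

sf.write("example_preprocessed.wav", normalized, sr)
```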
## Pykanto Segmentation (recommended for recordings of variable quality)

| Parameter | Description | Default Value |
|---|---|---|
| `pykanto_noise_floor` | Noise floor for segmentation (dB). |  |
|  | Dynamic range above the noise floor to retain (dB). |  |
|  | Maximum decibel value relative to full scale. |  |
|  | Increment in decibels for thresholding. |  |
|  | Minimum duration of silence between units (seconds). |  |
|  | Maximum duration of a unit (seconds). |  |
|  | Minimum duration of a unit (seconds). |  |
|  | Gaussian sigma for image smoothing. |  |
|  | Silence threshold used in segmentation. |  |
## Simple Segmentation (recommended for recordings of high quality)

| Parameter | Description | Default Value |
|---|---|---|
| `simple_noise_floor` | Noise floor used during segmentation (dB). |  |
|  | Silence threshold for segmentation (dB). |  |
|  | Minimum duration of silence between units (seconds). |  |
|  | Maximum duration of a unit (seconds). |  |
|  | Minimum duration of a unit (seconds). |  |
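The segmentation parameters above revolve around amplitude thresholds and duration limits. As a conceptual illustration only (not chatter's or pykanto's actual algorithm), here is how a simple threshold-and-duration segmenter might look; every threshold and duration value is an assumption, and the minimum-silence merging step is omitted:

```python
# Conceptual sketch of threshold-and-duration segmentation. Not chatter's or
# pykanto's implementation; all numeric values are assumptions.
import numpy as np
import librosa

y, sr = librosa.load("example.wav", sr=None)
hop = 128

# frame-wise energy in dB relative to the loudest frame
rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y, hop_length=hop)[0], ref=np.max)

silence_threshold_db = -40.0        # frames quieter than this count as silence
min_unit_s, max_unit_s = 0.02, 1.0  # keep units within these duration limits

units, start = [], None
voiced = rms_db > silence_threshold_db
for i, v in enumerate(voiced):
    if v and start is None:
        start = i                                # a unit begins
    elif not v and start is not None:
        t0, t1 = start * hop / sr, i * hop / sr  # a unit ends
        if min_unit_s <= (t1 - t0) <= max_unit_s:
            units.append((t0, t1))
        start = None
# (a unit still open at the end of the file is ignored in this sketch)
```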
## Autoencoder Parameters

| Parameter | Description | Default Value |
|---|---|---|
| `ae_type` | Encoder–decoder architecture type: … |  |
| `latent_dim` | Number of latent dimensions. |  |
|  | Batch size for training. |  |
|  | Number of training epochs. |  |
|  | Learning rate for the optimizer. |  |
|  | VAE beta parameter for balancing reconstruction and latent space smoothness. |  |
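For context on the VAE beta parameter: in a standard beta-VAE the beta value scales the KL term of the training objective, so larger values push toward a smoother latent space at the cost of reconstruction accuracy. This is the general beta-VAE formulation, not necessarily chatter's exact loss:

$$
\mathcal{L} = \mathbb{E}_{q(z \mid x)}\big[-\log p(x \mid z)\big] + \beta \, D_{\mathrm{KL}}\big(q(z \mid x) \,\|\, p(z)\big)
$$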
## Other Parameters

| Parameter | Description | Default Value |
|---|---|---|
|  | Time in seconds to define sequence boundaries. |  |
|  | Lag size for the Vector Autoregression (VAR) model. |  |
|  | If … |  |
|  | Font used for plot labels and titles. |  |
|  | Length of audio clip (seconds) used in quick demonstration plots. |  |
|  | Name of the vision backbone checkpoint used for embeddings. |  |
|  | Device used for vision backbone (…). |  |
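The VAR lag parameter refers to standard vector-autoregression fitting. As a purely illustrative sketch of what a given lag means (using statsmodels and random stand-in data, not chatter's internal modeling code):

```python
# Illustrative only: fitting a VAR with a chosen lag using statsmodels.
# The latent trajectory is random stand-in data, not chatter output.
import numpy as np
from statsmodels.tsa.api import VAR

latents = np.random.default_rng(0).normal(size=(200, 8))  # 200 time steps, 8 dims
results = VAR(latents).fit(maxlags=2)                      # lag size for the VAR model
print(results.summary())
```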
Here is what a configuration dictionary looks like in practice. You do not need to specify every parameter; start from the defaults in `chatter.config` and override only what you need for your dataset.
```python
import chatter

# start from defaults and override a few key parameters
user_config = {
    # spectrogram parameters
    "fmin": 1700,
    "fmax": 6000,
    # preprocessing parameters
    "high_pass": 1700,
    "low_pass": 6000,
    # simple segmentation parameters
    "simple_noise_floor": -60,
    # pykanto segmentation parameters
    "pykanto_noise_floor": -65,
    # autoencoder parameters
    "ae_type": "convolutional",
    "latent_dim": 32,
}

config = chatter.make_config(user_config)
```
I highly recommend that you experiment with the preprocessing and segmentation parameters for your particular use case. The spectrogram, autoencoder, and other parameters should work well across a wide range of species.