# Preparation
As input, chatter simply takes a folder of raw WAV files; the recordings do not need to be denoised, filtered, or normalized beforehand. The input folder can have a recursive structure, for example if you keep separate subfolders for recordings from different years, locations, or individuals.
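For example, an input folder might be organized like this (all file and folder names here are purely illustrative):

```text
recordings/
├── 2022/
│   ├── site_A/
│   │   ├── bird01_001.wav
│   │   └── bird01_002.wav
│   └── site_B/
│       └── bird02_001.wav
└── 2023/
    └── site_A/
        └── bird03_001.wav
```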
chatter also requires a dictionary of configuration parameters that control the analysis pipeline.
## Spectrogram Parameters

| Parameter | Description | Default Value |
|---|---|---|
|  | Sample rate for audio processing. |  |
|  | FFT window size. |  |
|  | Window length for FFT. |  |
|  | Hop length between FFT windows. |  |
|  | Number of Mel bands to generate. |  |
| `fmin` | Minimum frequency for Mel spectrogram. |  |
| `fmax` | Maximum frequency for Mel spectrogram. |  |
|  | The dimensions spectrograms are resized to. |  |
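To give a sense of what these settings control, here is a minimal sketch of a Mel spectrogram computed with librosa-style parameters. The numeric values and the use of librosa are assumptions for illustration, not chatter's internal code, configuration keys, or defaults:

```python
# Illustrative only: how spectrogram settings of this kind typically feed a
# Mel spectrogram computation. Values and calls are assumptions, not chatter's.
import librosa
import numpy as np

y, sr = librosa.load("example.wav", sr=22050)   # resample to the working sample rate

mel = librosa.feature.melspectrogram(
    y=y,
    sr=sr,
    n_fft=1024,       # FFT window size
    win_length=1024,  # window length for the FFT
    hop_length=128,   # hop length between FFT windows
    n_mels=128,       # number of Mel bands to generate
    fmin=1700,        # minimum frequency of the Mel filterbank (Hz)
    fmax=6000,        # maximum frequency of the Mel filterbank (Hz)
)
mel_db = librosa.power_to_db(mel, ref=np.max)   # convert power to decibels
```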
## Preprocessing Parameters

| Parameter | Description | Default Value |
|---|---|---|
| `high_pass` | High-pass filter cutoff frequency in Hz. If … |  |
| `low_pass` | Low-pass filter cutoff frequency in Hz. If … |  |
|  | Target dBFS for normalization. |  |
|  | Noise reduction strength parameter. |  |
|  | Amount of dynamic range compression to apply (dB). |  |
|  | Output limiter threshold (dB). |  |
|  | If … |  |
|  | Length of fade-in and fade-out applied to each clip (milliseconds). |  |
|  | Number of seconds to skip at the start when estimating noise. |  |
|  | If … |  |
|  | Name of the BioDenoising model to use. |  |
|  | If … |  |
|  | Optional global noise floor for preprocessing (dBFS). If … |  |
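As a rough illustration of what the filtering and normalization parameters describe, here is a minimal band-pass filter plus peak-based normalization sketch using scipy and soundfile. It is a conceptual example under assumed values, not chatter's preprocessing implementation, and it omits the noise reduction, compression, limiting, and fade steps:

```python
# Illustrative only: band-pass filtering and simple peak-based normalization.
# Not chatter's preprocessing code; all numeric values are assumptions.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

audio, sr = sf.read("example.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # assume a mono signal; mix down if not

# keep only the frequency band of interest (high-pass 1700 Hz, low-pass 6000 Hz)
sos = butter(4, [1700, 6000], btype="bandpass", fs=sr, output="sos")
filtered = sosfiltfilt(sos, audio)

# bring the peak level to a target dBFS value
target_dbfs = -20.0
peak = np.max(np.abs(filtered))
normalized = filtered * (10 ** (target_dbfs / 20) / peak)

sf.write("example_preprocessed.wav", normalized, sr)
```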
## Pykanto Segmentation (recommended for recordings of variable quality)

| Parameter | Description | Default Value |
|---|---|---|
| `pykanto_noise_floor` | Noise floor for segmentation (dB). |  |
|  | Dynamic range above the noise floor to retain (dB). |  |
|  | Maximum decibel value relative to full scale. |  |
|  | Increment in decibels for thresholding. |  |
|  | Minimum duration of silence between units (seconds). |  |
|  | Maximum duration of a unit (seconds). |  |
|  | Minimum duration of a unit (seconds). |  |
|  | Gaussian sigma for image smoothing. |  |
|  | Silence threshold used in segmentation. |  |
## Simple Segmentation (recommended for recordings of high quality)

| Parameter | Description | Default Value |
|---|---|---|
| `simple_noise_floor` | Noise floor used during segmentation (dB). |  |
|  | Silence threshold for segmentation (dB). |  |
|  | Minimum duration of silence between units (seconds). |  |
|  | Maximum duration of a unit (seconds). |  |
|  | Minimum duration of a unit (seconds). |  |
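The segmentation parameters above revolve around amplitude thresholds and duration limits. As a conceptual illustration only (not chatter's or pykanto's actual algorithm), here is how a simple threshold-and-duration segmenter might look; every threshold and duration value is an assumption, and the minimum-silence merging step is omitted:

```python
# Conceptual sketch of threshold-and-duration segmentation. Not chatter's or
# pykanto's implementation; all numeric values are assumptions.
import numpy as np
import librosa

y, sr = librosa.load("example.wav", sr=None)
hop = 128

# frame-wise energy in dB relative to the loudest frame
rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y, hop_length=hop)[0], ref=np.max)

silence_threshold_db = -40.0        # frames quieter than this count as silence
min_unit_s, max_unit_s = 0.02, 1.0  # keep units within these duration limits

units, start = [], None
voiced = rms_db > silence_threshold_db
for i, v in enumerate(voiced):
    if v and start is None:
        start = i                                # a unit begins
    elif not v and start is not None:
        t0, t1 = start * hop / sr, i * hop / sr  # a unit ends
        if min_unit_s <= (t1 - t0) <= max_unit_s:
            units.append((t0, t1))
        start = None
# (a unit still open at the end of the file is ignored in this sketch)
```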
## Autoencoder Parameters

| Parameter | Description | Default Value |
|---|---|---|
| `ae_type` | Encoder–decoder architecture type: … |  |
| `latent_dim` | Number of latent dimensions. |  |
|  | Batch size for training. |  |
|  | Number of training epochs. |  |
|  | Learning rate for the optimizer. |  |
|  | VAE beta parameter for balancing reconstruction and latent space smoothness. |  |
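For context on the VAE beta parameter: in a standard beta-VAE the beta value scales the KL term of the training objective, so larger values push toward a smoother latent space at the cost of reconstruction accuracy. This is the general beta-VAE formulation, not necessarily chatter's exact loss:

$$
\mathcal{L} = \mathbb{E}_{q(z \mid x)}\big[-\log p(x \mid z)\big] + \beta \, D_{\mathrm{KL}}\big(q(z \mid x) \,\|\, p(z)\big)
$$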
## Other Parameters

| Parameter | Description | Default Value |
|---|---|---|
|  | Time in seconds to define sequence boundaries. |  |
|  | Lag size for the Vector Autoregression (VAR) model. |  |
|  | If … |  |
|  | Font used for plot labels and titles. |  |
|  | Length of audio clip (seconds) used in quick demonstration plots. |  |
|  | Name of the vision backbone checkpoint used for embeddings. |  |
|  | Device used for vision backbone (…). |  |
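The VAR lag parameter refers to standard vector-autoregression fitting. As a purely illustrative sketch of what a given lag means (using statsmodels and random stand-in data, not chatter's internal modeling code):

```python
# Illustrative only: fitting a VAR with a chosen lag using statsmodels.
# The latent trajectory is random stand-in data, not chatter output.
import numpy as np
from statsmodels.tsa.api import VAR

latents = np.random.default_rng(0).normal(size=(200, 8))  # 200 time steps, 8 dims
results = VAR(latents).fit(maxlags=2)                      # lag size for the VAR model
print(results.summary())
```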
Here is what a configuration dictionary looks like in practice. You do not need to specify every parameter; start from the defaults in `chatter.config` and override only what you need for your dataset.
```python
import chatter

# start from defaults and override a few key parameters
user_config = {
    # spectrogram parameters
    "fmin": 1700,
    "fmax": 6000,
    # preprocessing parameters
    "high_pass": 1700,
    "low_pass": 6000,
    # simple segmentation parameters
    "simple_noise_floor": -60,
    # pykanto segmentation parameters
    "pykanto_noise_floor": -65,
    # autoencoder parameters
    "ae_type": "convolutional",
    "latent_dim": 32,
}

config = chatter.make_config(user_config)
```
I highly recommend that you experiment with the preprocessing and segmentation parameters for your particular use case. The spectrogram, autoencoder, and other parameters should work well across a wide range of species.