chatter.data

Core data processing utilities for Chatter.

This module also configures logging verbosity for a few noisy dependencies (e.g., TensorFlow Lite via birdnetlib) so that downstream pipelines such as extract_species_clips run without spamming the console.

Functions

compute_spectrogram(y, sr, config)

Compute a mel spectrogram in dB scale from an audio time series.

preprocess_audio_data(audio, config)

Core preprocessing pipeline for a pydub AudioSegment.

segment_file(mel_spectrogram, config)

Segment a full mel spectrogram using the high-fidelity pykanto workflow.

segment_file_simple(y, sr, config)

Segment an audio array based on amplitude using frame-wise RMS energy.

slice_and_process_spectrograms(full_spec, ...)

Slice and process unit spectrograms from a full spectrogram matrix.

Classes

SpectrogramDataset(h5_path, indices)

PyTorch Dataset for lazy loading of spectrograms from an HDF5 file.

class chatter.data.SpectrogramDataset(h5_path, indices)

PyTorch Dataset for lazy loading of spectrograms from an HDF5 file.

This dataset reads spectrograms on demand from an HDF5 dataset named 'spectrograms'. It is designed to work correctly with parallel DataLoader workers by opening a separate file handle per worker process.

h5_path

Path to the HDF5 file containing the spectrograms.

Type: str

indices

List of integer indices referring to the entries in the 'spectrograms' dataset.

Type: list of int

_h5

Lazily opened HDF5 file handle. It is not pickled across processes.

Type: h5py.File or None

__init__(h5_path, indices)

Initialize the spectrogram dataset for lazy HDF5 loading.

Parameters:
  • h5_path (str or Path) – Path to the HDF5 file containing the 'spectrograms' dataset.

  • indices (list of int) – List of dataset indices that this instance will expose through __getitem__.
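The per-worker handle pattern described above can be sketched with a plain file standing in for h5py (the class below is an illustration of the technique, not the chatter.data implementation):

```python
import pickle


class LazyHandleDataset:
    """Sketch of SpectrogramDataset's lazy-handle pattern: the file is
    opened on first access, and the open handle is dropped before
    pickling so each DataLoader worker reopens its own handle."""

    def __init__(self, path, indices):
        self.path = str(path)
        self.indices = list(indices)
        self._handle = None  # opened lazily, once per process

    def _ensure_open(self):
        # Open on first use; subsequent calls reuse the handle.
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __getstate__(self):
        # Open file handles cannot be pickled into worker processes;
        # drop the handle and let each worker reopen lazily.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def __len__(self):
        return len(self.indices)
```

After unpickling in a worker, `_handle` is None again, so the first `__getitem__`-style access reopens the file in that process.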

chatter.data.preprocess_audio_data(audio, config)

Core preprocessing pipeline for a pydub AudioSegment.

This function applies the full preprocessing chain: fade in/out, format conversion, filtering, amplitude normalization, noise reduction (noisereduce or biodenoising), compression, limiting, and final normalization.

Parameters:
  • audio (pydub.AudioSegment) – Input audio segment.

  • config (dict) – Configuration dictionary containing preprocessing parameters.

Returns:

Processed audio data as a 1D numpy array of type int16.

Return type: np.ndarray
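The earlier stages of the chain depend on pydub and external denoising models, but the final normalization-and-quantization step can be sketched on its own (the `peak` parameter is hypothetical, not a documented config key):

```python
import numpy as np


def normalize_to_int16(y, peak=0.95):
    """Peak-normalize a float waveform and quantize it to int16,
    matching the documented return dtype of preprocess_audio_data."""
    y = np.asarray(y, dtype=np.float64)
    m = np.max(np.abs(y)) if y.size else 0.0
    if m > 0:
        y = y / m * peak  # scale so the loudest sample sits at `peak`
    # Map [-1, 1] floats onto the int16 range, clipping for safety.
    return np.clip(y * 32767.0, -32768, 32767).astype(np.int16)
```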

chatter.data.compute_spectrogram(y, sr, config)

Compute a mel spectrogram in dB scale from an audio time series.

Parameters:
  • y (np.ndarray) – Audio time series.

  • sr (int) – Sample rate.

  • config (dict) – Configuration dictionary containing spectrogram parameters (n_fft, hop_length, etc.).

Returns:

Mel spectrogram in decibel scale.

Return type: np.ndarray
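The dB-scaling step can be sketched in NumPy; this mirrors the common librosa.power_to_db convention, while the mel filterbank and STFT parameters taken from config are omitted here:

```python
import numpy as np


def power_to_db(S, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert a power spectrogram to decibels and clip values to a
    fixed dynamic range below the maximum (librosa's convention)."""
    S = np.asarray(S, dtype=np.float64)
    # Floor at `amin` to avoid log of zero, then reference to `ref`.
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    # Keep at most `top_db` of dynamic range below the loudest bin.
    return np.maximum(log_spec, log_spec.max() - top_db)
```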

chatter.data.segment_file(mel_spectrogram, config)

Segment a full mel spectrogram using the high-fidelity pykanto workflow.

This function applies an image-processing pipeline inspired by pykanto to identify syllable-like acoustic units in a spectrogram. It uses histogram equalization, median filtering, morphological operations, and Gaussian blurring, followed by pykanto's 'find_units' to obtain onset and offset times.

Parameters:
  • mel_spectrogram (np.ndarray) – Full mel spectrogram in decibel scale, with shape (n_mels, time_frames).

  • config (dict) – Configuration dictionary containing segmentation parameters.

Returns:

Array of segment boundaries with shape (n_segments, 2), where each row contains the onset and offset times (in seconds or frames, depending on pykanto configuration). If no units are found or an error occurs, an empty array with shape (0,) is returned.

Return type: np.ndarray
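When boundaries come back in spectrogram frames rather than seconds, converting them is a matter of the STFT hop; a small sketch (hop_length and sr stand in for the corresponding config values):

```python
import numpy as np


def frames_to_seconds(segments, sr, hop_length):
    """Map (onset, offset) frame indices to seconds: each frame
    advances the analysis window by hop_length samples."""
    segs = np.asarray(segments, dtype=np.float64)
    if segs.size == 0:
        return np.empty((0,))  # mirror the documented empty shape
    return segs * hop_length / sr
```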

chatter.data.segment_file_simple(y, sr, config)

Segment an audio array based on amplitude using frame-wise RMS energy.

This function computes frame-wise RMS energy using short-time analysis, converts it to decibels relative to full scale (dBFS), and identifies non-silent intervals as contiguous regions where the RMS level exceeds a configurable threshold. The threshold can be specified directly in dBFS or as a linear amplitude ratio and is combined with a configurable noise floor to produce a single decision boundary. Detected regions separated by short silences are merged, and the resulting segments are filtered by minimum and optional maximum unit length constraints.

Parameters:
  • y (np.ndarray) – Audio time series as a one-dimensional NumPy array.

  • sr (int) – Sample rate of the audio time series in Hz.

  • config (dict) – Configuration dictionary containing segmentation parameters.

Returns:

Array of segment boundaries with shape (n_segments, 2), where each row contains the onset and offset times in seconds. If no valid intervals are found or an error occurs, an empty array with shape (0,) is returned.

Return type: np.ndarray
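The thresholding logic described above reads roughly as follows in NumPy. This is a self-contained sketch with illustrative parameter names, not the chatter.data implementation; the library's config keys may differ:

```python
import numpy as np


def segment_rms_sketch(y, sr, frame_len=2048, hop=512,
                       threshold_dbfs=-40.0, min_len_sec=0.01,
                       merge_gap_sec=0.02):
    """RMS-threshold segmentation: frame-wise RMS -> dBFS -> contiguous
    above-threshold runs -> merge short gaps -> drop short segments."""
    y = np.asarray(y, dtype=np.float64)
    n_frames = max(1 + (len(y) - frame_len) // hop, 0)
    rms = np.array([
        np.sqrt(np.mean(y[i * hop:i * hop + frame_len] ** 2))
        for i in range(n_frames)
    ])
    # dB relative to full scale (1.0 == 0 dBFS for float audio).
    dbfs = 20.0 * np.log10(np.maximum(rms, 1e-10))
    active = dbfs > threshold_dbfs
    # Find contiguous active runs via edge detection on the mask.
    edges = np.diff(np.concatenate(([0], active.astype(int), [0])))
    onsets = np.flatnonzero(edges == 1)
    offsets = np.flatnonzero(edges == -1)
    segs = np.stack([onsets, offsets], axis=1) * hop / sr
    if segs.size == 0:
        return np.empty((0,))  # mirror the documented empty shape
    # Merge regions separated by silences shorter than merge_gap_sec.
    merged = [segs[0]]
    for on, off in segs[1:]:
        if on - merged[-1][1] <= merge_gap_sec:
            merged[-1][1] = off
        else:
            merged.append(np.array([on, off]))
    merged = np.array(merged)
    # Enforce the minimum unit length.
    keep = (merged[:, 1] - merged[:, 0]) >= min_len_sec
    return merged[keep]
```

A half-second tone flanked by silence, for example, comes back as a single (onset, offset) row near its true boundaries, quantized to the hop size.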

chatter.data.slice_and_process_spectrograms(full_spec, segments, config, max_unit_len_sec, min_unit_len_sec=None)

Slice and process unit spectrograms from a full spectrogram matrix.

This function takes a full mel spectrogram and a collection of temporal segments, extracts the corresponding time windows, normalizes each segment, pads them to a common length, and downsamples them to a fixed target shape.

Parameters:
  • full_spec (np.ndarray) – Full mel spectrogram with shape (n_mels, time_frames) in decibel scale.

  • segments (np.ndarray) – Array of segment boundaries with shape (n_segments, 2), where each row contains the onset and offset times in seconds.

  • config (dict) – Configuration dictionary containing processing parameters.

  • max_unit_len_sec (float) – Maximum unit length in seconds to determine padding.

  • min_unit_len_sec (float, optional) – Minimum unit length in seconds. Segments shorter than this are dropped. If None, defaults to config['simple_min_unit_length'] or 0.0.

Returns:

A tuple containing:

  1. List of processed spectrograms, each with shape equal to config['target_shape'] and dtype float32.

  2. List of original indices for the returned spectrograms, useful for mapping back to metadata if some segments were skipped.

  3. Dictionary containing counts of dropped segments by reason.

If no segments are provided or all are skipped, empty lists are returned.

Return type: tuple of (list of np.ndarray, list of int, dict)
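The per-unit pad-and-downsample step can be sketched as follows. Nearest-neighbour index selection stands in for whatever resampling the library actually uses, and target_shape plays the role of config['target_shape']:

```python
import numpy as np


def pad_and_downsample(spec, pad_to_frames, target_shape):
    """Pad a unit spectrogram to a common frame count, then select
    rows and columns down to a fixed (n_mels, n_frames) target."""
    n_mels, n_frames = spec.shape
    if n_frames < pad_to_frames:
        # Pad symmetrically with the quietest value (dB-scale "silence").
        pad = pad_to_frames - n_frames
        spec = np.pad(spec, ((0, 0), (pad // 2, pad - pad // 2)),
                      constant_values=spec.min())
    # Nearest-neighbour downsample to the fixed target shape.
    rows = np.linspace(0, spec.shape[0] - 1, target_shape[0]).round().astype(int)
    cols = np.linspace(0, spec.shape[1] - 1, target_shape[1]).round().astype(int)
    return spec[np.ix_(rows, cols)].astype(np.float32)
```

Padding with the spectrogram's minimum (rather than zero) keeps the padded region at the floor of the dB scale instead of introducing a mid-level band.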