chatter#

Chatter: a Python library for applying information theory and AI/ML models to animal communication.

This package provides tools for preprocessing audio files, segmenting them into syllable-like units, training variational autoencoders on spectrogram representations, and analyzing the resulting latent feature spaces.

Main Classes#

Analyzer

Audio preprocessing, segmentation, and spectrogram creation.

Trainer

Variational autoencoder training and feature extraction.

FeatureProcessor

Post-processing, dimensionality reduction, clustering, and visualization.

Example

>>> from chatter import Analyzer, Trainer, FeatureProcessor, make_config
>>> config = make_config({'sr': 22050, 'fmin': 500, 'fmax': 8000})
>>> analyzer = Analyzer(config)
>>> # ... preprocessing and segmentation
>>> trainer = Trainer(config)
>>> # ... training and feature extraction
>>> processor = FeatureProcessor(df, config)
>>> # ... analysis and visualization
class chatter.Analyzer(config, n_jobs=-1)[source]#

Main analysis class for audio preprocessing, segmentation, and spectrogram creation.

This class encapsulates the end-to-end pipeline for preparing audio data for autoencoder-based analyses. It provides methods for preprocessing audio files, segmenting them into syllable-like units, storing unit spectrograms in an HDF5 file, and managing associated metadata.

config#

Configuration dictionary containing all pipeline parameters, including audio preprocessing, spectrogram generation, and segmentation settings.

Type:

dict

n_jobs#

Number of parallel worker processes used for preprocessing and segmentation steps.

Type:

int

__init__(config, n_jobs=-1)[source]#

Initialize the Analyzer with a configuration and optional parallelism.

Parameters:
  • config (dict) – Configuration dictionary containing all pipeline parameters.

  • n_jobs (int, optional) – Number of parallel jobs to run for preprocessing and segmentation. If set to -1, all available CPU cores are used. The default is -1.

demo_preprocessing(input_dir)[source]#

Process a single random file (or segment) from the directory in memory and plot a comparison of the raw and preprocessed audio.

This method replicates the production preprocessing pipeline on a slice of audio whose duration is set by ‘plot_clip_duration’, prioritizing segments with active audio content.

Parameters:

input_dir (str or Path) – Directory containing audio files to process.

demo_segmentation(input_dir, simple=False)[source]#

Segment a random file (or clip) in memory and visualize the result.

This method picks a random audio file, applies the configured segmentation algorithm (simple or pykanto) to a specific clip, and plots the results in the visual style of ‘plot_detected_units’.

Parameters:
  • input_dir (str or Path) – Directory containing audio files to process.

  • simple (bool, optional) – If True, use simple amplitude-based segmentation. If False, use image-based segmentation (pykanto). The default is False.

extract_species_clips(input_dir, output_dir, species, confidence_threshold=0.5, buffer_seconds=1.0, batch_size=None)[source]#

Recursively find audio files, detect a species, and export clips.

This method scans an input directory for audio files, runs BirdNET to detect a specific species, and saves the resulting audio clips to an output directory that mirrors the input’s structure. The process is executed in parallel across multiple CPU cores.

Parameters:
  • input_dir (str or Path) – The root directory to search for audio files.

  • output_dir (str or Path) – The root directory where the output clips will be saved.

  • species (str) – The common name of the target species to detect (e.g., “House Finch”).

  • confidence_threshold (float, optional) – The minimum confidence level (0-1) for a detection to be included. The default is 0.5.

  • buffer_seconds (float, optional) – The number of seconds to add to the start and end of each detected clip. The default is 1.0.

  • batch_size (int, optional) – Number of files to process per batch in each parallel submission. If None, a default value of ‘n_jobs * 2’ is used.
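
A minimal usage sketch; the directory names, species, and detection settings are illustrative:

>>> from chatter import Analyzer, make_config
>>> analyzer = Analyzer(make_config())
>>> analyzer.extract_species_clips(
...     input_dir='field_recordings/',   # root of the raw-audio tree (illustrative)
...     output_dir='finch_clips/',       # mirrored output tree (illustrative)
...     species='House Finch',
...     confidence_threshold=0.7,        # stricter than the 0.5 default
...     buffer_seconds=0.5,
... )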

load_df(metadata_csv_path)[source]#

Load a DataFrame from a CSV file containing unit metadata.

Parameters:

metadata_csv_path (str or Path) – Path to the CSV file containing unit metadata.

Returns:

Loaded DataFrame if the file is found and successfully read. Returns None if the file is not found.

Return type:

pd.DataFrame or None

preprocess_directory(input_dir, processed_dir, batch_size=None)[source]#

Preprocess all audio files in a directory and its subdirectories.

This method performs batch preprocessing of audio files by calling ‘_preprocess_wav_worker’ in parallel. All supported audio file formats are discovered recursively under ‘input_dir’, preprocessed, and saved as standardized WAV files under ‘processed_dir’, preserving the directory structure.

Parameters:
  • input_dir (str or Path) – Directory containing raw audio files in various formats.

  • processed_dir (str or Path) – Directory in which to save preprocessed WAV files. The directory structure mirrors that of ‘input_dir’.

  • batch_size (int, optional) – Number of files to process per batch in each parallel submission. If None, a default value of ‘n_jobs * 2’ is used. The default is None.

Returns:

This method writes preprocessed files to disk and prints progress information but does not return a value.

Return type:

None
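
A minimal sketch, assuming an ‘Analyzer’ instance as in the class example; the directory names are illustrative:

>>> analyzer.preprocess_directory(
...     input_dir='raw_audio/',        # raw recordings in any supported format
...     processed_dir='processed/',    # standardized WAVs, mirroring input_dir
... )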

segment_and_create_spectrograms(processed_dir, h5_path, csv_path, simple=False, batch_size=None, presegment_csv=None)[source]#

Segment files and save spectrograms, optionally from a pre-segmented CSV.

If presegment_csv is provided, this method bypasses internal segmentation. Instead, it reads the given CSV for ‘source_file’, ‘onset’, and ‘offset’ information and generates spectrograms for those specific time slices.

If presegment_csv is None, it scans a directory of preprocessed WAV files, segments each file into syllable-like acoustic units using internal methods, converts segments into spectrograms, and stores them.

Parameters:
  • processed_dir (str or Path) – Directory containing preprocessed WAV files.

  • h5_path (str or Path) – Path to the output HDF5 file for spectrograms.

  • csv_path (str or Path) – Path to the CSV file for unit metadata.

  • simple (bool, optional) – If True, use simple amplitude-based segmentation. Default is False.

  • batch_size (int, optional) – Number of files to process per batch in parallel. Default is ‘n_jobs’.

  • presegment_csv (str or Path, optional) – Path to a CSV file with pre-defined segmentations. If provided, internal segmentation is skipped. Default is None.

Returns:

DataFrame containing metadata for all units.

Return type:

pd.DataFrame
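
A minimal sketch continuing from ‘preprocess_directory’; the output paths are illustrative:

>>> unit_df = analyzer.segment_and_create_spectrograms(
...     processed_dir='processed/',
...     h5_path='units.h5',      # unit spectrograms are stored here
...     csv_path='units.csv',    # one metadata row per detected unit
... )
>>> # passing presegment_csv=... instead would skip internal segmentation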

class chatter.Trainer(config)[source]#

Trainer class for the unified variational autoencoder implementation.

This class manages model initialization, training, evaluation, and feature extraction for the shared Encoder, which can be configured as either a convolutional or vector-based VAE. It is designed to operate on a single device (CPU, CUDA GPU, or Apple MPS).

config#

Configuration dictionary containing model and training parameters.

Type:

dict

device#

Computation device used for training and inference.

Type:

torch.device

ae_type#

Type of autoencoder architecture (‘convolutional’ or ‘vector’).

Type:

str

ae_model#

Unified variational autoencoder model encapsulating encoder and decoder components.

Type:

Encoder

__init__(config)[source]#

Initialize the Trainer with a configuration dictionary.

Parameters:

config (dict) – Configuration dictionary containing model and training parameters.

extract_and_save_comp_viz_features(unit_df, h5_path, output_csv_path, checkpoint=None)[source]#

Extract features using a Hugging Face computer vision model for all units and save them.

This method loads a pretrained computer vision model, iterates through all spectrograms in the HDF5 dataset, encodes them into fixed-length embeddings, and writes a combined DataFrame containing both metadata and features to CSV. The features are stored in columns named ‘cv_feat_{i}’.

Parameters:
  • unit_df (pd.DataFrame) – DataFrame containing unit metadata with a column ‘h5_index’ referring to indices in the HDF5 ‘spectrograms’ dataset.

  • h5_path (str or Path) – Path to the HDF5 file containing spectrograms.

  • output_csv_path (str or Path) – Path to the CSV file in which to store metadata and features.

  • checkpoint (str, optional) – Hugging Face model checkpoint name. If None, a default checkpoint from self.config[‘vision_checkpoint’] is used, or ‘facebook/dinov3-vitb16-pretrain-lvd1689m’ if that key is not present. The default is None.

Returns:

DataFrame containing metadata and features for all units, or None if the model could not be loaded successfully.

Return type:

pd.DataFrame or None
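
A minimal sketch, assuming ‘trainer’ and ‘unit_df’ from the earlier pipeline steps; file names are illustrative:

>>> cv_df = trainer.extract_and_save_comp_viz_features(
...     unit_df,
...     h5_path='units.h5',
...     output_csv_path='cv_features.csv',   # metadata plus cv_feat_{i} columns
... )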

extract_and_save_features(unit_df, h5_path, model_dir, output_csv_path)[source]#

Extract latent features for all units using the HDF5 file and save them.

This method loads a trained autoencoder model, iterates through all spectrograms in the HDF5 dataset, encodes them into latent features, and writes a combined DataFrame containing both metadata and latent features to CSV.

Parameters:
  • unit_df (pd.DataFrame) – DataFrame containing unit metadata with a column ‘h5_index’ referring to indices in the HDF5 ‘spectrograms’ dataset.

  • h5_path (str or Path) – Path to the HDF5 file containing spectrograms.

  • model_dir (str or Path) – Path to the saved model directory.

  • output_csv_path (str or Path) – Path to the CSV file in which to store metadata and latent features.

Returns:

DataFrame containing metadata and extracted features for all units, or None if the model could not be loaded successfully.

Return type:

pd.DataFrame or None
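
A minimal sketch, assuming ‘trainer’ and ‘unit_df’ from the earlier pipeline steps; file names are illustrative:

>>> features_df = trainer.extract_and_save_features(
...     unit_df,
...     h5_path='units.h5',
...     model_dir='vae_model/',          # directory written by train_ae
...     output_csv_path='features.csv',
... )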

classmethod from_trained(config, model_dir)[source]#

Create a Trainer instance and load a pre-trained model.

This class method instantiates a new Trainer with the provided configuration and immediately loads model weights from the specified path. It enables direct use of methods such as ‘extract_and_save_features’ or ‘plot_reconstructions’ without retraining.

Parameters:
  • config (dict) – Configuration dictionary for the model and training.

  • model_dir (str or Path) – Path to the saved model directory.

Returns:

An instance of the Trainer class with the model weights loaded.

Return type:

Trainer
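
A minimal sketch for reusing a previously trained model; the model directory is illustrative:

>>> trainer = Trainer.from_trained(config, model_dir='vae_model/')
>>> trainer.plot_reconstructions(unit_df, h5_path='units.h5')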

load_ae(model_dir)[source]#

Load pre-trained weights into the variational autoencoder model.

Parameters:

model_dir (str or Path) – Path to the saved model directory.

Returns:

This method loads model weights and sets the model to evaluation mode. It prints status messages and does not return a value.

Return type:

None

plot_reconstructions(unit_df, h5_path, num_examples=8)[source]#

Plot a side-by-side comparison of original and reconstructed spectrograms.

This method samples a set of unit spectrograms from the HDF5 dataset, passes them through the autoencoder, and visualizes the original and reconstructed spectrograms for qualitative inspection of model performance.

Parameters:
  • unit_df (pd.DataFrame) – DataFrame containing unit metadata with a column ‘h5_index’ referring to indices in the HDF5 ‘spectrograms’ dataset.

  • h5_path (str or Path) – Path to the HDF5 file containing spectrograms.

  • num_examples (int, optional) – Number of examples to plot. If the dataset contains fewer than ‘num_examples’ units, all available units are plotted. The default is 8.

Returns:

This method displays a matplotlib figure and does not return a value.

Return type:

None

train_ae(unit_df, h5_path, model_dir, subset=None)[source]#

Train the variational autoencoder using an HDF5 dataset.

This method creates a SpectrogramDataset from an HDF5 file, constructs a DataLoader, and runs a standard training loop for a configured number of epochs. It optionally trains on a random subset of units, and saves the trained model and loss history to disk.

Parameters:
  • unit_df (pd.DataFrame) – DataFrame containing unit metadata with a column ‘h5_index’ referring to indices in the HDF5 ‘spectrograms’ dataset.

  • h5_path (str or Path) – Path to the HDF5 file containing spectrograms.

  • model_dir (str or Path) – Directory in which to save the trained model and loss history CSV.

  • subset (float, optional) – Proportion of units to use for training. Must be in the range (0, 1) if provided. If None or outside this range, the full dataset is used. The default is None.

Returns:

This method trains a model and writes the results to disk but does not return a value.

Return type:

None
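
A minimal sketch, assuming ‘unit_df’ was produced by ‘segment_and_create_spectrograms’; paths are illustrative:

>>> trainer = Trainer(config)
>>> trainer.train_ae(
...     unit_df,
...     h5_path='units.h5',
...     model_dir='vae_model/',
...     subset=0.5,   # optional: train on a random half of the units
... )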

class chatter.FeatureProcessor(df, config)[source]#

Post-processing class for autoencoder features and associated metadata.

This class provides methods for dimensionality reduction (PaCMAP), clustering (BIRCH), computing within-sequence cosine distances, computing VAR-based surprisal scores, assigning sequence identifiers, and visualizing embedding structures.

df#

DataFrame containing latent features and associated metadata.

Type:

pd.DataFrame

config#

Configuration dictionary containing post-processing parameters such as ‘lag_size’ and ‘seq_bound’.

Type:

dict

__init__(df, config)[source]#

Initialize the FeatureProcessor with a DataFrame and configuration.

Parameters:
  • df (pd.DataFrame) – DataFrame containing latent features and corresponding metadata.

  • config (dict) – Configuration dictionary containing post-processing parameters.

assign_sequence_ids()[source]#

Assign sequence identifiers to syllables based on temporal proximity.

Sequences are defined separately for each ‘source_file’. Within each file, syllables are sorted by ‘onset’ time, and a new sequence is started whenever the silent gap between the previous syllable’s ‘offset’ and the current syllable’s ‘onset’ exceeds the threshold ‘seq_bound’ in seconds.

Requirements#

The DataFrame must contain:

  • ‘source_file’ – identifier for the audio file.

  • ‘onset’ – onset time (in seconds) for each syllable.

  • ‘offset’ – offset time (in seconds) for each syllable.

The configuration must contain:

  • ‘seq_bound’ (float) – maximum allowed silent gap in seconds.

Returns:

The current instance, returned to enable method chaining. A new column ‘seq_id’ is added to the DataFrame.

Return type:

FeatureProcessor
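
A minimal sketch, assuming ‘features_df’ was produced by ‘Trainer.extract_and_save_features’; the ‘seq_bound’ value is illustrative:

>>> config = make_config({'seq_bound': 0.5})   # gaps > 0.5 s start a new sequence
>>> processor = FeatureProcessor(features_df, config)
>>> processor = processor.assign_sequence_ids()   # adds a 'seq_id' column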

compute_cosine_distances()[source]#

Compute cosine distance between subsequent latent features within sequences.

This method calculates the cosine distance between each pair of consecutive rows that share the same sequence identifier ‘seq_id’. For each sequence, the first item has an undefined previous neighbor and therefore receives a distance of NaN. The results are stored in a new column ‘cosine_dist’.

Requirements#

The DataFrame must contain:

  • Columns representing latent features.

  • A ‘seq_id’ column identifying sequences.

  • An ‘onset’ column to ensure temporal ordering within each sequence.

Returns:

The current instance, returned to enable method chaining.

Return type:

FeatureProcessor
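
A minimal sketch continuing from ‘assign_sequence_ids’:

>>> processor = processor.assign_sequence_ids().compute_cosine_distances()
>>> distances = processor.df['cosine_dist']   # NaN for the first unit of each sequence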

compute_density_probability(use_pacmap=False, scaled=True, **kwargs)[source]#

Compute probability density estimates for embeddings using denmarf.

This method fits a Masked AutoRegressive Flow (MAF) density estimator to the latent features (or PaCMAP coordinates) and assigns a log-probability density score to each unit. High scores indicate ‘typical’ points in high-density regions; low scores indicate outliers or rare examples.

Parameters:
  • use_pacmap (bool, optional) – If True, computes density on the 2D ‘pacmap_x’/‘pacmap_y’ coordinates instead of the full latent space. The default is False, which is recommended because density estimation is more rigorous in the full latent space.

  • scaled (bool, optional) – If True, standardizes features (zero mean, unit variance) before fitting. Highly recommended for neural density estimators. Default is True.

  • **kwargs – Additional keyword arguments to pass to the denmarf.DensityEstimate constructor or fit method.

Returns:

The current instance, returned to enable method chaining. A new column ‘density_log_prob’ is added to the DataFrame.

Return type:

FeatureProcessor
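
A minimal sketch; treating the lowest scores as the most atypical units follows the description above:

>>> processor = processor.compute_density_probability(scaled=True)
>>> rarest = processor.df.nsmallest(10, 'density_log_prob')   # ten most atypical units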

compute_dtw_distance(seq_id_1, seq_id_2)[source]#

Compute Dynamic Time Warping (DTW) cosine distance between two sequences.

This method calculates the DTW distance between the latent feature sequences of two specified sequence IDs. It first computes a local distance matrix using the cosine distance between all pairs of units from the two sequences, and then uses this matrix to find the optimal alignment path cost with DTW.

Parameters:
  • seq_id_1 (int) – The identifier for the first sequence.

  • seq_id_2 (int) – The identifier for the second sequence.

Returns:

The total DTW distance (cost) between the two sequences.

Return type:

float

Raises:

ValueError – If ‘seq_id’ or feature columns are not found in the DataFrame, or if one or both of the specified seq_ids do not exist.
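
A minimal sketch, assuming sequence IDs 0 and 1 exist after ‘assign_sequence_ids’:

>>> dist = processor.compute_dtw_distance(seq_id_1=0, seq_id_2=1)
>>> # smaller values indicate more similar sequences under the optimal alignment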

compute_frequency_statistics(h5_path, return_traces=False)[source]#

Compute minimum, mean, and maximum frequency statistics for each unit.

This method loads spectrograms from the HDF5 file and calculates frequency statistics for each time bin of each unit. It always updates the internal DataFrame in place with summary statistics (global min, mean, max per unit), and can optionally return detailed per-time-bin traces.

Parameters:
  • h5_path (str or Path) – Path to the HDF5 file containing the spectrograms dataset.

  • return_traces (bool, optional) – If True, returns a dictionary with detailed per-time-bin frequency traces for each unit. If False (default), returns the FeatureProcessor instance to support method chaining.

Returns:

If return_traces is False (default), returns self after adding the new summary columns (‘min_freq’, ‘mean_freq’, ‘max_freq’, ‘time_bin_ms’) to self.df. If return_traces is True, returns a dictionary containing:

  • ‘time_bins_info’: metadata about the time axis

  • ‘units’: mapping from unit index (h5_index) to per-time-bin traces for ‘min_freq_trace’, ‘mean_freq_trace’, and ‘max_freq_trace’.

Return type:

FeatureProcessor or dict

Notes

If the metadata contains a ‘max_unit_length_s’ column (added during segmentation), it is used to determine the effective duration represented by each spectrogram column. Otherwise, the method falls back to the configuration’s max unit length settings.

compute_sse_resid()[source]#

Compute VAR-based sum of squared error residuals as a surprisal proxy.

This method fits a single global vector autoregression (VAR) model with a specified lag size across all sequences while respecting sequence boundaries defined by ‘seq_id’. It then computes per-timestep sum of squared error (SSE) residuals for each sequence, including short sequences and early time steps using reduced lag orders when necessary.

Requirements#

The DataFrame must contain: - Columns representing latent features. - A ‘seq_id’ column identifying sequences. Configuration must include: - ‘lag_size’ : int, the lag order p of the VAR model.

returns:

The current instance, returned to enable method chaining. A new column ‘sse_resid’ is added to the DataFrame containing SSE values or NaN where predictions are not defined (for example, the first time step of each sequence).

rtype:

FeatureProcessor
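
A minimal sketch; the ‘lag_size’ and ‘seq_bound’ values are illustrative:

>>> config = make_config({'lag_size': 2, 'seq_bound': 0.5})
>>> processor = FeatureProcessor(features_df, config)
>>> processor = processor.assign_sequence_ids().compute_sse_resid()
>>> surprisal = processor.df['sse_resid']   # NaN where no prediction is defined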

interactive_embedding_plot(h5_path, output_html_path, thumb_size=96, point_alpha=0.7, point_size=3)[source]#

Export a self-contained HTML file with an interactive embedding plot.

This method creates a standalone HTML file that uses Plotly.js in the browser (no Python backend required) to display the PaCMAP embedding. Hovering over points in the scatterplot updates a grid of pre-rendered spectrogram thumbnails for the focal unit and its nearest neighbors.

All required data (coordinates, neighbor indices, and spectrogram thumbnails encoded as base64 PNGs) are embedded directly into the HTML file so it can be shared and opened without any additional files.

Parameters:
  • h5_path (str or Path) – Path to the HDF5 file containing the spectrograms dataset.

  • output_html_path (str or Path) – Path at which to save the resulting HTML file.

  • thumb_size (int, optional) – Approximate size (in pixels) of the square spectrogram thumbnails. The default is 96.

  • point_alpha (float, optional) – Opacity of the scatter points (0.0 to 1.0). The default is 0.7.

  • point_size (int, optional) – Size of the scatter points in pixels. The default is 3.

Returns:

The current instance, returned to enable method chaining.

Return type:

FeatureProcessor
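
A minimal sketch, assuming PaCMAP coordinates have already been computed with ‘run_pacmap’; the output path is illustrative:

>>> processor = processor.run_pacmap()   # 'pacmap_x'/'pacmap_y' must exist first
>>> processor = processor.interactive_embedding_plot(
...     h5_path='units.h5',
...     output_html_path='embedding.html',   # single self-contained, shareable file
... )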

plot_birch_sse_elbow(k_range)[source]#

Plot the sum of squared errors for a range of cluster counts in BIRCH clustering.

This method computes and visualizes the sum of squared errors (SSE) corresponding to different numbers of clusters in BIRCH clustering. It helps identify an appropriate number of clusters using the elbow method.

Parameters:

k_range (iterable of int) – Iterable (for example, a list or range) of cluster counts ‘k’ to evaluate.

Returns:

The current instance, returned to enable method chaining.

Return type:

FeatureProcessor

run_birch_clustering(n_clusters_list)[source]#

Run BIRCH clustering for multiple values of ‘n_clusters’.

This method performs BIRCH clustering on the PaCMAP embeddings (columns ‘pacmap_x’ and ‘pacmap_y’) for each requested number of clusters and stores cluster labels in separate columns.

Parameters:

n_clusters_list (list of int) – List of ‘n_clusters’ values for which to compute BIRCH cluster assignments. For each value ‘n’, a column ‘birch_n’ is added to the DataFrame.

Returns:

The current instance, returned to enable method chaining.

Return type:

FeatureProcessor

run_pacmap(**kwargs)[source]#

Run PaCMAP dimensionality reduction on latent features and add coordinates.

This method automatically identifies feature columns, runs PaCMAP to embed them into a two-dimensional space, and stores the resulting coordinates in new columns ‘pacmap_x’ and ‘pacmap_y’ in the DataFrame.

Parameters:

**kwargs – Additional keyword arguments passed directly to the pacmap.PaCMAP constructor, allowing customization of the embedding.

Returns:

The current instance, returned to enable method chaining.

Return type:

FeatureProcessor
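
A minimal sketch chaining the embedding and clustering steps; the parameter values are illustrative:

>>> processor = processor.run_pacmap(n_neighbors=10)    # kwargs forwarded to pacmap.PaCMAP
>>> processor = processor.plot_birch_sse_elbow(range(2, 16))   # pick k at the elbow
>>> processor = processor.run_birch_clustering([8])     # adds a 'birch_8' label column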

static_embedding_plot(h5_path, output_path=None, seed=42, focal_quantile=0.8, point_alpha=0.3, point_size=2, margin=0.02, zoom_padding=0.05, num_neighbors=3)[source]#

Create a publication-quality static plot of the embedding space.

This method generates a visualization that includes a 2D density map of the embedding and four “callouts” showing focal syllables and their nearest neighbors. Focal points are selected automatically from the fringes of each quadrant of the embedding space to ensure a representative sample of unique points. The plot is designed to be border-free with a seamless viridis background.

Parameters:
  • h5_path (str or Path) – Path to the HDF5 file containing the spectrograms dataset.

  • output_path (str or Path, optional) – Path to save the final PNG image. If None, the plot is displayed directly using plt.show(). The default is None.

  • seed (int, optional) – Seed for the random number generator to ensure reproducible selection of focal points. The default is 42.

  • focal_quantile (float, optional) – Quantile threshold used to define the ‘fringes’ of each quadrant from which focal points are drawn. The default is 0.8.

  • point_alpha (float, optional) – The alpha (transparency) of the scatter points in the background. The default is 0.3.

  • point_size (int or float, optional) – The size of the scatter points in the background. The default is 2.

  • margin (float, optional) – The margin from the plot edge to the nearest edge of a callout group, as a fraction of the plot’s total width/height. The default is 0.02.

  • zoom_padding (float, optional) – The padding to add around the data points as a percentage of the data’s range, effectively controlling the zoom level. The default is 0.05 (5%).

  • num_neighbors (int, optional) – The number of nearest neighbors to display for each focal point. The default is 3.

chatter.get_default_config() → Dict[str, Any][source]#

Return a deep copy of the default configuration dictionary.

Returns:

A copy of DEFAULT_CONFIG that can be safely modified.

Return type:

dict

chatter.make_config(user_config: Dict[str, Any] | None = None) → Dict[str, Any][source]#

Create a finalized configuration dictionary by merging user overrides into the default configuration.

Parameters:

user_config (dict or None, optional) – Dictionary with user-specified overrides. Unknown keys trigger a warning.

Returns:

Finalized configuration dictionary.

Return type:

dict
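
A brief sketch of the merge behavior:

>>> from chatter import get_default_config, make_config
>>> defaults = get_default_config()        # safe-to-modify copy of DEFAULT_CONFIG
>>> config = make_config({'sr': 32000})    # overrides merged onto the defaults
>>> config['sr']
32000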

Modules

analyzer

config

chatter.config

data

Core data processing utilities for Chatter.

features

models

chatter.models

trainer

utils

chatter.utils