chatter.models

Neural network architectures for variational autoencoders.

Functions

ae_loss(x, x_recon, mu, log_var[, beta, ...])

Compute variational autoencoder loss with foreground weighting.

Classes

`ConvDecoder`(latent_dim, target_shape)	Convolutional decoder using a resize-convolution architecture.
`ConvEncoder`(latent_dim, target_shape)	Convolutional encoder for a variational autoencoder operating on spectrograms.
`Encoder`(ae_type, latent_dim[, input_dim, ...])	Unified autoencoder wrapper supporting both convolutional and vector architectures.
`VectorDecoder`(latent_dim, output_dim)	Fully connected decoder for an autoencoder.
`VectorEncoder`(input_dim, latent_dim)	Fully connected encoder for a variational autoencoder.

class chatter.models.ConvEncoder(latent_dim: int, target_shape: Tuple[int, int])

Convolutional encoder for a variational autoencoder operating on spectrograms.

This encoder processes single-channel spectrogram inputs with a series of convolutional layers followed by batch normalization and Mish activation. It outputs the mean and log variance of the latent distribution.

Parameters:

latent_dim (int) – Dimensionality of the latent space.
target_shape (tuple of int) – Shape of the input spectrogram as (height, width).
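This page does not give the layer configuration. The sketch below shows one way to realize the conv → batch norm → Mish pattern described above. It assumes four stride-2 convolutions with illustrative channel widths (32, 64, 128, 256) and two linear heads that produce mu and log_var. `SketchConvEncoder` is a hypothetical name, and none of these sizes are chatter's actual values. The flattened feature size is derived from `target_shape` with the standard `Conv2d` output-size formula.

```python
from typing import Tuple

import torch
import torch.nn as nn


class SketchConvEncoder(nn.Module):
    """Hypothetical encoder: stride-2 Conv2d -> BatchNorm2d -> Mish blocks, then mu/log_var heads."""

    def __init__(self, latent_dim: int, target_shape: Tuple[int, int]) -> None:
        super().__init__()
        channels = [1, 32, 64, 128, 256]  # assumed widths, not chatter's actual values
        layers = []
        h, w = target_shape
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.Mish(),
            ]
            # Conv2d(kernel_size=3, stride=2, padding=1) maps a side of length n to (n - 1) // 2 + 1.
            h, w = (h - 1) // 2 + 1, (w - 1) // 2 + 1
        self.features = nn.Sequential(*layers)
        n_flat = channels[-1] * h * w
        self.fc_mu = nn.Linear(n_flat, latent_dim)
        self.fc_log_var = nn.Linear(n_flat, latent_dim)

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        # x: (batch_size, 1, height, width) -> flattened feature map -> two linear heads.
        feats = torch.flatten(self.features(x), start_dim=1)
        return self.fc_mu(feats), self.fc_log_var(feats)


enc = SketchConvEncoder(latent_dim=16, target_shape=(128, 128))
mu, log_var = enc(torch.randn(8, 1, 128, 128))
print(mu.shape, log_var.shape)  # torch.Size([8, 16]) torch.Size([8, 16])
```

Computing the flattened size arithmetically avoids a dummy forward pass, which would otherwise update the BatchNorm running statistics at construction time.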

__init__(latent_dim: int, target_shape: Tuple[int, int]) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tuple[Tensor, Tensor]

Perform a forward pass through the encoder.

Parameters:

x (torch.Tensor) – Input tensor with shape (batch_size, 1, height, width).

Returns:

A tuple (mu, log_var) representing the latent distribution parameters.

Return type:

tuple of torch.Tensor

class chatter.models.ConvDecoder(latent_dim: int, target_shape: Tuple[int, int])

Convolutional decoder using a resize-convolution architecture.

This decoder reconstructs spectrogram images from latent vectors using nearest-neighbor upsampling followed by an ordinary convolution (instead of a transposed convolution), which mitigates checkerboard artifacts.

Parameters:

latent_dim (int) – Dimensionality of the latent space.
target_shape (tuple of int) – Shape of the output spectrogram as (height, width).
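The resize-convolution idea is to enlarge the feature map with `nn.Upsample(mode="nearest")` and then apply a size-preserving `Conv2d`. A strided `ConvTranspose2d` has uneven kernel overlap, which is the usual source of checkerboard patterns; this approach avoids it.

The sketch below is hypothetical (`SketchConvDecoder`). It assumes:

* three 2x upsampling stages with illustrative channel widths;
* a linear layer that seeds a grid of size ceil(target / 8);
* no output activation;
* a final crop to `target_shape`.

How chatter actually matches the target size is not documented here.

```python
from typing import Tuple

import torch
import torch.nn as nn


class SketchConvDecoder(nn.Module):
    """Hypothetical resize-convolution decoder: Linear seed grid, then Upsample -> Conv2d blocks."""

    def __init__(self, latent_dim: int, target_shape: Tuple[int, int]) -> None:
        super().__init__()
        self.target_shape = target_shape
        self.channels = [256, 128, 64, 32]  # assumed widths; three 2x upsampling stages
        scale = 2 ** (len(self.channels) - 1)
        # Seed grid is the target size divided by the total upsampling factor, rounded up.
        self.h0 = -(-target_shape[0] // scale)
        self.w0 = -(-target_shape[1] // scale)
        self.fc = nn.Linear(latent_dim, self.channels[0] * self.h0 * self.w0)
        blocks = []
        for c_in, c_out in zip(self.channels[:-1], self.channels[1:]):
            blocks += [
                nn.Upsample(scale_factor=2, mode="nearest"),  # resize first...
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # ...then a size-preserving conv
                nn.BatchNorm2d(c_out),
                nn.Mish(),
            ]
        blocks.append(nn.Conv2d(self.channels[-1], 1, kernel_size=3, padding=1))  # one output channel
        self.net = nn.Sequential(*blocks)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        seed = self.fc(z).view(-1, self.channels[0], self.h0, self.w0)
        out = self.net(seed)
        # Rounding up can overshoot the target by a few pixels; crop back to (height, width).
        return out[..., : self.target_shape[0], : self.target_shape[1]]


dec = SketchConvDecoder(latent_dim=16, target_shape=(128, 128))
x_recon = dec(torch.randn(8, 16))
print(x_recon.shape)  # torch.Size([8, 1, 128, 128])
```

Cropping is the simplest way to handle sizes that are not multiples of 8. Interpolating to the exact size is a common alternative.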

__init__(latent_dim: int, target_shape: Tuple[int, int]) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(z: Tensor) → Tensor

Perform a forward pass through the decoder.

Parameters:

z (torch.Tensor) – Latent representation with shape (batch_size, latent_dim).

Returns:

Reconstructed spectrogram with shape (batch_size, 1, height, width).

Return type:

torch.Tensor

class chatter.models.VectorEncoder(input_dim: int, latent_dim: int)

Fully connected encoder for a variational autoencoder.

Parameters:

input_dim (int) – Dimensionality of the flattened input vector.
latent_dim (int) – Dimensionality of the latent space.
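The hidden layers of the fully connected encoder are not documented here. The sketch below is hypothetical (`SketchVectorEncoder`, `hidden_dim`). It assumes:

* two `Linear` layers (512 and 256 units);
* Mish activations, borrowed from the convolutional encoder's description;
* separate linear heads for mu and log_var.

It also shows how a (batch_size, 1, height, width) spectrogram batch would be flattened to the (batch_size, input_dim) shape the forward pass expects, with input_dim = height * width.

```python
from typing import Tuple

import torch
import torch.nn as nn


class SketchVectorEncoder(nn.Module):
    """Hypothetical MLP encoder: flattened input -> hidden layers -> (mu, log_var)."""

    def __init__(self, input_dim: int, latent_dim: int, hidden_dim: int = 512) -> None:
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Mish(),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.Mish(),
        )
        self.fc_mu = nn.Linear(hidden_dim // 2, latent_dim)
        self.fc_log_var = nn.Linear(hidden_dim // 2, latent_dim)

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        h = self.body(x)
        return self.fc_mu(h), self.fc_log_var(h)


# Flatten a (batch_size, 1, height, width) batch to (batch_size, height * width).
spec = torch.rand(4, 1, 64, 32)
x = spec.flatten(start_dim=1)
enc = SketchVectorEncoder(input_dim=64 * 32, latent_dim=8)
mu, log_var = enc(x)
```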

__init__(input_dim: int, latent_dim: int) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tuple[Tensor, Tensor]

Perform a forward pass through the encoder.

Parameters:

x (torch.Tensor) – Flattened input tensor with shape (batch_size, input_dim).

Returns:

A tuple (mu, log_var) representing the latent distribution parameters.

Return type:

tuple of torch.Tensor

class chatter.models.VectorDecoder(latent_dim: int, output_dim: int)

Fully connected decoder for an autoencoder.

Parameters:

latent_dim (int) – Dimensionality of the latent representation.
output_dim (int) – Dimensionality of the flattened spectrogram output.
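As a hedged illustration, a fully connected decoder can mirror the encoder and return a flat vector that the caller reshapes back into a spectrogram. The sketch below is hypothetical (`SketchVectorDecoder`, `hidden_dim`). It assumes two Mish-activated hidden layers and no output activation; chatter's actual layer sizes are not given on this page.

```python
import torch
import torch.nn as nn


class SketchVectorDecoder(nn.Module):
    """Hypothetical MLP decoder: latent vector -> hidden layers -> flattened output."""

    def __init__(self, latent_dim: int, output_dim: int, hidden_dim: int = 512) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim // 2),
            nn.Mish(),
            nn.Linear(hidden_dim // 2, hidden_dim),
            nn.Mish(),
            nn.Linear(hidden_dim, output_dim),  # no output activation assumed
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


dec = SketchVectorDecoder(latent_dim=8, output_dim=64 * 32)
flat = dec(torch.randn(4, 8))  # (batch_size, output_dim)
spec = flat.view(-1, 1, 64, 32)  # reshaped back to (batch_size, 1, height, width)
```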

__init__(latent_dim: int, output_dim: int) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(z: Tensor) → Tensor

Perform a forward pass through the decoder.

Parameters:

z (torch.Tensor) – Latent representation with shape (batch_size, latent_dim).

Returns:

Reconstructed output with shape (batch_size, output_dim).

Return type:

torch.Tensor

class chatter.models.Encoder(ae_type: str, latent_dim: int, input_dim: int | None = None, target_shape: Tuple[int, int] | None = None)

Unified autoencoder wrapper supporting both convolutional and vector architectures.

Parameters:

ae_type (str) – Type of autoencoder architecture (‘convolutional’ or ‘vector’).
latent_dim (int) – Dimensionality of the latent space.
input_dim (int, optional) – Dimensionality of the flattened input (required for ‘vector’).
target_shape (tuple of int, optional) – Shape of the input spectrogram (required for ‘convolutional’).
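The page does not show how Encoder dispatches between the two architectures. The sketch below is a hypothetical stand-in (`SketchAutoencoder`) that mirrors the documented constructor and methods:

* it validates `ae_type` and the argument each type requires;
* `forward` runs encode, reparameterize and decode, and returns (x_recon, mu, log_var).

To stay short and self-contained, it uses single `Linear` layers in place of the architecture-specific encoder and decoder. The error messages are illustrative.

```python
from typing import Optional, Tuple

import torch
import torch.nn as nn


class SketchAutoencoder(nn.Module):
    """Hypothetical wrapper: validate ae_type, build parts, run encode -> sample -> decode."""

    def __init__(self, ae_type: str, latent_dim: int,
                 input_dim: Optional[int] = None,
                 target_shape: Optional[Tuple[int, int]] = None) -> None:
        super().__init__()
        if ae_type == "vector":
            if input_dim is None:
                raise ValueError("input_dim is required when ae_type='vector'")
            n_in, self.out_shape = input_dim, (input_dim,)
        elif ae_type == "convolutional":
            if target_shape is None:
                raise ValueError("target_shape is required when ae_type='convolutional'")
            n_in, self.out_shape = target_shape[0] * target_shape[1], (1, *target_shape)
        else:
            raise ValueError(f"ae_type must be 'convolutional' or 'vector', got {ae_type!r}")
        # Stand-in single-layer parts (assumption, not chatter's architecture).
        self.enc = nn.Linear(n_in, 2 * latent_dim)
        self.dec = nn.Linear(latent_dim, n_in)

    def encode(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        mu, log_var = self.enc(x.flatten(start_dim=1)).chunk(2, dim=-1)
        return mu, log_var

    def reparameterize(self, mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
        return mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        mu, log_var = self.encode(x)
        z = self.reparameterize(mu, log_var)
        x_recon = self.dec(z).view(x.shape[0], *self.out_shape)
        return x_recon, mu, log_var


model = SketchAutoencoder("convolutional", latent_dim=8, target_shape=(16, 20))
x_recon, mu, log_var = model(torch.rand(3, 1, 16, 20))
```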

__init__(ae_type: str, latent_dim: int, input_dim: int | None = None, target_shape: Tuple[int, int] | None = None) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

reparameterize(mu: Tensor, log_var: Tensor) → Tensor

Apply the reparameterization trick for a variational autoencoder.

Parameters:

mu (torch.Tensor) – Mean of the latent distribution.
log_var (torch.Tensor) – Log variance of the latent distribution.

Returns:

Sampled latent vector.

Return type:

torch.Tensor
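The standard reparameterization trick writes a sample as z = mu + sigma * eps, with sigma = exp(log_var / 2) and eps ~ N(0, I). The randomness therefore sits in eps, and gradients flow to mu and log_var. Below is a minimal standalone sketch of that formulation; chatter's own implementation is not shown on this page.

With mu = 2 and log_var = log 9 (sigma = 3), the sample mean and standard deviation should come out close to 2 and 3.

```python
import math

import torch


def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mu, exp(log_var)) as a differentiable function of mu and log_var."""
    std = torch.exp(0.5 * log_var)  # sigma = exp(log_var / 2)
    eps = torch.randn_like(std)  # eps ~ N(0, I), independent of the parameters
    return mu + eps * std


torch.manual_seed(0)
mu = torch.full((100_000,), 2.0, requires_grad=True)
log_var = torch.full((100_000,), math.log(9.0), requires_grad=True)  # sigma = 3
z = reparameterize(mu, log_var)
z.sum().backward()  # gradients reach mu and log_var through the deterministic path
print(z.mean().item(), z.std().item())
```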

encode(x: Tensor) → Tuple[Tensor, Tensor]

Encode an input tensor into the latent space.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

A tuple (mu, log_var) representing the latent distribution parameters.

Return type:

tuple of torch.Tensor

forward(x: Tensor) → Tuple[Tensor, Tensor, Tensor]

Perform a full forward pass through the variational autoencoder.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

A tuple (x_recon, mu, log_var).

Return type:

tuple of torch.Tensor

chatter.models.ae_loss(x: Tensor, x_recon: Tensor, mu: Tensor, log_var: Tensor, beta: float = 1.0, fg_tau: float = 0.1, fg_alpha: float = 10.0) → Tensor

Compute variational autoencoder loss with foreground weighting.

This loss function combines a foreground-weighted L1 reconstruction term with a Kullback-Leibler divergence term. Foreground pixels are assigned higher weight to mitigate mode collapse on sparse spectrograms.
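Written out, one plausible reading of the parameter descriptions gives the loss below. It rests on these assumptions:

* the foreground mask is a hard threshold on the input x;
* foreground pixels get weight α_fg (fg_alpha) and background pixels get weight 1;
* the KL term is the standard closed form for a diagonal Gaussian posterior against N(0, I).

Notation: σ² = exp(log_var), x̂ = x_recon, d = latent_dim, B = batch size.

```latex
w_{b,i} =
\begin{cases}
  \alpha_{\mathrm{fg}} & \text{if } x_{b,i} > \tau_{\mathrm{fg}} \\
  1 & \text{otherwise}
\end{cases}

\mathcal{L} = \frac{1}{B}\left[
  \sum_{b=1}^{B}\sum_{i} w_{b,i}\,\lvert x_{b,i} - \hat{x}_{b,i}\rvert
  \;-\; \frac{\beta}{2}\sum_{b=1}^{B}\sum_{j=1}^{d}
  \left(1 + \log\sigma_{b,j}^{2} - \mu_{b,j}^{2} - \sigma_{b,j}^{2}\right)
\right]
```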

Parameters:

x (torch.Tensor) – Original input tensor.
x_recon (torch.Tensor) – Reconstructed output tensor.
mu (torch.Tensor) – Mean of the latent distribution.
log_var (torch.Tensor) – Log variance of the latent distribution.
beta (float, optional) – Weight for the KL divergence term. Default is 1.0.
fg_tau (float, optional) – Threshold for identifying foreground pixels. Default is 0.1.
fg_alpha (float, optional) – Weight multiplier for foreground pixels. Default is 10.0.

Returns:

Scalar loss value normalized by batch size.

Return type:

torch.Tensor
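A minimal sketch of a foreground-weighted L1 plus β-weighted KL loss, summed and divided by batch size, follows. `sketch_ae_loss` is a hypothetical name. It assumes that pixels of the input x above fg_tau count as foreground, weighted by fg_alpha, while all other pixels have weight 1; chatter's actual masking rule may differ. In the worked example, one foreground error of 0.5 contributes 10 × 0.5 = 5 and one background error of 0.5 contributes 0.5. The KL term is zero for mu = 0, log_var = 0, so the loss is (5 + 0.5) / 2 = 2.75.

```python
import torch


def sketch_ae_loss(x: torch.Tensor, x_recon: torch.Tensor, mu: torch.Tensor,
                   log_var: torch.Tensor, beta: float = 1.0, fg_tau: float = 0.1,
                   fg_alpha: float = 10.0) -> torch.Tensor:
    # Assumption: foreground is decided on the input x with a hard threshold.
    fg = (x > fg_tau).to(x.dtype)
    weight = 1.0 + (fg_alpha - 1.0) * fg  # fg_alpha on foreground, 1 on background
    recon = (weight * (x_recon - x).abs()).sum()  # weighted L1 over all pixels
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dims and the batch.
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp())
    return (recon + beta * kl) / x.shape[0]  # normalize by batch size


x = torch.zeros(2, 1, 4, 4)
x[0, 0, 0, 0] = 1.0  # one foreground pixel
x_recon = x.clone()
x_recon[0, 0, 0, 0] = 0.5  # foreground error 0.5, weighted by 10
x_recon[1, 0, 0, 0] = 0.5  # background error 0.5, weighted by 1
mu, log_var = torch.zeros(2, 8), torch.zeros(2, 8)  # KL term is exactly 0 here
loss = sketch_ae_loss(x, x_recon, mu, log_var)
print(loss.item())  # 2.75
```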
