chatter.models

Neural network architectures for variational autoencoders.

Functions

ae_loss(x, x_recon, mu, log_var[, beta, ...])

Compute variational autoencoder loss with foreground weighting.

Classes

ConvDecoder(latent_dim, target_shape)

Convolutional decoder using a resize-convolution architecture.

ConvEncoder(latent_dim, target_shape)

Convolutional encoder for a variational autoencoder operating on spectrograms.

Encoder(ae_type, latent_dim[, input_dim, ...])

Unified autoencoder wrapper supporting both convolutional and vector architectures.

VectorDecoder(latent_dim, output_dim)

Fully connected decoder for an autoencoder.

VectorEncoder(input_dim, latent_dim)

Fully connected encoder for a variational autoencoder.

class chatter.models.ConvEncoder(latent_dim: int, target_shape: Tuple[int, int])

Convolutional encoder for a variational autoencoder operating on spectrograms.

This encoder processes single-channel spectrogram inputs with a series of convolutional layers followed by batch normalization and Mish activation. It outputs the mean and log variance of the latent distribution.

Parameters:
  • latent_dim (int) – Dimensionality of the latent space.

  • target_shape (tuple of int) – Shape of the input spectrogram as (height, width).

__init__(latent_dim: int, target_shape: Tuple[int, int]) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tuple[Tensor, Tensor]

Perform a forward pass through the encoder.

Parameters:

x (torch.Tensor) – Input tensor with shape (batch_size, 1, height, width).

Returns:

A tuple (mu, log_var) representing the latent distribution parameters.

Return type:

tuple of torch.Tensor
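
The description above (convolution, batch normalization, Mish, then two outputs) can be illustrated with a minimal sketch. `TinyConvEncoder`, `conv_out`, the channel counts, the kernel sizes, and the two-layer depth are all assumptions made for illustration; they are not taken from the chatter source.

```python
from typing import Tuple

import torch
from torch import nn


def conv_out(size: int) -> int:
    """Output length of a kernel-3, stride-2, padding-1 convolution."""
    return (size - 1) // 2 + 1


class TinyConvEncoder(nn.Module):
    """Hypothetical stand-in for illustration; not chatter's ConvEncoder."""

    def __init__(self, latent_dim: int, target_shape: Tuple[int, int]) -> None:
        super().__init__()
        height, width = target_shape
        # Conv -> BatchNorm -> Mish, repeated twice (layer sizes are assumed).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(16),
            nn.Mish(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.Mish(),
            nn.Flatten(),
        )
        n_features = 32 * conv_out(conv_out(height)) * conv_out(conv_out(width))
        # Two linear heads read the same flattened convolutional features.
        self.fc_mu = nn.Linear(n_features, latent_dim)
        self.fc_log_var = nn.Linear(n_features, latent_dim)

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        h = self.features(x)
        return self.fc_mu(h), self.fc_log_var(h)


encoder = TinyConvEncoder(latent_dim=8, target_shape=(32, 48))
mu, log_var = encoder(torch.randn(4, 1, 32, 48))
print(mu.shape, log_var.shape)
```

Both outputs have shape (batch_size, latent_dim). Predicting the log variance rather than the variance keeps the head unconstrained, since any real number maps to a positive variance.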

class chatter.models.ConvDecoder(latent_dim: int, target_shape: Tuple[int, int])

Convolutional decoder using a resize-convolution architecture.

This decoder reconstructs spectrogram images from latent vectors using nearest neighbor upsampling followed by convolution to mitigate checkerboard artifacts.

Parameters:
  • latent_dim (int) – Dimensionality of the latent space.

  • target_shape (tuple of int) – Shape of the output spectrogram as (height, width).

__init__(latent_dim: int, target_shape: Tuple[int, int]) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(z: Tensor) → Tensor

Perform a forward pass through the decoder.

Parameters:

z (torch.Tensor) – Latent representation with shape (batch_size, latent_dim).

Returns:

Reconstructed spectrogram with shape (batch_size, 1, height, width).

Return type:

torch.Tensor
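
The resize-convolution idea described above (nearest neighbor upsampling followed by an ordinary convolution, in place of a transposed convolution) might look roughly like the sketch below. `TinyResizeConvDecoder` is a hypothetical class; the channel counts, the two upsampling stages, the sigmoid output, and the requirement that height and width be divisible by 4 are assumptions, not details of chatter's ConvDecoder.

```python
from typing import Tuple

import torch
from torch import nn


class TinyResizeConvDecoder(nn.Module):
    """Hypothetical stand-in for illustration; not chatter's ConvDecoder."""

    def __init__(self, latent_dim: int, target_shape: Tuple[int, int]) -> None:
        super().__init__()
        height, width = target_shape
        # Assumption: two 2x upsampling stages, so both sides must divide by 4.
        if height % 4 or width % 4:
            raise ValueError("this sketch needs height and width divisible by 4")
        self.start = (height // 4, width // 4)
        self.fc = nn.Linear(latent_dim, 32 * self.start[0] * self.start[1])
        self.up = nn.Sequential(
            # Resize first, then convolve at stride 1.
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(32, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.Mish(),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # assumes spectrogram values scaled to [0, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Project the latent vector onto a small feature map, then upsample.
        x = self.fc(z).view(-1, 32, *self.start)
        return self.up(x)


decoder = TinyResizeConvDecoder(latent_dim=8, target_shape=(32, 48))
recon = decoder(torch.randn(4, 8))
print(recon.shape)
```

Because every output pixel of a stride-1 convolution is covered by the same number of kernel taps, this layout avoids the uneven overlap that causes checkerboard patterns in strided transposed convolutions.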

class chatter.models.VectorEncoder(input_dim: int, latent_dim: int)

Fully connected encoder for a variational autoencoder.

Parameters:
  • input_dim (int) – Dimensionality of the flattened input vector.

  • latent_dim (int) – Dimensionality of the latent space.

__init__(input_dim: int, latent_dim: int) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tuple[Tensor, Tensor]

Perform a forward pass through the encoder.

Parameters:

x (torch.Tensor) – Flattened input tensor with shape (batch_size, input_dim).

Returns:

A tuple (mu, log_var) representing the latent distribution parameters.

Return type:

tuple of torch.Tensor
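
As a rough illustration of a fully connected encoder with the interface above, the sketch below maps a flattened input to `(mu, log_var)`. `TinyVectorEncoder`, the single hidden layer, and the `hidden_dim` parameter are assumptions for illustration; the real VectorEncoder takes only `input_dim` and `latent_dim`, and its layer layout is not documented here.

```python
from typing import Tuple

import torch
from torch import nn


class TinyVectorEncoder(nn.Module):
    """Hypothetical stand-in for illustration; not chatter's VectorEncoder."""

    def __init__(self, input_dim: int, latent_dim: int, hidden_dim: int = 64) -> None:
        super().__init__()
        # One shared hidden layer (an assumption), then one head per output.
        self.body = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Mish())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_log_var = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        h = self.body(x)
        return self.fc_mu(h), self.fc_log_var(h)


# A 32 x 48 spectrogram flattened to a vector of length 1536.
spectrograms = torch.randn(4, 1, 32, 48)
flat = spectrograms.flatten(start_dim=1)
vec_encoder = TinyVectorEncoder(input_dim=flat.shape[1], latent_dim=8)
vec_mu, vec_log_var = vec_encoder(flat)
print(flat.shape, vec_mu.shape)
```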

class chatter.models.VectorDecoder(latent_dim: int, output_dim: int)

Fully connected decoder for an autoencoder.

Parameters:
  • latent_dim (int) – Dimensionality of the latent representation.

  • output_dim (int) – Dimensionality of the flattened spectrogram output.

__init__(latent_dim: int, output_dim: int) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(z: Tensor) → Tensor

Perform a forward pass through the decoder.

Parameters:

z (torch.Tensor) – Latent representation with shape (batch_size, latent_dim).

Returns:

Reconstructed output with shape (batch_size, output_dim).

Return type:

torch.Tensor
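
A fully connected decoder with this interface could be sketched as follows. `TinyVectorDecoder`, its hidden layer, and the sigmoid output (which assumes inputs scaled to [0, 1]) are hypothetical choices, not details of chatter's VectorDecoder.

```python
import torch
from torch import nn


class TinyVectorDecoder(nn.Module):
    """Hypothetical stand-in for illustration; not chatter's VectorDecoder."""

    def __init__(self, latent_dim: int, output_dim: int, hidden_dim: int = 64) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.Mish(),
            nn.Linear(hidden_dim, output_dim),
            nn.Sigmoid(),  # assumes targets scaled to [0, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


vec_decoder = TinyVectorDecoder(latent_dim=8, output_dim=1536)
flat_recon = vec_decoder(torch.randn(4, 8))
# The flat output can be reshaped back to an image-like layout if needed.
image_recon = flat_recon.view(4, 1, 32, 48)
print(flat_recon.shape, image_recon.shape)
```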

class chatter.models.Encoder(ae_type: str, latent_dim: int, input_dim: int | None = None, target_shape: Tuple[int, int] | None = None)

Unified autoencoder wrapper supporting both convolutional and vector architectures.

Parameters:
  • ae_type (str) – Type of autoencoder architecture (‘convolutional’ or ‘vector’).

  • latent_dim (int) – Dimensionality of the latent space.

  • input_dim (int, optional) – Dimensionality of the flattened input (required for ‘vector’).

  • target_shape (tuple of int, optional) – Shape of the input spectrogram (required for ‘convolutional’).

__init__(ae_type: str, latent_dim: int, input_dim: int | None = None, target_shape: Tuple[int, int] | None = None) → None

Initialize internal Module state, shared by both nn.Module and ScriptModule.

reparameterize(mu: Tensor, log_var: Tensor) → Tensor

Apply the reparameterization trick for a variational autoencoder.

Parameters:
  • mu (torch.Tensor) – Mean of the latent distribution.

  • log_var (torch.Tensor) – Log variance of the latent distribution.

Returns:

Sampled latent vector.

Return type:

torch.Tensor
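
The standard reparameterization trick draws noise from a unit Gaussian and then shifts and scales it, so the sample stays differentiable with respect to `mu` and `log_var`. The standalone function below is a sketch of that standard formulation; whether chatter's method matches it line for line (for example, whether it still samples in evaluation mode) is an assumption.

```python
import torch


def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Sample z = mu + eps * std with eps ~ N(0, I)."""
    std = torch.exp(0.5 * log_var)  # log variance -> standard deviation
    eps = torch.randn_like(std)     # noise carries no gradient
    return mu + eps * std


mu = torch.zeros(4, 2, requires_grad=True)
log_var = torch.zeros(4, 2, requires_grad=True)
z = reparameterize(mu, log_var)
z.sum().backward()  # gradients reach mu and log_var through the sample
print(z.shape, mu.grad.shape)
```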

encode(x: Tensor) → Tuple[Tensor, Tensor]

Encode an input tensor into the latent space.

Parameters:

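x (torch.Tensor) – Input tensor, shaped as the selected encoder expects: (batch_size, 1, height, width) for ‘convolutional’ or (batch_size, input_dim) for ‘vector’.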

Returns:

A tuple (mu, log_var) representing the latent distribution parameters.

Return type:

tuple of torch.Tensor

forward(x: Tensor) → Tuple[Tensor, Tensor, Tensor]

Perform a full forward pass through the variational autoencoder.

Parameters:

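x (torch.Tensor) – Input tensor, shaped as the selected encoder expects: (batch_size, 1, height, width) for ‘convolutional’ or (batch_size, input_dim) for ‘vector’.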

Returns:

A tuple (x_recon, mu, log_var): the reconstruction, followed by the mean and log variance of the latent distribution.

Return type:

tuple of torch.Tensor
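
The three-step flow of a full forward pass (encode, sample, decode) can be sketched with a deliberately small model. `TinyVAE` is a hypothetical class built from single linear layers; it only mirrors the `encode`/`forward` return signatures documented above and makes no claim about how chatter's Encoder wires its sub-modules.

```python
from typing import Tuple

import torch
from torch import nn


class TinyVAE(nn.Module):
    """Hypothetical stand-in mirroring the documented return signatures."""

    def __init__(self, input_dim: int, latent_dim: int) -> None:
        super().__init__()
        # One layer emits mu and log_var side by side (an assumption).
        self.encoder = nn.Linear(input_dim, 2 * latent_dim)
        self.decoder = nn.Linear(latent_dim, input_dim)

    def encode(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        return mu, log_var

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        mu, log_var = self.encode(x)                                  # 1. encode
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)      # 2. sample
        return self.decoder(z), mu, log_var                           # 3. decode


vae = TinyVAE(input_dim=16, latent_dim=4)
batch = torch.randn(5, 16)
x_recon, vae_mu, vae_log_var = vae(batch)
print(x_recon.shape, vae_mu.shape, vae_log_var.shape)
```

Returning `mu` and `log_var` alongside the reconstruction lets the caller compute the KL term of the loss without a second encoder pass.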

chatter.models.ae_loss(x: Tensor, x_recon: Tensor, mu: Tensor, log_var: Tensor, beta: float = 1.0, fg_tau: float = 0.1, fg_alpha: float = 10.0) → Tensor

Compute variational autoencoder loss with foreground weighting.

This loss function combines a foreground-weighted L1 reconstruction term with a Kullback-Leibler divergence term. Foreground pixels are assigned higher weight to mitigate mode collapse on sparse spectrograms.

Parameters:
  • x (torch.Tensor) – Original input tensor.

  • x_recon (torch.Tensor) – Reconstructed output tensor.

  • mu (torch.Tensor) – Mean of the latent distribution.

  • log_var (torch.Tensor) – Log variance of the latent distribution.

  • beta (float, optional) – Weight for the KL divergence term. Default is 1.0.

  • fg_tau (float, optional) – Threshold for identifying foreground pixels. Default is 0.1.

  • fg_alpha (float, optional) – Weight multiplier for foreground pixels. Default is 10.0.

Returns:

Scalar loss value normalized by batch size.

Return type:

torch.Tensor
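
One plausible reading of the parameters above is sketched below: pixels whose input value exceeds `fg_tau` get weight `fg_alpha`, all others get weight 1, and the weighted L1 error is added to `beta` times the closed-form Gaussian KL divergence. The function name `fg_weighted_vae_loss`, the exact weighting rule, and the choice to sum over pixels before dividing by the batch size are assumptions; the actual `ae_loss` may differ in any of these.

```python
import torch


def fg_weighted_vae_loss(
    x: torch.Tensor,
    x_recon: torch.Tensor,
    mu: torch.Tensor,
    log_var: torch.Tensor,
    beta: float = 1.0,
    fg_tau: float = 0.1,
    fg_alpha: float = 10.0,
) -> torch.Tensor:
    """Sketch of a foreground-weighted L1 + KL loss (not chatter's ae_loss)."""
    # Assumed rule: weight fg_alpha where the input is foreground, else 1.
    weights = torch.where(
        x > fg_tau, torch.full_like(x, fg_alpha), torch.ones_like(x)
    )
    recon = (weights * (x_recon - x).abs()).sum()
    # KL divergence between N(mu, exp(log_var)) and N(0, I), summed.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return (recon + beta * kl) / x.shape[0]


x = torch.tensor([[0.0, 1.0]])        # one background and one foreground pixel
x_recon = torch.tensor([[0.5, 0.5]])  # absolute error of 0.5 on both
loss = fg_weighted_vae_loss(x, x_recon, torch.zeros(1, 2), torch.zeros(1, 2))
print(loss.item())  # 5.5 = 1 * 0.5 + 10 * 0.5, with a KL term of 0
```

With a uniform weight, a model trained on mostly silent spectrograms can score well by predicting an empty image; the heavier foreground weight makes missing a signal pixel cost more than misplacing background energy.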