chatter.models
Neural network architectures for variational autoencoders.
Functions

| ae_loss | Compute variational autoencoder loss with foreground weighting. |

Classes

| ConvDecoder | Convolutional decoder using a resize-convolution architecture. |
| ConvEncoder | Convolutional encoder for a variational autoencoder operating on spectrograms. |
| Encoder | Unified autoencoder wrapper supporting both convolutional and vector architectures. |
| VectorDecoder | Fully connected decoder for an autoencoder. |
| VectorEncoder | Fully connected encoder for a variational autoencoder. |
- class chatter.models.ConvEncoder(latent_dim: int, target_shape: Tuple[int, int])
Convolutional encoder for a variational autoencoder operating on spectrograms.
This encoder processes single-channel spectrogram inputs with a series of convolutional layers followed by batch normalization and Mish activation. It outputs the mean and log variance of the latent distribution.
- Parameters:
latent_dim (int) – Dimensionality of the latent space.
target_shape (tuple of int) – Shape of the input spectrogram as (height, width).
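A minimal usage sketch, assuming the encoder's forward pass takes a batch of single-channel spectrograms shaped (batch, 1, height, width) and returns the latent mean and log variance; the spectrogram size and latent dimensionality below are illustrative.

```python
import torch
from chatter.models import ConvEncoder

target_shape = (128, 128)  # illustrative (height, width)
encoder = ConvEncoder(latent_dim=16, target_shape=target_shape)

x = torch.randn(8, 1, *target_shape)  # batch of 8 single-channel spectrograms
mu, log_var = encoder(x)              # assumed to return (mean, log variance)
print(mu.shape, log_var.shape)        # expected: torch.Size([8, 16]) each
```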
- class chatter.models.ConvDecoder(latent_dim: int, target_shape: Tuple[int, int])
Convolutional decoder using a resize-convolution architecture.
This decoder reconstructs spectrogram images from latent vectors using nearest neighbor upsampling followed by convolution to mitigate checkerboard artifacts.
- Parameters:
latent_dim (int) – Dimensionality of the latent space.
target_shape (tuple of int) – Shape of the output spectrogram as (height, width).
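A matching sketch for the decoder, assuming its forward pass maps a batch of latent vectors back to single-channel spectrograms of target_shape; the sizes are illustrative.

```python
import torch
from chatter.models import ConvDecoder

decoder = ConvDecoder(latent_dim=16, target_shape=(128, 128))

z = torch.randn(8, 16)  # batch of latent vectors
x_recon = decoder(z)    # assumed output shape: (8, 1, 128, 128)
```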
- class chatter.models.VectorEncoder(input_dim: int, latent_dim: int)
Fully connected encoder for a variational autoencoder.
- Parameters:
input_dim (int) – Dimensionality of the flattened input vector.
latent_dim (int) – Dimensionality of the latent space.
- class chatter.models.VectorDecoder(latent_dim: int, output_dim: int)
Fully connected decoder for an autoencoder.
- Parameters:
latent_dim (int) – Dimensionality of the latent representation.
output_dim (int) – Dimensionality of the flattened spectrogram output.
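A sketch pairing the fully connected encoder and decoder on flattened spectrograms, mirroring the convolutional example above. It assumes VectorEncoder returns (mu, log_var) and VectorDecoder returns a flat reconstruction of output_dim values; the dimensions are illustrative.

```python
import torch
from chatter.models import VectorEncoder, VectorDecoder

height, width = 128, 128
input_dim = height * width  # flattened spectrogram length
latent_dim = 16

encoder = VectorEncoder(input_dim=input_dim, latent_dim=latent_dim)
decoder = VectorDecoder(latent_dim=latent_dim, output_dim=input_dim)

x = torch.randn(8, input_dim)  # batch of flattened spectrograms
mu, log_var = encoder(x)       # assumed to return (mean, log variance)
z = mu                         # e.g. use the mean at inference time
x_recon = decoder(z).reshape(8, 1, height, width)
```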
- class chatter.models.Encoder(ae_type: str, latent_dim: int, input_dim: int | None = None, target_shape: Tuple[int, int] | None = None)
Unified autoencoder wrapper supporting both convolutional and vector architectures.
- Parameters:
ae_type (str) – Type of autoencoder architecture (‘convolutional’ or ‘vector’).
latent_dim (int) – Dimensionality of the latent space.
input_dim (int, optional) – Dimensionality of the flattened input (required for ‘vector’).
target_shape (tuple of int, optional) – Shape of the input spectrogram (required for ‘convolutional’).
- __init__(ae_type: str, latent_dim: int, input_dim: int | None = None, target_shape: Tuple[int, int] | None = None) → None
Initialize internal Module state, shared by both nn.Module and ScriptModule.
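A construction sketch for the wrapper, based on the parameter descriptions above: input_dim is passed for the ‘vector’ architecture and target_shape for the ‘convolutional’ one. The sizes are illustrative.

```python
from chatter.models import Encoder

# Convolutional variant: requires target_shape.
conv_model = Encoder(ae_type="convolutional", latent_dim=16, target_shape=(128, 128))

# Vector variant: requires input_dim (flattened spectrogram length).
vec_model = Encoder(ae_type="vector", latent_dim=16, input_dim=128 * 128)
```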
- reparameterize(mu: Tensor, log_var: Tensor) → Tensor
Apply the reparameterization trick for a variational autoencoder.
- Parameters:
mu (torch.Tensor) – Mean of the latent distribution.
log_var (torch.Tensor) – Log variance of the latent distribution.
- Returns:
Sampled latent vector.
- Return type:
torch.Tensor
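For reference, the reparameterization trick is conventionally written as z = mu + sigma * eps with eps ~ N(0, I). The standalone sketch below shows that standard formulation; it is not necessarily the method's exact code.

```python
import torch

def reparameterize_reference(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Standard reparameterization: z = mu + sigma * eps, eps ~ N(0, I)."""
    std = torch.exp(0.5 * log_var)  # sigma recovered from the log variance
    eps = torch.randn_like(std)     # noise drawn independently of mu and sigma
    return mu + eps * std
```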
- chatter.models.ae_loss(x: Tensor, x_recon: Tensor, mu: Tensor, log_var: Tensor, beta: float = 1.0, fg_tau: float = 0.1, fg_alpha: float = 10.0) → Tensor
Compute variational autoencoder loss with foreground weighting.
This loss function combines a foreground-weighted L1 reconstruction term with a Kullback-Leibler divergence term. Foreground pixels are assigned higher weight to mitigate mode collapse on sparse spectrograms.
- Parameters:
x (torch.Tensor) – Original input tensor.
x_recon (torch.Tensor) – Reconstructed output tensor.
mu (torch.Tensor) – Mean of the latent distribution.
log_var (torch.Tensor) – Log variance of the latent distribution.
beta (float, optional) – Weight for the KL divergence term. Default is 1.0.
fg_tau (float, optional) – Threshold for identifying foreground pixels. Default is 0.1.
fg_alpha (float, optional) – Weight multiplier for foreground pixels. Default is 10.0.
- Returns:
Scalar loss value normalized by batch size.
- Return type:
torch.Tensor
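A hedged sketch of how such a loss is typically assembled from the documented parameters: an L1 reconstruction term whose per-pixel weight is boosted by fg_alpha wherever the input exceeds fg_tau, plus beta times the analytic KL divergence, normalized by batch size. The exact weighting scheme inside chatter.models.ae_loss may differ.

```python
import torch

def ae_loss_sketch(x, x_recon, mu, log_var, beta=1.0, fg_tau=0.1, fg_alpha=10.0):
    """Illustrative foreground-weighted VAE loss (assumed form, not the library's exact code)."""
    batch_size = x.shape[0]

    # Foreground mask: pixels above fg_tau receive an extra weight of fg_alpha.
    weights = 1.0 + fg_alpha * (x > fg_tau).float()
    recon = (weights * torch.abs(x - x_recon)).sum()

    # Analytic KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    return (recon + beta * kl) / batch_size
```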