EncoderDecoderTimeSeriesDataModule

class pytorch_forecasting.data.data_module.EncoderDecoderTimeSeriesDataModule(time_series_dataset: TimeSeries, max_encoder_length: int = 30, min_encoder_length: int | None = None, max_prediction_length: int = 1, min_prediction_length: int | None = None, min_prediction_idx: int | None = None, allow_missing_timesteps: bool = False, add_relative_time_idx: bool = False, add_target_scales: bool = False, add_encoder_length: bool | str = 'auto', target_normalizer: TorchNormalizer | NaNLabelEncoder | EncoderNormalizer | str | list[TorchNormalizer | NaNLabelEncoder | EncoderNormalizer] | tuple[TorchNormalizer | NaNLabelEncoder | EncoderNormalizer] | None = 'auto', categorical_encoders: dict[str, NaNLabelEncoder] | None = None, scalers: dict[str, StandardScaler | RobustScaler | TorchNormalizer | EncoderNormalizer] | None = None, randomize_length: None | tuple[float, float] | bool = False, batch_size: int = 32, num_workers: int = 0, train_val_test_split: tuple = (0.7, 0.15, 0.15))

Bases: LightningDataModule

Lightning DataModule for processing time series data in an encoder-decoder format.

This module handles preprocessing, splitting, and batching of time series data for use in deep learning models. It supports categorical and continuous features, various scalers, and automatic target normalization.
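To make the encoder-decoder format concrete, the following is a minimal sketch in plain Python of how one series could be cut into encoder (history) and decoder (prediction) windows. It is not the module's internal implementation: the helper name `sliding_windows` is hypothetical, and it assumes fixed-length windows and a series with no missing timesteps.

```python
def sliding_windows(values, max_encoder_length, max_prediction_length):
    """Return (encoder, decoder) pairs of full-length windows over one series.

    Hypothetical helper for illustration only; it ignores variable-length
    windows, missing timesteps, and feature columns.
    """
    window = max_encoder_length + max_prediction_length
    pairs = []
    for start in range(len(values) - window + 1):
        split = start + max_encoder_length
        encoder = values[start:split]           # history fed to the encoder
        decoder = values[split:start + window]  # horizon the decoder predicts
        pairs.append((encoder, decoder))
    return pairs


series = list(range(10))
pairs = sliding_windows(series, max_encoder_length=4, max_prediction_length=2)
print(len(pairs))  # 5
print(pairs[0])    # ([0, 1, 2, 3], [4, 5])
```

Each pair holds `max_encoder_length` past values followed by the `max_prediction_length` values that come next.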

Parameters:
  • time_series_dataset (TimeSeries) – The dataset containing time series data.

  • max_encoder_length (int, default=30) – Maximum length of the encoder input sequence.

  • min_encoder_length (Optional[int], default=None) – Minimum length of the encoder input sequence. Defaults to max_encoder_length if not specified.

  • max_prediction_length (int, default=1) – Maximum length of the decoder output sequence.

  • min_prediction_length (Optional[int], default=None) – Minimum length of the decoder output sequence. Defaults to max_prediction_length if not specified.

  • min_prediction_idx (Optional[int], default=None) – Minimum index from which predictions start.

  • allow_missing_timesteps (bool, default=False) – Whether to allow missing timesteps in the dataset.

  • add_relative_time_idx (bool, default=False) – Whether to add a relative time index feature.

  • add_target_scales (bool, default=False) – Whether to add target scaling information.

  • add_encoder_length (Union[bool, str], default="auto") – Whether to include encoder length information.

  • target_normalizer (Union[NORMALIZER, str, List[NORMALIZER], Tuple[NORMALIZER], None], default="auto") – Normalizer for the target variable. If "auto", uses RobustScaler.

  • categorical_encoders (Optional[Dict[str, NaNLabelEncoder]], default=None) – Dictionary of categorical encoders.

  • scalers (Optional[Dict[str, Union[StandardScaler, RobustScaler, TorchNormalizer, EncoderNormalizer]]], default=None) – Dictionary of feature scalers.

  • randomize_length (Union[None, Tuple[float, float], bool], default=False) – Whether to randomize input sequence length.

  • batch_size (int, default=32) – Batch size for DataLoader.

  • num_workers (int, default=0) – Number of workers for DataLoader.

  • train_val_test_split (tuple, default=(0.7, 0.15, 0.15)) – Proportions for train, validation, and test dataset splits.
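As a rough illustration of what split proportions such as (0.7, 0.15, 0.15) mean, the sketch below partitions a set of series indices into three disjoint groups. The function name `split_series_indices`, the shuffling, and the rounding rule (truncate train and validation, give the remainder to test) are assumptions made for this example and may differ from what the module does.

```python
import random


def split_series_indices(n_series, proportions=(0.7, 0.15, 0.15), seed=0):
    """Partition series indices into train/val/test lists (illustrative only)."""
    train_frac, val_frac, _ = proportions
    indices = list(range(n_series))
    random.Random(seed).shuffle(indices)  # assumed: random assignment of series

    n_train = int(train_frac * n_series)
    n_val = int(val_frac * n_series)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # assumed: remainder goes to test
    return train, val, test


train, val, test = split_series_indices(100)
print(len(train), len(val), len(test))  # 70 15 15
```

Splitting by whole series, as assumed here, keeps every window of a given series in a single split.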

prepare_data_per_node

If True, each LOCAL_RANK=0 will call prepare_data(). Otherwise only NODE_RANK=0, LOCAL_RANK=0 will prepare data.

allow_zero_length_dataloader_with_multiple_devices

If True, dataloader with zero length within local rank is allowed. Default value is False.

Methods

collate_fn(batch)

from_datasets([train_dataset, val_dataset, ...])

Create an instance from torch.utils.data.Dataset.

load_from_checkpoint(checkpoint_path[, ...])

Primary way of loading a datamodule from a checkpoint.

load_state_dict(state_dict)

Called when loading a checkpoint, implement to reload datamodule state given datamodule state_dict.

on_after_batch_transfer(batch, dataloader_idx)

Override to alter or apply batch augmentations to your batch after it is transferred to the device.

on_before_batch_transfer(batch, dataloader_idx)

Override to alter or apply batch augmentations to your batch before it is transferred to the device.

on_exception(exception)

Called when the trainer execution is interrupted by an exception.

predict_dataloader()

An iterable or collection of iterables specifying prediction samples.

prepare_data()

Use this to download and prepare data.

save_hyperparameters(*args[, ignore, frame, ...])

Save arguments to hparams attribute.

setup([stage])

Prepare the datasets for training, validation, testing, or prediction.

state_dict()

Called when saving a checkpoint, implement to generate and save datamodule state.

teardown(stage)

Called at the end of fit (train + validate), validate, test, or predict.

test_dataloader()

An iterable or collection of iterables specifying test samples.

train_dataloader()

An iterable or collection of iterables specifying training samples.

transfer_batch_to_device(batch, device, ...)

Override this hook if your DataLoader returns tensors wrapped in a custom data structure.

val_dataloader()

An iterable or collection of iterables specifying validation samples.

Attributes

CHECKPOINT_HYPER_PARAMS_KEY

CHECKPOINT_HYPER_PARAMS_NAME

CHECKPOINT_HYPER_PARAMS_TYPE

hparams

The collection of hyperparameters saved with save_hyperparameters().

hparams_initial

The collection of hyperparameters saved with save_hyperparameters().

metadata

Compute metadata for model initialization.

name

predict_dataloader()

An iterable or collection of iterables specifying prediction samples.

For more information about multiple dataloaders, see this section.

It’s recommended that all data downloads and preparation happen in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying prediction samples.

setup(stage: str | None = None)

Prepare the datasets for training, validation, testing, or prediction.

Parameters:

stage (Optional[str], default=None) – Specifies the stage of setup. Can be one of:

  • "fit": Prepares training and validation datasets.

  • "test": Prepares the test dataset.

  • "predict": Prepares the dataset for inference.

  • None: Prepares fit datasets.
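The stage values described above can be summarized as a small lookup. This is only a sketch of the documented behavior: the function `datasets_for_stage` and the split labels are hypothetical, and raising on an unknown stage is an assumption of this example, not documented behavior of setup().

```python
def datasets_for_stage(stage=None):
    """Map a setup stage to the dataset splits it prepares (illustrative only)."""
    if stage is None or stage == "fit":
        # Per the description, None prepares the same datasets as "fit".
        return ["train", "val"]
    if stage == "test":
        return ["test"]
    if stage == "predict":
        return ["predict"]
    # Assumption for this sketch: reject anything else explicitly.
    raise ValueError(f"unknown stage: {stage!r}")


print(datasets_for_stage("fit"))  # ['train', 'val']
print(datasets_for_stage(None))   # ['train', 'val']
print(datasets_for_stage("test")) # ['test']
```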

test_dataloader()

An iterable or collection of iterables specifying test samples.

For more information about multiple dataloaders, see this section.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

train_dataloader()

An iterable or collection of iterables specifying training samples.

For more information about multiple dataloaders, see this section.

The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
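The download-in-prepare_data(), process-in-setup() pattern and the warning about state can be sketched without Lightning. The class below is a plain-Python stand-in (the name `SketchDataModule` and the file `raw.json` are hypothetical): prepare_data() only writes to disk, and setup() reads from disk and assigns attributes. In a distributed run, prepare_data() is called on a subset of processes, so anything it stored on `self` would be missing elsewhere.

```python
import json
import tempfile
from pathlib import Path


class SketchDataModule:
    """Plain-Python stand-in mirroring the prepare_data()/setup() split."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)
        self.train_values = None
        self.val_values = None

    def prepare_data(self):
        # "Download" step: write to disk only, assign nothing to self.
        raw = list(range(10))
        (self.data_dir / "raw.json").write_text(json.dumps(raw))

    def setup(self, stage=None):
        # Runs on every process: read from disk, then assign state.
        raw = json.loads((self.data_dir / "raw.json").read_text())
        cut = (len(raw) * 4) // 5  # 80/20 split, chosen only for this example
        self.train_values = raw[:cut]
        self.val_values = raw[cut:]


with tempfile.TemporaryDirectory() as tmp:
    dm = SketchDataModule(tmp)
    dm.prepare_data()
    dm.setup("fit")

print(dm.train_values)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(dm.val_values)    # [8, 9]
```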

val_dataloader()

An iterable or collection of iterables specifying validation samples.

For more information about multiple dataloaders, see this section.

The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

  • fit()

  • validate()

  • prepare_data()

  • setup()

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

property metadata

Compute metadata for model initialization.

This property returns a dictionary containing the shapes and key information related to the time series model. The metadata includes:

  • encoder_cat: Number of categorical variables in the encoder.

  • encoder_cont: Number of continuous variables in the encoder.

  • decoder_cat: Number of categorical variables in the decoder that are known in advance.

  • decoder_cont: Number of continuous variables in the decoder that are known in advance.

  • target: Number of target variables.

If static features are present, the following keys are added:

  • static_categorical_features: Number of static categorical features

  • static_continuous_features: Number of static continuous features

It also contains the following information:

  • max_encoder_length: maximum encoder length

  • max_prediction_length: maximum prediction length

  • min_encoder_length: minimum encoder length

  • min_prediction_length: minimum prediction length
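As a sketch of how a model might consume this dictionary, the example below derives a few sizes from it. The values in `metadata` are made up and the helper `summarize_shapes` is hypothetical; only the key names are taken from the lists above. The static keys are read with a default of 0 because they are present only when static features exist.

```python
# Hypothetical metadata values; only the key names come from the lists above.
metadata = {
    "encoder_cat": 2,
    "encoder_cont": 5,
    "decoder_cat": 1,
    "decoder_cont": 3,
    "target": 1,
    "max_encoder_length": 30,
    "max_prediction_length": 7,
    "min_encoder_length": 30,
    "min_prediction_length": 7,
}


def summarize_shapes(meta):
    """Derive per-timestep feature counts a model constructor might need."""
    return {
        "encoder_features": meta["encoder_cat"] + meta["encoder_cont"],
        "decoder_features": meta["decoder_cat"] + meta["decoder_cont"],
        # Static keys exist only when static features are present.
        "static_features": meta.get("static_categorical_features", 0)
        + meta.get("static_continuous_features", 0),
        "max_total_length": meta["max_encoder_length"]
        + meta["max_prediction_length"],
    }


shapes = summarize_shapes(metadata)
print(shapes["encoder_features"])  # 7
print(shapes["max_total_length"])  # 37
```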